IT maintenance

AI server

  • IQ4_NL is much slower than Q4_0/Q4_1, so quantizing the KV cache with IQ4_NL is probably not a good idea…
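For reference, a sketch of switching the KV cache quantization in a llama.cpp server launch; the model path and port are placeholders, and in llama.cpp a quantized V cache additionally requires flash attention (`-fa`):

```shell
# llama.cpp server with the KV cache quantized to Q4_0 instead of IQ4_NL.
# /path/to/model.gguf and the port are placeholders for the real setup.
llama-server -m /path/to/model.gguf \
  --port 8080 \
  -fa \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
```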
  • The new generation of Mistral models (Devstral 2, Ministral 3, etc.) seems to have problems with the chat template: https://github.com/anomalyco/opencode/issues/5034
  • Clawdbot: With Gemma 3, I got the “role must alternate between assistant/user” error. With Devstral 2, the tool calls are broken. I have to use Gemini for now. The local LLM setup was:

      "models": {
        "providers": {
          "local-llm": {
            "baseUrl": "https://llm.xeno.darksair.org/v1",
            "apiKey": "sk-intelvsamd",
            "api": "openai-completions",
            "models": [
              {
                "id": "default",
                "name": "Local LLM",
                "reasoning": false,
                "input": [
                  "text"
                ],
                "cost": {
                  "input": 0,
                  "output": 0,
                  "cacheRead": 0,
                  "cacheWrite": 0
                },
                "contextWindow": 32768,
                "maxTokens": 4096
              }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "workspace": "/opt/clawdbot/clawd",
          "model": {
            "primary": "local-llm/default"
          }
        }
      },
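The alternation error usually means the chat template rejects two consecutive messages with the same role (e.g. back-to-back user turns). A common workaround is to merge consecutive same-role messages before sending the request; a minimal sketch, with a function name of my own choosing:

```python
def merge_alternating(messages):
    """Merge consecutive messages that share a role, so the
    conversation strictly alternates between user and assistant."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Same role as the previous message: fold the content in.
            merged[-1]["content"] += "\n\n" + msg["content"]
        else:
            # Role changed: start a new message (copy, don't mutate input).
            merged.append({"role": msg["role"], "content": msg["content"]})
    return merged
```

This is a client-side patch; it does not fix a broken template server-side, but it satisfies templates that hard-fail on repeated roles.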
  • SillyTavern seems to have a one-minute timeout for first-token generation, even though the timeout is disabled here.

    For now I reduced the context limit to 12k.
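To check whether a given context size keeps the first token under the one-minute mark, the time to first byte of a streaming response can be measured with curl; the endpoint URL and API key below are the ones from the config above, everything else is a placeholder:

```shell
# Approximate time-to-first-token: with "stream": true, the first byte
# arrives with the first token, so %{time_starttransfer} is a usable proxy.
curl -sS -N -o /dev/null \
  -w 'time to first byte: %{time_starttransfer}s\n' \
  https://llm.xeno.darksair.org/v1/chat/completions \
  -H 'Authorization: Bearer sk-intelvsamd' \
  -H 'Content-Type: application/json' \
  -d '{"model": "default", "stream": true,
       "messages": [{"role": "user", "content": "Hi"}]}'
```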