Connecting a local model

There are several ways to deploy a language model locally. Popular solutions include Ollama, LM Studio, KoboldCpp, and others. As an example, let’s walk through the setup using KoboldCpp.

  1. Download the LLM you want to deploy in .gguf format. You can find such models, for example, on https://huggingface.co/.

  2. Download and open KoboldCpp. In the launcher, specify the path to the downloaded model. Check the “Remote Tunnel” option and click “Launch.”

  3. After launching, a command-line window will appear. Find the line that says “Your remote OpenAI Compatible API...”; it contains a temporary URL (for example: https://john-loving-cm-lows.trycloudflare.com/v1). Copy it. If you want to confirm the tunnel is reachable, see the first sketch after this list.

  4. Return to our site, go to the model catalog, and select the “Hosts” tab. Click “Add Host.”

  5. In the window that opens, paste the copied link into the “Endpoint URL” field and add /chat/completions at the end. In this example, the full URL becomes https://john-loving-cm-lows.trycloudflare.com/v1/chat/completions (see the second sketch after this list for a way to test it). Fill out the other fields as you prefer.

  6. Select the “Models” tab and click “Add Model.”

  7. In the “Host” field, select the host you created earlier. In “Display Name,” enter the name that will appear in the catalog. In “Model Name,” enter the exact name of the .gguf file you downloaded. In “Description,” describe the strengths and weaknesses of the model.

  8. Below, specify the maximum context size, privacy settings, one or more functionality tags, and additional settings supported by the model.

  9. Click “Create Model.” After a moment, the model will appear in the list.
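
Optionally, you can verify that the tunnel from step 3 is reachable before adding it as a host. Below is a minimal Python sketch, assuming the requests library is installed and that the server also exposes the OpenAI-compatible /models route (an assumption, since only /chat/completions is mentioned above); substitute your own temporary URL for the example one.

    import requests

    # Base URL printed by KoboldCpp in step 3; replace it with your own temporary URL.
    BASE_URL = "https://john-loving-cm-lows.trycloudflare.com/v1"

    # Listing the available models is a cheap way to confirm the tunnel is live.
    response = requests.get(f"{BASE_URL}/models", timeout=30)
    response.raise_for_status()
    print(response.json())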
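
Similarly, once you have the full “Endpoint URL” from step 5, you can send a test request to it. This is only a sketch under the same assumptions; the model value is a placeholder and should match the “Model Name” you enter in step 7.

    import requests

    # Full endpoint from step 5: the temporary tunnel URL plus /chat/completions.
    ENDPOINT = "https://john-loving-cm-lows.trycloudflare.com/v1/chat/completions"

    payload = {
        # Placeholder: use the exact name of your .gguf file (the "Model Name" from step 7).
        "model": "your-model.gguf",
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."},
        ],
        "max_tokens": 64,
    }

    response = requests.post(ENDPOINT, json=payload, timeout=120)
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])

If this request succeeds, the same host and model configuration should work in the catalog.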
