
// 01 / LOCAL AI + RAG  ·  20 Mar 2026

Building a Personal
AI Dev Assistant

Stack: Ollama + Open WebUI
Model: DeepSeek Coder 6.7B
Method: RAG + Knowledge Base
Runs on: Local hardware, no cloud

// the goal

An AI that actually knows your project

Running local LLMs is useful. But a model that only knows what it was trained on has a hard ceiling — it has no idea what's in your codebase, your documentation, your architecture decisions. The next step is making it aware of your specific project.

This walkthrough builds on the Local LLMs on Repurposed Hardware project. With Ollama and Open WebUI already running, the goal here is to create a persistent, domain-specific AI assistant — loaded with source code and documentation — that runs entirely offline and remembers your project across sessions.

The specific use case: WyseDSP plugin development. A C++/JUCE audio plugin suite with multiple plugins, custom DSP code, and detailed user manuals. The assistant needs to know all of it.

// the approach

RAG — not fine-tuning

The instinct when you want a model to "know" something is to retrain it. That's rarely the right answer for a working developer. Fine-tuning bakes a snapshot of your data into the model weights — meaning every time your code changes, you'd need to retrain. It also requires significant VRAM and hours of compute.

RAG (Retrieval Augmented Generation) is different. Your documents and source files live in a knowledge base. At query time, the most relevant chunks are retrieved and injected into the model's context. The model weights don't change — but the model can answer questions grounded in your actual files.

RAG
What we're using

Documents sit in a knowledge base. Relevant chunks are retrieved at query time and fed as context. Always up to date. No retraining needed.

Fine-tuning
What we're not doing

Knowledge baked into model weights. Requires significant GPU resources, hours of training, and becomes stale the moment your code changes.

For a live codebase, RAG is the correct choice. Open WebUI has it built in — no extra infrastructure needed.
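The mechanics are easy to see in miniature. Below is a toy sketch of the RAG query path: score stored chunks against a question, take the top matches, and inject them into the prompt. It uses crude word overlap for scoring purely for illustration — real pipelines, including Open WebUI's, use embedding vectors — and the sample chunks are invented stand-ins for indexed project files:

```python
# Toy RAG query path: score chunks, retrieve the best, stuff the prompt.
# Word-overlap scoring is a stand-in for real embedding similarity.

def score(chunk: str, query: str) -> float:
    """Crude relevance: fraction of query words present in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda ch: score(ch, query), reverse=True)[:k]

def build_prompt(chunks: list[str], query: str) -> str:
    """Inject retrieved chunks as context ahead of the user's question."""
    context = "\n---\n".join(retrieve(chunks, query))
    return f"Use the following project context:\n{context}\n\nQuestion: {query}"

# Invented example chunks standing in for an indexed knowledge base.
knowledge_base = [
    "The compressor exposes threshold, ratio, attack and release parameters.",
    "GREC Amp models a guitar amplifier with three gain stages.",
    "Drummer ships sixteen drum sounds mapped across the keyboard.",
]
print(build_prompt(knowledge_base, "What parameters does the compressor expose?"))
```

The model never changes; only the prompt does — which is exactly why the answers stay current as the files in the knowledge base change.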

// step by step

Setting up Open WebUI with Ollama

STEP 01

The networking problem — and the fix

By default, Ollama only listens on 127.0.0.1:11434 (localhost). Docker containers live on a separate network bridge and can't reach that address. The fix is to bind Ollama to 0.0.0.0, then point Open WebUI at the Docker host IP.

Edit the Ollama systemd service to add the environment variable:

terminal (bash)
$ sudo systemctl edit ollama

Add the following under the [Service] section:

ollama.service override (ini)
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
terminal (bash)
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama
# Verify it's listening on 0.0.0.0, not 127.0.0.1
$ ss -tlnp | grep 11434
LISTEN  0       4096    0.0.0.0:11434     0.0.0.0:*
Verify

Navigate to http://172.17.0.1:11434 in your browser. You should see: Ollama is running.
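The whole fix hinges on one distinction: a loopback bind is visible only to the host itself, while a wildcard bind is reachable from the bridge network too. A hypothetical helper makes the rule explicit:

```python
def reachable_from_docker(bind_addr: str) -> bool:
    """A service bound to loopback is invisible to containers on the
    Docker bridge; a wildcard (0.0.0.0) bind can be reached via the
    bridge gateway (typically 172.17.0.1)."""
    host = bind_addr.rsplit(":", 1)[0]
    return host not in ("127.0.0.1", "localhost")

print(reachable_from_docker("127.0.0.1:11434"))  # default Ollama bind
print(reachable_from_docker("0.0.0.0:11434"))    # after the override
```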

STEP 02

Deploy Open WebUI

Run Open WebUI as a Docker container, pointing it at the host's Ollama instance via host-gateway:

terminal (bash)
$ docker run -d \
  -p 3000:8080 \
  --add-host=host-gateway:host-gateway \
  -e OLLAMA_BASE_URL=http://host-gateway:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

The key flags: --add-host=host-gateway:host-gateway maps the hostname host-gateway to the Docker host's gateway IP inside the container, and OLLAMA_BASE_URL points Open WebUI at Ollama on the host. The named volume keeps your chats and knowledge bases safe across container restarts.

Give it a moment

Open WebUI takes 30–60 seconds to start on first launch. Then hit http://localhost:3000.

STEP 03

Verify the Ollama connection

Once inside Open WebUI, go to Settings → Admin Panel → Settings → Connections. The Ollama URL should read http://host-gateway:11434. Hit the refresh icon next to it — you should see Server connection verified and your models appear in the selector.

Common issue

If the URL shows localhost:11434 or 127.0.0.1:11434, update it manually to http://host-gateway:11434 and refresh.

STEP 04

Pull DeepSeek Coder

For code-aware RAG, DeepSeek Coder is significantly better than a general model. It was trained on a large corpus of code, so it reasons about retrieved source files far more effectively than Mistral or Qwen for this use case.

terminal (bash)
$ ollama pull deepseek-coder
pulling manifest
pulling ████████████████ 3.8 GB
success
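Once pulled, the model can also be exercised outside the UI via Ollama's REST API — handy for a quick smoke test before wiring up Open WebUI. A minimal sketch against the non-streaming /api/generate endpoint (the actual network call is commented out since it needs a running server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # or http://172.17.0.1:11434 from a container

def generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = generate_request("deepseek-coder", "Write a C++ RAII wrapper for a FILE*.")
# Requires a running Ollama instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```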

// building the knowledge base

Loading your project into the model

STEP 05

Create a Knowledge Base in Open WebUI

Navigate to Workspace → Knowledge → New Knowledge Base. Give it a meaningful name — something like WyseDSP. Hit Create Knowledge.

You'll land inside the knowledge base where you can upload files. Open WebUI accepts .pdf, .md, .txt, .cpp, .h, and more.

STEP 06

Upload documentation first

Start with your manuals and documentation — structured prose is easiest for the RAG chunker to handle and gives the model high-quality context to work from. For the WyseDSP suite, this means the four plugin manuals: GREC Amp, GREC Mini, Bass Amp, and Drummer.

Tip

PDFs index well. Manuals that include parameter descriptions, signal chains, and feature explanations give the model strong conceptual grounding before it tries to reason about code.

STEP 07

Upload source code — flattened

Add your .cpp and .h files. Flatten the folder structure — upload all files directly, without the subdirectory hierarchy. Open WebUI's RAG pipeline doesn't benefit from the folder organisation; it just chunks the file contents.

Skip the .jucer file, build outputs, and binaries — these are either generated or in binary formats the model can't meaningfully parse. Source files only.
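The flattening itself is easy to script. A sketch that collects source files into a single directory, prefixing filenames with their parent folders so duplicates like multiple PluginProcessor.cpp files don't collide — the extension list and skip list are assumptions typical of a JUCE project, so adjust them for yours:

```python
import shutil
from pathlib import Path

SOURCE_EXTENSIONS = {".cpp", ".h"}           # add .hpp etc. as needed
SKIP_NAMES = {"Builds", "JuceLibraryCode"}   # generated JUCE trees -- adjust per project

def flatten_sources(src_root: Path, out_dir: Path) -> list[Path]:
    """Copy every source file under src_root into one flat directory,
    joining parent folder names into the filename to avoid collisions."""
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in src_root.rglob("*"):
        if path.suffix not in SOURCE_EXTENSIONS:
            continue
        if any(part in SKIP_NAMES for part in path.parts):
            continue
        flat_name = "_".join(path.relative_to(src_root).parts)
        target = out_dir / flat_name
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Run it once before each upload and point Open WebUI's file picker at the output directory.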

Retrieval note

RAG retrieves by relevance, not by loading the whole codebase at once. Large methods may be split across chunks. For best results, ask specific, targeted questions rather than requesting broad implementations.
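Why large methods get split is easiest to see with a sketch. Below is a hypothetical fixed-size chunker with overlap — Open WebUI's real splitter and its default sizes differ, but the consequence is the same: a long function body rarely survives in a single chunk, which is why narrow questions retrieve better than broad ones:

```python
def chunk(text: str, size: int = 120, overlap: int = 30) -> list[str]:
    """Fixed-size character chunking with overlap, illustrative of the
    splitters RAG pipelines use (real chunkers vary in detail)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

method = """void Compressor::process(juce::AudioBuffer<float>& buffer)
{
    const float threshold = *thresholdParam;
    const float ratio = *ratioParam;
    // ... envelope follower, gain computer, makeup gain ...
}"""

pieces = chunk(method)
print(f"{len(pieces)} chunks -- the method body is split mid-implementation")
```

The overlap keeps neighbouring chunks stitched together at the seams, but nothing guarantees a whole function lands in one retrieved piece.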

// creating the model

A persistent, project-aware model

STEP 08

Create a custom model in Open WebUI

Go to Workspace → Models → New Model. Set the base model to deepseek-coder, give it a name like WyseDSP Coder, and add a system prompt that focuses it on your project:

system prompt (text)
You are an expert JUCE/C++ audio plugin development assistant
for the WyseDSP plugin suite. You have access to documentation
covering GREC Amp, GREC Mini, Bass Amp, and Drummer plugins,
plus the full project source code. Always reference the provided
documentation and code when answering questions. When helping
with code, follow JUCE best practices and the conventions already
established in the codebase.

Scroll down to Knowledge, attach your WyseDSP knowledge base, then hit Save.

STEP 09

Test the model

Select WyseDSP Coder from the model selector in the main chat. Start with small, targeted queries to verify the knowledge base is being hit:

example queries (text)
What classes or files relate to compression in the codebase?

What member variables does WyseDSPGRECAudioProcessor use?

What parameters does the compressor expose?

Query strategy

Break large questions into smaller ones. Ask for class names first, then member variables, then specific methods. RAG retrieves in chunks — the more targeted your query, the better the retrieval.
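That staged approach can also be scripted against Ollama's /api/chat endpoint. A sketch that builds one payload per targeted question while carrying the conversation history forward — the question list and system prompt are illustrative, and the assistant replies are placeholders since no server is contacted here:

```python
def staged_messages(questions, system_prompt="You are the WyseDSP assistant."):
    """Yield the /api/chat message list for each targeted question in turn,
    carrying prior turns forward so each answer can build on the last."""
    history = [{"role": "system", "content": system_prompt}]
    for q in questions:
        history.append({"role": "user", "content": q})
        yield list(history)
        # In a real session, append the model's actual reply here:
        history.append({"role": "assistant", "content": "<model reply>"})

steps = [
    "Which classes relate to compression?",
    "What member variables does the compressor class hold?",
    "Show me the attack/release smoothing in processBlock.",
]
for payload in staged_messages(steps):
    pass  # POST {"model": "deepseek-coder", "messages": payload} to /api/chat
```

Each narrow question gives the retriever a sharper target, and the accumulated history lets the model connect the answers.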

Working correctly

When you see references to your actual class names, parameter lists, and file names in the responses — it's reading your code. The model's general JUCE knowledge and your specific codebase are now combined.

// in action

The assistant responding from your docs

Below is a real query to the WyseDSP Coder model, asking it to list the drum sounds available in the Drummer plugin. The response is pulled directly from the Drummer manual — note the source citation at the end. This is RAG working exactly as intended: the model isn't guessing, it's retrieving.

[Screenshot: Open WebUI live session, the WyseDSP Coder model responding with the drum list from the Drummer manual]

The model correctly lists all 16 drum sounds and cites WyseDSP_Drummer_Manual.pdf as its source. No hallucination, no generic answer — the knowledge base is doing its job.

// what this demonstrates

The result — and what it means

The finished setup is a local AI assistant that knows the WyseDSP codebase, the plugin architecture, the parameter names, and the documentation — and can reason across all of it using DeepSeek Coder's general JUCE and C++ knowledge as a foundation.

It runs entirely on a repurposed laptop. No API calls, no cloud costs, no data leaving the machine. The knowledge base is persistent — it's there every time you open Open WebUI, and it can be updated any time by adding or replacing files.

The broader point: this is the pattern for any project. Source code plus documentation, loaded into a code-focused local model, gives you a development assistant that's grounded in your actual work rather than giving generic answers. The setup took an afternoon. The ongoing cost is zero.

It's not a replacement for a full IDE integration or a large-context cloud model — RAG has real limits, particularly for cross-file reasoning across large codebases. But for targeted questions, debugging help, and exploring unfamiliar parts of your own code, it's genuinely useful. And it's yours, running on your hardware, offline.