Open Source Memories: How Local Vector Databases Let You Use Any LLM Without Sharing Your Data
Tags: data-sovereignty, vector-database, rag, enterprise, multi-llm

Synaplan Team

That little badge in the screenshot above — "8 memories used" — is the most important detail in any modern enterprise AI architecture, and almost no one talks about it.

It tells you that the answer you just got was not generated from raw model weights alone. Before the LLM ever saw the prompt, the platform reached into a local vector database, retrieved the eight most relevant snippets from your own documents and prior conversations, and fed them to the model as context.

The model wrote the words. Your data shaped the answer.

That tiny architectural decision — splitting memory from inference — is what makes it safe for European, Swiss, or otherwise regulated organisations to use the world’s best language models without giving up data sovereignty.

The Sovereignty Problem Nobody Wants to Solve Twice

If you are running an enterprise AI programme in 2026, you already know the trade-off:

  • The strongest reasoning models — GPT-5, Claude Opus 4, Gemini 3, Qwen 3, DeepSeek V4 — sit in data centres in the United States and Asia.
  • Your data is governed by GDPR, the EU AI Act, FINMA, BaFin, HIPAA, or whichever sector regulator owns your compliance officer’s nightmares.
  • Your CISO rightly panics every time an unfiltered document gets pasted into a foreign chat window.

The default reaction is to ban the cloud models, build a wall around an on-premise Llama instance, and accept that your team will quietly use ChatGPT on their phones anyway.

There is a better path. The trick is to stop treating "the model" and "the data" as one product.

Memory and Inference Are Different Workloads

A well-designed AI platform has two separate layers:

  1. A memory layer — vector embeddings of your documents, conversations, contracts, tickets, wiki pages. Pure data. Heavy. Sensitive. Constantly growing.
  2. An inference layer — the LLM that turns a prompt plus retrieved context into a useful answer. Compute-heavy. Replaceable. Often best when bought from someone else.

The mistake most enterprises make is to assume those two layers must come from the same vendor. They do not. Once you separate them, the sovereignty problem dissolves into a much smaller one: what gets sent across the border, and what stays at home?

In a properly architected RAG (Retrieval-Augmented Generation) stack, the answer is: only the prompt, the retrieved snippets, and the answer cross the border. The full document corpus — gigabytes of contracts, research, customer history, code — never leaves your data centre.

How Synaplan Implements This

Synaplan was designed around this split from day one. The platform uses two complementary local stores:

  • MariaDB with its native VECTOR data type for structured embeddings, keyword indexes, and metadata.
  • Qdrant for high-throughput semantic search across millions of vectors.

Both run as Docker containers on your own hardware (or your own cloud account). Every document you upload — PDFs, DOCX files, emails, transcripts, code — is chunked, embedded, and indexed locally. The raw bytes never leave your network.
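To make "chunked" concrete, here is a minimal sketch of overlapping word-window chunking. It is an illustration only, not Synaplan's actual chunker: the `Chunk` type, the `chunk_text` function, and the default sizes are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    position: int  # index of the chunk within its document
    text: str


def chunk_text(doc_id: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words that share `overlap` words with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(doc_id, position, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk would then be embedded and written to the local index.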

When a user asks a question, the flow looks like this:

  1. The user’s question is embedded locally.
  2. Synaplan queries Qdrant and MariaDB VECTOR for the most relevant memories.
  3. Only the question plus the retrieved snippets are sent to the configured LLM provider.
  4. The response is logged locally and added to the memory store.
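The four steps above can be sketched in a few lines of Python. This is a toy under stated assumptions: `embed` is a hashed bag-of-words stand-in for a real local embedding model, the index is a plain list rather than Qdrant or MariaDB, and `call_llm` is a hypothetical callable representing whichever provider is configured. None of these names come from Synaplan.

```python
import math
import zlib
from collections import Counter

DIMS = 64  # toy dimensionality; real embedding models use far more


def embed(text: str) -> list[float]:
    """Toy stand-in for a local embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIMS
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode()) % DIMS] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(query_vec, index, k):
    """Return the k snippet texts with the highest cosine similarity to the query."""
    def similarity(item):
        return sum(a * b for a, b in zip(query_vec, item[1]))
    return [text for text, _ in sorted(index, key=similarity, reverse=True)[:k]]


def answer(question, index, call_llm, k=8):
    memories = top_k(embed(question), index, k)   # steps 1 and 2: local only
    payload = {"question": question, "context": memories}
    response = call_llm(payload)                   # step 3: the only outbound call
    index.append((response, embed(response)))     # step 4: stored locally
    return response, memories
```

The point of the sketch is the shape of `payload`: it is the only object handed to the provider, and it contains the question and the retrieved snippets, nothing else from the index.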

Notice what is not happening: your full document base is not uploaded to OpenAI, Anthropic, or Alibaba. For each request, the provider sees only the question and the handful of snippets retrieved for it.

"Switch the Engine, Keep the Pile"

This is the line that resonates most in enterprise procurement meetings.

The vector store and the documents inside it are the asset you have been building for months — the company memory. The LLM is the engine that turns that memory into answers. With Synaplan, the engine is configurable per user, per chat, even per request:

  • Use GPT-5 for the marketing team that needs polish.
  • Route the legal team to Claude Opus 4 for long-context contract analysis.
  • Send code questions to Qwen 3 Coder running on a Groq endpoint.
  • Fall back to a local Llama 3.3 on Ollama for anything that must never leave the building.

When a new model launches next quarter, you change one line of configuration. The memory pile — your real intellectual property — stays exactly where it is.
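As a rough illustration of what "one line of configuration" could look like, here is a hypothetical routing table. The `Engine` type, the `pick_engine` function, and the provider and model strings are placeholders invented for this sketch; they are not Synaplan's configuration format or real API model identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Engine:
    provider: str
    model: str
    leaves_network: bool


# Hypothetical routing table; the entries mirror the examples in the text.
ENGINES = {
    "marketing": Engine("openai", "gpt-5", True),
    "legal": Engine("anthropic", "claude-opus-4", True),
    "code": Engine("groq", "qwen-3-coder", True),
    "restricted": Engine("ollama", "llama-3.3", False),
}
DEFAULT_ROUTE = "restricted"  # unknown teams fall back to the local model


def pick_engine(team: str, confidential: bool = False, override: Optional[str] = None) -> Engine:
    """A per-request override beats the team default; confidential requests always stay local."""
    if confidential:
        return ENGINES["restricted"]
    return ENGINES.get(override or team, ENGINES[DEFAULT_ROUTE])
```

Swapping a model then means editing one entry in `ENGINES`. The vector store is never mentioned in this file, which is the whole idea.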

This is the opposite of the lock-in pattern that traditional AI vendors push. Instead of pouring your data into OpenAI’s "Custom GPTs" or Anthropic’s "Projects" feature — and watching your knowledge become inseparable from their pricing power — you keep the knowledge under your own roof and rent the compute.

Why This Matters Most for Larger Corporations

Three reasons, in the order they come up most often in our customer conversations:

1. Compliance is not a single decision

A 50-person startup can usually write one DPA and move on. A 50,000-person corporation has dozens of legal entities, hundreds of process owners, and a different data classification scheme in every business unit. A central memory store that the company controls — backed up, encrypted, audit-logged, and physically located in a known jurisdiction — is the only architecture that survives an internal audit at that scale.

2. Model risk is now a real risk category

Boards have started asking concrete questions: "What happens if our primary AI provider doubles their price?" "What happens if a US export control rule cuts us off from a model family?" "What happens if a model gets recalled for safety reasons?"

If your data is welded to a single provider’s platform, the answer is "we have a problem." If your data lives in your own vector database, the answer is "we change a config and keep working."

3. The good models keep moving

In 2024 the best general model was GPT-4. In 2025 it was Claude 3.5 Sonnet for reasoning and Gemini 1.5 for context length. In 2026 the leaderboard reshuffles every few months. Any architecture that assumes "we picked the right vendor in February" is wrong by September.

A sovereign memory layer turns that churn into an opportunity instead of a disaster. You can A/B test new models against the same knowledge base, measure quality with the same evals, and switch when the numbers say so.
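A minimal sketch of such an A/B comparison follows. It assumes each model is wrapped as a callable taking a question and the retrieved context, and it uses a deliberately crude metric (does the answer contain an expected phrase). The names `evaluate` and `compare` and the case format are assumptions for illustration; a real eval suite would use richer scoring.

```python
from typing import Callable

Model = Callable[[str, list[str]], str]  # (question, retrieved context) -> answer


def evaluate(model: Model, cases: list[dict], retrieve: Callable[[str], list[str]]) -> float:
    """Fraction of cases whose answer contains the expected phrase (case-insensitive)."""
    hits = 0
    for case in cases:
        context = retrieve(case["question"])  # same knowledge base for every model
        reply = model(case["question"], context)
        hits += case["expect"].lower() in reply.lower()
    return hits / len(cases) if cases else 0.0


def compare(models: dict[str, Model], cases: list[dict], retrieve) -> list[tuple[str, float]]:
    """Score every candidate on the same cases and sort best first."""
    scores = [(name, evaluate(model, cases, retrieve)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because `retrieve` is shared, any difference in score comes from the model and not from the memory layer.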

What Stays Local, What Goes Out

For the engineers reading this — the honest accounting:

Stays local                      | Crosses the border
---------------------------------|--------------------------------
Full document corpus             | The user’s prompt
Vector embeddings                | A handful of retrieved snippets
Conversation history (long-term) | The model’s response
User accounts and ACLs           | (Optionally) usage telemetry
API keys and audit logs          |

If you need to push that "crosses the border" column down to zero, Synaplan supports a fully local mode using Ollama, vLLM, or Triton. Most enterprises do not need that. They need the option, the control, and the visibility to know exactly which bytes go where.
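One way to get that visibility is to put an allowlist in front of every outbound call and record what was sent. The sketch below is an assumption about how such a guard might look, not a description of Synaplan's internals; the field names and the `build_outbound` function are hypothetical.

```python
import json

# Hypothetical allowlist covering the request fields that may cross the border.
ALLOWED_OUTBOUND_FIELDS = {"prompt", "snippets"}


def build_outbound(request: dict) -> tuple[dict, dict]:
    """Return (payload, audit_record); raise if a field outside the allowlist is present."""
    unexpected = set(request) - ALLOWED_OUTBOUND_FIELDS
    if unexpected:
        raise ValueError(f"refusing to send fields: {sorted(unexpected)}")
    payload = {key: request[key] for key in sorted(request)}
    audit = {
        "fields": sorted(payload),
        "snippet_count": len(payload.get("snippets", [])),
        "bytes_out": len(json.dumps(payload).encode("utf-8")),
    }
    return payload, audit
```

Writing the `audit` record to a local log gives a per-request answer to "which bytes went where", and a request that tries to carry embeddings or account data fails before it leaves the network.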

How to Get Started

Synaplan is open source under the Apache 2.0 licence and ships as a Docker Compose stack. A typical first deployment looks like this:

  1. Clone the repo from github.com/metadist/synaplan.
  2. Spin up the stack with docker compose up -d — you get FrankenPHP, Vue 3, MariaDB with VECTOR, Qdrant, and the chat widget out of the box.
  3. Add your first AI provider key (OpenAI, Anthropic, Google, Groq, Mistral, or any OpenAI-compatible endpoint).
  4. Upload a few documents and watch the memory layer fill up.
  5. Open a chat and notice the same "memories used" badge from the screenshot at the top of this article.

From there, you scale by adding more documents, more providers, and — when you are ready — by pointing the platform at your existing identity provider, your existing storage, and your existing observability stack.

The model is replaceable. The memory is yours.