9.3 KiB
Ollama AI Answers Plugin for SearXNG
Based on ai-answers-searxng by cra88y
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Features:
- Token-by-token UI streaming
- Clickable inline citations
- Interactive mode: continue summary, ask follow-ups, copy, or regenerate
- Simple response mode with no extras
- Internally called low-latency RAG for follow-ups (bypasses HTTP loopback)
- Native network integration via
searx.network(respects proxy/SSL settings) - Stateless conversation persistence/shareability via URL hash
- Model selector in the AI overview widget
- Does not slow down result loading
- One file install
- Real-time streaming via Valkey — responses stream token by token using a background thread + Valkey job queue, working around granian's broken generator support for true streaming feel
- TF-IDF result reranking — fetched page content is scored against the query using BM25-style TF-IDF before being sent to Ollama, surfacing the most relevant sources first
- Smart chunking — pages are split into 512-token overlapping segments and the highest-scoring chunk per page is selected for context
- Intent detection — queries are automatically classified into 8 intent types (factual, howto, technical, comparison, opinion, current, local, general) with tailored system prompts per type
- Conversation memory — 30-minute cross-search conversation history stored in Valkey, so follow-up questions work even after navigating to a new search
- Markdown rendering — AI responses render bold, italic, lists, headers, and inline code natively in the result box
- Intent emoji badge — a small emoji appears next to "AI Overview" indicating the detected query type
Install
-
Download the plugin:
curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/master/ollama_answers.py -
Copy to your SearXNG plugins directory:
cp ollama_answers.py ~/searxng/plugins/ollama_answers.py -
Add the volume mount to your
docker-compose.ymlunder the searxng service:volumes: - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z -
Add environment variables to
docker-compose.yml:environment: - LLM_URL=http://ollama:11434/v1/chat/completions - LLM_MODEL=qwen3.5:9b - VALKEY_HOST=searxng-valkey -
Add to
settings.ymlplugins section:plugins: searx.plugins.ollama_answers.SXNGPlugin: active: true -
Restart SearXNG:
docker compose up -d --force-recreate core
Configuration
Configure via environment variables.
| Variable | Default | Description |
|---|---|---|
LLM_URL |
http://ollama:11434/v1/chat/completions |
Ollama endpoint |
LLM_MODEL |
qwen3.5:9b |
Default model |
LLM_MAX_TOKENS |
200 |
Max response tokens |
LLM_TEMPERATURE |
0.2 |
Response temperature |
LLM_TABS |
general,science,it,news |
Tabs to show AI overview on |
LLM_QUESTION_MARK_REQUIRED |
false |
Only trigger on queries with ? |
LLM_INTERACTIVE |
true |
Show copy/regen/follow-up UI |
LLM_SYSTEM_PROMPT |
(built-in) | Override the system prompt |
LLM_CONTEXT_DEEP_COUNT |
5 |
Full-content results to fetch |
LLM_CONTEXT_SHALLOW_COUNT |
15 |
Headline-only results |
VALKEY_HOST |
searxng-valkey |
Valkey container hostname |
VALKEY_PORT |
6379 |
Valkey port |
How It Works
- User performs a search
- Results return server-side
post_searchplugin hook fires- Token-optimized context is extracted from results
- UI/logic shell injected into the standard answers object
- Client-side script calls a signed endpoint (
/ai-stream) - Ollama streams a response token-by-token in the UI
Architecture
┌─────────────────────────────────────────────────────┐
│ Browser │
│ POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ SearXNG + Plugin │
│ │
│ post_search() │
│ → _enrich_results() ← ThreadPoolExecutor │
│ → _fetch_page_text() × 5 parallel │
│ → _chunk_text() + _tfidf_score() │
│ → rerank by score │
│ → _assemble_context() │
│ → inject AI Overview HTML + JS │
│ │
│ /ai-stream │
│ → validate token │
│ → _detect_intent() → select system prompt │
│ → _load_conversation() from Valkey │
│ → launch stream_to_valkey() thread │
│ → return {job_id} immediately │
│ │
│ stream_to_valkey() [background thread] │
│ → Ollama stream=True │
│ → RPUSH tokens to Valkey │
│ → RPUSH __DONE__ when complete │
│ │
│ /ai-status/{job_id} │
│ → LRANGE chunks from offset │
│ → return {chunks, done} │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ Valkey │
│ ai:job:{id}:chunks (list, TTL 120s) │
│ ai:job:{id}:status (string, TTL 120s) │
│ ai:conv:{session} (JSON, TTL 1800s) │
└─────────────────────────────────────────────────────┘
Docker Compose Example
services:
searxng:
environment:
- LLM_URL=http://ollama:11434/v1/chat/completions
- LLM_MODEL=qwen3.5:9b
- VALKEY_HOST=searxng-valkey
volumes:
- ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py
ollama:
image: ollama/ollama
volumes:
- ollama_data:/root/.ollama
volumes:
ollama_data:
Remote Ollama
If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.
environment:
- LLM_URL=https://ollama.example.com/v1/chat/completions
- LLM_API_KEY=your-bearer-token
Project Structure
ollama-ai-answers-searxng/
├── ollama_answers.py # single plugin file — all logic here
├── README.md
├── requirements.txt # flask, flask-babel (for local dev only)
└── tests/
└── dev.py # local dev server
Development — Dev Server
A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
Setup
pip install flask flask-babel certifi
Run
python tests/dev.py
Then open http://127.0.0.1:5000/ in your browser.
Note: Use
127.0.0.1:5000, notlocalhost:5000— macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.
Usage
- Type a query in the search bar and hit Search to trigger an AI overview.
- Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
- The model selector in the AI overview widget (loaded from
/ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.
Environment Variables (dev)
The dev server reads the same variables as the plugin:
LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py
Or export them before running. Any values set in the config panel at runtime take priority for that session.