Ollama AI Answers Plugin for SearXNG
Based on ai-answers-searxng by cra88y
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Features:
- Token-by-token UI streaming
- Clickable inline citations
- Interactive mode: continue summary, ask follow-ups, copy, or regenerate
- Simple response mode with no extras
- Low-latency RAG for follow-ups, called internally (bypasses HTTP loopback)
- Native network integration via searx.network (respects proxy/SSL settings)
- Stateless conversation persistence/shareability via URL hash
- Model selector in the AI overview widget
- Does not slow down result loading
- One file install
Installation
Place ollama_answers.py into the searx/plugins directory of your SearXNG instance (or mount it in a container) and enable it in settings.yml:
plugins:
  searx.plugins.ollama_answers.SXNGPlugin:
    active: true
Configuration
Configure via environment variables.
Required
| Variable | Description | Default |
|---|---|---|
| LLM_URL | Ollama chat completions endpoint | http://ollama:11434/v1/chat/completions |
| LLM_MODEL | Model name as listed in Ollama | qwen3.5:9b |
Optional
| Variable | Description | Default |
|---|---|---|
| LLM_SYSTEM_PROMPT | Overrides the default system prompt | You are a direct, citation-accurate search synthesis engine. |
| LLM_MAX_TOKENS | Max tokens in the AI response | 200 |
| LLM_TEMPERATURE | Sampling temperature | 0.2 |
| LLM_CONTEXT_DEEP_COUNT | Results used with full snippets | 5 |
| LLM_CONTEXT_SHALLOW_COUNT | Results with headlines only (breadth) | 15 |
| LLM_TABS | Comma-delimited tab whitelist | general,science,it,news |
| LLM_INTERACTIVE | Interactive UI mode (copy, regenerate, follow-up) | true |
| LLM_QUESTION_MARK_REQUIRED | Only trigger on queries containing ? | false |
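As a rough illustration of how the variables above might be consumed, the sketch below reads them with the documented defaults and applies the tab whitelist and question-mark checks. The names `load_config` and `should_trigger` are hypothetical and not taken from the plugin source, and the boolean parsing rule is an assumption.

```python
import os

# Documented defaults from the configuration tables.
DEFAULTS = {
    "LLM_URL": "http://ollama:11434/v1/chat/completions",
    "LLM_MODEL": "qwen3.5:9b",
    "LLM_MAX_TOKENS": "200",
    "LLM_TEMPERATURE": "0.2",
    "LLM_CONTEXT_DEEP_COUNT": "5",
    "LLM_CONTEXT_SHALLOW_COUNT": "15",
    "LLM_TABS": "general,science,it,news",
    "LLM_INTERACTIVE": "true",
    "LLM_QUESTION_MARK_REQUIRED": "false",
}


def _as_bool(value):
    # Assumption: "true", "1" and "yes" (any case) are treated as true.
    return value.strip().lower() in {"true", "1", "yes"}


def load_config(env=None):
    """Hypothetical helper: merge environment values over the defaults."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "url": raw["LLM_URL"],
        "model": raw["LLM_MODEL"],
        "max_tokens": int(raw["LLM_MAX_TOKENS"]),
        "temperature": float(raw["LLM_TEMPERATURE"]),
        "deep_count": int(raw["LLM_CONTEXT_DEEP_COUNT"]),
        "shallow_count": int(raw["LLM_CONTEXT_SHALLOW_COUNT"]),
        "tabs": [tab.strip() for tab in raw["LLM_TABS"].split(",") if tab.strip()],
        "interactive": _as_bool(raw["LLM_INTERACTIVE"]),
        "question_mark_required": _as_bool(raw["LLM_QUESTION_MARK_REQUIRED"]),
    }


def should_trigger(query, tab, config):
    """Hypothetical check: is an AI overview wanted for this query and tab?"""
    if tab not in config["tabs"]:
        return False
    if config["question_mark_required"] and "?" not in query:
        return False
    return True
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in isolation, for example `load_config({"LLM_MAX_TOKENS": "400"})`.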
How It Works
- User performs a search
- Results return server-side
- post_search plugin hook fires
- Token-optimized context is extracted from results
- UI/logic shell injected into the standard answers object
- Client-side script calls a signed endpoint (/ai-stream)
- Ollama streams a response token-by-token in the UI
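The context-extraction step can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual code: `build_context` is a hypothetical name, and it assumes each result is a dict with `title` and `content` keys. The first few results contribute full snippets (the "deep" set) and the following ones contribute titles only (the "shallow" set), mirroring LLM_CONTEXT_DEEP_COUNT and LLM_CONTEXT_SHALLOW_COUNT.

```python
def build_context(results, deep_count=5, shallow_count=15):
    """Hypothetical helper: numbered context with snippets first, titles after."""
    lines = []
    selected = results[: deep_count + shallow_count]
    for index, result in enumerate(selected, start=1):
        title = result.get("title", "").strip()
        if index <= deep_count:
            # Deep results keep their snippet so the model can quote details.
            snippet = result.get("content", "").strip()
            lines.append(f"[{index}] {title}: {snippet}")
        else:
            # Shallow results add breadth at a low token cost.
            lines.append(f"[{index}] {title}")
    return "\n".join(lines)
```

The bracketed numbers give the model something stable to cite, which is one plausible way to support inline citations.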
Docker Compose Example
services:
  searxng:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py
  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
Remote Ollama
If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.
environment:
  - LLM_URL=https://ollama.example.com/v1/chat/completions
  - LLM_API_KEY=your-bearer-token
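For orientation, a streaming request to an OpenAI-compatible chat completions endpoint with Bearer auth might be assembled as in the sketch below. It uses the standard library's `urllib.request` only to show the shape of the request; the plugin itself goes through searx.network, and `build_chat_request` is a hypothetical name.

```python
import json
import urllib.request


def build_chat_request(url, model, messages, api_key=None,
                       max_tokens=200, temperature=0.2):
    """Hypothetical helper: build (but do not send) a streaming chat request."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,  # ask the server to stream tokens as they are generated
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Bearer token auth, only sent when a key is configured.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Omitting `api_key` leaves the Authorization header out entirely, which matches a local Ollama instance that needs no credentials.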
Development — Dev Server
A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
Setup
pip install flask flask-babel certifi
Run
python tests/dev.py
Then open http://127.0.0.1:5000/ in your browser.
Note: Use 127.0.0.1:5000, not localhost:5000. macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.
Usage
- Type a query in the search bar and hit Search to trigger an AI overview.
- Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
- The model selector in the AI overview widget (loaded from /ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.
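Which upstream endpoint /ai-models queries is not stated here. As an assumption-laden sketch, a model list could be normalized from either of the two response shapes an Ollama server exposes: the native `/api/tags` listing (`{"models": [{"name": ...}]}`) or the OpenAI-compatible `/v1/models` listing (`{"data": [{"id": ...}]}`). The function name `model_names` is hypothetical.

```python
def model_names(payload):
    """Hypothetical helper: extract sorted model names from a listing response."""
    if "models" in payload:
        # Ollama native /api/tags shape.
        return sorted(entry["name"] for entry in payload["models"])
    # OpenAI-compatible /v1/models shape; empty list if nothing is present.
    return sorted(entry["id"] for entry in payload.get("data", []))
```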
Environment Variables (dev)
The dev server reads the same variables as the plugin:
LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py
Or export them before running. Any values set in the config panel at runtime take priority for that session.
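The precedence described here (config panel values first, then environment variables, then defaults) can be modeled with a layered lookup. This is only a sketch of the idea using `collections.ChainMap`; `effective_settings` is a hypothetical name, and how tests/dev.py actually stores session values is not shown in this document.

```python
from collections import ChainMap


def effective_settings(session_overrides, env, defaults):
    """Hypothetical helper: the first mapping that has a key wins."""
    return ChainMap(session_overrides, env, defaults)


settings = effective_settings(
    {"LLM_URL": "http://127.0.0.1:11434/v1/chat/completions"},  # config panel
    {"LLM_URL": "http://localhost:11434/v1/chat/completions",
     "LLM_MODEL": "qwen3.5:9b"},                                # environment
    {"LLM_MAX_TOKENS": "200"},                                  # defaults
)
```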