Ollama AI Answers Plugin for SearXNG
Based on ai-answers-searxng by cra88y
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Features:
- Inline numbered citations
- Interactive mode - Continue summary, ask follow-ups, copy, or regenerate
- Overview of ranked results with prompts based on detected query intent:
How ToTechnicalFactualComparisonOpinionCurrentLocalGeneal
- Internally called RAG for follow-ups
- Native network integration via
searx.network - Stateless conversation presistence/shareability via URL hash
- Ollama model selector
- Feeds fetched results to Ollama without slowing down SearXNG results
- Real-time streaming via Valkey (No waiting for a completed response)
- TF-IDF result ranking before being sent to Ollama
- Smart chunking - Pages are split into 512-token segments and highest-scoring chunk per page used for context
- Conversation memory - 30-minute cross-search conversation history via Valkey for follow-up questions
- Markdown support
- Intent emoji badge showing what intent prompt was used
Install
-
Download the plugin:
Main repo (Gitea)
curl -o ollama_answers.py https://git.tysstech.com/TySS-Dev/ollama-ai-answers-searxng/raw/branch/main/ollama_answers.pyMirror repo (Github):
curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/main/ollama_answers.py -
Copy to your SearXNG plugins directory:
cp ollama_answers.py path_to/searxng/plugins/ollama_answers.py -
Add the volume mount to your
docker-compose.ymlunder the searxng service:volumes: - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z -
Add environment variables to
docker-compose.yml:environment: - LLM_URL=http://ollama:11434/v1/chat/completions - LLM_MODEL=qwen3.5:9b - VALKEY_HOST=searxng-valkey -
Add to
settings.ymlplugins section:plugins: searx.plugins.ollama_answers.SXNGPlugin: active: true -
Restart SearXNG:
docker compose up -d --force-recreate core
Configuration
Configure via environment variables.
| Variable | Default | Description |
|---|---|---|
LLM_URL |
http://ollama:11434/v1/chat/completions |
Ollama endpoint |
LLM_MODEL |
qwen3.5:9b |
Default model |
LLM_MAX_TOKENS |
200 |
Max response tokens |
LLM_TEMPERATURE |
0.2 |
Response temperature |
LLM_TABS |
general,science,it,news |
Tabs to show AI overview on |
LLM_QUESTION_MARK_REQUIRED |
false |
Only trigger on queries with ? |
LLM_INTERACTIVE |
true |
Show copy/regen/follow-up UI |
LLM_SYSTEM_PROMPT |
(built-in) | Override the system prompt |
LLM_CONTEXT_DEEP_COUNT |
5 |
Full-content results to fetch |
LLM_CONTEXT_SHALLOW_COUNT |
15 |
Headline-only results |
VALKEY_HOST |
searxng-valkey |
Valkey container hostname |
VALKEY_PORT |
6379 |
Valkey port |
How It Works
- User performs a search
- Results return server-side
post_searchplugin hook fires- Token-optimized context is extracted from results
- UI/logic shell injected into the standard answers object
- Client-side script calls a signed endpoint (
/ai-stream) - Ollama streams a response token-by-token in the UI
Known Issues
- When asking a follow up question the previous output disappears
- Parts of the UI are not theme aware resulting in a unpolished look when not using a dark theme
- When SearXNG provides a info blob for a search it appears on top of the overview i.e.
WikipediaorLinux
For any issues not stated here please create an issue ticket on Gitea or GitHub and add the bug tag.
Roadmap
- Working on feature plans
Architecture
┌─────────────────────────────────────────────────────┐
│ Browser │
│ POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ SearXNG + Plugin │
│ │
│ post_search() │
│ → _enrich_results() ← ThreadPoolExecutor │
│ → _fetch_page_text() × 5 parallel │
│ → _chunk_text() + _tfidf_score() │
│ → rerank by score │
│ → _assemble_context() │
│ → inject AI Overview HTML + JS │
│ │
│ /ai-stream │
│ → validate token │
│ → _detect_intent() → select system prompt │
│ → _load_conversation() from Valkey │
│ → launch stream_to_valkey() thread │
│ → return {job_id} immediately │
│ │
│ stream_to_valkey() [background thread] │
│ → Ollama stream=True │
│ → RPUSH tokens to Valkey │
│ → RPUSH __DONE__ when complete │
│ │
│ /ai-status/{job_id} │
│ → LRANGE chunks from offset │
│ → return {chunks, done} │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ Valkey │
│ ai:job:{id}:chunks (list, TTL 120s) │
│ ai:job:{id}:status (string, TTL 120s) │
│ ai:conv:{session} (JSON, TTL 1800s) │
└─────────────────────────────────────────────────────┘
Docker Compose Example
services:
searxng:
environment:
- LLM_URL=http://ollama:11434/v1/chat/completions
- LLM_MODEL=qwen3.5:9b
- VALKEY_HOST=searxng-valkey
volumes:
- ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py
ollama:
image: ollama/ollama
volumes:
- ollama_data:/root/.ollama
volumes:
ollama_data:
Remote Ollama
If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.
environment:
- LLM_URL=https://ollama.example.com/v1/chat/completions
- LLM_API_KEY=your-bearer-token
Project Structure
ollama-ai-answers-searxng/
├── ollama_answers.py # single plugin file — all logic here
├── README.md
├── requirements.txt # flask, flask-babel (for local dev only)
└── tests/
└── dev.py # local dev server
Development — Dev Server
A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
Setup
pip install flask flask-babel certifi
Run
python tests/dev.py
Then open http://127.0.0.1:5000/ in your browser.
Note: Use
127.0.0.1:5000, notlocalhost:5000— macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.
Usage
- Type a query in the search bar and hit Search to trigger an AI overview.
- Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
- The model selector in the AI overview widget (loaded from
/ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.
Environment Variables (dev)
The dev server reads the same variables as the plugin:
LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py
Or export them before running. Any values set in the config panel at runtime take priority for that session.