Ollama AI Answers Plugin for SearXNG
Based on ai-answers-searxng by cra88y
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Features:
- Inline numbered citations
- Interactive mode - Continue summary, ask follow-ups, copy, or regenerate
- Overview of ranked results with prompts based on detected query intent:
How ToTechnicalFactualComparisonOpinionCurrentLocalGeneal
- Internally called RAG for follow-ups
- Native network integration via
searx.network - Stateless conversation presistence/shareability via URL hash
- Ollama model selector
- Feeds fetched results to Ollama without slowing down SearXNG results
- Real-time streaming via Valkey (No waiting for a completed response)
- TF-IDF result ranking before being sent to Ollama
- Smart chunking - Pages are split into 512-token segments and highest-scoring chunk per page used for context
- Conversation memory - 30-minute cross-search conversation history via Valkey for follow-up questions
- Markdown support
- Intent emoji badge showing what intent prompt was used
Install
-
Download the plugin:
Main repo (Gitea)
curl -o ollama_answers.py https://git.tysstech.com/TySS-Dev/ollama-ai-answers-searxng/raw/branch/main/ollama_answers.pyMirror repo (Github):
curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/main/ollama_answers.py -
Copy to your SearXNG plugins directory:
cp ollama_answers.py path_to/searxng/plugins/ollama_answers.py -
Add the volume mount to your
docker-compose.ymlunder the searxng service:volumes: - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z -
Add environment variables to
docker-compose.yml:environment: - LLM_URL=http://ollama:11434/v1/chat/completions - LLM_MODEL=qwen3.5:9b - VALKEY_HOST=searxng-valkey -
Add to
settings.ymlplugins section:plugins: searx.plugins.ollama_answers.SXNGPlugin: active: true -
Restart SearXNG:
docker compose up -d --force-recreate core
Configuration
Configure via environment variables.
| Variable | Default | Description |
|---|---|---|
LLM_URL |
http://ollama:11434/v1/chat/completions |
Ollama endpoint |
LLM_MODEL |
qwen3.5:9b |
Default model |
LLM_MAX_TOKENS |
200 |
Max response tokens |
LLM_TEMPERATURE |
0.2 |
Response temperature |
LLM_TABS |
general,science,it,news |
Tabs to show AI overview on |
LLM_QUESTION_MARK_REQUIRED |
false |
Only trigger on queries with ? |
LLM_INTERACTIVE |
true |
Show copy/regen/follow-up UI |
LLM_SYSTEM_PROMPT |
(built-in) | Override the system prompt |
LLM_CONTEXT_DEEP_COUNT |
5 |
Full-content results to fetch |
LLM_CONTEXT_SHALLOW_COUNT |
15 |
Headline-only results |
VALKEY_HOST |
searxng-valkey |
Valkey container hostname |
VALKEY_PORT |
6379 |
Valkey port |
How It Works
- User performs a search
- Results return server-side
post_searchplugin hook fires- Token-optimized context is extracted from results
- UI/logic shell injected into the standard answers object
- Client-side script calls a signed endpoint (
/ai-stream) - Ollama streams a response token-by-token in the UI
Known Issues
- When asking a follow up question the previous output disappears
For any issues not stated here please create an issue ticket on Gitea or GitHub and add the bug tag.
Roadmap
- Working on feature plans
Architecture
┌─────────────────────────────────────────────────────┐
│ Browser │
│ POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ SearXNG + Plugin │
│ │
│ post_search() │
│ → _enrich_results() ← ThreadPoolExecutor │
│ → _fetch_page_text() × 5 parallel │
│ → _chunk_text() + _tfidf_score() │
│ → rerank by score │
│ → _assemble_context() │
│ → inject AI Overview HTML + JS │
│ │
│ /ai-stream │
│ → validate token │
│ → _detect_intent() → select system prompt │
│ → _load_conversation() from Valkey │
│ → launch stream_to_valkey() thread │
│ → return {job_id} immediately │
│ │
│ stream_to_valkey() [background thread] │
│ → Ollama stream=True │
│ → RPUSH tokens to Valkey │
│ → RPUSH __DONE__ when complete │
│ │
│ /ai-status/{job_id} │
│ → LRANGE chunks from offset │
│ → return {chunks, done} │
└────────────────┬────────────────────────────────────┘
│
┌────────────────▼────────────────────────────────────┐
│ Valkey │
│ ai:job:{id}:chunks (list, TTL 120s) │
│ ai:job:{id}:status (string, TTL 120s) │
│ ai:conv:{session} (JSON, TTL 1800s) │
└─────────────────────────────────────────────────────┘
Docker Compose Example
services:
searxng:
environment:
- LLM_URL=http://ollama:11434/v1/chat/completions
- LLM_MODEL=qwen3.5:9b
- VALKEY_HOST=searxng-valkey
volumes:
- ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py
ollama:
image: ollama/ollama
volumes:
- ollama_data:/root/.ollama
volumes:
ollama_data:
Remote Ollama
If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.
environment:
- LLM_URL=https://ollama.example.com/v1/chat/completions
- LLM_API_KEY=your-bearer-token
Project Structure
ollama-ai-answers-searxng/
├── ollama_answers.py # single plugin file — all logic here
├── README.md
├── requirements.txt # flask, flask-babel (for local dev only)
└── tests/
└── dev.py # local dev server
Development — Dev Server
A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
Setup
pip install flask flask-babel certifi
Run
python tests/dev.py
Then open http://127.0.0.1:5000/ in your browser.
Note: Use
127.0.0.1:5000, notlocalhost:5000— macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.
Usage
- Type a query in the search bar and hit Search to trigger an AI overview.
- Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
- The model selector in the AI overview widget (loaded from
/ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.
Environment Variables (dev)
The dev server reads the same variables as the plugin:
LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py
Or export them before running. Any values set in the config panel at runtime take priority for that session.