Main Repo Mirror Repo

Ollama AI Answers Plugin for SearXNG

Based on ai-answers-searxng by cra88y

A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.

Features:

  • Inline numbered citations
  • Interactive mode - Continue summary, ask follow-ups, copy, or regenerate
  • Overview of ranked results with prompts based on detected query intent:
    • How To Technical Factual Comparison Opinion Current Local Geneal
  • Internally called RAG for follow-ups
  • Native network integration via searx.network
  • Stateless conversation presistence/shareability via URL hash
  • Ollama model selector
  • Feeds fetched results to Ollama without slowing down SearXNG results
  • Real-time streaming via Valkey (No waiting for a completed response)
  • TF-IDF result ranking before being sent to Ollama
  • Smart chunking - Pages are split into 512-token segments and highest-scoring chunk per page used for context
  • Conversation memory - 30-minute cross-search conversation history via Valkey for follow-up questions
  • Markdown support
  • Intent emoji badge showing what intent prompt was used

Install

  1. Download the plugin:

    Main repo (Gitea)

    curl -o ollama_answers.py https://git.tysstech.com/TySS-Dev/ollama-ai-answers-searxng/raw/branch/main/ollama_answers.py
    

    Mirror repo (Github):

    curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/main/ollama_answers.py
    
  2. Copy to your SearXNG plugins directory:

    cp ollama_answers.py path_to/searxng/plugins/ollama_answers.py
    
  3. Add the volume mount to your docker-compose.yml under the searxng service:

    volumes:
      - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z
    
  4. Add environment variables to docker-compose.yml:

    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    
  5. Add to settings.yml plugins section:

    plugins:
      searx.plugins.ollama_answers.SXNGPlugin:
        active: true
    
  6. Restart SearXNG:

    docker compose up -d --force-recreate core
    

Configuration

Configure via environment variables.

Variable Default Description
LLM_URL http://ollama:11434/v1/chat/completions Ollama endpoint
LLM_MODEL qwen3.5:9b Default model
LLM_MAX_TOKENS 200 Max response tokens
LLM_TEMPERATURE 0.2 Response temperature
LLM_TABS general,science,it,news Tabs to show AI overview on
LLM_QUESTION_MARK_REQUIRED false Only trigger on queries with ?
LLM_INTERACTIVE true Show copy/regen/follow-up UI
LLM_SYSTEM_PROMPT (built-in) Override the system prompt
LLM_CONTEXT_DEEP_COUNT 5 Full-content results to fetch
LLM_CONTEXT_SHALLOW_COUNT 15 Headline-only results
VALKEY_HOST searxng-valkey Valkey container hostname
VALKEY_PORT 6379 Valkey port

How It Works

  1. User performs a search
  2. Results return server-side
  3. post_search plugin hook fires
  4. Token-optimized context is extracted from results
  5. UI/logic shell injected into the standard answers object
  6. Client-side script calls a signed endpoint (/ai-stream)
  7. Ollama streams a response token-by-token in the UI

Known Issues

  • When asking a follow up question the previous output disappears
  • Parts of the UI are not theme aware resulting in a unpolished look when not using a dark theme
  • When SearXNG provides a info blob for a search it appears on top of the overview i.e. Wikipedia or Linux

For any issues not stated here please create an issue ticket on Gitea or GitHub and add the bug tag.

Roadmap

  • Working on feature plans

Architecture

┌─────────────────────────────────────────────────────┐
│                   Browser                           │
│  POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│              SearXNG + Plugin                       │
│                                                     │
│  post_search()                                      │
│    → _enrich_results()  ← ThreadPoolExecutor        │
│      → _fetch_page_text() × 5 parallel              │
│      → _chunk_text() + _tfidf_score()               │
│      → rerank by score                              │
│    → _assemble_context()                            │
│    → inject AI Overview HTML + JS                   │
│                                                     │
│  /ai-stream                                         │
│    → validate token                                 │
│    → _detect_intent() → select system prompt        │
│    → _load_conversation() from Valkey               │
│    → launch stream_to_valkey() thread               │
│    → return {job_id} immediately                    │
│                                                     │
│  stream_to_valkey() [background thread]             │
│    → Ollama stream=True                             │
│    → RPUSH tokens to Valkey                         │
│    → RPUSH __DONE__ when complete                   │
│                                                     │
│  /ai-status/{job_id}                                │
│    → LRANGE chunks from offset                      │
│    → return {chunks, done}                          │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│                  Valkey                             │
│  ai:job:{id}:chunks  (list, TTL 120s)               │
│  ai:job:{id}:status  (string, TTL 120s)             │
│  ai:conv:{session}   (JSON, TTL 1800s)              │
└─────────────────────────────────────────────────────┘

Docker Compose Example

services:
  searxng:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py

  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:

Remote Ollama

If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.

environment:
  - LLM_URL=https://ollama.example.com/v1/chat/completions
  - LLM_API_KEY=your-bearer-token

Project Structure

ollama-ai-answers-searxng/
├── ollama_answers.py      # single plugin file — all logic here
├── README.md
├── requirements.txt       # flask, flask-babel (for local dev only)
└── tests/
    └── dev.py             # local dev server

Development — Dev Server

A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.

Setup

pip install flask flask-babel certifi

Run

python tests/dev.py

Then open http://127.0.0.1:5000/ in your browser.

Note: Use 127.0.0.1:5000, not localhost:5000 — macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.

Usage

  • Type a query in the search bar and hit Search to trigger an AI overview.
  • Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
  • The model selector in the AI overview widget (loaded from /ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.

Environment Variables (dev)

The dev server reads the same variables as the plugin:

LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py

Or export them before running. Any values set in the config panel at runtime take priority for that session.

S
Description
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Readme 785 KiB
Languages
Python 100%