Main Repo Mirror Repo

Ollama AI Answers Plugin for SearXNG

Based on ai-answers-searxng by cra88y

A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.

Features:

  • Inline numbered citations
  • Interactive mode - Continue summary, ask follow-ups, copy, or regenerate
  • Overview of ranked results with prompts based on detected query intent:
    • How To Technical Factual Comparison Opinion Current Local `Geneal, General
  • Internally called RAG for follow-ups
  • Native network integration via searx.network
  • Stateless conversation presistence/shareability via URL hash
  • Ollama model selector
  • Feeds fetched results to Ollama without slowing down SearXNG results
  • Real-time streaming via Valkey (No waiting for a completed response)
  • TF-IDF result ranking before being sent to Ollama
  • Smart chunking - Pages are split into 512-token segments and highest-scoring chunk per page used for context
  • Conversation memory - 30-minute cross-search conversation history via Valkey for follow-up questions
  • Markdown support
  • Intent emoji badge showing what intent prompt was used

Install

  1. Download the plugin:

    Main repo (Gitea)

    curl -o ollama_answers.py https://git.tysstech.com/TySS-Dev/ollama-ai-answers-searxng/raw/branch/main/ollama_answers.py
    

    Mirror repo (Github):

    curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/main/ollama_answers.py
    
  2. Copy to your SearXNG plugins directory:

    cp ollama_answers.py path_to/searxng/plugins/ollama_answers.py
    
  3. Add the volume mount to your docker-compose.yml under the searxng service:

    volumes:
      - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z
    
  4. Add environment variables to docker-compose.yml:

    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    
  5. Add to settings.yml plugins section:

    plugins:
      searx.plugins.ollama_answers.SXNGPlugin:
        active: true
    
  6. Restart SearXNG:

    docker compose up -d --force-recreate core
    

Configuration

Configure via environment variables.

Variable Default Description
LLM_URL http://ollama:11434/v1/chat/completions Ollama endpoint
LLM_MODEL qwen3.5:9b Default model
LLM_MAX_TOKENS 200 Max response tokens
LLM_TEMPERATURE 0.2 Response temperature
LLM_TABS general,science,it,news Tabs to show AI overview on
LLM_QUESTION_MARK_REQUIRED false Only trigger on queries with ?
LLM_INTERACTIVE true Show copy/regen/follow-up UI
LLM_SYSTEM_PROMPT (built-in) Override the system prompt
LLM_CONTEXT_DEEP_COUNT 5 Full-content results to fetch
LLM_CONTEXT_SHALLOW_COUNT 15 Headline-only results
VALKEY_HOST searxng-valkey Valkey container hostname
VALKEY_PORT 6379 Valkey port

How It Works

  1. User performs a search
  2. Results return server-side
  3. post_search plugin hook fires
  4. Token-optimized context is extracted from results
  5. UI/logic shell injected into the standard answers object
  6. Client-side script calls a signed endpoint (/ai-stream)
  7. Ollama streams a response token-by-token in the UI

Architecture

┌─────────────────────────────────────────────────────┐
│                   Browser                           │
│  POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│              SearXNG + Plugin                        │
│                                                      │
│  post_search()                                       │
│    → _enrich_results()  ← ThreadPoolExecutor        │
│      → _fetch_page_text() × 5 parallel              │
│      → _chunk_text() + _tfidf_score()               │
│      → rerank by score                              │
│    → _assemble_context()                            │
│    → inject AI Overview HTML + JS                   │
│                                                      │
│  /ai-stream                                          │
│    → validate token                                  │
│    → _detect_intent() → select system prompt        │
│    → _load_conversation() from Valkey               │
│    → launch stream_to_valkey() thread               │
│    → return {job_id} immediately                    │
│                                                      │
│  stream_to_valkey() [background thread]             │
│    → Ollama stream=True                             │
│    → RPUSH tokens to Valkey                         │
│    → RPUSH __DONE__ when complete                   │
│                                                      │
│  /ai-status/{job_id}                                │
│    → LRANGE chunks from offset                      │
│    → return {chunks, done}                          │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│                  Valkey                              │
│  ai:job:{id}:chunks  (list, TTL 120s)               │
│  ai:job:{id}:status  (string, TTL 120s)             │
│  ai:conv:{session}   (JSON, TTL 1800s)              │
└─────────────────────────────────────────────────────┘

Docker Compose Example

services:
  searxng:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py

  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:

Remote Ollama

If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.

environment:
  - LLM_URL=https://ollama.example.com/v1/chat/completions
  - LLM_API_KEY=your-bearer-token

Project Structure

ollama-ai-answers-searxng/
├── ollama_answers.py      # single plugin file — all logic here
├── README.md
├── requirements.txt       # flask, flask-babel (for local dev only)
└── tests/
    └── dev.py             # local dev server

Development — Dev Server

A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.

Setup

pip install flask flask-babel certifi

Run

python tests/dev.py

Then open http://127.0.0.1:5000/ in your browser.

Note: Use 127.0.0.1:5000, not localhost:5000 — macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.

Usage

  • Type a query in the search bar and hit Search to trigger an AI overview.
  • Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
  • The model selector in the AI overview widget (loaded from /ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.

Environment Variables (dev)

The dev server reads the same variables as the plugin:

LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py

Or export them before running. Any values set in the config panel at runtime take priority for that session.

S
Description
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Readme 785 KiB
Languages
Python 100%