TySS-Dev/ollama-ai-answers-searxng

Tyler 904cf945a2 Updated README

2026-05-17 16:07:00 -04:00

Ollama AI Answers Plugin for SearXNG

Based on ai-answers-searxng by cra88y

A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.

Features:

Token-by-token UI streaming
Clickable inline citations
Interactive mode: continue summary, ask follow-ups, copy, or regenerate
Simple response mode with no extras
Low-latency RAG for follow-ups, called internally (bypasses HTTP loopback)
Native network integration via searx.network (respects proxy/SSL settings)
Stateless conversation persistence/shareability via URL hash
Model selector in the AI overview widget
Does not slow down result loading
One file install
Real-time streaming via Valkey: a background thread pushes tokens to a Valkey job queue and the browser polls for them, which works around granian's broken generator support while still feeling like true streaming
TF-IDF result reranking: fetched page content is scored against the query using BM25-style TF-IDF before being sent to Ollama, so the most relevant sources come first
Smart chunking: pages are split into overlapping 512-token segments, and the highest-scoring chunk per page is selected for context
Intent detection: queries are automatically classified into 8 intent types (factual, howto, technical, comparison, opinion, current, local, general), each with a tailored system prompt
Conversation memory: a 30-minute cross-search conversation history is stored in Valkey, so follow-up questions work even after navigating to a new search
Markdown rendering: AI responses render bold, italic, lists, headers, and inline code natively in the result box
Intent emoji badge: a small emoji next to "AI Overview" indicates the detected query type
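
The chunking and reranking features above can be illustrated with a minimal sketch. It is not the plugin's actual `_chunk_text()` or `_tfidf_score()` code: the tokenizer, the 64-token overlap, the smoothed IDF formula, and all function names here are assumptions chosen for illustration, and it uses plain TF-IDF rather than a BM25-style weighting.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for whatever tokenizer the plugin uses."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into overlapping windows (the overlap value is assumed)."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks or [[]]


def tfidf_score(query_tokens, chunk, doc_freq, n_docs):
    """Sum of tf * idf over the distinct query terms, using a smoothed idf."""
    counts = Counter(chunk)
    score = 0.0
    for term in set(query_tokens):
        tf = counts[term] / max(len(chunk), 1)
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        score += tf * idf
    return score


def rerank(query, pages, size=512, overlap=64):
    """Return (score, page_index, best_chunk_text) tuples, best page first."""
    q = tokenize(query)
    chunked = [chunk_tokens(tokenize(p), size, overlap) for p in pages]
    all_chunks = [c for page in chunked for c in page]
    # Document frequency is counted per chunk, not per page.
    doc_freq = Counter(t for c in all_chunks for t in set(c))
    n = len(all_chunks)
    ranked = []
    for i, page in enumerate(chunked):
        best = max(page, key=lambda c: tfidf_score(q, c, doc_freq, n))
        ranked.append((tfidf_score(q, best, doc_freq, n), i, " ".join(best)))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked
```

Calling `rerank("valkey caching", pages)` returns one entry per page, so only the best chunk of each page would be passed on as context.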

Install

Download the plugin:

curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/master/ollama_answers.py

Copy to your SearXNG plugins directory:

cp ollama_answers.py ~/searxng/plugins/ollama_answers.py

Add the volume mount to your docker-compose.yml under the searxng service:

volumes:
  - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z

Add environment variables to docker-compose.yml:

environment:
  - LLM_URL=http://ollama:11434/v1/chat/completions
  - LLM_MODEL=qwen3.5:9b
  - VALKEY_HOST=searxng-valkey

Add to settings.yml plugins section:

plugins:
  searx.plugins.ollama_answers.SXNGPlugin:
    active: true

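Restart SearXNG (the command below assumes the service is named core; substitute your own service name, such as searxng, if it differs):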

docker compose up -d --force-recreate core

Configuration

Configure via environment variables.

| Variable | Default | Description |
|---|---|---|
| `LLM_URL` | `http://ollama:11434/v1/chat/completions` | Ollama endpoint |
| `LLM_MODEL` | `qwen3.5:9b` | Default model |
| `LLM_MAX_TOKENS` | `200` | Max response tokens |
| `LLM_TEMPERATURE` | `0.2` | Response temperature |
| `LLM_TABS` | `general,science,it,news` | Tabs to show AI overview on |
| `LLM_QUESTION_MARK_REQUIRED` | `false` | Only trigger on queries with `?` |
| `LLM_INTERACTIVE` | `true` | Show copy/regen/follow-up UI |
| `LLM_SYSTEM_PROMPT` | (built-in) | Override the system prompt |
| `LLM_CONTEXT_DEEP_COUNT` | `5` | Full-content results to fetch |
| `LLM_CONTEXT_SHALLOW_COUNT` | `15` | Headline-only results |
| `VALKEY_HOST` | `searxng-valkey` | Valkey container hostname |
| `VALKEY_PORT` | `6379` | Valkey port |
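
As a rough illustration of how these variables and their defaults could be read, here is a minimal sketch. The function name `load_config`, the dictionary keys, and the accepted truthy spellings are assumptions; the plugin's own parsing may differ.

```python
import os


def load_config(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def flag(name, default):
        # Accepted truthy spellings are an assumption, not taken from the plugin.
        raw = env.get(name)
        if raw is None:
            return default
        return raw.strip().lower() in {"1", "true", "yes", "on"}

    tabs = env.get("LLM_TABS", "general,science,it,news")
    return {
        "url": env.get("LLM_URL", "http://ollama:11434/v1/chat/completions"),
        "model": env.get("LLM_MODEL", "qwen3.5:9b"),
        "max_tokens": int(env.get("LLM_MAX_TOKENS", "200")),
        "temperature": float(env.get("LLM_TEMPERATURE", "0.2")),
        "tabs": [t.strip() for t in tabs.split(",") if t.strip()],
        "question_mark_required": flag("LLM_QUESTION_MARK_REQUIRED", False),
        "interactive": flag("LLM_INTERACTIVE", True),
        "deep_count": int(env.get("LLM_CONTEXT_DEEP_COUNT", "5")),
        "shallow_count": int(env.get("LLM_CONTEXT_SHALLOW_COUNT", "15")),
        "valkey_host": env.get("VALKEY_HOST", "searxng-valkey"),
        "valkey_port": int(env.get("VALKEY_PORT", "6379")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `load_config({"LLM_TABS": "general,it"})`.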

How It Works

The user performs a search
Results are returned server-side
The post_search plugin hook fires
Token-optimized context is extracted from the results
A UI/logic shell is injected into the standard answers object
The client-side script calls a signed endpoint (/ai-stream)
Ollama's response streams token by token into the UI
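
The README does not say how the /ai-stream endpoint is signed. One common approach, shown here purely as an assumed sketch, is an HMAC over the query plus an expiry timestamp. The secret, the token layout, and the function names `sign` and `verify` are all hypothetical.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; a real deployment would load this from configuration.
SECRET = b"replace-with-server-secret"


def sign(query, expires_at, secret=SECRET):
    """Build a token of the form '<expiry>.<hex hmac>' bound to the query text."""
    msg = f"{expires_at}:{query}".encode("utf-8")
    mac = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{expires_at}.{mac}"


def verify(query, token, secret=SECRET, now=None):
    """Accept the token only if it is well-formed, unexpired, and matches the query."""
    now = time.time() if now is None else now
    try:
        expires_raw, mac = token.split(".", 1)
        expires_at = int(expires_raw)
    except ValueError:
        return False
    if expires_at < now:
        return False
    expected = sign(query, expires_at, secret).split(".", 1)[1]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8"))
```

In this scheme the server would embed `sign(query, expiry)` in the injected page, and the endpoint would reject any request whose token fails `verify`.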

Architecture

┌─────────────────────────────────────────────────────┐
│                   Browser                           │
│  POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│              SearXNG + Plugin                        │
│                                                      │
│  post_search()                                       │
│    → _enrich_results()  ← ThreadPoolExecutor        │
│      → _fetch_page_text() × 5 parallel              │
│      → _chunk_text() + _tfidf_score()               │
│      → rerank by score                              │
│    → _assemble_context()                            │
│    → inject AI Overview HTML + JS                   │
│                                                      │
│  /ai-stream                                          │
│    → validate token                                  │
│    → _detect_intent() → select system prompt        │
│    → _load_conversation() from Valkey               │
│    → launch stream_to_valkey() thread               │
│    → return {job_id} immediately                    │
│                                                      │
│  stream_to_valkey() [background thread]             │
│    → Ollama stream=True                             │
│    → RPUSH tokens to Valkey                         │
│    → RPUSH __DONE__ when complete                   │
│                                                      │
│  /ai-status/{job_id}                                │
│    → LRANGE chunks from offset                      │
│    → return {chunks, done}                          │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│                  Valkey                              │
│  ai:job:{id}:chunks  (list, TTL 120s)               │
│  ai:job:{id}:status  (string, TTL 120s)             │
│  ai:conv:{session}   (JSON, TTL 1800s)              │
└─────────────────────────────────────────────────────┘
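
The producer and polling halves of the diagram can be sketched as follows. To stay runnable without a server, this uses a small in-memory stand-in that mimics only the list commands RPUSH and LRANGE; `FakeValkey`, `start_job`, `poll`, and the `next_offset` field are hypothetical names, and the TTLs shown in the diagram are not modeled.

```python
import threading
import uuid

DONE = "__DONE__"  # sentinel pushed when generation finishes, as in the diagram


class FakeValkey:
    """In-memory stand-in exposing only rpush and lrange (no TTL, no networking)."""

    def __init__(self):
        self._lists = {}
        self._lock = threading.Lock()

    def rpush(self, key, value):
        with self._lock:
            self._lists.setdefault(key, []).append(value)

    def lrange(self, key, start, end):
        # Like the real command, the end index is inclusive and -1 means "to the end".
        with self._lock:
            items = self._lists.get(key, [])
            stop = len(items) if end == -1 else end + 1
            return items[start:stop]


def start_job(store, token_source):
    """Launch a background thread that pushes tokens, then return the job id at once."""
    job_id = uuid.uuid4().hex
    key = f"ai:job:{job_id}:chunks"

    def worker():
        try:
            for token in token_source:
                store.rpush(key, token)
        finally:
            # Always mark completion so pollers do not wait forever on errors.
            store.rpush(key, DONE)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return job_id, thread


def poll(store, job_id, offset):
    """Return tokens added since offset, whether the sentinel was seen, and the next offset."""
    items = store.lrange(f"ai:job:{job_id}:chunks", offset, -1)
    return {
        "chunks": [c for c in items if c != DONE],
        "done": DONE in items,
        "next_offset": offset + len(items),
    }
```

A client would call `poll` repeatedly, passing the previous `next_offset`, and stop once `done` is true. Because each request returns quickly, this pattern does not depend on the web server supporting streamed generator responses.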

Docker Compose Example

services:
  searxng:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py

  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:

Remote Ollama

If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.

environment:
  - LLM_URL=https://ollama.example.com/v1/chat/completions
  - LLM_API_KEY=your-bearer-token
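
For reference, a request to an OpenAI-compatible chat completions endpoint with Bearer auth might be built like this. The sketch uses the standard library's `urllib.request` for illustration only; the plugin itself goes through searx.network, and `build_request` is a hypothetical helper. The request is constructed but not sent.

```python
import json
import urllib.request


def build_request(url, model, messages, api_key=None, max_tokens=200, temperature=0.2):
    """Build (but do not send) a streaming chat completions request."""
    payload = {
        "model": model,
        "messages": messages,
        "stream": True,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only attach the Authorization header when a key is configured.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

With `LLM_API_KEY` unset, `api_key` would be `None` and no Authorization header is added, which matches a local Ollama instance that needs no authentication.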

Project Structure

ollama-ai-answers-searxng/
├── ollama_answers.py      # single plugin file — all logic here
├── README.md
├── requirements.txt       # flask, flask-babel (for local dev only)
└── tests/
    └── dev.py             # local dev server

Development — Dev Server

A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.

Setup

pip install flask flask-babel certifi

Run

python tests/dev.py

Then open http://127.0.0.1:5000/ in your browser.

Note: Use 127.0.0.1:5000, not localhost:5000. On macOS, AirPlay Receiver can occupy the IPv6 loopback on port 5000.

Usage

Type a query in the search bar and hit Search to trigger an AI overview.
Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
The model selector in the AI overview widget (loaded from /ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.

Environment Variables (dev)

The dev server reads the same variables as the plugin:

LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py

Or export them before running. Any values set in the config panel at runtime take priority for that session.
