TySS-Dev/ollama-ai-answers-searxng

Fork 0

T

TySS-Dev 85d1481bd9 Updated README

2026-05-17 19:45:37 -04:00

tests

Updated file name, and updated call to main program

2026-05-15 15:50:07 -04:00

.gitignore

feats: native searxng networking, code composition, ux polish, follow up querying via internals, config var clarity, readme

2026-01-20 21:35:43 -06:00

ollama_answers.py

Better markdown support

2026-05-17 16:02:31 -04:00

README.md

Updated README

2026-05-17 19:45:37 -04:00

requirements.txt

Updated the demo.py to work with the changes in ai_answers.py

2026-05-15 15:25:37 -04:00

README.md

Ollama AI Answers Plugin for SearXNG

Based on ai-answers-searxng by cra88y

A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.

Features:

Inline numbered citations
Interactive mode - Continue summary, ask follow-ups, copy, or regenerate
Overview of ranked results with prompts based on detected query intent:
- How To Technical Factual Comparison Opinion Current Local `Geneal, General
Internally called RAG for follow-ups
Native network integration via searx.network
Stateless conversation presistence/shareability via URL hash
Ollama model selector
Feeds fetched results to Ollama without slowing down SearXNG results
Real-time streaming via Valkey (No waiting for a completed response)
TF-IDF result ranking before being sent to Ollama
Smart chunking - Pages are split into 512-token segments and highest-scoring chunk per page used for context
Conversation memory - 30-minute cross-search conversation history via Valkey for follow-up questions
Markdown support
Intent emoji badge showing what intent prompt was used

Install

Download the plugin:

Main repo (Gitea)

curl -o ollama_answers.py https://git.tysstech.com/TySS-Dev/ollama-ai-answers-searxng/raw/branch/main/ollama_answers.py

Mirror repo (Github):

curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/main/ollama_answers.py

Copy to your SearXNG plugins directory:

cp ollama_answers.py path_to/searxng/plugins/ollama_answers.py

Add the volume mount to your docker-compose.yml under the searxng service:

volumes:
  - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z

Add environment variables to docker-compose.yml:

environment:
  - LLM_URL=http://ollama:11434/v1/chat/completions
  - LLM_MODEL=qwen3.5:9b
  - VALKEY_HOST=searxng-valkey

Add to settings.yml plugins section:

plugins:
  searx.plugins.ollama_answers.SXNGPlugin:
    active: true

Restart SearXNG:

docker compose up -d --force-recreate core

Configuration

Configure via environment variables.

Variable	Default	Description
`LLM_URL`	`http://ollama:11434/v1/chat/completions`	Ollama endpoint
`LLM_MODEL`	`qwen3.5:9b`	Default model
`LLM_MAX_TOKENS`	`200`	Max response tokens
`LLM_TEMPERATURE`	`0.2`	Response temperature
`LLM_TABS`	`general,science,it,news`	Tabs to show AI overview on
`LLM_QUESTION_MARK_REQUIRED`	`false`	Only trigger on queries with `?`
`LLM_INTERACTIVE`	`true`	Show copy/regen/follow-up UI
`LLM_SYSTEM_PROMPT`	(built-in)	Override the system prompt
`LLM_CONTEXT_DEEP_COUNT`	`5`	Full-content results to fetch
`LLM_CONTEXT_SHALLOW_COUNT`	`15`	Headline-only results
`VALKEY_HOST`	`searxng-valkey`	Valkey container hostname
`VALKEY_PORT`	`6379`	Valkey port

How It Works

User performs a search
Results return server-side
post_search plugin hook fires
Token-optimized context is extracted from results
UI/logic shell injected into the standard answers object
Client-side script calls a signed endpoint (/ai-stream)
Ollama streams a response token-by-token in the UI

Architecture

┌─────────────────────────────────────────────────────┐
│                   Browser                           │
│  POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│              SearXNG + Plugin                        │
│                                                      │
│  post_search()                                       │
│    → _enrich_results()  ← ThreadPoolExecutor        │
│      → _fetch_page_text() × 5 parallel              │
│      → _chunk_text() + _tfidf_score()               │
│      → rerank by score                              │
│    → _assemble_context()                            │
│    → inject AI Overview HTML + JS                   │
│                                                      │
│  /ai-stream                                          │
│    → validate token                                  │
│    → _detect_intent() → select system prompt        │
│    → _load_conversation() from Valkey               │
│    → launch stream_to_valkey() thread               │
│    → return {job_id} immediately                    │
│                                                      │
│  stream_to_valkey() [background thread]             │
│    → Ollama stream=True                             │
│    → RPUSH tokens to Valkey                         │
│    → RPUSH __DONE__ when complete                   │
│                                                      │
│  /ai-status/{job_id}                                │
│    → LRANGE chunks from offset                      │
│    → return {chunks, done}                          │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│                  Valkey                              │
│  ai:job:{id}:chunks  (list, TTL 120s)               │
│  ai:job:{id}:status  (string, TTL 120s)             │
│  ai:conv:{session}   (JSON, TTL 1800s)              │
└─────────────────────────────────────────────────────┘

Docker Compose Example

services:
  searxng:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py

  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:

Remote Ollama

If your Ollama instance is remote or behind a reverse proxy, set LLM_URL to the full endpoint and provide an API key if required. The plugin supports Bearer token auth and follows HTTP redirects.

environment:
  - LLM_URL=https://ollama.example.com/v1/chat/completions
  - LLM_API_KEY=your-bearer-token

Project Structure

ollama-ai-answers-searxng/
├── ollama_answers.py      # single plugin file — all logic here
├── README.md
├── requirements.txt       # flask, flask-babel (for local dev only)
└── tests/
    └── dev.py             # local dev server

Development — Dev Server

A standalone Flask dev server is included in tests/dev.py. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.

Setup

pip install flask flask-babel certifi

Run

python tests/dev.py

Then open http://127.0.0.1:5000/ in your browser.

Note: Use 127.0.0.1:5000, not localhost:5000 — macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.

Usage

Type a query in the search bar and hit Search to trigger an AI overview.
Expand Ollama Configuration at the top to change the endpoint URL or Bearer token for the current session. Click Apply to save and re-run the current query.
The model selector in the AI overview widget (loaded from /ai-models) shows all models available on the configured Ollama server and persists your choice in the session URL.

Environment Variables (dev)

The dev server reads the same variables as the plugin:

LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py

Or export them before running. Any values set in the config panel at runtime take priority for that session.

README.md Unescape Escape

Ollama AI Answers Plugin for SearXNG

Features:

Install

Main repo (Gitea)

Mirror repo (Github):

Configuration

How It Works

Architecture

Docker Compose Example

Remote Ollama

Project Structure

Development — Dev Server

Setup

Run

Usage

Environment Variables (dev)

README.md