ollama-ai-answers-searxng

Local AI search overviews for SearXNG, powered by Ollama.

One-line Install

bash <(curl -fsSL https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/master/install.sh)

Features

AI Overview box at the top of every search result page
Powered entirely by your local Ollama instance — no external API calls
Page content fetching — enriches context beyond SearXNG snippets
Model selector dropdown — switch models per-search without restarting
Inline citations with clickable source links
Citation footer listing all referenced sources
Follow-up questions with conversation history
Copy and Regenerate buttons
Typewriter animation (granian-compatible buffered response)
Ollama-only — no OpenAI, Gemini, or other provider bloat

Requirements

SearXNG installed via Docker Compose
Ollama running and accessible from the SearXNG container
Python 3.8+ (for build.py and install.sh)
Docker + Docker Compose

Install

One-line (recommended)

bash <(curl -fsSL https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/master/install.sh)

The script will clone the repo, build the plugin, detect your SearXNG Docker Compose installation, copy the plugin, update docker-compose.yml and settings.yml, and optionally restart SearXNG.

Manual

git clone https://github.com/TySP-Dev/ollama-ai-answers-searxng
cd ollama-ai-answers-searxng
python3 build.py
bash install.sh

Or manually copy the built plugin and update your config:

# docker-compose.yml — searxng service
environment:
  - LLM_URL=http://ollama:11434/v1/chat/completions
  - LLM_MODEL=qwen3.5:9b
volumes:
  - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z

# settings.yml
plugins:
  searx.plugins.ollama_answers.SXNGPlugin:
    active: true

Configuration

All configuration is done via environment variables on the SearXNG container.

Variable	Default	Description
`LLM_URL`	`http://ollama:11434/v1/chat/completions`	Ollama endpoint
`LLM_MODEL`	`qwen3.5:9b`	Default model
`LLM_MAX_TOKENS`	`200`	Max response tokens
`LLM_TEMPERATURE`	`0.2`	Response temperature
`LLM_TABS`	`general,science,it,news`	Search tabs to show AI overview on
`LLM_QUESTION_MARK_REQUIRED`	`false`	Only trigger on queries ending with `?`
`LLM_INTERACTIVE`	`true`	Show copy/regenerate/follow-up UI
`LLM_SYSTEM_PROMPT`	(built-in)	Override the system prompt
`LLM_CONTEXT_DEEP_COUNT`	`5`	Results fetched for full page content
`LLM_CONTEXT_SHALLOW_COUNT`	`15`	Results used as headline-only context

Project Structure

ollama-ai-answers-searxng/
├── ollama_answers.py      # Source plugin — reads UI from assets/
├── build.py               # Assembles dist/ollama_answers.py (self-contained)
├── install.sh             # Full automated Docker Compose installer
├── assets/
│   ├── ui.css             # Interactive widget styles
│   ├── ui.html            # Interactive widget HTML (copy/regen/follow-up bar)
│   └── ui.js              # Frontend JS (typewriter, citations, streaming)
├── dist/                  # Output of build.py — gitignored
│   └── ollama_answers.py  # Self-contained, ready to deploy
├── dev/
│   └── dev.py             # Local Flask dev server (no SearXNG required)
└── README.md

Development

# Edit source files
vim ollama_answers.py
vim assets/ui.css

# Build dist file for deployment
python3 build.py

# Deploy to server
cp dist/ollama_answers.py ~/searxng/plugins/ollama_answers.py
cd ~/searxng && docker compose up -d --force-recreate core

# Run local dev server
PYTHONPATH=. python3 dev/dev.py

The dev server mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance. Open http://127.0.0.1:5000/ after starting it.

Note: Use 127.0.0.1:5000, not localhost:5000 — macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.

How It Works

User searches on SearXNG
post_search hook fires after results are fetched
Top result URLs are fetched in parallel for full page content
Context is assembled from page content + snippets + infoboxes
A signed token is generated and injected into the page
The browser POSTs to /ai-stream with the token and context
The server calls Ollama with the enriched context
The response is returned as JSON and animated with a typewriter effect
Citations are rendered inline and collected in a footer

License

MIT License

5.0 KiB Raw Blame History