<div align="center">
[![Main Repo](https://img.shields.io/badge/Main%20Repo-gits.tysstech.com-blue?logo=gitea)](https://git.tysstech.com/TySS-Dev/ollama-ai-answers-searxng)
[![Mirror Repo](https://img.shields.io/badge/Mirror%20Repo-github.com-blue?logo=github)](https://github.com/TySP-Dev/ollama-ai-answers-searxng)
<div align="left">
# Ollama AI Answers Plugin for SearXNG
**Based on [ai-answers-searxng](https://github.com/cra88y/ai-answers-searxng) by [cra88y](https://github.com/cra88y)**
A SearXNG plugin that generates local AI overviews powered by Ollama, using search results as RAG context.
Features:
- Token-by-token UI streaming
- Clickable inline citations
- Interactive mode: continue summary, ask follow-ups, copy, or regenerate
- Simple response mode with no extras
- Low-latency RAG for follow-ups, called in-process rather than through an HTTP loopback request
- Native network integration via `searx.network` (respects proxy/SSL settings)
- Stateless conversation persistence/shareability via URL hash
- Model selector in the AI overview widget
- Does not slow down result loading
- One file install
- Real-time streaming via Valkey: a background thread pushes tokens to a Valkey job queue and the browser polls for them, which works around granian's broken generator support while keeping a true streaming feel
- TF-IDF result reranking: fetched page content is scored against the query using BM25-style TF-IDF before being sent to Ollama, surfacing the most relevant sources first
- Smart chunking: pages are split into 512-token overlapping segments and the highest-scoring chunk per page is selected for context
- Intent detection: queries are automatically classified into 8 intent types (factual, howto, technical, comparison, opinion, current, local, general), each with a tailored system prompt
- Conversation memory: a 30-minute cross-search conversation history is stored in Valkey, so follow-up questions work even after navigating to a new search
- Markdown rendering: AI responses render bold, italic, lists, headers, and inline code natively in the result box
- Intent emoji badge: a small emoji next to "AI Overview" indicates the detected query type
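
The chunking and reranking features above can be pictured with a small, self-contained sketch. This is not the plugin's code: `chunk_text`, `tfidf_score`, `best_chunk`, and `rerank` are hypothetical names, whitespace-separated words stand in for real tokens, the 64-word overlap is an assumed value (the list above only states 512-token segments), and the scoring is plain smoothed TF-IDF rather than the BM25-style weighting the plugin describes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, size=512, overlap=64):
    """Split text into overlapping windows of `size` words (overlap < size)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def tfidf_score(query, doc, corpus):
    """Sum of tf * idf over the query terms, with idf computed over `corpus`."""
    doc_tokens = tokenize(doc)
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    score = 0.0
    for term in set(tokenize(query)):
        df = sum(1 for d in corpus if term in set(tokenize(d)))
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed idf
        score += (tf[term] / len(doc_tokens)) * idf
    return score


def best_chunk(query, page_text, size=512, overlap=64):
    """Return (chunk, score) for the highest-scoring chunk of one page."""
    chunks = chunk_text(page_text, size, overlap)
    if not chunks:
        return "", 0.0
    scored = [(tfidf_score(query, c, chunks), c) for c in chunks]
    score, chunk = max(scored, key=lambda pair: pair[0])
    return chunk, score


def rerank(query, pages):
    """`pages` maps url -> text; returns (score, url, chunk), best first."""
    ranked = []
    for url, text in pages.items():
        chunk, score = best_chunk(query, text)
        ranked.append((score, url, chunk))
    return sorted(ranked, key=lambda item: item[0], reverse=True)
```

Calling `rerank(query, pages)` yields one best chunk per page, ordered so the most relevant source comes first, which is the shape of context the feature list describes.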
## Install
1. Download the plugin:
```bash
curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/master/ollama_answers.py
```
2. Copy it to a `plugins` directory next to your `docker-compose.yml` (here, `~/searxng/plugins/`):
```bash
cp ollama_answers.py ~/searxng/plugins/ollama_answers.py
```
3. Add the volume mount to your `docker-compose.yml` under the searxng service:
```yaml
volumes:
  - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z
```
4. Add environment variables to `docker-compose.yml`:
```yaml
environment:
  - LLM_URL=http://ollama:11434/v1/chat/completions
  - LLM_MODEL=qwen3.5:9b
  - VALKEY_HOST=searxng-valkey
```
5. Add to `settings.yml` plugins section:
```yaml
plugins:
  searx.plugins.ollama_answers.SXNGPlugin:
    active: true
```
6. Recreate the SearXNG container (replace `core` with the name of your SearXNG service if it differs):
```bash
docker compose up -d --force-recreate core
```
## Configuration
Configure via environment variables.
| Variable | Default | Description |
|---|---|---|
| `LLM_URL` | `http://ollama:11434/v1/chat/completions` | Ollama chat completions endpoint (OpenAI-compatible) |
| `LLM_MODEL` | `qwen3.5:9b` | Default model |
| `LLM_MAX_TOKENS` | `200` | Max response tokens |
| `LLM_TEMPERATURE` | `0.2` | Response temperature |
| `LLM_TABS` | `general,science,it,news` | Tabs to show AI overview on |
| `LLM_QUESTION_MARK_REQUIRED` | `false` | Only trigger on queries with `?` |
| `LLM_INTERACTIVE` | `true` | Show copy/regen/follow-up UI |
| `LLM_SYSTEM_PROMPT` | *(built-in)* | Override the system prompt |
| `LLM_CONTEXT_DEEP_COUNT` | `5` | Full-content results to fetch |
| `LLM_CONTEXT_SHALLOW_COUNT` | `15` | Headline-only results |
| `VALKEY_HOST` | `searxng-valkey` | Valkey container hostname |
| `VALKEY_PORT` | `6379` | Valkey port |
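
As a rough illustration of how the table above maps to runtime values, the sketch below reads each variable and falls back to the documented default. `load_config` and `_as_bool` are hypothetical helpers, and the accepted boolean spellings are an assumption; the plugin may parse them differently.

```python
import os


def _as_bool(raw):
    # Assumed truthy spellings; the README does not specify them.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_config(env=os.environ):
    """Read the documented variables, using the table's defaults when unset."""
    tabs = env.get("LLM_TABS", "general,science,it,news")
    return {
        "url": env.get("LLM_URL", "http://ollama:11434/v1/chat/completions"),
        "model": env.get("LLM_MODEL", "qwen3.5:9b"),
        "max_tokens": int(env.get("LLM_MAX_TOKENS", "200")),
        "temperature": float(env.get("LLM_TEMPERATURE", "0.2")),
        "tabs": [t.strip() for t in tabs.split(",") if t.strip()],
        "question_mark_required": _as_bool(
            env.get("LLM_QUESTION_MARK_REQUIRED", "false")
        ),
        "interactive": _as_bool(env.get("LLM_INTERACTIVE", "true")),
        "deep_count": int(env.get("LLM_CONTEXT_DEEP_COUNT", "5")),
        "shallow_count": int(env.get("LLM_CONTEXT_SHALLOW_COUNT", "15")),
        "valkey_host": env.get("VALKEY_HOST", "searxng-valkey"),
        "valkey_port": int(env.get("VALKEY_PORT", "6379")),
    }
```

Passing a plain dictionary instead of `os.environ` makes it easy to check how a given set of overrides would be interpreted.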
## How It Works
1. The user performs a search
2. Results return server-side
3. The `post_search` plugin hook fires
4. Token-optimized context is extracted from the results
5. A UI/logic shell is injected into the standard answers object
6. The client-side script calls a signed endpoint (`/ai-stream`), which starts a background job and returns its ID
7. The script polls `/ai-status/{id}` while Ollama streams, rendering the response token by token in the UI
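
The steps above mention a signed endpoint without saying how the signature is built. One common approach is an HMAC over the payload plus an issue time, sketched below. Everything here is an assumption for illustration: `SECRET`, `sign`, `verify`, the token layout, and the 300-second lifetime are hypothetical and not taken from the plugin.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical; a real deployment would use a private key


def sign(payload, issued_at, secret=SECRET):
    """Return '<issued_at>.<hex digest>' for the given payload string."""
    msg = f"{issued_at}:{payload}".encode("utf-8")
    digest = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{issued_at}.{digest}"


def verify(payload, token, max_age=300, now=None, secret=SECRET):
    """Check that the token matches the payload and has not expired."""
    now = int(time.time()) if now is None else now
    try:
        issued_raw, digest = token.split(".", 1)
        issued_at = int(issued_raw)
    except ValueError:
        return False  # malformed token
    if not 0 <= now - issued_at <= max_age:
        return False  # expired, or issued in the future
    expected = sign(payload, issued_at, secret).split(".", 1)[1]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), digest.encode("utf-8"))
```

In this scheme the server would embed `sign(query, int(time.time()))` in the injected page and call `verify` when the request reaches `/ai-stream`, so the endpoint only answers queries that came from a real search.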
## Architecture
```
┌─────────────────────────────────────────────────────┐
│  Browser                                            │
│  POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
└────────────────┬────────────────────────────────────┘
┌────────────────▼────────────────────────────────────┐
│  SearXNG + Plugin                                   │
│                                                     │
│  post_search()                                      │
│    → _enrich_results() ← ThreadPoolExecutor         │
│    → _fetch_page_text() × 5 parallel                │
│    → _chunk_text() + _tfidf_score()                 │
│    → rerank by score                                │
│    → _assemble_context()                            │
│    → inject AI Overview HTML + JS                   │
│                                                     │
│  /ai-stream                                         │
│    → validate token                                 │
│    → _detect_intent() → select system prompt        │
│    → _load_conversation() from Valkey               │
│    → launch stream_to_valkey() thread               │
│    → return {job_id} immediately                    │
│                                                     │
│  stream_to_valkey() [background thread]             │
│    → Ollama stream=True                             │
│    → RPUSH tokens to Valkey                         │
│    → RPUSH __DONE__ when complete                   │
│                                                     │
│  /ai-status/{job_id}                                │
│    → LRANGE chunks from offset                      │
│    → return {chunks, done}                          │
└────────────────┬────────────────────────────────────┘
┌────────────────▼────────────────────────────────────┐
│  Valkey                                             │
│  ai:job:{id}:chunks  (list, TTL 120s)               │
│  ai:job:{id}:status  (string, TTL 120s)             │
│  ai:conv:{session}   (JSON, TTL 1800s)              │
└─────────────────────────────────────────────────────┘
```
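
The push-and-poll flow in the diagram can be tried out without a Valkey server. The sketch below is an assumption-laden model, not the plugin's implementation: `FakeValkey` is a hypothetical in-memory stand-in that mimics only `RPUSH` and `LRANGE` (with `-1` as the only supported negative index), a fixed token list replaces the Ollama stream, and `start_job` and `poll` are illustrative names.

```python
import threading
import uuid

DONE = "__DONE__"  # sentinel pushed after the last token, as in the diagram


class FakeValkey:
    """In-memory stand-in exposing only the two list commands used here."""

    def __init__(self):
        self._lists = {}
        self._lock = threading.Lock()

    def rpush(self, key, value):
        with self._lock:
            self._lists.setdefault(key, []).append(value)

    def lrange(self, key, start, stop):
        with self._lock:
            items = self._lists.get(key, [])
            end = len(items) if stop == -1 else stop + 1  # stop is inclusive
            return items[start:end]


def stream_to_valkey(store, job_id, tokens):
    """Background worker: push each token, then the completion sentinel."""
    key = f"ai:job:{job_id}:chunks"
    for token in tokens:
        store.rpush(key, token)
    store.rpush(key, DONE)


def start_job(store, tokens):
    """Models /ai-stream: launch the worker and return the job ID at once."""
    job_id = uuid.uuid4().hex
    worker = threading.Thread(
        target=stream_to_valkey, args=(store, job_id, tokens), daemon=True
    )
    worker.start()
    return job_id, worker


def poll(store, job_id, offset):
    """Models /ai-status/{job_id}: return everything pushed since `offset`."""
    items = store.lrange(f"ai:job:{job_id}:chunks", offset, -1)
    return {
        "chunks": [c for c in items if c != DONE],
        "done": DONE in items,
        "next_offset": offset + len(items),
    }
```

A client would call `poll` repeatedly, passing back `next_offset` each time, until `done` is true. Because each poll reads only from the last offset, no token is delivered twice even though the request that started the job has long since returned.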
## Docker Compose Example
```yaml
services:
  searxng:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
      - VALKEY_HOST=searxng-valkey
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py
  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
```
## Remote Ollama
If your Ollama instance is remote or behind a reverse proxy, set `LLM_URL` to the full endpoint and, if the server requires one, provide an API key in `LLM_API_KEY`. The plugin supports Bearer token auth and follows HTTP redirects.
```yaml
environment:
  - LLM_URL=https://ollama.example.com/v1/chat/completions
  - LLM_API_KEY=your-bearer-token
```
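
To make the Bearer token setup concrete, here is a minimal sketch of how a request to the configured endpoint might be assembled. It uses the standard library's `urllib.request` purely for illustration; the plugin itself goes through `searx.network`. `build_request` is a hypothetical helper, and the payload follows the OpenAI-compatible chat completions shape implied by the `/v1/chat/completions` path.

```python
import json
import os
import urllib.request


def build_request(messages, env=os.environ):
    """Build (but do not send) a streaming chat completions request."""
    url = env.get("LLM_URL", "http://ollama:11434/v1/chat/completions")
    headers = {"Content-Type": "application/json"}
    api_key = env.get("LLM_API_KEY")
    if api_key:
        # Only attach the header when a key is configured.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(
        {
            "model": env.get("LLM_MODEL", "qwen3.5:9b"),
            "messages": messages,
            "stream": True,
        }
    ).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

With no `LLM_API_KEY` set, the request carries no `Authorization` header, which matches a local Ollama instance that needs no authentication.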
## Project Structure
```
ollama-ai-answers-searxng/
├── ollama_answers.py   # single plugin file, all logic here
├── README.md
├── requirements.txt    # flask, flask-babel (for local dev only)
└── tests/
    └── dev.py          # local dev server
```
## Development — Dev Server
A standalone Flask dev server is included in `tests/dev.py`. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
### Setup
```bash
pip install flask flask-babel certifi
```
### Run
```bash
python tests/dev.py
```
Then open [http://127.0.0.1:5000/](http://127.0.0.1:5000/) in your browser.
> **Note:** Use `127.0.0.1:5000`, not `localhost:5000` — macOS AirPlay Receiver can occupy the IPv6 loopback on port 5000.
### Usage
- Type a query in the search bar and hit **Search** to trigger an AI overview.
- Expand **Ollama Configuration** at the top to change the endpoint URL or Bearer token for the current session. Click **Apply** to save and re-run the current query.
- The model selector in the AI overview widget (loaded from `/ai-models`) shows all models available on the configured Ollama server and persists your choice in the session URL.
### Environment Variables (dev)
The dev server reads the same variables as the plugin:
```bash
LLM_URL=http://localhost:11434/v1/chat/completions \
LLM_MODEL=qwen3.5:9b \
python tests/dev.py
```
Or export them before running. Any values set in the config panel at runtime take priority for that session.