Updated README

Better markdown support
Adding intent based prompting
2026-05-17 16:07:00 -04:00 · 2026-05-17 16:02:31 -04:00 · 2026-05-17 15:53:44 -04:00 · 2026-05-17 15:44:53 -04:00 · 2026-05-17 15:27:21 -04:00 · 2026-05-17 15:19:44 -04:00
2 changed files with 901 additions and 221 deletions
@@ -21,40 +21,70 @@ Features:
 - Model selector in the AI overview widget
 - Does not slow down result loading
 - One file install
+- Real-time streaming via Valkey — a background thread pushes tokens onto a Valkey job queue and the browser polls for them, so responses appear token by token despite granian's broken generator support
+- TF-IDF result reranking — fetched page content is scored against the query using BM25-style TF-IDF before being sent to Ollama, surfacing the most relevant sources first
+- Smart chunking — pages are split into 512-token overlapping segments and the highest-scoring chunk per page is selected for context
+- Intent detection — queries are automatically classified into 8 intent types (factual, howto, technical, comparison, opinion, current, local, general) with tailored system prompts per type
+- Conversation memory — 30-minute cross-search conversation history stored in Valkey, so follow-up questions work even after navigating to a new search
+- Markdown rendering — AI responses render bold, italic, lists, headers, and inline code natively in the result box
+- Intent emoji badge — a small emoji appears next to "AI Overview" indicating the detected query type
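
The chunking and reranking features listed above can be illustrated with a small sketch. This is not the plugin's code: the function names, the regex tokenizer, the 64-token overlap, and the smoothed IDF formula are all assumptions, and it uses plain TF-IDF computed within a single page rather than a BM25 variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(tokens, size=512, overlap=64):
    """Split a token list into overlapping windows of at most `size` tokens."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the page
    return chunks


def tfidf_score(query_tokens, chunk, all_chunks):
    """Sum the TF-IDF weights of the query terms inside one chunk."""
    counts = Counter(chunk)
    score = 0.0
    for term in set(query_tokens):
        tf = counts[term] / len(chunk) if chunk else 0.0
        df = sum(1 for c in all_chunks if term in c)
        idf = math.log((1 + len(all_chunks)) / (1 + df)) + 1.0  # smoothed IDF (assumed form)
        score += tf * idf
    return score


def best_chunk(query, page_text, size=512, overlap=64):
    """Return (score, chunk_text) for the highest-scoring chunk of a page."""
    chunks = chunk_text(tokenize(page_text), size, overlap)
    q = tokenize(query)
    scored = [(tfidf_score(q, c, chunks), " ".join(c)) for c in chunks]
    return max(scored, key=lambda pair: pair[0]) if scored else (0.0, "")


def rerank(query, pages):
    """Order page URLs by their best chunk score, highest first."""
    return sorted(pages, key=lambda url: best_chunk(query, pages[url])[0], reverse=True)
```

Calling `rerank(query, {url: text, ...})` returns the URLs in the order they would be handed to the model, and `best_chunk` supplies the single segment kept per page.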

-## Installation
+## Install

-Place `ollama_answers.py` into the `searx/plugins` directory of your SearXNG instance (or mount it in a container) and enable it in `settings.yml`:
+1. Download the plugin:
+   ```bash
+   curl -o ollama_answers.py https://raw.githubusercontent.com/TySP-Dev/ollama-ai-answers-searxng/master/ollama_answers.py
+   ```

-```yaml
-plugins:
+2. Copy to your SearXNG plugins directory:
+   ```bash
+   cp ollama_answers.py ~/searxng/plugins/ollama_answers.py
+   ```
+
+3. Add the volume mount to your `docker-compose.yml` under the searxng service:
+   ```yaml
+   volumes:
+     - ./plugins/ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py:Z
+   ```
+
+4. Add environment variables to `docker-compose.yml`:
+   ```yaml
+   environment:
+     - LLM_URL=http://ollama:11434/v1/chat/completions
+     - LLM_MODEL=qwen3.5:9b
+     - VALKEY_HOST=searxng-valkey
+   ```
+
+5. Add to `settings.yml` plugins section:
+   ```yaml
+   plugins:
     searx.plugins.ollama_answers.SXNGPlugin:
       active: true
-```
+   ```
+
+6. Restart SearXNG (`core` is the service name used here; substitute the name of your SearXNG service if it differs):
+   ```bash
+   docker compose up -d --force-recreate core
+   ```

 ## Configuration

 Configure via environment variables.

-### Required
-
-| Variable | Description | Default |
+| Variable | Default | Description |
 |---|---|---|
-| `LLM_URL` | Ollama chat completions endpoint | `http://ollama:11434/v1/chat/completions` |
-| `LLM_MODEL` | Model name as listed in Ollama | `qwen3.5:9b` |
-
-### Optional
-
-| Variable | Description | Default |
-|---|---|---|
-| `LLM_SYSTEM_PROMPT` | Overrides the default system prompt | `You are a direct, citation-accurate search synthesis engine.` |
-| `LLM_MAX_TOKENS` | Max tokens in the AI response | `200` |
-| `LLM_TEMPERATURE` | Sampling temperature | `0.2` |
-| `LLM_CONTEXT_DEEP_COUNT` | Results used with full snippets | `5` |
-| `LLM_CONTEXT_SHALLOW_COUNT` | Results with headlines only (breadth) | `15` |
-| `LLM_TABS` | Comma-delimited tab whitelist | `general,science,it,news` |
-| `LLM_INTERACTIVE` | Interactive UI mode (copy, regenerate, follow-up) | `true` |
-| `LLM_QUESTION_MARK_REQUIRED` | Only trigger on queries containing `?` | `false` |
+| `LLM_URL` | `http://ollama:11434/v1/chat/completions` | Ollama endpoint |
+| `LLM_MODEL` | `qwen3.5:9b` | Default model |
+| `LLM_MAX_TOKENS` | `200` | Max response tokens |
+| `LLM_TEMPERATURE` | `0.2` | Response temperature |
+| `LLM_TABS` | `general,science,it,news` | Tabs to show AI overview on |
+| `LLM_QUESTION_MARK_REQUIRED` | `false` | Only trigger on queries with `?` |
+| `LLM_INTERACTIVE` | `true` | Show copy/regen/follow-up UI |
+| `LLM_SYSTEM_PROMPT` | *(built-in)* | Override the system prompt |
+| `LLM_CONTEXT_DEEP_COUNT` | `5` | Full-content results to fetch |
+| `LLM_CONTEXT_SHALLOW_COUNT` | `15` | Headline-only results |
+| `VALKEY_HOST` | `searxng-valkey` | Valkey container hostname |
+| `VALKEY_PORT` | `6379` | Valkey port |
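
As a rough illustration of how the variables in this table might be read, here is a minimal sketch. The `load_config` and `_as_bool` helpers and the set of accepted truthy strings are assumptions, not the plugin's actual parsing. Only the variable names and defaults come from the table.

```python
import os


def _as_bool(raw):
    """Treat a few common spellings as true (assumed; the plugin may differ)."""
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_config(env=None):
    """Build a settings dict from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    tabs = env.get("LLM_TABS", "general,science,it,news")
    return {
        "url": env.get("LLM_URL", "http://ollama:11434/v1/chat/completions"),
        "model": env.get("LLM_MODEL", "qwen3.5:9b"),
        "max_tokens": int(env.get("LLM_MAX_TOKENS", "200")),
        "temperature": float(env.get("LLM_TEMPERATURE", "0.2")),
        "tabs": [t.strip() for t in tabs.split(",") if t.strip()],
        "question_mark_required": _as_bool(env.get("LLM_QUESTION_MARK_REQUIRED", "false")),
        "interactive": _as_bool(env.get("LLM_INTERACTIVE", "true")),
        "deep_count": int(env.get("LLM_CONTEXT_DEEP_COUNT", "5")),
        "shallow_count": int(env.get("LLM_CONTEXT_SHALLOW_COUNT", "15")),
        "valkey_host": env.get("VALKEY_HOST", "searxng-valkey"),
        "valkey_port": int(env.get("VALKEY_PORT", "6379")),
    }
```

Passing a plain dict instead of relying on `os.environ` makes the parsing easy to exercise in isolation, for example `load_config({"LLM_TABS": "general, news"})`.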

 ## How It Works

@@ -66,6 +96,50 @@ Configure via environment variables.
 6. Client-side script calls a signed endpoint (`/ai-stream`)
 7. Ollama streams a response token-by-token in the UI
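
The README says the `/ai-stream` endpoint is signed but does not describe the scheme. One common approach is an HMAC over the query plus an expiry time, sketched below with the standard library. The token layout (`expiry.signature`), the function names, and the 300-second lifetime are assumptions for illustration only.

```python
import hashlib
import hmac
import time


def sign_request(secret, query, ttl=300, now=None):
    """Return an 'expiry.signature' token that binds the query to an expiry time."""
    expiry = int(time.time() if now is None else now) + ttl
    message = f"{expiry}:{query}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{expiry}.{digest}"


def verify_request(secret, query, token, now=None):
    """Reject malformed or expired tokens, then compare signatures in constant time."""
    try:
        expiry_text, digest = token.split(".", 1)
        expiry = int(expiry_text)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current > expiry:
        return False
    message = f"{expiry}:{query}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), digest.encode("utf-8"))
```

In this sketch the server would call `sign_request` while rendering the results page and `verify_request` when the client-side script posts back, so a token cannot be reused for a different query or after it expires.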

+## Architecture
+
+```
+┌─────────────────────────────────────────────────────┐
+│                   Browser                           │
+│  POST /ai-stream → GET /ai-status/{id} (poll 150ms) │
+└────────────────┬────────────────────────────────────┘
+                 │
+┌────────────────▼────────────────────────────────────┐
+│              SearXNG + Plugin                        │
+│                                                      │
+│  post_search()                                       │
+│    → _enrich_results()  ← ThreadPoolExecutor        │
+│      → _fetch_page_text() × 5 parallel              │
+│      → _chunk_text() + _tfidf_score()               │
+│      → rerank by score                              │
+│    → _assemble_context()                            │
+│    → inject AI Overview HTML + JS                   │
+│                                                      │
+│  /ai-stream                                          │
+│    → validate token                                  │
+│    → _detect_intent() → select system prompt        │
+│    → _load_conversation() from Valkey               │
+│    → launch stream_to_valkey() thread               │
+│    → return {job_id} immediately                    │
+│                                                      │
+│  stream_to_valkey() [background thread]             │
+│    → Ollama stream=True                             │
+│    → RPUSH tokens to Valkey                         │
+│    → RPUSH __DONE__ when complete                   │
+│                                                      │
+│  /ai-status/{job_id}                                │
+│    → LRANGE chunks from offset                      │
+│    → return {chunks, done}                          │
+└────────────────┬────────────────────────────────────┘
+                 │
+┌────────────────▼────────────────────────────────────┐
+│                  Valkey                              │
+│  ai:job:{id}:chunks  (list, TTL 120s)               │
+│  ai:job:{id}:status  (string, TTL 120s)             │
+│  ai:conv:{session}   (JSON, TTL 1800s)              │
+└─────────────────────────────────────────────────────┘
+```
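
The producer/poller flow in the diagram can be sketched without a running Valkey instance. The `FakeListStore` class below is a hypothetical in-memory stand-in that mimics only the `RPUSH` and `LRANGE` semantics used here. The function names and the `next_offset` field are assumptions, and key expiry (the TTLs shown in the diagram) is left out.

```python
import threading
import uuid

DONE = "__DONE__"  # sentinel pushed after the final token, as in the diagram


class FakeListStore:
    """In-memory stand-in for the two Valkey list commands used in this sketch."""

    def __init__(self):
        self._lists = {}
        self._lock = threading.Lock()

    def rpush(self, key, *values):
        with self._lock:
            self._lists.setdefault(key, []).extend(values)
            return len(self._lists[key])

    def lrange(self, key, start, end):
        # LRANGE treats `end` as inclusive; -1 means "through the last element".
        with self._lock:
            items = list(self._lists.get(key, []))
        return items[start:] if end == -1 else items[start:end + 1]


def stream_to_store(store, job_id, token_source):
    """Background producer: append each token, then the completion sentinel."""
    key = f"ai:job:{job_id}:chunks"
    for token in token_source:
        store.rpush(key, token)
    store.rpush(key, DONE)


def start_job(store, token_source):
    """Launch the producer thread and return the job id immediately."""
    job_id = uuid.uuid4().hex
    worker = threading.Thread(
        target=stream_to_store, args=(store, job_id, token_source), daemon=True
    )
    worker.start()
    return job_id, worker


def poll_job(store, job_id, offset):
    """Return the tokens added since `offset` and whether the job has finished."""
    items = store.lrange(f"ai:job:{job_id}:chunks", offset, -1)
    chunks = [item for item in items if item != DONE]
    return {"chunks": chunks, "done": DONE in items, "next_offset": offset + len(chunks)}
```

A client would call `poll_job` repeatedly, passing back `next_offset` each time, until `done` is true. Because the request handler only starts a thread and returns a job id, it never has to hold a streaming response open.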
+
 ## Docker Compose Example

 ```yaml
@@ -74,6 +148,7 @@ services:
    environment:
      - LLM_URL=http://ollama:11434/v1/chat/completions
      - LLM_MODEL=qwen3.5:9b
+      - VALKEY_HOST=searxng-valkey
    volumes:
      - ./ollama_answers.py:/usr/local/searxng/searx/plugins/ollama_answers.py

@@ -96,6 +171,17 @@ environment:
  - LLM_API_KEY=your-bearer-token
 ```

+## Project Structure
+
+```
+ollama-ai-answers-searxng/
+├── ollama_answers.py      # single plugin file — all logic here
+├── README.md
+├── requirements.txt       # flask, flask-babel (for local dev only)
+└── tests/
+    └── dev.py             # local dev server
+```
+
 ## Development — Dev Server

 A standalone Flask dev server is included in `tests/dev.py`. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
@@ -124,7 +210,7 @@ Then open [http://127.0.0.1:5000/](http://127.0.0.1:5000/) in your browser.

 ### Environment Variables (dev)

-The dev reads the same variables as the plugin:
+The dev server reads the same variables as the plugin:

 ```bash
 LLM_URL=http://localhost:11434/v1/chat/completions \
Author	SHA1	Message	Date
Tyler	904cf945a2	Updated README	2026-05-17 16:07:00 -04:00
Tyler	b3dc603b94	Better markdown support	2026-05-17 16:02:31 -04:00
Tyler	4e2f9d97d7	Adding intent based prompting	2026-05-17 15:53:44 -04:00
Tyler	1f7d54590f	Adding conversation memory	2026-05-17 15:44:53 -04:00
Tyler	2ed6a0aae9	Result filtering by relevance and RAG with chunking logic	2026-05-17 15:27:21 -04:00
Tyler	9d6d4ec160	Fixing content not loading	2026-05-17 15:19:44 -04:00
Tyler	e4880a7a51	Adding debug logic	2026-05-17 15:17:44 -04:00
Tyler	332834a126	Adding better AI response streaming logic	2026-05-17 15:11:01 -04:00