Updated the demo.py to work with the changes in ai_answers.py

2026-05-15 15:25:37 -04:00
parent c9d00ca52a
commit dec64bda9b
4 changed files with 515 additions and 78 deletions
@@ -8,70 +8,121 @@ A SearXNG plugin that generates local AI overviews powered by Ollama, using sear
 Features:
 - token-by-token UI streaming
 - clickable inline citations
- interactive mode to continue summary, ask follow ups, copy, or regenerate
+- interactive mode: continue summary, ask follow-ups, copy, or regenerate
 - simple response mode with no extras
- internally called low-latency RAG for follow ups (bypasses http loopback)
+- internally called low-latency RAG for follow-ups (bypasses HTTP loopback)
 - native network integration via `searx.network` (respects proxy/SSL settings)
- stateless conversation persistence/sharability via URL
-
+- stateless conversation persistence/shareability via URL hash
+- model selector in the AI overview widget

 ## Installation

-Place `ai_answers.py` into the `searx/plugins` directory of your instance (or mount it in a container) and enable it in `settings.yml`:
+Place `ai_answers.py` into the `searx/plugins` directory of your SearXNG instance (or mount it in a container) and enable it in `settings.yml`:

 ```yaml
 plugins:
-  searx.plugins.ai_answers.SXNGPlugin:  
+  searx.plugins.ai_answers.SXNGPlugin:
    active: true
 ```

 ## Configuration

-Configure via environment variables:
+Configure via environment variables.

 ### Required

- `LLM_URL`: Ollama chat completions endpoint. Default: `http://ollama:11434/v1/chat/completions`
- `LLM_MODEL`: Model name as listed in Ollama. Default: `llama3.2`
+| Variable | Description | Default |
+|---|---|---|
+| `LLM_URL` | Ollama chat completions endpoint | `http://ollama:11434/v1/chat/completions` |
+| `LLM_MODEL` | Model name as listed in Ollama | `qwen3.5:9b` |

 ### Optional

- `LLM_SYSTEM_PROMPT`: Overrides the system prompt. Default: `You are a direct, citation-accurate search synthesis engine.`
- `LLM_MAX_TOKENS`: Default `200`.
- `LLM_TEMPERATURE`: Default `0.2`.
- `LLM_CONTEXT_DEEP_COUNT`: Results used as context with full snippets. Default `5`.
- `LLM_CONTEXT_SHALLOW_COUNT`: Results with headlines only (additional breadth). Default `15`.
- `LLM_TABS`: Tab whitelist, comma delimited. Default `general,science,it,news`.
- `LLM_INTERACTIVE`: UI mode. Default `true` (interactive: copy, regenerate, follow up). Set to `false` for simple response only.
- `LLM_QUESTION_MARK_REQUIRED`: Only trigger AI answers when the query contains `?`. Default `false`.
+| Variable | Description | Default |
+|---|---|---|
+| `LLM_SYSTEM_PROMPT` | Overrides the default system prompt | `You are a direct, citation-accurate search synthesis engine.` |
+| `LLM_MAX_TOKENS` | Max tokens in the AI response | `200` |
+| `LLM_TEMPERATURE` | Sampling temperature | `0.2` |
+| `LLM_CONTEXT_DEEP_COUNT` | Results used with full snippets | `5` |
+| `LLM_CONTEXT_SHALLOW_COUNT` | Results with headlines only (breadth) | `15` |
+| `LLM_TABS` | Comma-delimited tab whitelist | `general,science,it,news` |
+| `LLM_INTERACTIVE` | Interactive UI mode (copy, regenerate, follow-up) | `true` |
+| `LLM_QUESTION_MARK_REQUIRED` | Only trigger on queries containing `?` | `false` |

 ## How It Works
-1. User performs initial search
-2. Results return server side
+
+1. User performs a search
+2. Results return server-side
 3. `post_search` plugin hook fires
-4. Token-optimized context extracted from results
-5. UI/logic shell injected into the standard results answer object
-6. Client-side script calls custom endpoint with a signed token
-7. Ollama response renders token by token in the UI
+4. Token-optimized context is extracted from results
+5. UI/logic shell is injected into the standard answers object
+6. Client-side script calls a signed endpoint (`/ai-stream`)
+7. Ollama's response streams token by token into the UI

-## Example
+## Docker Compose Example
+
+```yaml
+services:
+  searxng:
+    environment:
+      - LLM_URL=http://ollama:11434/v1/chat/completions
+      - LLM_MODEL=qwen3.5:9b
+    volumes:
+      - ./ai_answers.py:/usr/local/searxng/searx/plugins/ai_answers.py
+
+  ollama:
+    image: ollama/ollama
+    volumes:
+      - ollama_data:/root/.ollama
+
+volumes:
+  ollama_data:
+```
+
+## Remote Ollama
+
+If your Ollama instance is remote or behind a reverse proxy, set `LLM_URL` to the full endpoint and, if the server requires authentication, set `LLM_API_KEY`. The plugin supports Bearer token auth and follows HTTP redirects.

-### Docker Compose
 ```yaml
 environment:
-  - LLM_URL=http://ollama:11434/v1/chat/completions
-  - LLM_MODEL=llama3.2
+  - LLM_URL=https://ollama.example.com/v1/chat/completions
+  - LLM_API_KEY=your-bearer-token
 ```

-### Environment variables
-```
-LLM_URL=http://ollama:11434/v1/chat/completions
-LLM_MODEL=llama3.2
-```
+## Development: Demo Server

-## Development
+A standalone Flask demo server is included in `tests/demo.py`. It mocks the SearXNG plugin environment so you can test the full UI without a running SearXNG instance.
+
+### Setup

 ```bash
-pip install flask flask-babel
-python tests/demo.py   # UI demo at localhost:5000
+pip install flask flask-babel certifi
 ```
+
+### Run
+
+```bash
+python tests/demo.py
+```
+
+Then open [http://127.0.0.1:5000/](http://127.0.0.1:5000/) in your browser.
+
+> **Note:** Use `127.0.0.1:5000`, not `localhost:5000`. On macOS, AirPlay Receiver can occupy port 5000 on the IPv6 loopback, which `localhost` may resolve to.
+
+### Usage
+
+- Type a query in the search bar and hit **Search** to trigger an AI overview.
+- Expand **Ollama Configuration** at the top to change the endpoint URL or Bearer token for the current session. Click **Apply** to save and re-run the current query.
+- The model selector in the AI overview widget (loaded from `/ai-models`) shows all models available on the configured Ollama server and persists your choice in the session URL.
+
+### Environment Variables (demo)
+
+The demo reads the same variables as the plugin:
+
+```bash
+LLM_URL=http://localhost:11434/v1/chat/completions \
+LLM_MODEL=qwen3.5:9b \
+python tests/demo.py
+```
+
+Or export them before running. Any values set in the config panel at runtime take priority for that session.
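
The precedence described above (config panel value for the session, then environment variable, then documented default) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `tests/demo.py` or `ai_answers.py`: `DEFAULTS`, `resolve_config`, and the `session_overrides` dict are hypothetical names, and only the variable names and default values are taken from the README tables.

```python
import os

# Defaults copied from the README tables; the structure itself is illustrative.
DEFAULTS = {
    "LLM_URL": "http://ollama:11434/v1/chat/completions",
    "LLM_MODEL": "qwen3.5:9b",
    "LLM_MAX_TOKENS": "200",
    "LLM_TEMPERATURE": "0.2",
    "LLM_INTERACTIVE": "true",
}


def resolve_config(session_overrides=None, environ=None):
    """Return effective settings: session overrides > environment > defaults.

    Hypothetical helper: `session_overrides` stands in for values entered in
    the demo's config panel; `environ` defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    session_overrides = session_overrides or {}

    resolved = {}
    for name, default in DEFAULTS.items():
        # An empty override (for example a blank form field) falls through
        # to the environment, and an unset variable falls through to the default.
        resolved[name] = session_overrides.get(name) or environ.get(name) or default

    # Coerce the settings that are not plain strings.
    resolved["LLM_MAX_TOKENS"] = int(resolved["LLM_MAX_TOKENS"])
    resolved["LLM_TEMPERATURE"] = float(resolved["LLM_TEMPERATURE"])
    resolved["LLM_INTERACTIVE"] = str(resolved["LLM_INTERACTIVE"]).lower() == "true"
    return resolved
```

Calling `resolve_config({"LLM_URL": "https://ollama.example.com/v1/chat/completions"})` would return the overridden URL for that session, with every other setting taken from the environment or the defaults.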