🦉 ResearchOwl
Exhaustive research engine with Telegram interface.
Recursively discovers, scrapes, and processes sources from across the web, then generates podcast scripts, blog posts, reports, or social threads using Ollama.
Architecture
Telegram (/research <topic>)
↓
ExhaustiveScraper
├── DuckDuckGo (8 queries × 5 results)
├── Wikipedia + recursive internal links
├── Reddit (top posts + top comments)
├── YouTube (transcripts)
├── PDFs (public documents)
└── Web scraping (trafilatura)
↓ recursive expansion (depth 1-3)
ContentProcessor (Ollama qwen2.5:3b)
├── Chunking (800 token chunks, 100 overlap)
├── Quality scoring (0-10 per chunk)
├── Embeddings (cosine similarity RAG)
└── Deduplication
↓
OutputGenerator (Ollama)
├── 🎙️ Podcast script (20-30 min)
├── 📝 Blog post (1500-2500 words)
├── 📊 Research report (structured)
└── 🐦 Social thread (15-25 tweets)
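The chunking and retrieval steps in the diagram can be sketched roughly as follows. This is a minimal illustration, not ResearchOwl's actual code: the function names `chunk_tokens`, `cosine_similarity`, and `top_k` are hypothetical, tokens are represented as a plain list (the real tokenizer is not specified here), and embeddings are assumed to be plain lists of floats.

```python
import math


def chunk_tokens(tokens, size=800, overlap=100):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

With the 800/100 settings each new chunk starts 700 tokens after the previous one, so consecutive chunks share their boundary text and a sentence cut at a chunk edge still appears whole in one of them.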
Telegram Commands
| Command | Description |
|---|---|
| `/research <topic>` | Start exhaustive research |
| `/status` | Check progress |
| `/finish` | Stop early, proceed to generation |
| `/generate podcast\|blog\|report\|thread` | Generate output |
| `/sources` | List all sources found |
| `/cancel` | Cancel current research |
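As an illustration of how the `/generate` argument might be validated, here is a small sketch in plain Python. The helper `parse_generate_command` and the constant `VALID_FORMATS` are hypothetical names, not taken from the bot's source; only the four format names come from the table above.

```python
VALID_FORMATS = ("podcast", "blog", "report", "thread")


def parse_generate_command(text):
    """Return the requested output format, or raise ValueError with a usage hint."""
    parts = text.strip().split()
    # In group chats Telegram may append the bot name: /generate@SomeBot
    if not parts or parts[0].split("@")[0] != "/generate":
        raise ValueError("not a /generate command")
    if len(parts) != 2 or parts[1].lower() not in VALID_FORMATS:
        raise ValueError("usage: /generate " + "|".join(VALID_FORMATS))
    return parts[1].lower()
```

Rejecting unknown formats up front means the user gets a usage hint immediately instead of waiting for a generation run that cannot succeed.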
Local Development
# 1. Clone and setup
git clone https://git.chemavx.xyz/chemavx/researchowl
cd researchowl
# 2. Create virtualenv
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# 3. Configure
cp .env.example .env
# Edit .env with your values
# 4. Run
python main.py
Deploy to k3s
# 1. Create namespace and secrets
kubectl create namespace researchowl
kubectl create secret generic researchowl-secrets \
  --from-literal=telegram-bot-token=YOUR_TOKEN \
  --from-literal=telegram-allowed-users=YOUR_USER_ID \
  -n researchowl
# 2. Copy manifests to your k8s-manifests repo
cp k8s/*.yaml /path/to/k8s-manifests/researchowl/
# 3. Apply ArgoCD app
kubectl apply -f k8s/argocd-app.yaml
# 4. Push to Gitea → Gitea Actions builds → ArgoCD deploys
git add . && git commit -m "feat: add researchowl" && git push
Tuning
| Variable | Default | Description |
|---|---|---|
| `MAX_SOURCES` | 150 | Hard cap on sources |
| `MAX_DEPTH` | 3 | Link recursion depth |
| `QUALITY_THRESHOLD` | 0.4 | Min chunk quality (0-1) |
| `REQUEST_DELAY` | 1.0s | Delay between requests |
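One way these variables could be read with their defaults is sketched below. This assumes they are supplied as environment variables (for example via `.env`) and that `REQUEST_DELAY` is a plain number of seconds; the `Settings` class and `load_settings` function are hypothetical names for illustration, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    max_sources: int = 150          # hard cap on sources
    max_depth: int = 3              # link recursion depth
    quality_threshold: float = 0.4  # minimum chunk quality, 0-1
    request_delay: float = 1.0      # seconds between requests (assumed unit)


def load_settings(env=None):
    """Build Settings from a mapping of variables, falling back to the defaults."""
    env = os.environ if env is None else env
    d = Settings()
    settings = Settings(
        max_sources=int(env.get("MAX_SOURCES", d.max_sources)),
        max_depth=int(env.get("MAX_DEPTH", d.max_depth)),
        quality_threshold=float(env.get("QUALITY_THRESHOLD", d.quality_threshold)),
        request_delay=float(env.get("REQUEST_DELAY", d.request_delay)),
    )
    if not 0.0 <= settings.quality_threshold <= 1.0:
        raise ValueError("QUALITY_THRESHOLD must be between 0 and 1")
    return settings
```

Passing a dictionary instead of relying on `os.environ` makes it easy to try a tuning profile, for example `load_settings({"MAX_SOURCES": "300", "QUALITY_THRESHOLD": "0.3"})`.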
Want more thoroughness?
- Increase `MAX_SOURCES` to 300+
- Increase `MAX_DEPTH` to 4-5
- Lower `QUALITY_THRESHOLD` to 0.3
Want faster results?
- Lower `MAX_SOURCES` to 50
- Set `MAX_DEPTH` to 1-2
- Raise `QUALITY_THRESHOLD` to 0.6
Notes
- Uses qwen2.5:3b (your existing Ollama) for all AI tasks, at zero API cost
- Optionally add `ANTHROPIC_API_KEY` for Claude fallback on generation
- SQLite database stored in `/data/researchowl.db`
- All outputs saved to DB and available via `/outputs`
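For readers who want to check that their Ollama instance responds before running the bot, a request to Ollama's `/api/generate` endpoint can be sketched with the standard library alone. The URL below is Ollama's default local address and may differ in your setup; `build_payload` and `generate` are hypothetical helper names, and the temperature value is only illustrative.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default local port; adjust for your host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="qwen2.5:3b", temperature=0.7):
    """Build the JSON body for a single, non-streaming generation request."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }


def generate(prompt, url=OLLAMA_URL, timeout=120):
    """Send the prompt to Ollama and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Say hello in one sentence.")` against a running instance should return a short string; a connection error usually means Ollama is not reachable at the configured address.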