# 🦉 ResearchOwl
**Exhaustive research engine with Telegram interface.**
Recursively discovers, scrapes, and processes sources from across the web,
then generates podcast scripts, blog posts, reports, or social threads using Ollama.
## Architecture
```
Telegram (/research <topic>)
↓
ExhaustiveScraper
├── DuckDuckGo (8 queries × 5 results)
├── Wikipedia + recursive internal links
├── Reddit (top posts + top comments)
├── YouTube (transcripts)
├── PDFs (public documents)
└── Web scraping (trafilatura)
↓ recursive expansion (depth 1-3)
ContentProcessor (Ollama qwen2.5:3b)
├── Chunking (800-token chunks, 100-token overlap)
├── Quality scoring (0-10 per chunk)
├── Embeddings (cosine similarity RAG)
└── Deduplication
↓
OutputGenerator (Ollama)
├── 🎙️ Podcast script (20-30 min)
├── 📝 Blog post (1500-2500 words)
├── 📊 Research report (structured)
└── 🐦 Social thread (15-25 tweets)
```
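
The ContentProcessor stage splits each source into 800-token chunks that overlap by 100 tokens, then ranks chunks against a query by the cosine similarity of their embeddings. A minimal sketch of those two steps, assuming whitespace-split words as a stand-in for model tokens and embedding vectors that have already been computed (for example by an Ollama embedding model); `chunk_tokens`, `cosine`, and `top_k` are hypothetical names, not ResearchOwl's actual code:

```python
import math
from typing import Sequence


def chunk_tokens(tokens: Sequence[str], size: int = 800, overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of `size`; consecutive windows share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 3) -> list[int]:
    """Indices of the k chunk embeddings most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]), reverse=True)
    return ranked[:k]


# Usage: whitespace-split words stand in for model tokens here.
words = ("lorem " * 2000).split()
print(len(chunk_tokens(words)), [len(c) for c in chunk_tokens(words)])  # 3 [800, 800, 600]
```

Because each window starts 700 tokens after the previous one, any span of up to 100 tokens lands entirely inside at least one chunk, so short passages are never cut in half everywhere.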
## Telegram Commands
| Command | Description |
|---------|-------------|
| `/research <topic>` | Start exhaustive research |
| `/status` | Check progress |
| `/finish` | Stop early, proceed to generation |
| `/generate podcast\|blog\|report\|thread` | Generate output |
| `/sources` | List all sources found |
| `/cancel` | Cancel current research |
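
`/generate` takes exactly one of four output formats. A minimal sketch of validating that argument from the raw message text; `parse_generate` and `OUTPUT_FORMATS` are hypothetical names rather than the bot's real handler. Stripping an `@` suffix covers the `/command@BotName` form Telegram uses in group chats:

```python
OUTPUT_FORMATS = ("podcast", "blog", "report", "thread")


def parse_generate(text: str) -> str:
    """Return the output format requested by a '/generate <format>' message."""
    parts = text.strip().split()
    # In group chats Telegram may send the command as /generate@SomeBot
    command = parts[0].split("@", 1)[0] if parts else ""
    if command != "/generate":
        raise ValueError("not a /generate command")
    if len(parts) != 2 or parts[1].lower() not in OUTPUT_FORMATS:
        raise ValueError("usage: /generate " + "|".join(OUTPUT_FORMATS))
    return parts[1].lower()


print(parse_generate("/generate blog"))  # blog
```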
## Local Development
```bash
# 1. Clone the repository
git clone https://git.chemavx.xyz/chemavx/researchowl
cd researchowl
# 2. Create virtualenv
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# 3. Configure
cp .env.example .env
# Edit .env with your values
# 4. Run
python main.py
```
## Deploy to k3s
```bash
# 1. Create namespace and secrets
kubectl create namespace researchowl
kubectl create secret generic researchowl-secrets \
  --from-literal=telegram-bot-token=YOUR_TOKEN \
  --from-literal=telegram-allowed-users=YOUR_USER_ID \
  -n researchowl
# 2. Copy manifests to your k8s-manifests repo
cp k8s/*.yaml /path/to/k8s-manifests/researchowl/
# 3. Apply ArgoCD app
kubectl apply -f k8s/argocd-app.yaml
# 4. Push to Gitea → Gitea Actions builds → ArgoCD deploys
git add . && git commit -m "feat: add researchowl" && git push
```
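
The secret stores `telegram-allowed-users` alongside the bot token, which suggests the bot only answers listed users. A minimal sketch of enforcing such an allow-list, assuming the value reaches the container as an environment variable named `TELEGRAM_ALLOWED_USERS` holding comma-separated numeric user IDs; the variable name, the format, and the helper names are all assumptions, so check `.env.example` and the `k8s/` manifests for the real ones:

```python
import os
from typing import Optional


def load_allowed_users(raw: Optional[str] = None) -> set[int]:
    """Parse comma-separated Telegram user IDs (assumed env var and format)."""
    if raw is None:
        raw = os.environ.get("TELEGRAM_ALLOWED_USERS", "")
    # int() tolerates surrounding spaces; empty entries (e.g. a trailing comma) are skipped
    return {int(part) for part in raw.split(",") if part.strip()}


def is_allowed(user_id: int, allowed: set[int]) -> bool:
    """Deny by default: an empty allow-list admits nobody."""
    return user_id in allowed


allowed = load_allowed_users("123456789, 987654321")
print(is_allowed(123456789, allowed), is_allowed(42, allowed))  # True False
```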
## Tuning
| Variable | Default | Description |
|----------|---------|-------------|
| `MAX_SOURCES` | 150 | Hard cap on sources |
| `MAX_DEPTH` | 3 | Link recursion depth |
| `QUALITY_THRESHOLD` | 0.4 | Min chunk quality (0-1) |
| `REQUEST_DELAY` | 1.0 | Delay between requests, in seconds |
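
One plausible reading of the three crawl limits: `MAX_DEPTH` bounds how many link hops are followed from the initial results, `MAX_SOURCES` caps the total number of pages visited, and `REQUEST_DELAY` spaces out fetches. A minimal sketch of a breadth-first crawl honoring all three, assuming the settings arrive as environment variables with the defaults above and that link extraction is supplied as a `get_links` callback; `Settings`, `crawl`, and `get_links` are hypothetical names, not ResearchOwl's actual ExhaustiveScraper:

```python
import os
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Settings:
    max_sources: int = 150      # MAX_SOURCES
    max_depth: int = 3          # MAX_DEPTH
    request_delay: float = 1.0  # REQUEST_DELAY, in seconds

    @classmethod
    def from_env(cls) -> "Settings":
        """Read overrides from the environment, falling back to the documented defaults."""
        return cls(
            max_sources=int(os.environ.get("MAX_SOURCES", cls.max_sources)),
            max_depth=int(os.environ.get("MAX_DEPTH", cls.max_depth)),
            request_delay=float(os.environ.get("REQUEST_DELAY", cls.request_delay)),
        )


def crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    settings: Settings,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Breadth-first expansion from seed URLs, bounded by depth and total source count."""
    ordered = list(dict.fromkeys(seeds))  # de-duplicate seeds while keeping order
    seen = set(ordered)
    queue = deque((url, 0) for url in ordered)
    visited: list[str] = []
    while queue and len(visited) < settings.max_sources:
        url, depth = queue.popleft()
        visited.append(url)
        if depth < settings.max_depth:
            for link in get_links(url):  # stands in for fetching and parsing a page
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
            sleep(settings.request_delay)  # pause after each fetch
    return visited


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}
print(crawl(["a"], lambda u: graph.get(u, []), Settings(10, 2, 0.0)))  # ['a', 'b', 'c', 'd', 'e']
```

Passing `sleep` as a parameter keeps tests fast; with `max_depth=2`, page `f` (three hops from `a`) is never reached.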

**Want more thoroughness?**

- Increase `MAX_SOURCES` to 300+
- Increase `MAX_DEPTH` to 4-5
- Lower `QUALITY_THRESHOLD` to 0.3

**Want faster results?**

- Lower `MAX_SOURCES` to 50
- Set `MAX_DEPTH` to 1-2
- Raise `QUALITY_THRESHOLD` to 0.6
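
Lowering `QUALITY_THRESHOLD` keeps more chunks and raising it discards more. The architecture describes per-chunk scores on a 0-10 scale while the threshold is expressed on 0-1, so this minimal sketch assumes the score is divided by 10 before comparison; that mapping and the `Chunk`/`filter_chunks` names are assumptions, not ResearchOwl's confirmed behavior:

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # quality score on the 0-10 scale shown in the architecture diagram


def filter_chunks(chunks: list[Chunk], threshold: float = 0.4) -> list[Chunk]:
    """Keep chunks whose score, rescaled to 0-1 (assumed: score / 10), meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("QUALITY_THRESHOLD must be between 0 and 1")
    return [c for c in chunks if c.score / 10 >= threshold]


sample = [Chunk("a", 2.5), Chunk("b", 3.5), Chunk("c", 5.0), Chunk("d", 7.0)]
for t in (0.3, 0.4, 0.6):
    # 0.3 keeps b, c, d; 0.4 keeps c, d; 0.6 keeps only d
    print(t, [c.text for c in filter_chunks(sample, t)])
```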
## Notes
- Uses **qwen2.5:3b** on your existing Ollama instance for all AI tasks, so there is no API cost
- Optionally set `ANTHROPIC_API_KEY` to use Claude as a fallback for generation
- SQLite database stored at `/data/researchowl.db`
- All outputs are saved to the database and available via `/outputs`
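
Generated outputs persist in SQLite and can be retrieved with `/outputs`. A minimal sketch of such a store using Python's built-in `sqlite3` module; the `outputs` table, its columns, and the helper names are hypothetical, not ResearchOwl's actual schema. Passing `":memory:"` to `open_db` lets you try it without touching `/data`:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS outputs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    topic      TEXT NOT NULL,
    kind       TEXT NOT NULL CHECK (kind IN ('podcast', 'blog', 'report', 'thread')),
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path: str = "/data/researchowl.db") -> sqlite3.Connection:
    """Open (or create) the database and ensure the hypothetical outputs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_output(conn: sqlite3.Connection, topic: str, kind: str, body: str) -> int:
    """Insert one generated output and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO outputs (topic, kind, body) VALUES (?, ?, ?)",
            (topic, kind, body),
        )
    return cur.lastrowid


def list_outputs(conn: sqlite3.Connection) -> list[tuple]:
    """Rows suitable for an /outputs listing: id, topic, kind, timestamp."""
    return conn.execute(
        "SELECT id, topic, kind, created_at FROM outputs ORDER BY id"
    ).fetchall()


conn = open_db(":memory:")
save_output(conn, "barn owls", "blog", "Draft blog post text")
print(list_outputs(conn))
```

The `CHECK` constraint mirrors the four `/generate` formats, so an unknown kind is rejected by the database itself with `sqlite3.IntegrityError`.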