feat(seo): topic collision warning in /generate blog en

When publishing the EN draft to Ghost, the proposed title is compared
against the site's published and scheduled posts (corpus fetched via the
Admin API, so it includes the scheduled queue, which is exactly the case
of the double Kecksburg of 2026-07-10) using the vendored topic_collision.
If it collides, the Telegram notice carries a "🚨 Posible colisión de tema"
("Possible topic collision") block on all three paths (live, dryrun, bare).
It never blocks: the draft is created anyway and the human decides (merge,
retitle, or link deliberately). Only lang=en (English stopwords).
Third-party titles are sanitized of Markdown entities (the _safe_send rule).
Isolation: any failure of the check → notice without the block and a
warning in the logs; it never breaks publishing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-07-10 11:17:33 +00:00
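
The behavior described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the vendored topic_collision: the Jaccard overlap measure, the 0.5 threshold, the tiny stopword list, and the function names (`title_tokens`, `find_collisions`, `collision_block`) are all assumptions made for the example. Only the notice heading and the "never block, never break publishing" rule come from the commit message.

```python
import logging
import re

log = logging.getLogger(__name__)

# Illustrative English stopwords only; the real list is not shown in the source.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "is",
             "what", "why", "how"}


def title_tokens(title: str) -> set[str]:
    """Lowercase the title, split on non-alphanumerics, drop stopwords."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens common to both sets (0.0 when either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_collisions(proposed, existing_titles, threshold=0.5):
    """Return the existing titles whose token overlap reaches the threshold."""
    target = title_tokens(proposed)
    return [t for t in existing_titles
            if jaccard(target, title_tokens(t)) >= threshold]


def collision_block(proposed, existing_titles) -> str:
    """Build the notice block; return "" on no collision or on any failure."""
    try:
        hits = find_collisions(proposed, existing_titles)
    except Exception:
        # Isolation: a failing check must never break publishing.
        log.warning("topic collision check failed", exc_info=True)
        return ""
    if not hits:
        return ""
    return "\n".join(["🚨 Posible colisión de tema"] + [f"- {t}" for t in hits])
```

The caller would append the returned string to the Telegram notice and create the draft regardless of the result, leaving the decision to the human.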

.gitea/workflows

Vendor shared SEO rule engine + CI drift guard (no pipeline wiring yet)

2026-06-24 08:17:37 +00:00

k8s

feat: initial ResearchOwl

2026-04-27 13:49:07 +00:00

src

feat(seo): topic collision warning in /generate blog en

2026-07-10 11:17:33 +00:00

tests

feat(seo): topic collision warning in /generate blog en

2026-07-10 11:17:33 +00:00

.env.example

feat: initial ResearchOwl

2026-04-27 13:49:07 +00:00

.gitignore

chore: add .gitignore and stop tracking .pyc bytecode

2026-06-15 14:38:46 +00:00

CLAUDE.md

feat: initial ResearchOwl

2026-04-27 13:49:07 +00:00

Dockerfile

feat: phase 3, PDF export with reportlab + /export command

2026-05-04 12:57:21 +00:00

KNOWN-ISSUES.md

docs: OOM gotcha with large sources in KNOWN-ISSUES

2026-07-10 09:38:43 +00:00

main.py

feat: initial ResearchOwl

2026-04-27 13:49:07 +00:00

Makefile

Vendor shared SEO rule engine + CI drift guard (no pipeline wiring yet)

2026-06-24 08:17:37 +00:00

README.md

feat: initial ResearchOwl

2026-04-27 13:49:07 +00:00

renovate.json

Add renovate.json

2026-05-20 13:52:38 +00:00

requirements.txt

fix(deps): remove brotlicffi, which made aiohttp advertise br by default

2026-07-04 20:17:56 +00:00

README.md

🦉 ResearchOwl

Exhaustive research engine with Telegram interface.

Recursively discovers, scrapes, and processes sources from across the web, then generates podcast scripts, blog posts, reports, or social threads using Ollama.

Architecture

Telegram (/research <topic>)
    ↓
ExhaustiveScraper
    ├── DuckDuckGo (8 queries × 5 results)
    ├── Wikipedia + recursive internal links
    ├── Reddit (top posts + top comments)
    ├── YouTube (transcripts)
    ├── PDFs (public documents)
    └── Web scraping (trafilatura)
         ↓ recursive expansion (depth 1-3)
ContentProcessor (Ollama qwen2.5:3b)
    ├── Chunking (800 token chunks, 100 overlap)
    ├── Quality scoring (0-10 per chunk)
    ├── Embeddings (cosine similarity RAG)
    └── Deduplication
         ↓
OutputGenerator (Ollama)
    ├── 🎙️ Podcast script (20-30 min)
    ├── 📝 Blog post (1500-2500 words)
    ├── 📊 Research report (structured)
    └── 🐦 Social thread (15-25 tweets)
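
The chunking and similarity steps in the ContentProcessor stage can be illustrated with a small sketch. It assumes tokens are already available as a list (whitespace splitting stands in for whatever tokenizer the project actually uses) and that embeddings are plain lists of floats; `chunk_tokens` and `cosine_similarity` are hypothetical names, not taken from the repository.

```python
import math


def chunk_tokens(tokens, size=800, overlap=100):
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage: a 2000-token document yields three overlapping chunks.
tokens = ("word " * 2000).split()
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

With the defaults from the diagram, each window starts 700 tokens after the previous one, so consecutive chunks share 100 tokens of context. The same similarity function could serve both retrieval and deduplication, for example by dropping a chunk whose embedding is nearly identical to one already kept.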

Telegram Commands

Command	Description
`/research <topic>`	Start exhaustive research
`/status`	Check progress
`/finish`	Stop early, proceed to generation
`/generate podcast|blog|report|thread`	Generate output
`/sources`	List all sources found
`/cancel`	Cancel current research
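
As an illustration of how the `/generate` argument might be validated, here is a minimal sketch in plain Python with no Telegram library involved. The function name `parse_generate` and the error messages are hypothetical; the four output types come from the table above. The `@BotName` suffix is handled because Telegram clients can append it to commands in group chats.

```python
VALID_OUTPUTS = ("podcast", "blog", "report", "thread")


def parse_generate(text: str) -> str:
    """Return the requested output type from a /generate message.

    Raises ValueError when the command or its argument is not recognised.
    """
    parts = text.strip().split()
    if not parts:
        raise ValueError("empty message")
    # Strip an optional @BotName suffix, e.g. /generate@SomeBot blog
    command = parts[0].split("@", 1)[0]
    if command != "/generate":
        raise ValueError(f"not a /generate command: {parts[0]}")
    if len(parts) < 2:
        raise ValueError("usage: /generate " + "|".join(VALID_OUTPUTS))
    kind = parts[1].lower()
    if kind not in VALID_OUTPUTS:
        raise ValueError(f"unknown output type: {kind}")
    return kind
```

Any words after the output type are ignored in this sketch, so a handler that needs further arguments would have to read them from the remaining parts.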

Local Development

# 1. Clone and setup
git clone https://git.chemavx.xyz/chemavx/researchowl
cd researchowl

# 2. Create virtualenv
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# 3. Configure
cp .env.example .env
# Edit .env with your values

# 4. Run
python main.py

Deploy to k3s

# 1. Create namespace and secrets
kubectl create namespace researchowl
kubectl create secret generic researchowl-secrets \
  --from-literal=telegram-bot-token=YOUR_TOKEN \
  --from-literal=telegram-allowed-users=YOUR_USER_ID \
  -n researchowl

# 2. Copy manifests to your k8s-manifests repo
cp k8s/*.yaml /path/to/k8s-manifests/researchowl/

# 3. Apply ArgoCD app
kubectl apply -f k8s/argocd-app.yaml

# 4. Push to Gitea → Gitea Actions builds → ArgoCD deploys
git add . && git commit -m "feat: add researchowl" && git push

Tuning

Variable	Default	Description
`MAX_SOURCES`	150	Hard cap on sources
`MAX_DEPTH`	3	Link recursion depth
`QUALITY_THRESHOLD`	0.4	Min chunk quality (0-1)
`REQUEST_DELAY`	1.0s	Delay between requests
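
One way these variables could be read is sketched below, assuming they arrive as environment variables (as the `.env` step under Local Development suggests). The `Tuning` dataclass and `load_tuning` function are hypothetical names; the defaults are the ones listed in the table, and `REQUEST_DELAY` is assumed to be a bare number of seconds such as `1.0`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Tuning:
    max_sources: int = 150           # hard cap on sources
    max_depth: int = 3               # link recursion depth
    quality_threshold: float = 0.4   # minimum chunk quality
    request_delay: float = 1.0       # seconds between requests


def load_tuning(env=None) -> Tuning:
    """Build a Tuning from a mapping, falling back to the table defaults."""
    env = os.environ if env is None else env
    return Tuning(
        max_sources=int(env.get("MAX_SOURCES", 150)),
        max_depth=int(env.get("MAX_DEPTH", 3)),
        quality_threshold=float(env.get("QUALITY_THRESHOLD", 0.4)),
        request_delay=float(env.get("REQUEST_DELAY", 1.0)),
    )


# Usage: override two values, keep the remaining defaults.
fast = load_tuning({"MAX_SOURCES": "50", "MAX_DEPTH": "1"})
print(fast)
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the real environment.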

Want more thoroughness?

Increase MAX_SOURCES to 300+
Increase MAX_DEPTH to 4-5
Lower QUALITY_THRESHOLD to 0.3

Want faster results?

Lower MAX_SOURCES to 50
Set MAX_DEPTH to 1-2
Raise QUALITY_THRESHOLD to 0.6

Notes

Uses qwen2.5:3b (your existing Ollama) for all AI tasks — zero API cost
Optionally add ANTHROPIC_API_KEY for Claude fallback on generation
SQLite database stored in /data/researchowl.db
All outputs saved to DB and available via /outputs
