Commit Graph

39 Commits

Author SHA1 Message Date
ChemaVX 36984657a8 fix: Ghost 5.x, use mobiledoc + HTML card instead of the html field
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
The "html" field in Ghost Admin API v5 (Lexical editor) is read-only.
Content must be sent via mobiledoc with an HTML card, which Ghost
accepts in all v5 versions and renders without conversion. Added
diagnostic logs and validation for empty HTML.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:44:38 +00:00
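
A minimal sketch of the mobiledoc + HTML card approach described in the commit above. The helper names `build_mobiledoc` and `build_post_payload` are hypothetical and not taken from the repository; the mobiledoc layout (version 0.3.1, one `html` card referenced by a card section) follows the commonly documented Ghost format. Sending the payload to the Admin API is left out.

```python
import json


def build_mobiledoc(html: str) -> str:
    """Wrap raw HTML in a single mobiledoc HTML card and return it as a JSON string."""
    if not html or not html.strip():
        # Mirrors the empty-HTML validation mentioned in the commit message.
        raise ValueError("HTML content is empty")
    doc = {
        "version": "0.3.1",
        "atoms": [],
        "cards": [["html", {"html": html}]],
        "markups": [],
        # Section type 10 is a card section; 0 is the index into "cards".
        "sections": [[10, 0], [1, "p", []]],
    }
    return json.dumps(doc)


def build_post_payload(title: str, html: str, status: str = "draft") -> dict:
    """Build a request body for creating a post (hypothetical helper)."""
    return {
        "posts": [
            {"title": title, "mobiledoc": build_mobiledoc(html), "status": status}
        ]
    }
```

The `mobiledoc` value is passed as a JSON string rather than a nested object, and the post is created as a draft by default so that a failed render does not go live.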
ChemaVX 83eb2359be feat: Ghost CMS integration — auto-publish blog + /publish command
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 10:26:22 +00:00
ChemaVX 94d209dd8a test: automatic webhook sync
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-06 11:35:14 +00:00
ChemaVX 7a156e2af1 fix: move cost alert to /generate, where the actual spending happens
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-06 07:49:29 +00:00
ChemaVX 279475a175 feat: cost alert, warns if a session exceeds COST_ALERT_THRESHOLD
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-06 07:23:11 +00:00
ChemaVX 82e614e285 feat: source content cache, reuses URLs scraped in the last 7 days
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-06 07:05:41 +00:00
ChemaVX aa83cfacbd fix: truncate contexts in /compare to 3000 words to avoid the token limit
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-06 06:51:36 +00:00
ChemaVX e8034f3f37 feat: /compare, comparative analysis of two topics in parallel
Build & Deploy ResearchOwl / build-and-push (push) Successful in 34s
2026-05-06 06:40:31 +00:00
ChemaVX c2bb301103 feat: semantic dedup before scoring, MD5 hash + Jaccard similarity
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-05 08:58:53 +00:00
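
A rough sketch of what a two-stage dedup like the one named in the commit above could look like: an MD5 hash catches exact duplicates after whitespace and case normalization, and word-level Jaccard similarity catches near duplicates. The function names, the tokenization, and the 0.8 threshold are assumptions for illustration; the commit does not state them.

```python
import hashlib


def _tokens(text: str) -> set[str]:
    # Word-level tokens; the real tokenization is not given in the commit.
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Drop exact duplicates (MD5) and near duplicates (Jaccard >= threshold)."""
    seen_hashes: set[str] = set()
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for chunk in chunks:
        normalized = " ".join(chunk.lower().split())
        digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        toks = _tokens(chunk)
        if any(jaccard(toks, t) >= threshold for t in kept_tokens):
            continue  # near duplicate of a chunk already kept
        seen_hashes.add(digest)
        kept.append(chunk)
        kept_tokens.append(toks)
    return kept
```

The hash check is cheap and runs first; the pairwise Jaccard comparison is quadratic in the number of kept chunks, which is acceptable only for modest pool sizes.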
ChemaVX 53cf7a04a8 feat: diff mode for /watch, notifies only when there is genuinely new content
Build & Deploy ResearchOwl / build-and-push (push) Successful in 7s
2026-05-05 07:43:41 +00:00
ChemaVX f4e167f3b6 feat: SearXNG as primary engine, DDG as fallback
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-04 20:00:24 +00:00
ChemaVX ba2b366534 fix: random 3-8s DDG delay, improved logging in query generation
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-04 13:28:54 +00:00
ChemaVX 4bef9d2d17 feat: DDG queries generated by Claude instead of hardcoded templates
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-04 13:24:25 +00:00
ChemaVX 7a012c2c28 fix: _remove_duplicate_headings uses a 5-line window instead of break
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-04 13:19:08 +00:00
ChemaVX 6aaa85a1f8 fix: remove duplicate h1 titles in PDF export
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-04 13:12:32 +00:00
ChemaVX e0a42f0b91 ci: retrigger PDF build
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-04 13:02:57 +00:00
ChemaVX 4c7f5b521b feat: phase 3, PDF export with reportlab + /export command
Build & Deploy ResearchOwl / build-and-push (push) Successful in 1m2s
2026-05-04 12:57:21 +00:00
ChemaVX c33bb5337d fix: section titles in Spanish, no duplicate heading in extended
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-04 11:40:07 +00:00
ChemaVX 566f685578 ci: retrigger after DinD fix
Build & Deploy ResearchOwl / build-and-push (push) Successful in 1m42s
2026-05-04 11:13:10 +00:00
ChemaVX 8c259b2b2e ci: clean ci-builder before create to prevent stale BuildKit state
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-04 11:09:56 +00:00
ChemaVX a47d7b26ca feat: phase 2, per-section generation for report_extended, blog_extended, podcast_extended
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-04 10:58:06 +00:00
ChemaVX e5b77ad72d fix: QUALITY_THRESHOLD 0.5→0.3, more generous scoring prompt
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-04 10:35:08 +00:00
ChemaVX 0d8aee63be feat: phase 1, top_k 30→80, pool 100→300, no truncation, max_tokens 16000
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-04 10:23:19 +00:00
ChemaVX b5518ac95a feat: /watch scheduler, watched_topics + scheduler loop + /watch /unwatch /watches
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-04 07:48:05 +00:00
ChemaVX b33ae202b8 feat: per-call Claude cost tracking, api_usage table + /costs
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-03 20:06:06 +00:00
ChemaVX 65917518ce ci: retrigger build for a681627
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-03 17:14:50 +00:00
ChemaVX a681627d2e feat: TTL purge — purge_old_sessions + /purge command + startup hook
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
2026-05-03 16:56:37 +00:00
ChemaVX 7704f071d6 feat: retry+backoff in scraper, ProgressReporter in bot
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-05-03 16:40:37 +00:00
ChemaVX e66d728d68 fix: wrap YouTubeTranscriptApi in run_in_executor with 30s timeout
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
The synchronous get_transcript() call was blocking the asyncio event
loop indefinitely, freezing the entire bot (including Telegram polling).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:59:40 +00:00
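
The fix in the commit above can be sketched roughly as follows: a blocking call is moved to the default thread pool with `run_in_executor` and awaited under a timeout, so the event loop stays responsive. `run_blocking` and `slow_call` are hypothetical names; `slow_call` merely stands in for the synchronous transcript fetch, and returning `None` on timeout is an assumption.

```python
import asyncio
import time
from typing import Any, Callable, Optional


def slow_call(seconds: float) -> str:
    """Stand-in for a synchronous, potentially slow library call."""
    time.sleep(seconds)
    return "done"


async def run_blocking(
    func: Callable[..., Any], *args: Any, timeout: float = 30.0
) -> Optional[Any]:
    """Run a blocking function in a worker thread; return None on timeout."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, func, *args)
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError:
        # The worker thread cannot be killed; we only stop waiting for it.
        return None
```

Note that a timed-out call keeps occupying its worker thread until it returns on its own, so this protects the event loop but not the thread pool.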
ChemaVX 65b1739943 feat: Claude Haiku for content generation, Ollama fallback
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
Use Claude Haiku (via ANTHROPIC_API_KEY) for all output generation.
Falls back to Ollama qwen2.5:3b if no API key is set.
Also translates all user-turn prompts to Spanish for consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 09:06:06 +00:00
ChemaVX 54b3841d32 feat: generate all outputs in Spanish
Add "Escribe SIEMPRE en español" at the start of all system prompts
(podcast, blog, report, thread) so Ollama generates content in Spanish.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:40:38 +00:00
ChemaVX d0e55ddb50 feat: Claude Haiku for relevance scoring, fallback to Ollama
Build & Deploy ResearchOwl / build-and-push (push) Successful in 45s
processor.py: split _score_quality into _score_with_claude and
  _score_with_ollama; if ANTHROPIC_API_KEY is set, use Claude Haiku
  (claude-haiku-4-5) with max_tokens=10 for fast, accurate 0-10
  relevance scoring; falls back to Ollama on any error

requirements.txt: add anthropic>=0.40.0

k8s: ANTHROPIC_API_KEY added to researchowl-secrets and mounted in
  deployment; QUALITY_THRESHOLD restored to 0.4 (Claude scoring
  is accurate enough to use the threshold)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:04:12 +00:00
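
A simplified sketch of the fallback logic described in the commit above. The two scorers are injected as plain callables so that no real API call is made here; `score_quality` and `parse_score` are illustrative names, and mapping the 0-10 reply onto a 0.0-1.0 scale (to compare against QUALITY_THRESHOLD) is an assumption.

```python
import os
import re
from typing import Callable

# A scorer takes (text, topic) and returns a score in [0.0, 1.0].
Scorer = Callable[[str, str], float]


def parse_score(reply: str) -> float:
    """Turn a short model reply such as '7' into a 0.0-1.0 score."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no number in reply: {reply!r}")
    value = min(max(float(match.group()), 0.0), 10.0)  # clamp to 0-10
    return value / 10


def score_quality(
    text: str, topic: str, score_with_claude: Scorer, score_with_ollama: Scorer
) -> float:
    """Prefer the Claude scorer when an API key is set; fall back on any error."""
    if os.environ.get("ANTHROPIC_API_KEY"):
        try:
            return score_with_claude(text, topic)
        except Exception:
            pass  # fall through to the local model
    return score_with_ollama(text, topic)
```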
ChemaVX 5feff6073e fix: send new message if edit_text fails silently in /process
Build & Deploy ResearchOwl / build-and-push (push) Successful in 7s
If the bot restarted between sending the progress message and the
completion callback, edit_text may fail silently (Conflict/stale ref).
Store completion text and reply_text as fallback so the user always
sees the result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 10:53:59 +00:00
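
One way the fallback in the commit above might look, as a sketch only. `deliver_result` is a hypothetical helper; `progress_msg` and `origin_msg` are assumed to expose async `edit_text` and `reply_text` methods, as Telegram message objects do. Treating a falsy return value as a failed edit is an assumption about what "fails silently" means here.

```python
import asyncio


async def deliver_result(progress_msg, origin_msg, text: str) -> str:
    """Try to edit the progress message; otherwise send the result as a new reply."""
    try:
        result = await progress_msg.edit_text(text)
        if result:
            return "edited"
    except Exception:
        # Stale message reference, conflict after a restart, and similar errors.
        pass
    await origin_msg.reply_text(text)
    return "replied"
```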
ChemaVX c4fb33fbf5 fix: WAL mode for concurrent reads, skipped stats, anti-repetition prompts
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
database.py: enable PRAGMA journal_mode=WAL + synchronous=NORMAL so
  /status reads from concurrent connections see committed data without
  blocking behind the scraper's writes; add 'skipped' to get_session_stats

bot.py: show skipped count in fmt_progress and cmd_status; use 'or 0'
  to guard against NULL from SUM(); label active research in /status

processor.py: raise generate() temperature default to 0.7 + add
  repeat_penalty=1.15/repeat_last_n=128 to Ollama options to stop
  qwen2.5:3b from looping; scoring prompt keeps temperature=0.1

generator.py: rewrite all prompts with explicit "NEVER repeat"
  constraints and distinct-content rules per section; podcast prompt
  now asks for spoken-word style (no formal headers); reduce thread
  to 12-18 tweets (was 15-25) to fit model context; pass temperature=0.7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-28 10:15:30 +00:00
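
The database.py and bot.py changes in the commit above could be sketched like this with the standard `sqlite3` module. The `connect` and `skipped_count` helpers and the `chunks` table are hypothetical; only the two PRAGMA settings and the `or 0` guard come from the commit message.

```python
import os
import sqlite3
import tempfile


def connect(path: str) -> sqlite3.Connection:
    """Open a connection with WAL journaling so readers do not block behind writers."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def skipped_count(conn: sqlite3.Connection, session_id: str) -> int:
    """SUM() yields NULL when no rows match, so guard the result with 'or 0'."""
    row = conn.execute(
        "SELECT SUM(skipped) FROM chunks WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] or 0


# Usage against a throwaway database file (WAL needs a file, not :memory:).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = connect(db_path)
conn.execute("CREATE TABLE chunks (session_id TEXT, skipped INTEGER)")
```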
ChemaVX f7d62345b8 fix: relevance scoring per topic + URL keyword filter for child pages
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
processor.py: simplify _score_quality prompt to single axis —
  "how relevant is this text to topic X?" — instead of averaging
  relevance + density + credibility, which let off-topic but
  well-written content pass through

exhaustive.py: pre-compute topic keywords (stopword-filtered) at
  scraper init; filter child URLs (discovered during crawl, depth>0)
  to only add ones whose URL path or title contains a topic keyword;
  seed URLs (depth=0, from DDG/Wikipedia/Reddit) are always included
  since those searches are already topic-scoped

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 20:52:43 +00:00
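
A small sketch of the child-URL filter described for exhaustive.py in the commit above. The stopword list, the minimum keyword length, and the names `topic_keywords` and `should_enqueue` are assumptions for illustration; the commit only states that keywords are stopword-filtered and that seed URLs (depth 0) always pass.

```python
import re
from urllib.parse import urlparse

# Tiny illustrative stopword list; the real one is not shown in the commit.
STOPWORDS = {"the", "of", "and", "in", "for", "a", "an", "to", "de", "la", "el", "en", "y"}


def topic_keywords(topic: str) -> set[str]:
    """Pre-compute lowercase topic keywords with stopwords removed."""
    words = re.findall(r"\w+", topic.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def should_enqueue(url: str, title: str, depth: int, keywords: set[str]) -> bool:
    """Seed URLs always pass; child URLs need a keyword in the URL path or title."""
    if depth == 0:
        return True
    haystack = f"{urlparse(url).path} {title}".lower()
    return any(k in haystack for k in keywords)
```

Substring matching keeps the check cheap but can over-match (a keyword such as "art" would also hit "article"), so a stricter variant might compare whole path segments instead.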
ChemaVX 0c7176dd0b fix: add /process command, log quality filtering, improve Reddit headers
Build & Deploy ResearchOwl / build-and-push (push) Successful in 5s
- bot.py: add cmd_process handler to manually trigger chunk processing
  on the last session; register CommandHandler("process")
- processor.py: log exceptions from asyncio.gather instead of silently
  dropping them; add per-chunk quality score debug logging; warn when
  all chunks filtered by quality threshold with actionable hint;
  raise fallback score to 0.6 so Ollama failures don't filter chunks
- exhaustive.py: replace bot User-Agent with full browser UA + headers
  for REDDIT_HEADERS; downgrade Reddit 403 from warning to info since
  server IPs are routinely blocked; use content_type=None on json()
  to avoid aiohttp content-type mismatch errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 20:37:39 +00:00
ChemaVX bb8171359d fix: scraper - DDG per-query instances, Wikipedia bilingual seed, Reddit throttling
Build & Deploy ResearchOwl / build-and-push (push) Successful in 6s
2026-04-27 20:22:16 +00:00
ChemaVX 6a88b7ab10 ci: rewrite workflow with internal registry + BuildKit (polymarket-bot pattern)
Build & Deploy ResearchOwl / build-and-push (push) Successful in 1m4s
2026-04-27 14:00:05 +00:00
ChemaVX ba08536337 feat: initial ResearchOwl
Build & Deploy ResearchOwl / build (push) Failing after 1m38s
2026-04-27 13:49:07 +00:00