Add "Escribe SIEMPRE en español" at the start of all system prompts
(podcast, blog, report, thread) so Ollama generates content in Spanish.
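
A minimal sketch of the prefixing step, assuming the prompts are held in a plain dict. The `SYSTEM_PROMPTS` table and the `with_spanish_directive` helper are hypothetical names for illustration, not the project's actual code:

```python
SPANISH_DIRECTIVE = "Escribe SIEMPRE en español"

# Hypothetical prompt table; the real prompt texts are not shown in this log.
SYSTEM_PROMPTS = {
    "podcast": "You write podcast scripts.",
    "blog": "You write blog posts.",
    "report": "You write research reports.",
    "thread": "You write social media threads.",
}


def with_spanish_directive(prompt: str) -> str:
    """Put the language directive first; leave already-prefixed prompts alone."""
    if prompt.startswith(SPANISH_DIRECTIVE):
        return prompt
    return f"{SPANISH_DIRECTIVE}.\n\n{prompt}"


LOCALIZED_PROMPTS = {
    name: with_spanish_directive(text) for name, text in SYSTEM_PROMPTS.items()
}
```

Placing the directive first means the language instruction is read before any task-specific text.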
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
processor.py: split _score_quality into _score_with_claude and
_score_with_ollama; if ANTHROPIC_API_KEY is set, use Claude Haiku
(claude-haiku-4-5) with max_tokens=10 for fast, accurate 0-10
relevance scoring; fall back to Ollama on any error
requirements.txt: add anthropic>=0.40.0
k8s: add ANTHROPIC_API_KEY to researchowl-secrets and mount it in the
deployment; restore QUALITY_THRESHOLD to 0.4 (Claude scoring is
accurate enough for the threshold to be usable)
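
The primary/fallback scoring flow described above could look roughly like this. The scorers are injected as plain callables so the sketch runs without network access. `parse_score`, `score_quality`, and the normalization of the 0-10 reply to 0.0-1.0 (to match a threshold such as 0.4) are assumptions, not the actual processor.py code:

```python
import logging
import re
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# A scorer takes (text, topic) and returns the model's raw reply, e.g. "7".
Scorer = Callable[[str, str], str]


def parse_score(reply: str) -> float:
    """Take the first number in the reply, clamp to 0-10, scale to 0.0-1.0."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in reply: {reply!r}")
    return min(max(float(match.group()), 0.0), 10.0) / 10.0


def score_quality(
    text: str, topic: str, primary: Optional[Scorer], fallback: Scorer
) -> float:
    """Use the primary scorer if configured; fall back on any error."""
    if primary is not None:
        try:
            return parse_score(primary(text, topic))
        except Exception:
            logger.warning("primary scorer failed, using fallback", exc_info=True)
    return parse_score(fallback(text, topic))
```

In this arrangement `primary` would wrap the Claude call and be `None` when ANTHROPIC_API_KEY is unset, while `fallback` would wrap the Ollama call.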
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If the bot restarts between sending the progress message and the
completion callback, edit_text may fail silently (Conflict or stale
reference). Store the completion text and fall back to reply_text so
the user always sees the result.
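
A rough sketch of the edit-then-reply fallback, written against duck-typed message objects exposing async `edit_text` and `reply_text` methods. `deliver_result` and the choice of which message receives the reply are assumptions made for illustration:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def deliver_result(progress_message, origin_message, completion_text: str) -> str:
    """Try to edit the progress message; on failure, send a fresh reply."""
    try:
        await progress_message.edit_text(completion_text)
        return "edited"
    except Exception:
        logger.info("edit_text failed, replying instead", exc_info=True)
        await origin_message.reply_text(completion_text)
        return "replied"


if __name__ == "__main__":
    class StaleProgress:
        async def edit_text(self, text):
            raise RuntimeError("stale message reference")

    class Origin:
        async def reply_text(self, text):
            print("reply:", text)

    print(asyncio.run(deliver_result(StaleProgress(), Origin(), "Research complete")))
```

Keeping the completion text in a variable before attempting the edit is what allows the same text to be reused on the fallback path.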
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
database.py: enable PRAGMA journal_mode=WAL + synchronous=NORMAL so
/status reads from concurrent connections see committed data without
blocking behind the scraper's writes; add 'skipped' to get_session_stats
bot.py: show skipped count in fmt_progress and cmd_status; use 'or 0'
to guard against NULL from SUM(); label active research in /status
processor.py: raise the generate() temperature default to 0.7 and add
repeat_penalty=1.15 and repeat_last_n=128 to the Ollama options to stop
qwen2.5:3b from looping; the scoring prompt keeps temperature=0.1
generator.py: rewrite all prompts with explicit "NEVER repeat"
constraints and distinct-content rules per section; podcast prompt
now asks for spoken-word style (no formal headers); reduce thread
to 12-18 tweets (was 15-25) to fit model context; pass temperature=0.7
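
For the database.py and bot.py changes above, a minimal sketch of the two PRAGMAs and the `or 0` guard using the standard `sqlite3` module. The `chunks` table, its columns, and the helper names are hypothetical:

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open a connection with WAL journaling and relaxed fsync behaviour."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block on the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; considered safe with WAL
    return conn


def skipped_count(conn: sqlite3.Connection, session_id: int) -> int:
    """SUM() over zero rows yields NULL (None in Python), hence the 'or 0'."""
    row = conn.execute(
        "SELECT SUM(skipped) FROM chunks WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] or 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_db(os.path.join(tmp, "demo.db"))
        conn.execute("CREATE TABLE chunks (session_id INTEGER, skipped INTEGER)")
        print(skipped_count(conn, 1))  # no rows for this session yet
        conn.close()
```

WAL requires a file-backed database; an in-memory database keeps its own journal mode, which is why the demo writes to a temporary directory.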
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
processor.py: simplify the _score_quality prompt to a single axis
("how relevant is this text to topic X?") instead of averaging
relevance, density, and credibility, which let off-topic but
well-written content pass through
exhaustive.py: pre-compute topic keywords (stopword-filtered) at
scraper init; filter child URLs (discovered during crawl, depth>0)
to only add ones whose URL path or title contains a topic keyword;
seed URLs (depth=0, from DDG/Wikipedia/Reddit) are always included
since those searches are already topic-scoped
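
The keyword filter for child URLs might be sketched as follows. The stopword list, the minimum word length, and the function names are illustrative assumptions; the real exhaustive.py may tokenize and match differently:

```python
import re
from urllib.parse import urlparse

# Small illustrative stopword list; the project's actual list is not shown.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "on", "for", "to", "with"})


def topic_keywords(topic: str) -> set:
    """Lowercase the topic, split into words, drop stopwords and very short words."""
    words = re.findall(r"\w+", topic.lower())
    return {w for w in words if w not in STOPWORDS and len(w) > 2}


def should_enqueue(url: str, title: str, depth: int, keywords: set) -> bool:
    """Seed URLs (depth 0) always pass; child URLs need a keyword in path or title."""
    if depth == 0:
        return True
    haystack = f"{urlparse(url).path} {title}".lower()
    return any(keyword in haystack for keyword in keywords)


keywords = topic_keywords("the history of quantum computing")
print(should_enqueue("https://example.com/quantum-bits", "", 1, keywords))
print(should_enqueue("https://example.com/about", "About us", 1, keywords))
```

Computing the keyword set once at scraper init keeps the per-URL check to a handful of substring tests.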
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bot.py: add cmd_process handler to manually trigger chunk processing
on the last session; register CommandHandler("process")
- processor.py: log exceptions from asyncio.gather instead of silently
dropping them; add per-chunk quality score debug logging; warn with an
actionable hint when all chunks are filtered out by the quality
threshold; raise the fallback score to 0.6 so Ollama failures don't
filter out chunks
- exhaustive.py: replace the bot User-Agent in REDDIT_HEADERS with a
full browser UA and headers; downgrade Reddit 403 from warning to info
since server IPs are routinely blocked; pass content_type=None to
json() to avoid aiohttp content-type mismatch errors
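
One way to surface exceptions from `asyncio.gather` rather than dropping them is `return_exceptions=True` followed by an explicit check on each result. This is a sketch only; `process_all` and `process_chunk` are placeholder names, and the real processor.py may structure this differently:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def process_all(chunks, process_chunk):
    """Run process_chunk over all chunks concurrently, logging each failure."""
    chunks = list(chunks)
    results = await asyncio.gather(
        *(process_chunk(chunk) for chunk in chunks), return_exceptions=True
    )
    processed = []
    for chunk, result in zip(chunks, results):
        if isinstance(result, BaseException):
            logger.error("chunk %r failed: %r", chunk, result)
            continue
        processed.append(result)
    return processed
```

With `return_exceptions=True`, one failing chunk no longer hides the results of the others, and every failure shows up in the log.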
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>