- Add ANTHROPIC_API_KEY from secret for Claude Haiku relevance scoring
- Point OLLAMA_URL at the internal k8s DNS name (ollama.ollama.svc.cluster.local)
- Remove the Secret manifest (ArgoCD kept syncing it and overwriting the secret with the REPLACE_ME placeholder)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
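A minimal sketch of a startup guard for the REPLACE_ME failure mode above, assuming the service reads ANTHROPIC_API_KEY and OLLAMA_URL from its environment. The helper names cluster_dns and load_config are hypothetical, and the :11434 port in the fallback URL is an assumption (Ollama's default listening port), not something this change configures.

```python
PLACEHOLDER = "REPLACE_ME"  # value the old Secret manifest carried


def cluster_dns(service: str, namespace: str) -> str:
    """In-cluster DNS name of a Service: <service>.<namespace>.svc.cluster.local."""
    return f"{service}.{namespace}.svc.cluster.local"


def load_config(env: dict[str, str]) -> dict[str, str]:
    """Validate the API key and resolve the Ollama URL from an env mapping."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key or key == PLACEHOLDER:
        # Missing or placeholder key: fail at startup instead of at scoring time.
        raise RuntimeError("ANTHROPIC_API_KEY is unset or still REPLACE_ME")
    # Assumption: Ollama is reachable on its default port 11434 behind the Service.
    fallback = f"http://{cluster_dns('ollama', 'ollama')}:11434"
    return {"anthropic_api_key": key, "ollama_url": env.get("OLLAMA_URL", fallback)}
```

Called as load_config(dict(os.environ)) at startup, a placeholder key surfaces as a failing pod rather than as failed API calls during relevance scoring.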
- Deleted the open-webui namespace, deployment, service, ingress, and PVC
from the cluster (replaced by OpenClaw, which uses the Claude API)
- Removed the openclaw PVC and RBAC manifests, which are no longer needed
- Removed Uptime Kuma monitor for chat.chemavx.xyz
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each alert rule's summary annotation now renders a formatted Telegram
message with emoji and multiline context. The contact point passes the
pre-rendered summary through unchanged, adding "✅ Resuelto" ("Resolved")
when the alert resolves.
Also restores the == 1 filter on the Pod Failed/Unknown rule, which was
lost in a prior rebase.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
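A hypothetical Python model of the two behaviours above, assuming the Pod Failed/Unknown rule queries kube-state-metrics' kube_pod_status_phase, which exposes one series per pod and phase with value 1 for the pod's current phase and 0 for every other phase. Without == 1, every pod matches through its zero-valued Failed and Unknown series. The function names and the placement of the resolved marker are illustrative; the real logic lives in Grafana's templates.

```python
# Assumed shape of kube_pod_status_phase samples: (pod, phase, value).
Sample = tuple[str, str, int]
RESOLVED_MARK = "✅ Resuelto"


def failed_pods(samples: list[Sample], require_one: bool = True) -> list[str]:
    """Mimic kube_pod_status_phase{phase=~"Failed|Unknown"} (== 1 when require_one)."""
    hits = []
    for pod, phase, value in samples:
        # Without the == 1 filter, zero-valued series also match.
        if phase in ("Failed", "Unknown") and (value == 1 or not require_one):
            hits.append(pod)
    return hits


def render_notification(status: str, summary: str) -> str:
    """Forward the pre-rendered summary; mark resolved alerts.

    Assumption: status is "firing" or "resolved", and the marker goes on
    its own line before the summary.
    """
    if status == "resolved":
        return f"{RESOLVED_MARK}\n{summary}"
    return summary
```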
Grafana threshold expression requires a scalar input, not a raw time
series. Added explicit reduce step (type: reduce, reducer: last) as
refId B between the Prometheus query (A) and the threshold check (C).
All 4 rules updated: CrashLoopBackOff, Disco >80%, RAM >85%, Pod Failed.
The condition field on each rule changed from B to C.
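A sketch of what the A → B → C chain computes for a rule like Disco >80%, modelled in plain Python rather than in Grafana's own engine: the threshold compares single numbers, so each series must first be reduced to its last value. The mount-point labels, sample values and function names are hypothetical, and this sketch simply skips series with no samples.

```python
# Model of a Grafana-style A -> B -> C rule chain; not Grafana's engine.
Series = dict[str, list[float]]  # query A: series label -> samples, oldest first


def reduce_last(series: Series) -> dict[str, float]:
    """refId B: collapse each series to one number (reducer: last)."""
    # Series with no samples are skipped in this sketch.
    return {label: values[-1] for label, values in series.items() if values}


def threshold_above(reduced: dict[str, float], limit: float) -> dict[str, bool]:
    """refId C, the rule condition: True means the series is firing."""
    return {label: value > limit for label, value in reduced.items()}


# Hypothetical disk-usage percentages per mount point.
query_a = {"/": [78.0, 81.5, 83.2], "/data": [40.1, 40.3]}
reduced_b = reduce_last(query_a)
firing_c = threshold_above(reduced_b, 80.0)
```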
Grafana's environment-variable substitution of a numeric TELEGRAM_CHAT_ID
caused a JSON unmarshal error (a number was decoded into a string field).
The chatid is not sensitive, so it is now hardcoded directly; only bottoken
still uses ${TELEGRAM_BOT_TOKEN}.
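A simplified reproduction of the type mismatch, assuming settings are rendered by substituting ${VAR} placeholders and then decoded into string-typed fields. The chat ID 123456789 and the token value are made up, and check_string_fields only mimics the strict decoder on Grafana's side; it is not Grafana's code.

```python
import json
from string import Template

# Hypothetical values; only the field names chatid/bottoken come from the commit.
ENV = {"TELEGRAM_CHAT_ID": "123456789", "TELEGRAM_BOT_TOKEN": "token-from-secret"}


def render_settings(template: str, env: dict) -> dict:
    """Substitute ${VAR} placeholders, then decode the result as JSON."""
    return json.loads(Template(template).substitute(env))


def check_string_fields(settings: dict, fields: tuple) -> None:
    """Raise if a field that must be a string decoded as anything else."""
    for field in fields:
        value = settings.get(field)
        if not isinstance(value, str):
            raise TypeError(f"{field}: expected string, got {type(value).__name__}")


# Before: the unquoted placeholder turns the all-digit chat ID into a JSON number.
before = render_settings('{"chatid": ${TELEGRAM_CHAT_ID}, "bottoken": "${TELEGRAM_BOT_TOKEN}"}', ENV)
# After: chatid is written literally as a string; only the token is substituted.
after = render_settings('{"chatid": "123456789", "bottoken": "${TELEGRAM_BOT_TOKEN}"}', ENV)
```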
- Add hostPath volume for /var/lib/rancher/k3s/server/db (readOnly)
- Script archives state.db plus its WAL files into k3s-db_<date>.tar.gz in /data/backups/backups/
- Rotation: keeps the last 7 copies (same policy as other services)
- rclone-mega-backup picks it up automatically (it syncs all of /data/backups/backups/)
- Also tracks the CronJob manifest in git (it was previously untracked)
Note: this k3s server uses SQLite via kine (not embedded etcd), so etcd-snapshot is disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
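A minimal Python sketch of the archive and rotation steps, assuming ISO dates (YYYY-MM-DD) for <date> so that archive names sort chronologically. The helper names are hypothetical and the real CronJob script may differ; the state.db-wal and state.db-shm side files follow SQLite's naming for WAL mode.

```python
import tarfile
from datetime import date
from pathlib import Path

DB_DIR = Path("/var/lib/rancher/k3s/server/db")  # mounted readOnly via hostPath
BACKUP_DIR = Path("/data/backups/backups")
KEEP = 7


def archive_db(today: date, db_dir: Path = DB_DIR, backup_dir: Path = BACKUP_DIR) -> Path:
    """Pack state.db and any WAL/SHM side files into k3s-db_<date>.tar.gz."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"k3s-db_{today.isoformat()}.tar.gz"  # assumed date format
    with tarfile.open(target, "w:gz") as tar:
        for name in ("state.db", "state.db-wal", "state.db-shm"):
            src = db_dir / name
            if src.exists():  # side files only exist while WAL mode has data
                tar.add(src, arcname=name)
    return target


def rotate(backup_dir: Path = BACKUP_DIR, keep: int = KEEP) -> list[Path]:
    """Delete all but the newest `keep` archives; return the deleted paths."""
    archives = sorted(backup_dir.glob("k3s-db_*.tar.gz"))  # ISO dates sort by time
    doomed = archives[:-keep] if keep > 0 else archives
    for path in doomed:
        path.unlink()
    return doomed
```

Since rclone-mega-backup mirrors the whole directory, deleting old archives locally is enough to bound the remote copy as well, assuming the sync propagates deletions.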