Commit Graph

69 Commits

Author SHA1 Message Date
chemavx 859bed930f chore: remove Open WebUI and unused OpenClaw k8s resources
- Deleted open-webui namespace, deployment, service, ingress, and PVC
  from cluster (replaced by OpenClaw using Claude API)
- Removed openclaw PVC and RBAC manifests no longer needed
- Removed Uptime Kuma monitor for chat.chemavx.xyz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 13:13:38 +00:00
chemavx 4897ca3334 feat(grafana): custom emoji message templates per alert + resolve format
Each alert rule's summary annotation now renders a formatted Telegram
message with emoji and multiline context. The contact point passes the
pre-rendered summary through, adding " Resuelto" on resolution.
Also restores the == 1 filter on Pod Failed/Unknown that was lost in a prior rebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-27 07:26:01 +00:00
chemavx 4facdd8515 fix(monitoring): correct alert rule pipeline to A→B(reduce)→C(threshold)
Grafana threshold expression requires a scalar input, not a raw time
series. Added explicit reduce step (type: reduce, reducer: last) as
refId B between the Prometheus query (A) and the threshold check (C).

All 4 rules updated: CrashLoopBackOff, Disco >80%, RAM >85%, Pod Failed.
condition field changed from B → C on each rule.
2026-04-26 15:46:39 +00:00
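The A→B→C ordering can be modeled in a few lines of Python. This is a minimal sketch, not Grafana's implementation. It assumes the query result (A) arrives as a list of (timestamp, value) pairs, and the helper names reduce_last and threshold_above are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (unix timestamp, value): assumed shape of query A


def reduce_last(series: Sequence[Point]) -> Optional[float]:
    """Step B: collapse a time series into the value of its most recent point."""
    if not series:
        return None  # no data: nothing to compare
    return max(series, key=lambda point: point[0])[1]


def threshold_above(value: Optional[float], limit: float) -> bool:
    """Step C: a threshold compares one scalar against a limit."""
    return value is not None and value > limit


# Hypothetical disk-usage samples (percent) for one filesystem.
disk_usage = [(0.0, 72.0), (60.0, 79.5), (120.0, 83.2)]
firing = threshold_above(reduce_last(disk_usage), 80.0)
```

Without step B, the threshold function would receive a whole series instead of a single number, which is the kind of input mismatch this commit fixes.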
chemavx bb64cc9e62 fix(monitoring): hardcode chatid as string in Telegram contact point
Grafana env var substitution of a numeric TELEGRAM_CHAT_ID caused a
JSON unmarshal error (number into string field). chatid is not sensitive,
so it is hardcoded directly; only bottoken uses ${TELEGRAM_BOT_TOKEN}.
2026-04-26 15:40:21 +00:00
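The failure can be reproduced with a simplified model that substitutes placeholders as plain text and then parses the file. This Python sketch assumes that model and uses JSON with string.Template instead of Grafana's YAML provisioning. The chat ID and token are dummy values.

```python
import json
from string import Template


def render(fragment: str, env: dict) -> dict:
    """Substitute ${VAR} placeholders textually, then parse the result."""
    return json.loads(Template(fragment).substitute(env))


env = {"TELEGRAM_CHAT_ID": "123456789", "TELEGRAM_BOT_TOKEN": "dummy-token"}

# Unquoted placeholder: after substitution the digits parse as a number.
broken = render('{"chatid": ${TELEGRAM_CHAT_ID}}', env)

# Hardcoded string literal for chatid; only the token is still interpolated.
fixed = render('{"chatid": "123456789", "bottoken": "${TELEGRAM_BOT_TOKEN}"}', env)

print(type(broken["chatid"]).__name__)  # int
print(type(fixed["chatid"]).__name__)   # str
```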
chemavx 94c059ccb9 feat(monitoring): Grafana alerting → Telegram for homelab
- Secret grafana-telegram: bot token + chat ID (env var injection)
- ConfigMap grafana-alerting: provisioning files for contact point,
  notification policy, and 4 alert rules
  * Pod CrashLoopBackOff (for: 1m, noData: OK)
  * Disk > 80% on non-tmpfs filesystems (for: 5m)
  * RAM > 85% (for: 5m)
  * Pod Failed/Unknown (for: 3m, noData: OK)
- Deployment: TELEGRAM_* env vars from secret + alerting volume mount

Token interpolated via ${TELEGRAM_BOT_TOKEN} in provisioning YAML.
2026-04-26 15:25:07 +00:00
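The for: durations above act as a pending period: a rule fires only after its condition has held continuously for that long. Below is a minimal Python sketch of that rule. It assumes chronologically ordered (timestamp, condition_met) samples and ignores Grafana's evaluation interval and full state machine.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

Sample = Tuple[datetime, bool]  # (evaluation time, condition met?)


def should_fire(samples: Sequence[Sample], pending_for: timedelta) -> bool:
    """True if the condition has held continuously for at least pending_for,
    measured from the start of the current breach to the latest sample."""
    breach_start: Optional[datetime] = None
    for ts, met in samples:
        if not met:
            breach_start = None  # any healthy sample resets the pending timer
        elif breach_start is None:
            breach_start = ts    # first sample of a new breach
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= pending_for


t0 = datetime(2026, 4, 26, 15, 0)
# Hypothetical RAM > 85% checks, one per minute.
steady = [(t0 + timedelta(minutes=m), True) for m in range(6)]
flapping = [(t0 + timedelta(minutes=m), m != 2) for m in range(6)]
```

With for: 5m, the steady series fires after five minutes. The flapping series does not, because the healthy sample at minute 2 restarts the timer.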
Gitea CI 5df2e9746a ci: update polymarket-bot images to 39cebd3b [skip ci] 2026-04-26 15:03:41 +00:00
chemavx ef11391c80 feat(polymarket): add Telegram bot credentials to bot-secrets 2026-04-26 15:02:22 +00:00
chemavx 48a1ce80f6 backup: add k3s SQLite backup to daily CronJob
- Add hostPath volume for /var/lib/rancher/k3s/server/db (readOnly)
- Script copies state.db + WAL files → k3s-db_<date>.tar.gz in /data/backups/backups/
- Rotation: keeps last 7 copies (same policy as other services)
- rclone-mega-backup picks it up automatically (syncs full /data/backups/backups/)
- Also tracks the CronJob manifest in git (was previously untracked)

Note: k3s uses SQLite/kine (not embedded etcd). etcd-snapshot is disabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:23:18 +00:00
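The keep-last-7 rotation can be sketched in Python as follows. The commit does not show its script or the exact <date> format, so this sketch assumes a YYYY-MM-DD date that sorts lexicographically. rotate_backups is a hypothetical helper, not the CronJob's actual code.

```python
import tempfile
from pathlib import Path
from typing import List


def rotate_backups(backup_dir: Path, prefix: str = "k3s-db_", keep: int = 7) -> List[Path]:
    """Delete all but the newest `keep` archives named <prefix><date>.tar.gz.

    Assumes <date> sorts lexicographically in chronological order (YYYY-MM-DD).
    Returns the paths that were deleted, oldest first.
    """
    archives = sorted(backup_dir.glob(f"{prefix}*.tar.gz"))
    stale = archives[: max(len(archives) - keep, 0)]
    for path in stale:
        path.unlink()
    return stale


# Demo: ten dummy daily archives in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    backup_dir = Path(tmp)
    for day in range(1, 11):
        (backup_dir / f"k3s-db_2026-04-{day:02d}.tar.gz").touch()
    removed = rotate_backups(backup_dir)
    remaining = sorted(p.name for p in backup_dir.iterdir())
```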
Gitea CI 4d8c783be2 ci: update polymarket-bot images to 1f40c59e [skip ci] 2026-04-25 10:06:24 +00:00
Gitea CI f25bded509 ci: update n8n image to b6a83c68 [skip ci] 2026-04-25 10:03:27 +00:00
Gitea CI e4fab51d31 ci: update polymarket-bot images to fe242ca5 [skip ci] 2026-04-25 10:03:23 +00:00
chemavx cc8140760f argocd: configure Telegram notifications and add Application manifests
- Configure argocd-notifications-cm with Telegram service, templates and triggers
  for sync-succeeded, sync-failed, and app-degraded events
- Add application-polymarket-bot.yaml and application-n8n.yaml with notification
  subscription annotations (chat_id: 5138407666)

Note: requires kubectl patch of argocd-notifications-secret with telegram-token

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 09:56:35 +00:00
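ArgoCD Notifications subscribes an Application through annotations of the form notifications.argoproj.io/subscribe.<trigger>.<service>. The Python helper below builds those annotations. The trigger names follow the notifications catalog (on-sync-succeeded, on-sync-failed, on-health-degraded), but this is an assumption because the commit does not list the exact names it configured.

```python
from typing import Dict, Iterable

ANNOTATION_PREFIX = "notifications.argoproj.io/subscribe"


def subscription_annotations(triggers: Iterable[str], service: str, recipient: str) -> Dict[str, str]:
    """Map each trigger to a subscribe.<trigger>.<service> annotation."""
    return {f"{ANNOTATION_PREFIX}.{trigger}.{service}": recipient for trigger in triggers}


annotations = subscription_annotations(
    ["on-sync-succeeded", "on-sync-failed", "on-health-degraded"],  # assumed trigger names
    service="telegram",
    recipient="5138407666",  # chat_id from the commit message
)
```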
chemavx 8bab07201a ollama: remove GPU, pin image to 0.20.7, downsize to qwen2.5:3b
- Image: ollama/ollama:latest → ollama/ollama:0.20.7
- Remove NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES
- Remove nvidia.com/gpu: "1" from resources limits
- Reduce memory: 8/20Gi → 4/8Gi (CPU only, 3b model)
- Startup: auto-pull changed to qwen2.5:3b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 15:34:37 +00:00
chemavx 792b53dee7 openclaw: add kubectl-ro via setup-kubectl initContainer
- bitnami/kubectl initContainer copies kubectl and creates a kubectl-ro wrapper in the /opt/kube emptyDir
- kubectl-ro denies destructive verbs (delete/apply/patch/edit/exec/scale/rollout/drain/...)
- Main container mounts /opt/kube; SA token is automounted for in-cluster auth
- No manual kubeconfig: kubectl detects KUBERNETES_SERVICE_HOST/PORT automatically

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 14:33:17 +00:00
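The deny check in kubectl-ro can be sketched in Python. The wrapper itself is not shown in the commit, so this version is an assumption. It uses only the verbs listed above (the real list continues past the ellipsis). It fails closed by rejecting any argument that matches a denied verb, even when that token is a resource name.

```python
from typing import Sequence

# Only the verbs named in the commit message; the real wrapper denies more.
DENIED_VERBS = frozenset(
    {"delete", "apply", "patch", "edit", "exec", "scale", "rollout", "drain"}
)


def is_allowed(argv: Sequence[str]) -> bool:
    """Return True only if no argument matches a denied kubectl verb.

    Checking every token (not just the first positional) keeps the logic
    simple when flags such as "-n default" come before the verb.
    """
    return not any(arg in DENIED_VERBS for arg in argv)
```

A wrapper built on this check would print an error and exit non-zero when is_allowed returns False. Otherwise, it would pass the arguments unchanged to the real kubectl binary.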
chemavx e176bb9810 openclaw: update image to 2026.4.22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 14:15:49 +00:00
chemavx 74b9a31352 openclaw: fix mountPath to /home/node/.openclaw
OpenClaw's config dir is /home/node/.openclaw, not /data.
Mount the PVC at the correct path so that openclaw.json persists.
Remove OPENCLAW_DATA_DIR (it was not the config dir).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 10:45:56 +00:00
chemavx f465f190d8 openclaw: clean reinstall with Claude API and ArgoCD
- Clean manifests: namespace, rbac, pvc (5Gi local-path), deployment, service, ingress
- nodeSelector chemavx-k8 on the deployment to pin the PVC to the correct node
- Image pinned to ghcr.io/openclaw/openclaw:2026.4.12
- No initContainers or secrets in the deployment (post-startup config via exec)
- Remove artifacts: configmap-kube-root-ca.crt.yaml, serviceaccount-default.yaml, pvc-openclaw-pvc.yaml, rbac-openclaw-agent.yaml
- Add argocd/application-openclaw.yaml for GitOps management

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 10:40:08 +00:00
chemavx 8a8f33704c fix: smoke test grep to match compact JSON (no space after colon) 2026-04-23 09:22:53 +00:00
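The difference between the two JSON spellings is easy to reproduce in Python. By default, json.dumps inserts a space after each colon, while separators=(",", ":") produces the compact form that many services emit. The status field below is a hypothetical stand-in, since the commit does not say which key the smoke test checks.

```python
import json
import re

payload = {"status": "ok"}  # hypothetical health response
spaced = json.dumps(payload)                          # '{"status": "ok"}'
compact = json.dumps(payload, separators=(",", ":"))  # '{"status":"ok"}'

# A literal space after the colon matches only the spaced form...
strict = re.compile(r'"status": "ok"')
# ...while optional whitespace around the colon matches either form.
tolerant = re.compile(r'"status"\s*:\s*"ok"')
```

In a shell hook, the equivalent tolerant pattern for grep -E would be '"status"[[:space:]]*:[[:space:]]*"ok"'.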
chemavx d3c03d5462 argocd: add PostSync smoke test hooks for polymarket-bot, n8n, portfolio 2026-04-23 09:14:12 +00:00
Gitea CI 6fdad3b667 ci: update n8n image to b9ce8e20 [skip ci] 2026-04-22 20:41:56 +00:00
Gitea CI e5e0d174b0 ci: update polymarket-bot images to ffd3ee2f [skip ci] 2026-04-22 20:37:12 +00:00
chemavx 62abb6134b registry-cache: switch upstream to mirror.gcr.io (bypass Cloudflare R2 block) 2026-04-22 20:29:11 +00:00
Gitea CI e895fc6104 ci: update polymarket-bot images to adf2917c [skip ci] 2026-04-22 16:38:04 +00:00
chemavx 0bf2e746dd feat(registry-cache): add Docker Hub pull-through cache + dind mirror config
Deploy registry:2 as Docker Hub pull-through cache on chemavx-k8 (hostPort 5000,
ClusterIP 10.43.163.56:5000). Configures dind runner to use local mirror via
daemon.json to eliminate Docker Hub rate limit failures in CI/CD.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 11:35:43 +00:00
Gitea CI 25ea82f696 ci: update polymarket-bot images to 6d23e804 [skip ci] 2026-04-22 11:31:08 +00:00
Gitea CI bf7ac532de ci: update polymarket-bot images to 8a56bf77 [skip ci] 2026-04-22 11:11:47 +00:00
Gitea CI 47841eef19 ci: update polymarket-bot images to 8479a631 [skip ci] 2026-04-22 07:09:04 +00:00
Gitea CI 81b4c30fbb ci: update polymarket-bot images to 9a5be275 [skip ci] 2026-04-21 17:37:45 +00:00
Gitea CI 45495a78c7 ci: update polymarket-bot images to 9b62636a [skip ci] 2026-04-21 17:27:59 +00:00
Gitea CI 8ca403f0d3 ci: update polymarket-bot images to 46f8f4b7 [skip ci] 2026-04-21 09:50:40 +00:00
Gitea CI 986c74004b ci: update polymarket-bot images to e2fb697c [skip ci] 2026-04-21 09:41:33 +00:00
chemavx a5aac4dd83 chore(openclaw): golden config snapshot + RBAC manifest in git
- Add openclaw/golden/ with stable copies of openclaw.json, SOUL.md,
  TOOLS.md, HOMELAB.md, kubectl-ro
- Fix HOMELAB.md model roles (qwen3-es:14b=primary, llama3.1-es:8b=fallback)
- Add rbac-openclaw-agent.yaml (ClusterRole read-only + binding + SA)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 09:18:39 +00:00
chemavx 8592a09bc7 fix(ollama): use Recreate strategy to avoid RWO PVC conflict
RollingUpdate caused rollout deadlocks because the PVC (ReadWriteOnce)
cannot be mounted by two pods simultaneously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 09:03:47 +00:00
chemavx 8b7d3c0659 feat(ollama): migrate GPU from AMD ROCm to NVIDIA CUDA (RTX 3060 via OCuLink)
Switch from ollama/ollama:rocm + amd.com/gpu to standard CUDA image + nvidia.com/gpu.
RTX 3060 (GA106, 12GB) now used via NVIDIA GPU Operator on chemavx-k8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 17:13:07 +00:00
Gitea CI b805c2c9e5 ci: update polymarket-bot images to d698544f [skip ci] 2026-04-17 10:46:27 +00:00
Gitea CI b0813bad40 ci: update polymarket-bot images to 9add52ab [skip ci] 2026-04-17 10:37:42 +00:00
Gitea CI 3076129d5a ci: update polymarket-bot images to ebdcff5a [skip ci] 2026-04-17 10:29:07 +00:00
Gitea CI 0e308d890a ci: update polymarket-bot images to 0cdb0758 [skip ci] 2026-04-17 10:10:12 +00:00
Gitea CI 704301032a ci: update polymarket-bot images to 411d3462 [skip ci] 2026-04-16 15:57:45 +00:00
Gitea CI a91f6226c2 ci: update polymarket-bot images to 63d9f637 [skip ci] 2026-04-16 15:37:23 +00:00
Gitea CI 6fc882f619 ci: update polymarket-bot images to a0cbdc02 [skip ci] 2026-04-16 14:35:02 +00:00
chemavx 72be7ebac8 feat(portfolio): add ChemaVX portfolio with Polymarket live metrics 2026-04-16 10:00:16 +00:00
chemavx a0d208db63 feat(grafana): add ChemaVX Homelab Overview dashboard as ConfigMap 2026-04-16 09:54:19 +00:00
chemavx 0927658f58 chore: pin ollama and cloudflare-ddns to exact running versions
- ollama/ollama:latest → 0.20.7
- favonia/cloudflare-ddns:latest → 1.16.2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:13:13 +00:00
chemavx 22ae5d7d4b chore: pin all floating image tags to exact running versions
- vaultwarden/server:latest → 1.35.4
- redis:alpine → 8.6.2-alpine (authentik)
- homarr-labs/homarr:latest → 1.0.0
- gitea/gitea:latest → 1.25.5
- uptime-kuma:1 → 1.23.17

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 08:11:22 +00:00
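Here is a rough Python heuristic for spotting floating tags like the ones replaced in the two commits above. It is an assumption, not a standard rule. An image counts as pinned only if it uses a digest or a tag that starts with a full x.y.z version, so latest, alpine, bare major versions such as 1, and untagged references are all flagged.

```python
import re
from typing import Optional

FULL_VERSION = re.compile(r"^v?\d+\.\d+\.\d+")


def tag_of(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag."""
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port
    return last_segment.split(":", 1)[1] if ":" in last_segment else None


def is_floating(image: str) -> bool:
    """Heuristic: pinned means a digest, or a tag that starts with x.y.z."""
    if "@" in image:  # e.g. name@sha256:...
        return False
    tag = tag_of(image)
    return tag is None or not FULL_VERSION.match(tag)
```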
chemavx c1e57613ed chore(openclaw): update to 2026.4.12
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 07:58:37 +00:00
chemavx 0841d6bbe6 fix: add CreateOnly sync option to n8n-secret to prevent ArgoCD from overwriting encryption key 2026-04-14 20:30:36 +00:00
chemavx 7397c1d939 refactor: rewrite n8n manifests as clean GitOps specs, remove server-exported fields 2026-04-14 20:25:16 +00:00
chemavx 192a0bfa7a fix: delete secret-n8n-tls.yaml — kubernetes.io/tls type requires data fields, cert-manager manages this secret directly 2026-04-14 20:06:32 +00:00
chemavx f42cdee585 security: remove all REDACTED secrets from repo, add pre-commit guard
- Delete 26 secret manifests containing REDACTED placeholder values
  (15 cert-manager TLS + 11 app secrets across 8 namespaces)
- REDACTED is valid base64 that decodes to non-UTF-8 bytes — ArgoCD
  applying these manifests corrupts live secrets in the cluster
- Add .githooks/pre-commit that rejects any .yaml with REDACTED
- Add README.md documenting secret management policy and manual
  creation commands for each service
- n8n secret manifests already fixed in previous commits (618b1e8, db04fd2)
2026-04-14 20:02:51 +00:00
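Both the REDACTED problem and the guard can be checked in Python. Decoding the placeholder as base64 yields bytes that include 0xC0, which can never appear in valid UTF-8. The blocked_files helper is a hypothetical stand-in for .githooks/pre-commit, which the commit does not show. It applies the same rule: reject .yaml files that contain REDACTED.

```python
import base64
from typing import Dict, List

# "REDACTED" is 8 characters from the base64 alphabet, so it decodes cleanly.
raw = base64.b64decode("REDACTED", validate=True)

try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False  # 0xC0 is never a valid UTF-8 byte


def blocked_files(staged: Dict[str, str], marker: str = "REDACTED") -> List[str]:
    """Return the staged .yaml paths whose content contains the placeholder."""
    return sorted(
        path for path, text in staged.items()
        if path.endswith(".yaml") and marker in text
    )
```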