gh-ahmedasmar-devops-claude…/references/troubleshooting.md
2025-11-29 17:51:15 +08:00

GitOps Troubleshooting Guide (2024-2025)

Common ArgoCD Issues

1. Application OutOfSync

Symptoms: Application shows OutOfSync status

Causes: Git changes not applied, manual cluster changes

Fix:

argocd app sync my-app
argocd app diff my-app  # See differences
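The diff and sync steps can be chained so that a sync only runs when drift is actually reported. This is a minimal sketch, assuming `argocd app diff` keeps its default exit codes (0 for no diff, 1 for a diff); `sync_if_drifted` is a hypothetical helper name, not part of the ArgoCD CLI.

```shell
# Sketch: preview drift, then sync only if a diff exists.
# Assumption: `argocd app diff` exits 0 (no diff) or 1 (diff found) by default;
# any other exit code is treated as an error.
sync_if_drifted() {
  app="$1"
  rc=0
  argocd app diff "$app" || rc=$?
  case "$rc" in
    0) echo "in sync: $app" ;;
    1) argocd app sync "$app" ;;
    *) echo "diff failed for $app (exit $rc)" >&2; return "$rc" ;;
  esac
}
# Usage: sync_if_drifted my-app
```

If the drift comes from manual cluster changes, the sync reverts them, so it is worth reading the diff output before letting the helper proceed.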

2. Annotation Tracking Migration (ArgoCD 3.x)

Symptoms: Resources not tracked after upgrade to 3.x

Cause: Default changed from labels to annotations

Fix: Resources auto-migrate on next sync, or force:

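argocd app sync my-app          # a normal sync is usually enough to migrate
argocd app sync my-app --force  # last resort: a force apply may delete and recreate resources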
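To confirm that a resource was migrated, check for the `argocd.argoproj.io/tracking-id` annotation. A minimal sketch: `tracking_id` is a hypothetical helper, and the kind, name, and namespace in the usage line are placeholders.

```shell
# Sketch: report whether a resource carries the ArgoCD tracking annotation.
# `tracking_id` is a hypothetical helper name.
tracking_id() {
  kind="$1"; name="$2"; ns="$3"
  # Dots inside the annotation key must be escaped in kubectl's JSONPath.
  id=$(kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{.metadata.annotations.argocd\.argoproj\.io/tracking-id}') || return 2
  if [ -n "$id" ]; then
    echo "tracked: $id"
  else
    echo "no tracking annotation on $kind/$name"
    return 1
  fi
}
# Usage: tracking_id deployment my-app default
```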

3. Sync Fails with "Resource is Invalid"

Cause: YAML validation error, CRD mismatch

Fix:

argocd app get my-app --show-operation
kubectl apply --dry-run=client -f manifest.yaml  # Test locally
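A client-side dry run may not catch a CRD mismatch, because the API server never processes the request. The sketch below runs both modes in order; `validate_manifest` is a hypothetical helper, and the server-side step assumes you have access to the target cluster.

```shell
# Sketch: validate a manifest client-side first, then server-side.
# `validate_manifest` is a hypothetical helper name.
validate_manifest() {
  file="$1"
  # Client-side: builds the object locally and prints it without sending it.
  kubectl apply --dry-run=client -f "$file" >/dev/null ||
    { echo "client-side validation failed"; return 1; }
  # Server-side: the API server processes the request (installed CRD schemas
  # and admission checks apply) but nothing is persisted.
  kubectl apply --dry-run=server -f "$file" >/dev/null ||
    { echo "server-side validation failed"; return 2; }
  echo "valid: $file"
}
# Usage: validate_manifest manifest.yaml
```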

4. Image Pull Errors

Cause: Registry credentials, network issues

Fix:

kubectl get events -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
# Check image pull secret
kubectl get secret -n <namespace>
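In a busy namespace it helps to narrow the list to the pods that are actually failing to pull. A minimal sketch that filters the default `kubectl get pods` columns; `image_pull_failures` is a hypothetical helper name.

```shell
# Sketch: list pods whose STATUS column reports an image pull failure.
# `image_pull_failures` is a hypothetical helper name.
image_pull_failures() {
  ns="$1"
  # Columns with --no-headers: NAME READY STATUS RESTARTS AGE
  kubectl get pods -n "$ns" --no-headers |
    awk '$3 ~ /ImagePullBackOff|ErrImagePull/ { print $1 }'
}
# Usage: image_pull_failures <namespace>
# Then inspect which pull secrets a failing pod references:
#   kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.imagePullSecrets[*].name}'
```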

Common Flux Issues

1. GitRepository Not Ready

Symptoms: source not ready, no artifact

Causes: Auth failure, branch doesn't exist

Fix:

flux get sources git
flux reconcile source git <name> -n flux-system
kubectl describe gitrepository <name> -n flux-system
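The two listed causes can be told apart by reading the Ready condition message and by checking the branch directly against the remote. A minimal sketch: `ready_message` and `branch_exists` are hypothetical helpers, and the repository URL in the usage line is a placeholder.

```shell
# Sketch: print the message of a Flux object's Ready condition.
# `ready_message` is a hypothetical helper; the namespace defaults to flux-system.
ready_message() {
  kind="$1"; name="$2"; ns="${3:-flux-system}"
  kubectl get "$kind" "$name" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}

# Sketch: succeed only if the branch exists on the remote.
# --exit-code makes git exit with status 2 when no matching ref is found.
branch_exists() {
  url="$1"; branch="$2"
  git ls-remote --exit-code --heads "$url" "$branch" >/dev/null
}
# Usage:
#   ready_message gitrepository <name>
#   branch_exists https://example.com/org/repo.git main   # placeholder URL
```

If `branch_exists` succeeds from your workstation but the source still fails, the credentials referenced by the GitRepository are the more likely culprit.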

2. Kustomization Build Failed

Cause: Invalid kustomization.yaml, missing resources

Fix:

flux get kustomizations
kubectl describe kustomization <name> -n flux-system
# Test locally
kustomize build <path>
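The local build can be extended to also check that the rendered output is acceptable to `kubectl`. A minimal sketch with a hypothetical helper name, `check_kustomize_path`; note that your local `kustomize` version may differ from the one bundled in Flux's kustomize-controller, so a clean local build is a hint rather than a guarantee.

```shell
# Sketch: build an overlay locally, then dry-run the rendered manifests.
# `check_kustomize_path` is a hypothetical helper name.
check_kustomize_path() {
  path="$1"
  # Capture the render first so a build failure is not masked by the pipeline.
  rendered=$(kustomize build "$path") ||
    { echo "kustomize build failed: $path"; return 1; }
  printf '%s\n' "$rendered" | kubectl apply --dry-run=client -f - >/dev/null ||
    { echo "rendered manifests rejected by kubectl"; return 2; }
  echo "ok: $path"
}
# Usage: check_kustomize_path <path>
```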

3. HelmRelease Install Failed

Cause: Values error, chart incompatibility

Fix:

flux get helmreleases
kubectl logs -n flux-system -l app=helm-controller
# Test locally
helm template <chart> -f values.yaml

4. OCI Repository Issues (Flux 2.6+)

Cause: Registry auth, OCI artifact not found

Fix:

flux get sources oci
kubectl describe ocirepository <name> -n flux-system
# Verify artifact exists
crane digest ghcr.io/org/app:v1.0.0

SOPS Decryption Failures

Symptom: Secret not decrypted

Fix:

# Check age secret exists
kubectl get secret sops-age -n flux-system

# Test decryption locally
export SOPS_AGE_KEY_FILE=key.txt
sops -d secret.enc.yaml
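The secret can exist and still be unusable if the key is stored under the wrong entry name. Flux's SOPS guide stores the age key as `age.agekey`; the sketch below assumes kustomize-controller only picks up entries with the `.agekey` suffix. `age_key_names` and `has_agekey_entry` are hypothetical helpers.

```shell
# Sketch: list the entry names in the SOPS secret (names only, not the key material).
# Defaults match the commands above: secret sops-age in namespace flux-system.
age_key_names() {
  secret="${1:-sops-age}"; ns="${2:-flux-system}"
  kubectl get secret "$secret" -n "$ns" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Succeeds only if at least one entry name ends in .agekey
# (assumption: this is the suffix kustomize-controller expects).
has_agekey_entry() {
  age_key_names "$@" | grep -q '\.agekey$'
}
# Usage: has_agekey_entry || echo "no *.agekey entry in sops-age"
```

It is also worth confirming that the Kustomization itself enables decryption, for example with `kubectl get kustomization <name> -n flux-system -o jsonpath='{.spec.decryption}'`.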

Performance Issues

ArgoCD Slow Syncs

Cause: Too many resources, inefficient queries

Fix (ArgoCD 3.x):

  • Use default resource exclusions
  • Enable server-side diff
  • Increase controller replicas
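Server-side diff can be tried on a single application before changing controller-wide settings. A minimal sketch, assuming the Application lives in the `argocd` namespace and that the per-application `argocd.argoproj.io/compare-options` annotation is honored by your ArgoCD version; `enable_server_side_diff` is a hypothetical helper name.

```shell
# Sketch: opt one Application into server-side diff via its annotation.
# `enable_server_side_diff` is a hypothetical helper; the namespace defaults to argocd.
enable_server_side_diff() {
  app="$1"; ns="${2:-argocd}"
  kubectl patch applications.argoproj.io "$app" -n "$ns" --type merge \
    -p '{"metadata":{"annotations":{"argocd.argoproj.io/compare-options":"ServerSideDiff=true"}}}'
}
# Usage: enable_server_side_diff my-app
```

If the Application manifest is itself managed from Git, add the annotation there instead, otherwise the next sync of the parent application is likely to remove the patch.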

Flux Slow Reconciliation

Cause: Large monorepos, many sources

Fix (Flux 2.7+):

  • Enable source-watcher
  • Increase interval
  • Use OCI artifacts instead of Git

Debugging Commands

ArgoCD:

argocd app get <app> --refresh
argocd app logs <app>
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller

Flux:

flux logs --all-namespaces
flux check
flux get all
kubectl -n flux-system get events --sort-by='.lastTimestamp'
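The Flux commands above can be ordered into a short triage routine: confirm the installation is healthy, then show only what is failing. A minimal sketch, assuming your `flux` CLI supports the `--status-selector` flag on `flux get`; `flux_triage` is a hypothetical helper name.

```shell
# Sketch: check the Flux installation, then list only non-ready resources.
# `flux_triage` is a hypothetical helper name.
flux_triage() {
  flux check ||
    { echo "flux check failed: fix controller or CRD problems first"; return 1; }
  # Show only resources whose Ready condition is False, across all namespaces.
  flux get all -A --status-selector ready=false
}
# Usage: flux_triage
```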

Quick Wins

  1. Use --dry-run before applying
  2. Check controller logs first
  3. Verify RBAC permissions
  4. Test manifests locally (kubectl apply --dry-run, kustomize build)
  5. Check Git connectivity (credentials, network)
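For the RBAC check in item 3, `kubectl auth can-i` can impersonate the controller's service account. A minimal sketch: `can_controller` is a hypothetical helper, and the default service account name assumes a standard ArgoCD install in the `argocd` namespace.

```shell
# Sketch: ask the API server whether a controller's service account may perform an action.
# `can_controller` is a hypothetical helper. The default account assumes a
# standard ArgoCD install; pass a fourth argument to override it.
can_controller() {
  verb="$1"; resource="$2"; ns="$3"
  sa="${4:-system:serviceaccount:argocd:argocd-application-controller}"
  kubectl auth can-i "$verb" "$resource" -n "$ns" --as="$sa"
}
# Usage:
#   can_controller create deployments <namespace>
#   can_controller patch configmaps <namespace> system:serviceaccount:flux-system:kustomize-controller
```

Impersonation itself requires permission, so run this with an account that is allowed to use `--as`.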