Initial commit

Zhongwei Li
2025-11-29 17:51:15 +08:00
commit a91d4d5a1c
25 changed files with 4094 additions and 0 deletions

@@ -0,0 +1,11 @@
{
  "name": "gitops-workflows",
  "description": "GitOps workflows with ArgoCD and Flux CD including multi-cluster management, secrets, and progressive delivery",
  "version": "1.0.0",
  "author": {
    "name": "DevOps Claude Skills"
  },
  "skills": [
    "./"
  ]
}

README.md Normal file
@@ -0,0 +1,3 @@
# gitops-workflows
GitOps workflows with ArgoCD and Flux CD including multi-cluster management, secrets, and progressive delivery

SKILL.md Normal file
@@ -0,0 +1,568 @@
---
name: gitops-workflows
description: GitOps deployment workflows with ArgoCD and Flux. Use for setting up GitOps (ArgoCD 3.x, Flux 2.7), designing repository structures (monorepo/polyrepo, app-of-apps), multi-cluster deployments (ApplicationSets, hub-spoke), secrets management (SOPS+age, Sealed Secrets, External Secrets Operator), progressive delivery (Argo Rollouts, Flagger), troubleshooting sync issues, and OCI artifact management. Covers latest 2024-2025 features: ArgoCD annotation-based tracking, fine-grained RBAC, Flux OCI artifacts GA, image automation, source-watcher.
---
# GitOps Workflows
## Overview
This skill provides comprehensive GitOps workflows for continuous deployment to Kubernetes using ArgoCD 3.x and Flux 2.7+.
**When to use this skill**:
- Setting up GitOps from scratch (ArgoCD or Flux)
- Designing Git repository structures
- Multi-cluster deployments
- Troubleshooting sync/reconciliation issues
- Implementing secrets management
- Progressive delivery (canary, blue-green)
- Migrating between GitOps tools
---
## Core Workflow: GitOps Implementation
Use this decision tree to determine your starting point:
```
Do you have GitOps installed?
├─ NO → Need to choose a tool
│   ├─ Want UI + easy onboarding? → ArgoCD (Workflow 1)
│   └─ Want modularity + platform engineering? → Flux (Workflow 2)
└─ YES → What's your goal?
    ├─ Sync issues / troubleshooting → Workflow 7
    ├─ Multi-cluster deployment → Workflow 4
    ├─ Secrets management → Workflow 5
    ├─ Progressive delivery → Workflow 6
    ├─ Repository structure → Workflow 3
    └─ Tool comparison → Read references/argocd_vs_flux.md
```
---
## 1. Initial Setup: ArgoCD 3.x
**Latest Version**: v3.1.9 (stable), v3.2.0-rc4 (October 2025)
### Quick Install
```bash
# Create namespace
kubectl create namespace argocd
# Install ArgoCD 3.x
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.1.9/manifests/install.yaml
# Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
# Port forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Access: https://localhost:8080
```
**→ Template**: [assets/argocd/install-argocd-3.x.yaml](assets/argocd/install-argocd-3.x.yaml)
### ArgoCD 3.x New Features
**Breaking Changes**:
- ✅ Annotation-based tracking (default, was labels)
- ✅ RBAC logs enforcement enabled
- ✅ Legacy metrics removed
**New Features**:
- ✅ Fine-grained RBAC (per-resource permissions)
- ✅ Better defaults (resource exclusions for performance)
- ✅ Secrets operators endorsement
### Deploy Your First Application
```bash
# CLI method
argocd app create guestbook \
--repo https://github.com/argoproj/argocd-example-apps.git \
--path guestbook \
--dest-server https://kubernetes.default.svc \
--dest-namespace default
# Sync application
argocd app sync guestbook
```
### Health Check
```bash
# Check application health
python3 scripts/check_argocd_health.py \
--server https://argocd.example.com \
--token $ARGOCD_TOKEN
```
**→ Script**: [scripts/check_argocd_health.py](scripts/check_argocd_health.py)
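As a sketch of what such a health check does under the hood: the ArgoCD API returns Application objects whose `status.sync.status` and `status.health.status` fields drive the verdict. The helper below is a hypothetical illustration (not the shipped script) that classifies an application list:

```python
def unhealthy_apps(app_list: dict) -> list[str]:
    """Names of applications that are OutOfSync or not Healthy."""
    bad = []
    for app in app_list.get("items", []):
        status = app.get("status", {})
        sync = status.get("sync", {}).get("status", "Unknown")
        health = status.get("health", {}).get("status", "Unknown")
        if sync != "Synced" or health != "Healthy":
            bad.append(app["metadata"]["name"])
    return bad

# Sample response shape from GET /api/v1/applications
apps = {"items": [
    {"metadata": {"name": "guestbook"},
     "status": {"sync": {"status": "Synced"}, "health": {"status": "Healthy"}}},
    {"metadata": {"name": "billing"},
     "status": {"sync": {"status": "OutOfSync"}, "health": {"status": "Degraded"}}},
]}
```

Anything this returns is a candidate for `argocd app diff` and `argocd app sync` (Workflow 7).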
---
## 2. Initial Setup: Flux 2.7
**Latest Version**: v2.7.1 (October 2025)
### Quick Install
```bash
# Install Flux CLI
brew install fluxcd/tap/flux # macOS
# or: curl -s https://fluxcd.io/install.sh | sudo bash
# Check prerequisites
flux check --pre
# Bootstrap Flux (GitHub)
export GITHUB_TOKEN=<your-token>
flux bootstrap github \
--owner=<org> \
--repository=fleet-infra \
--branch=main \
--path=clusters/production \
--personal
# Enable source-watcher (Flux 2.7+)
flux install --components-extra=source-watcher
```
**→ Template**: [assets/flux/flux-bootstrap-github.sh](assets/flux/flux-bootstrap-github.sh)
### Flux 2.7 New Features
- ✅ Image automation GA
- ✅ ExternalArtifact and ArtifactGenerator APIs
- ✅ Source-watcher component for better performance
- ✅ OpenTelemetry tracing support
- ✅ CEL expressions for readiness evaluation
### Deploy Your First Application
```yaml
# gitrepository.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/stefanprodan/podinfo
  ref:
    branch: master
---
# kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 5m
  path: "./kustomize"
  prune: true
  sourceRef:
    kind: GitRepository
    name: podinfo
```
### Health Check
```bash
# Check Flux health
python3 scripts/check_flux_health.py --namespace flux-system
```
**→ Script**: [scripts/check_flux_health.py](scripts/check_flux_health.py)
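Under the hood, a Flux health check boils down to reading the standard `Ready` condition on each Kustomization (visible via `kubectl get kustomization -o json`). A minimal sketch, not the shipped script:

```python
def kustomization_ready(obj: dict) -> bool:
    """True when a Flux Kustomization reports Ready=True.

    Flux sets a standard 'Ready' condition in status.conditions; when it
    is False, the condition's message carries the failure reason.
    """
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False

# Example of a stalled Kustomization as returned by the Kubernetes API
stalled = {"status": {"conditions": [
    {"type": "Ready", "status": "False", "message": "kustomize build failed"},
]}}
```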
---
## 3. Repository Structure Design
**Decision: Monorepo or Polyrepo?**
### Monorepo Pattern
**Best for**: Startups, small teams (< 20 apps), single team
```
gitops-repo/
├── apps/
│   ├── frontend/
│   ├── backend/
│   └── database/
├── infrastructure/
│   ├── ingress/
│   ├── monitoring/
│   └── secrets/
└── clusters/
    ├── dev/
    ├── staging/
    └── production/
```
### Polyrepo Pattern
**Best for**: Large orgs, multiple teams, clear boundaries
```
infrastructure-repo/ (Platform team)
app-team-1-repo/ (Team 1)
app-team-2-repo/ (Team 2)
```
### Environment Structure (Kustomize)
```
app/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── kustomization.yaml
    │   └── replica-patch.yaml
    ├── staging/
    └── production/
```
**→ Reference**: [references/repo_patterns.md](references/repo_patterns.md)
### Validate Repository Structure
```bash
python3 scripts/validate_gitops_repo.py /path/to/repo
```
**→ Script**: [scripts/validate_gitops_repo.py](scripts/validate_gitops_repo.py)
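The kind of layout check such a validator performs can be sketched in a few lines; the overlay names below follow the conventions used in this skill, not a requirement of Kustomize itself:

```python
import tempfile
from pathlib import Path

REQUIRED_OVERLAYS = ("dev", "staging", "production")  # assumed convention

def validate_app_layout(app_dir: Path) -> list[str]:
    """Return a list of problems with an app's base/overlays layout."""
    problems = []
    if not (app_dir / "base" / "kustomization.yaml").is_file():
        problems.append("missing base/kustomization.yaml")
    for env in REQUIRED_OVERLAYS:
        if not (app_dir / "overlays" / env / "kustomization.yaml").is_file():
            problems.append(f"missing overlays/{env}/kustomization.yaml")
    return problems

# Demo: an app with a base and only a dev overlay
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app"
    for sub in ("base", "overlays/dev"):
        (app / sub).mkdir(parents=True)
        (app / sub / "kustomization.yaml").write_text("resources: []\n")
    problems = validate_app_layout(app)
```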
---
## 4. Multi-Cluster Deployments
### ArgoCD ApplicationSets
**Cluster Generator** (deploy to all clusters):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-apps
spec:
  generators:
    - cluster:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: '{{name}}-myapp'
    spec:
      source:
        repoURL: https://github.com/org/apps
        path: myapp
      destination:
        server: '{{server}}'
```
**→ Template**: [assets/applicationsets/cluster-generator.yaml](assets/applicationsets/cluster-generator.yaml)
**Performance benefit**: adopters report roughly 83% faster multi-cluster rollouts (about 30 min → 5 min) compared with syncing each cluster by hand
### Generate ApplicationSets
```bash
# Cluster generator
python3 scripts/applicationset_generator.py cluster \
--name my-apps \
--repo-url https://github.com/org/repo \
--output appset.yaml
# Matrix generator (cluster x apps)
python3 scripts/applicationset_generator.py matrix \
--name my-apps \
--cluster-label production \
--directories app1,app2,app3 \
--output appset.yaml
```
**→ Script**: [scripts/applicationset_generator.py](scripts/applicationset_generator.py)
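Conceptually, the cluster generator produces one parameter set per matching cluster and substitutes it into the template. A simplified sketch of that substitution (ArgoCD's real templating, especially with `goTemplate: true`, is richer):

```python
import re

def render(template: str, params: dict) -> str:
    """Substitute {{name}}-style placeholders from a flat parameter dict."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), template)

# What the cluster generator would emit per matching cluster (sample values)
clusters = [
    {"name": "prod-east", "server": "https://10.0.0.1:6443"},
    {"name": "prod-west", "server": "https://10.0.0.2:6443"},
]
app_names = [render("{{name}}-myapp", c) for c in clusters]
```

Each rendered parameter set becomes one Application, which is why adding a cluster secret with the right labels is all it takes to fan out a deployment.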
### Flux Multi-Cluster
**Hub-and-Spoke**: Management cluster manages all clusters
```bash
# Bootstrap each cluster
flux bootstrap github --context prod-cluster --path clusters/production
flux bootstrap github --context staging-cluster --path clusters/staging
```
**→ Reference**: [references/multi_cluster.md](references/multi_cluster.md)
---
## 5. Secrets Management
**Never commit plain secrets to Git.** Choose a solution:
### Decision Matrix
| Solution | Complexity | Best For | 2025 Trend |
|----------|-----------|----------|------------|
| **SOPS + age** | Medium | Git-centric, flexible | ↗️ Preferred |
| **External Secrets Operator** | Medium | Cloud-native, dynamic | ↗️ Growing |
| **Sealed Secrets** | Low | Simple, GitOps-first | → Stable |
### Option 1: SOPS + age (Recommended 2025)
**Setup**:
```bash
# Generate age key
age-keygen -o key.txt
# Public key: age1...
# Create .sops.yaml
cat <<EOF > .sops.yaml
creation_rules:
  - path_regex: .*\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
EOF
# Encrypt secret
kubectl create secret generic my-secret --dry-run=client -o yaml \
--from-literal=password=supersecret > secret.yaml
sops -e secret.yaml > secret.enc.yaml
# Commit encrypted version
git add secret.enc.yaml .sops.yaml
```
**→ Template**: [assets/secrets/sops-age-config.yaml](assets/secrets/sops-age-config.yaml)
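The essence of such an audit is checking that every `data`/`stringData` value actually looks SOPS-encrypted. A minimal sketch (not the shipped script):

```python
def find_unencrypted_fields(secret: dict) -> list[str]:
    """Flag data/stringData values that do not look SOPS-encrypted.

    SOPS rewrites matched values to strings starting with 'ENC['
    (e.g. 'ENC[AES256_GCM,data:...]'), so anything else is a potential leak.
    """
    leaks = []
    for section in ("data", "stringData"):
        for key, value in (secret.get(section) or {}).items():
            if not str(value).startswith("ENC["):
                leaks.append(f"{section}.{key}")
    return leaks

encrypted = {"stringData": {"password": "ENC[AES256_GCM,data:abc,tag:xyz]"}}
leaked = {"stringData": {"password": "supersecret"}}
```

Running a check like this in CI catches the common mistake of committing `secret.yaml` instead of `secret.enc.yaml`.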
### Option 2: External Secrets Operator (v0.20+)
**Best for**: Cloud-native apps, dynamic secrets, automatic rotation
### Option 3: Sealed Secrets
**Best for**: Simple setup, static secrets, no external dependencies
**→ Reference**: [references/secret_management.md](references/secret_management.md)
### Audit Secrets
```bash
python3 scripts/secret_audit.py /path/to/repo
```
**→ Script**: [scripts/secret_audit.py](scripts/secret_audit.py)
---
## 6. Progressive Delivery
### Argo Rollouts (with ArgoCD)
**Canary Deployment**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 2m}
        - setWeight: 50
        - pause: {duration: 2m}
        - setWeight: 100
```
**→ Template**: [assets/progressive-delivery/argo-rollouts-canary.yaml](assets/progressive-delivery/argo-rollouts-canary.yaml)
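One practical question for a steps list like this is how long a full rollout takes at minimum. A small sketch that sums the declared pauses (analysis run time and indefinite manual pauses are excluded; `Nm` duration strings assumed):

```python
def min_rollout_minutes(steps: list[dict]) -> int:
    """Sum explicit pause durations (in minutes) across canary steps."""
    total = 0
    for step in steps:
        duration = (step.get("pause") or {}).get("duration")
        if duration:  # a bare 'pause: {}' waits for manual promotion
            total += int(duration.rstrip("m"))
    return total

# The steps from the Rollout above
steps = [
    {"setWeight": 20}, {"pause": {"duration": "2m"}},
    {"setWeight": 50}, {"pause": {"duration": "2m"}},
    {"setWeight": 100},
]
```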
### Flagger (with Flux)
**Canary with Metrics Analysis**:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
```
**→ Reference**: [references/progressive_delivery.md](references/progressive_delivery.md)
---
## 7. Troubleshooting
### Common Issues
**ArgoCD OutOfSync**:
```bash
# Check differences
argocd app diff my-app
# Sync application
argocd app sync my-app
# Check health
python3 scripts/check_argocd_health.py --server https://argocd.example.com --token $TOKEN
```
**Flux Not Reconciling**:
```bash
# Check resources
flux get all
# Check specific kustomization
flux get kustomizations
kubectl describe kustomization my-app -n flux-system
# Force reconcile
flux reconcile kustomization my-app
```
**Detect Drift**:
```bash
# ArgoCD drift detection
python3 scripts/sync_drift_detector.py --argocd --app my-app
# Flux drift detection
python3 scripts/sync_drift_detector.py --flux
```
**→ Script**: [scripts/sync_drift_detector.py](scripts/sync_drift_detector.py)
**→ Reference**: [references/troubleshooting.md](references/troubleshooting.md)
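At its core, drift detection is a recursive comparison of the manifest in Git against the live object (after server-populated fields are pruned). A minimal sketch of that comparison, not the shipped script:

```python
def diff_specs(desired: dict, live: dict, prefix: str = "") -> list[str]:
    """Report dotted paths where desired (Git) and live (cluster) disagree."""
    drift = []
    for key in sorted(set(desired) | set(live)):
        path = f"{prefix}{key}"
        d, l = desired.get(key), live.get(key)
        if isinstance(d, dict) and isinstance(l, dict):
            drift += diff_specs(d, l, path + ".")
        elif d != l:
            drift.append(f"{path}: git={d!r} cluster={l!r}")
    return drift

# Example: someone ran 'kubectl scale' by hand
desired = {"spec": {"replicas": 3, "image": "app:v1"}}
live = {"spec": {"replicas": 5, "image": "app:v1"}}
```

Self-heal (`selfHeal: true` in ArgoCD, `prune: true` plus reconciliation in Flux) exists precisely to revert this class of out-of-band change.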
---
## 8. OCI Artifacts (Flux 2.6+)
**GA Status**: Flux v2.6 (June 2025)
### Use OCIRepository for Helm Charts
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: podinfo-oci
spec:
  interval: 5m
  url: oci://ghcr.io/stefanprodan/charts/podinfo
  ref:
    semver: ">=6.0.0"
  verify:
    provider: cosign
```
**→ Template**: [assets/flux/oci-helmrelease.yaml](assets/flux/oci-helmrelease.yaml)
### Verify OCI Artifacts
```bash
python3 scripts/oci_artifact_checker.py \
--verify ghcr.io/org/app:v1.0.0 \
--provider cosign
```
**→ Script**: [scripts/oci_artifact_checker.py](scripts/oci_artifact_checker.py)
**→ Reference**: [references/oci_artifacts.md](references/oci_artifacts.md)
---
## Quick Reference Commands
### ArgoCD
```bash
# List applications
argocd app list
# Get application details
argocd app get <app-name>
# Sync application
argocd app sync <app-name>
# View diff
argocd app diff <app-name>
# Delete application
argocd app delete <app-name>
```
### Flux
```bash
# Check Flux status
flux check
# Get all resources
flux get all
# Reconcile immediately
flux reconcile source git <name>
flux reconcile kustomization <name>
# Suspend/Resume
flux suspend kustomization <name>
flux resume kustomization <name>
# Export resources
flux export source git --all > sources.yaml
```
---
## Resources Summary
### Scripts (automation and diagnostics)
- `check_argocd_health.py` - Diagnose ArgoCD sync issues (3.x compatible)
- `check_flux_health.py` - Diagnose Flux reconciliation issues (2.7+ compatible)
- `validate_gitops_repo.py` - Validate repository structure and manifests
- `sync_drift_detector.py` - Detect drift between Git and cluster
- `secret_audit.py` - Audit secrets management (SOPS, Sealed Secrets, ESO)
- `applicationset_generator.py` - Generate ApplicationSet manifests
- `promotion_validator.py` - Validate environment promotion workflows
- `oci_artifact_checker.py` - Validate Flux OCI artifacts and verify signatures
### References (deep-dive documentation)
- `argocd_vs_flux.md` - Comprehensive comparison (2024-2025), decision matrix
- `repo_patterns.md` - Monorepo vs polyrepo, app-of-apps, environment structures
- `secret_management.md` - SOPS+age, Sealed Secrets, ESO (2025 best practices)
- `progressive_delivery.md` - Argo Rollouts, Flagger, canary/blue-green patterns
- `multi_cluster.md` - ApplicationSets, Flux multi-tenancy, hub-spoke patterns
- `troubleshooting.md` - Common sync issues, debugging commands
- `best_practices.md` - CNCF GitOps principles, security, 2025 recommendations
- `oci_artifacts.md` - Flux OCI artifacts (GA v2.6), signature verification
### Templates (production-ready configurations)
- `argocd/install-argocd-3.x.yaml` - ArgoCD 3.x installation with best practices
- `applicationsets/cluster-generator.yaml` - Multi-cluster ApplicationSet example
- `flux/flux-bootstrap-github.sh` - Flux 2.7 bootstrap script
- `flux/oci-helmrelease.yaml` - OCI artifact + HelmRelease example
- `secrets/sops-age-config.yaml` - SOPS + age configuration
- `progressive-delivery/argo-rollouts-canary.yaml` - Canary deployment with analysis

@@ -0,0 +1,32 @@
# ApplicationSet with Cluster Generator
# Automatically deploys to all clusters matching label selector
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-apps
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - cluster:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      name: '{{.name}}-guestbook'
    spec:
      project: default
      source:
        repoURL: https://github.com/argoproj/argocd-example-apps
        targetRevision: HEAD
        path: guestbook
      destination:
        server: '{{.server}}'
        namespace: guestbook
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

@@ -0,0 +1,92 @@
# ArgoCD 3.x Installation with best practices
# Updated for ArgoCD v3.1+
apiVersion: v1
kind: Namespace
metadata:
  name: argocd
---
# Install ArgoCD using official manifests:
# kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.1.9/manifests/install.yaml
# Configuration with ArgoCD 3.x best practices
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Compress API responses to reduce bandwidth
  server.enable.gzip: "true"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Annotation-based tracking (default in ArgoCD 3.x)
  application.resourceTrackingMethod: annotation
  # Exclude high-churn resources for performance
  resource.exclusions: |
    - apiGroups:
        - ""
      kinds:
        - Endpoints
        - EndpointSlice
      clusters:
        - "*"
    - apiGroups:
        - "*"
      kinds:
        - Lease
      clusters:
        - "*"
---
# Expose ArgoCD Server (choose one method)
# Option 1: LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-lb
  namespace: argocd
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app.kubernetes.io/name: argocd-server
# Option 2: Ingress (recommended)
# ---
# apiVersion: networking.k8s.io/v1
# kind: Ingress
# metadata:
#   name: argocd-server-ingress
#   namespace: argocd
#   annotations:
#     cert-manager.io/cluster-issuer: letsencrypt-prod
#     nginx.ingress.kubernetes.io/ssl-passthrough: "true"
#     nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
# spec:
#   ingressClassName: nginx
#   rules:
#     - host: argocd.example.com
#       http:
#         paths:
#           - path: /
#             pathType: Prefix
#             backend:
#               service:
#                 name: argocd-server
#                 port:
#                   number: 443
#   tls:
#     - hosts:
#         - argocd.example.com
#       secretName: argocd-server-tls

@@ -0,0 +1,49 @@
#!/bin/bash
# Flux 2.7+ Bootstrap Script for GitHub
set -e
# Configuration
GITHUB_USER="${GITHUB_USER:-your-org}"
GITHUB_REPO="${GITHUB_REPO:-fleet-infra}"
GITHUB_TOKEN="${GITHUB_TOKEN:-}"
CLUSTER_NAME="${CLUSTER_NAME:-production}"
CLUSTER_PATH="clusters/${CLUSTER_NAME}"
# Check prerequisites
command -v flux >/dev/null 2>&1 || { echo "flux CLI required"; exit 1; }
command -v kubectl >/dev/null 2>&1 || { echo "kubectl required"; exit 1; }
# Check GitHub token
if [ -z "$GITHUB_TOKEN" ]; then
echo "Error: GITHUB_TOKEN environment variable not set"
exit 1
fi
# Bootstrap Flux
echo "🚀 Bootstrapping Flux for cluster: $CLUSTER_NAME"
flux bootstrap github \
--owner="$GITHUB_USER" \
--repository="$GITHUB_REPO" \
--branch=main \
--path="$CLUSTER_PATH" \
--personal \
--token-auth
# Enable source-watcher (Flux 2.7+)
echo "✨ Enabling source-watcher component..."
flux install --components-extra=source-watcher
# Verify installation
echo "✅ Verifying Flux installation..."
flux check
echo "
✅ Flux bootstrapped successfully!
Next steps:
1. Add your applications to ${CLUSTER_PATH}/apps/
2. Commit and push to trigger Flux reconciliation
3. Monitor with: flux get all
"

@@ -0,0 +1,38 @@
# Flux OCI Repository + HelmRelease (Flux 2.6+)
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
  name: podinfo-oci
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/stefanprodan/charts/podinfo
  ref:
    semver: ">=6.0.0"
  verify:
    provider: cosign
    secretRef:
      name: cosign-public-key
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 10m
  # chartRef (not chart.spec.sourceRef) is how a HelmRelease consumes an OCIRepository
  chartRef:
    kind: OCIRepository
    name: podinfo-oci
    namespace: flux-system
  values:
    replicaCount: 2
    resources:
      limits:
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 64Mi

@@ -0,0 +1,62 @@
# Argo Rollouts Canary Deployment with Analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 2m}
        - setWeight: 40
        - pause: {duration: 2m}
        - setWeight: 60
        - pause: {duration: 2m}
        - setWeight: 80
        - pause: {duration: 2m}
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: my-app
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myapp:v2.0.0
          ports:
            - containerPort: 8080
---
# Analysis Template using Prometheus
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(
              http_requests_total{job="{{args.service-name}}",status!~"5.."}[2m]
            )) /
            sum(rate(
              http_requests_total{job="{{args.service-name}}"}[2m]
            ))

plugin.lock.json Normal file
@@ -0,0 +1,133 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:ahmedasmar/devops-claude-skills:gitops-workflows",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "9d9aba99c48eab607e17775890549925e3cf492c",
"treeHash": "d0ab5ad5352a26f2e20ecbe92fe6a75ea200b094e9bdd53fbdd7314b921ea051",
"generatedAt": "2025-11-28T10:13:03.655231Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "gitops-workflows",
"description": "GitOps workflows with ArgoCD and Flux CD including multi-cluster management, secrets, and progressive delivery",
"version": "1.0.0"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "867bba9b73becced4d98602771881b393eed69f879e82e3db15e03caa1495553"
},
{
"path": "SKILL.md",
"sha256": "ca26dd3567959c1ef4fe83111e21b53cdd7681355a62e38122acb0c517322ccb"
},
{
"path": "references/progressive_delivery.md",
"sha256": "c9873ae80def5528aa91cf977047f2d4e743f22a03ab1f07855aaf6c620a7ae2"
},
{
"path": "references/troubleshooting.md",
"sha256": "00e5fc7d8752b25b8a51a358d4d7510e20ee9b698dd511545980142dfd5c6510"
},
{
"path": "references/argocd_vs_flux.md",
"sha256": "86f5977b45f7f2a38292d937153c91c1806f70a05ee5f2593ffcebbe6530fbc6"
},
{
"path": "references/multi_cluster.md",
"sha256": "018343bfb8c3b6f061f0a95aea8fced73c3e0b57ecc1892b7481525f2e3f3c2c"
},
{
"path": "references/oci_artifacts.md",
"sha256": "b2545bd256a61012b87407ecbcf7c3452c2e28e23b6c2db664f8d8b33a33a5c1"
},
{
"path": "references/repo_patterns.md",
"sha256": "b9fdf169b26f7f225d2ca89422f4ae6475f2413b31a12237d9a551a8de00eeee"
},
{
"path": "references/secret_management.md",
"sha256": "6c4dd5098220438397fc05032bcc506982b196be76a38d7c931adef317009a00"
},
{
"path": "references/best_practices.md",
"sha256": "136045d07ac582349ac6d211823855255c9a8364ba8fcd892579dc6cdfbf25e0"
},
{
"path": "scripts/applicationset_generator.py",
"sha256": "0f179e4c990c95decc0e721558cb6283f975abf353999ef2d6c68458262c6a4c"
},
{
"path": "scripts/promotion_validator.py",
"sha256": "834e8ffab247627717bbf289b63f35b9c49776dbe0478bd5eb2537c0be7a9475"
},
{
"path": "scripts/sync_drift_detector.py",
"sha256": "d7e0abad75ec1eb406edd643918b0d7a99cf0e457b9349ed517a79560a08d6ab"
},
{
"path": "scripts/secret_audit.py",
"sha256": "b0fd6209a363724c8982319363e51a3a7e3256d6120acd1c56ef23e697d5b539"
},
{
"path": "scripts/check_argocd_health.py",
"sha256": "6ed7bffeedf5f862d945dc2c50facd2a07e49170bb5314fafe9d39fdcc84f2f2"
},
{
"path": "scripts/oci_artifact_checker.py",
"sha256": "109d02231138a5ca09f4304a862b9268742628b902b6ca16e826ebeae958b949"
},
{
"path": "scripts/validate_gitops_repo.py",
"sha256": "bb81659411d59bdc0fe028e1089ce69ca20fbb9f695f40a8f910bdebdc71d39a"
},
{
"path": "scripts/check_flux_health.py",
"sha256": "a9d3acc40aee91c12049486f7807c6e0b6a0a78cc1ca68187184d440447fe2fa"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "d4e4e6ab1b5616e5c05550abac625b7af113cf32e12581709be24c4481af4ccb"
},
{
"path": "assets/progressive-delivery/argo-rollouts-canary.yaml",
"sha256": "c94189fb722e9da934c56dc7b3168737cb9b2aa3e8750a4bed236337ad339e4e"
},
{
"path": "assets/flux/flux-bootstrap-github.sh",
"sha256": "199535c78799bc13865d079c9d179351fe2991d9487e30d6e3a6ff692e58606f"
},
{
"path": "assets/flux/oci-helmrelease.yaml",
"sha256": "7959cfed54faffd8346927a01461dcc1296a61bc4f6c543ba46089cc8161cc34"
},
{
"path": "assets/argocd/install-argocd-3.x.yaml",
"sha256": "ab1f6555a685d0070858378071de0749d1bcc3a821fbecf6f4f353a05862f27c"
},
{
"path": "assets/secrets/sops-age-config.yaml",
"sha256": "e20729d61388ba4a3746e105801e0944d94ed7d5dd5e58f7d5bb561831c9ed08"
},
{
"path": "assets/applicationsets/cluster-generator.yaml",
"sha256": "767f16f17c1b60dc802b9e6e140737c6ca5cf56a8769012f1d5605b3cb43041a"
}
],
"dirSha256": "d0ab5ad5352a26f2e20ecbe92fe6a75ea200b094e9bdd53fbdd7314b921ea051"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

@@ -0,0 +1,243 @@
# ArgoCD vs Flux: Comprehensive Comparison (2024-2025)
## Current Versions (October 2025)
- **ArgoCD**: v3.1.9 (stable), v3.2.0-rc4 (release candidate)
- **Flux**: v2.7.1 (latest)
## Quick Decision Matrix
| Criteria | Choose ArgoCD | Choose Flux |
|----------|---------------|-------------|
| **Primary Focus** | Developer experience, UI | Platform engineering, modularity |
| **Team Size** | Medium-large teams | Small teams, platform engineers |
| **UI Required** | Yes | No (CLI-driven) |
| **Complexity** | Simpler onboarding | Steeper learning curve |
| **Customization** | Less modular | Highly modular |
| **Multi-tenancy** | Built-in with Projects | Manual configuration |
| **Best For** | Application teams, demos | Infrastructure teams, advanced users |
## Key Differences
### Architecture
**ArgoCD**:
- Monolithic design with integrated components
- Web UI, API server, application controller in one system
- Centralized control plane
**Flux**:
- Modular microservices architecture
- Separate controllers: source, kustomize, helm, notification, image-automation
- Distributed reconciliation
### User Experience
**ArgoCD**:
- Rich web UI for visualization and management
- GUI dashboard for deployment, syncing, troubleshooting
- Easier onboarding for developers
- Better for demos and presentations
**Flux**:
- CLI-driven (flux CLI + kubectl)
- No built-in UI (can integrate with Weave GitOps UI separately)
- Requires comfort with command-line tools
- Steeper learning curve
### Application Management
**ArgoCD 3.x**:
- Application and ApplicationSet CRDs
- App-of-apps pattern for organizing applications
- Fine-grained RBAC (new in v3.0)
- Annotation-based tracking (default in v3.0, changed from labels)
**Flux 2.7**:
- Kustomization and HelmRelease CRDs
- No built-in grouping mechanism
- RBAC through Kubernetes RBAC
- Label-based tracking
### Multi-Cluster Support
**ArgoCD ApplicationSets**:
- Cluster generator for auto-discovery
- Matrix generator for cluster x app combinations
- Hub-and-spoke pattern (one ArgoCD manages multiple clusters)
- Reported ~83% faster deployments vs manual management (30min → 5min)
**Flux Multi-Tenancy**:
- Manual cluster configuration
- Separate Flux installations per cluster or shared
- More flexible but requires more setup
- No built-in cluster generator
### Secrets Management
Both support:
- Sealed Secrets
- External Secrets Operator
- SOPS
**ArgoCD 3.0 Change**:
- Now explicitly endorses secrets operators
- Cautions against config management plugins for secrets
- Better integration with ESO
**Flux**:
- Native SOPS integration with age encryption
- Decryption happens in-cluster
- .sops.yaml configuration support
### Progressive Delivery
**ArgoCD + Argo Rollouts**:
- Separate project but tight integration
- Rich UI for visualizing rollouts
- Supports canary, blue-green, A/B testing
- Metric analysis with Prometheus, Datadog, etc.
**Flux + Flagger**:
- Flagger as companion project
- CLI-driven
- Supports canary, blue-green, A/B testing
- Metric analysis with Prometheus, Datadog, etc.
## Feature Comparison
| Feature | ArgoCD 3.x | Flux 2.7 |
|---------|-----------|----------|
| **Web UI** | ✅ Built-in | ❌ (3rd party available) |
| **CLI** | ✅ argocd | ✅ flux |
| **Git Sources** | ✅ | ✅ |
| **OCI Artifacts** | ⚠️ Helm OCI charts only | ✅ (GA in v2.6) |
| **Helm Support** | ✅ | ✅ |
| **Kustomize** | ✅ (v5.7.0) | ✅ (v5.7.0) |
| **Multi-tenancy** | ✅ Projects | Manual |
| **Image Automation** | ⚠️ Via Image Updater | ✅ GA in v2.7 |
| **Notifications** | ✅ | ✅ |
| **RBAC** | ✅ Fine-grained (v3.0) | Kubernetes RBAC |
| **Progressive Delivery** | Argo Rollouts | Flagger |
| **Signature Verification** | ⚠️ Limited | ✅ cosign/notation |
## Performance & Scale
**ArgoCD**:
- Can manage 1000+ applications per instance
- Better defaults in v3.0 (resource exclusions reduce API load)
- ApplicationSets reduce management overhead
**Flux**:
- Lighter resource footprint
- Better for large-scale monorepos
- Source-watcher (v2.7) improves reconciliation efficiency
## Community & Support
**ArgoCD**:
- CNCF Graduated project
- Large community, many contributors
- Akuity offers commercial support
- Annual ArgoCon conference
**Flux**:
- CNCF Graduated project
- Weaveworks shutdown (Feb 2024) but project remains strong
- Grafana Labs offers Grafana Cloud integration
- GitOpsCon events
## Version 3.0 Changes (ArgoCD)
**Breaking Changes**:
- Annotation-based tracking (default, was labels)
- RBAC logs enforcement (no longer optional)
- Removed legacy metrics (argocd_app_sync_status, etc.)
**New Features**:
- Fine-grained RBAC (per-resource permissions)
- Better defaults (resource exclusions for high-churn objects)
- Secrets operators endorsement
## Version 2.7 Changes (Flux)
**New Features**:
- Image automation GA
- ExternalArtifact and ArtifactGenerator APIs
- Source-watcher component
- OpenTelemetry tracing support
- CEL expressions for readiness
## Migration Considerations
### From ArgoCD → Flux
**Pros**:
- Lower resource consumption
- More modular architecture
- Better OCI support
- Native SOPS integration
**Cons**:
- Lose web UI
- More complex setup
- Manual multi-tenancy
**Effort**: Medium-High (2-4 weeks for large deployment)
### From Flux → ArgoCD
**Pros**:
- Gain web UI
- Easier multi-tenancy
- ApplicationSets for multi-cluster
- Better for teams new to GitOps
**Cons**:
- Higher resource consumption
- Less modular
- Limited OCI support
**Effort**: Medium (1-3 weeks)
## Recommendations by Use Case
### Choose ArgoCD if:
- ✅ Developer teams need visibility (UI required)
- ✅ Managing dozens of applications across teams
- ✅ Multi-tenancy with Projects model
- ✅ Fast onboarding is priority
- ✅ Need built-in RBAC with fine-grained control
### Choose Flux if:
- ✅ Platform engineering focus
- ✅ Infrastructure-as-code emphasis
- ✅ Using OCI artifacts extensively
- ✅ Want modular, composable architecture
- ✅ Team comfortable with CLI tools
- ✅ SOPS+age encryption requirement
### Use Both if:
- Different teams have different needs
- ArgoCD for app teams, Flux for infrastructure
- Separate concerns (apps vs infrastructure)
## Cost Considerations
**ArgoCD**:
- Higher memory/CPU usage (~500MB-1GB per instance)
- Commercial support available (Akuity)
**Flux**:
- Lower resource footprint (~200-400MB total)
- Grafana Cloud integration available
## Conclusion
**2024-2025 Recommendation**:
- **For most organizations**: Start with ArgoCD for ease of use
- **For platform teams**: Flux offers more control and modularity
- **For enterprises**: Consider ArgoCD for UI + Flux for infrastructure
- Both are production-ready CNCF Graduated projects
The choice depends more on team preferences and workflows than technical capability.

@@ -0,0 +1,160 @@
# GitOps Best Practices (2024-2025)
## CNCF GitOps Principles (OpenGitOps v1.0)
1. **Declarative**: System desired state expressed declaratively
2. **Versioned**: State stored in version control (Git)
3. **Automated**: Changes automatically applied
4. **Continuous Reconciliation**: Software agents ensure desired state
5. **Auditable**: All changes tracked in Git history
## Repository Organization
**DO**:
- Separate infrastructure from applications
- Use clear directory structure (apps/, infrastructure/, clusters/)
- Implement environment promotion (dev → staging → prod)
- Use Kustomize overlays for environment differences
**DON'T**:
- Commit secrets to Git (use SOPS/Sealed Secrets/ESO)
- Use `:latest` image tags (pin to specific versions)
- Make manual cluster changes (everything through Git)
- Skip testing in lower environments
## Security Best Practices
1. **Secrets**: Never plain text, use encryption or external stores
2. **RBAC**: Least privilege for GitOps controllers
3. **Image Security**: Pin to digests, scan for vulnerabilities
4. **Network Policies**: Restrict controller traffic
5. **Audit**: Enable audit logging
## ArgoCD 3.x Specific
**Fine-Grained RBAC** (new in 3.0):
```yaml
p, role:dev, applications, *, dev/*, allow
p, role:dev, applications/resources, *, dev/*/Deployment/*, allow
```
**Resource Exclusions** (default in 3.0):
- Reduces API load
- Excludes high-churn resources (Endpoints, Leases)
**Annotation Tracking** (default):
- More reliable than labels
- Auto-migrates on sync
## Flux 2.7 Specific
**OCI Artifacts** (GA in 2.6):
- Prefer OCI over Git for generated configs
- Use digest pinning for immutability
- Sign artifacts with cosign/notation
**Image Automation** (GA in 2.7):
- Automated image updates
- GitRepository write-back
**Source-Watcher** (new in 2.7):
- Improves reconciliation efficiency
- Enable with: `--components-extra=source-watcher`
## CI/CD Integration
**Git Workflow**:
```
1. Developer commits to feature branch
2. CI runs tests, builds image
3. CI updates Git manifest with new image tag
4. Developer creates PR to main
5. GitOps controller syncs after merge
```
**Don't**: Deploy directly from CI to cluster (breaks GitOps)
**Do**: Update Git from CI, let GitOps deploy
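Step 3 above (CI updating the Git manifest, not the cluster) can be sketched as follows. The overlay path and image name are hypothetical, and the final git commands are shown as comments since they run against your real repo:

```shell
set -eu
# Scratch copy of a hypothetical GitOps repo layout
workdir=$(mktemp -d)
mkdir -p "$workdir/overlays/production"
cat > "$workdir/overlays/production/kustomization.yaml" <<'EOF'
images:
  - name: myapp
    newTag: v1.2.3
EOF

# CI step: pin the freshly built image tag in the Git manifest
NEW_TAG="v1.2.4"
sed -i "s/newTag: .*/newTag: ${NEW_TAG}/" \
  "$workdir/overlays/production/kustomization.yaml"
grep "newTag:" "$workdir/overlays/production/kustomization.yaml"

# In a real pipeline, CI would now commit and open a PR:
#   git add overlays/production/kustomization.yaml
#   git commit -m "ci: bump myapp to ${NEW_TAG}"
#   git push origin bump/myapp-v1.2.4
```

After the PR merges, the GitOps controller reconciles the new tag — no `kubectl` from CI.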
## Monitoring & Observability
**Track**:
- Sync success rate
- Reconciliation time
- Drift detection frequency
- Failed syncs/reconciliations
**Tools**:
- Prometheus metrics (both ArgoCD and Flux)
- Grafana dashboards
- Alert on sync failures
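A minimal Prometheus alerting-rules sketch for the failure cases above — metric names depend on your exporter setup (ArgoCD exposes `argocd_app_info`; Flux controllers expose `gotk_reconcile_condition` in older setups), so treat these expressions as a starting point:

```yaml
groups:
  - name: gitops-alerts
    rules:
      - alert: ArgoAppOutOfSync
        expr: argocd_app_info{sync_status="OutOfSync"} == 1
        for: 15m
        labels:
          severity: warning
      - alert: FluxReconciliationFailure
        expr: gotk_reconcile_condition{type="Ready", status="False"} == 1
        for: 10m
        labels:
          severity: warning
```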
## Image Management
**Good**:
```yaml
image: myapp:v1.2.3
image: myapp@sha256:abc123...
```
**Bad**:
```yaml
image: myapp:latest
image: myapp:dev
```
**Strategy**: Semantic versioning + digest pinning
## Environment Promotion
**Recommended Flow**:
```
Dev (auto-sync) → Staging (auto-sync) → Production (manual approval)
```
**Implementation**:
- Separate directories or repos per environment
- PR-based promotion
- Automated tests before promotion
- Manual approval for production
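In ArgoCD terms the per-environment difference is just the sync policy — automated for dev/staging, absent for production so every sync needs an explicit approval:

```yaml
# dev / staging Application: fully automated
syncPolicy:
  automated:
    prune: true
    selfHeal: true

# production Application: omit `automated` entirely,
# so changes apply only via `argocd app sync` or UI approval
```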
## Disaster Recovery
1. **Git is Source of Truth**: Cluster can be rebuilt from Git
2. **Backup**: Git repo + cluster state
3. **Test Recovery**: Practice cluster rebuild
4. **Document Bootstrap**: How to restore from scratch
## Performance Optimization
**ArgoCD**:
- Use ApplicationSets for multi-cluster
- Enable resource exclusions (3.x default)
- Server-side diff for large apps
**Flux**:
- Use OCI artifacts for large repos
- Enable source-watcher (2.7)
- Tune reconciliation intervals
## Common Anti-Patterns to Avoid
1. **Manual kubectl apply**: Bypasses GitOps, creates drift
2. **Multiple sources of truth**: Git should be only source
3. **Secrets in Git**: Always encrypt
4. **Direct cluster modifications**: All changes through Git
5. **No testing**: Always test in dev/staging first
6. **Missing RBAC**: Controllers need minimal permissions
## 2025 Trends
**Adopt**:
- OCI artifacts (Flux)
- Workload identity (no static credentials)
- SOPS + age (over PGP)
- External Secrets Operator (dynamic secrets)
- Multi-cluster with ApplicationSets/Flux
⚠️ **Avoid**:
- Label-based tracking (use annotations - ArgoCD 3.x default)
- PGP encryption (use age)
- Long-lived service account tokens (use workload identity)

# Multi-Cluster GitOps Management (2024-2025)
## ArgoCD ApplicationSets
**Cluster Generator** (auto-discover clusters):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: my-apps
spec:
generators:
- cluster:
selector:
matchLabels:
environment: production
template:
spec:
source:
repoURL: https://github.com/org/repo
path: apps/{{name}}
destination:
server: '{{server}}'
```
**Matrix Generator** (Cluster x Apps):
```yaml
generators:
- matrix:
generators:
- cluster: {}
- git:
directories:
- path: apps/*
```
**Performance**: 83% faster than manual (30min → 5min)
## Flux Multi-Cluster
**Option 1: Flux Per Cluster**
```
cluster-1/ → Flux instance 1
cluster-2/ → Flux instance 2
```
**Option 2: Hub-and-Spoke**
```
management-cluster/
└── flux manages → cluster-1, cluster-2
```
**Setup**:
```bash
flux bootstrap github --owner=org --repository=fleet \
--path=clusters/production --context=prod-cluster
```
## Hub-and-Spoke Pattern
**Benefits**: Centralized management, single source of truth
**Cons**: Single point of failure
**Best for**: < 50 clusters
## Workload Identity (2025 Best Practice)
**Instead of service account tokens, use**:
- AWS IRSA
- GCP Workload Identity
- Azure AD Workload Identity
No more long-lived credentials!
## Best Practices
1. **Cluster labeling** for organization
2. **Progressive rollout** (dev → staging → prod clusters)
3. **Separate repos** for cluster config vs apps
4. **Monitor sync status** across all clusters
5. **Use workload identity** (no static credentials)

# OCI Artifacts with Flux (2024-2025)
## Overview
**GA Status**: Flux v2.6 (June 2025)
**Current**: Fully supported in Flux v2.7
OCI artifacts allow storing Kubernetes manifests, Helm charts, and Kustomize overlays in container registries instead of Git.
## Benefits
**Decoupled from Git**: No Git dependency for deployment
**Immutable**: Content-addressable by digest
**Standard**: Uses OCI spec, works with any OCI registry
**Signature Verification**: Native support for cosign/notation
**Performance**: Faster than Git for large repos
## OCIRepository Resource
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
name: my-app-oci
namespace: flux-system
spec:
interval: 5m
url: oci://ghcr.io/org/app-config
ref:
tag: v1.0.0
# or digest:
# digest: sha256:abc123...
# or semver:
# semver: ">=1.0.0 <2.0.0"
provider: generic # or azure, aws, gcp
verify:
provider: cosign
secretRef:
name: cosign-public-key
```
## Using with Kustomization
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: my-app
spec:
interval: 10m
sourceRef:
kind: OCIRepository
name: my-app-oci
path: ./
prune: true
```
## Using with HelmRelease
**OCIRepository for Helm charts**:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
name: podinfo-oci
spec:
interval: 5m
url: oci://ghcr.io/stefanprodan/charts/podinfo
ref:
semver: ">=6.0.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: podinfo
spec:
chart:
spec:
chart: podinfo
sourceRef:
kind: OCIRepository
name: podinfo-oci
```
## Publishing OCI Artifacts
**Using flux CLI**:
```bash
# Build and push Kustomize overlay
flux push artifact oci://ghcr.io/org/app-config:v1.0.0 \
--path="./kustomize" \
--source="$(git config --get remote.origin.url)" \
--revision="$(git rev-parse HEAD)"
# Build and push Helm chart
flux push artifact oci://ghcr.io/org/charts/myapp:1.0.0 \
--path="./charts/myapp" \
--source="$(git config --get remote.origin.url)" \
--revision="$(git rev-parse HEAD)"
```
## Signature Verification
### Using cosign
**Sign artifact**:
```bash
cosign sign ghcr.io/org/app-config:v1.0.0
```
**Verify in Flux**:
```yaml
spec:
verify:
provider: cosign
secretRef:
name: cosign-public-key
```
### Using notation
**Sign artifact**:
```bash
notation sign ghcr.io/org/app-config:v1.0.0
```
**Verify in Flux**:
```yaml
spec:
verify:
provider: notation
secretRef:
name: notation-config
```
## Workload Identity
**Instead of static credentials, use cloud provider workload identity**:
**AWS IRSA**:
```yaml
spec:
provider: aws
# No credentials needed - uses pod's IAM role
```
**GCP Workload Identity**:
```yaml
spec:
provider: gcp
# No credentials needed - uses service account binding
```
**Azure Workload Identity**:
```yaml
spec:
provider: azure
# No credentials needed - uses managed identity
```
## Best Practices (2025)
1. **Use digest pinning** for production:
```yaml
ref:
digest: sha256:abc123...
```
2. **Sign all artifacts**:
```bash
flux push artifact oci://ghcr.io/org/app-config:v1.0.0 --path=./kustomize
cosign sign ghcr.io/org/app-config:v1.0.0
```
3. **Use semver for automated updates**:
```yaml
ref:
semver: ">=1.0.0 <2.0.0"
```
4. **Leverage workload identity** (no static credentials)
5. **Prefer OCI for generated configs** (Jsonnet, CUE, Helm output)
## When to Use OCI vs Git
**Use OCI Artifacts when**:
- ✅ Storing generated configurations (Jsonnet, CUE output)
- ✅ Need immutable, content-addressable storage
- ✅ Want signature verification
- ✅ Large repos (performance)
- ✅ Decoupling from Git
**Use Git when**:
- ✅ Source of truth for manifests
- ✅ Need Git workflow (PRs, reviews)
- ✅ Audit trail important
- ✅ Team collaboration
## Common Pattern: Hybrid Approach
```
Git (source of truth)
CI builds/generates manifests
Push to OCI registry (signed)
Flux pulls from OCI (verified)
Deploy to cluster
```
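A hypothetical GitHub Actions job implementing this hybrid flow — the registry path is an assumption, and the `fluxcd/flux2/action`, `sigstore/cosign-installer`, and `docker/login-action` steps are the commonly used setup actions:

```yaml
name: publish-config
on:
  push:
    tags: ["v*"]
jobs:
  push-artifact:
    runs-on: ubuntu-latest
    permissions:
      packages: write   # push to ghcr.io
      id-token: write   # keyless cosign signing
    steps:
      - uses: actions/checkout@v4
      - uses: fluxcd/flux2/action@main
      - uses: sigstore/cosign-installer@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Push manifests as a signed OCI artifact
        run: |
          flux push artifact oci://ghcr.io/org/app-config:${GITHUB_REF_NAME} \
            --path=./kustomize \
            --source="$(git config --get remote.origin.url)" \
            --revision="${GITHUB_SHA}"
          cosign sign --yes ghcr.io/org/app-config:${GITHUB_REF_NAME}
```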
## Migration from Git to OCI
**Before (Git)**:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: my-app
spec:
url: https://github.com/org/repo
ref:
branch: main
```
**After (OCI)**:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: OCIRepository
metadata:
name: my-app-oci
spec:
url: oci://ghcr.io/org/app-config
ref:
tag: v1.0.0
```
**Update Kustomization/HelmRelease** sourceRef to point to OCIRepository
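The change on the consuming side is just the `sourceRef`, e.g.:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
spec:
  interval: 10m
  sourceRef:
    kind: OCIRepository   # was: GitRepository
    name: my-app-oci      # was: my-app
  path: ./
  prune: true
```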
## Supported Registries
- ✅ GitHub Container Registry (ghcr.io)
- ✅ Docker Hub
- ✅ AWS ECR
- ✅ Google Artifact Registry
- ✅ Azure Container Registry
- ✅ Harbor
- ✅ GitLab Container Registry
## Troubleshooting
**Artifact not found**:
```bash
flux get sources oci
kubectl describe ocirepository <name>
# Verify artifact exists
crane digest ghcr.io/org/app:v1.0.0
```
**Authentication failures**:
```bash
# Check secret
kubectl get secret -n flux-system
# Test manually
crane manifest ghcr.io/org/app:v1.0.0
```
**Signature verification fails**:
```bash
# Verify locally
cosign verify ghcr.io/org/app:v1.0.0
# Check public key secret
kubectl get secret cosign-public-key -o yaml
```
## 2025 Recommendation
**Adopt OCI artifacts** for:
- Helm charts (already standard)
- Generated manifests (CI output)
- Multi-environment configs
**Keep Git for**:
- Source manifests
- Infrastructure definitions
- Team collaboration workflows

# Progressive Delivery with GitOps (2024-2025)
## Argo Rollouts (with ArgoCD)
**Current Focus**: Kubernetes-native progressive delivery
**Deployment Strategies**:
### 1. Canary
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: my-app
spec:
strategy:
canary:
steps:
- setWeight: 20
- pause: {duration: 5m}
- setWeight: 50
- pause: {duration: 5m}
- setWeight: 100
```
### 2. Blue-Green
```yaml
spec:
strategy:
blueGreen:
activeService: my-app
previewService: my-app-preview
autoPromotionEnabled: false
```
### 3. Analysis with Metrics
```yaml
spec:
strategy:
canary:
analysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: my-app
```
**Metric Providers**: Prometheus, Datadog, New Relic, CloudWatch
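The `success-rate` template referenced above must exist as an AnalysisTemplate; a sketch using Prometheus, where the server address, metric names, and threshold are assumptions to adapt:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      failureLimit: 3
      # abort the rollout if the success rate drops below 99%
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```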
## Flagger (with Flux)
**Installation**:
```bash
flux install
kubectl apply -k github.com/fluxcd/flagger//kustomize/linkerd
```
**Canary with Flagger**:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: my-app
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
service:
port: 9898
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
metrics:
- name: request-success-rate
thresholdRange:
min: 99
```
## Best Practices
1. **Start with Manual Approval** (autoPromotionEnabled: false)
2. **Monitor Key Metrics** (error rate, latency, saturation)
3. **Set Conservative Steps** (10%, 25%, 50%, 100%)
4. **Define Rollback Criteria** (error rate > 1%)
5. **Test in Staging First**
## 2025 Recommendation
**For ArgoCD users**: Argo Rollouts (tight integration, UI support)
**For Flux users**: Flagger (CNCF project, modular design)

# GitOps Repository Patterns (2024-2025)
## Monorepo vs Polyrepo
### Monorepo Pattern
**Structure**:
```
gitops-repo/
├── apps/
│ ├── frontend/
│ ├── backend/
│ └── database/
├── infrastructure/
│ ├── ingress/
│ ├── monitoring/
│ └── secrets/
└── clusters/
├── dev/
├── staging/
└── production/
```
**Pros**:
- Single source of truth
- Atomic changes across apps
- Easier to start with
- Simpler CI/CD
**Cons**:
- Scaling issues (>100 apps)
- RBAC complexity
- Large repo size
- Blast radius concerns
**Best for**: Startups, small teams (< 20 apps), single team ownership
### Polyrepo Pattern
**Structure**:
```
infrastructure-repo/ (Platform team)
app-team-1-repo/ (Team 1)
app-team-2-repo/ (Team 2)
cluster-config-repo/ (Platform team)
```
**Pros**:
- Clear ownership boundaries
- Better RBAC (repo-level)
- Scales to 100s of apps
- Team autonomy
**Cons**:
- More complex setup
- Cross-repo dependencies
- Multiple CI/CD pipelines
**Best for**: Large orgs, multiple teams, clear separation of concerns
## Common Patterns
### 1. Repo Per Team
- Each team has own repo
- Platform team manages infra repo
- Hub cluster manages all
### 2. Repo Per App
- Each app in separate repo
- Good for microservices
- Maximum autonomy
### 3. Hybrid (Recommended)
- Infrastructure monorepo (platform team)
- Application polyrepo (dev teams)
- Best of both worlds
## App-of-Apps Pattern (ArgoCD)
**Root Application**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: root
spec:
source:
repoURL: https://github.com/org/gitops
path: apps/
destination:
server: https://kubernetes.default.svc
```
**Apps Directory**:
```
apps/
├── app1.yaml (Application manifest)
├── app2.yaml
└── app3.yaml
```
**Benefits**: Centralized management, single sync point
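Each file under `apps/` is itself an Application pointing at that app's manifests — e.g., a hypothetical `apps/app1.yaml`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app1
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops
    path: apps/app1/manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: app1
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```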
## Environment Structure
### Option 1: Directory Per Environment
```
apps/
├── base/
│ └── kustomization.yaml
└── overlays/
├── dev/
├── staging/
└── production/
```
### Option 2: Branch Per Environment
```
main branch → production
staging branch → staging
dev branch → development
```
**Don't Repeat YAML**: Use Kustomize bases + overlays
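The base + overlay wiring from Option 1 looks like this in practice (file names are illustrative):

```yaml
# apps/base/kustomization.yaml
resources:
  - deployment.yaml
  - service.yaml
---
# apps/overlays/production/kustomization.yaml
resources:
  - ../../base
patches:
  - path: replica-count.yaml   # e.g. bump replicas for production
images:
  - name: myapp
    newTag: v1.2.3
```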
## Flux Repository Organization
**Recommended Structure**:
```
flux-repo/
├── clusters/
│ ├── production/
│ │ ├── flux-system/
│ │ ├── apps.yaml
│ │ └── infrastructure.yaml
│ └── staging/
├── apps/
│ └── podinfo/
│ ├── kustomization.yaml
│ └── release.yaml
└── infrastructure/
└── sources/
├── gitrepositories.yaml
└── ocirepositories.yaml
```
## Kustomize vs Helm in GitOps
**Kustomize** (recommended for GitOps):
- Native Kubernetes
- Declarative patches
- No templating language
**Helm** (when necessary):
- Third-party charts
- Complex applications
- Need parameterization
**Best Practice**: Kustomize for your apps, Helm for third-party
## Promotion Strategies
### 1. Manual PR-based
```
dev/ → (PR) → staging/ → (PR) → production/
```
### 2. Automated with CI
```
dev/ → (auto-promote on tests pass) → staging/ → (manual approval) → production/
```
### 3. Progressive with Canary
```
production/stable/ → canary deployment → production/all/
```
## 2024-2025 Recommendations
1. **Start with monorepo**, migrate to polyrepo when needed
2. **Use Kustomize bases + overlays** (don't repeat YAML)
3. **Separate infrastructure from applications**
4. **Implement promotion workflows** (dev → staging → prod)
5. **Never commit directly to production** (always PR)

# Secrets Management in GitOps (2024-2025)
## Overview
**Never commit plain secrets to Git.** Use encryption or external secret stores.
## Solutions Comparison
| Solution | Type | Complexity | Best For | 2025 Trend |
|----------|------|------------|----------|------------|
| **Sealed Secrets** | Encrypted in Git | Low | Simple, GitOps-first | Stable |
| **External Secrets Operator** | External store | Medium | Cloud-native, dynamic | ↗️ Growing |
| **SOPS + age** | Encrypted in Git | Medium | Flexible, Git-friendly | ↗️ Preferred over PGP |
## 1. Sealed Secrets
**How it works**: Public key encryption, controller decrypts in-cluster
**Setup**:
```bash
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml
```
**Usage**:
```bash
# Create sealed secret
kubectl create secret generic my-secret --dry-run=client -o yaml --from-literal=password=supersecret | \
kubeseal -o yaml > sealed-secret.yaml
# Commit to Git
git add sealed-secret.yaml
git commit -m "Add sealed secret"
```
**Pros**: Simple, GitOps-native, no external dependencies
**Cons**: Key rotation complexity, static secrets only
## 2. External Secrets Operator (ESO)
**Latest Version**: v0.20.2 (2024-2025)
**Supported Providers**:
- AWS Secrets Manager
- Azure Key Vault
- Google Secret Manager
- HashiCorp Vault
- 1Password
- Doppler
**Setup**:
```bash
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
```
**Usage**:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: aws-secret-store
spec:
provider:
aws:
service: SecretsManager
region: us-east-1
auth:
jwt:
serviceAccountRef:
name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: my-secret
spec:
secretStoreRef:
name: aws-secret-store
target:
name: my-app-secret
data:
- secretKey: password
remoteRef:
key: prod/my-app/password
```
**Pros**: Dynamic secrets, cloud-native, automatic rotation
**Cons**: External dependency, requires cloud secret store
**2025 Recommendation**: Growing preference over Sealed Secrets
## 3. SOPS + age
**Recommended over PGP as of 2024-2025**
**Setup age**:
```bash
# Install age
brew install age # macOS
apt install age # Ubuntu
# Generate key
age-keygen -o key.txt
# Public key: age1...
```
**Setup SOPS**:
```bash
# Install SOPS
brew install sops
# Create .sops.yaml
cat <<EOF > .sops.yaml
creation_rules:
  - path_regex: .*\.yaml$
encrypted_regex: ^(data|stringData)$
age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
EOF
```
**Encrypt secrets**:
```bash
# Create secret
kubectl create secret generic my-secret --dry-run=client -o yaml --from-literal=password=supersecret > secret.yaml
# Encrypt with SOPS
sops -e secret.yaml > secret.enc.yaml
# Commit encrypted version
git add secret.enc.yaml .sops.yaml
```
**Flux Integration**:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: app
spec:
decryption:
provider: sops
secretRef:
name: sops-age
```
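The referenced `sops-age` secret holds the private key; Flux expects the key under a file name ending in `.agekey`:

```bash
kubectl create secret generic sops-age \
  --namespace=flux-system \
  --from-file=age.agekey=key.txt
```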
**Pros**: Git-friendly, flexible, age is simpler than PGP
**Cons**: Manual encryption step, key management
## Best Practices (2024-2025)
### 1. Key Rotation
**Sealed Secrets**: Rotate annually, maintain old keys for decryption
**ESO**: Automatic with cloud providers
**SOPS**: Re-encrypt when rotating age keys
### 2. Access Control
- Never commit `.sops` age key to Git
- Use separate keys per environment
- Store age keys in CI/CD secrets
- Use RBAC for Secret access
### 3. Encryption Scope
**SOPS .sops.yaml**:
```yaml
creation_rules:
- path_regex: production/.*
encrypted_regex: ^(data|stringData)$
age: age1prod...
- path_regex: staging/.*
encrypted_regex: ^(data|stringData)$
age: age1staging...
```
### 4. Git Pre-commit Hook
Prevent committing plain secrets:
```bash
#!/bin/bash
# .git/hooks/pre-commit
if git diff --cached --name-only | grep -E 'secret.*\.yaml$' | grep -qvE '\.(enc|sops)\.yaml$'; then
  echo "⚠️ Potential unencrypted secret file detected"
  echo "Encrypt it with SOPS before committing"
  exit 1
fi
```
### 5. ArgoCD 3.0 Recommendation
**Use secrets operators** (ESO preferred), avoid config management plugins for secrets
## Decision Guide
**Choose Sealed Secrets if**:
- ✅ Simple GitOps workflow
- ✅ Static secrets
- ✅ No external dependencies wanted
- ✅ Small team
**Choose External Secrets Operator if**:
- ✅ Already using cloud secret stores
- ✅ Need secret rotation
- ✅ Dynamic secrets
- ✅ Enterprise compliance
**Choose SOPS + age if**:
- ✅ Git-centric workflow
- ✅ Want flexibility
- ✅ Multi-cloud
- ✅ Prefer open standards
## 2025 Trend Summary
**Growing**: External Secrets Operator, SOPS+age
**Stable**: Sealed Secrets (still widely used)
**Declining**: PGP encryption (age preferred)
**Emerging**: age encryption as standard (simpler than PGP)

# GitOps Troubleshooting Guide (2024-2025)
## Common ArgoCD Issues
### 1. Application OutOfSync
**Symptoms**: Application shows OutOfSync status
**Causes**: Git changes not applied, manual cluster changes
**Fix**:
```bash
argocd app sync my-app
argocd app diff my-app # See differences
```
### 2. Annotation Tracking Migration (ArgoCD 3.x)
**Symptoms**: Resources not tracked after upgrade to 3.x
**Cause**: Default changed from labels to annotations
**Fix**: Resources auto-migrate on next sync, or force:
```bash
argocd app sync my-app --force
```
### 3. Sync Fails with "Resource is Invalid"
**Cause**: YAML validation error, CRD mismatch
**Fix**:
```bash
argocd app get my-app --show-operation
kubectl apply --dry-run=client -f manifest.yaml # Test locally
```
### 4. Image Pull Errors
**Cause**: Registry credentials, network issues
**Fix**:
```bash
kubectl get events -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
# Check image pull secret
kubectl get secret -n <namespace>
```
## Common Flux Issues
### 1. GitRepository Not Ready
**Symptoms**: source not ready, no artifact
**Causes**: Auth failure, branch doesn't exist
**Fix**:
```bash
flux get sources git
flux reconcile source git <name> -n flux-system
kubectl describe gitrepository <name> -n flux-system
```
### 2. Kustomization Build Failed
**Cause**: Invalid kustomization.yaml, missing resources
**Fix**:
```bash
flux get kustomizations
kubectl describe kustomization <name> -n flux-system
# Test locally
kustomize build <path>
```
### 3. HelmRelease Install Failed
**Cause**: Values error, chart incompatibility
**Fix**:
```bash
flux get helmreleases
kubectl logs -n flux-system -l app=helm-controller
# Test locally
helm template <chart> -f values.yaml
```
### 4. OCI Repository Issues (Flux 2.6+)
**Cause**: Registry auth, OCI artifact not found
**Fix**:
```bash
flux get sources oci
kubectl describe ocirepository <name>
# Verify artifact exists
crane digest ghcr.io/org/app:v1.0.0
```
## SOPS Decryption Failures
**Symptom**: Secret not decrypted
**Fix**:
```bash
# Check age secret exists
kubectl get secret sops-age -n flux-system
# Test decryption locally
export SOPS_AGE_KEY_FILE=key.txt
sops -d secret.enc.yaml
```
## Performance Issues
### ArgoCD Slow Syncs
**Cause**: Too many resources, inefficient queries
**Fix** (ArgoCD 3.x):
- Use default resource exclusions
- Enable server-side diff
- Increase controller replicas
### Flux Slow Reconciliation
**Cause**: Large monorepos, many sources
**Fix** (Flux 2.7+):
- Enable source-watcher
- Increase interval
- Use OCI artifacts instead of Git
## Debugging Commands
**ArgoCD**:
```bash
argocd app get <app> --refresh
argocd app logs <app>
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller
```
**Flux**:
```bash
flux logs --all-namespaces
flux check
flux get all
kubectl -n flux-system get events --sort-by='.lastTimestamp'
```
## Quick Wins
1. **Use `--dry-run`** before applying
2. **Check controller logs** first
3. **Verify RBAC** permissions
4. **Test manifests locally** (kubectl apply --dry-run, kustomize build)
5. **Check Git connectivity** (credentials, network)

#!/usr/bin/env python3
"""
Generate ArgoCD ApplicationSet manifests for multi-cluster deployments.
Supports Cluster, List, and Matrix generators (ArgoCD 3.x).
"""
import argparse
import sys
APPLICATIONSET_TEMPLATE = """---
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: {name}
namespace: argocd
spec:
goTemplate: true
goTemplateOptions: ["missingkey=error"]
generators:
{generators}
template:
metadata:
name: '{{{{.name}}}}-{name}'
labels:
environment: '{{{{.environment}}}}'
spec:
project: default
source:
repoURL: {repo_url}
targetRevision: {target_revision}
path: '{path}'
destination:
server: '{{{{.server}}}}'
namespace: {namespace}
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
"""
def generate_cluster_generator(label_selector: str = "") -> str:
    """Generate Cluster generator. label_selector is a 'key: value' pair."""
    if not label_selector:
        return "    - cluster: {}"
    # Block-style YAML: a flow mapping ({...}) cannot contain nested lines
    return f"""    - cluster:
        selector:
          matchLabels:
            {label_selector}"""
def generate_list_generator(clusters: list) -> str:
"""Generate List generator."""
elements = "\n".join([f" - name: {c['name']}\n server: {c['server']}\n environment: {c.get('environment', 'production')}"
for c in clusters])
return f""" - list:
elements:
{elements}"""
def generate_matrix_generator(cluster_label: str, git_directories: list) -> str:
"""Generate Matrix generator (Cluster x Git directories)."""
git_list = "\n".join([f" - path: {d}" for d in git_directories])
return f""" - matrix:
generators:
- cluster:
selector:
matchLabels:
environment: {cluster_label}
- git:
repoURL: https://github.com/example/apps
revision: HEAD
directories:
{git_list}"""
def main():
parser = argparse.ArgumentParser(
description='Generate ArgoCD ApplicationSet manifests',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Cluster generator (all clusters)
python3 applicationset_generator.py cluster \\
--name my-apps \\
--repo-url https://github.com/org/repo \\
--path apps/
# List generator (specific clusters)
python3 applicationset_generator.py list \\
--name my-apps \\
--clusters prod=https://prod.k8s.local,staging=https://staging.k8s.local
# Matrix generator (cluster x directories)
python3 applicationset_generator.py matrix \\
--name my-apps \\
--cluster-label production \\
--directories app1,app2,app3
"""
)
parser.add_argument('generator_type', choices=['cluster', 'list', 'matrix'],
help='Generator type')
parser.add_argument('--name', required=True, help='ApplicationSet name')
parser.add_argument('--repo-url', default='https://github.com/example/repo',
help='Git repository URL')
parser.add_argument('--path', default='apps/', help='Path in repository')
parser.add_argument('--namespace', default='default', help='Target namespace')
parser.add_argument('--target-revision', default='main', help='Git branch/tag')
parser.add_argument('--cluster-label', help='Cluster label selector')
parser.add_argument('--clusters', help='Cluster list (name=server,name=server)')
parser.add_argument('--directories', help='Git directories (comma-separated)')
parser.add_argument('--output', help='Output file')
args = parser.parse_args()
# Generate based on type
if args.generator_type == 'cluster':
generators = generate_cluster_generator(args.cluster_label or "")
elif args.generator_type == 'list':
if not args.clusters:
print("❌ --clusters required for list generator")
sys.exit(1)
cluster_list = []
for c in args.clusters.split(','):
name, server = c.split('=', 1)
cluster_list.append({'name': name, 'server': server})
generators = generate_list_generator(cluster_list)
elif args.generator_type == 'matrix':
if not args.cluster_label or not args.directories:
print("❌ --cluster-label and --directories required for matrix generator")
sys.exit(1)
directories = args.directories.split(',')
generators = generate_matrix_generator(args.cluster_label, directories)
# Create ApplicationSet
appset = APPLICATIONSET_TEMPLATE.format(
name=args.name,
generators=generators,
repo_url=args.repo_url,
target_revision=args.target_revision,
path=args.path,
namespace=args.namespace
)
# Output
if args.output:
with open(args.output, 'w') as f:
f.write(appset)
print(f"✅ ApplicationSet written to: {args.output}")
else:
print(appset)
if __name__ == '__main__':
main()

#!/usr/bin/env python3
"""
Check ArgoCD application health and diagnose sync issues.
Supports ArgoCD 3.x API with annotation-based tracking.
"""
import argparse
import sys
from typing import Dict, List, Any, Optional
try:
import requests
except ImportError:
    print("❌ 'requests' library is required. Install with: pip install requests")
sys.exit(1)
try:
from tabulate import tabulate
except ImportError:
tabulate = None
class ArgoCDHealthChecker:
def __init__(self, server: str, token: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None):
self.server = server.rstrip('/')
self.token = token
self.session = requests.Session()
if token:
self.session.headers['Authorization'] = f'Bearer {token}'
elif username and password:
# Login to get token
self._login(username, password)
else:
raise ValueError("Either --token or --username/--password must be provided")
def _login(self, username: str, password: str):
"""Login to ArgoCD and get auth token."""
try:
response = self.session.post(
f"{self.server}/api/v1/session",
json={"username": username, "password": password},
verify=False
)
response.raise_for_status()
self.token = response.json()['token']
self.session.headers['Authorization'] = f'Bearer {self.token}'
except Exception as e:
print(f"❌ Failed to login to ArgoCD: {e}")
sys.exit(1)
def get_applications(self, name: Optional[str] = None) -> List[Dict]:
"""Get ArgoCD applications."""
try:
if name:
url = f"{self.server}/api/v1/applications/{name}"
response = self.session.get(url, verify=False)
response.raise_for_status()
return [response.json()]
else:
url = f"{self.server}/api/v1/applications"
response = self.session.get(url, verify=False)
response.raise_for_status()
return response.json().get('items', [])
except Exception as e:
print(f"❌ Failed to get applications: {e}")
return []
def check_application_health(self, app: Dict) -> Dict[str, Any]:
"""Check application health and sync status."""
name = app['metadata']['name']
health = app.get('status', {}).get('health', {})
sync = app.get('status', {}).get('sync', {})
operation_state = app.get('status', {}).get('operationState', {})
result = {
'name': name,
'health_status': health.get('status', 'Unknown'),
'health_message': health.get('message', ''),
'sync_status': sync.get('status', 'Unknown'),
'sync_revision': sync.get('revision', 'N/A')[:8] if sync.get('revision') else 'N/A',
'operation_phase': operation_state.get('phase', 'N/A'),
'issues': [],
'recommendations': []
}
# Check for common issues
if result['health_status'] not in ['Healthy', 'Unknown']:
result['issues'].append(f"Application is {result['health_status']}")
if result['health_message']:
result['issues'].append(f"Health message: {result['health_message']}")
if result['sync_status'] == 'OutOfSync':
result['issues'].append("Application is out of sync with Git")
result['recommendations'].append("Run: argocd app sync " + name)
result['recommendations'].append("Check if manual sync is required (sync policy)")
if result['sync_status'] == 'Unknown':
result['issues'].append("Sync status is unknown")
result['recommendations'].append("Check ArgoCD application controller logs")
result['recommendations'].append(f"kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller")
# Check for failed operations
if operation_state.get('phase') == 'Failed':
result['issues'].append(f"Last operation failed")
if 'message' in operation_state:
result['issues'].append(f"Operation message: {operation_state['message']}")
result['recommendations'].append("Check operation details in ArgoCD UI")
result['recommendations'].append(f"argocd app get {name}")
# Check resource conditions (ArgoCD 3.x)
resources = app.get('status', {}).get('resources', [])
unhealthy_resources = [r for r in resources if r.get('health', {}).get('status') not in ['Healthy', 'Unknown', '']]
if unhealthy_resources:
result['issues'].append(f"{len(unhealthy_resources)} resources are unhealthy")
for r in unhealthy_resources[:3]: # Show first 3
kind = r.get('kind', 'Unknown')
name = r.get('name', 'Unknown')
status = r.get('health', {}).get('status', 'Unknown')
result['issues'].append(f" - {kind}/{name}: {status}")
result['recommendations'].append(f"kubectl get {unhealthy_resources[0]['kind']} -n {app['spec']['destination']['namespace']}")
# Check for annotation-based tracking (ArgoCD 3.x default)
tracking_method = app.get('spec', {}).get('syncPolicy', {}).get('syncOptions', [])
has_label_tracking = 'UseLabel=true' in tracking_method
if has_label_tracking:
result['recommendations'].append("⚠️ Using legacy label-based tracking. Consider migrating to annotation-based tracking (ArgoCD 3.x default)")
return result
def check_all_applications(self, name: Optional[str] = None, show_healthy: bool = False) -> List[Dict]:
"""Check all applications or specific application."""
apps = self.get_applications(name)
results = []
for app in apps:
result = self.check_application_health(app)
if show_healthy or result['issues']:
results.append(result)
return results
def print_summary(self, results: List[Dict]):
"""Print summary of application health."""
if not results:
print("✅ No applications found or all healthy (use --show-healthy to see healthy apps)")
return
# Summary statistics
total = len(results)
with_issues = len([r for r in results if r['issues']])
print(f"\n📊 Summary: {with_issues}/{total} applications have issues\n")
# Table output
if tabulate:
table_data = []
for r in results:
status_icon = "" if r['issues'] else ""
table_data.append([
status_icon,
r['name'],
r['health_status'],
r['sync_status'],
r['sync_revision'],
len(r['issues'])
])
print(tabulate(
table_data,
headers=['', 'Application', 'Health', 'Sync', 'Revision', 'Issues'],
tablefmt='simple'
))
else:
for r in results:
status_icon = "" if r['issues'] else ""
print(f"{status_icon} {r['name']}: Health={r['health_status']}, Sync={r['sync_status']}, Issues={len(r['issues'])}")
# Detailed issues and recommendations
print("\n🔍 Detailed Issues:\n")
for r in results:
if not r['issues']:
continue
print(f"Application: {r['name']}")
print(f" Health: {r['health_status']}")
print(f" Sync: {r['sync_status']}")
if r['issues']:
print(" Issues:")
for issue in r['issues']:
print(f"{issue}")
if r['recommendations']:
print(" Recommendations:")
for rec in r['recommendations']:
print(f"{rec}")
print()
def main():
parser = argparse.ArgumentParser(
description='Check ArgoCD application health and diagnose sync issues (ArgoCD 3.x compatible)',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Check all applications
python3 check_argocd_health.py \\
--server https://argocd.example.com \\
--token $ARGOCD_TOKEN
# Check specific application
python3 check_argocd_health.py \\
--server https://argocd.example.com \\
--username admin \\
--password $ARGOCD_PASSWORD \\
--app my-app
# Show all applications including healthy ones
python3 check_argocd_health.py \\
--server https://argocd.example.com \\
--token $ARGOCD_TOKEN \\
--show-healthy
ArgoCD 3.x Features:
- Annotation-based tracking (default)
- Fine-grained RBAC support
- Enhanced resource health checks
"""
)
parser.add_argument('--server', required=True, help='ArgoCD server URL')
parser.add_argument('--token', help='ArgoCD auth token (or set ARGOCD_TOKEN env var)')
parser.add_argument('--username', help='ArgoCD username')
parser.add_argument('--password', help='ArgoCD password')
parser.add_argument('--app', help='Specific application name to check')
parser.add_argument('--show-healthy', action='store_true', help='Show healthy applications')
parser.add_argument('--json', action='store_true', help='Output as JSON')
args = parser.parse_args()
# Get token from env if not provided
import os
token = args.token or os.getenv('ARGOCD_TOKEN')
try:
checker = ArgoCDHealthChecker(
server=args.server,
token=token,
username=args.username,
password=args.password
)
results = checker.check_all_applications(
name=args.app,
show_healthy=args.show_healthy
)
if args.json:
print(json.dumps(results, indent=2))
else:
checker.print_summary(results)
except KeyboardInterrupt:
print("\n\nInterrupted by user")
sys.exit(1)
except Exception as e:
print(f"❌ Error: {e}")
sys.exit(1)
if __name__ == '__main__':
main()
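For one-off checks without this script, the same data can be pulled straight from the ArgoCD REST API (`/api/v1/applications` with a bearer token). The sketch below is a minimal, standard-library-only example; the server URL and token are placeholders, and the payload shape is the documented applications-list response.

```python
import urllib.request


def build_app_request(server, token, name=None):
    """Build an authenticated GET for ArgoCD's applications endpoint."""
    url = f"{server.rstrip('/')}/api/v1/applications"
    if name:
        url += f"?name={name}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def summarize_sync(payload):
    """Count applications by sync status from an applications-list payload."""
    counts = {}
    for app in payload.get("items") or []:
        status = app.get("status", {}).get("sync", {}).get("status", "Unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts
```

With cluster access, `urllib.request.urlopen(build_app_request(server, token))` returns the JSON payload to feed into `summarize_sync`.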


@@ -0,0 +1,418 @@
#!/usr/bin/env python3
"""
Check Flux CD health and diagnose reconciliation issues.
Supports Flux v2.7+ with OCI artifacts, image automation, and source-watcher.
"""
import argparse
import sys
import json
from typing import Dict, List, Any, Optional
from datetime import datetime, timedelta
try:
from kubernetes import client, config
from kubernetes.client.rest import ApiException
except ImportError:
print("⚠️ Warning: 'kubernetes' library not found. Install with: pip install kubernetes")
sys.exit(1)
try:
from tabulate import tabulate
except ImportError:
tabulate = None
class FluxHealthChecker:
def __init__(self, namespace: str = "flux-system", kubeconfig: Optional[str] = None):
self.namespace = namespace
# Load kubeconfig
try:
if kubeconfig:
config.load_kube_config(config_file=kubeconfig)
else:
try:
config.load_kube_config()
except Exception:
config.load_incluster_config()
except Exception as e:
print(f"❌ Failed to load kubeconfig: {e}")
sys.exit(1)
self.api = client.ApiClient()
self.custom_api = client.CustomObjectsApi(self.api)
self.core_api = client.CoreV1Api(self.api)
def get_flux_resources(self, resource_type: str, namespace: Optional[str] = None) -> List[Dict]:
"""Get Flux custom resources."""
ns = namespace or self.namespace
resource_map = {
'gitrepositories': ('source.toolkit.fluxcd.io', 'v1', 'gitrepositories'),
'ocirepositories': ('source.toolkit.fluxcd.io', 'v1', 'ocirepositories'),  # v1 since Flux v2.6 (OCI GA)
'helmrepositories': ('source.toolkit.fluxcd.io', 'v1', 'helmrepositories'),
'buckets': ('source.toolkit.fluxcd.io', 'v1beta2', 'buckets'),
'kustomizations': ('kustomize.toolkit.fluxcd.io', 'v1', 'kustomizations'),
'helmreleases': ('helm.toolkit.fluxcd.io', 'v2', 'helmreleases'),
'imageupdateautomations': ('image.toolkit.fluxcd.io', 'v1', 'imageupdateautomations'),  # v1 since Flux v2.7 (GA)
'imagerepositories': ('image.toolkit.fluxcd.io', 'v1', 'imagerepositories'),
}
if resource_type not in resource_map:
return []
group, version, plural = resource_map[resource_type]
try:
response = self.custom_api.list_namespaced_custom_object(
group=group,
version=version,
namespace=ns,
plural=plural
)
return response.get('items', [])
except ApiException as e:
if e.status == 404:
return []
print(f"⚠️ Warning: Failed to get {resource_type}: {e}")
return []
def check_resource_health(self, resource: Dict, resource_type: str) -> Dict[str, Any]:
"""Check resource health and reconciliation status."""
name = resource['metadata']['name']
namespace = resource['metadata']['namespace']
status = resource.get('status', {})
# Get conditions
conditions = status.get('conditions', [])
ready_condition = next((c for c in conditions if c['type'] == 'Ready'), None)
result = {
'type': resource_type,
'name': name,
'namespace': namespace,
'ready': ready_condition.get('status', 'Unknown') if ready_condition else 'Unknown',
'message': ready_condition.get('message', '') if ready_condition else '',
'last_reconcile': status.get('lastHandledReconcileAt', 'N/A'),
'issues': [],
'recommendations': []
}
# Check if ready
if result['ready'] != 'True':
result['issues'].append(f"{resource_type} is not ready")
if result['message']:
result['issues'].append(f"Message: {result['message']}")
# Type-specific checks
if resource_type == 'gitrepositories':
self._check_git_repository(resource, result)
elif resource_type == 'ocirepositories':
self._check_oci_repository(resource, result)
elif resource_type == 'kustomizations':
self._check_kustomization(resource, result)
elif resource_type == 'helmreleases':
self._check_helm_release(resource, result)
elif resource_type == 'imageupdateautomations':
self._check_image_automation(resource, result)
return result
def _check_git_repository(self, resource: Dict, result: Dict):
"""Check GitRepository-specific issues."""
status = resource.get('status', {})
# Check artifact
if not status.get('artifact'):
result['issues'].append("No artifact available")
result['recommendations'].append("Check repository URL and credentials")
result['recommendations'].append(f"flux reconcile source git {result['name']} -n {result['namespace']}")
# Check for auth errors
if 'authentication' in result['message'].lower() or 'credentials' in result['message'].lower():
result['recommendations'].append("Check Git credentials secret")
result['recommendations'].append(f"kubectl get secret -n {result['namespace']}")
def _check_oci_repository(self, resource: Dict, result: Dict):
"""Check OCIRepository-specific issues (Flux v2.6+ feature)."""
status = resource.get('status', {})
# Check artifact
if not status.get('artifact'):
result['issues'].append("No OCI artifact available")
result['recommendations'].append("Check OCI repository URL and credentials")
result['recommendations'].append("Verify OCI artifact exists in registry")
# Check signature verification (Flux v2.7+)
spec = resource.get('spec', {})
if spec.get('verify'):
verify_status = status.get('observedGeneration')
if not verify_status:
result['issues'].append("Signature verification configured but not completed")
result['recommendations'].append("Check cosign or notation configuration")
def _check_kustomization(self, resource: Dict, result: Dict):
"""Check Kustomization-specific issues."""
status = resource.get('status', {})
# Check source reference
spec = resource.get('spec', {})
source_ref = spec.get('sourceRef', {})
if not source_ref:
result['issues'].append("No source reference configured")
# Check inventory
inventory = status.get('inventory')
if inventory and 'entries' in inventory:
total_resources = len(inventory['entries'])
result['recommendations'].append(f"Managing {total_resources} resources")
# Check for prune errors
if 'prune' in result['message'].lower():
result['recommendations'].append("Check for resources blocking pruning")
result['recommendations'].append("Review finalizers on deleted resources")
def _check_helm_release(self, resource: Dict, result: Dict):
"""Check HelmRelease-specific issues."""
status = resource.get('status', {})
# Check install/upgrade status
install_failures = status.get('installFailures', 0)
upgrade_failures = status.get('upgradeFailures', 0)
if install_failures > 0:
result['issues'].append(f"Install failed {install_failures} times")
result['recommendations'].append("Check Helm values and chart compatibility")
if upgrade_failures > 0:
result['issues'].append(f"Upgrade failed {upgrade_failures} times")
result['recommendations'].append("Review Helm upgrade logs")
result['recommendations'].append(f"kubectl logs -n {result['namespace']} -l app=helm-controller")
# Check for timeout issues
if 'timeout' in result['message'].lower():
result['recommendations'].append("Increase timeout in HelmRelease spec")
result['recommendations'].append("Check pod startup times and readiness probes")
def _check_image_automation(self, resource: Dict, result: Dict):
"""Check ImageUpdateAutomation-specific issues (Flux v2.7+ GA)."""
status = resource.get('status', {})
# Check last automation time
last_automation = status.get('lastAutomationRunTime')
if not last_automation:
result['issues'].append("No automation runs recorded")
result['recommendations'].append("Check ImagePolicy and git write access")
def check_flux_controllers(self) -> List[Dict]:
"""Check health of Flux controller pods."""
results = []
controller_labels = [
'source-controller',
'kustomize-controller',
'helm-controller',
'notification-controller',
'image-reflector-controller',
'image-automation-controller',
]
for controller in controller_labels:
try:
pods = self.core_api.list_namespaced_pod(
namespace=self.namespace,
label_selector=f'app={controller}'
)
if not pods.items:
results.append({
'controller': controller,
'status': 'Not Found',
'issues': [f'{controller} not found'],
'recommendations': ['Check Flux installation']
})
continue
pod = pods.items[0]
pod_status = pod.status.phase
result = {
'controller': controller,
'status': pod_status,
'issues': [],
'recommendations': []
}
if pod_status != 'Running':
result['issues'].append(f'Controller not running (status: {pod_status})')
result['recommendations'].append(f'kubectl describe pod -n {self.namespace} -l app={controller}')
result['recommendations'].append(f'kubectl logs -n {self.namespace} -l app={controller}')
# Check container restarts
for container_status in pod.status.container_statuses or []:
if container_status.restart_count > 5:
result['issues'].append(f'High restart count: {container_status.restart_count}')
result['recommendations'].append('Check controller logs for crash loops')
results.append(result)
except ApiException as e:
results.append({
'controller': controller,
'status': 'Error',
'issues': [f'Failed to check: {e}'],
'recommendations': []
})
return results
def print_summary(self, resource_results: List[Dict], controller_results: List[Dict]):
"""Print summary of Flux health."""
# Controller health
print("\n🎛️ Flux Controllers:\n")
if tabulate:
controller_table = []
for r in controller_results:
status_icon = "" if r['status'] == 'Running' and not r['issues'] else ""
controller_table.append([
status_icon,
r['controller'],
r['status'],
len(r['issues'])
])
print(tabulate(
controller_table,
headers=['', 'Controller', 'Status', 'Issues'],
tablefmt='simple'
))
else:
for r in controller_results:
status_icon = "" if r['status'] == 'Running' and not r['issues'] else ""
print(f"{status_icon} {r['controller']}: {r['status']} ({len(r['issues'])} issues)")
# Resource health
if resource_results:
print("\n📦 Flux Resources:\n")
if tabulate:
resource_table = []
for r in resource_results:
status_icon = "" if r['ready'] == 'True' and not r['issues'] else ""
resource_table.append([
status_icon,
r['type'],
r['name'],
r['namespace'],
r['ready'],
len(r['issues'])
])
print(tabulate(
resource_table,
headers=['', 'Type', 'Name', 'Namespace', 'Ready', 'Issues'],
tablefmt='simple'
))
else:
for r in resource_results:
status_icon = "" if r['ready'] == 'True' and not r['issues'] else ""
print(f"{status_icon} {r['type']}/{r['name']}: {r['ready']} ({len(r['issues'])} issues)")
# Detailed issues
all_results = controller_results + resource_results
issues_found = [r for r in all_results if r.get('issues')]
if issues_found:
print("\n🔍 Detailed Issues:\n")
for r in issues_found:
print(f"{r.get('controller') or r.get('type')}/{r.get('name', 'N/A')}:")
for issue in r['issues']:
print(f"{issue}")
if r.get('recommendations'):
print(" Recommendations:")
for rec in r['recommendations']:
print(f"{rec}")
print()
else:
print("\n✅ No issues found!")
def main():
parser = argparse.ArgumentParser(
description='Check Flux CD health and diagnose reconciliation issues (Flux v2.7+ compatible)',
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Check Flux controllers and all resources
python3 check_flux_health.py
# Check specific namespace
python3 check_flux_health.py --namespace my-app
# Check only GitRepositories
python3 check_flux_health.py --type gitrepositories
# Check OCI repositories (Flux v2.6+)
python3 check_flux_health.py --type ocirepositories
# Output as JSON
python3 check_flux_health.py --json
Flux v2.7+ Features:
- OCI artifact support (GA in v2.6)
- Image automation (GA in v2.7)
- Source-watcher component
- OpenTelemetry tracing
"""
)
parser.add_argument('--namespace', default='flux-system', help='Flux namespace (default: flux-system)')
parser.add_argument('--type', help='Check specific resource type only')
parser.add_argument('--kubeconfig', help='Path to kubeconfig file')
parser.add_argument('--json', action='store_true', help='Output as JSON')
args = parser.parse_args()
try:
checker = FluxHealthChecker(namespace=args.namespace, kubeconfig=args.kubeconfig)
# Check controllers
controller_results = checker.check_flux_controllers()
# Check resources
resource_results = []
resource_types = [args.type] if args.type else [
'gitrepositories',
'ocirepositories',
'helmrepositories',
'kustomizations',
'helmreleases',
'imageupdateautomations',
]
for resource_type in resource_types:
resources = checker.get_flux_resources(resource_type)
for resource in resources:
result = checker.check_resource_health(resource, resource_type)
resource_results.append(result)
if args.json:
print(json.dumps({
'controllers': controller_results,
'resources': resource_results
}, indent=2))
else:
checker.print_summary(resource_results, controller_results)
except KeyboardInterrupt:
print("\n\nInterrupted by user")
sys.exit(1)
except Exception as e:
print(f"❌ Error: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
if __name__ == '__main__':
main()
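A complementary check the script does not perform is reconciliation staleness: a source can still report Ready while its artifact has quietly stopped updating. The sketch below compares `spec.interval` against `status.artifact.lastUpdateTime` (both real Flux CRD fields); the 2x grace factor is an assumption, not a Flux convention.

```python
import re
from datetime import datetime, timezone


def interval_seconds(interval: str) -> int:
    """Parse a Flux duration like '5m', '1h30m', or '90s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r"(\d+)([hms])", interval))


def is_overdue(last_update: str, interval: str, now: datetime, grace: float = 2.0) -> bool:
    """True if the last artifact update is older than grace * interval."""
    updated = datetime.fromisoformat(last_update.replace("Z", "+00:00"))
    return (now - updated).total_seconds() > grace * interval_seconds(interval)
```

For example, a GitRepository with a 10-minute interval whose artifact is 25 minutes old would be flagged as overdue.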


@@ -0,0 +1,150 @@
#!/usr/bin/env python3
"""
Validate Flux OCI artifact references and verify signatures.
Supports Flux v2.6+ OCI artifacts with cosign/notation verification.
"""
import argparse
import sys
import subprocess
import json
try:
from kubernetes import client, config
except ImportError:
print("⚠️ 'kubernetes' not found. Install with: pip install kubernetes")
sys.exit(1)
def check_oci_repository(name: str, namespace: str = 'flux-system'):
"""Check OCIRepository resource status."""
try:
config.load_kube_config()
api = client.CustomObjectsApi()
oci_repo = api.get_namespaced_custom_object(
group='source.toolkit.fluxcd.io',
version='v1',  # OCIRepository API is v1 since Flux v2.6
namespace=namespace,
plural='ocirepositories',
name=name
)
status = oci_repo.get('status', {})
conditions = status.get('conditions', [])
ready = next((c for c in conditions if c['type'] == 'Ready'), None)
print(f"📦 OCIRepository: {name}")
print(f" Ready: {ready.get('status') if ready else 'Unknown'}")
print(f" Message: {ready.get('message', 'N/A') if ready else 'N/A'}")
# Check artifact
artifact = status.get('artifact')
if artifact:
print(f" Artifact: {artifact.get('revision', 'N/A')}")
print(f" Digest: {artifact.get('digest', 'N/A')}")
else:
print(" ⚠️ No artifact available")
# Check verification
spec = oci_repo.get('spec', {})
if spec.get('verify'):
print(" ✓ Signature verification enabled")
provider = spec['verify'].get('provider', 'cosign')
print(f" Provider: {provider}")
else:
print(" ⚠️ No signature verification")
return ready.get('status') == 'True' if ready else False
except Exception as e:
print(f"❌ Error checking OCIRepository: {e}")
return False
def verify_oci_artifact(image: str, provider: str = 'cosign'):
"""Verify OCI artifact signature."""
print(f"\n🔐 Verifying {image} with {provider}...\n")
if provider == 'cosign':
try:
result = subprocess.run(
['cosign', 'verify', image],
capture_output=True,
text=True
)
if result.returncode == 0:
print("✅ Signature verification successful")
return True
else:
print(f"❌ Verification failed: {result.stderr}")
return False
except FileNotFoundError:
print("⚠️ cosign not found. Install: https://github.com/sigstore/cosign")
return False
elif provider == 'notation':
try:
result = subprocess.run(
['notation', 'verify', image],
capture_output=True,
text=True
)
if result.returncode == 0:
print("✅ Signature verification successful")
return True
else:
print(f"❌ Verification failed: {result.stderr}")
return False
except FileNotFoundError:
print("⚠️ notation not found. Install: https://notaryproject.dev")
return False
def main():
parser = argparse.ArgumentParser(
description='Validate Flux OCI artifacts and verify signatures',
epilog="""
Examples:
# Check OCIRepository status
python3 oci_artifact_checker.py --name my-app-oci --namespace flux-system
# Verify OCI artifact signature with cosign
python3 oci_artifact_checker.py --verify ghcr.io/org/app:v1.0.0
# Verify with notation
python3 oci_artifact_checker.py --verify myregistry.io/app:latest --provider notation
Requirements:
- kubectl configured for cluster access
- cosign (for signature verification)
- notation (for notation verification)
Flux v2.6+ OCI Features:
- OCIRepository for Helm charts and Kustomize overlays
- Signature verification with cosign or notation
- Digest pinning for immutability
"""
)
parser.add_argument('--name', help='OCIRepository name')
parser.add_argument('--namespace', default='flux-system', help='Namespace')
parser.add_argument('--verify', help='OCI image to verify')
parser.add_argument('--provider', choices=['cosign', 'notation'], default='cosign',
help='Verification provider')
args = parser.parse_args()
if args.name:
check_oci_repository(args.name, args.namespace)
if args.verify:
verify_oci_artifact(args.verify, args.provider)
if not args.name and not args.verify:
print("❌ Specify --name or --verify")
sys.exit(1)
if __name__ == '__main__':
main()
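The epilog above mentions digest pinning for immutability; a quick check that a reference is pinned by `@sha256:` digest rather than a mutable tag pairs well with signature verification. This is a simple string heuristic, not part of any Flux API.

```python
def pinned_by_digest(image: str) -> bool:
    """True when an OCI reference is pinned by digest (immutable)."""
    return "@sha256:" in image


def pinning_report(image: str) -> str:
    """One-line advisory for a given OCI image reference."""
    if pinned_by_digest(image):
        return f"OK: {image} is digest-pinned"
    return f"WARN: {image} uses a mutable tag; consider pinning by digest"
```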


@@ -0,0 +1,88 @@
#!/usr/bin/env python3
"""
Validate environment promotion workflows (dev → staging → production).
Checks that changes are promoted through environments in the correct order.
"""
import argparse
import sys
import subprocess
from pathlib import Path
def get_git_diff(ref1: str, ref2: str, path: str = ".") -> str:
"""Get git diff between two refs."""
try:
result = subprocess.run(
['git', 'diff', f'{ref1}...{ref2}', '--', path],
capture_output=True,
text=True,
check=True
)
return result.stdout
except subprocess.CalledProcessError as e:
print(f"❌ Git diff failed: {e}")
sys.exit(1)
def validate_promotion(source_env: str, target_env: str, repo_path: str):
"""Validate that changes exist in source before promoting to target."""
print(f"🔍 Validating promotion: {source_env}{target_env}\n")
# Check that source and target directories exist
source_path = Path(repo_path) / f"environments/{source_env}"
target_path = Path(repo_path) / f"environments/{target_env}"
if not source_path.exists():
print(f"❌ Source environment not found: {source_path}")
sys.exit(1)
if not target_path.exists():
print(f"❌ Target environment not found: {target_path}")
sys.exit(1)
# Heuristic: recent direct changes to the target environment may indicate
# the promotion path (source → target) was bypassed
diff = get_git_diff('HEAD~10', 'HEAD', str(target_path))
if diff:
print("⚠️ Recent changes detected in target environment")
print(f"   Verify changes were promoted from {source_env} first")
print("✅ Promotion path is valid")
print(f"\nNext steps:")
print(f"1. Review changes in {source_env}")
print(f"2. Test in {source_env} environment")
print(f"3. Copy changes to {target_env}")
print(f"4. Create PR for {target_env} promotion")
def main():
parser = argparse.ArgumentParser(
description='Validate environment promotion workflows',
epilog="""
Examples:
# Validate dev → staging promotion
python3 promotion_validator.py --source dev --target staging
# Validate staging → production promotion
python3 promotion_validator.py --source staging --target production
Checks:
- Environment directories exist
- Changes flow through proper promotion path
- No direct changes to production
"""
)
parser.add_argument('--source', required=True, help='Source environment (dev/staging)')
parser.add_argument('--target', required=True, help='Target environment (staging/production)')
parser.add_argument('--repo-path', default='.', help='Repository path')
args = parser.parse_args()
validate_promotion(args.source, args.target, args.repo_path)
if __name__ == '__main__':
main()
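Beyond checking that environment directories exist, a concrete promotion signal is the `images:` transformer in each environment's `kustomization.yaml`. The sketch below diffs `newTag` values between environments with a naive regex; this assumes tags live in the standard `images:` block, and a real implementation would parse the YAML instead.

```python
import re


def image_tags(kustomization_text: str) -> dict:
    """Map image name -> newTag from a kustomization's images: transformer."""
    pattern = r"-\s*name:\s*(\S+)\s*\n\s*newTag:\s*(\S+)"
    return {name: tag.strip('"') for name, tag in re.findall(pattern, kustomization_text)}


def unpromoted(source: dict, target: dict) -> dict:
    """Images whose tag in source differs from target (promotion candidates)."""
    return {img: (tag, target.get(img))
            for img, tag in source.items() if target.get(img) != tag}
```

Running `unpromoted(image_tags(dev_text), image_tags(staging_text))` lists exactly which image bumps still need to flow to the target environment.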

scripts/secret_audit.py Normal file

@@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""
Audit secrets management in GitOps repositories.
Checks for plain secrets, SOPS, Sealed Secrets, and External Secrets Operator.
"""
import argparse
import sys
from pathlib import Path
from typing import List, Dict
try:
import yaml
except ImportError:
print("⚠️ 'pyyaml' not found. Install with: pip install pyyaml")
sys.exit(1)
class SecretAuditor:
def __init__(self, repo_path: str):
self.repo_path = Path(repo_path)
self.findings = []
def audit(self) -> Dict:
"""Run all secret audits."""
print(f"🔐 Auditing secrets in: {self.repo_path}\n")
self._check_plain_secrets()
self._check_sops_config()
self._check_sealed_secrets()
self._check_external_secrets()
return self._generate_report()
def _check_plain_secrets(self):
"""Check for plain Kubernetes secrets."""
secret_files = list(self.repo_path.rglob('*.yaml')) + list(self.repo_path.rglob('*.yml'))
plain_secrets = []
for sfile in secret_files:
if '.git' in sfile.parts:
continue
try:
with open(sfile) as f:
for doc in yaml.safe_load_all(f):
if doc and doc.get('kind') == 'Secret':
# Skip service account tokens
if doc.get('type') == 'kubernetes.io/service-account-token':
continue
# Treat as plain unless it carries top-level SOPS metadata (encrypted in place)
if 'sops' not in doc:
plain_secrets.append(sfile.relative_to(self.repo_path))
except (yaml.YAMLError, OSError):
continue  # skip unreadable or non-YAML files
if plain_secrets:
self.findings.append({
'severity': 'HIGH',
'type': 'Plain Secrets',
'count': len(plain_secrets),
'message': f"Found {len(plain_secrets)} plain Kubernetes Secret manifests",
'recommendation': 'Encrypt with SOPS, Sealed Secrets, or use External Secrets Operator',
'files': [str(f) for f in plain_secrets[:5]]
})
else:
print("✅ No plain secrets found in Git")
def _check_sops_config(self):
"""Check SOPS configuration."""
sops_config = self.repo_path / '.sops.yaml'
if sops_config.exists():
print("✅ SOPS config found (.sops.yaml)")
with open(sops_config) as f:
config = yaml.safe_load(f)
# Check for age keys
if 'age' in str(config):
print(" ✓ Using age encryption (recommended)")
elif 'pgp' in str(config):
print(" ⚠️ Using PGP (consider migrating to age)")
self.findings.append({
'severity': 'LOW',
'type': 'SOPS Configuration',
'message': 'Using PGP encryption',
'recommendation': 'Migrate to age for better security and simplicity'
})
else:
encrypted_files = list(self.repo_path.rglob('*.enc.yaml'))
if encrypted_files:
print("⚠️ SOPS encrypted files found but no .sops.yaml config")
self.findings.append({
'severity': 'MEDIUM',
'type': 'SOPS Configuration',
'message': 'Encrypted files without .sops.yaml',
'recommendation': 'Add .sops.yaml for consistent encryption settings'
})
def _check_sealed_secrets(self):
"""Check Sealed Secrets usage."""
sealed_secrets = list(self.repo_path.rglob('*sealedsecret*.yaml'))
if sealed_secrets:
print(f"✅ Found {len(sealed_secrets)} Sealed Secrets")
def _check_external_secrets(self):
"""Check External Secrets Operator usage."""
eso_files = list(self.repo_path.rglob('*externalsecret*.yaml')) + \
list(self.repo_path.rglob('*secretstore*.yaml'))
if eso_files:
print(f"✅ Found {len(eso_files)} External Secrets manifests")
def _generate_report(self) -> Dict:
"""Generate audit report."""
return {
'findings': self.findings,
'total_issues': len(self.findings),
'high_severity': len([f for f in self.findings if f['severity'] == 'HIGH']),
'medium_severity': len([f for f in self.findings if f['severity'] == 'MEDIUM']),
'low_severity': len([f for f in self.findings if f['severity'] == 'LOW'])
}
def main():
parser = argparse.ArgumentParser(
description='Audit secrets management in GitOps repositories',
epilog="""
Examples:
# Audit current directory
python3 secret_audit.py .
# Audit specific repo
python3 secret_audit.py /path/to/gitops-repo
Checks:
- Plain Kubernetes Secrets in Git (HIGH risk)
- SOPS configuration and encryption method
- Sealed Secrets usage
- External Secrets Operator usage
"""
)
parser.add_argument('repo_path', help='Path to GitOps repository')
args = parser.parse_args()
auditor = SecretAuditor(args.repo_path)
report = auditor.audit()
# Print summary
print("\n" + "="*60)
print("📊 Audit Summary")
print("="*60)
if report['findings']:
print(f"\n🔴 HIGH: {report['high_severity']}")
print(f"🟡 MEDIUM: {report['medium_severity']}")
print(f"🟢 LOW: {report['low_severity']}")
print("\n📋 Findings:\n")
for f in report['findings']:
icon = {'HIGH': '🔴', 'MEDIUM': '🟡', 'LOW': '🟢'}[f['severity']]
print(f"{icon} [{f['severity']}] {f['type']}")
print(f" {f['message']}")
print(f"{f['recommendation']}")
if 'files' in f and f['files']:
print(f" Files: {', '.join(f['files'][:3])}")
print()
else:
print("\n✅ No security issues found!")
sys.exit(1 if report['high_severity'] > 0 else 0)
if __name__ == '__main__':
main()
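The auditor flags plain Secret manifests by kind; a complementary heuristic is byte entropy of decoded `data:` values, which separates real credentials from placeholder strings. The 4.5 bits-per-byte threshold and 16-byte minimum below are assumptions (a rule of thumb, not a standard).

```python
import base64
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte; random keys score near 8, prose much lower."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_like_secret(b64_value: str, threshold: float = 4.5) -> bool:
    """Heuristic: decoded Secret data with high byte entropy is likely a real credential."""
    try:
        raw = base64.b64decode(b64_value, validate=True)
    except Exception:
        return False
    return len(raw) >= 16 and shannon_entropy(raw) > threshold
```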


@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Detect configuration drift between Git and Kubernetes cluster.
Supports both ArgoCD and Flux CD deployments.
"""
import argparse
import sys
import subprocess
import json
from typing import Dict, List, Optional
try:
from kubernetes import client, config
except ImportError:
print("⚠️ 'kubernetes' library not found. Install with: pip install kubernetes")
sys.exit(1)
try:
import yaml
except ImportError:
print("⚠️ 'pyyaml' library not found. Install with: pip install pyyaml")
sys.exit(1)
def run_command(cmd: List[str]) -> tuple:
"""Run shell command and return output."""
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
return result.stdout, None
except subprocess.CalledProcessError as e:
return None, e.stderr
def check_argocd_drift(app_name: Optional[str] = None):
"""Check drift using ArgoCD CLI."""
print("🔍 Checking ArgoCD drift...\n")
if app_name:
check_single_app_drift(app_name)
return
# No app specified: check every application
stdout, err = run_command(['argocd', 'app', 'list', '-o', 'json'])
if err:
print(f"❌ Failed to list apps: {err}")
return
for app in json.loads(stdout):
check_single_app_drift(app['metadata']['name'])
def check_single_app_drift(app_name: str):
"""Check drift for a single ArgoCD application."""
# `argocd app diff` exits 0 when in sync, 1 when a diff exists, and 2 on
# errors, so branch on the return code instead of parsing stdout
result = subprocess.run(['argocd', 'app', 'diff', app_name], capture_output=True, text=True)
if result.returncode == 0:
print(f"✅ {app_name}: No drift detected")
elif result.returncode == 1:
print(f"⚠️ {app_name}: Drift detected")
print(f"   Run: argocd app sync {app_name}")
else:
print(f"❌ {app_name}: Error checking drift")
print(f"   {result.stderr}")
def check_flux_drift(namespace: str = 'flux-system'):
"""Check drift using Flux CLI."""
print("🔍 Checking Flux drift...\n")
# Check kustomizations
stdout, err = run_command(['flux', 'get', 'kustomizations', '-n', namespace, '--status-selector', 'ready=false'])
if stdout:
print("⚠️ Out-of-sync Kustomizations:")
print(stdout)
else:
print("✅ All Kustomizations synced")
# Check helmreleases
stdout, err = run_command(['flux', 'get', 'helmreleases', '-n', namespace, '--status-selector', 'ready=false'])
if stdout:
print("\n⚠️ Out-of-sync HelmReleases:")
print(stdout)
else:
print("✅ All HelmReleases synced")
def main():
parser = argparse.ArgumentParser(
description='Detect configuration drift between Git and cluster',
epilog="""
Examples:
# Check ArgoCD drift
python3 sync_drift_detector.py --argocd
# Check specific ArgoCD app
python3 sync_drift_detector.py --argocd --app my-app
# Check Flux drift
python3 sync_drift_detector.py --flux
Requirements:
- argocd CLI (for ArgoCD mode)
- flux CLI (for Flux mode)
- kubectl configured
"""
)
parser.add_argument('--argocd', action='store_true', help='Check ArgoCD drift')
parser.add_argument('--flux', action='store_true', help='Check Flux drift')
parser.add_argument('--app', help='Specific ArgoCD application name')
parser.add_argument('--namespace', default='flux-system', help='Flux namespace')
args = parser.parse_args()
if not args.argocd and not args.flux:
print("❌ Specify --argocd or --flux")
sys.exit(1)
try:
if args.argocd:
check_argocd_drift(args.app)
if args.flux:
check_flux_drift(args.namespace)
except KeyboardInterrupt:
print("\n\nInterrupted")
sys.exit(1)
except Exception as e:
print(f"❌ Error: {e}")
sys.exit(1)
if __name__ == '__main__':
main()
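For scripting around `argocd app list`, a small parser over the JSON output makes drift reporting testable without invoking the CLI. The sketch below only assumes the documented `app list -o json` shape (a list of Application objects).

```python
import json


def out_of_sync(app_list_json: str) -> list:
    """Names of applications not reporting Synced, from `argocd app list -o json`."""
    apps = json.loads(app_list_json) or []
    return [
        a["metadata"]["name"]
        for a in apps
        if a.get("status", {}).get("sync", {}).get("status") != "Synced"
    ]
```

Applications with a missing or unreadable sync status are deliberately included, since "unknown" is worth investigating in a drift report.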


@@ -0,0 +1,299 @@
#!/usr/bin/env python3
"""
Validate GitOps repository structure, manifests, and best practices.
Supports both monorepo and polyrepo patterns with Kustomize and Helm.
"""
import argparse
import sys
from pathlib import Path
from typing import Dict, List

try:
    import yaml
except ImportError:
    print("⚠️ Warning: 'pyyaml' library not found. Install with: pip install pyyaml")
    sys.exit(1)


class GitOpsRepoValidator:
    def __init__(self, repo_path: str):
        self.repo_path = Path(repo_path).resolve()
        if not self.repo_path.exists():
            raise ValueError(f"Path does not exist: {repo_path}")
        self.issues = []
        self.warnings = []
        self.recommendations = []

    def validate(self) -> Dict[str, List[str]]:
        """Run all validations."""
        print(f"🔍 Validating GitOps repository: {self.repo_path}\n")
        # Structure validations
        self._check_repository_structure()
        self._check_kustomization_files()
        self._check_yaml_syntax()
        self._check_best_practices()
        self._check_secrets_management()
        return {
            'issues': self.issues,
            'warnings': self.warnings,
            'recommendations': self.recommendations
        }

    def _check_repository_structure(self):
        """Check repository structure and organization."""
        print("📁 Checking repository structure...")
        # Check for common patterns
        has_apps = (self.repo_path / 'apps').exists()
        has_clusters = (self.repo_path / 'clusters').exists()
        has_infrastructure = (self.repo_path / 'infrastructure').exists()
        has_base = (self.repo_path / 'base').exists()
        has_overlays = (self.repo_path / 'overlays').exists()
        if not any([has_apps, has_clusters, has_infrastructure, has_base, has_overlays]):
            self.warnings.append("No standard directory structure detected (apps/, clusters/, infrastructure/, base/)")
            self.recommendations.append("Consider organizing with: apps/ (applications), infrastructure/ (cluster config), clusters/ (per-cluster)")
        # Check for Flux bootstrap (if Flux)
        flux_system = self.repo_path / 'clusters' / 'flux-system'
        if flux_system.exists():
            print(" ✓ Flux bootstrap detected")
            if not (flux_system / 'gotk-components.yaml').exists():
                self.warnings.append("Flux bootstrap directory exists but gotk-components.yaml not found")
        # Check for ArgoCD bootstrap (if ArgoCD)
        argocd_patterns = list(self.repo_path.rglob('*argocd-*.yaml'))
        if argocd_patterns:
            print(" ✓ ArgoCD manifests detected")

    def _check_kustomization_files(self):
        """Check Kustomization files for validity."""
        print("\n🔧 Checking Kustomization files...")
        kustomization_files = list(self.repo_path.rglob('kustomization.yaml')) + \
                              list(self.repo_path.rglob('kustomization.yml'))
        if not kustomization_files:
            self.warnings.append("No kustomization.yaml files found")
            return
        print(f" Found {len(kustomization_files)} kustomization files")
        for kfile in kustomization_files:
            try:
                with open(kfile, 'r') as f:
                    content = yaml.safe_load(f)
                if not content:
                    self.issues.append(f"Empty kustomization file: {kfile.relative_to(self.repo_path)}")
                    continue
                # Check for required fields
                if 'resources' not in content and 'bases' not in content and 'components' not in content:
                    self.warnings.append(f"Kustomization has no resources/bases: {kfile.relative_to(self.repo_path)}")
                # Check for deprecated 'bases' (removed in Kustomize 5.0)
                if 'bases' in content:
                    self.warnings.append(f"Using deprecated 'bases' field: {kfile.relative_to(self.repo_path)}")
                    self.recommendations.append("Migrate 'bases:' to 'resources:' (Kustomize 5.0+)")
            except yaml.YAMLError as e:
                self.issues.append(f"Invalid YAML in {kfile.relative_to(self.repo_path)}: {e}")
            except Exception as e:
                self.issues.append(f"Error reading {kfile.relative_to(self.repo_path)}: {e}")

    def _check_yaml_syntax(self):
        """Check YAML files for syntax errors."""
        print("\n📝 Checking YAML syntax...")
        yaml_files = list(self.repo_path.rglob('*.yaml')) + list(self.repo_path.rglob('*.yml'))
        # Exclude certain directories
        exclude_dirs = {'.git', 'node_modules', 'vendor', '.github'}
        yaml_files = [f for f in yaml_files if not any(ex in f.parts for ex in exclude_dirs)]
        syntax_errors = 0
        for yfile in yaml_files:
            try:
                with open(yfile, 'r') as f:
                    # safe_load_all is lazy; materialize it so parse errors surface
                    list(yaml.safe_load_all(f))
            except yaml.YAMLError as e:
                self.issues.append(f"YAML syntax error in {yfile.relative_to(self.repo_path)}: {e}")
                syntax_errors += 1
        if syntax_errors == 0:
            print(f" ✓ All {len(yaml_files)} YAML files are valid")
        else:
            print(f"❌ {syntax_errors} YAML files have syntax errors")

    def _check_best_practices(self):
        """Check GitOps best practices."""
        print("\n✨ Checking best practices...")
        # Check for namespace definitions
        namespace_files = list(self.repo_path.rglob('*namespace*.yaml'))
        if not namespace_files:
            self.recommendations.append("No namespace definitions found. Consider explicitly defining namespaces.")
        # Check for image tags (not 'latest')
        all_yamls = list(self.repo_path.rglob('*.yaml')) + list(self.repo_path.rglob('*.yml'))
        latest_tag_count = 0
        for yfile in all_yamls:
            try:
                with open(yfile, 'r') as f:
                    content = f.read()
                if ':latest' in content or 'image: latest' in content:
                    latest_tag_count += 1
            except OSError:
                pass
        if latest_tag_count > 0:
            self.warnings.append(f"Found {latest_tag_count} files using ':latest' image tag")
            self.recommendations.append("Pin image tags to specific versions or digests for reproducibility")
        # Check for resource limits
        deployment_files = [f for f in all_yamls if 'deployment' in str(f).lower() or 'statefulset' in str(f).lower()]
        missing_limits = 0
        for dfile in deployment_files:
            try:
                with open(dfile, 'r') as f:
                    for doc in yaml.safe_load_all(f):
                        if not doc or doc.get('kind') not in ['Deployment', 'StatefulSet']:
                            continue
                        containers = doc.get('spec', {}).get('template', {}).get('spec', {}).get('containers', [])
                        for container in containers:
                            if 'resources' not in container or 'limits' not in container.get('resources', {}):
                                missing_limits += 1
                                break
            except (OSError, yaml.YAMLError):
                pass
        if missing_limits > 0:
            self.recommendations.append(f"{missing_limits} Deployments/StatefulSets missing resource limits")

    def _check_secrets_management(self):
        """Check for secrets management practices."""
        print("\n🔐 Checking secrets management...")
        # Check for plain Kubernetes secrets
        secret_files = list(self.repo_path.rglob('*secret*.yaml'))
        plain_secrets = []
        for sfile in secret_files:
            try:
                with open(sfile, 'r') as f:
                    for doc in yaml.safe_load_all(f):
                        # A kind of 'Secret' means plaintext data in Git;
                        # SealedSecret and ExternalSecret manifests use their own kinds.
                        if doc and doc.get('kind') == 'Secret' and doc.get('type') != 'kubernetes.io/service-account-token':
                            plain_secrets.append(sfile.relative_to(self.repo_path))
            except (OSError, yaml.YAMLError):
                pass
        if plain_secrets:
            self.issues.append(f"Found {len(plain_secrets)} plain Kubernetes Secret manifests in Git")
            self.recommendations.append("Use Sealed Secrets, External Secrets Operator, or SOPS for secrets management")
            for s in plain_secrets[:3]:  # Show first 3
                self.issues.append(f" - {s}")
        # Check for SOPS configuration
        sops_config = self.repo_path / '.sops.yaml'
        if sops_config.exists():
            print(" ✓ SOPS configuration found (.sops.yaml)")
        # Check for Sealed Secrets
        sealed_secrets = list(self.repo_path.rglob('*sealedsecret*.yaml'))
        if sealed_secrets:
            print(f" ✓ Found {len(sealed_secrets)} SealedSecret manifests")
        # Check for External Secrets
        external_secrets = [f for f in self.repo_path.rglob('*.yaml')
                            if 'externalsecret' in str(f).lower() or 'secretstore' in str(f).lower()]
        if external_secrets:
            print(f" ✓ Found {len(external_secrets)} External Secrets manifests")
        if not sops_config.exists() and not sealed_secrets and not external_secrets and plain_secrets:
            self.recommendations.append("No secrets management solution detected. Consider implementing Sealed Secrets, ESO, or SOPS+age")


def main():
    parser = argparse.ArgumentParser(
        description='Validate GitOps repository structure and manifests',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Validate current directory
  python3 validate_gitops_repo.py .

  # Validate specific repository
  python3 validate_gitops_repo.py /path/to/gitops-repo

  # Show only issues (no warnings)
  python3 validate_gitops_repo.py . --errors-only

Checks:
  - Repository structure (monorepo/polyrepo patterns)
  - Kustomization file validity
  - YAML syntax errors
  - Best practices (image tags, resource limits, namespaces)
  - Secrets management (detect plain secrets, check for SOPS/Sealed Secrets/ESO)
"""
    )
    parser.add_argument('repo_path', help='Path to GitOps repository')
    parser.add_argument('--errors-only', action='store_true', help='Show only errors, not warnings')
    args = parser.parse_args()
    try:
        validator = GitOpsRepoValidator(args.repo_path)
        results = validator.validate()
        # Print summary
        print("\n" + "=" * 60)
        print("📊 Validation Summary")
        print("=" * 60)
        if results['issues']:
            print(f"\n❌ Issues ({len(results['issues'])}):")
            for issue in results['issues']:
                print(f"{issue}")
        if results['warnings'] and not args.errors_only:
            print(f"\n⚠️ Warnings ({len(results['warnings'])}):")
            for warning in results['warnings']:
                print(f"{warning}")
        if results['recommendations'] and not args.errors_only:
            print(f"\n💡 Recommendations ({len(results['recommendations'])}):")
            for rec in results['recommendations']:
                print(f"{rec}")
        if not results['issues'] and not results['warnings']:
            print("\n✅ No issues found! Repository structure looks good.")
        # Exit code: non-zero only when hard issues were found
        sys.exit(1 if results['issues'] else 0)
    except KeyboardInterrupt:
        print("\n\nInterrupted by user")
        sys.exit(1)
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)


if __name__ == '__main__':
    main()
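One subtlety in the resource-limits check above: `container.get('resources', {})` lets the missing-key and missing-`limits` cases collapse into a single membership test. A minimal stdlib-only sketch of that traversal (the helper name `missing_resource_limits` and the sample manifests are illustrative, not part of the script):

```python
from typing import Any, Dict

WORKLOAD_KINDS = {'Deployment', 'StatefulSet'}

def missing_resource_limits(doc: Dict[str, Any]) -> bool:
    """True if any container in a workload manifest lacks resources.limits."""
    if not doc or doc.get('kind') not in WORKLOAD_KINDS:
        return False
    containers = (doc.get('spec', {})
                     .get('template', {})
                     .get('spec', {})
                     .get('containers', []))
    # get('resources', {}) returns {} when the key is absent,
    # so "no resources at all" and "resources without limits" test the same way
    return any('limits' not in c.get('resources', {}) for c in containers)

good = {'kind': 'Deployment',
        'spec': {'template': {'spec': {'containers': [
            {'name': 'app', 'resources': {'limits': {'cpu': '500m'}}}]}}}}
bad = {'kind': 'Deployment',
       'spec': {'template': {'spec': {'containers': [{'name': 'app'}]}}}}
print(missing_resource_limits(good), missing_resource_limits(bad))  # False True
```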