Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 17:51:15 +08:00
commit a91d4d5a1c
25 changed files with 4094 additions and 0 deletions


@@ -0,0 +1,243 @@
# ArgoCD vs Flux: Comprehensive Comparison (2024-2025)
## Current Versions (October 2025)
- **ArgoCD**: v3.1.9 (stable), v3.2.0-rc4 (release candidate)
- **Flux**: v2.7.1 (latest)
## Quick Decision Matrix
| Criteria | Choose ArgoCD | Choose Flux |
|----------|---------------|-------------|
| **Primary Focus** | Developer experience, UI | Platform engineering, modularity |
| **Team Size** | Medium-large teams | Small teams, platform engineers |
| **UI Required** | Yes | No (CLI-driven) |
| **Complexity** | Simpler onboarding | Steeper learning curve |
| **Customization** | Less modular | Highly modular |
| **Multi-tenancy** | Built-in with Projects | Manual configuration |
| **Best For** | Application teams, demos | Infrastructure teams, advanced users |
## Key Differences
### Architecture
**ArgoCD**:
- Monolithic design with integrated components
- Web UI, API server, application controller in one system
- Centralized control plane
**Flux**:
- Modular microservices architecture
- Separate controllers: source, kustomize, helm, notification, image-automation
- Distributed reconciliation
### User Experience
**ArgoCD**:
- Rich web UI for visualization and management
- GUI dashboard for deployment, syncing, troubleshooting
- Easier onboarding for developers
- Better for demos and presentations
**Flux**:
- CLI-driven (flux CLI + kubectl)
- No built-in UI (can integrate with Weave GitOps UI separately)
- Requires comfort with command-line tools
- Steeper learning curve
### Application Management
**ArgoCD 3.x**:
- Application and ApplicationSet CRDs
- App-of-apps pattern for organizing applications
- Fine-grained RBAC (new in v3.0)
- Annotation-based tracking (default in v3.0, changed from labels)
**Flux 2.7**:
- Kustomization and HelmRelease CRDs
- No built-in grouping mechanism
- RBAC through Kubernetes RBAC
- Label-based tracking
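
To make the CRD difference concrete, here is a minimal sketch of the same deployment expressed both ways. The repository URL, paths, and the name `my-app` are placeholders chosen for illustration, not values from a real setup.

```yaml
# ArgoCD: a single Application points at a path in Git
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo   # placeholder
    targetRevision: main
    path: apps/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
# Flux: a GitRepository source plus a Kustomization that applies a path from it
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/org/repo         # placeholder
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: my-app
  path: ./apps/my-app
  prune: true
  targetNamespace: my-app
```

ArgoCD bundles source and sync settings in one object, while Flux splits fetching (source-controller) from applying (kustomize-controller), which is where the modularity described above comes from.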
### Multi-Cluster Support
**ArgoCD ApplicationSets**:
- Cluster generator for auto-discovery
- Matrix generator for cluster x app combinations
- Hub-and-spoke pattern (one ArgoCD manages multiple clusters)
- Reported deployment time about 83% shorter than manual rollout (30 min → 5 min)
**Flux Multi-Tenancy**:
- Manual cluster configuration
- Separate Flux installations per cluster or shared
- More flexible but requires more setup
- No built-in cluster generator
### Secrets Management
Both support:
- Sealed Secrets
- External Secrets Operator
- SOPS
**ArgoCD 3.0 Change**:
- Now explicitly endorses secrets operators
- Cautions against config management plugins for secrets
- Better integration with ESO
**Flux**:
- Native SOPS integration with age encryption
- Decryption happens in-cluster
- .sops.yaml configuration support
### Progressive Delivery
**ArgoCD + Argo Rollouts**:
- Separate project but tight integration
- Rich UI for visualizing rollouts
- Supports canary, blue-green, A/B testing
- Metric analysis with Prometheus, Datadog, etc.
**Flux + Flagger**:
- Flagger as companion project
- CLI-driven
- Supports canary, blue-green, A/B testing
- Metric analysis with Prometheus, Datadog, etc.
## Feature Comparison
| Feature | ArgoCD 3.x | Flux 2.7 |
|---------|-----------|----------|
| **Web UI** | ✅ Built-in | ❌ (3rd party available) |
| **CLI** | ✅ argocd | ✅ flux |
| **Git Sources** | ✅ | ✅ |
| **OCI Artifacts** | ⚠️ Limited (e.g. Helm charts from OCI registries) | ✅ (GA in v2.6) |
| **Helm Support** | ✅ | ✅ |
| **Kustomize** | ✅ (v5.7.0) | ✅ (v5.7.0) |
| **Multi-tenancy** | ✅ Projects | Manual |
| **Image Automation** | ⚠️ Via Image Updater | ✅ GA in v2.7 |
| **Notifications** | ✅ | ✅ |
| **RBAC** | ✅ Fine-grained (v3.0) | Kubernetes RBAC |
| **Progressive Delivery** | Argo Rollouts | Flagger |
| **Signature Verification** | ⚠️ Limited | ✅ cosign/notation |
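
As an illustration of the "Projects" entry in the multi-tenancy row, the following is a minimal AppProject sketch. The tenant name `team-a`, the repository pattern, and the namespace pattern are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a                  # hypothetical tenant
  namespace: argocd
spec:
  description: Applications owned by team A
  # Applications in this project may only pull from these repositories
  sourceRepos:
    - https://github.com/org/team-a-*   # placeholder pattern
  # ...and may only deploy to these cluster/namespace combinations
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a-*
  # Tenants may not create cluster-scoped resources
  clusterResourceWhitelist: []
```

An Application opts into these limits by setting `spec.project: team-a`.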
## Performance & Scale
**ArgoCD**:
- Can manage 1000+ applications per instance
- Better defaults in v3.0 (resource exclusions reduce API load)
- ApplicationSets reduce management overhead
**Flux**:
- Lighter resource footprint
- Better for large-scale monorepos
- Source-watcher (v2.7) improves reconciliation efficiency
## Community & Support
**ArgoCD**:
- CNCF Graduated project
- Large community, many contributors
- Akuity offers commercial support
- Annual ArgoCon conference
**Flux**:
- CNCF Graduated project
- Weaveworks shutdown (Feb 2024) but project remains strong
- Grafana Labs offers Grafana Cloud integration
- GitOpsCon events
## Version 3.0 Changes (ArgoCD)
**Breaking Changes**:
- Annotation-based tracking (default, was labels)
- RBAC logs enforcement (no longer optional)
- Removed legacy metrics (argocd_app_sync_status, etc.)
**New Features**:
- Fine-grained RBAC (per-resource permissions)
- Better defaults (resource exclusions for high-churn objects)
- Secrets operators endorsement
## Version 2.7 Changes (Flux)
**New Features**:
- Image automation GA
- ExternalArtifact and ArtifactGenerator APIs
- Source-watcher component
- OpenTelemetry tracing support
- CEL expressions for readiness
## Migration Considerations
### From ArgoCD → Flux
**Pros**:
- Lower resource consumption
- More modular architecture
- Better OCI support
- Native SOPS integration
**Cons**:
- Lose web UI
- More complex setup
- Manual multi-tenancy
**Effort**: Medium-High (2-4 weeks for large deployment)
### From Flux → ArgoCD
**Pros**:
- Gain web UI
- Easier multi-tenancy
- ApplicationSets for multi-cluster
- Better for teams new to GitOps
**Cons**:
- Higher resource consumption
- Less modular
- Limited OCI support
**Effort**: Medium (1-3 weeks)
## Recommendations by Use Case
### Choose ArgoCD if:
- ✅ Developer teams need visibility (UI required)
- ✅ Managing dozens of applications across teams
- ✅ Multi-tenancy with Projects model
- ✅ Fast onboarding is priority
- ✅ Need built-in RBAC with fine-grained control
### Choose Flux if:
- ✅ Platform engineering focus
- ✅ Infrastructure-as-code emphasis
- ✅ Using OCI artifacts extensively
- ✅ Want modular, composable architecture
- ✅ Team comfortable with CLI tools
- ✅ SOPS+age encryption requirement
### Use Both if:
- Different teams have different needs
- ArgoCD for app teams, Flux for infrastructure
- Separate concerns (apps vs infrastructure)
## Cost Considerations
**ArgoCD**:
- Higher memory/CPU usage (~500MB-1GB per instance)
- Commercial support available (Akuity)
**Flux**:
- Lower resource footprint (~200-400MB total)
- Grafana Cloud integration available
## Conclusion
**2024-2025 Recommendation**:
- **For most organizations**: Start with ArgoCD for ease of use
- **For platform teams**: Flux offers more control and modularity
- **For enterprises**: Consider ArgoCD for UI + Flux for infrastructure
- Both are production-ready CNCF Graduated projects
The choice depends more on team preferences and workflows than technical capability.


@@ -0,0 +1,160 @@
# GitOps Best Practices (2024-2025)
## CNCF GitOps Principles (OpenGitOps v1.0)
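1. **Declarative**: System desired state is expressed declaratively
2. **Versioned and Immutable**: Desired state is stored so that it is immutable and versioned, with complete history (typically Git)
3. **Pulled Automatically**: Software agents automatically pull the desired state from the source
4. **Continuously Reconciled**: Software agents continuously observe actual state and apply the desired state
Auditability follows from these principles: all changes are tracked in Git history.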
## Repository Organization
**DO**:
- Separate infrastructure from applications
- Use clear directory structure (apps/, infrastructure/, clusters/)
- Implement environment promotion (dev → staging → prod)
- Use Kustomize overlays for environment differences
**DON'T**:
- Commit secrets to Git (use SOPS/Sealed Secrets/ESO)
- Use `:latest` image tags (pin to specific versions)
- Make manual cluster changes (everything through Git)
- Skip testing in lower environments
## Security Best Practices
1. **Secrets**: Never plain text, use encryption or external stores
2. **RBAC**: Least privilege for GitOps controllers
3. **Image Security**: Pin to digests, scan for vulnerabilities
4. **Network Policies**: Restrict controller traffic
5. **Audit**: Enable audit logging
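
One way to apply the least-privilege item with Flux is to have a Kustomization impersonate a namespaced ServiceAccount instead of using the controller's own permissions. This is a sketch under assumptions: the `team-a` namespace, the `team-a-reconciler` ServiceAccount, and the source name are hypothetical.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: team-a-apps
  namespace: team-a
spec:
  interval: 10m
  serviceAccountName: team-a-reconciler   # hypothetical; reconciliation runs with this identity
  sourceRef:
    kind: GitRepository
    name: team-a-apps
  path: ./deploy
  prune: true
---
# Grant the ServiceAccount the built-in "edit" role in its own namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-reconciler
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: team-a-reconciler
    namespace: team-a
```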
## ArgoCD 3.x Specific
**Fine-Grained RBAC** (new in 3.0):
```yaml
p, role:dev, applications, *, dev/*, allow
p, role:dev, applications/resources, *, dev/*/Deployment/*, allow
```
**Resource Exclusions** (default in 3.0):
- Reduces API load
- Excludes high-churn resources (Endpoints, Leases)
**Annotation Tracking** (default):
- More reliable than labels
- Auto-migrates on sync
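
For installations that want to set the tracking method explicitly (for example, to opt in before upgrading, or to pin the behaviour), the setting lives in the `argocd-cm` ConfigMap. A minimal sketch, assuming ArgoCD is installed in the `argocd` namespace:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Accepted values: annotation, label, annotation+label
  application.resourceTrackingMethod: annotation
```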
## Flux 2.7 Specific
**OCI Artifacts** (GA in 2.6):
- Prefer OCI over Git for generated configs
- Use digest pinning for immutability
- Sign artifacts with cosign/notation
**Image Automation** (GA in 2.7):
- Automated image updates
- GitRepository write-back
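
A minimal sketch of the scanning half of image automation follows. The image name and semver range are placeholders, and the `v1` API version is an assumption based on the GA status noted above; clusters on earlier Flux releases serve these kinds under `v1beta2`.

```yaml
apiVersion: image.toolkit.fluxcd.io/v1   # assumed GA version; older releases use v1beta2
kind: ImageRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  image: ghcr.io/org/myapp               # placeholder image
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1
kind: ImagePolicy
metadata:
  name: myapp
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: myapp
  policy:
    semver:
      range: ">=1.0.0 <2.0.0"            # placeholder range
```

The write-back step is driven by a marker comment on the manifest line to update, for example `image: ghcr.io/org/myapp:v1.2.3 # {"$imagepolicy": "flux-system:myapp"}`, together with an ImageUpdateAutomation object that commits the change to Git.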
**Source-Watcher** (new in 2.7):
- Improves reconciliation efficiency
- Enable with: `--components-extra=source-watcher`
## CI/CD Integration
**Git Workflow**:
```
1. Developer commits to feature branch
2. CI runs tests, builds image
3. CI updates Git manifest with new image tag
4. Developer creates PR to main
5. GitOps controller syncs after merge
```
**Don't**: Deploy directly from CI to cluster (breaks GitOps)
**Do**: Update Git from CI, let GitOps deploy
## Monitoring & Observability
**Track**:
- Sync success rate
- Reconciliation time
- Drift detection frequency
- Failed syncs/reconciliations
**Tools**:
- Prometheus metrics (both ArgoCD and Flux)
- Grafana dashboards
- Alert on sync failures
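
As a sketch of alerting on sync failures, the Prometheus rule below fires when an ArgoCD application stays OutOfSync. The 15-minute window and the `warning` severity are assumptions to tune per environment; `argocd_app_info` is the application metric exposed by the ArgoCD application controller.

```yaml
groups:
  - name: gitops
    rules:
      - alert: ArgoCDAppOutOfSync
        expr: argocd_app_info{sync_status="OutOfSync"} == 1
        for: 15m                      # assumed tolerance before alerting
        labels:
          severity: warning
        annotations:
          summary: "ArgoCD application {{ $labels.name }} has been OutOfSync for 15 minutes"
```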
## Image Management
**Good**:
```yaml
image: myapp:v1.2.3
image: myapp@sha256:abc123...
```
**Bad**:
```yaml
image: myapp:latest
image: myapp:dev
```
**Strategy**: Semantic versioning + digest pinning
## Environment Promotion
**Recommended Flow**:
```
Dev (auto-sync) → Staging (auto-sync) → Production (manual approval)
```
**Implementation**:
- Separate directories or repos per environment
- PR-based promotion
- Automated tests before promotion
- Manual approval for production
## Disaster Recovery
1. **Git is Source of Truth**: Cluster can be rebuilt from Git
2. **Backup**: Git repo + cluster state
3. **Test Recovery**: Practice cluster rebuild
4. **Document Bootstrap**: How to restore from scratch
## Performance Optimization
**ArgoCD**:
- Use ApplicationSets for multi-cluster
- Enable resource exclusions (3.x default)
- Server-side diff for large apps
**Flux**:
- Use OCI artifacts for large repos
- Enable source-watcher (2.7)
- Tune reconciliation intervals
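
Two of these knobs can be sketched as manifest fragments. The application names and the interval values are placeholders, and required fields not related to tuning are omitted for brevity.

```yaml
# ArgoCD: opt a single large Application into server-side diff
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: large-app                 # placeholder
  namespace: argocd
  annotations:
    argocd.argoproj.io/compare-options: ServerSideDiff=true
# spec omitted
---
# Flux: lengthen the reconciliation interval for slow-changing infrastructure
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 1h                    # how often to re-apply and correct drift
  retryInterval: 2m               # how soon to retry after a failure
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure
  prune: true
```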
## Common Anti-Patterns to Avoid
1. **Manual kubectl apply**: Bypasses GitOps, creates drift
2. **Multiple sources of truth**: Git should be only source
3. **Secrets in Git**: Always encrypt
4. **Direct cluster modifications**: All changes through Git
5. **No testing**: Always test in dev/staging first
6. **Missing RBAC**: Controllers need minimal permissions
## 2025 Trends
**Adopt**:
- OCI artifacts (Flux)
- Workload identity (no static credentials)
- SOPS + age (over PGP)
- External Secrets Operator (dynamic secrets)
- Multi-cluster with ApplicationSets/Flux
⚠️ **Avoid**:
- Label-based tracking (use annotations - ArgoCD 3.x default)
- PGP encryption (use age)
- Long-lived service account tokens (use workload identity)

View File

@@ -0,0 +1,80 @@
# Multi-Cluster GitOps Management (2024-2025)
## ArgoCD ApplicationSets
**Cluster Generator** (auto-discover clusters):
```yaml
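apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-apps
  namespace: argocd
spec:
  generators:
    # The cluster generator key is "clusters" (plural)
    - clusters:
        selector:
          matchLabels:
            environment: production
  template:
    metadata:
      # Each generated Application needs a unique name
      name: '{{name}}-my-apps'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/repo
        targetRevision: HEAD
        path: apps/{{name}}
      destination:
        server: '{{server}}'
        namespace: my-apps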
```
**Matrix Generator** (Cluster x Apps):
```yaml
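generators:
  - matrix:
      generators:
        - clusters: {}
        - git:
            # The git generator needs a repository and revision to scan
            repoURL: https://github.com/org/repo
            revision: HEAD
            directories:
              - path: apps/*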
```
**Performance**: Reported deployment time about 83% shorter than manual rollout (30 min → 5 min)
## Flux Multi-Cluster
**Option 1: Flux Per Cluster**
```
cluster-1/ → Flux instance 1
cluster-2/ → Flux instance 2
```
**Option 2: Hub-and-Spoke**
```
management-cluster/
└── flux manages → cluster-1, cluster-2
```
**Setup**:
```bash
flux bootstrap github --owner=org --repository=fleet \
  --path=clusters/production --context=prod-cluster
```
## Hub-and-Spoke Pattern
**Benefits**: Centralized management, single source of truth
**Cons**: Single point of failure
**Best for**: < 50 clusters
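
With Flux, the hub-and-spoke pattern can be sketched as a Kustomization on the management cluster that applies manifests to a remote cluster through a kubeconfig Secret. The Secret name, source name, and path below are hypothetical.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-1-apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: fleet                      # hypothetical source on the hub
  path: ./clusters/cluster-1
  prune: true
  # Apply to the remote cluster described by this kubeconfig instead of the hub itself
  kubeConfig:
    secretRef:
      name: cluster-1-kubeconfig     # hypothetical Secret holding the spoke's kubeconfig
```

Because the hub holds credentials for every spoke, this Secret is exactly the kind of long-lived credential that the single-point-of-failure concern above refers to.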
## Workload Identity (2025 Best Practice)
**Instead of service account tokens, use**:
- AWS IRSA
- GCP Workload Identity
- Azure AD Workload Identity
No more long-lived credentials!
## Best Practices
1. **Cluster labeling** for organization
2. **Progressive rollout** (dev → staging → prod clusters)
3. **Separate repos** for cluster config vs apps
4. **Monitor sync status** across all clusters
5. **Use workload identity** (no static credentials)

290
references/oci_artifacts.md Normal file

@@ -0,0 +1,290 @@
# OCI Artifacts with Flux (2024-2025)
## Overview
**GA Status**: Flux v2.6 (June 2025)
**Current**: Fully supported in Flux v2.7
OCI artifacts allow storing Kubernetes manifests, Helm charts, and Kustomize overlays in container registries instead of Git.
## Benefits
**Decoupled from Git**: No Git dependency for deployment
**Immutable**: Content-addressable by digest
**Standard**: Uses OCI spec, works with any OCI registry
**Signature Verification**: Native support for cosign/notation
**Performance**: Faster than Git for large repos
## OCIRepository Resource
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: my-app-oci
  namespace: flux-system
spec:
  interval: 5m
  url: oci://ghcr.io/org/app-config
  ref:
    tag: v1.0.0
    # or digest:
    # digest: sha256:abc123...
    # or semver:
    # semver: ">=1.0.0 <2.0.0"
  provider: generic # or azure, aws, gcp
  verify:
    provider: cosign
    secretRef:
      name: cosign-public-key
```
## Using with Kustomization
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
spec:
  interval: 10m
  sourceRef:
    kind: OCIRepository
    name: my-app-oci
  path: ./
  prune: true
```
## Using with HelmRelease
**OCIRepository for Helm charts**:
```yaml
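apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: podinfo-oci
spec:
  interval: 5m
  url: oci://ghcr.io/stefanprodan/charts/podinfo
  # Select the Helm chart layer of the OCI artifact
  layerSelector:
    mediaType: "application/vnd.cncf.helm.chart.content.v1.tar+gzip"
    operation: copy
  ref:
    semver: ">=6.0.0"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: podinfo
spec:
  interval: 10m
  # OCIRepository sources are referenced via chartRef, not chart.spec.sourceRef
  chartRef:
    kind: OCIRepository
    name: podinfo-oci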
```
## Publishing OCI Artifacts
**Using flux CLI**:
```bash
# Build and push Kustomize overlay
flux push artifact oci://ghcr.io/org/app-config:v1.0.0 \
  --path="./kustomize" \
  --source="$(git config --get remote.origin.url)" \
  --revision="$(git rev-parse HEAD)"
# Package and push Helm chart (charts are pushed with Helm's own OCI support,
# so that the artifact has the Helm chart media type)
helm package ./charts/myapp --version 1.0.0
helm push myapp-1.0.0.tgz oci://ghcr.io/org/charts
```
## Signature Verification
### Using cosign
**Sign artifact**:
```bash
cosign sign ghcr.io/org/app-config:v1.0.0
```
**Verify in Flux**:
```yaml
spec:
  verify:
    provider: cosign
    secretRef:
      name: cosign-public-key
```
### Using notation
**Sign artifact**:
```bash
notation sign ghcr.io/org/app-config:v1.0.0
```
**Verify in Flux**:
```yaml
spec:
  verify:
    provider: notation
    secretRef:
      name: notation-config
```
## Workload Identity
**Instead of static credentials, use cloud provider workload identity**:
**AWS IRSA**:
```yaml
spec:
  provider: aws
  # No credentials needed - uses pod's IAM role
```
**GCP Workload Identity**:
```yaml
spec:
  provider: gcp
  # No credentials needed - uses service account binding
```
**Azure Workload Identity**:
```yaml
spec:
  provider: azure
  # No credentials needed - uses managed identity
```
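
The `provider` field only tells Flux which cloud SDK to use; the identity itself is bound to the controller's ServiceAccount. A minimal sketch for AWS IRSA follows, assuming Flux runs in `flux-system` and using a placeholder role ARN. In practice this is usually applied as a patch in the bootstrap `kustomization.yaml` rather than as a standalone manifest.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: source-controller
  namespace: flux-system
  annotations:
    # Placeholder ARN: an IAM role with read access to the ECR repositories
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/flux-ecr-read
```

GKE and AKS follow the same idea with their own ServiceAccount annotations and identity bindings.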
## Best Practices (2025)
1. **Use digest pinning** for production:
```yaml
ref:
  digest: sha256:abc123...
```
2. **Sign all artifacts**:
```bash
# cosign takes the artifact reference as an argument (it cannot be piped in); sign by digest
cosign sign ghcr.io/org/app-config@sha256:abc123...
```
3. **Use semver for automated updates**:
```yaml
ref:
  semver: ">=1.0.0 <2.0.0"
```
4. **Leverage workload identity** (no static credentials)
5. **Prefer OCI for generated configs** (Jsonnet, CUE, Helm output)
## When to Use OCI vs Git
**Use OCI Artifacts when**:
- ✅ Storing generated configurations (Jsonnet, CUE output)
- ✅ Need immutable, content-addressable storage
- ✅ Want signature verification
- ✅ Large repos (performance)
- ✅ Decoupling from Git
**Use Git when**:
- ✅ Source of truth for manifests
- ✅ Need Git workflow (PRs, reviews)
- ✅ Audit trail important
- ✅ Team collaboration
## Common Pattern: Hybrid Approach
```
1. Git (source of truth)
2. CI builds/generates manifests
3. Push to OCI registry (signed)
4. Flux pulls from OCI (verified)
5. Deploy to cluster
```
## Migration from Git to OCI
**Before (Git)**:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app
spec:
  url: https://github.com/org/repo
  ref:
    branch: main
```
**After (OCI)**:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: my-app-oci
spec:
  url: oci://ghcr.io/org/app-config
  ref:
    tag: v1.0.0
```
**Update Kustomization/HelmRelease** sourceRef to point to OCIRepository
## Supported Registries
- ✅ GitHub Container Registry (ghcr.io)
- ✅ Docker Hub
- ✅ AWS ECR
- ✅ Google Artifact Registry
- ✅ Azure Container Registry
- ✅ Harbor
- ✅ GitLab Container Registry
## Troubleshooting
**Artifact not found**:
```bash
flux get sources oci
kubectl describe ocirepository <name>
# Verify artifact exists
crane digest ghcr.io/org/app:v1.0.0
```
**Authentication failures**:
```bash
# Check secret
kubectl get secret -n flux-system
# Test manually
crane manifest ghcr.io/org/app:v1.0.0
```
**Signature verification fails**:
```bash
# Verify locally
cosign verify ghcr.io/org/app:v1.0.0
# Check public key secret
kubectl get secret cosign-public-key -o yaml
```
## 2025 Recommendation
**Adopt OCI artifacts** for:
- Helm charts (already standard)
- Generated manifests (CI output)
- Multi-environment configs
**Keep Git for**:
- Source manifests
- Infrastructure definitions
- Team collaboration workflows


@@ -0,0 +1,94 @@
# Progressive Delivery with GitOps (2024-2025)
## Argo Rollouts (with ArgoCD)
**Current Focus**: Kubernetes-native progressive delivery
**Deployment Strategies**:
### 1. Canary
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 100
```
### 2. Blue-Green
```yaml
spec:
  strategy:
    blueGreen:
      activeService: my-app
      previewService: my-app-preview
      autoPromotionEnabled: false
```
### 3. Analysis with Metrics
```yaml
spec:
  strategy:
    canary:
      analysis:
        templates:
          - templateName: success-rate
        args:
          - name: service-name
            value: my-app
```
**Metric Providers**: Prometheus, Datadog, New Relic, CloudWatch
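
The `success-rate` template referenced in the analysis example has to exist as its own object. A minimal sketch using the Prometheus provider is shown here; the Prometheus address, the `http_requests_total` metric with its `service` and `code` labels, and the 99% threshold are all assumptions to replace with what the service actually exports.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      # Pass while at least 99% of requests are non-5xx (assumed threshold)
      successCondition: result[0] >= 0.99
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # placeholder address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```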
## Flagger (with Flux)
**Installation**:
```bash
flux install
kubectl apply -k github.com/fluxcd/flagger//kustomize/linkerd
```
**Canary with Flagger**:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 9898
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        # Window over which the metric is evaluated
        interval: 1m
```
## Best Practices
1. **Start with Manual Approval** (autoPromotionEnabled: false)
2. **Monitor Key Metrics** (error rate, latency, saturation)
3. **Set Conservative Steps** (10%, 25%, 50%, 100%)
4. **Define Rollback Criteria** (error rate > 1%)
5. **Test in Staging First**
## 2025 Recommendation
**For ArgoCD users**: Argo Rollouts (tight integration, UI support)
**For Flux users**: Flagger (CNCF project, modular design)

184
references/repo_patterns.md Normal file

@@ -0,0 +1,184 @@
# GitOps Repository Patterns (2024-2025)
## Monorepo vs Polyrepo
### Monorepo Pattern
**Structure**:
```
gitops-repo/
├── apps/
│   ├── frontend/
│   ├── backend/
│   └── database/
├── infrastructure/
│   ├── ingress/
│   ├── monitoring/
│   └── secrets/
└── clusters/
    ├── dev/
    ├── staging/
    └── production/
```
**Pros**:
- Single source of truth
- Atomic changes across apps
- Easier to start with
- Simpler CI/CD
**Cons**:
- Scaling issues (>100 apps)
- RBAC complexity
- Large repo size
- Blast radius concerns
**Best for**: Startups, small teams (< 20 apps), single team ownership
### Polyrepo Pattern
**Structure**:
```
infrastructure-repo/ (Platform team)
app-team-1-repo/ (Team 1)
app-team-2-repo/ (Team 2)
cluster-config-repo/ (Platform team)
```
**Pros**:
- Clear ownership boundaries
- Better RBAC (repo-level)
- Scales to 100s of apps
- Team autonomy
**Cons**:
- More complex setup
- Cross-repo dependencies
- Multiple CI/CD pipelines
**Best for**: Large orgs, multiple teams, clear separation of concerns
## Common Patterns
### 1. Repo Per Team
- Each team has own repo
- Platform team manages infra repo
- Hub cluster manages all
### 2. Repo Per App
- Each app in separate repo
- Good for microservices
- Maximum autonomy
### 3. Hybrid (Recommended)
- Infrastructure monorepo (platform team)
- Application polyrepo (dev teams)
- Best of both worlds
## App-of-Apps Pattern (ArgoCD)
**Root Application**:
```yaml
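apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops
    targetRevision: HEAD
    path: apps/
  destination:
    server: https://kubernetes.default.svc
    # Child Application manifests are created in the ArgoCD namespace
    namespace: argocd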
```
**Apps Directory**:
```
apps/
├── app1.yaml (Application manifest)
├── app2.yaml
└── app3.yaml
```
**Benefits**: Centralized management, single sync point
## Environment Structure
### Option 1: Directory Per Environment
```
apps/
├── base/
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    ├── staging/
    └── production/
```
### Option 2: Branch Per Environment
```
main branch → production
staging branch → staging
dev branch → development
```
**Don't Repeat YAML**: Use Kustomize bases + overlays
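
As a sketch of what an overlay in the directory layout above might contain, the file below reuses the base and changes only what differs in production. The image name `myapp`, the tag, and the replica count are placeholders.

```yaml
# apps/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
# Pin the image version for this environment
images:
  - name: myapp
    newTag: v1.2.3
# Patch only the fields that differ from the base
patches:
  - target:
      kind: Deployment
      name: myapp
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```

Running `kustomize build apps/overlays/production` renders the final manifests locally before they are committed.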
## Flux Repository Organization
**Recommended Structure**:
```
flux-repo/
├── clusters/
│   ├── production/
│   │   ├── flux-system/
│   │   ├── apps.yaml
│   │   └── infrastructure.yaml
│   └── staging/
├── apps/
│   └── podinfo/
│       ├── kustomization.yaml
│       └── release.yaml
└── infrastructure/
    └── sources/
        ├── gitrepositories.yaml
        └── ocirepositories.yaml
```
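
In this layout, `infrastructure.yaml` and `apps.yaml` are typically Flux Kustomizations that point back into the same repository. The following is a sketch under that assumption; the source name `flux-system` is what `flux bootstrap` creates by default, and the intervals are placeholders.

```yaml
# clusters/production/infrastructure.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 1h
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure
  prune: true
  wait: true            # report Ready only once the applied resources are healthy
---
# clusters/production/apps.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  # Apply apps only after infrastructure has reconciled successfully
  dependsOn:
    - name: infrastructure
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps
  prune: true
```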
## Kustomize vs Helm in GitOps
**Kustomize** (recommended for GitOps):
- Native Kubernetes
- Declarative patches
- No templating language
**Helm** (when necessary):
- Third-party charts
- Complex applications
- Need parameterization
**Best Practice**: Kustomize for your apps, Helm for third-party
## Promotion Strategies
### 1. Manual PR-based
```
dev/ → (PR) → staging/ → (PR) → production/
```
### 2. Automated with CI
```
dev/ → (auto-promote on tests pass) → staging/ → (manual approval) → production/
```
### 3. Progressive with Canary
```
production/stable/ → canary deployment → production/all/
```
## 2024-2025 Recommendations
1. **Start with monorepo**, migrate to polyrepo when needed
2. **Use Kustomize bases + overlays** (don't repeat YAML)
3. **Separate infrastructure from applications**
4. **Implement promotion workflows** (dev → staging → prod)
5. **Never commit directly to production** (always PR)


@@ -0,0 +1,213 @@
# Secrets Management in GitOps (2024-2025)
## Overview
**Never commit plain secrets to Git.** Use encryption or external secret stores.
## Solutions Comparison
| Solution | Type | Complexity | Best For | 2025 Trend |
|----------|------|------------|----------|------------|
| **Sealed Secrets** | Encrypted in Git | Low | Simple, GitOps-first | Stable |
| **External Secrets Operator** | External store | Medium | Cloud-native, dynamic | ↗️ Growing |
| **SOPS + age** | Encrypted in Git | Medium | Flexible, Git-friendly | ↗️ Preferred over PGP |
## 1. Sealed Secrets
**How it works**: Public key encryption, controller decrypts in-cluster
**Setup**:
```bash
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.24.0/controller.yaml
```
**Usage**:
```bash
# Create sealed secret
kubectl create secret generic my-secret --dry-run=client -o yaml --from-literal=password=supersecret | \
kubeseal -o yaml > sealed-secret.yaml
# Commit to Git
git add sealed-secret.yaml
git commit -m "Add sealed secret"
```
**Pros**: Simple, GitOps-native, no external dependencies
**Cons**: Key rotation complexity, static secrets only
## 2. External Secrets Operator (ESO)
**Latest Version**: v0.20.2 (2024-2025)
**Supported Providers**:
- AWS Secrets Manager
- Azure Key Vault
- Google Secret Manager
- HashiCorp Vault
- 1Password
- Doppler
**Setup**:
```bash
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
```
**Usage**:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secret-store
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-secret
spec:
  secretStoreRef:
    name: aws-secret-store
  target:
    name: my-app-secret
  data:
    - secretKey: password
      remoteRef:
        key: prod/my-app/password
```
**Pros**: Dynamic secrets, cloud-native, automatic rotation
**Cons**: External dependency, requires cloud secret store
**2025 Recommendation**: Growing preference over Sealed Secrets
## 3. SOPS + age
**Recommended over PGP as of 2024-2025**
**Setup age**:
```bash
# Install age
brew install age # macOS
apt install age # Ubuntu
# Generate key
age-keygen -o key.txt
# Public key: age1...
```
**Setup SOPS**:
```bash
# Install SOPS
brew install sops
# Create .sops.yaml
cat <<'EOF' > .sops.yaml
creation_rules:
  - path_regex: .*\.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
EOF
```
**Encrypt secrets**:
```bash
# Create secret
kubectl create secret generic my-secret --dry-run=client -o yaml --from-literal=password=supersecret > secret.yaml
# Encrypt with SOPS
sops -e secret.yaml > secret.enc.yaml
# Commit encrypted version
git add secret.enc.yaml .sops.yaml
```
**Flux Integration**:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age
```
**Pros**: Git-friendly, flexible, age is simpler than PGP
**Cons**: Manual encryption step, key management
## Best Practices (2024-2025)
### 1. Key Rotation
**Sealed Secrets**: Rotate annually, maintain old keys for decryption
**ESO**: Automatic with cloud providers
**SOPS**: Re-encrypt when rotating age keys
### 2. Access Control
- Never commit the age private key (e.g. `key.txt`) to Git; only `.sops.yaml` with the public key belongs in the repo
- Use separate keys per environment
- Store age keys in CI/CD secrets
- Use RBAC for Secret access
### 3. Encryption Scope
**SOPS .sops.yaml**:
```yaml
creation_rules:
  - path_regex: production/.*
    encrypted_regex: ^(data|stringData)$
    age: age1prod...
  - path_regex: staging/.*
    encrypted_regex: ^(data|stringData)$
    age: age1staging...
```
### 4. Git Pre-commit Hook
Prevent committing plain secrets:
```bash
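#!/bin/bash
# .git/hooks/pre-commit
# Block staged secret manifests that lack SOPS metadata (i.e. are not encrypted)
for f in $(git diff --cached --name-only --diff-filter=ACM | grep -E 'secret.*\.yaml$'); do
  # SOPS-encrypted YAML carries a top-level "sops:" key
  if ! git show ":$f" | grep -q '^sops:'; then
    echo "⚠️ Unencrypted secret file staged: $f"
    echo "Encrypt it with SOPS before committing"
    exit 1
  fi
done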
```
### 5. ArgoCD 3.0 Recommendation
**Use secrets operators** (ESO preferred), avoid config management plugins for secrets
## Decision Guide
**Choose Sealed Secrets if**:
- ✅ Simple GitOps workflow
- ✅ Static secrets
- ✅ No external dependencies wanted
- ✅ Small team
**Choose External Secrets Operator if**:
- ✅ Already using cloud secret stores
- ✅ Need secret rotation
- ✅ Dynamic secrets
- ✅ Enterprise compliance
**Choose SOPS + age if**:
- ✅ Git-centric workflow
- ✅ Want flexibility
- ✅ Multi-cloud
- ✅ Prefer open standards
## 2025 Trend Summary
**Growing**: External Secrets Operator, SOPS+age
**Stable**: Sealed Secrets (still widely used)
**Declining**: PGP encryption (age preferred)
**Emerging**: age encryption as standard (simpler than PGP)


@@ -0,0 +1,134 @@
# GitOps Troubleshooting Guide (2024-2025)
## Common ArgoCD Issues
### 1. Application OutOfSync
**Symptoms**: Application shows OutOfSync status
**Causes**: Git changes not applied, manual cluster changes
**Fix**:
```bash
argocd app sync my-app
argocd app diff my-app # See differences
```
### 2. Annotation Tracking Migration (ArgoCD 3.x)
**Symptoms**: Resources not tracked after upgrade to 3.x
**Cause**: Default changed from labels to annotations
**Fix**: Resources auto-migrate on next sync, or force:
```bash
argocd app sync my-app --force
```
### 3. Sync Fails with "Resource is Invalid"
**Cause**: YAML validation error, CRD mismatch
**Fix**:
```bash
argocd app get my-app --show-operation
kubectl apply --dry-run=client -f manifest.yaml # Test locally
```
### 4. Image Pull Errors
**Cause**: Registry credentials, network issues
**Fix**:
```bash
kubectl get events -n <namespace>
kubectl describe pod <pod-name> -n <namespace>
# Check image pull secret
kubectl get secret -n <namespace>
```
## Common Flux Issues
### 1. GitRepository Not Ready
**Symptoms**: source not ready, no artifact
**Causes**: Auth failure, branch doesn't exist
**Fix**:
```bash
flux get sources git
flux reconcile source git <name> -n flux-system
kubectl describe gitrepository <name> -n flux-system
```
### 2. Kustomization Build Failed
**Cause**: Invalid kustomization.yaml, missing resources
**Fix**:
```bash
flux get kustomizations
kubectl describe kustomization <name> -n flux-system
# Test locally
kustomize build <path>
```
### 3. HelmRelease Install Failed
**Cause**: Values error, chart incompatibility
**Fix**:
```bash
flux get helmreleases
kubectl logs -n flux-system -l app=helm-controller
# Test locally
helm template <chart> -f values.yaml
```
### 4. OCI Repository Issues (Flux 2.6+)
**Cause**: Registry auth, OCI artifact not found
**Fix**:
```bash
flux get sources oci
kubectl describe ocirepository <name>
# Verify artifact exists
crane digest ghcr.io/org/app:v1.0.0
```
## SOPS Decryption Failures
**Symptom**: Secret not decrypted
**Fix**:
```bash
# Check age secret exists
kubectl get secret sops-age -n flux-system
# Test decryption locally
export SOPS_AGE_KEY_FILE=key.txt
sops -d secret.enc.yaml
```
## Performance Issues
### ArgoCD Slow Syncs
**Cause**: Too many resources, inefficient queries
**Fix** (ArgoCD 3.x):
- Use default resource exclusions
- Enable server-side diff
- Increase controller replicas
### Flux Slow Reconciliation
**Cause**: Large monorepos, many sources
**Fix** (Flux 2.7+):
- Enable source-watcher
- Increase interval
- Use OCI artifacts instead of Git
## Debugging Commands
**ArgoCD**:
```bash
argocd app get <app> --refresh
argocd app logs <app>
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller
```
**Flux**:
```bash
flux logs --all-namespaces
flux check
flux get all
kubectl -n flux-system get events --sort-by='.lastTimestamp'
```
## Quick Wins
1. **Use `--dry-run`** before applying
2. **Check controller logs** first
3. **Verify RBAC** permissions
4. **Test manifests locally** (kubectl apply --dry-run, kustomize build)
5. **Check Git connectivity** (credentials, network)