zhongwei/gh-openshift-eng-ai-helpers-plugins-hcp

Files

Zhongwei Li 9fc59ddec1 Initial commit

2025-11-30 08:45:56 +08:00

14 KiB

Raw Permalink Blame History

name, description

name	description
HyperShift Agent Provider	Use this skill when you need to deploy HyperShift clusters on bare metal, edge environments, or disconnected infrastructures using pre-provisioned agents

HyperShift Agent Provider

This skill provides implementation guidance for creating HyperShift clusters using the Agent provider, which is designed for bare metal and edge deployments where pre-provisioned agents are available.

When to Use This Skill

This skill is automatically invoked by the /hcp:generate agent command to guide the Agent provider cluster creation process.

Prerequisites

HyperShift operator installed and configured
Pre-provisioned agents available in a Kubernetes namespace
Pull secret for accessing OpenShift images
Understanding of the target deployment environment

Agent Provider Overview

What is the Agent Provider?

The Agent provider:

Designed for bare metal and edge deployments
Requires pre-provisioned agents (nodes registered in advance)
No cloud provider automation - manual network configuration required
Suitable for disconnected/airgapped environments
Direct control over hardware and network configuration

Common Use Cases

Edge Computing: Remote locations with limited connectivity
Bare Metal: On-premises hardware without cloud automation
Disconnected Environments: Airgapped environments with security requirements
Custom Hardware: Specialized hardware configurations
Regulatory Compliance: Environments requiring specific data locality

Implementation Steps

Step 1: Analyze Cluster Description

Parse the natural language description for Agent-specific requirements:

Environment Type Detection:

Edge: "edge", "remote", "limited connectivity", "small footprint"
Bare Metal: "bare metal", "on-premises", "physical hardware", "custom hardware"
Disconnected: "airgapped", "disconnected", "offline", "secure environment"

Resource Indicators:

Minimal: "small", "edge", "minimal resources", "single node"
Standard: "production", "multi-node", "high availability"

Special Requirements:

FIPS: "fips", "compliance", "security"
Custom Storage: "local storage", "storage class", "persistent volumes"
Network Isolation: "isolated", "private network", "custom networking"

Step 2: Apply Agent Provider Defaults

Required Parameters:

--agent-namespace: Namespace where agents are located
--pull-secret: Path to pull secret file
--release-image: OpenShift release image
--base-domain: Base domain for the cluster (prompt user if not in description)

Smart Defaults by Environment:

Edge Environment:

--control-plane-availability-policy SingleReplica
--node-pool-replicas 1
--arch amd64  # or arm64 if specified

Bare Metal Environment:

--control-plane-availability-policy HighlyAvailable
--node-pool-replicas 3
--auto-repair true

Disconnected Environment:

--render  # Always render for review
--image-content-sources /path/to/image-content-sources.yaml
--additional-trust-bundle /path/to/ca-bundle.pem

Step 3: Interactive Parameter Collection

Required Information Collection:

Cluster Name

🔹 **Cluster Name**: What would you like to name your cluster?
   - Must be DNS-compatible (lowercase, hyphens allowed)
   - Will be used for resource naming
   - Example: edge-lab-01, production-cluster

Agent Namespace

🔹 **Agent Namespace**: In which namespace are your agents located?
   - This should be the namespace where you registered your agents
   - Example: default, agent-system, cluster-agents
   - [Press Enter for default: default]

Pull Secret

🔹 **Pull Secret**: Path to your OpenShift pull secret file?
   - Required for accessing OpenShift container images
   - Download from: https://console.redhat.com/openshift/install/pull-secret
   - Example: /home/user/pull-secret.json

Base Domain (if not specified in description)

🔹 **Base Domain**: What base domain should be used for cluster DNS?
   - This will be used for cluster API and application routes
   - Example: example.com, lab.internal, edge.local

OpenShift Version

🔹 **OpenShift Version**: Which OpenShift version do you want to use?

   📋 **Check supported versions**: https://amd64.ocp.releases.ci.openshift.org/

   - Enter release image URL: quay.io/openshift-release-dev/ocp-release:X.Y.Z-multi
   - [Press Enter for default: quay.io/openshift-release-dev/ocp-release:4.18.0-multi]

Optional Configuration (based on description analysis):

Architecture (if detected)

🔹 **Architecture**: Detected ARM architecture mention. Confirm architecture:
   - amd64 (x86_64)
   - arm64 (aarch64)
   - [Press Enter for default: amd64]

FIPS Mode (if compliance mentioned)

🔹 **FIPS Mode**: Enable FIPS mode for compliance?
   - Required for certain regulatory environments
   - May impact performance
   - [yes/no] [Press Enter for default: no]

Step 4: Disconnected Environment Handling

If disconnected/airgapped environment is detected:

Additional Required Information:

Mirror Registry

🔹 **Mirror Registry**: What is your mirror registry domain?
   - This registry should contain mirrored OpenShift images
   - Example: registry.example.com:5000

Image Content Sources

🔹 **Image Content Sources**: Path to image content sources file?
   - Required for mapping official images to mirror registry
   - Example: /path/to/image-content-sources.yaml

Additional Trust Bundle (optional)

🔹 **Additional Trust Bundle**: Path to custom CA bundle? (Optional)
   - Required if mirror registry uses custom certificates
   - Example: /path/to/ca-bundle.pem
   - [Press Enter to skip]

Always Use --render for Disconnected:

Include --render flag to generate manifests for review
Provide post-generation instructions for required manifest modifications

Step 5: Agent Availability Validation

Provide Agent Check Commands:

Before creating the cluster, verify your agents are available:

Check agents in namespace:
  kubectl get agents -n <agent-namespace>

Verify agent status:
  kubectl describe agents -n <agent-namespace>

Ensure agents are:
  - In "Available" state
  - Not bound to other clusters
  - Meet minimum resource requirements

Step 6: Generate Command

Basic Agent Cluster Command:

hypershift create cluster agent \
  --name <cluster-name> \
  --namespace <cluster-name>-ns \
  --agent-namespace <agent-namespace> \
  --pull-secret <pull-secret-path> \
  --release-image <release-image> \
  --base-domain <base-domain>

With Environment-Specific Flags:

Edge/Minimal Configuration:

hypershift create cluster agent \
  --name edge-cluster \
  --namespace edge-cluster-ns \
  --agent-namespace default \
  --pull-secret /path/to/pull-secret.json \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi \
  --base-domain edge.local \
  --control-plane-availability-policy SingleReplica \
  --node-pool-replicas 1

Production/HA Configuration:

hypershift create cluster agent \
  --name production-cluster \
  --namespace production-cluster-ns \
  --agent-namespace agent-system \
  --pull-secret /path/to/pull-secret.json \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi \
  --base-domain example.com \
  --control-plane-availability-policy HighlyAvailable \
  --node-pool-replicas 3 \
  --auto-repair

Disconnected Configuration:

hypershift create cluster agent \
  --name secure-cluster \
  --namespace secure-cluster-ns \
  --agent-namespace agent-system \
  --pull-secret /path/to/pull-secret.json \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi \
  --base-domain internal.example.com \
  --image-content-sources /path/to/image-content-sources.yaml \
  --additional-trust-bundle /path/to/ca-bundle.pem \
  --render

FIPS-Enabled Configuration:

hypershift create cluster agent \
  --name compliance-cluster \
  --namespace compliance-cluster-ns \
  --agent-namespace agent-system \
  --pull-secret /path/to/pull-secret.json \
  --release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi \
  --base-domain secure.example.com \
  --fips

Step 7: Post-Generation Instructions

For Disconnected Environments:

Provide additional configuration needed after cluster creation:

## Post-Creation Configuration (Disconnected)

After running the command above (with --render), you'll need to modify the generated manifests:

1. **Add ImageContentSources to HostedCluster:**
```yaml
spec:
  imageContentSources:
  - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
    mirrors:
    - registry.example.com:5000/openshift/release
  - source: quay.io/openshift-release-dev/ocp-release
    mirrors:
    - registry.example.com:5000/openshift/release-images
  - source: registry.redhat.io/multicluster-engine
    mirrors:
    - registry.example.com:5000/openshift/multicluster-engine

Disable Default OperatorHub Sources:

spec:
  configuration:
    operatorhub:
      disableAllDefaultSources: true

Apply the manifests:

kubectl apply -f <rendered-manifest-files>


**For All Environments:**

Next Steps

Verify agents are available: kubectl get agents -n
Monitor cluster creation: kubectl get hostedcluster -n kubectl get nodepool -n
Check agent assignment: kubectl get agents -n -o wide
Access cluster when ready: hypershift create kubeconfig --name --namespace


## Error Handling

### No Agents Available

**Scenario:** No agents found in specified namespace.

**Action:**

No agents found in namespace "default".

Please ensure:

Agents are properly registered
Agents are in "Available" state
Correct namespace specified

Check agents: kubectl get agents -A


### Insufficient Agents

**Scenario:** Not enough agents for requested replicas.

**Action:**

Only 2 agents available, but you requested 3 node replicas.

Options:

Reduce node-pool-replicas to 2
Register additional agents
Use SingleReplica control plane (reduces total requirement)

Would you like me to adjust the configuration?


### Agent Already Bound

**Scenario:** Agents are already assigned to another cluster.

**Action:**

Some agents are already bound to other clusters.

Available agents: 1 Bound agents: 2

Check agent status: kubectl describe agents -n

Free up agents by deleting unused clusters or register new agents.


### Disconnected Image Access

**Scenario:** Can't access release images in disconnected environment.

**Action:**

Release image access failed. In disconnected environments:

Ensure mirror registry contains required images
Verify image-content-sources file mapping
Check custom CA bundle configuration
Validate network connectivity to mirror registry

Mirror required images using: oc adm release mirror --to


## Best Practices

### Agent Management

1. **Pre-provision agents:** Register agents before cluster creation
2. **Agent naming:** Use descriptive names for easier management
3. **Resource allocation:** Ensure agents meet minimum requirements
4. **Health monitoring:** Monitor agent status regularly

### Network Configuration

1. **DNS planning:** Ensure proper DNS resolution for base domain
2. **Load balancing:** Configure external load balancer for API access
3. **Ingress:** Plan for application ingress and routing
4. **Firewall:** Configure appropriate firewall rules

### Security

1. **Pull secret security:** Protect pull secret files
2. **FIPS compliance:** Enable FIPS for regulated environments
3. **Certificate management:** Plan for custom certificate authorities
4. **Network isolation:** Implement appropriate network segmentation

### Disconnected Deployments

1. **Mirror planning:** Pre-mirror all required images
2. **Catalog management:** Prepare custom operator catalogs
3. **Certificate management:** Plan for custom CA bundles
4. **Registry security:** Secure mirror registry access

## Anti-Patterns to Avoid

❌ **Assuming cloud automation**

"Create cluster with autoscaling"

✅ Agent provider requires manual agent provisioning

❌ **Insufficient agent planning**

Not checking agent availability before cluster creation

✅ Always verify sufficient agents are available

❌ **Ignoring disconnected requirements**

Using public image references in airgapped environment

✅ Properly configure image content sources and mirror registry

❌ **Inadequate resource planning**

Using minimal agents for production workloads

✅ Size agents appropriately for expected workloads

## Example Workflows

### Edge Deployment

Input: "small edge cluster for remote office monitoring"

Analysis:

Environment: Edge
Scale: Minimal
Use case: Monitoring

Generated Command: hypershift create cluster agent
--name edge-monitoring
--namespace edge-monitoring-ns
--agent-namespace default
--pull-secret /path/to/pull-secret.json
--release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi
--base-domain edge.local
--control-plane-availability-policy SingleReplica
--node-pool-replicas 1


### Secure Data Center

Input: "airgapped production cluster for financial data processing"