gh-openshift-eng-ai-helpers…/skills/hcp-create-aws/SKILL.md
2025-11-30 08:45:56 +08:00

---
name: HyperShift AWS Provider
description: Use this skill when you need to deploy HyperShift clusters on AWS infrastructure with proper STS credentials, IAM roles, and VPC configuration
---
# HyperShift AWS Provider
This skill provides implementation guidance for creating HyperShift clusters on AWS, handling AWS-specific requirements including STS credentials, IAM roles, VPC configuration, and regional best practices.
## When to Use This Skill
This skill is automatically invoked by the `/hcp:generate aws` command to guide the AWS provider cluster creation process.
## Prerequisites
- AWS CLI configured with appropriate credentials
- HyperShift operator installed and configured
- STS credentials file for the target AWS account
- IAM role with required permissions for HyperShift
- Pull secret for accessing OpenShift images
## AWS Provider Overview
### AWS Provider Peculiarities
- **Requires AWS credentials (STS):** Must have valid STS credentials file
- **Region selection affects availability zones:** Each region offers a different set of AZs
- **Instance types vary by region:** Not all instance types are available in every region or zone
- **VPC CIDR must not conflict:** Must not overlap with existing infrastructure
- **IAM roles:** Can be auto-created or use pre-existing roles
### Common AWS Configurations
**Development Environment:**
- Single replica control plane (cost-effective)
- m5.large instances (balanced performance/cost)
- 2 availability zones (basic redundancy)
- Basic networking (public endpoints)
**Production Environment:**
- Highly available control plane
- m5.xlarge+ instances (better performance)
- 3+ availability zones (high availability)
- Custom VPC configuration
- KMS encryption enabled
**Cost-Optimized Environment:**
- Single NAT gateway
- Smaller instance types
- Minimal replicas
- Spot instances (where applicable)
## Implementation Steps
### Step 1: Analyze Cluster Description
Parse the natural language description for AWS-specific requirements:
**Environment Type Detection:**
- **Development**: "dev", "development", "testing", "demo", "sandbox"
- **Production**: "prod", "production", "critical", "enterprise"
- **Cost-Optimized**: "cheap", "cost", "minimal", "budget", "demo"
**Performance Indicators:**
- **High Performance**: "performance", "fast", "high-compute", "intensive"
- **Standard**: Default moderate configuration
- **Minimal**: "small", "minimal", "basic", "simple"
**Security/Compliance:**
- **FIPS**: "fips", "compliance", "security", "regulated"
- **Private**: "private", "isolated", "secure", "internal"
**Special Requirements:**
- **Multi-AZ**: "highly available", "ha", "multi-zone", "resilient"
- **Single-AZ**: "single zone", "simple", "minimal"
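The environment keyword lists above can be turned into a simple classifier. The following is a minimal sketch, not the skill's actual parser: the function name `detect_environment` is hypothetical, matching is naive substring matching, and the precedence (production, then development, then cost-optimized, with development as the fallback) is an assumption made because "demo" appears in two lists.

```bash
# Hypothetical helper: map a free-text description to an environment type.
# Precedence and fallback are assumptions, not documented behavior.
detect_environment() {
  # Lowercase the description so matching is case-insensitive.
  desc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$desc" in
    *prod*|*critical*|*enterprise*)    echo production ;;
    *dev*|*testing*|*demo*|*sandbox*)  echo development ;;
    *cheap*|*cost*|*minimal*|*budget*) echo cost-optimized ;;
    *)                                 echo development ;;
  esac
}
```

Substring matching is deliberately crude (for example, `*dev*` also matches "device"), so a real implementation would likely match whole words instead.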
### Step 2: Apply AWS Provider Defaults
**Required Parameters:**
- `--region`: AWS region (default: us-east-1)
- `--pull-secret`: Path to pull secret file
- `--release-image`: OpenShift release image
- `--sts-creds`: **REQUIRED** - Path to STS credentials file
- `--role-arn`: **REQUIRED** - ARN of the IAM role to assume
- `--base-domain`: **REQUIRED** - Base domain for the cluster
**Smart Defaults by Environment:**
**Development Environment:**
```bash
--instance-type m5.large
--node-pool-replicas 2
--control-plane-availability-policy SingleReplica
--endpoint-access Public
--root-volume-size 120
--zones <zone-1>,<zone-2>  # auto-select 2 zones based on region
```
**Production Environment:**
```bash
--instance-type m5.xlarge
--node-pool-replicas 3
--control-plane-availability-policy HighlyAvailable
--endpoint-access PublicAndPrivate
--root-volume-size 120
--auto-repair
--zones <zone-1>,<zone-2>,<zone-3>  # auto-select 3+ zones based on region
```
**Cost-Optimized Environment:**
```bash
--instance-type m5.large
--node-pool-replicas 2
--control-plane-availability-policy SingleReplica
--endpoint-access Public
--root-volume-size 120
--zones <zone-1>,<zone-2>  # auto-select 2 zones (minimal redundancy)
```
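One way to encode these defaults is a single function that sets shell variables per environment type. This is a sketch only: the function and variable names are hypothetical, and the values simply mirror the three blocks above.

```bash
# Hypothetical helper: set defaults for one environment type.
# Variable names are illustrative; values are copied from the lists above.
apply_defaults() {
  ROOT_VOLUME_SIZE=120
  case "$1" in
    production)
      INSTANCE_TYPE=m5.xlarge
      NODE_POOL_REPLICAS=3
      AVAILABILITY_POLICY=HighlyAvailable
      ENDPOINT_ACCESS=PublicAndPrivate
      AUTO_REPAIR=true
      ZONE_COUNT=3
      ;;
    development|cost-optimized)
      INSTANCE_TYPE=m5.large
      NODE_POOL_REPLICAS=2
      AVAILABILITY_POLICY=SingleReplica
      ENDPOINT_ACCESS=Public
      AUTO_REPAIR=false
      ZONE_COUNT=2
      ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}
```

Values collected interactively in the next step would then override these variables before the command is generated.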
### Step 3: Interactive Parameter Collection
**Required Information Collection:**
1. **Cluster Name**
```
🔹 **Cluster Name**: What would you like to name your cluster?
- Must be DNS-compatible (lowercase, hyphens allowed)
- Used for AWS resource naming
- Example: dev-cluster, prod-app, demo-env
```
2. **AWS Region**
```
🔹 **AWS Region**: Which AWS region should host your cluster?
- Consider latency to your users
- Verify desired instance types are available
- [Press Enter for default: us-east-1]
Popular regions:
- us-east-1 (N. Virginia) - Largest service availability
- us-west-2 (Oregon) - West coast, latest services
- eu-west-1 (Ireland) - Europe
- ap-southeast-1 (Singapore) - Asia Pacific
```
3. **STS Credentials**
```
🔹 **STS Credentials**: Path to your AWS STS credentials file?
- Required for AWS authentication
- Generate using: aws sts get-session-token
- Example: /home/user/.aws/sts-creds.json
- Format: {"AccessKeyId": "...", "SecretAccessKey": "...", "SessionToken": "..."}
```
4. **IAM Role ARN**
```
🔹 **IAM Role ARN**: ARN of the IAM role for HyperShift?
- Role must have required HyperShift permissions
- Example: arn:aws:iam::123456789012:role/hypershift-operator-role
- See: https://hypershift.openshift.io/aws-setup/
```
5. **Base Domain**
```
🔹 **Base Domain**: What base domain should be used for cluster DNS?
- Must be a domain you control in Route53
- Used for cluster API and application routes
- Example: example.com, clusters.mycompany.com
```
6. **Pull Secret**
```
🔹 **Pull Secret**: Path to your OpenShift pull secret file?
- Required for accessing OpenShift container images
- Download from: https://console.redhat.com/openshift/install/pull-secret
- Example: /home/user/pull-secret.json
```
7. **OpenShift Version**
```
🔹 **OpenShift Version**: Which OpenShift version do you want to use?
📋 **Check supported versions**: https://amd64.ocp.releases.ci.openshift.org/
- Enter release image URL: quay.io/openshift-release-dev/ocp-release:X.Y.Z-multi
- [Press Enter for default: quay.io/openshift-release-dev/ocp-release:4.18.0-multi]
```
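The DNS-compatibility rule for the cluster name (item 1) can be checked before any AWS call is made. A minimal sketch, assuming the name must be a single DNS label: lowercase letters, digits, and hyphens, no leading or trailing hyphen, and at most 63 characters. The 63-character cap is the generic DNS label limit and is an assumption here; HyperShift may enforce a stricter one. The function name is hypothetical.

```bash
# Hypothetical helper: succeed only if the name looks like a DNS label.
is_valid_cluster_name() {
  name=$1
  [ -n "$name" ] || return 1
  # Assumed upper bound: generic DNS label length.
  [ "${#name}" -le 63 ] || return 1
  # Anything left after deleting the allowed characters is invalid.
  leftover=$(printf '%s' "$name" | LC_ALL=C tr -d 'a-z0-9-')
  [ -z "$leftover" ] || return 1
  # Reject a leading or trailing hyphen.
  case "$name" in
    -*|*-) return 1 ;;
  esac
  return 0
}
```

For example, `is_valid_cluster_name dev-cluster` succeeds, while `Dev_Cluster` and `-bad` are rejected.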
**Optional Configuration (based on description analysis):**
8. **Instance Type** (if performance requirements detected)
```
🔹 **Instance Type**: Select instance type based on your performance needs:
- m5.large (2 vCPU, 8GB RAM) - Development, light workloads
- m5.xlarge (4 vCPU, 16GB RAM) - Production, balanced workloads
- m5.2xlarge (8 vCPU, 32GB RAM) - High-performance workloads
- c5.xlarge (4 vCPU, 8GB RAM) - Compute-optimized
- [Press Enter for default based on environment type]
```
9. **Node Pool Replicas**
```
🔹 **Node Pool Replicas**: How many worker nodes do you need?
- Minimum: 2 (for basic redundancy)
- Production recommended: 3+
- [Press Enter for default based on environment type]
```
10. **Availability Zones** (auto-selected, but confirmed)
```
🔹 **Availability Zones**: Detected region: us-east-1
Auto-selecting zones for optimal distribution:
- Development: us-east-1a, us-east-1b (2 zones)
- Production: us-east-1a, us-east-1b, us-east-1c (3 zones)
Modify zone selection? [y/N]
```
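Zone auto-selection can be sketched as "take the first N zone names in sorted order". The helper below is hypothetical and only handles the selection; the zone names are expected on standard input, for example from the AWS CLI call shown in the comment.

```bash
# Hypothetical helper: read zone names (tab, space, or newline separated)
# from stdin and print the first N, sorted and comma-joined for --zones.
pick_zones() {
  tr '\t ' '\n\n' | sed '/^$/d' | sort | head -n "$1" | paste -sd, -
}

# Example (requires AWS access):
#   aws ec2 describe-availability-zones --region us-east-1 \
#     --query "AvailabilityZones[?State=='available'].ZoneName" \
#     --output text | pick_zones 2
```

Picking the alphabetically first zones is an assumption made for simplicity; it does not check that the chosen instance type is offered in those zones.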
### Step 4: Advanced Configuration (Conditional)
**For FIPS Compliance** (if detected):
```
🔹 **FIPS Mode**: Enable FIPS mode for compliance?
- Required for government/regulated workloads
- May impact performance
- [yes/no] [Press Enter for default: no]
```
**For High-Performance Workloads**:
```
🔹 **Root Volume Size**: Increase root volume size?
- Default: 120GB
- High-performance workloads: 200GB+
- [Press Enter for default: 120]
```
**For Production Environments**:
```
🔹 **Auto-Repair**: Enable automatic node repair?
- Automatically replaces unhealthy nodes
- Recommended for production
- [yes/no] [Press Enter for default: yes for production]
```
### Step 5: Generate Command
**Basic AWS Cluster Command:**
```bash
hypershift create cluster aws \
--name <cluster-name> \
--namespace <cluster-name>-ns \
--region <region> \
--instance-type <instance-type> \
--pull-secret <pull-secret-path> \
--node-pool-replicas <replica-count> \
--zones <zone-list> \
--control-plane-availability-policy <policy> \
--sts-creds <sts-creds-path> \
--role-arn <role-arn> \
--base-domain <base-domain> \
--release-image <release-image>
```
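Filling in the template can be sketched as a function that assembles the command from previously collected variables. All variable names and the function name are hypothetical; the flags are the ones used in the template above, with `--auto-repair` and `--fips` added only when requested. The sketch assumes values contain no spaces.

```bash
# Hypothetical helper: print the hypershift command on one line.
# Refuses to run if one of the parameters marked REQUIRED is unset.
build_command() {
  : "${STS_CREDS:?path to STS credentials file is required}"
  : "${ROLE_ARN:?IAM role ARN is required}"
  : "${BASE_DOMAIN:?base domain is required}"

  cmd="hypershift create cluster aws"
  cmd="$cmd --name $CLUSTER_NAME --namespace ${CLUSTER_NAME}-ns"
  cmd="$cmd --region $REGION --instance-type $INSTANCE_TYPE"
  cmd="$cmd --pull-secret $PULL_SECRET"
  cmd="$cmd --node-pool-replicas $NODE_POOL_REPLICAS --zones $ZONES"
  cmd="$cmd --control-plane-availability-policy $AVAILABILITY_POLICY"
  cmd="$cmd --sts-creds $STS_CREDS --role-arn $ROLE_ARN"
  cmd="$cmd --base-domain $BASE_DOMAIN --release-image $RELEASE_IMAGE"

  # Optional boolean flags are appended only when enabled.
  if [ "${AUTO_REPAIR:-false}" = true ]; then cmd="$cmd --auto-repair"; fi
  if [ "${FIPS:-false}" = true ]; then cmd="$cmd --fips"; fi

  printf '%s\n' "$cmd"
}
```

Printing the command rather than executing it keeps the user in control of when the cluster is actually created.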
**Development Configuration Example:**
```bash
hypershift create cluster aws \
--name dev-cluster \
--namespace dev-cluster-ns \
--region us-east-1 \
--instance-type m5.large \
--pull-secret /path/to/pull-secret.json \
--node-pool-replicas 2 \
--zones us-east-1a,us-east-1b \
--control-plane-availability-policy SingleReplica \
--endpoint-access Public \
--root-volume-size 120 \
--sts-creds /path/to/sts-creds.json \
--role-arn arn:aws:iam::123456789012:role/hypershift-role \
--base-domain example.com \
--release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi
```
**Production Configuration Example:**
```bash
hypershift create cluster aws \
--name production-cluster \
--namespace production-cluster-ns \
--region us-west-2 \
--instance-type m5.xlarge \
--pull-secret /path/to/pull-secret.json \
--node-pool-replicas 3 \
--zones us-west-2a,us-west-2b,us-west-2c \
--control-plane-availability-policy HighlyAvailable \
--endpoint-access PublicAndPrivate \
--root-volume-size 120 \
--auto-repair \
--sts-creds /path/to/sts-creds.json \
--role-arn arn:aws:iam::123456789012:role/hypershift-prod-role \
--base-domain clusters.company.com \
--release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi
```
**FIPS-Enabled Configuration:**
```bash
hypershift create cluster aws \
--name compliance-cluster \
--namespace compliance-cluster-ns \
--region us-gov-east-1 \
--instance-type m5.xlarge \
--pull-secret /path/to/pull-secret.json \
--node-pool-replicas 3 \
--zones us-gov-east-1a,us-gov-east-1b,us-gov-east-1c \
--control-plane-availability-policy HighlyAvailable \
--fips \
--sts-creds /path/to/sts-creds.json \
--role-arn arn:aws-us-gov:iam::123456789012:role/hypershift-fips-role \
--base-domain secure.gov.example.com \
--release-image quay.io/openshift-release-dev/ocp-release:4.18.0-multi
```
### Step 6: Pre-Flight Validation
**Provide validation commands:**
```
## Pre-Flight Checks
Before creating the cluster, verify your setup:
1. **AWS Credentials:**
aws sts get-caller-identity
2. **STS Credentials File:**
jq . /path/to/sts-creds.json
3. **IAM Role Access:**
aws iam get-role --role-name hypershift-role
4. **Route53 Domain:**
aws route53 list-hosted-zones --query "HostedZones[?Name=='example.com.']"
5. **Region Availability:**
aws ec2 describe-availability-zones --region us-east-1
6. **Instance Type Availability:**
aws ec2 describe-instance-type-offerings --location-type availability-zone --filters Name=instance-type,Values=m5.large --region us-east-1
```
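The STS credentials file check (item 2) can go one step further than pretty-printing the file. The sketch below verifies that the three keys named in Step 3 are present. It is a crude text match rather than real JSON validation, and the function name is hypothetical.

```bash
# Hypothetical helper: confirm the STS credentials file is readable and
# mentions the three expected keys. This does not validate the JSON itself.
check_sts_file() {
  file=$1
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  for key in AccessKeyId SecretAccessKey SessionToken; do
    if ! grep -q "\"$key\"" "$file"; then
      echo "missing key: $key" >&2
      return 1
    fi
  done
}
```

A non-zero exit status means the file is unreadable or incomplete and the credentials should be regenerated before creating the cluster.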
### Step 7: Post-Generation Instructions
**Next Steps:**
```
## Next Steps
1. **Verify prerequisites are met:**
- AWS credentials configured
- STS credentials file exists and is valid
- IAM role has required permissions
- Base domain exists in Route53
2. **Run the generated command:**
Copy and paste the command above
3. **Monitor cluster creation:**
kubectl get hostedcluster -n <cluster-namespace>
kubectl get nodepool -n <cluster-namespace>
4. **Check AWS resources:**
- EC2 instances in AWS console
- Load balancers created
- VPC and networking resources
5. **Access cluster when ready:**
hypershift create kubeconfig --name <cluster-name> --namespace <cluster-namespace> > <cluster-name>-kubeconfig
export KUBECONFIG=<cluster-name>-kubeconfig
oc get nodes
```
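Instead of re-running `kubectl get` by hand, the monitoring step can block until the cluster reports ready. This is a sketch that assumes the HostedCluster resource exposes an `Available` condition and that a 30 minute default timeout is acceptable; both are worth confirming against your HyperShift version. The function name is hypothetical.

```bash
# Hypothetical helper: block until the HostedCluster is Available or the
# timeout expires. Arguments: cluster name, namespace, optional timeout.
wait_for_cluster() {
  name=$1
  namespace=$2
  timeout=${3:-30m}
  kubectl wait --for=condition=Available "hostedcluster/$name" \
    -n "$namespace" --timeout="$timeout"
}
```

For example, `wait_for_cluster dev-cluster dev-cluster-ns` returns zero once the condition is met and non-zero if the timeout is reached first.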
## Error Handling
### Invalid AWS Credentials
**Scenario:** AWS credentials are invalid or expired.
**Action:**
```
AWS credentials validation failed.
Please check:
1. AWS CLI configuration: aws configure list
2. STS credentials file validity
3. IAM permissions
Regenerate STS credentials:
aws sts get-session-token --duration-seconds 3600
```
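Regeneration can also write the credentials file directly. The sketch below uses the AWS CLI `--query` option to reshape the output into the flat three-key format shown in Step 3. Whether your HyperShift version expects this flat shape or the raw CLI output wrapped in `Credentials` is an assumption to verify. The function name is hypothetical.

```bash
# Hypothetical helper: write fresh STS credentials to a file that only the
# current user can read. Arguments: output path, optional duration in seconds.
refresh_sts_creds() {
  out_file=$1
  duration=${2:-3600}
  (
    # Restrict permissions on the newly created file.
    umask 077
    aws sts get-session-token --duration-seconds "$duration" \
      --query 'Credentials.{AccessKeyId:AccessKeyId,SecretAccessKey:SecretAccessKey,SessionToken:SessionToken}' \
      --output json > "$out_file"
  )
}
```

For example, `refresh_sts_creds /home/user/.aws/sts-creds.json 3600` replaces the file with credentials valid for one hour.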
### IAM Role Not Found
**Scenario:** Specified IAM role doesn't exist or can't be assumed.
**Action:**
```
IAM role "arn:aws:iam::123456789012:role/hypershift-role" not found or inaccessible.
Please verify:
1. Role exists: aws iam get-role --role-name hypershift-role
2. Role has required permissions
3. Trust relationship allows your account to assume the role
See HyperShift AWS setup guide: https://hypershift.openshift.io/aws-setup/
```
### Region/Zone Issues
**Scenario:** Instance type not available in selected region/zones.
**Action:**
```
Instance type "m5.large" not available in zone "us-east-1f".
Checking alternative zones in us-east-1:
✅ us-east-1a (available)
✅ us-east-1b (available)
❌ us-east-1f (not available)
Suggested zones: us-east-1a,us-east-1b
Would you like me to update the command?
```
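The zone check behind this message can be sketched with the same `describe-instance-type-offerings` call used in the pre-flight checks, narrowed with `--query` to the zone names. The function name is hypothetical.

```bash
# Hypothetical helper: print the zones in a region that offer an instance
# type, sorted and comma-joined. Arguments: instance type, region.
zones_offering() {
  aws ec2 describe-instance-type-offerings \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$1" \
    --region "$2" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text | tr '\t' '\n' | sed '/^$/d' | sort | paste -sd, -
}
```

For example, `zones_offering m5.large us-east-1` prints a list that can be compared against the zones the user requested before suggesting alternatives.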
### Route53 Domain Issues
**Scenario:** Base domain not found in Route53 or not accessible.
**Action:**
```
Base domain "example.com" not found in Route53.
Please ensure:
1. Domain exists in Route53: aws route53 list-hosted-zones
2. Account has access to the hosted zone
3. Domain spelling is correct
Alternative: Use a subdomain you control (e.g., clusters.mydomain.com)
```
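The first check can be automated by reusing the `list-hosted-zones` query from the pre-flight section. This is a minimal sketch with a hypothetical function name. It only confirms that a hosted zone with the exact name is visible to the current credentials, not that the account can modify it.

```bash
# Hypothetical helper: succeed if a Route53 hosted zone with this exact
# name is visible. A trailing dot on the argument is optional.
hosted_zone_exists() {
  domain=${1%.}
  found=$(aws route53 list-hosted-zones \
    --query "HostedZones[?Name=='${domain}.'].Name" --output text)
  case "$found" in
    *"${domain}."*) return 0 ;;
  esac
  return 1
}
```

For example, `hosted_zone_exists example.com || echo "zone not found"` reports a missing zone.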
### Resource Limits
**Scenario:** AWS account limits would be exceeded.
**Action:**
```
AWS service limits may be exceeded:
- EC2 instances: Current: 18/20, Requested: 5 more
- Elastic IPs: Current: 4/5, Requested: 2 more
Consider:
1. Request limit increases via AWS Support
2. Choose smaller instance types
3. Reduce node count
4. Clean up unused resources
```
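The arithmetic behind this warning is a simple headroom check. The sketch below uses a hypothetical function name and leaves out how the current usage and limit are obtained.

```bash
# Hypothetical helper: succeed (exit 0) when current + requested would
# exceed the limit. Arguments: current usage, limit, requested amount.
quota_exceeded() {
  current=$1
  limit=$2
  requested=$3
  [ $((current + requested)) -gt "$limit" ]
}
```

With the numbers from the message above, `quota_exceeded 18 20 5` and `quota_exceeded 4 5 2` both succeed, so both limits would be flagged.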
## Best Practices
### Cost Optimization
1. **Right-size instances:** Don't over-provision for development
2. **Use Spot instances:** Where appropriate for non-critical workloads
3. **Monitor resource usage:** Regularly review AWS costs
4. **Clean up unused clusters:** Delete development clusters when not needed
### Security
1. **Least privilege IAM:** Use minimal required permissions
2. **STS credentials:** Use short-lived credentials when possible
3. **Private networking:** Use PublicAndPrivate endpoints for production
4. **KMS encryption:** Enable for sensitive workloads
### High Availability
1. **Multi-AZ deployment:** Use 3+ availability zones for production
2. **Instance distribution:** Spread nodes across zones
3. **Auto-repair:** Enable for automatic recovery
4. **Monitoring:** Set up CloudWatch monitoring
### Network Planning
1. **VPC design:** Plan CIDR ranges carefully
2. **Subnet strategy:** Use public/private subnet design
3. **Load balancer:** Configure appropriate load balancer types
4. **DNS:** Ensure proper Route53 configuration
## Anti-Patterns to Avoid
❌ **Using root AWS credentials**
```
Never use root account credentials for HyperShift
```
✅ Use IAM roles and STS credentials
❌ **Single availability zone for production**
```
--zones us-east-1a # Single point of failure
```
✅ Use multiple zones: `--zones us-east-1a,us-east-1b,us-east-1c`
❌ **Over-provisioning for development**
```
--instance-type m5.8xlarge --node-pool-replicas 10 # Expensive for dev
```
✅ Use appropriate sizing: `--instance-type m5.large --node-pool-replicas 2`
❌ **Ignoring region-specific limitations**
```
Choosing regions without checking instance type availability
```
✅ Verify instance types and services are available in target region
## Example Workflows
### Startup Development Environment
```
Input: "cheap AWS cluster for testing our new microservice"
Analysis:
- Environment: Development
- Cost focus: High priority
- Scale: Minimal
Generated Command (required credential, domain, pull-secret, and release-image flags omitted for brevity):
hypershift create cluster aws \
--name dev-microservice \
--namespace dev-microservice-ns \
--region us-east-1 \
--instance-type m5.large \
--node-pool-replicas 2 \
--control-plane-availability-policy SingleReplica \
--endpoint-access Public
```
### Enterprise Production
```
Input: "highly available AWS production cluster for customer-facing applications"
Analysis:
- Environment: Production
- Availability: High priority
- Scale: Enterprise
Generated Command (required credential, domain, pull-secret, and release-image flags omitted for brevity):
hypershift create cluster aws \
--name prod-customer-apps \
--namespace prod-customer-apps-ns \
--region us-west-2 \
--instance-type m5.xlarge \
--node-pool-replicas 5 \
--zones us-west-2a,us-west-2b,us-west-2c \
--control-plane-availability-policy HighlyAvailable \
--endpoint-access PublicAndPrivate \
--auto-repair
```
## See Also
- [HyperShift AWS Provider Documentation](https://hypershift.openshift.io/aws-setup/)
- [AWS IAM Roles for HyperShift](https://hypershift.openshift.io/aws-setup/#_prerequisites)
- [AWS CLI Configuration Guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
- [OpenShift on AWS Best Practices](https://docs.openshift.com/container-platform/latest/installing/installing_aws/)