14 KiB
SAP BTP Failover and Resilience - Detailed Reference
Source: https://github.com/SAP-docs/btp-best-practices-guide/tree/main/docs/deploy-and-deliver
High Availability Overview
SAP BTP applications can achieve high availability through multi-region deployment with intelligent traffic routing. This eliminates single points of failure and addresses latency concerns for global users.
Multi-Region Architecture
Core Concept
Custom Domain URL
│
▼
┌──────────────┐
│ Load Balancer │
│ (Health Checks)│
└──────┬───────┘
│
┌────────────┴────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Region 1 │ │ Region 2 │
│ (Active) │ │ (Passive/Active)│
│ │ │ │
│ Subaccount A │ │ Subaccount B │
│ App Instance │ │ App Instance │
└─────────────────┘ └─────────────────┘
Key Benefits
- Geographic Redundancy: Regional outages don't interrupt service
- Intelligent Routing: Health checks direct traffic to operational instances
- Unified Access: Custom domain remains constant during failovers
- Load Distribution: Traffic balances across regions
- Latency Optimization: Route users to nearest healthy region
Failover Scope
Supported Application Types
The basic failover guidance applies to:
- SAPUI5 applications
- HTML5 applications
- Applications without data persistence
- Applications without in-memory caching
- Applications with data stored in on-premise back-end systems
Note: Applications with cloud-based persistence require additional considerations for data synchronization.
Four Core Failover Principles
1. Deploy Across Two Data Centers
Configuration: Active/Passive
- Primary data center receives normal traffic
- Secondary data center acts as standby
- Switch to secondary only during primary downtime
Regional Selection Best Practices:
- Choose regions near users and backend systems
- Example: Frankfurt and Amsterdam for European users
- Consider same region for performance
- Review data processing restrictions for cross-region deployment
Legal Consideration: Cross-region deployment may create data processing compliance issues. Review Data Protection and Privacy documentation before proceeding.
2. Keep Applications Synchronized
Synchronization Options:
| Method | Effort | Best For |
|---|---|---|
| Manual | High | Infrequent updates |
| CI/CD Pipeline | Medium | Regular deployments |
| Solution Export + Transport Management | Medium | Neo environments |
Manual Synchronization
- Duplicate modifications across both data centers
- Mirror Git repositories
- Allows non-identical applications (reduced functionality in backup)
- Visual differentiation between primary and backup
CI/CD Pipeline Synchronization
# Pipeline deploys to both regions
stages:
- name: Build
steps:
- build_mta
- name: Deploy Primary
steps:
- deploy_to_region_1
- name: Deploy Secondary
steps:
- deploy_to_region_2
- Use Project "Piper" pipelines adapted for multi-deployment
- Parallel deployment to subaccounts in different regions
- Automatic consistency
Solution Export Wizard + Cloud Transport Management
- Export changes as MTA archive from primary (Neo)
- Import via Transport Management Service (Cloud Foundry)
- Deploy to secondary data center
3. Define Failover Detection
Detection Mechanisms:
- Response timeout monitoring (e.g., 25 seconds max)
- HTTP status code checking (5xx errors)
- Health endpoint monitoring
Implementation Options:
- Manual code implementation
- Rule-based solutions (e.g., Akamai ION)
- Load balancer health checks
Detection Behavior:
- Monitor first HTTP request to application URL
- Ignore subsequent requests (prevent single resource failures triggering failover)
- Present HTML down page with failover link when detected
Note: In basic scenarios, "the failover itself is therefore manually performed by the user" via the down page link.
4. Plan for Failback
Active/Active Setup:
- Applications identical in both data centers
- Same functionality everywhere
- Failback automatic with next failover event
- Not mandatory to explicitly return to primary
Active/Passive Setup:
- Applications may differ between data centers
- Reduced functionality in backup acceptable
- Failback to primary is mandatory
- Must restore full functionality
Recommended Failback Approach:
- User-driven failback model
- Visual differentiation reminds users to switch back
- Allow transactions to complete without interruption
- Prioritize completion over automatic recovery speed
Multi-Region Reference Use Cases
Available Implementations
| Scenario | Components | Resources |
|---|---|---|
| SAP Build Work Zone + Azure Traffic Manager | Work Zone, Azure | Blog post, GitHub, Discovery Center mission |
| SAP Build Work Zone + Amazon Route 53 | Work Zone, AWS | Blog post, GitHub |
| CAP Applications + SAP HANA Cloud | CAP, HANA Cloud multi-zone | GitHub repository |
| CAP Applications + Amazon Aurora | CAP, Aurora read replica | GitHub repository |
| SAP Cloud Integration + Azure Traffic Manager | CPI, Azure | GitHub, Discovery Center |
Architecture: SAP Build Work Zone + Azure Traffic Manager
User Request
│
▼
Azure Traffic Manager
(Priority-based routing)
│
├──► Primary Region (Priority 1)
│ └── SAP Build Work Zone Instance
│
└──► Secondary Region (Priority 2)
└── SAP Build Work Zone Instance (standby)
Architecture: CAP + HANA Cloud Multi-Zone
Application Load Balancer
│
┌────────┴────────┐
▼ ▼
CAP App (AZ1) CAP App (AZ2)
│ │
└────────┬────────┘
▼
SAP HANA Cloud
(Multi-zone replication)
Data Backup and Resilience
SAP-Managed Backups
| Service | Backup Type | Retention | Notes |
|---|---|---|---|
| SAP HANA Cloud | Continuous | As configured | Database recovery supported |
| PostgreSQL (Hyperscaler) | Point-in-time | 14 days | Restore by creating new instance |
| Redis | None | N/A | No persistence support |
| Object Store | None | N/A | Use versioning for protection |
Object Store Protection Mechanisms
- Object Versioning: Recover from accidental deletion
- Expiration Rules: Automatic version cleanup
- Deletion Prevention: AWS S3 buckets, Azure containers
Runtime-Specific Backup Strategies
| Runtime | Strategy |
|---|---|
| Cloud Foundry | Multi-AZ replication within region |
| Kyma | Managed K8s snapshots (excludes volumes) |
| Neo | Cross-region data copies |
Customer-Managed Backups
Critical: SAP doesn't manage backups of service configurations. You are responsible for backing up your service-specific configurations.
Key Responsibilities:
| Responsibility | Details |
|---|---|
| Self-Service Backup | You must back up service configurations yourself |
| Service Documentation Review | Consult each service's documentation for backup capabilities |
| Service Limitations Awareness | Some services don't support user-specific configuration backups |
| Risk Mitigation | Backup frequency varies by service; prevents accidental data loss |
Actions Required:
- Identify all SAP BTP services currently in use
- Review service-specific backup documentation
- Understand which services lack backup capabilities
- Implement backup strategies for supported configurations
- Plan accordingly for services without backup features
- Ensure business continuity plans include config recovery
Note: If backup information is unavailable for a service, contact SAP support channels.
Kyma Cluster Sharing and Isolation
When to Share Clusters
Recommended:
- Small teams with modest resource needs
- Development and testing environments
- Non-critical workloads
- Teams with established trust
Not Recommended:
- Multiple external customers
- Production workloads with strict isolation
- Untrusted tenants
Control Plane Isolation Strategies
| Strategy | Description |
|---|---|
| Namespaces | Isolate API resources within cluster |
| RBAC | Manage permissions within namespaces |
| Global Resources | Admin-managed CRDs and global objects |
| Policy Engines | Gatekeeper, Kyverno for compliance |
| Resource Quotas | Set limits per tenant/namespace |
Data Plane Isolation Strategies
| Strategy | Description |
|---|---|
| Network Policies | Restrict inter-namespace traffic |
| Centralized Observability | Cluster-wide metrics tracking |
| Service Mesh (Istio) | mTLS, dedicated ingress per tenant |
Sample Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-cross-namespace
namespace: tenant-a
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: tenant-a
egress:
- to:
- namespaceSelector:
matchLabels:
name: tenant-a
Disaster Recovery Planning
Recovery Time Objective (RTO)
Depends on:
- Failover detection speed
- Application synchronization lag
- User-driven vs automatic failover
Recovery Point Objective (RPO)
Depends on:
- Synchronization frequency
- Data persistence strategy
- Backup retention periods
DR Checklist
- Define RTO and RPO requirements
- Select multi-region architecture
- Implement application synchronization
- Configure failover detection
- Test failover procedures regularly
- Document failback process
- Train operations team
- Establish communication protocols
Resilient Application Development
Design Principles
- Statelessness: Minimize in-memory state
- Idempotency: Safe to retry operations
- Circuit Breakers: Graceful degradation
- Health Endpoints: Enable monitoring
- Graceful Shutdown: Complete in-flight requests
Health Endpoint Example
app.get('/health', (req, res) => {
const health = {
status: 'UP',
checks: {
database: checkDatabase(),
cache: checkCache(),
externalApi: checkExternalApi()
}
};
const allHealthy = Object.values(health.checks)
.every(check => check.status === 'UP');
res.status(allHealthy ? 200 : 503).json(health);
});
Source Documentation:
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/set-up-and-plan/planning-failover-on-sap-btp-8c46464.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/implementing-failover-df972c5.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/deploy-your-application-in-two-data-centers-61d08d8.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/keep-the-two-applications-in-sync-e6d2bdb.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/define-how-a-failover-is-detected-88b86db.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/decide-on-the-failback-963f962.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/multi-region-usecases.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/deploy-and-deliver/data-backups-managed-by-sap-6c1e071.md
- https://github.com/SAP-docs/btp-best-practices-guide/blob/main/docs/set-up-and-plan/sharing-clusters-in-kyma-57ec1ea.md