SAP BTP Failover and Resilience - Detailed Reference

Source: https://github.com/SAP-docs/btp-best-practices-guide/tree/main/docs/deploy-and-deliver


High Availability Overview

SAP BTP applications can achieve high availability through multi-region deployment with intelligent traffic routing. This removes any single region as a single point of failure and reduces latency for globally distributed users.


Multi-Region Architecture

Core Concept

                    Custom Domain URL
                           │
                           ▼
                  ┌─────────────────┐
                  │  Load Balancer  │
                  │ (Health Checks) │
                  └────────┬────────┘
                           │
              ┌────────────┴────────────┐
              ▼                         ▼
     ┌─────────────────┐       ┌──────────────────┐
     │  Region 1       │       │  Region 2        │
     │  (Active)       │       │  (Passive/Active)│
     │                 │       │                  │
     │  Subaccount A   │       │  Subaccount B    │
     │  App Instance   │       │  App Instance    │
     └─────────────────┘       └──────────────────┘

Key Benefits

  • Geographic Redundancy: Regional outages don't interrupt service
  • Intelligent Routing: Health checks direct traffic to operational instances
  • Unified Access: Custom domain remains constant during failovers
  • Load Distribution: Traffic balances across regions
  • Latency Optimization: Route users to nearest healthy region

Failover Scope

Supported Application Types

The basic failover guidance applies to:

  • SAPUI5 applications
  • HTML5 applications
  • Applications without data persistence
  • Applications without in-memory caching
  • Applications with data stored in on-premise back-end systems

Note: Applications with cloud-based persistence require additional considerations for data synchronization.


Four Core Failover Principles

1. Deploy Across Two Data Centers

Configuration: Active/Passive

  • Primary data center receives normal traffic
  • Secondary data center acts as standby
  • Switch to secondary only during primary downtime

Regional Selection Best Practices:

  • Choose regions near users and backend systems
  • Example: Frankfurt and Amsterdam for European users
  • Keeping both data centers in the same geographic area helps keep performance comparable
  • Review data processing restrictions for cross-region deployment

Legal Consideration: Cross-region deployment may create data processing compliance issues. Review Data Protection and Privacy documentation before proceeding.

2. Keep Applications Synchronized

Synchronization Options:

Method                                   Effort   Best For
Manual                                   High     Infrequent updates
CI/CD Pipeline                           Medium   Regular deployments
Solution Export + Transport Management   Medium   Neo environments

Manual Synchronization

  • Duplicate modifications across both data centers
  • Mirror Git repositories
  • Allows non-identical applications (reduced functionality in backup)
  • Visual differentiation between primary and backup

CI/CD Pipeline Synchronization

# Pipeline deploys to both regions
stages:
  - name: Build
    steps:
      - build_mta

  - name: Deploy Primary
    steps:
      - deploy_to_region_1

  - name: Deploy Secondary
    steps:
      - deploy_to_region_2

  • Use Project "Piper" pipelines adapted for multi-deployment
  • Parallel deployment to subaccounts in different regions
  • Automatic consistency
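
A minimal sketch of such a multi-region deploy step, assuming the Cloud Foundry CLI with the MultiApps plugin ("cf deploy") is installed; the API endpoints, org, space, archive name, and credential variables are placeholders:

// Deploys the same MTA archive to subaccounts in two regions.
const { execSync } = require('child_process');

const regions = [
  { api: 'https://api.cf.eu10.hana.ondemand.com', org: 'my-org', space: 'prod' },
  { api: 'https://api.cf.us10.hana.ondemand.com', org: 'my-org', space: 'prod' },
];

for (const region of regions) {
  // cf login -a switches the endpoint and authenticates in one step;
  // credentials are taken from environment variables here.
  execSync(
    `cf login -a ${region.api} -o ${region.org} -s ${region.space} ` +
    `-u ${process.env.CF_USER} -p ${process.env.CF_PASSWORD}`,
    { stdio: 'inherit' }
  );
  execSync('cf deploy app.mtar', { stdio: 'inherit' }); // MultiApps plugin
}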

Solution Export Wizard + Cloud Transport Management

  1. Export changes as MTA archive from primary (Neo)
  2. Import via Transport Management Service (Cloud Foundry)
  3. Deploy to secondary data center

3. Define Failover Detection

Detection Mechanisms:

  • Response timeout monitoring (e.g., 25 seconds max)
  • HTTP status code checking (5xx errors)
  • Health endpoint monitoring

Implementation Options:

  • Manual code implementation
  • Rule-based solutions (e.g., Akamai ION)
  • Load balancer health checks

Detection Behavior:

  • Monitor first HTTP request to application URL
  • Ignore subsequent requests (prevents a single failing resource from triggering failover)
  • Present HTML down page with failover link when detected

Note: In basic scenarios, "the failover itself is therefore manually performed by the user" via the down page link.
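
A minimal browser-side sketch of this behavior; the URLs and the 25-second timeout are illustrative values, not an SAP-provided API:

// Probe only the first request to the primary; a timeout or a 5xx
// response counts as an outage and surfaces a manual failover link.
const PRIMARY_URL = 'https://app.eu.example.com/';
const BACKUP_URL = 'https://app.us.example.com/';
const TIMEOUT_MS = 25000;

async function primaryIsHealthy() {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
  try {
    const response = await fetch(PRIMARY_URL, { signal: controller.signal });
    return response.status < 500; // 5xx means failover
  } catch (err) {
    return false; // network error or 25-second timeout
  } finally {
    clearTimeout(timer);
  }
}

primaryIsHealthy().then((healthy) => {
  if (!healthy) {
    // Down page: the user performs the failover via the link.
    document.body.innerHTML =
      `<h1>Service currently unavailable</h1>
       <p><a href="${BACKUP_URL}">Continue on the backup instance</a></p>`;
  }
});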

4. Plan for Failback

Active/Active Setup:

  • Applications identical in both data centers
  • Same functionality everywhere
  • Failback happens automatically with the next failover event
  • Explicitly returning to the primary is not mandatory

Active/Passive Setup:

  • Applications may differ between data centers
  • Reduced functionality in backup acceptable
  • Failback to primary is mandatory
  • Must restore full functionality

Recommended Failback Approach:

  • User-driven failback model
  • Visual differentiation reminds users to switch back
  • Allow transactions to complete without interruption
  • Prioritize completion over automatic recovery speed
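
A sketch of the visual differentiation on the backup instance; REGION_ROLE is a hypothetical marker that you would inject at build or deploy time:

const REGION_ROLE = 'backup'; // 'primary' in the primary data center
const PRIMARY_URL = 'https://app.eu.example.com/';

if (REGION_ROLE === 'backup') {
  // Persistent banner reminding users to return to the primary
  // once their current transactions are complete.
  const banner = document.createElement('div');
  banner.style.cssText =
    'background:#b00;color:#fff;padding:8px;text-align:center';
  banner.innerHTML =
    `You are working on the backup instance. Finish open transactions, ` +
    `then <a href="${PRIMARY_URL}" style="color:#fff">return to the primary</a>.`;
  document.body.prepend(banner);
}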

Multi-Region Reference Use Cases

Available Implementations

Scenario                                        Components                   Resources
SAP Build Work Zone + Azure Traffic Manager     Work Zone, Azure             Blog post, GitHub, Discovery Center mission
SAP Build Work Zone + Amazon Route 53           Work Zone, AWS               Blog post, GitHub
CAP Applications + SAP HANA Cloud               CAP, HANA Cloud multi-zone   GitHub repository
CAP Applications + Amazon Aurora                CAP, Aurora read replica     GitHub repository
SAP Cloud Integration + Azure Traffic Manager   CPI, Azure                   GitHub, Discovery Center

Architecture: SAP Build Work Zone + Azure Traffic Manager

User Request
     │
     ▼
Azure Traffic Manager
(Priority-based routing)
     │
     ├──► Primary Region (Priority 1)
     │    └── SAP Build Work Zone Instance
     │
     └──► Secondary Region (Priority 2)
          └── SAP Build Work Zone Instance (standby)

Architecture: CAP + HANA Cloud Multi-Zone

     Application Load Balancer
              │
     ┌────────┴────────┐
     ▼                 ▼
CAP App (AZ1)     CAP App (AZ2)
     │                 │
     └────────┬────────┘
              ▼
    SAP HANA Cloud
    (Multi-zone replication)

Data Backup and Resilience

SAP-Managed Backups

Service                    Backup Type     Retention       Notes
SAP HANA Cloud             Continuous      As configured   Database recovery supported
PostgreSQL (Hyperscaler)   Point-in-time   14 days         Restore by creating new instance
Redis                      None            N/A             No persistence support
Object Store               None            N/A             Use versioning for protection

Object Store Protection Mechanisms

  • Object Versioning: Recover from accidental deletion
  • Expiration Rules: Automatic version cleanup
  • Deletion Prevention: AWS S3 buckets, Azure containers
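
For Object Store instances backed by AWS S3, these mechanisms map to standard S3 bucket settings. A sketch using the AWS SDK for JavaScript v3; the bucket name, region, and 30-day cleanup window are placeholders, and credentials would come from the Object Store service binding:

const {
  S3Client,
  PutBucketVersioningCommand,
  PutBucketLifecycleConfigurationCommand,
} = require('@aws-sdk/client-s3');

const s3 = new S3Client({ region: 'eu-central-1' });
const Bucket = 'my-objectstore-bucket'; // from the service binding

async function protectBucket() {
  // Object versioning: recover objects after accidental deletion.
  await s3.send(new PutBucketVersioningCommand({
    Bucket,
    VersioningConfiguration: { Status: 'Enabled' },
  }));

  // Expiration rule: clean up noncurrent versions after 30 days.
  await s3.send(new PutBucketLifecycleConfigurationCommand({
    Bucket,
    LifecycleConfiguration: {
      Rules: [{
        ID: 'expire-old-versions',
        Status: 'Enabled',
        Filter: {},
        NoncurrentVersionExpiration: { NoncurrentDays: 30 },
      }],
    },
  }));
}

protectBucket().catch(console.error);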

Runtime-Specific Backup Strategies

Runtime         Strategy
Cloud Foundry   Multi-AZ replication within region
Kyma            Managed K8s snapshots (excludes volumes)
Neo             Cross-region data copies

Customer-Managed Backups

Critical: SAP doesn't manage backups of service configurations. You are responsible for backing up your service-specific configurations.

Key Responsibilities:

Responsibility                   Details
Self-Service Backup              You must back up service configurations yourself
Service Documentation Review     Consult each service's documentation for backup capabilities
Service Limitations Awareness    Some services don't support user-specific configuration backups
Risk Mitigation                  Backup frequency varies by service; prevents accidental data loss

Actions Required:

  1. Identify all SAP BTP services currently in use
  2. Review service-specific backup documentation
  3. Understand which services lack backup capabilities
  4. Implement backup strategies for supported configurations
  5. Plan accordingly for services without backup features
  6. Ensure business continuity plans include config recovery

Note: If backup information is unavailable for a service, contact SAP support channels.


Kyma Cluster Sharing and Isolation

When to Share Clusters

Recommended:

  • Small teams with modest resource needs
  • Development and testing environments
  • Non-critical workloads
  • Teams with established trust

Not Recommended:

  • Multiple external customers
  • Production workloads with strict isolation
  • Untrusted tenants

Control Plane Isolation Strategies

Strategy           Description
Namespaces         Isolate API resources within cluster
RBAC               Manage permissions within namespaces
Global Resources   Admin-managed CRDs and global objects
Policy Engines     Gatekeeper, Kyverno for compliance
Resource Quotas    Set limits per tenant/namespace

Data Plane Isolation Strategies

Strategy                    Description
Network Policies            Restrict inter-namespace traffic
Centralized Observability   Cluster-wide metrics tracking
Service Mesh (Istio)        mTLS, dedicated ingress per tenant

Sample Network Policy

# Denies all ingress and egress for pods in tenant-a, except traffic
# to and from namespaces labeled name=tenant-a. Assumes the namespace
# carries that label; note that this also blocks DNS egress to
# kube-system unless allowed by a separate rule.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
  namespace: tenant-a
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: tenant-a
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: tenant-a

Disaster Recovery Planning

Recovery Time Objective (RTO)

Depends on:

  • Failover detection speed
  • Application synchronization lag
  • User-driven vs automatic failover

Example: a 25-second detection timeout plus a user-driven switch via the down page puts RTO in the range of minutes rather than seconds.

Recovery Point Objective (RPO)

Depends on:

  • Synchronization frequency
  • Data persistence strategy
  • Backup retention periods

Example: if the secondary region is synchronized nightly, up to a day of application changes can be missing after a failover.

DR Checklist

  • Define RTO and RPO requirements
  • Select multi-region architecture
  • Implement application synchronization
  • Configure failover detection
  • Test failover procedures regularly
  • Document failback process
  • Train operations team
  • Establish communication protocols

Resilient Application Development

Design Principles

  1. Statelessness: Minimize in-memory state
  2. Idempotency: Safe to retry operations
  3. Circuit Breakers: Graceful degradation (sketch below)
  4. Health Endpoints: Enable monitoring
  5. Graceful Shutdown: Complete in-flight requests (example at the end of this section)
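
A circuit breaker (principle 3) can be sketched in a few lines without any particular library; the thresholds below are illustrative, not prescribed values:

// Wraps an async function; after failureThreshold consecutive failures
// it fails fast for cooldownMs instead of calling the dependency.
function circuitBreaker(fn, { failureThreshold = 5, cooldownMs = 30000 } = {}) {
  let consecutiveFailures = 0;
  let openUntil = 0;

  return async (...args) => {
    if (Date.now() < openUntil) {
      throw new Error('Circuit open: failing fast'); // degrade gracefully
    }
    try {
      const result = await fn(...args);
      consecutiveFailures = 0; // success closes the circuit
      return result;
    } catch (err) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= failureThreshold) {
        openUntil = Date.now() + cooldownMs; // open (or reopen) the circuit
      }
      throw err;
    }
  };
}

// Usage: const guardedFetch = circuitBreaker((url) => fetch(url));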

Health Endpoint Example

const express = require('express');
const app = express();

// checkDatabase, checkCache, and checkExternalApi are app-specific
// probes, each returning an object of the form { status: 'UP' | 'DOWN' }.
app.get('/health', (req, res) => {
  const checks = {
    database: checkDatabase(),
    cache: checkCache(),
    externalApi: checkExternalApi()
  };

  const allHealthy = Object.values(checks)
    .every(check => check.status === 'UP');

  // 503 lets load balancer health checks route traffic elsewhere.
  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'UP' : 'DOWN',
    checks
  });
});
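
Graceful Shutdown Example

A minimal sketch of principle 5, reusing the app object from the health endpoint above; the 30-second deadline is an illustrative value:

const server = app.listen(process.env.PORT || 8080);

process.on('SIGTERM', () => {
  // Stop accepting new connections; in-flight requests finish first.
  server.close(() => process.exit(0));

  // Safety net: force exit if requests hang past the deadline.
  setTimeout(() => process.exit(1), 30000).unref();
});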

Source Documentation: