SAP BTP Resilience Reference

Overview

Building resilient applications ensures stability, high availability, and graceful degradation during failures. SAP BTP provides patterns and services to achieve enterprise-grade resilience.

Key Resources

Resource                                  Description
Developing Resilient Apps on SAP BTP      Patterns and examples
Route Multi-Region Traffic                GitHub implementation
Architecting Multi-Region Resiliency      Discovery Center reference

Cloud Foundry Resilience

Availability Zones

Automatic Distribution:

  • Applications spread across multiple AZs
  • No manual configuration required
  • Platform handles placement

During AZ Failure:

  • ~1/3 of instances become unavailable (in a 3-zone deployment)
  • Remaining instances handle increased load
  • Cloud Foundry reschedules to healthy zones

Best Practice: Configure enough instances that the survivors can absorb the load when a zone fails:

Minimum instances = Normal load instances × 1.5

For example, an app that needs 4 instances under normal load should run at least 6; losing one of three zones then still leaves about 4 healthy instances.

Instance Configuration

# manifest.yml
applications:
  - name: my-app
    instances: 3  # At least 3 for HA
    memory: 512M
    health-check-type: http
    health-check-http-endpoint: /health

Health Checks

// Express health endpoint
// (checkDatabase/checkMessaging are assumed async helpers returning booleans)
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    messaging: await checkMessaging()
  };
  const up = Object.values(checks).every(Boolean);
  res.status(up ? 200 : 503).json({ status: up ? 'UP' : 'DOWN', checks });
});

Kyma Resilience

Istio Service Mesh

Features:

  • Automatic retries
  • Circuit breakers
  • Timeouts
  • Load balancing

Configuration

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-app-dr
spec:
  host: my-app
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
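
The DestinationRule above covers connection pooling and circuit breaking (outlier detection). The retries and timeouts from the feature list are configured on a VirtualService instead. A minimal sketch, with the resource name and values illustrative:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  hosts:
    - my-app
  http:
    - route:
        - destination:
            host: my-app
      timeout: 5s                    # overall per-request deadline
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure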

Pod Distribution

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app

ABAP Resilience

Built-in Features

  • Automatic workload distribution
  • Work process management
  • HANA failover support
  • Session management

Elastic Scaling

Automatic response to load:

  • Scales between 1 ACU and the configured maximum
  • Adjusts in 0.5 ACU increments
  • Scaling decisions are metrics-based

Resilience Patterns

Circuit Breaker

Purpose: Prevent cascading failures

States:

  1. Closed: Normal operation
  2. Open: Fail fast, skip calls
  3. Half-Open: Test recovery

Implementation (CAP - Node.js):

Note: The opossum library shown below is a third-party community package, not SAP-supported. Evaluate its maintenance status, compatibility with your CAP/Node.js versions, and security posture before production use. For Java applications, SAP Cloud SDK integrates with Resilience4j as the official resilience tooling.

const CircuitBreaker = require('opossum');

const breaker = new CircuitBreaker(callRemoteService, {
  timeout: 3000,                 // fail calls slower than 3 s
  errorThresholdPercentage: 50,  // open the circuit at a 50% failure rate
  resetTimeout: 30000            // try half-open after 30 s
});

// Serve cached data while the circuit is open
breaker.fallback(() => getCachedData());

const result = await breaker.fire(serviceParams);
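
opossum also emits state-change events, which can feed the monitoring metrics discussed later (event names per the opossum documentation):

// Surface breaker state changes to logs/monitoring
breaker.on('open', () => console.warn('Circuit opened - failing fast'));
breaker.on('halfOpen', () => console.info('Circuit half-open - testing recovery'));
breaker.on('close', () => console.info('Circuit closed - normal operation'));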

Retry with Exponential Backoff

Purpose: Handle transient failures

async function retryWithBackoff(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      const delay = Math.pow(2, i) * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
}
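
A usage sketch (fetchOrders is a placeholder for any async call that may fail transiently):

// Three attempts total, with 1 s and 2 s delays between them
const orders = await retryWithBackoff(() => fetchOrders(), 3);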

Bulkhead

Purpose: Isolate failures

const Semaphore = require('semaphore'); // third-party package; same caveats as opossum above

const dbPool = Semaphore(10);  // Max 10 concurrent DB calls
const apiPool = Semaphore(20); // Max 20 concurrent API calls

async function callDatabase() {
  return new Promise((resolve, reject) => {
    dbPool.take(() => {
      performDbCall()
        .then(resolve)
        .catch(reject)
        .finally(() => dbPool.leave());
    });
  });
}

Timeout

Purpose: Prevent hanging requests

const timeout = (promise, ms) => {
  let timer;
  return Promise.race([
    promise,
    new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('Timeout')), ms);
    })
  ]).finally(() => clearTimeout(timer)); // don't leak the timer when the call wins
};

const result = await timeout(fetchData(), 5000);

Graceful Degradation

Purpose: Provide reduced functionality instead of failing

async function getProductDetails(id) {
  try {
    // Try full data
    return await getFromPrimaryService(id);
  } catch (error) {
    // Fallback to cached/reduced data
    const cached = await getFromCache(id);
    if (cached) return { ...cached, _degraded: true };

    // Final fallback
    return getBasicDetails(id);
  }
}

Multi-Region Architecture

Active-Passive

Region A (Primary)     Region B (Standby)
     ↓                      ↓
  Active                 Standby
     ↓                      ↓
  HANA Cloud           HANA Cloud (Replica)

Failover: Manual or automated switch
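
The switch itself is usually DNS- or load-balancer-driven (see the Route Multi-Region Traffic reference above). As a minimal client-side illustration of the same idea (region URLs are placeholders; assumes Node 18+ global fetch and reuses the timeout helper from above):

const REGIONS = [
  'https://my-app.eu10.example.com',  // primary
  'https://my-app.us10.example.com'   // standby
];

async function callWithFailover(path) {
  let lastError;
  for (const base of REGIONS) {
    try {
      return await timeout(fetch(base + path), 5000);
    } catch (error) {
      lastError = error; // region unreachable or slow: try the next
    }
  }
  throw lastError;
}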

Active-Active

        Global Load Balancer
              ↓
    ┌─────────┴─────────┐
    ↓                   ↓
Region A              Region B
    ↓                   ↓
HANA Cloud           HANA Cloud
    ↓                   ↓
    └───── Replication ─┘

Use Case: Highest availability requirements

Monitoring for Resilience

Key Metrics

Metric                  Threshold  Action
Error rate              > 1%       Alert, investigate
Latency (p99)           > 2 s      Scale, optimize
Circuit breaker trips   Any        Review dependencies
Retry rate              > 5%       Check downstream services
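
A minimal sketch of how the error_rate value might be produced in the Express app above (counter names and the /metrics endpoint are illustrative; production setups typically use a metrics library):

let totalRequests = 0;
let failedRequests = 0;

// Count every response; treat 5xx responses as errors
app.use((req, res, next) => {
  totalRequests++;
  res.on('finish', () => {
    if (res.statusCode >= 500) failedRequests++;
  });
  next();
});

app.get('/metrics', (req, res) => {
  res.json({ error_rate: totalRequests ? failedRequests / totalRequests : 0 });
});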

Alerting

# SAP Alert Notification example
conditions:
  - name: high-error-rate
    propertyKey: error_rate
    predicate: GREATER_THAN
    propertyValue: "0.01"

actions:
  - name: page-oncall
    type: EMAIL
    properties:
      destination: oncall@example.com

Best Practices

Design

  1. Assume failure - Everything can fail
  2. Design for graceful degradation
  3. Implement health checks
  4. Use async where possible
  5. Plan for data consistency

Implementation

  1. Set timeouts on all external calls
  2. Implement retries with backoff
  3. Use circuit breakers for dependencies
  4. Cache aggressively where appropriate
  5. Log and monitor all failures

Operations

  1. Run chaos engineering tests
  2. Practice disaster recovery
  3. Monitor SLIs/SLOs
  4. Automate failover where possible

Source Documentation