Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:28:30 +08:00
commit 171acedaa4
220 changed files with 85967 additions and 0 deletions

View File

@@ -0,0 +1,200 @@
---
name: spring-boot-saga-pattern
description: Implement distributed transactions using the Saga Pattern in Spring Boot microservices. Use when building microservices requiring transaction management across multiple services, handling compensating transactions, ensuring eventual consistency, or implementing choreography or orchestration-based sagas with Spring Boot, Kafka, or Axon Framework.
allowed-tools: Read, Write, Bash
category: backend
tags: [spring-boot, saga, distributed-transactions, choreography, orchestration, microservices]
version: 1.1.0
---
# Spring Boot Saga Pattern
## When to Use
Implement this skill when:
- Building distributed transactions across multiple microservices
- Needing to replace two-phase commit (2PC) with a more scalable solution
- Handling transaction rollback when a service fails in multi-service workflows
- Ensuring eventual consistency in microservices architecture
- Implementing compensating transactions for failed operations
- Coordinating complex business processes spanning multiple services
- Choosing between choreography-based and orchestration-based saga approaches
**Trigger phrases**: distributed transactions, saga pattern, compensating transactions, microservices transaction, eventual consistency, rollback across services, orchestration pattern, choreography pattern
## Overview
The **Saga Pattern** is an architectural pattern for managing distributed transactions in microservices. Instead of using a single ACID transaction across multiple databases, a saga breaks the transaction into a sequence of local transactions. Each local transaction updates its database and publishes an event or message to trigger the next step. If a step fails, the saga executes **compensating transactions** to undo the changes made by previous steps.
### Key Architectural Decisions
When implementing a saga, make these decisions:
1. **Approach Selection**: Choose between **choreography-based** (event-driven, decoupled) or **orchestration-based** (centralized control, easier to track)
2. **Messaging Platform**: Select Kafka, RabbitMQ, or Spring Cloud Stream
3. **Framework**: Use Axon Framework, Eventuate Tram, Camunda, or Apache Camel
4. **State Persistence**: Store saga state in database for recovery and debugging
5. **Idempotency**: Ensure all operations (especially compensations) are idempotent and retryable
## Two Approaches to Implement Saga
### Choreography-Based Saga
Each microservice publishes events and listens to events from other services. **No central coordinator**.
**Best for**: Greenfield microservice applications with few participants
**Advantages**:
- Simple for small number of services
- Loose coupling between services
- No single point of failure
**Disadvantages**:
- Difficult to track workflow state
- Hard to troubleshoot and maintain
- Complexity grows with number of services
### Orchestration-Based Saga
A **central orchestrator** manages the entire transaction flow and tells services what to do.
**Best for**: Brownfield applications, complex workflows, or when centralized control is needed
**Advantages**:
- Centralized visibility and monitoring
- Easier to troubleshoot and maintain
- Clear transaction flow
- Simplified error handling
- Better for complex workflows
**Disadvantages**:
- Orchestrator can become single point of failure
- Additional infrastructure component
## Implementation Steps
### Step 1: Define Transaction Flow
Identify the sequence of operations and corresponding compensating transactions:
```
Order → Payment → Inventory → Shipment → Notification
↓ ↓ ↓ ↓ ↓
Cancel Refund Release Cancel Cancel
```
### Step 2: Choose Implementation Approach
- **Choreography**: Spring Cloud Stream with Kafka or RabbitMQ
- **Orchestration**: Axon Framework, Eventuate Tram, Camunda, or Apache Camel
### Step 3: Implement Services with Local Transactions
Each service handles its local ACID transaction and publishes events or responds to commands.
### Step 4: Implement Compensating Transactions
Every forward transaction must have a corresponding compensating transaction. Ensure **idempotency** and **retryability**.
### Step 5: Handle Failure Scenarios
Implement retry logic, timeouts, and dead-letter queues for failed messages.
## Best Practices
### Design Principles
1. **Idempotency**: Ensure compensating transactions execute safely multiple times
2. **Retryability**: Design operations to handle retries without side effects
3. **Atomicity**: Each local transaction must be atomic within its service
4. **Isolation**: Handle concurrent saga executions properly
5. **Eventual Consistency**: Accept that data becomes consistent over time
### Service Design
- Use **constructor injection** exclusively (never field injection)
- Implement services as **stateless** components
- Store saga state in persistent store (database or event store)
- Use **immutable DTOs** (Java records preferred)
- Separate domain logic from infrastructure concerns
### Error Handling
- Implement **circuit breakers** for service calls
- Use **dead-letter queues** for failed messages
- Log all saga events for debugging and monitoring
- Implement **timeout mechanisms** for long-running sagas
- Design **semantic locks** to prevent concurrent updates
### Testing
- Test happy path scenarios
- Test each failure scenario and its compensation
- Test concurrent saga executions
- Test idempotency of compensating transactions
- Use Testcontainers for integration testing
### Monitoring and Observability
- Track saga execution status and duration
- Monitor compensation transaction execution
- Alert on stuck or failed sagas
- Use distributed tracing (Spring Cloud Sleuth, Zipkin)
- Implement health checks for saga coordinators
## Technology Stack
**Spring Boot 3.x** with dependencies:
**Messaging**: Spring Cloud Stream, Apache Kafka, RabbitMQ, Spring AMQP
**Saga Frameworks**: Axon Framework (4.9.0), Eventuate Tram Sagas, Camunda, Apache Camel
**Persistence**: Spring Data JPA, Event Sourcing (optional), Transactional Outbox Pattern
**Monitoring**: Spring Boot Actuator, Micrometer, Distributed Tracing (Sleuth + Zipkin)
## Anti-Patterns to Avoid
**Tight Coupling**: Services directly calling each other instead of using events
**Missing Compensations**: Not implementing compensating transactions for every step
**Non-Idempotent Operations**: Compensations that cannot be safely retried
**Synchronous Sagas**: Waiting synchronously for each step (defeats the purpose)
**Lost Messages**: Not handling message delivery failures
**No Monitoring**: Running sagas without visibility into their status
**Shared Database**: Using same database across multiple services
**Ignoring Network Failures**: Not handling partial failures gracefully
## When NOT to Use Saga Pattern
Do not implement this pattern when:
- Single service transactions (use local ACID transactions instead)
- Strong consistency is required (consider monolith or shared database)
- Simple CRUD operations without cross-service dependencies
- Low transaction volume with simple flows
- Team lacks experience with distributed systems
## References
For detailed information, consult the following resources:
- [Saga Pattern Definition](references/01-saga-pattern-definition.md)
- [Choreography-Based Implementation](references/02-choreography-implementation.md)
- [Orchestration-Based Implementation](references/03-orchestration-implementation.md)
- [Event-Driven Architecture](references/04-event-driven-architecture.md)
- [Compensating Transactions](references/05-compensating-transactions.md)
- [State Management](references/06-state-management.md)
- [Error Handling and Retry](references/07-error-handling-retry.md)
- [Testing Strategies](references/08-testing-strategies.md)
- [Common Pitfalls and Solutions](references/09-pitfalls-solutions.md)
See also [examples.md](references/examples.md) for complete implementation examples:
- E-Commerce Order Processing (orchestration with Axon Framework)
- Food Delivery Application (choreography with Kafka and Spring Cloud Stream)
- Travel Booking System (complex orchestration with multiple compensations)
- Banking Transfer System
- Real-world microservices patterns

View File

@@ -0,0 +1,68 @@
# Saga Pattern Definition
## What is a Saga?
A **Saga** is a sequence of local transactions where each transaction updates data within a single service. Each local transaction publishes an event or message that triggers the next local transaction in the saga. If a local transaction fails, the saga executes compensating transactions to undo the changes made by preceding transactions.
## Key Characteristics
**Distributed Transactions**: Spans multiple microservices, each with its own database.
**Local Transactions**: Each service performs its own ACID transaction.
**Event-Driven**: Services communicate through events or commands.
**Compensations**: Rollback mechanism using compensating transactions.
**Eventual Consistency**: System reaches a consistent state over time.
## Saga vs Two-Phase Commit (2PC)
| Feature | Saga Pattern | Two-Phase Commit |
|---------|-------------|------------------|
| Locking | No distributed locks | Requires locks during commit |
| Performance | Better performance | Performance bottleneck |
| Scalability | Highly scalable | Limited scalability |
| Complexity | Business logic complexity | Protocol complexity |
| Failure Handling | Compensating transactions | Automatic rollback |
| Isolation | Lower isolation | Full isolation |
| NoSQL Support | Yes | No |
| Microservices Fit | Excellent | Poor |
## ACID vs BASE
**ACID** (Traditional Databases):
- **A**tomicity: All or nothing
- **C**onsistency: Valid state transitions
- **I**solation: Concurrent transactions don't interfere
- **D**urability: Committed data persists
**BASE** (Saga Pattern):
- **B**asically **A**vailable: System is available most of the time
- **S**oft state: State may change over time
- **E**ventual consistency: System becomes consistent eventually
## When to Use Saga Pattern
Use the saga pattern when:
- Building distributed transactions across multiple microservices
- Needing to replace 2PC with a more scalable solution
- Services need to maintain eventual consistency
- Handling long-running processes spanning multiple services
- Implementing compensating transactions for failed operations
## When NOT to Use Saga Pattern
Avoid the saga pattern when:
- Single service transactions (use local ACID transactions)
- Strong consistency is required immediately
- Simple CRUD operations without cross-service dependencies
- Low transaction volume with simple flows
- Team lacks experience with distributed systems
## Migration Path
Many organizations migrate from traditional monolithic systems or 2PC-based systems to sagas:
1. **From Monolith to Saga**: Identify transaction boundaries, extract services gradually, implement sagas incrementally
2. **From 2PC to Saga**: Analyze existing 2PC transactions, design compensating transactions, implement sagas in parallel, monitor and compare results before full migration

View File

@@ -0,0 +1,153 @@
# Choreography-Based Saga Implementation
## Architecture Overview
In choreography-based sagas, each service produces and listens to events. Services know what to do when they receive an event. **No central coordinator** manages the flow.
```
Service A → Event → Service B → Event → Service C
↓ ↓ ↓
Event Event Event
↓ ↓ ↓
Compensation Compensation Compensation
```
## Event Flow
### Success Path
1. **Order Service** creates order → publishes `OrderCreated` event
2. **Payment Service** listens → processes payment → publishes `PaymentProcessed` event
3. **Inventory Service** listens → reserves inventory → publishes `InventoryReserved` event
4. **Shipment Service** listens → prepares shipment → publishes `ShipmentPrepared` event
### Failure Path (When Payment Fails)
1. **Payment Service** publishes `PaymentFailed` event
2. **Order Service** listens → cancels order → publishes `OrderCancelled` event
3. All other services respond to cancellation with cleanup
## Event Publisher
```java
@Component
public class OrderEventPublisher {
private final StreamBridge streamBridge;
public OrderEventPublisher(StreamBridge streamBridge) {
this.streamBridge = streamBridge;
}
public void publishOrderCreatedEvent(String orderId, BigDecimal amount, String itemId) {
OrderCreatedEvent event = new OrderCreatedEvent(orderId, amount, itemId);
streamBridge.send("orderCreated-out-0",
MessageBuilder
.withPayload(event)
.setHeader(MessageHeaders.CONTENT_TYPE, MimeTypeUtils.APPLICATION_JSON)
.build());
}
}
```
## Event Listener
```java
@Component
public class PaymentEventListener {
@Bean
public Consumer<OrderCreatedEvent> handleOrderCreatedEvent() {
return event -> processPayment(event.getOrderId());
}
private void processPayment(String orderId) {
// Payment processing logic
}
}
```
## Event Classes
```java
public record OrderCreatedEvent(
String orderId,
BigDecimal amount,
String itemId
) {}
public record PaymentProcessedEvent(
String paymentId,
String orderId,
String itemId
) {}
public record PaymentFailedEvent(
String paymentId,
String orderId,
String itemId,
String reason
) {}
```
## Spring Cloud Stream Configuration
```yaml
spring:
cloud:
stream:
bindings:
orderCreated-out-0:
destination: order-events
paymentProcessed-out-0:
destination: payment-events
paymentFailed-out-0:
destination: payment-events
kafka:
binder:
brokers: localhost:9092
```
## Maven Dependencies
```xml
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream-binder-kafka</artifactId>
</dependency>
```
## Gradle Dependencies
```groovy
implementation 'org.springframework.cloud:spring-cloud-stream'
implementation 'org.springframework.cloud:spring-cloud-stream-binder-kafka'
```
## Advantages and Disadvantages
### Advantages
- **Simple** for small number of services
- **Loose coupling** between services
- **No single point of failure**
- Each service is independently deployable
### Disadvantages
- **Difficult to track workflow state** - distributed across services
- **Hard to troubleshoot** - following event flow is complex
- **Complexity grows** with number of services
- **Distributed source of truth** - saga state not centralized
## When to Use Choreography
Use choreography-based sagas when:
- Microservices are few in number (< 5 services per saga)
- Loose coupling is critical
- Team is experienced with event-driven architecture
- System can handle eventual consistency
- Workflow doesn't need centralized monitoring

View File

@@ -0,0 +1,232 @@
# Orchestration-Based Saga Implementation
## Architecture Overview
A **central orchestrator** (Saga Coordinator) manages the entire transaction flow, sending commands to services and handling responses.
```
Saga Orchestrator
/ | \
Service A Service B Service C
```
## Orchestrator Responsibilities
1. **Command Dispatch**: Send commands to services
2. **Response Handling**: Process service responses
3. **State Management**: Track saga execution state
4. **Compensation Coordination**: Trigger compensating transactions on failure
5. **Timeout Management**: Handle service timeouts
6. **Retry Logic**: Manage retry attempts
## Axon Framework Implementation
### Saga Class
```java
@Saga
public class OrderSaga {
@Autowired
private transient CommandGateway commandGateway;
@StartSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCreatedEvent event) {
String paymentId = UUID.randomUUID().toString();
ProcessPaymentCommand command = new ProcessPaymentCommand(
paymentId,
event.getOrderId(),
event.getAmount(),
event.getItemId()
);
commandGateway.send(command);
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentProcessedEvent event) {
ReserveInventoryCommand command = new ReserveInventoryCommand(
event.getOrderId(),
event.getItemId()
);
commandGateway.send(command);
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentFailedEvent event) {
CancelOrderCommand command = new CancelOrderCommand(event.getOrderId());
commandGateway.send(command);
end();
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(InventoryReservedEvent event) {
PrepareShipmentCommand command = new PrepareShipmentCommand(
event.getOrderId(),
event.getItemId()
);
commandGateway.send(command);
}
@EndSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCompletedEvent event) {
// Saga completed successfully
}
}
```
### Aggregate for Order Service
```java
@Aggregate
public class OrderAggregate {
@AggregateIdentifier
private String orderId;
private OrderStatus status;
public OrderAggregate() {
}
@CommandHandler
public OrderAggregate(CreateOrderCommand command) {
apply(new OrderCreatedEvent(
command.getOrderId(),
command.getAmount(),
command.getItemId()
));
}
@EventSourcingHandler
public void on(OrderCreatedEvent event) {
this.orderId = event.getOrderId();
this.status = OrderStatus.PENDING;
}
@CommandHandler
public void handle(CancelOrderCommand command) {
apply(new OrderCancelledEvent(command.getOrderId()));
}
@EventSourcingHandler
public void on(OrderCancelledEvent event) {
this.status = OrderStatus.CANCELLED;
}
}
```
### Aggregate for Payment Service
```java
@Aggregate
public class PaymentAggregate {
@AggregateIdentifier
private String paymentId;
public PaymentAggregate() {
}
@CommandHandler
public PaymentAggregate(ProcessPaymentCommand command) {
this.paymentId = command.getPaymentId();
if (command.getAmount().compareTo(BigDecimal.ZERO) <= 0) {
apply(new PaymentFailedEvent(
command.getPaymentId(),
command.getOrderId(),
command.getItemId(),
"Payment amount must be greater than zero"
));
} else {
apply(new PaymentProcessedEvent(
command.getPaymentId(),
command.getOrderId(),
command.getItemId()
));
}
}
}
```
## Axon Configuration
```yaml
axon:
serializer:
general: jackson
events: jackson
messages: jackson
eventhandling:
processors:
order-processor:
mode: tracking
source: eventBus
axonserver:
enabled: false
```
## Maven Dependencies for Axon
```xml
<dependency>
<groupId>org.axonframework</groupId>
<artifactId>axon-spring-boot-starter</artifactId>
<version>4.9.0</version> // Use latest stable version
</dependency>
```
## Advantages and Disadvantages
### Advantages
- **Centralized visibility** - easy to see workflow status
- **Easier to troubleshoot** - single place to analyze flow
- **Clear transaction flow** - orchestrator defines sequence
- **Simplified error handling** - centralized compensation logic
- **Better for complex workflows** - easier to manage many steps
### Disadvantages
- **Orchestrator becomes single point of failure** - can be mitigated with clustering
- **Additional infrastructure component** - more complexity in deployment
- **Potential tight coupling** - if orchestrator knows too much about services
## Eventuate Tram Sagas
Eventuate Tram is an alternative to Axon for orchestration-based sagas:
```xml
<dependency>
<groupId>io.eventuate.tram.sagas</groupId>
<artifactId>eventuate-tram-sagas-spring-starter</artifactId>
<version>0.28.0</version> // Use latest stable version
</dependency>
```
## Camunda for BPMN-Based Orchestration
Use Camunda when visual workflow design is beneficial:
**Features**:
- Visual workflow design
- BPMN 2.0 standard
- Human tasks support
- Complex workflow modeling
**Use When**:
- Business process modeling needed
- Visual workflow design preferred
- Human approval steps required
- Complex orchestration logic
## When to Use Orchestration
Use orchestration-based sagas when:
- Building brownfield applications with existing microservices
- Handling complex workflows with many steps
- Centralized control and monitoring is critical
- Organization wants clear visibility into saga execution
- Need for human intervention in workflow

View File

@@ -0,0 +1,262 @@
# Event-Driven Architecture in Sagas
## Event Types
### Domain Events
Represent business facts that happened within a service:
```java
public record OrderCreatedEvent(
String orderId,
Instant createdAt,
BigDecimal amount
) implements DomainEvent {}
```
### Integration Events
Communication between bounded contexts (microservices):
```java
public record PaymentRequestedEvent(
String orderId,
String paymentId,
BigDecimal amount
) implements IntegrationEvent {}
```
### Command Events
Request for action by another service:
```java
public record ProcessPaymentCommand(
String paymentId,
String orderId,
BigDecimal amount
) {}
```
## Event Versioning
Handle event schema evolution using versioning:
```java
public record OrderCreatedEventV1(
String orderId,
BigDecimal amount
) {}
public record OrderCreatedEventV2(
String orderId,
BigDecimal amount,
String customerId,
Instant timestamp
) {}
// Event Upcaster
public class OrderEventUpcaster implements EventUpcaster {
@Override
public Stream<IntermediateEventRepresentation> upcast(
Stream<IntermediateEventRepresentation> eventStream) {
return eventStream.map(event -> {
if (event.getType().getName().equals("OrderCreatedEventV1")) {
return upcastV1ToV2(event);
}
return event;
});
}
}
```
## Event Store
Store all events for audit trail and recovery:
```java
@Entity
@Table(name = "saga_events")
public class SagaEvent {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false)
private String sagaId;
@Column(nullable = false)
private String eventType;
@Column(columnDefinition = "TEXT")
private String payload;
@Column(nullable = false)
private Instant timestamp;
@Column(nullable = false)
private Integer version;
}
```
## Event Publishing Patterns
### Outbox Pattern (Transactional)
Ensure atomic update of database and event publishing:
```java
@Service
public class OrderService {
private final OrderRepository orderRepository;
private final OutboxRepository outboxRepository;
@Transactional
public void createOrder(CreateOrderRequest request) {
// 1. Create and save order
Order order = new Order(...);
orderRepository.save(order);
// 2. Create outbox entry in same transaction
OutboxEntry entry = new OutboxEntry(
"OrderCreated",
order.getId(),
new OrderCreatedEvent(...)
);
outboxRepository.save(entry);
}
}
@Component
public class OutboxPoller {
@Scheduled(fixedDelay = 1000)
public void pollAndPublish() {
List<OutboxEntry> unpublished = outboxRepository.findUnpublished();
unpublished.forEach(entry -> {
eventPublisher.publish(entry.getEvent());
outboxRepository.markAsPublished(entry.getId());
});
}
}
```
### Direct Publishing Pattern
Publish events immediately after transaction:
```java
@Service
public class OrderService {
private final OrderRepository orderRepository;
private final EventPublisher eventPublisher;
@Transactional
public void createOrder(CreateOrderRequest request) {
Order order = new Order(...);
orderRepository.save(order);
// Publish event after transaction commits
TransactionSynchronizationManager.registerSynchronization(
new TransactionSynchronization() {
@Override
public void afterCommit() {
eventPublisher.publish(new OrderCreatedEvent(...));
}
}
);
}
}
```
## Event Sourcing
Store all state changes as events instead of current state:
**Benefits**:
- Complete audit trail
- Time-travel debugging
- Natural fit for sagas
- Event replay for recovery
**Implementation**:
```java
@Entity
public class Order {
@Id
private String orderId;
@OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
private List<DomainEvent> events = new ArrayList<>();
public void createOrder(...) {
apply(new OrderCreatedEvent(...));
}
protected void apply(DomainEvent event) {
if (event instanceof OrderCreatedEvent e) {
this.orderId = e.orderId();
this.status = OrderStatus.PENDING;
}
events.add(event);
}
public List<DomainEvent> getUncommittedEvents() {
return new ArrayList<>(events);
}
public void clearUncommittedEvents() {
events.clear();
}
}
```
## Event Ordering and Consistency
### Maintain Event Order
Use partitioning to maintain order within a saga:
```java
@Bean
public ProducerFactory<String, Object> producerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
StringSerializer.class);
return new DefaultKafkaProducerFactory<>(config);
}
@Service
public class EventPublisher {
private final KafkaTemplate<String, Object> kafkaTemplate;
public void publish(DomainEvent event) {
// Use sagaId as key to maintain order
kafkaTemplate.send("events", event.getSagaId(), event);
}
}
```
### Handle Out-of-Order Events
Use saga state to detect and handle out-of-order events:
```java
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentProcessedEvent event) {
if (saga.getStatus() != SagaStatus.AWAITING_PAYMENT) {
// Out of order event, ignore or queue for retry
logger.warn("Unexpected event in state: {}", saga.getStatus());
return;
}
// Process event
}
```

View File

@@ -0,0 +1,299 @@
# Compensating Transactions
## Design Principles
### Idempotency
Execute multiple times with same result:
```java
public void cancelPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId)
.orElse(null);
if (payment == null) {
// Already cancelled or doesn't exist
return;
}
if (payment.getStatus() == PaymentStatus.CANCELLED) {
// Already cancelled, idempotent
return;
}
payment.setStatus(PaymentStatus.CANCELLED);
paymentRepository.save(payment);
// Refund logic here
}
```
### Retryability
Design operations to handle retries without side effects:
```java
@Retryable(
value = {TransientException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000, multiplier = 2)
)
public void releaseInventory(String itemId, int quantity) {
// Use set operations for idempotency
InventoryItem item = inventoryRepository.findById(itemId)
.orElseThrow();
item.increaseAvailableQuantity(quantity);
inventoryRepository.save(item);
}
```
## Compensation Strategies
### Backward Recovery
Undo completed steps in reverse order:
```java
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentFailedEvent event) {
logger.error("Payment failed, initiating compensation");
// Step 1: Cancel shipment preparation
commandGateway.send(new CancelShipmentCommand(event.getOrderId()));
// Step 2: Release inventory
commandGateway.send(new ReleaseInventoryCommand(event.getOrderId()));
// Step 3: Cancel order
commandGateway.send(new CancelOrderCommand(event.getOrderId()));
end();
}
```
### Forward Recovery
Retry failed operation with exponential backoff:
```java
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentTransientFailureEvent event) {
if (event.getRetryCount() < MAX_RETRIES) {
// Retry payment with backoff
ProcessPaymentCommand retryCommand = new ProcessPaymentCommand(
event.getPaymentId(),
event.getOrderId(),
event.getAmount()
);
commandGateway.send(retryCommand);
} else {
// After max retries, compensate
handlePaymentFailure(event);
}
}
```
## Semantic Lock Pattern
Prevent concurrent modifications during saga execution:
```java
@Entity
public class Order {
@Id
private String orderId;
@Enumerated(EnumType.STRING)
private OrderStatus status;
@Version
private Long version;
private Instant lockedUntil;
public boolean tryLock(Duration lockDuration) {
if (isLocked()) {
return false;
}
this.lockedUntil = Instant.now().plus(lockDuration);
return true;
}
public boolean isLocked() {
return lockedUntil != null &&
Instant.now().isBefore(lockedUntil);
}
public void unlock() {
this.lockedUntil = null;
}
}
```
## Compensation in Axon Framework
```java
@Saga
public class OrderSaga {
private String orderId;
private String paymentId;
private String inventoryId;
private boolean compensating = false;
@SagaEventHandler(associationProperty = "orderId")
public void handle(InventoryReservationFailedEvent event) {
logger.error("Inventory reservation failed");
compensating = true;
// Compensate: refund payment
RefundPaymentCommand refundCommand = new RefundPaymentCommand(
paymentId,
event.getOrderId(),
event.getReservedAmount(),
"Inventory unavailable"
);
commandGateway.send(refundCommand);
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentRefundedEvent event) {
if (!compensating) return;
logger.info("Payment refunded, cancelling order");
// Compensate: cancel order
CancelOrderCommand command = new CancelOrderCommand(
event.getOrderId(),
"Inventory unavailable - payment refunded"
);
commandGateway.send(command);
}
@EndSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCancelledEvent event) {
logger.info("Saga completed with compensation");
}
}
```
## Handling Compensation Failures
Handle cases where compensation itself fails:
```java
@Service
public class CompensationService {
private final DeadLetterQueueService dlqService;
public void handleCompensationFailure(String sagaId, String step, Exception cause) {
logger.error("Compensation failed for saga {} at step {}", sagaId, step, cause);
// Send to dead letter queue for manual intervention
dlqService.send(new FailedCompensation(
sagaId,
step,
cause.getMessage(),
Instant.now()
));
// Create alert for operations team
alertingService.alert(
"Compensation Failure",
"Saga " + sagaId + " failed compensation at " + step
);
}
}
```
## Testing Compensation
Verify that compensation produces expected results:
```java
@Test
void shouldCompensateWhenPaymentFails() {
String orderId = "order-123";
String paymentId = "payment-456";
// Arrange: execute payment
Payment payment = new Payment(paymentId, orderId, BigDecimal.TEN);
paymentRepository.save(payment);
orderRepository.save(new Order(orderId, OrderStatus.PENDING));
// Act: compensate
paymentService.cancelPayment(paymentId);
// Assert: verify idempotency
paymentService.cancelPayment(paymentId);
Payment result = paymentRepository.findById(paymentId).orElseThrow();
assertThat(result.getStatus()).isEqualTo(PaymentStatus.CANCELLED);
}
```
## Common Compensation Patterns
### Inventory Release
```java
@Service
public class InventoryService {
public void releaseInventory(String orderId) {
Order order = orderRepository.findById(orderId).orElseThrow();
order.getItems().forEach(item -> {
InventoryItem inventoryItem = inventoryRepository
.findById(item.getProductId())
.orElseThrow();
inventoryItem.increaseAvailableQuantity(item.getQuantity());
inventoryRepository.save(inventoryItem);
});
}
}
```
### Payment Refund
```java
@Service
public class PaymentService {
public void refundPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId)
.orElseThrow();
if (payment.getStatus() == PaymentStatus.PROCESSED) {
payment.setStatus(PaymentStatus.REFUNDED);
paymentGateway.refund(payment.getTransactionId());
paymentRepository.save(payment);
}
}
}
```
### Order Cancellation
```java
@Service
public class OrderService {
public void cancelOrder(String orderId, String reason) {
Order order = orderRepository.findById(orderId)
.orElseThrow();
order.setStatus(OrderStatus.CANCELLED);
order.setCancellationReason(reason);
order.setCancelledAt(Instant.now());
orderRepository.save(order);
}
}
```

View File

@@ -0,0 +1,294 @@
# State Management in Sagas
## Saga State Entity
Persist saga state for recovery and monitoring:
```java
@Entity
@Table(name = "saga_state")
public class SagaState {
@Id
private String sagaId;
@Enumerated(EnumType.STRING)
private SagaStatus status;
@Column(columnDefinition = "TEXT")
private String currentStep;
@Column(columnDefinition = "TEXT")
private String compensationSteps;
private Instant startedAt;
private Instant completedAt;
@Version
private Long version;
}
public enum SagaStatus {
STARTED,
PROCESSING,
COMPENSATING,
COMPLETED,
FAILED,
CANCELLED
}
```
## Saga State Machine with Spring Statemachine
Define saga state transitions explicitly:
```java
@Configuration
@EnableStateMachine
public class SagaStateMachineConfig
extends StateMachineConfigurerAdapter<SagaStatus, SagaEvent> {
@Override
public void configure(
StateMachineStateConfigurer<SagaStatus, SagaEvent> states)
throws Exception {
states
.withStates()
.initial(SagaStatus.STARTED)
.states(EnumSet.allOf(SagaStatus.class))
.end(SagaStatus.COMPLETED)
.end(SagaStatus.FAILED);
}
@Override
public void configure(
StateMachineTransitionConfigurer<SagaStatus, SagaEvent> transitions)
throws Exception {
transitions
.withExternal()
.source(SagaStatus.STARTED)
.target(SagaStatus.PROCESSING)
.event(SagaEvent.ORDER_CREATED)
.and()
.withExternal()
.source(SagaStatus.PROCESSING)
.target(SagaStatus.COMPLETED)
.event(SagaEvent.ALL_STEPS_COMPLETED)
.and()
.withExternal()
.source(SagaStatus.PROCESSING)
.target(SagaStatus.COMPENSATING)
.event(SagaEvent.STEP_FAILED)
.and()
.withExternal()
.source(SagaStatus.COMPENSATING)
.target(SagaStatus.FAILED)
.event(SagaEvent.COMPENSATION_COMPLETED);
}
}
```
## State Transitions
### Successful Saga Flow
```
STARTED → PROCESSING → COMPLETED
```
### Failed Saga with Compensation
```
STARTED → PROCESSING → COMPENSATING → FAILED
```
### Saga with Retry
```
STARTED → PROCESSING → PROCESSING (retry) → COMPLETED
```
## Persisting Saga Context
Store context data for saga execution:
```java
@Entity
@Table(name = "saga_context")
public class SagaContext {
@Id
private String sagaId;
@Column(columnDefinition = "TEXT")
private String contextData; // JSON-serialized
private Instant createdAt;
private Instant updatedAt;
public <T> T getContextData(Class<T> type) {
return JsonUtils.fromJson(contextData, type);
}
public void setContextData(Object data) {
this.contextData = JsonUtils.toJson(data);
}
}
@Service
public class SagaContextService {
private final SagaContextRepository repository;
public void saveContext(String sagaId, Object context) {
SagaContext sagaContext = new SagaContext(sagaId);
sagaContext.setContextData(context);
repository.save(sagaContext);
}
public <T> T loadContext(String sagaId, Class<T> type) {
return repository.findById(sagaId)
.map(ctx -> ctx.getContextData(type))
.orElseThrow(() -> new SagaContextNotFoundException(sagaId));
}
}
```
## Handling Saga Timeouts
Detect and handle sagas that exceed expected duration:
```java
@Service
public class SagaTimeoutHandler {
private final SagaStateRepository repository;
private static final Duration MAX_SAGA_DURATION = Duration.ofMinutes(30);
@Scheduled(fixedDelay = 60000) // Check every minute
public void detectTimeouts() {
Instant timeout = Instant.now().minus(MAX_SAGA_DURATION);
List<SagaState> timedOutSagas = repository
.findByStatusAndStartedAtBefore(SagaStatus.PROCESSING, timeout);
timedOutSagas.forEach(saga -> {
logger.warn("Saga {} timed out", saga.getSagaId());
compensateSaga(saga);
});
}
private void compensateSaga(SagaState saga) {
saga.setStatus(SagaStatus.COMPENSATING);
repository.save(saga);
// Trigger compensation logic
}
}
```
## Saga Recovery
Recover sagas from failures:
```java
@Service
public class SagaRecoveryService {
private final SagaStateRepository stateRepository;
private final CommandGateway commandGateway;
@Scheduled(fixedDelay = 30000) // Check every 30 seconds
public void recoverFailedSagas() {
List<SagaState> failedSagas = stateRepository
.findByStatus(SagaStatus.FAILED);
failedSagas.forEach(saga -> {
if (canBeRetried(saga)) {
logger.info("Retrying saga {}", saga.getSagaId());
retrySaga(saga);
}
});
}
private boolean canBeRetried(SagaState saga) {
return saga.getRetryCount() < 3;
}
private void retrySaga(SagaState saga) {
saga.setStatus(SagaStatus.STARTED);
saga.setRetryCount(saga.getRetryCount() + 1);
stateRepository.save(saga);
// Send retry command
}
}
```
## Saga State Query
Query sagas for monitoring:
```java
@Repository
public interface SagaStateRepository extends JpaRepository<SagaState, String> {
List<SagaState> findByStatus(SagaStatus status);
List<SagaState> findByStatusAndStartedAtBefore(
SagaStatus status, Instant before);
Page<SagaState> findByStatus(SagaStatus status, Pageable pageable);
long countByStatus(SagaStatus status);
long countByStatusAndStartedAtBefore(SagaStatus status, Instant before);
}
@RestController
@RequestMapping("/api/sagas")
public class SagaMonitoringController {
private final SagaStateRepository repository;
@GetMapping("/status/{status}")
public List<SagaState> getSagasByStatus(
@PathVariable SagaStatus status) {
return repository.findByStatus(status);
}
@GetMapping("/stuck")
public List<SagaState> getStuckSagas() {
Instant oneHourAgo = Instant.now().minus(Duration.ofHours(1));
return repository.findByStatusAndStartedAtBefore(
SagaStatus.PROCESSING, oneHourAgo);
}
}
```
## Database Schema for State Management
```sql
CREATE TABLE saga_state (
saga_id VARCHAR(255) PRIMARY KEY,
status VARCHAR(50) NOT NULL,
current_step TEXT,
compensation_steps TEXT,
started_at TIMESTAMP NOT NULL,
completed_at TIMESTAMP,
version BIGINT,
INDEX idx_status (status),
INDEX idx_started_at (started_at)
);
CREATE TABLE saga_context (
saga_id VARCHAR(255) PRIMARY KEY,
context_data LONGTEXT,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP,
FOREIGN KEY (saga_id) REFERENCES saga_state(saga_id)
);
CREATE INDEX idx_saga_state_status_started
ON saga_state(status, started_at);
```

View File

@@ -0,0 +1,323 @@
# Error Handling and Retry Strategies
## Retry Configuration
Use Spring Retry for automatic retry logic:
```java
@Configuration
@EnableRetry
public class RetryConfig {
@Bean
public RetryTemplate retryTemplate() {
RetryTemplate retryTemplate = new RetryTemplate();
FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
backOffPolicy.setBackOffPeriod(2000L); // 2 second delay
ExponentialBackOffPolicy exponentialBackOff = new ExponentialBackOffPolicy();
exponentialBackOff.setInitialInterval(1000L);
exponentialBackOff.setMultiplier(2.0);
exponentialBackOff.setMaxInterval(10000L);
SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
retryPolicy.setMaxAttempts(3);
retryTemplate.setBackOffPolicy(exponentialBackOff);
retryTemplate.setRetryPolicy(retryPolicy);
return retryTemplate;
}
}
```
## Retry with @Retryable
```java
@Service
public class OrderService {
@Retryable(
value = {TransientException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000, multiplier = 2)
)
public void processOrder(String orderId) {
// Order processing logic
}
@Recover
public void recover(TransientException ex, String orderId) {
logger.error("Order processing failed after retries: {}", orderId, ex);
// Fallback logic
}
}
```
## Circuit Breaker with Resilience4j
Prevent cascading failures:
```java
@Configuration
public class CircuitBreakerConfig {
@Bean
public CircuitBreakerRegistry circuitBreakerRegistry() {
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
.failureRateThreshold(50) // Open after 50% failures
.waitDurationInOpenState(Duration.ofMillis(1000))
.slidingWindowSize(2) // Check last 2 calls
.build();
return CircuitBreakerRegistry.of(config);
}
}
@Service
public class PaymentService {
private final CircuitBreaker circuitBreaker;
public PaymentService(CircuitBreakerRegistry registry) {
this.circuitBreaker = registry.circuitBreaker("payment");
}
public PaymentResult processPayment(PaymentRequest request) {
return circuitBreaker.executeSupplier(
() -> callPaymentGateway(request)
);
}
private PaymentResult callPaymentGateway(PaymentRequest request) {
// Call external payment gateway
return new PaymentResult(...);
}
}
```
## Dead Letter Queue
Handle failed messages:
```java
@Configuration
public class DeadLetterQueueConfig {
@Bean
public NewTopic deadLetterTopic() {
return new NewTopic("saga-dlq", 1, (short) 1);
}
}
@Component
public class SagaErrorHandler implements ConsumerAwareErrorHandler {
private final KafkaTemplate<String, Object> kafkaTemplate;
@Override
public void handle(Exception thrownException,
List<ConsumerRecord<?, ?>> records,
Consumer<?, ?> consumer,
MessageListenerContainer container) {
records.forEach(record -> {
logger.error("Processing failed for message: {}", record.key());
kafkaTemplate.send("saga-dlq", record.key(), record.value());
});
}
}
```
## Timeout Handling
Define and enforce timeout policies:
```java
@Service
public class TimeoutHandler {
private final SagaStateRepository sagaStateRepository;
private static final Duration STEP_TIMEOUT = Duration.ofSeconds(30);
@Scheduled(fixedDelay = 5000)
public void checkForTimeouts() {
Instant timeoutThreshold = Instant.now().minus(STEP_TIMEOUT);
List<SagaState> timedOutSagas = sagaStateRepository
.findByStatusAndUpdatedAtBefore(SagaStatus.PROCESSING, timeoutThreshold);
timedOutSagas.forEach(saga -> {
logger.warn("Saga {} timed out at step {}",
saga.getSagaId(), saga.getCurrentStep());
compensateSaga(saga);
});
}
private void compensateSaga(SagaState saga) {
saga.setStatus(SagaStatus.COMPENSATING);
sagaStateRepository.save(saga);
}
}
```
## Exponential Backoff
Prevent overwhelming downstream services:
```java
@Service
public class BackoffService {
public Duration calculateBackoff(int attemptNumber) {
long baseDelay = 1000; // 1 second
long delay = baseDelay * (long) Math.pow(2, attemptNumber - 1);
long maxDelay = 30000; // 30 seconds
return Duration.ofMillis(Math.min(delay, maxDelay));
}
@Retryable(
value = {ServiceUnavailableException.class},
maxAttempts = 5,
backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 30000
)
)
public void callExternalService() {
// External service call
}
}
```
## Idempotent Retry
Ensure retries don't cause duplicate processing:
```java
@Service
public class IdempotentPaymentService {
private final PaymentRepository paymentRepository;
private final Map<String, PaymentResult> processedPayments = new ConcurrentHashMap<>();
public PaymentResult processPayment(String paymentId, BigDecimal amount) {
// Check if already processed
if (processedPayments.containsKey(paymentId)) {
return processedPayments.get(paymentId);
}
// Check database
Optional<Payment> existing = paymentRepository.findById(paymentId);
if (existing.isPresent()) {
return new PaymentResult(existing.get());
}
// Process payment
PaymentResult result = callPaymentGateway(paymentId, amount);
// Cache and persist
processedPayments.put(paymentId, result);
paymentRepository.save(new Payment(paymentId, amount, result.getStatus()));
return result;
}
}
```
## Global Exception Handler
Centralize error handling:
```java
@RestControllerAdvice
public class GlobalExceptionHandler {
@ExceptionHandler(SagaExecutionException.class)
public ResponseEntity<ErrorResponse> handleSagaError(
SagaExecutionException ex) {
return ResponseEntity
.status(HttpStatus.UNPROCESSABLE_ENTITY)
.body(new ErrorResponse(
"SAGA_EXECUTION_FAILED",
ex.getMessage(),
ex.getSagaId()
));
}
@ExceptionHandler(ServiceUnavailableException.class)
public ResponseEntity<ErrorResponse> handleServiceUnavailable(
ServiceUnavailableException ex) {
return ResponseEntity
.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(new ErrorResponse(
"SERVICE_UNAVAILABLE",
"Required service is temporarily unavailable"
));
}
@ExceptionHandler(TimeoutException.class)
public ResponseEntity<ErrorResponse> handleTimeout(
TimeoutException ex) {
return ResponseEntity
.status(HttpStatus.REQUEST_TIMEOUT)
.body(new ErrorResponse(
"REQUEST_TIMEOUT",
"Request timed out after " + ex.getDuration()
));
}
}
public record ErrorResponse(
String code,
String message,
String details
) {
public ErrorResponse(String code, String message) {
this(code, message, null);
}
}
```
## Monitoring Error Rates
Track failure metrics:
```java
@Component
public class SagaErrorMetrics {
private final MeterRegistry meterRegistry;
public SagaErrorMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
public void recordSagaFailure(String sagaType) {
Counter.builder("saga.failure")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
public void recordRetry(String sagaType) {
Counter.builder("saga.retry")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
public void recordTimeout(String sagaType) {
Counter.builder("saga.timeout")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
}
```

View File

@@ -0,0 +1,320 @@
# Testing Strategies for Sagas
## Unit Testing Saga Logic
Test saga behavior with Axon test fixtures:
```java
@Test
void shouldDispatchPaymentCommandWhenOrderCreated() {
// Arrange
String orderId = UUID.randomUUID().toString();
String paymentId = UUID.randomUUID().toString();
SagaTestFixture<OrderSaga> fixture = new SagaTestFixture<>(OrderSaga.class);
// Act & Assert
fixture
.givenNoPriorActivity()
.whenPublishingA(new OrderCreatedEvent(orderId, BigDecimal.TEN, "item-1"))
.expectDispatchedCommands(new ProcessPaymentCommand(paymentId, orderId, BigDecimal.TEN));
}
@Test
void shouldCompensateWhenPaymentFails() {
String orderId = UUID.randomUUID().toString();
String paymentId = UUID.randomUUID().toString();
SagaTestFixture<OrderSaga> fixture = new SagaTestFixture<>(OrderSaga.class);
fixture
.givenNoPriorActivity()
.whenPublishingA(new OrderCreatedEvent(orderId, BigDecimal.TEN, "item-1"))
.whenPublishingA(new PaymentFailedEvent(paymentId, orderId, "item-1", "Insufficient funds"))
.expectDispatchedCommands(new CancelOrderCommand(orderId))
.expectScheduledEventOfType(OrderSaga.class, null);
}
```
## Testing Event Publishing
Verify events are published correctly:
```java
@SpringBootTest
@WebMvcTest
class OrderServiceTest {
@MockBean
private EventPublisher eventPublisher;
@InjectMocks
private OrderService orderService;
@Test
void shouldPublishOrderCreatedEvent() {
// Arrange
CreateOrderRequest request = new CreateOrderRequest("cust-1", BigDecimal.TEN);
// Act
String orderId = orderService.createOrder(request);
// Assert
verify(eventPublisher).publish(
argThat(event -> event instanceof OrderCreatedEvent &&
((OrderCreatedEvent) event).orderId().equals(orderId))
);
}
}
```
## Integration Testing with Testcontainers
Test complete saga flow with real services:
```java
@SpringBootTest
@Testcontainers
class SagaIntegrationTest {
@Container
static KafkaContainer kafka = new KafkaContainer(
DockerImageName.parse("confluentinc/cp-kafka:7.4.0")
);
@Container
static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>(
"postgres:15-alpine"
);
@DynamicPropertySource
static void overrideProperties(DynamicPropertyRegistry registry) {
registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
registry.add("spring.datasource.url", postgres::getJdbcUrl);
registry.add("spring.datasource.username", postgres::getUsername);
registry.add("spring.datasource.password", postgres::getPassword);
}
@Test
void shouldCompleteOrderSagaSuccessfully(@Autowired OrderService orderService,
@Autowired OrderRepository orderRepository,
@Autowired EventPublisher eventPublisher) {
// Arrange
CreateOrderRequest request = new CreateOrderRequest("cust-1", BigDecimal.TEN);
// Act
String orderId = orderService.createOrder(request);
// Wait for async processing
Thread.sleep(2000);
// Assert
Order order = orderRepository.findById(orderId).orElseThrow();
assertThat(order.getStatus()).isEqualTo(OrderStatus.COMPLETED);
}
}
```
## Testing Idempotency
Verify operations produce same results on retry:
```java
@Test
void compensationShouldBeIdempotent() {
// Arrange
String paymentId = "payment-123";
Payment payment = new Payment(paymentId, "order-1", BigDecimal.TEN);
paymentRepository.save(payment);
// Act - First compensation
paymentService.cancelPayment(paymentId);
Payment firstResult = paymentRepository.findById(paymentId).orElseThrow();
// Act - Second compensation (should be idempotent)
paymentService.cancelPayment(paymentId);
Payment secondResult = paymentRepository.findById(paymentId).orElseThrow();
// Assert
assertThat(firstResult).isEqualTo(secondResult);
assertThat(secondResult.getStatus()).isEqualTo(PaymentStatus.CANCELLED);
assertThat(secondResult.getVersion()).isEqualTo(firstResult.getVersion());
}
```
## Testing Concurrent Sagas
Verify saga isolation under concurrent execution:
```java
@Test
void shouldHandleConcurrentSagaExecutions() throws InterruptedException {
// Arrange
int numThreads = 10;
ExecutorService executor = Executors.newFixedThreadPool(numThreads);
CountDownLatch latch = new CountDownLatch(numThreads);
// Act
for (int i = 0; i < numThreads; i++) {
final int index = i;
executor.submit(() -> {
try {
CreateOrderRequest request = new CreateOrderRequest(
"cust-" + index,
BigDecimal.TEN.multiply(BigDecimal.valueOf(index))
);
orderService.createOrder(request);
} finally {
latch.countDown();
}
});
}
latch.await(10, TimeUnit.SECONDS);
// Assert
long createdOrders = orderRepository.count();
assertThat(createdOrders).isEqualTo(numThreads);
}
```
## Testing Failure Scenarios
Test each failure path and compensation:
```java
@Test
void shouldCompensateWhenInventoryUnavailable() {
// Arrange
String orderId = UUID.randomUUID().toString();
inventoryService.setAvailability("item-1", 0); // No inventory
// Act
String result = orderService.createOrder(
new CreateOrderRequest("cust-1", BigDecimal.TEN)
);
// Wait for saga completion
Thread.sleep(2000);
// Assert
Order order = orderRepository.findById(orderId).orElseThrow();
assertThat(order.getStatus()).isEqualTo(OrderStatus.CANCELLED);
// Verify payment was refunded
Payment payment = paymentRepository.findByOrderId(orderId).orElseThrow();
assertThat(payment.getStatus()).isEqualTo(PaymentStatus.REFUNDED);
}
@Test
void shouldHandlePaymentGatewayFailure() {
// Arrange
paymentGateway.setFailureRate(1.0); // 100% failure
// Act
String orderId = orderService.createOrder(
new CreateOrderRequest("cust-1", BigDecimal.TEN)
);
// Wait for saga completion
Thread.sleep(2000);
// Assert
Order order = orderRepository.findById(orderId).orElseThrow();
assertThat(order.getStatus()).isEqualTo(OrderStatus.CANCELLED);
}
```
## Testing State Machine
Verify state transitions:
```java
@Test
void shouldTransitionStatesProperly() {
// Arrange
String sagaId = UUID.randomUUID().toString();
SagaState sagaState = new SagaState(sagaId, SagaStatus.STARTED);
sagaStateRepository.save(sagaState);
// Act & Assert
assertThat(sagaState.getStatus()).isEqualTo(SagaStatus.STARTED);
sagaState.setStatus(SagaStatus.PROCESSING);
sagaStateRepository.save(sagaState);
assertThat(sagaStateRepository.findById(sagaId).get().getStatus())
.isEqualTo(SagaStatus.PROCESSING);
sagaState.setStatus(SagaStatus.COMPLETED);
sagaStateRepository.save(sagaState);
assertThat(sagaStateRepository.findById(sagaId).get().getStatus())
.isEqualTo(SagaStatus.COMPLETED);
}
```
## Test Data Builders
Use builders for cleaner test code:
```java
public class OrderRequestBuilder {
private String customerId = "cust-default";
private BigDecimal totalAmount = BigDecimal.TEN;
private List<OrderItem> items = new ArrayList<>();
public OrderRequestBuilder withCustomerId(String customerId) {
this.customerId = customerId;
return this;
}
public OrderRequestBuilder withAmount(BigDecimal amount) {
this.totalAmount = amount;
return this;
}
public OrderRequestBuilder withItem(String productId, int quantity) {
items.add(new OrderItem(productId, "Product", quantity, BigDecimal.TEN));
return this;
}
public CreateOrderRequest build() {
return new CreateOrderRequest(customerId, totalAmount, items);
}
}
@Test
void shouldCreateOrderWithCustomization() {
CreateOrderRequest request = new OrderRequestBuilder()
.withCustomerId("customer-123")
.withAmount(BigDecimal.valueOf(50))
.withItem("product-1", 2)
.withItem("product-2", 1)
.build();
String orderId = orderService.createOrder(request);
assertThat(orderId).isNotNull();
}
```
## Performance Testing
Measure saga execution time:
```java
@Test
void shouldCompleteOrderSagaWithinTimeLimit() {
// Arrange
CreateOrderRequest request = new CreateOrderRequest("cust-1", BigDecimal.TEN);
long maxDurationMs = 5000; // 5 seconds
// Act
Instant start = Instant.now();
String orderId = orderService.createOrder(request);
Instant end = Instant.now();
// Assert
long duration = Duration.between(start, end).toMillis();
assertThat(duration).isLessThan(maxDurationMs);
}
```

View File

@@ -0,0 +1,471 @@
# Common Pitfalls and Solutions
## Pitfall 1: Lost Messages
### Problem
Messages get lost due to broker failures, network issues, or consumer crashes before acknowledgment.
### Solution
Use persistent messages with acknowledgments:
```java
@Bean
public ProducerFactory<String, Object> producerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.ACKS_CONFIG, "all"); // All replicas must acknowledge
config.put(ProducerConfig.RETRIES_CONFIG, 3); // Retry failed sends
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // Prevent duplicates
return new DefaultKafkaProducerFactory<>(config);
}
@Bean
public ConsumerFactory<String, Object> consumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false); // Manual commit
return new DefaultKafkaConsumerFactory<>(config);
}
```
### Prevention Checklist
- ✓ Configure producer to wait for all replicas (`acks=all`)
- ✓ Enable idempotence to prevent duplicate messages
- ✓ Use manual commit for consumers
- ✓ Monitor message lag and broker health
- ✓ Use transactional outbox pattern
---
## Pitfall 2: Duplicate Processing
### Problem
Same message processed multiple times due to failed acknowledgments or retries, causing side effects.
### Solution
Implement idempotency with deduplication:
```java
@Service
public class DeduplicationService {
private final DeduplicationRepository repository;
public boolean isDuplicate(String messageId) {
return repository.existsById(messageId);
}
public void recordProcessed(String messageId) {
DeduplicatedMessage entry = new DeduplicatedMessage(
messageId,
Instant.now()
);
repository.save(entry);
}
}
@Component
public class PaymentEventListener {
private final DeduplicationService deduplicationService;
private final PaymentService paymentService;
@Bean
public Consumer<PaymentEvent> handlePaymentEvent() {
return event -> {
String messageId = event.getMessageId();
if (deduplicationService.isDuplicate(messageId)) {
logger.info("Duplicate message ignored: {}", messageId);
return;
}
paymentService.processPayment(event);
deduplicationService.recordProcessed(messageId);
};
}
}
```
### Prevention Checklist
- ✓ Add unique message ID to all events
- ✓ Implement deduplication cache/database
- ✓ Make all operations idempotent
- ✓ Use version control for entity updates
- ✓ Test with message replay
---
## Pitfall 3: Saga State Inconsistency
### Problem
Saga state in database doesn't match actual service states, leading to orphaned or stuck sagas.
### Solution
Use event sourcing or state reconciliation:
```java
@Service
public class SagaStateReconciler {
private final SagaStateRepository stateRepository;
private final OrderRepository orderRepository;
private final PaymentRepository paymentRepository;
@Scheduled(fixedDelay = 60000) // Run every minute
public void reconcileSagaStates() {
List<SagaState> processingSagas = stateRepository
.findByStatus(SagaStatus.PROCESSING);
processingSagas.forEach(saga -> {
if (isActuallyCompleted(saga)) {
logger.info("Reconciling saga {} - marking as completed", saga.getSagaId());
saga.setStatus(SagaStatus.COMPLETED);
saga.setCompletedAt(Instant.now());
stateRepository.save(saga);
}
});
}
private boolean isActuallyCompleted(SagaState saga) {
String orderId = saga.getSagaId();
Order order = orderRepository.findById(orderId).orElse(null);
if (order == null || order.getStatus() != OrderStatus.COMPLETED) {
return false;
}
Payment payment = paymentRepository.findByOrderId(orderId).orElse(null);
if (payment == null || payment.getStatus() != PaymentStatus.PROCESSED) {
return false;
}
return true;
}
}
```
### Prevention Checklist
- ✓ Use event sourcing for complete audit trail
- ✓ Implement state reconciliation job
- ✓ Add health checks for saga coordinator
- ✓ Monitor saga state transitions
- ✓ Persist compensation steps
---
## Pitfall 4: Orchestrator Single Point of Failure
### Problem
Orchestration-based saga fails when orchestrator is down, blocking all sagas.
### Solution
Implement clustering and failover:
```java
@Configuration
public class SagaOrchestratorClusterConfig {
@Bean
public SagaStateRepository sagaStateRepository() {
// Use shared database for cluster-wide state
return new DatabaseSagaStateRepository();
}
@Bean
@Primary
public CommandGateway clusterAwareCommandGateway(
CommandBus commandBus) {
return new ClusterAwareCommandGateway(commandBus);
}
}
@Component
public class OrchestratorHealthCheck extends AbstractHealthIndicator {
private final SagaStateRepository repository;
@Override
protected void doHealthCheck(Health.Builder builder) {
long stuckSagas = repository.countStuckSagas(Duration.ofMinutes(30));
if (stuckSagas > 100) {
builder.down()
.withDetail("stuckSagas", stuckSagas)
.withDetail("severity", "critical");
} else if (stuckSagas > 10) {
builder.degraded()
.withDetail("stuckSagas", stuckSagas)
.withDetail("severity", "warning");
} else {
builder.up()
.withDetail("stuckSagas", stuckSagas);
}
}
}
```
### Prevention Checklist
- ✓ Deploy orchestrator in cluster with shared state
- ✓ Use distributed coordination (ZooKeeper, Consul)
- ✓ Implement heartbeat monitoring
- ✓ Set up automatic failover
- ✓ Use circuit breakers for service calls
---
## Pitfall 5: Non-Idempotent Compensations
### Problem
Compensation logic fails on retry because it's not idempotent, leaving system in inconsistent state.
### Solution
Design all compensations to be idempotent:
```java
// Bad - Not idempotent
@Service
public class BadPaymentService {
public void refundPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId).orElseThrow();
payment.setStatus(PaymentStatus.REFUNDED);
paymentRepository.save(payment);
// If this fails partway, retry causes problems
externalPaymentGateway.refund(payment.getTransactionId());
}
}
// Good - Idempotent
@Service
public class GoodPaymentService {
public void refundPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId)
.orElse(null);
if (payment == null) {
// Already deleted or doesn't exist
logger.info("Payment {} not found, skipping refund", paymentId);
return;
}
if (payment.getStatus() == PaymentStatus.REFUNDED) {
// Already refunded
logger.info("Payment {} already refunded", paymentId);
return;
}
try {
externalPaymentGateway.refund(payment.getTransactionId());
payment.setStatus(PaymentStatus.REFUNDED);
paymentRepository.save(payment);
} catch (Exception e) {
logger.error("Refund failed, will retry", e);
throw e;
}
}
}
```
### Prevention Checklist
- ✓ Check current state before making changes
- ✓ Use status flags to track compensation completion
- ✓ Make database updates idempotent
- ✓ Test compensation with replays
- ✓ Document compensation logic
---
## Pitfall 6: Missing Timeouts
### Problem
Sagas hang indefinitely waiting for events that never arrive due to service failures.
### Solution
Implement timeout mechanisms:
```java
@Configuration
public class SagaTimeoutConfig {
@Bean
public SagaLifecycle sagaLifecycle(SagaStateRepository repository) {
return new SagaLifecycle() {
@Override
public void onSagaFinished(Saga saga) {
// Update saga state
}
};
}
}
@Saga
public class OrderSaga {
@Autowired
private transient CommandGateway commandGateway;
private String orderId;
private String paymentId;
private DeadlineManager deadlineManager;
@StartSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCreatedEvent event) {
this.orderId = event.orderId();
// Schedule timeout for payment processing
deadlineManager.scheduleDeadline(
Duration.ofSeconds(30),
"PaymentTimeout",
orderId
);
commandGateway.send(new ProcessPaymentCommand(...));
}
@DeadlineHandler(deadlineName = "PaymentTimeout")
public void handlePaymentTimeout() {
logger.warn("Payment processing timed out for order {}", orderId);
// Compensate
commandGateway.send(new CancelOrderCommand(orderId));
end();
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentProcessedEvent event) {
// Cancel timeout
deadlineManager.cancelDeadline("PaymentTimeout", orderId);
// Continue saga...
}
}
```
### Prevention Checklist
- ✓ Set timeout for each saga step
- ✓ Use deadline manager to track timeouts
- ✓ Cancel timeouts when step completes
- ✓ Log timeout events
- ✓ Alert operations on repeated timeouts
---
## Pitfall 7: Tight Coupling Between Services
### Problem
Saga logic couples services tightly, making independent deployment impossible.
### Solution
Use event-driven communication:
```java
// Bad - Tight coupling
@Service
public class TightlyAgedOrderService {
public void createOrder(OrderRequest request) {
Order order = orderRepository.save(new Order(...));
// Direct coupling to payment service
paymentService.processPayment(order.getId(), request.getAmount());
}
}
// Good - Event-driven
@Service
public class LooselyAgedOrderService {
public void createOrder(OrderRequest request) {
Order order = orderRepository.save(new Order(...));
// Publish event - services listen independently
eventPublisher.publish(new OrderCreatedEvent(
order.getId(),
request.getAmount()
));
}
}
@Component
public class PaymentServiceListener {
@Bean
public Consumer<OrderCreatedEvent> handleOrderCreated() {
return event -> {
// Payment service can be deployed independently
paymentService.processPayment(
event.orderId(),
event.amount()
);
};
}
}
```
### Prevention Checklist
- ✓ Use events for inter-service communication
- ✓ Avoid direct service-to-service calls
- ✓ Define clear contracts for events
- ✓ Version events for backward compatibility
- ✓ Deploy services independently
---
## Pitfall 8: Inadequate Monitoring
### Problem
Sagas fail silently or get stuck without visibility, making troubleshooting impossible.
### Solution
Implement comprehensive monitoring:
```java
@Component
public class SagaMonitoring {
private final MeterRegistry meterRegistry;
@Bean
public MeterBinder sagaMetrics(SagaStateRepository repository) {
return (registry) -> {
Gauge.builder("saga.active", repository::countByStatus)
.description("Number of active sagas")
.register(registry);
Gauge.builder("saga.stuck", () ->
repository.countStuckSagas(Duration.ofMinutes(30)))
.description("Number of stuck sagas")
.register(registry);
};
}
public void recordSagaStart(String sagaType) {
Counter.builder("saga.started")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
public void recordSagaCompletion(String sagaType, long durationMs) {
Timer.builder("saga.duration")
.tag("type", sagaType)
.publishPercentiles(0.5, 0.95, 0.99)
.register(meterRegistry)
.record(Duration.ofMillis(durationMs));
}
public void recordSagaFailure(String sagaType, String reason) {
Counter.builder("saga.failed")
.tag("type", sagaType)
.tag("reason", reason)
.register(meterRegistry)
.increment();
}
}
```
### Prevention Checklist
- ✓ Track saga state transitions
- ✓ Monitor step execution times
- ✓ Alert on stuck sagas
- ✓ Log all failures with details
- ✓ Use distributed tracing (Sleuth, Zipkin)
- ✓ Create dashboards for visibility

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff