Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:28:30 +08:00
commit 171acedaa4
220 changed files with 85967 additions and 0 deletions


@@ -0,0 +1,68 @@
# Saga Pattern Definition
## What is a Saga?
A **Saga** is a sequence of local transactions where each transaction updates data within a single service. Each local transaction publishes an event or message that triggers the next local transaction in the saga. If a local transaction fails, the saga executes compensating transactions to undo the changes made by preceding transactions.
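The definition above can be sketched in plain Java with no framework. `SagaStep`, `SagaRunner`, and the step names are hypothetical names chosen for this illustration. In a real saga each action would be a local transaction in its own service, triggered by a message rather than a direct method call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action plus the compensation that undoes it.
record SagaStep(String name, Runnable action, Runnable compensation) {}

final class SagaRunner {
    // Runs the steps in order. If one fails, the steps that already
    // completed are compensated in reverse order and false is returned.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step); // most recent step ends up on top
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}

final class SagaRunnerDemo {
    // Order and payment succeed, inventory fails: the returned log shows
    // the two compensations running in reverse order.
    static List<String> failingRun() {
        List<String> log = new ArrayList<>();
        List<SagaStep> steps = List.of(
            new SagaStep("order", () -> log.add("create order"), () -> log.add("cancel order")),
            new SagaStep("payment", () -> log.add("charge"), () -> log.add("refund")),
            new SagaStep("inventory",
                () -> { throw new IllegalStateException("out of stock"); },
                () -> log.add("release inventory")));
        SagaRunner.run(steps);
        return log;
    }
}
```

The failed step itself is not compensated, because its local transaction never committed; only the steps before it are undone.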
## Key Characteristics
- **Distributed Transactions**: Spans multiple microservices, each with its own database.
- **Local Transactions**: Each service performs its own ACID transaction.
- **Event-Driven**: Services communicate through events or commands.
- **Compensations**: Rollback mechanism using compensating transactions.
- **Eventual Consistency**: System reaches a consistent state over time.
## Saga vs Two-Phase Commit (2PC)
| Feature | Saga Pattern | Two-Phase Commit |
|---------|-------------|------------------|
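| Locking | No distributed locks | Holds locks on all participants until commit |
| Performance | Each step commits locally; no blocking on a coordinator | Participants block while waiting for the coordinator |
| Scalability | Scales with the number of services | Limited by coordinator and lock contention |
| Complexity | Business logic complexity (compensations) | Protocol complexity |
| Failure Handling | Compensating transactions | Automatic rollback |
| Isolation | No isolation between steps; intermediate states are visible | Full isolation |
| NoSQL Support | Works with any store that offers local atomic writes | Requires XA-capable resources, which many NoSQL stores lack |
| Microservices Fit | Excellent | Poor |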
## ACID vs BASE
**ACID** (Traditional Databases):
- **A**tomicity: All or nothing
- **C**onsistency: Valid state transitions
- **I**solation: Concurrent transactions don't interfere
- **D**urability: Committed data persists
**BASE** (Saga Pattern):
- **B**asically **A**vailable: System is available most of the time
- **S**oft state: State may change over time
- **E**ventual consistency: System becomes consistent eventually
## When to Use Saga Pattern
Use the saga pattern when:
- A transaction spans multiple microservices
- 2PC needs to be replaced with a more scalable approach
- Services can work with eventual consistency
- Long-running processes span multiple services
- Failed operations can be undone with compensating transactions
## When NOT to Use Saga Pattern
Avoid the saga pattern when:
- The transaction stays within a single service (use a local ACID transaction)
- Strong consistency is required immediately
- Operations are simple CRUD with no cross-service dependencies
- Transaction volume is low and flows are simple
- The team lacks experience with distributed systems
## Migration Path
Many organizations migrate from traditional monolithic systems or 2PC-based systems to sagas:
1. **From Monolith to Saga**: Identify transaction boundaries, extract services gradually, implement sagas incrementally
2. **From 2PC to Saga**: Analyze existing 2PC transactions, design compensating transactions, implement sagas in parallel, monitor and compare results before full migration


@@ -0,0 +1,153 @@
# Choreography-Based Saga Implementation
## Architecture Overview
In choreography-based sagas, each service produces and listens to events. Services know what to do when they receive an event. **No central coordinator** manages the flow.
```
Service A  →  Event  →  Service B  →  Event  →  Service C
    ↓                       ↓                       ↓
  Event                   Event                   Event
    ↓                       ↓                       ↓
Compensation          Compensation            Compensation
```
## Event Flow
### Success Path
1. **Order Service** creates order → publishes `OrderCreated` event
2. **Payment Service** listens → processes payment → publishes `PaymentProcessed` event
3. **Inventory Service** listens → reserves inventory → publishes `InventoryReserved` event
4. **Shipment Service** listens → prepares shipment → publishes `ShipmentPrepared` event
### Failure Path (When Payment Fails)
1. **Payment Service** publishes `PaymentFailed` event
2. **Order Service** listens → cancels order → publishes `OrderCancelled` event
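3. Services that already completed work undo it when they receive `OrderCancelled`. In this flow payment is the first step after order creation, so inventory and shipment have nothing to clean up yet.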
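The failure path can be traced with a small in-memory stand-in for the message broker. This is only a sketch: `InMemoryEventBus` and `ChoreographyDemo` are hypothetical names, delivery is synchronous, and the payment handler always declines so that the cancellation path is exercised.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal synchronous event bus: handlers subscribe by event type name.
final class InMemoryEventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    final List<String> published = new ArrayList<>();

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        for (Consumer<String> handler : handlers.getOrDefault(eventType, List.of())) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    // Wires the order and payment services together and returns the
    // sequence of events published when the payment is declined.
    static List<String> paymentFailureFlow() {
        InMemoryEventBus bus = new InMemoryEventBus();
        // Payment service: reacts to OrderCreated; here it always declines.
        bus.subscribe("OrderCreated", orderId -> bus.publish("PaymentFailed", orderId));
        // Order service: reacts to PaymentFailed by cancelling the order.
        bus.subscribe("PaymentFailed", orderId -> bus.publish("OrderCancelled", orderId));
        bus.publish("OrderCreated", "order-1");
        return bus.published;
    }
}
```

No component in the sketch knows the whole flow; each handler only knows which event it reacts to and which event it emits, which is the defining property of choreography.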
## Event Publisher
```java
@Component
public class OrderEventPublisher {
private final StreamBridge streamBridge;
public OrderEventPublisher(StreamBridge streamBridge) {
this.streamBridge = streamBridge;
}
public void publishOrderCreatedEvent(String orderId, BigDecimal amount, String itemId) {
OrderCreatedEvent event = new OrderCreatedEvent(orderId, amount, itemId);
streamBridge.send("orderCreated-out-0",
MessageBuilder
.withPayload(event)
.setHeader(MessageHeaders.CONTENT_TYPE, MimeTypeUtils.APPLICATION_JSON)
.build());
}
}
```
## Event Listener
```java
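import java.util.function.Consumer;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;

@Component
public class PaymentEventListener {

    // Spring Cloud Stream binds this function to the
    // "handleOrderCreatedEvent-in-0" input binding.
    @Bean
    public Consumer<OrderCreatedEvent> handleOrderCreatedEvent() {
        // OrderCreatedEvent is a record, so the accessor is orderId()
        return event -> processPayment(event.orderId());
    }

    private void processPayment(String orderId) {
        // Payment processing logic
    }
}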
```
## Event Classes
```java
public record OrderCreatedEvent(
String orderId,
BigDecimal amount,
String itemId
) {}
public record PaymentProcessedEvent(
String paymentId,
String orderId,
String itemId
) {}
public record PaymentFailedEvent(
String paymentId,
String orderId,
String itemId,
String reason
) {}
```
## Spring Cloud Stream Configuration
```yaml
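spring:
  cloud:
    function:
      # Name of the Consumer bean to bind (payment service)
      definition: handleOrderCreatedEvent
    stream:
      bindings:
        orderCreated-out-0:
          destination: order-events
        # Input binding for the Consumer bean; the group name is an assumption
        handleOrderCreatedEvent-in-0:
          destination: order-events
          group: payment-service
        paymentProcessed-out-0:
          destination: payment-events
        paymentFailed-out-0:
          destination: payment-events
      kafka:
        binder:
          brokers: localhost:9092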
```
## Maven Dependencies
```xml
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-stream-binder-kafka</artifactId>
</dependency>
```
## Gradle Dependencies
```groovy
implementation 'org.springframework.cloud:spring-cloud-stream'
implementation 'org.springframework.cloud:spring-cloud-stream-binder-kafka'
```
## Advantages and Disadvantages
### Advantages
- **Simple** for a small number of services
- **Loose coupling** between services
- **No single point of failure**
- Each service is independently deployable
### Disadvantages
- **Difficult to track workflow state** - distributed across services
- **Hard to troubleshoot** - following event flow is complex
- **Complexity grows** with number of services
- **Distributed source of truth** - saga state not centralized
## When to Use Choreography
Use choreography-based sagas when:
- Microservices are few in number (< 5 services per saga)
- Loose coupling is critical
- Team is experienced with event-driven architecture
- System can handle eventual consistency
- Workflow doesn't need centralized monitoring


@@ -0,0 +1,232 @@
# Orchestration-Based Saga Implementation
## Architecture Overview
A **central orchestrator** (Saga Coordinator) manages the entire transaction flow, sending commands to services and handling responses.
```
          Saga Orchestrator
         /        |        \
Service A     Service B     Service C
```
## Orchestrator Responsibilities
1. **Command Dispatch**: Send commands to services
2. **Response Handling**: Process service responses
3. **State Management**: Track saga execution state
4. **Compensation Coordination**: Trigger compensating transactions on failure
5. **Timeout Management**: Handle service timeouts
6. **Retry Logic**: Manage retry attempts
## Axon Framework Implementation
### Saga Class
```java
@Saga
public class OrderSaga {
@Autowired
private transient CommandGateway commandGateway;
@StartSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCreatedEvent event) {
String paymentId = UUID.randomUUID().toString();
ProcessPaymentCommand command = new ProcessPaymentCommand(
paymentId,
event.getOrderId(),
event.getAmount(),
event.getItemId()
);
commandGateway.send(command);
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentProcessedEvent event) {
ReserveInventoryCommand command = new ReserveInventoryCommand(
event.getOrderId(),
event.getItemId()
);
commandGateway.send(command);
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentFailedEvent event) {
CancelOrderCommand command = new CancelOrderCommand(event.getOrderId());
commandGateway.send(command);
// end() is a static method of org.axonframework.modelling.saga.SagaLifecycle
SagaLifecycle.end();
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(InventoryReservedEvent event) {
PrepareShipmentCommand command = new PrepareShipmentCommand(
event.getOrderId(),
event.getItemId()
);
commandGateway.send(command);
}
@EndSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCompletedEvent event) {
// Saga completed successfully
}
}
```
### Aggregate for Order Service
```java
@Aggregate
public class OrderAggregate {
@AggregateIdentifier
private String orderId;
private OrderStatus status;
public OrderAggregate() {
}
@CommandHandler
public OrderAggregate(CreateOrderCommand command) {
apply(new OrderCreatedEvent(
command.getOrderId(),
command.getAmount(),
command.getItemId()
));
}
@EventSourcingHandler
public void on(OrderCreatedEvent event) {
this.orderId = event.getOrderId();
this.status = OrderStatus.PENDING;
}
@CommandHandler
public void handle(CancelOrderCommand command) {
apply(new OrderCancelledEvent(command.getOrderId()));
}
@EventSourcingHandler
public void on(OrderCancelledEvent event) {
this.status = OrderStatus.CANCELLED;
}
}
```
### Aggregate for Payment Service
```java
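import static org.axonframework.modelling.command.AggregateLifecycle.apply;

@Aggregate
public class PaymentAggregate {

    @AggregateIdentifier
    private String paymentId;

    public PaymentAggregate() {
        // Required by Axon
    }

    @CommandHandler
    public PaymentAggregate(ProcessPaymentCommand command) {
        // The command handler only decides which event to apply. State is set
        // in the event sourcing handlers so it is rebuilt correctly on replay.
        if (command.getAmount().compareTo(BigDecimal.ZERO) <= 0) {
            apply(new PaymentFailedEvent(
                command.getPaymentId(),
                command.getOrderId(),
                command.getItemId(),
                "Payment amount must be greater than zero"
            ));
        } else {
            apply(new PaymentProcessedEvent(
                command.getPaymentId(),
                command.getOrderId(),
                command.getItemId()
            ));
        }
    }

    @EventSourcingHandler
    public void on(PaymentProcessedEvent event) {
        this.paymentId = event.getPaymentId();
    }

    @EventSourcingHandler
    public void on(PaymentFailedEvent event) {
        this.paymentId = event.getPaymentId();
    }
}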
```
## Axon Configuration
```yaml
axon:
  serializer:
    general: jackson
    events: jackson
    messages: jackson
  eventhandling:
    processors:
      order-processor:
        mode: tracking
        source: eventBus
  axonserver:
    enabled: false
```
## Maven Dependencies for Axon
```xml
<dependency>
<groupId>org.axonframework</groupId>
<artifactId>axon-spring-boot-starter</artifactId>
<version>4.9.0</version> <!-- Use latest stable version -->
</dependency>
```
## Advantages and Disadvantages
### Advantages
- **Centralized visibility** - easy to see workflow status
- **Easier to troubleshoot** - single place to analyze flow
- **Clear transaction flow** - orchestrator defines sequence
- **Simplified error handling** - centralized compensation logic
- **Better for complex workflows** - easier to manage many steps
### Disadvantages
- **Orchestrator becomes single point of failure** - can be mitigated with clustering
- **Additional infrastructure component** - more complexity in deployment
- **Potential tight coupling** - if orchestrator knows too much about services
## Eventuate Tram Sagas
Eventuate Tram is an alternative to Axon for orchestration-based sagas:
```xml
<dependency>
<groupId>io.eventuate.tram.sagas</groupId>
<artifactId>eventuate-tram-sagas-spring-starter</artifactId>
<version>0.28.0</version> <!-- Use latest stable version -->
</dependency>
```
## Camunda for BPMN-Based Orchestration
Use Camunda when visual workflow design is beneficial:
**Features**:
- Visual workflow design
- BPMN 2.0 standard
- Human tasks support
- Complex workflow modeling
**Use When**:
- Business process modeling needed
- Visual workflow design preferred
- Human approval steps required
- Complex orchestration logic
## When to Use Orchestration
Use orchestration-based sagas when:
- Building brownfield applications with existing microservices
- Handling complex workflows with many steps
- Centralized control and monitoring is critical
- Organization wants clear visibility into saga execution
- Need for human intervention in workflow


@@ -0,0 +1,262 @@
# Event-Driven Architecture in Sagas
## Event Types
### Domain Events
Represent business facts that happened within a service:
```java
public record OrderCreatedEvent(
String orderId,
Instant createdAt,
BigDecimal amount
) implements DomainEvent {}
```
### Integration Events
Communication between bounded contexts (microservices):
```java
public record PaymentRequestedEvent(
String orderId,
String paymentId,
BigDecimal amount
) implements IntegrationEvent {}
```
### Commands
A command asks another service to perform an action. Unlike an event, it is addressed to a single handler and can be rejected:
```java
public record ProcessPaymentCommand(
String paymentId,
String orderId,
BigDecimal amount
) {}
```
## Event Versioning
Handle event schema evolution using versioning:
```java
public record OrderCreatedEventV1(
String orderId,
BigDecimal amount
) {}
public record OrderCreatedEventV2(
String orderId,
BigDecimal amount,
String customerId,
Instant timestamp
) {}
// Event Upcaster
public class OrderEventUpcaster implements EventUpcaster {
@Override
public Stream<IntermediateEventRepresentation> upcast(
Stream<IntermediateEventRepresentation> eventStream) {
return eventStream.map(event -> {
if (event.getType().getName().equals("OrderCreatedEventV1")) {
return upcastV1ToV2(event);
}
return event;
});
}
}
```
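Independent of any framework, upcasting comes down to a pure function from the old shape to the new one, with explicit defaults for the fields the old version never recorded. The sketch below uses hypothetical names (`OrderCreatedV1`, `OrderCreatedV2`, `OrderCreatedUpcast`) and assumes that `"unknown"` is an acceptable placeholder customer and that the stored-at time is an acceptable fallback timestamp.

```java
import java.math.BigDecimal;
import java.time.Instant;

record OrderCreatedV1(String orderId, BigDecimal amount) {}

record OrderCreatedV2(String orderId, BigDecimal amount, String customerId, Instant timestamp) {}

final class OrderCreatedUpcast {
    // Placeholder for events written before customerId existed (an assumption).
    static final String UNKNOWN_CUSTOMER = "unknown";

    // Old events carry neither customer nor timestamp, so the caller supplies
    // a fallback timestamp, for example the time the event was stored.
    static OrderCreatedV2 toV2(OrderCreatedV1 v1, Instant storedAt) {
        return new OrderCreatedV2(v1.orderId(), v1.amount(), UNKNOWN_CUSTOMER, storedAt);
    }
}
```

Keeping the conversion free of side effects means it can be applied every time an old event is read, so stored events never need to be rewritten.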
## Event Store
Store all events for audit trail and recovery:
```java
@Entity
@Table(name = "saga_events")
public class SagaEvent {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false)
private String sagaId;
@Column(nullable = false)
private String eventType;
@Column(columnDefinition = "TEXT")
private String payload;
@Column(nullable = false)
private Instant timestamp;
@Column(nullable = false)
private Integer version;
}
```
## Event Publishing Patterns
### Outbox Pattern (Transactional)
Ensure atomic update of database and event publishing:
```java
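@Service
public class OrderService {
    private final OrderRepository orderRepository;
    private final OutboxRepository outboxRepository;

    @Transactional
    public void createOrder(CreateOrderRequest request) {
        // 1. Create and save order
        Order order = new Order(...);
        orderRepository.save(order);
        // 2. Create outbox entry in same transaction
        OutboxEntry entry = new OutboxEntry(
            "OrderCreated",
            order.getId(),
            new OrderCreatedEvent(...)
        );
        outboxRepository.save(entry);
    }
}

@Component
public class OutboxPoller {
    private final OutboxRepository outboxRepository;
    private final EventPublisher eventPublisher;

    public OutboxPoller(OutboxRepository outboxRepository, EventPublisher eventPublisher) {
        this.outboxRepository = outboxRepository;
        this.eventPublisher = eventPublisher;
    }

    // Publish first, then mark. A crash between the two steps re-publishes the
    // entry on the next poll, so delivery is at-least-once and consumers must
    // tolerate duplicates.
    @Scheduled(fixedDelay = 1000)
    public void pollAndPublish() {
        List<OutboxEntry> unpublished = outboxRepository.findUnpublished();
        unpublished.forEach(entry -> {
            eventPublisher.publish(entry.getEvent());
            outboxRepository.markAsPublished(entry.getId());
        });
    }
}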
```
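The mechanics of the outbox can be followed without a database or broker. In this sketch three in-memory lists stand in for the orders table, the outbox table, and the broker; `OutboxSketch` and its members are hypothetical names, and the single-transaction guarantee is only described in a comment, not enforced.

```java
import java.util.ArrayList;
import java.util.List;

final class OutboxSketch {
    // One row of the outbox table.
    static final class Entry {
        final String eventType;
        final String aggregateId;
        boolean published;

        Entry(String eventType, String aggregateId) {
            this.eventType = eventType;
            this.aggregateId = aggregateId;
        }
    }

    final List<String> orders = new ArrayList<>(); // stands in for the orders table
    final List<Entry> outbox = new ArrayList<>();  // stands in for the outbox table
    final List<String> broker = new ArrayList<>(); // stands in for the message broker

    // In a real service both writes happen in ONE database transaction,
    // so an order can never exist without its outbox entry.
    void createOrder(String orderId) {
        orders.add(orderId);
        outbox.add(new Entry("OrderCreated", orderId));
    }

    // Publishes every unpublished entry and marks it afterwards.
    // Returns how many entries were sent in this poll.
    int pollAndPublish() {
        int sent = 0;
        for (Entry entry : outbox) {
            if (!entry.published) {
                broker.add(entry.eventType + ":" + entry.aggregateId);
                entry.published = true;
                sent++;
            }
        }
        return sent;
    }
}
```

A second poll with no new orders sends nothing, which is what keeps the poller cheap to run on a short fixed delay.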
### Direct Publishing Pattern
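Publish the event right after the transaction commits. This is simpler than the outbox, but the event is lost if the process crashes between the commit and the publish: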
```java
@Service
public class OrderService {
private final OrderRepository orderRepository;
private final EventPublisher eventPublisher;
@Transactional
public void createOrder(CreateOrderRequest request) {
Order order = new Order(...);
orderRepository.save(order);
// Publish event after transaction commits
TransactionSynchronizationManager.registerSynchronization(
new TransactionSynchronization() {
@Override
public void afterCommit() {
eventPublisher.publish(new OrderCreatedEvent(...));
}
}
);
}
}
```
## Event Sourcing
Store all state changes as events instead of current state:
**Benefits**:
- Complete audit trail
- Time-travel debugging
- Natural fit for sagas
- Event replay for recovery
**Implementation**:
```java
@Entity
public class Order {
    @Id
    private String orderId;
    @Enumerated(EnumType.STRING)
    private OrderStatus status;
    // Events raised since the last save. DomainEvent is not a JPA entity, so
    // the list is kept out of the mapping and handed to the event store.
    @Transient
    private final List<DomainEvent> events = new ArrayList<>();

    public void createOrder(...) {
        apply(new OrderCreatedEvent(...));
    }

    // Updates state from the event, then records the event as uncommitted
    protected void apply(DomainEvent event) {
        if (event instanceof OrderCreatedEvent e) {
            this.orderId = e.orderId();
            this.status = OrderStatus.PENDING;
        }
        events.add(event);
    }

    public List<DomainEvent> getUncommittedEvents() {
        return new ArrayList<>(events);
    }

    public void clearUncommittedEvents() {
        events.clear();
    }
}
```
## Event Ordering and Consistency
### Maintain Event Order
Use partitioning to maintain order within a saga:
```java
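@Bean
public ProducerFactory<String, Object> producerFactory() {
    Map<String, Object> config = new HashMap<>();
    // Broker address is an assumption for local development
    config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    // Spring Kafka's JsonSerializer writes the event payload as JSON
    config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
    return new DefaultKafkaProducerFactory<>(config);
}

@Service
public class EventPublisher {
    private final KafkaTemplate<String, Object> kafkaTemplate;

    public EventPublisher(KafkaTemplate<String, Object> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(DomainEvent event) {
        // Records with the same key go to the same partition, which preserves their order
        kafkaTemplate.send("events", event.getSagaId(), event);
    }
}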
```
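Keying by saga ID preserves order because the partition is a deterministic function of the key. The sketch below is a simplified stand-in, not Kafka's implementation: the real default partitioner hashes the serialized key with murmur2, while `String.hashCode` is used here only to show the principle. `PartitionSketch` is a hypothetical name.

```java
final class PartitionSketch {
    // Maps a key to a partition in [0, numPartitions).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Every event keyed with the same saga ID lands in one partition, and a partition is consumed in order. Events of different sagas may interleave, which is fine because ordering only matters within a saga. Changing the partition count changes the mapping, so in-flight sagas can see reordering across such a change.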
### Handle Out-of-Order Events
Use saga state to detect and handle out-of-order events:
```java
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentProcessedEvent event) {
    // 'status' is a field on the saga instance that tracks the expected next step
    if (status != SagaStatus.AWAITING_PAYMENT) {
        // Out of order event, ignore or queue for retry
        logger.warn("Unexpected event in state: {}", status);
        return;
    }
    // Process event
}
```


@@ -0,0 +1,299 @@
# Compensating Transactions
## Design Principles
### Idempotency
A compensation may be delivered more than once, so running it repeatedly must leave the same final state as running it once:
```java
public void cancelPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId)
.orElse(null);
if (payment == null) {
// Already cancelled or doesn't exist
return;
}
if (payment.getStatus() == PaymentStatus.CANCELLED) {
// Already cancelled, idempotent
return;
}
payment.setStatus(PaymentStatus.CANCELLED);
paymentRepository.save(payment);
// Refund logic here
}
```
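The same guard can be exercised in isolation. In this sketch a map replaces the payment repository and a counter replaces the refund call; `IdempotentCancelSketch` and its members are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentCancelSketch {
    enum Status { PROCESSED, CANCELLED }

    private final Map<String, Status> payments = new HashMap<>();
    int refundsIssued = 0; // counts how often the refund side effect ran

    void addProcessedPayment(String paymentId) {
        payments.put(paymentId, Status.PROCESSED);
    }

    // Safe to call any number of times: only the call that finds the
    // payment in PROCESSED state issues a refund.
    void cancelPayment(String paymentId) {
        Status status = payments.get(paymentId);
        if (status == null || status == Status.CANCELLED) {
            return; // unknown or already cancelled: nothing to do
        }
        payments.put(paymentId, Status.CANCELLED);
        refundsIssued++;
    }

    Status statusOf(String paymentId) {
        return payments.get(paymentId);
    }
}
```

The state check is what makes the refund happen once. With concurrent callers the check and the update would also need to be atomic, for example through optimistic locking on the payment row.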
### Retryability
Design operations to handle retries without side effects:
```java
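@Retryable(
    value = {TransientException.class},
    maxAttempts = 3,
    backoff = @Backoff(delay = 1000, multiplier = 2)
)
@Transactional
public void releaseInventory(String reservationId, String itemId, int quantity) {
    // An increment is not idempotent on its own, so each release is recorded
    // under its reservation ID and skipped if it was already applied.
    // releaseLogRepository and ReleaseLog are hypothetical names for this sketch.
    if (releaseLogRepository.existsById(reservationId)) {
        return;
    }
    InventoryItem item = inventoryRepository.findById(itemId)
        .orElseThrow();
    item.increaseAvailableQuantity(quantity);
    inventoryRepository.save(item);
    releaseLogRepository.save(new ReleaseLog(reservationId));
}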
```
## Compensation Strategies
### Backward Recovery
Undo completed steps in reverse order:
```java
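@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentFailedEvent event) {
    logger.error("Payment failed, initiating compensation");
    // This handler assumes a flow in which shipment and inventory ran before
    // payment; compensate only the steps that actually completed.
    // sendAndWait blocks until each command is handled, so the steps run in
    // reverse order instead of being dispatched concurrently.
    // Step 1: Cancel shipment preparation
    commandGateway.sendAndWait(new CancelShipmentCommand(event.getOrderId()));
    // Step 2: Release inventory
    commandGateway.sendAndWait(new ReleaseInventoryCommand(event.getOrderId()));
    // Step 3: Cancel order
    commandGateway.sendAndWait(new CancelOrderCommand(event.getOrderId()));
    SagaLifecycle.end();
}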
```
### Forward Recovery
Retry the failed operation until it succeeds or the retry limit is reached, then fall back to compensation:
```java
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentTransientFailureEvent event) {
    if (event.getRetryCount() < MAX_RETRIES) {
        // Re-send the payment command. This retries immediately; to add
        // backoff, schedule the retry (for example with Axon's DeadlineManager).
        ProcessPaymentCommand retryCommand = new ProcessPaymentCommand(
            event.getPaymentId(),
            event.getOrderId(),
            event.getAmount()
        );
        commandGateway.send(retryCommand);
    } else {
        // After max retries, compensate
        handlePaymentFailure(event);
    }
}
```
## Semantic Lock Pattern
Prevent concurrent modifications during saga execution:
```java
@Entity
public class Order {
@Id
private String orderId;
@Enumerated(EnumType.STRING)
private OrderStatus status;
@Version
private Long version;
private Instant lockedUntil;
public boolean tryLock(Duration lockDuration) {
if (isLocked()) {
return false;
}
this.lockedUntil = Instant.now().plus(lockDuration);
return true;
}
public boolean isLocked() {
return lockedUntil != null &&
Instant.now().isBefore(lockedUntil);
}
public void unlock() {
this.lockedUntil = null;
}
}
```
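The expiry behaviour of the lock is easier to see when the current time is passed in instead of read from the system clock. The following is a minimal sketch of the same logic under that assumption; `SemanticLockSketch` is a hypothetical name and persistence is left out.

```java
import java.time.Duration;
import java.time.Instant;

final class SemanticLockSketch {
    private Instant lockedUntil;

    // Takes the lock unless an unexpired lock is already held.
    boolean tryLock(Instant now, Duration lockDuration) {
        if (isLocked(now)) {
            return false;
        }
        lockedUntil = now.plus(lockDuration);
        return true;
    }

    // A lock counts as held only until its expiry time.
    boolean isLocked(Instant now) {
        return lockedUntil != null && now.isBefore(lockedUntil);
    }

    void unlock() {
        lockedUntil = null;
    }
}
```

Because the lock expires on its own, a saga that crashes without calling `unlock()` blocks other sagas only for the lock duration. The duration therefore has to be longer than the slowest expected saga step.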
## Compensation in Axon Framework
```java
@Saga
public class OrderSaga {
private String orderId;
private String paymentId;
private String inventoryId;
private boolean compensating = false;
@SagaEventHandler(associationProperty = "orderId")
public void handle(InventoryReservationFailedEvent event) {
logger.error("Inventory reservation failed");
compensating = true;
// Compensate: refund payment
RefundPaymentCommand refundCommand = new RefundPaymentCommand(
paymentId,
event.getOrderId(),
event.getReservedAmount(),
"Inventory unavailable"
);
commandGateway.send(refundCommand);
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentRefundedEvent event) {
if (!compensating) return;
logger.info("Payment refunded, cancelling order");
// Compensate: cancel order
CancelOrderCommand command = new CancelOrderCommand(
event.getOrderId(),
"Inventory unavailable - payment refunded"
);
commandGateway.send(command);
}
@EndSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCancelledEvent event) {
logger.info("Saga completed with compensation");
}
}
```
## Handling Compensation Failures
Handle cases where compensation itself fails:
```java
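@Service
public class CompensationService {
    private final DeadLetterQueueService dlqService;
    // AlertingService is the (application-specific) type behind alertingService
    private final AlertingService alertingService;

    public CompensationService(DeadLetterQueueService dlqService,
                               AlertingService alertingService) {
        this.dlqService = dlqService;
        this.alertingService = alertingService;
    }

    public void handleCompensationFailure(String sagaId, String step, Exception cause) {
        logger.error("Compensation failed for saga {} at step {}", sagaId, step, cause);
        // Send to dead letter queue for manual intervention
        dlqService.send(new FailedCompensation(
            sagaId,
            step,
            cause.getMessage(),
            Instant.now()
        ));
        // Create alert for operations team
        alertingService.alert(
            "Compensation Failure",
            "Saga " + sagaId + " failed compensation at " + step
        );
    }
}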
```
## Testing Compensation
Verify that compensation produces expected results:
```java
@Test
void shouldCompensateWhenPaymentFails() {
String orderId = "order-123";
String paymentId = "payment-456";
// Arrange: execute payment
Payment payment = new Payment(paymentId, orderId, BigDecimal.TEN);
paymentRepository.save(payment);
orderRepository.save(new Order(orderId, OrderStatus.PENDING));
// Act: compensate
paymentService.cancelPayment(paymentId);
// Assert: verify idempotency
paymentService.cancelPayment(paymentId);
Payment result = paymentRepository.findById(paymentId).orElseThrow();
assertThat(result.getStatus()).isEqualTo(PaymentStatus.CANCELLED);
}
```
## Common Compensation Patterns
### Inventory Release
```java
@Service
public class InventoryService {
public void releaseInventory(String orderId) {
Order order = orderRepository.findById(orderId).orElseThrow();
order.getItems().forEach(item -> {
InventoryItem inventoryItem = inventoryRepository
.findById(item.getProductId())
.orElseThrow();
inventoryItem.increaseAvailableQuantity(item.getQuantity());
inventoryRepository.save(inventoryItem);
});
}
}
```
### Payment Refund
```java
@Service
public class PaymentService {
public void refundPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId)
.orElseThrow();
if (payment.getStatus() == PaymentStatus.PROCESSED) {
payment.setStatus(PaymentStatus.REFUNDED);
paymentGateway.refund(payment.getTransactionId());
paymentRepository.save(payment);
}
}
}
```
### Order Cancellation
```java
@Service
public class OrderService {
public void cancelOrder(String orderId, String reason) {
Order order = orderRepository.findById(orderId)
.orElseThrow();
order.setStatus(OrderStatus.CANCELLED);
order.setCancellationReason(reason);
order.setCancelledAt(Instant.now());
orderRepository.save(order);
}
}
```


@@ -0,0 +1,294 @@
# State Management in Sagas
## Saga State Entity
Persist saga state for recovery and monitoring:
```java
@Entity
@Table(name = "saga_state")
public class SagaState {
    @Id
    private String sagaId;
    @Enumerated(EnumType.STRING)
    private SagaStatus status;
    @Column(columnDefinition = "TEXT")
    private String currentStep;
    @Column(columnDefinition = "TEXT")
    private String compensationSteps;
    // Read by the recovery service to cap the number of retries
    private int retryCount;
    private Instant startedAt;
    // Time of the last state change; used for per-step timeout checks
    private Instant updatedAt;
    private Instant completedAt;
    @Version
    private Long version;
    // Getters and setters omitted
}

public enum SagaStatus {
    STARTED,
    PROCESSING,
    COMPENSATING,
    COMPLETED,
    FAILED,
    CANCELLED
}
```
## Saga State Machine with Spring Statemachine
Define saga state transitions explicitly:
```java
@Configuration
@EnableStateMachineFactory // one state machine per saga instead of a single shared bean
public class SagaStateMachineConfig
extends StateMachineConfigurerAdapter<SagaStatus, SagaEvent> {
@Override
public void configure(
StateMachineStateConfigurer<SagaStatus, SagaEvent> states)
throws Exception {
states
.withStates()
.initial(SagaStatus.STARTED)
.states(EnumSet.allOf(SagaStatus.class))
.end(SagaStatus.COMPLETED)
.end(SagaStatus.FAILED);
}
@Override
public void configure(
StateMachineTransitionConfigurer<SagaStatus, SagaEvent> transitions)
throws Exception {
transitions
.withExternal()
.source(SagaStatus.STARTED)
.target(SagaStatus.PROCESSING)
.event(SagaEvent.ORDER_CREATED)
.and()
.withExternal()
.source(SagaStatus.PROCESSING)
.target(SagaStatus.COMPLETED)
.event(SagaEvent.ALL_STEPS_COMPLETED)
.and()
.withExternal()
.source(SagaStatus.PROCESSING)
.target(SagaStatus.COMPENSATING)
.event(SagaEvent.STEP_FAILED)
.and()
.withExternal()
.source(SagaStatus.COMPENSATING)
.target(SagaStatus.FAILED)
.event(SagaEvent.COMPENSATION_COMPLETED);
}
}
```
## State Transitions
### Successful Saga Flow
```
STARTED → PROCESSING → COMPLETED
```
### Failed Saga with Compensation
```
STARTED → PROCESSING → COMPENSATING → FAILED
```
### Saga with Retry
```
STARTED → PROCESSING → PROCESSING (retry) → COMPLETED
```
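The three flows above amount to a small table of allowed transitions, which can be checked without any state machine library. This sketch assumes exactly the transitions drawn above and leaves `CANCELLED` out because no flow reaches it; `SagaTransitions` is a hypothetical name.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class SagaTransitions {
    enum Status { STARTED, PROCESSING, COMPENSATING, COMPLETED, FAILED }

    private static final Map<Status, Set<Status>> ALLOWED = new EnumMap<>(Status.class);

    static {
        ALLOWED.put(Status.STARTED, EnumSet.of(Status.PROCESSING));
        // PROCESSING -> PROCESSING models a retry of the current step
        ALLOWED.put(Status.PROCESSING,
            EnumSet.of(Status.PROCESSING, Status.COMPLETED, Status.COMPENSATING));
        ALLOWED.put(Status.COMPENSATING, EnumSet.of(Status.FAILED));
        // Terminal states allow no further transitions
        ALLOWED.put(Status.COMPLETED, EnumSet.noneOf(Status.class));
        ALLOWED.put(Status.FAILED, EnumSet.noneOf(Status.class));
    }

    static boolean canTransition(Status from, Status to) {
        return ALLOWED.get(from).contains(to);
    }
}
```

Rejecting any transition that is not in the table is a cheap way to catch duplicate or out-of-order events before they corrupt the saga state.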
## Persisting Saga Context
Store context data for saga execution:
```java
@Entity
@Table(name = "saga_context")
public class SagaContext {
@Id
private String sagaId;
@Column(columnDefinition = "TEXT")
private String contextData; // JSON-serialized
private Instant createdAt;
private Instant updatedAt;
public <T> T getContextData(Class<T> type) {
return JsonUtils.fromJson(contextData, type);
}
public void setContextData(Object data) {
this.contextData = JsonUtils.toJson(data);
}
}
@Service
public class SagaContextService {
private final SagaContextRepository repository;
public void saveContext(String sagaId, Object context) {
SagaContext sagaContext = new SagaContext(sagaId);
sagaContext.setContextData(context);
repository.save(sagaContext);
}
public <T> T loadContext(String sagaId, Class<T> type) {
return repository.findById(sagaId)
.map(ctx -> ctx.getContextData(type))
.orElseThrow(() -> new SagaContextNotFoundException(sagaId));
}
}
```
## Handling Saga Timeouts
Detect and handle sagas that exceed expected duration:
```java
@Service
public class SagaTimeoutHandler {
private final SagaStateRepository repository;
private static final Duration MAX_SAGA_DURATION = Duration.ofMinutes(30);
@Scheduled(fixedDelay = 60000) // Check every minute
public void detectTimeouts() {
Instant timeout = Instant.now().minus(MAX_SAGA_DURATION);
List<SagaState> timedOutSagas = repository
.findByStatusAndStartedAtBefore(SagaStatus.PROCESSING, timeout);
timedOutSagas.forEach(saga -> {
logger.warn("Saga {} timed out", saga.getSagaId());
compensateSaga(saga);
});
}
private void compensateSaga(SagaState saga) {
saga.setStatus(SagaStatus.COMPENSATING);
repository.save(saga);
// Trigger compensation logic
}
}
```
## Saga Recovery
Recover sagas from failures:
```java
@Service
public class SagaRecoveryService {
private final SagaStateRepository stateRepository;
private final CommandGateway commandGateway;
@Scheduled(fixedDelay = 30000) // Check every 30 seconds
public void recoverFailedSagas() {
List<SagaState> failedSagas = stateRepository
.findByStatus(SagaStatus.FAILED);
failedSagas.forEach(saga -> {
if (canBeRetried(saga)) {
logger.info("Retrying saga {}", saga.getSagaId());
retrySaga(saga);
}
});
}
private boolean canBeRetried(SagaState saga) {
return saga.getRetryCount() < 3;
}
private void retrySaga(SagaState saga) {
saga.setStatus(SagaStatus.STARTED);
saga.setRetryCount(saga.getRetryCount() + 1);
stateRepository.save(saga);
// Send retry command
}
}
```
## Saga State Query
Query sagas for monitoring:
```java
@Repository
public interface SagaStateRepository extends JpaRepository<SagaState, String> {
List<SagaState> findByStatus(SagaStatus status);
List<SagaState> findByStatusAndStartedAtBefore(
SagaStatus status, Instant before);
Page<SagaState> findByStatus(SagaStatus status, Pageable pageable);
long countByStatus(SagaStatus status);
long countByStatusAndStartedAtBefore(SagaStatus status, Instant before);
}
@RestController
@RequestMapping("/api/sagas")
public class SagaMonitoringController {
private final SagaStateRepository repository;
@GetMapping("/status/{status}")
public List<SagaState> getSagasByStatus(
@PathVariable SagaStatus status) {
return repository.findByStatus(status);
}
@GetMapping("/stuck")
public List<SagaState> getStuckSagas() {
Instant oneHourAgo = Instant.now().minus(Duration.ofHours(1));
return repository.findByStatusAndStartedAtBefore(
SagaStatus.PROCESSING, oneHourAgo);
}
}
```
## Database Schema for State Management
```sql
-- MySQL syntax (inline INDEX, LONGTEXT)
CREATE TABLE saga_state (
    saga_id VARCHAR(255) PRIMARY KEY,
    status VARCHAR(50) NOT NULL,
    current_step TEXT,
    compensation_steps TEXT,
    retry_count INT NOT NULL DEFAULT 0,
    started_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP,
    completed_at TIMESTAMP,
    version BIGINT,
    INDEX idx_status (status),
    INDEX idx_started_at (started_at)
);

CREATE TABLE saga_context (
    saga_id VARCHAR(255) PRIMARY KEY,
    context_data LONGTEXT,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP,
    FOREIGN KEY (saga_id) REFERENCES saga_state(saga_id)
);

-- Supports the timeout and stuck-saga queries (status plus start time)
CREATE INDEX idx_saga_state_status_started
    ON saga_state(status, started_at);
```


@@ -0,0 +1,323 @@
# Error Handling and Retry Strategies
## Retry Configuration
Use Spring Retry for automatic retry logic:
```java
@Configuration
@EnableRetry
public class RetryConfig {
    @Bean
    public RetryTemplate retryTemplate() {
        RetryTemplate retryTemplate = new RetryTemplate();

        // Waits 1s, then 2s between the three attempts (capped at 10s)
        ExponentialBackOffPolicy exponentialBackOff = new ExponentialBackOffPolicy();
        exponentialBackOff.setInitialInterval(1000L);
        exponentialBackOff.setMultiplier(2.0);
        exponentialBackOff.setMaxInterval(10000L);

        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
        retryPolicy.setMaxAttempts(3);

        retryTemplate.setBackOffPolicy(exponentialBackOff);
        retryTemplate.setRetryPolicy(retryPolicy);
        return retryTemplate;
    }
}
```
## Retry with @Retryable
```java
@Service
public class OrderService {
@Retryable(
value = {TransientException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000, multiplier = 2)
)
public void processOrder(String orderId) {
// Order processing logic
}
@Recover
public void recover(TransientException ex, String orderId) {
logger.error("Order processing failed after retries: {}", orderId, ex);
// Fallback logic
}
}
```
## Circuit Breaker with Resilience4j
Prevent cascading failures:
```java
// Named ResilienceConfig so it does not clash with Resilience4j's CircuitBreakerConfig
@Configuration
public class ResilienceConfig {
    @Bean
    public CircuitBreakerRegistry circuitBreakerRegistry() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50) // Open at a 50% failure rate
            .waitDurationInOpenState(Duration.ofMillis(1000))
            .slidingWindowSize(2) // Rate over the last 2 calls; use a larger window in production
            .build();
        return CircuitBreakerRegistry.of(config);
    }
}

@Service
public class PaymentService {
    private final CircuitBreaker circuitBreaker;

    public PaymentService(CircuitBreakerRegistry registry) {
        this.circuitBreaker = registry.circuitBreaker("payment");
    }

    public PaymentResult processPayment(PaymentRequest request) {
        // Throws CallNotPermittedException while the breaker is open
        return circuitBreaker.executeSupplier(
            () -> callPaymentGateway(request)
        );
    }

    private PaymentResult callPaymentGateway(PaymentRequest request) {
        // Call external payment gateway
        return new PaymentResult(...);
    }
}
```
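To show what the breaker does internally, here is a deliberately simplified hand-rolled version. It is not how Resilience4j works: it opens after a number of consecutive failures rather than a failure rate over a sliding window, it is not thread-safe, and `SimpleCircuitBreaker` is a hypothetical name. The current time is passed in so the behaviour can be checked without sleeping.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openDuration;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt;

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    // Runs the call unless the breaker is open and the wait has not elapsed.
    <T> T call(Supplier<T> supplier, Instant now) {
        if (state == State.OPEN) {
            if (now.isBefore(openedAt.plus(openDuration))) {
                throw new IllegalStateException("circuit open");
            }
            state = State.HALF_OPEN; // wait elapsed: allow one trial call
        }
        try {
            T result = supplier.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // A failed trial call, or too many failures in a row, opens the breaker
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = now;
            }
            throw e;
        }
    }

    State state() {
        return state;
    }
}
```

While the breaker is open, callers fail fast without touching the downstream service, which is what stops one slow dependency from tying up threads across the saga.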
## Dead Letter Queue
Handle failed messages:
```java
// Assumes Spring for Apache Kafka 2.8 or later, where DefaultErrorHandler
// replaces the older ErrorHandler interfaces.
@Configuration
public class DeadLetterQueueConfig {
    @Bean
    public NewTopic deadLetterTopic() {
        return new NewTopic("saga-dlq", 1, (short) 1);
    }

    // DefaultErrorHandler retries the failed record and then hands it to the
    // recoverer, which publishes it to the dead letter topic.
    @Bean
    public DefaultErrorHandler sagaErrorHandler(KafkaTemplate<String, Object> kafkaTemplate) {
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
            kafkaTemplate,
            // The DLQ topic has a single partition, so always target partition 0
            (record, exception) -> new TopicPartition("saga-dlq", 0));
        // Retry twice, one second apart, before sending to the DLQ
        return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 2L));
    }
}
```
## Timeout Handling
Define and enforce timeout policies:
```java
@Service
public class TimeoutHandler {
    private final SagaStateRepository sagaStateRepository;
    private static final Duration STEP_TIMEOUT = Duration.ofSeconds(30);

    @Scheduled(fixedDelay = 5000)
    public void checkForTimeouts() {
        Instant timeoutThreshold = Instant.now().minus(STEP_TIMEOUT);
        // Assumes SagaState has an updatedAt field and SagaStateRepository
        // declares a matching findByStatusAndUpdatedAtBefore query method.
        List<SagaState> timedOutSagas = sagaStateRepository
            .findByStatusAndUpdatedAtBefore(SagaStatus.PROCESSING, timeoutThreshold);
        timedOutSagas.forEach(saga -> {
            logger.warn("Saga {} timed out at step {}",
                saga.getSagaId(), saga.getCurrentStep());
            compensateSaga(saga);
        });
    }

    private void compensateSaga(SagaState saga) {
        saga.setStatus(SagaStatus.COMPENSATING);
        sagaStateRepository.save(saga);
    }
}
```
## Exponential Backoff
Prevent overwhelming downstream services:
```java
@Service
public class BackoffService {
public Duration calculateBackoff(int attemptNumber) {
long baseDelay = 1000; // 1 second
long delay = baseDelay * (long) Math.pow(2, attemptNumber - 1);
long maxDelay = 30000; // 30 seconds
return Duration.ofMillis(Math.min(delay, maxDelay));
}
@Retryable(
value = {ServiceUnavailableException.class},
maxAttempts = 5,
backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 30000
)
)
public void callExternalService() {
// External service call
}
}
```
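As a worked example of the arithmetic, the sketch below computes the capped delay with integer shifts. The class `CappedBackoff` and its method are hypothetical names. Unlike a plain multiply, it also guards against `long` overflow for very large attempt numbers.

```java
import java.time.Duration;

// Hypothetical helper: delay = base * 2^(attempt - 1), capped at max.
class CappedBackoff {
    static Duration delayFor(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long baseMs = base.toMillis();
        long maxMs = max.toMillis();
        int shift = attempt - 1;
        // If base * 2^shift would exceed max (or overflow), return the cap directly
        if (shift >= 62 || baseMs > (maxMs >> shift)) {
            return max;
        }
        return Duration.ofMillis(baseMs << shift);
    }
}
```

With a 1 second base and a 30 second cap, attempts 1 through 5 wait 1, 2, 4, 8, and 16 seconds, and every later attempt waits 30 seconds.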
## Idempotent Retry
Ensure retries don't cause duplicate processing:
```java
@Service
public class IdempotentPaymentService {
private final PaymentRepository paymentRepository;
private final Map<String, PaymentResult> processedPayments = new ConcurrentHashMap<>();
public PaymentResult processPayment(String paymentId, BigDecimal amount) {
// Check if already processed
if (processedPayments.containsKey(paymentId)) {
return processedPayments.get(paymentId);
}
// Check database
Optional<Payment> existing = paymentRepository.findById(paymentId);
if (existing.isPresent()) {
return new PaymentResult(existing.get());
}
// Process payment
PaymentResult result = callPaymentGateway(paymentId, amount);
// Cache and persist
processedPayments.put(paymentId, result);
paymentRepository.save(new Payment(paymentId, amount, result.getStatus()));
return result;
}
}
```
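A separate `containsKey` check followed by `put` leaves a window in which two concurrent retries can both reach the gateway. One way to close that window within a single JVM is `ConcurrentHashMap.computeIfAbsent`, sketched below. `IdempotentExecutor` is a hypothetical helper. It assumes the operation never returns `null`, and it does not replace a durable uniqueness check in the database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical in-memory helper: runs the operation at most once per key and replays the result.
class IdempotentExecutor<K, R> {
    private final ConcurrentMap<K, R> results = new ConcurrentHashMap<>();

    R executeOnce(K key, Function<K, R> operation) {
        // computeIfAbsent invokes the function atomically, at most once per absent key.
        // If the function throws, nothing is stored and a later retry runs it again.
        return results.computeIfAbsent(key, operation);
    }
}
```

Because `ConcurrentHashMap` blocks other updates to the same bin while the function runs, a long gateway call will briefly delay some concurrent callers. For short operations this is usually an acceptable trade-off.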
## Global Exception Handler
Centralize error handling:
```java
@RestControllerAdvice
public class GlobalExceptionHandler {
@ExceptionHandler(SagaExecutionException.class)
public ResponseEntity<ErrorResponse> handleSagaError(
SagaExecutionException ex) {
return ResponseEntity
.status(HttpStatus.UNPROCESSABLE_ENTITY)
.body(new ErrorResponse(
"SAGA_EXECUTION_FAILED",
ex.getMessage(),
ex.getSagaId()
));
}
@ExceptionHandler(ServiceUnavailableException.class)
public ResponseEntity<ErrorResponse> handleServiceUnavailable(
ServiceUnavailableException ex) {
return ResponseEntity
.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(new ErrorResponse(
"SERVICE_UNAVAILABLE",
"Required service is temporarily unavailable"
));
}
@ExceptionHandler(TimeoutException.class)
public ResponseEntity<ErrorResponse> handleTimeout(
TimeoutException ex) {
return ResponseEntity
.status(HttpStatus.REQUEST_TIMEOUT)
.body(new ErrorResponse(
"REQUEST_TIMEOUT",
"Request timed out after " + ex.getDuration()
));
}
}
public record ErrorResponse(
String code,
String message,
String details
) {
public ErrorResponse(String code, String message) {
this(code, message, null);
}
}
```
## Monitoring Error Rates
Track failure metrics:
```java
@Component
public class SagaErrorMetrics {
private final MeterRegistry meterRegistry;
public SagaErrorMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
public void recordSagaFailure(String sagaType) {
Counter.builder("saga.failure")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
public void recordRetry(String sagaType) {
Counter.builder("saga.retry")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
public void recordTimeout(String sagaType) {
Counter.builder("saga.timeout")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
}
```

@@ -0,0 +1,320 @@
# Testing Strategies for Sagas
## Unit Testing Saga Logic
Test saga behavior with Axon test fixtures:
```java
@Test
void shouldDispatchPaymentCommandWhenOrderCreated() {
// Arrange
String orderId = UUID.randomUUID().toString();
String paymentId = UUID.randomUUID().toString();
SagaTestFixture<OrderSaga> fixture = new SagaTestFixture<>(OrderSaga.class);
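// Act & Assert
// Note: equality matching passes only if the saga sends a command carrying this exact paymentId;
// if the saga generates its own ID, match on command type or fields instead.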
fixture
.givenNoPriorActivity()
.whenPublishingA(new OrderCreatedEvent(orderId, BigDecimal.TEN, "item-1"))
.expectDispatchedCommands(new ProcessPaymentCommand(paymentId, orderId, BigDecimal.TEN));
}
@Test
void shouldCompensateWhenPaymentFails() {
String orderId = UUID.randomUUID().toString();
String paymentId = UUID.randomUUID().toString();
SagaTestFixture<OrderSaga> fixture = new SagaTestFixture<>(OrderSaga.class);
// Given the saga was started by OrderCreatedEvent, when the payment then fails
fixture
.givenAPublished(new OrderCreatedEvent(orderId, BigDecimal.TEN, "item-1"))
.whenPublishingA(new PaymentFailedEvent(paymentId, orderId, "item-1", "Insufficient funds"))
.expectDispatchedCommands(new CancelOrderCommand(orderId))
// Assumes the saga ends once the compensation has been dispatched
.expectActiveSagas(0);
}
```
## Testing Event Publishing
Verify events are published correctly:
```java
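@SpringBootTest
class OrderServiceTest {
// @MockBean replaces the EventPublisher bean in the Spring context
@MockBean
private EventPublisher eventPublisher;
// The service comes from the context, so it receives the mocked publisher
@Autowired
private OrderService orderService;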
@Test
void shouldPublishOrderCreatedEvent() {
// Arrange
CreateOrderRequest request = new CreateOrderRequest("cust-1", BigDecimal.TEN);
// Act
String orderId = orderService.createOrder(request);
// Assert
verify(eventPublisher).publish(
argThat(event -> event instanceof OrderCreatedEvent &&
((OrderCreatedEvent) event).orderId().equals(orderId))
);
}
}
```
## Integration Testing with Testcontainers
Test the complete saga flow against real infrastructure (Kafka and PostgreSQL running in containers):
```java
@SpringBootTest
@Testcontainers
class SagaIntegrationTest {
@Container
static KafkaContainer kafka = new KafkaContainer(
DockerImageName.parse("confluentinc/cp-kafka:7.4.0")
);
@Container
static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>(
"postgres:15-alpine"
);
@DynamicPropertySource
static void overrideProperties(DynamicPropertyRegistry registry) {
registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
registry.add("spring.datasource.url", postgres::getJdbcUrl);
registry.add("spring.datasource.username", postgres::getUsername);
registry.add("spring.datasource.password", postgres::getPassword);
}
@Test
void shouldCompleteOrderSagaSuccessfully(@Autowired OrderService orderService,
@Autowired OrderRepository orderRepository) throws InterruptedException {
// Arrange
CreateOrderRequest request = new CreateOrderRequest("cust-1", BigDecimal.TEN);
// Act
String orderId = orderService.createOrder(request);
// Wait for async processing
Thread.sleep(2000);
// Assert
Order order = orderRepository.findById(orderId).orElseThrow();
assertThat(order.getStatus()).isEqualTo(OrderStatus.COMPLETED);
}
}
```
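A fixed `Thread.sleep` makes the test either slow or flaky, depending on how long the saga actually takes. A common alternative is to poll for the expected state until a deadline passes. The helper below is a minimal sketch with hypothetical names (`Poll.until`). Libraries such as Awaitility offer the same idea with a richer API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper: re-checks a condition until it holds or the timeout elapses.
class Poll {
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

In the test above, the sleep could then become an assertion that `Poll.until(...)` returns `true` once the order reaches `OrderStatus.COMPLETED`, so the test finishes as soon as the saga does.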
## Testing Idempotency
Verify that operations produce the same result on retry:
```java
@Test
void compensationShouldBeIdempotent() {
// Arrange
String paymentId = "payment-123";
Payment payment = new Payment(paymentId, "order-1", BigDecimal.TEN);
paymentRepository.save(payment);
// Act - First compensation
paymentService.cancelPayment(paymentId);
Payment firstResult = paymentRepository.findById(paymentId).orElseThrow();
// Act - Second compensation (should be idempotent)
paymentService.cancelPayment(paymentId);
Payment secondResult = paymentRepository.findById(paymentId).orElseThrow();
// Assert
assertThat(firstResult).isEqualTo(secondResult);
assertThat(secondResult.getStatus()).isEqualTo(PaymentStatus.CANCELLED);
assertThat(secondResult.getVersion()).isEqualTo(firstResult.getVersion());
}
```
## Testing Concurrent Sagas
Verify saga isolation under concurrent execution:
```java
@Test
void shouldHandleConcurrentSagaExecutions() throws InterruptedException {
// Arrange
int numThreads = 10;
ExecutorService executor = Executors.newFixedThreadPool(numThreads);
CountDownLatch latch = new CountDownLatch(numThreads);
// Act
for (int i = 0; i < numThreads; i++) {
final int index = i;
executor.submit(() -> {
try {
CreateOrderRequest request = new CreateOrderRequest(
"cust-" + index,
BigDecimal.TEN.multiply(BigDecimal.valueOf(index))
);
orderService.createOrder(request);
} finally {
latch.countDown();
}
});
}
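boolean finished = latch.await(10, TimeUnit.SECONDS);
executor.shutdown();
// Assert
assertThat(finished).isTrue(); // fail if any task did not finish in time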
long createdOrders = orderRepository.count();
assertThat(createdOrders).isEqualTo(numThreads);
}
```
## Testing Failure Scenarios
Test each failure path and compensation:
```java
@Test
void shouldCompensateWhenInventoryUnavailable() throws InterruptedException {
// Arrange
inventoryService.setAvailability("item-1", 0); // No inventory
// Act - use the ID returned by the service for the later lookups
String orderId = orderService.createOrder(
new CreateOrderRequest("cust-1", BigDecimal.TEN)
);
// Wait for saga completion
Thread.sleep(2000);
// Assert
Order order = orderRepository.findById(orderId).orElseThrow();
assertThat(order.getStatus()).isEqualTo(OrderStatus.CANCELLED);
// Verify payment was refunded
Payment payment = paymentRepository.findByOrderId(orderId).orElseThrow();
assertThat(payment.getStatus()).isEqualTo(PaymentStatus.REFUNDED);
}
@Test
void shouldHandlePaymentGatewayFailure() throws InterruptedException {
// Arrange
paymentGateway.setFailureRate(1.0); // 100% failure
// Act
String orderId = orderService.createOrder(
new CreateOrderRequest("cust-1", BigDecimal.TEN)
);
// Wait for saga completion
Thread.sleep(2000);
// Assert
Order order = orderRepository.findById(orderId).orElseThrow();
assertThat(order.getStatus()).isEqualTo(OrderStatus.CANCELLED);
}
```
## Testing State Machine
Verify state transitions:
```java
@Test
void shouldTransitionStatesProperly() {
// Arrange
String sagaId = UUID.randomUUID().toString();
SagaState sagaState = new SagaState(sagaId, SagaStatus.STARTED);
sagaStateRepository.save(sagaState);
// Act & Assert
assertThat(sagaState.getStatus()).isEqualTo(SagaStatus.STARTED);
sagaState.setStatus(SagaStatus.PROCESSING);
sagaStateRepository.save(sagaState);
assertThat(sagaStateRepository.findById(sagaId).get().getStatus())
.isEqualTo(SagaStatus.PROCESSING);
sagaState.setStatus(SagaStatus.COMPLETED);
sagaStateRepository.save(sagaState);
assertThat(sagaStateRepository.findById(sagaId).get().getStatus())
.isEqualTo(SagaStatus.COMPLETED);
}
```
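The test above sets each status directly, so it would also pass for an illegal jump such as COMPLETED back to PROCESSING. One option is to encode the allowed transitions and test against them. The sketch below uses a hypothetical `SagaPhase` enum. The `COMPENSATED` value and the transition table are assumptions, not taken from the `SagaStatus` type used elsewhere in this document.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical phases; COMPENSATED is an assumed terminal state.
enum SagaPhase { STARTED, PROCESSING, COMPLETED, COMPENSATING, COMPENSATED }

class SagaTransitions {
    private static final Map<SagaPhase, Set<SagaPhase>> ALLOWED = new EnumMap<>(SagaPhase.class);

    static {
        ALLOWED.put(SagaPhase.STARTED, EnumSet.of(SagaPhase.PROCESSING, SagaPhase.COMPENSATING));
        ALLOWED.put(SagaPhase.PROCESSING, EnumSet.of(SagaPhase.COMPLETED, SagaPhase.COMPENSATING));
        ALLOWED.put(SagaPhase.COMPENSATING, EnumSet.of(SagaPhase.COMPENSATED));
        // Terminal phases allow no further transitions
        ALLOWED.put(SagaPhase.COMPLETED, EnumSet.noneOf(SagaPhase.class));
        ALLOWED.put(SagaPhase.COMPENSATED, EnumSet.noneOf(SagaPhase.class));
    }

    static boolean canTransition(SagaPhase from, SagaPhase to) {
        return ALLOWED.get(from).contains(to);
    }

    static SagaPhase transition(SagaPhase from, SagaPhase to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal saga transition: " + from + " -> " + to);
        }
        return to;
    }
}
```

A state machine test can then assert both that legal paths succeed and that illegal ones are rejected.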
## Test Data Builders
Use builders for cleaner test code:
```java
public class OrderRequestBuilder {
private String customerId = "cust-default";
private BigDecimal totalAmount = BigDecimal.TEN;
private List<OrderItem> items = new ArrayList<>();
public OrderRequestBuilder withCustomerId(String customerId) {
this.customerId = customerId;
return this;
}
public OrderRequestBuilder withAmount(BigDecimal amount) {
this.totalAmount = amount;
return this;
}
public OrderRequestBuilder withItem(String productId, int quantity) {
items.add(new OrderItem(productId, "Product", quantity, BigDecimal.TEN));
return this;
}
public CreateOrderRequest build() {
return new CreateOrderRequest(customerId, totalAmount, items);
}
}
@Test
void shouldCreateOrderWithCustomization() {
CreateOrderRequest request = new OrderRequestBuilder()
.withCustomerId("customer-123")
.withAmount(BigDecimal.valueOf(50))
.withItem("product-1", 2)
.withItem("product-2", 1)
.build();
String orderId = orderService.createOrder(request);
assertThat(orderId).isNotNull();
}
```
## Performance Testing
Measure saga execution time:
```java
@Test
void shouldCompleteOrderSagaWithinTimeLimit() {
// Arrange
CreateOrderRequest request = new CreateOrderRequest("cust-1", BigDecimal.TEN);
long maxDurationMs = 5000; // 5 seconds
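// Act
// Note: this times only the synchronous createOrder call; asynchronous saga steps may finish later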
Instant start = Instant.now();
String orderId = orderService.createOrder(request);
Instant end = Instant.now();
// Assert
long duration = Duration.between(start, end).toMillis();
assertThat(duration).isLessThan(maxDurationMs);
}
```

@@ -0,0 +1,471 @@
# Common Pitfalls and Solutions
## Pitfall 1: Lost Messages
### Problem
Messages get lost due to broker failures, network issues, or consumer crashes before acknowledgment.
### Solution
Use persistent messages with acknowledgments:
```java
@Bean
public ProducerFactory<String, Object> producerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.ACKS_CONFIG, "all"); // All replicas must acknowledge
config.put(ProducerConfig.RETRIES_CONFIG, 3); // Retry failed sends
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // Prevent duplicates
return new DefaultKafkaProducerFactory<>(config);
}
@Bean
public ConsumerFactory<String, Object> consumerFactory() {
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false); // Manual commit
return new DefaultKafkaConsumerFactory<>(config);
}
```
### Prevention Checklist
- ✓ Configure producer to wait for all replicas (`acks=all`)
- ✓ Enable idempotence to prevent duplicate messages
- ✓ Use manual commit for consumers
- ✓ Monitor message lag and broker health
- ✓ Use transactional outbox pattern
---
## Pitfall 2: Duplicate Processing
### Problem
The same message is processed more than once because of failed acknowledgments or retries, so its side effects are applied repeatedly.
### Solution
Implement idempotency with deduplication:
```java
@Service
public class DeduplicationService {
private final DeduplicationRepository repository;
public boolean isDuplicate(String messageId) {
return repository.existsById(messageId);
}
public void recordProcessed(String messageId) {
DeduplicatedMessage entry = new DeduplicatedMessage(
messageId,
Instant.now()
);
repository.save(entry);
}
}
@Component
public class PaymentEventListener {
private final DeduplicationService deduplicationService;
private final PaymentService paymentService;
@Bean
public Consumer<PaymentEvent> handlePaymentEvent() {
return event -> {
String messageId = event.getMessageId();
if (deduplicationService.isDuplicate(messageId)) {
logger.info("Duplicate message ignored: {}", messageId);
return;
}
paymentService.processPayment(event);
deduplicationService.recordProcessed(messageId);
};
}
}
```
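Checking `isDuplicate` and calling `recordProcessed` as two separate steps leaves a gap in which two consumers can both pass the check. A common remedy is to claim the message ID atomically before processing. The sketch below shows the idea in memory with a hypothetical `OnceOnlyHandler`. In a real deployment the claim would more likely be an insert guarded by a unique constraint in a shared database.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical in-memory sketch: claim the message ID first, then process.
class OnceOnlyHandler<E> {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    /** Returns true if this call processed the event, false if it was a duplicate. */
    boolean handle(String messageId, E event, Consumer<E> processor) {
        // Set.add is atomic: only one caller per messageId gets true
        if (!claimed.add(messageId)) {
            return false;
        }
        try {
            processor.accept(event);
            return true;
        } catch (RuntimeException e) {
            // Release the claim so a redelivery can retry the message
            claimed.remove(messageId);
            throw e;
        }
    }
}
```

Releasing the claim on failure means any side effect that completed before the exception may run again, so the processor itself still needs to be idempotent.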
### Prevention Checklist
- ✓ Add unique message ID to all events
- ✓ Implement deduplication cache/database
- ✓ Make all operations idempotent
- ✓ Use optimistic locking (version fields) for entity updates
- ✓ Test with message replay
---
## Pitfall 3: Saga State Inconsistency
### Problem
The saga state in the database doesn't match the actual service states, leading to orphaned or stuck sagas.
### Solution
Use event sourcing or state reconciliation:
```java
@Service
public class SagaStateReconciler {
private final SagaStateRepository stateRepository;
private final OrderRepository orderRepository;
private final PaymentRepository paymentRepository;
@Scheduled(fixedDelay = 60000) // Run every minute
public void reconcileSagaStates() {
List<SagaState> processingSagas = stateRepository
.findByStatus(SagaStatus.PROCESSING);
processingSagas.forEach(saga -> {
if (isActuallyCompleted(saga)) {
logger.info("Reconciling saga {} - marking as completed", saga.getSagaId());
saga.setStatus(SagaStatus.COMPLETED);
saga.setCompletedAt(Instant.now());
stateRepository.save(saga);
}
});
}
private boolean isActuallyCompleted(SagaState saga) {
String orderId = saga.getSagaId();
Order order = orderRepository.findById(orderId).orElse(null);
if (order == null || order.getStatus() != OrderStatus.COMPLETED) {
return false;
}
Payment payment = paymentRepository.findByOrderId(orderId).orElse(null);
if (payment == null || payment.getStatus() != PaymentStatus.PROCESSED) {
return false;
}
return true;
}
}
```
### Prevention Checklist
- ✓ Use event sourcing for complete audit trail
- ✓ Implement state reconciliation job
- ✓ Add health checks for saga coordinator
- ✓ Monitor saga state transitions
- ✓ Persist compensation steps
---
## Pitfall 4: Orchestrator Single Point of Failure
### Problem
An orchestration-based saga stalls when the orchestrator is down, blocking every saga it coordinates.
### Solution
Implement clustering and failover:
```java
@Configuration
public class SagaOrchestratorClusterConfig {
@Bean
public SagaStateRepository sagaStateRepository() {
// Use shared database for cluster-wide state
return new DatabaseSagaStateRepository();
}
@Bean
@Primary
public CommandGateway clusterAwareCommandGateway(
CommandBus commandBus) {
return new ClusterAwareCommandGateway(commandBus);
}
}
@Component
public class OrchestratorHealthCheck extends AbstractHealthIndicator {
private final SagaStateRepository repository;
@Override
protected void doHealthCheck(Health.Builder builder) {
long stuckSagas = repository.countStuckSagas(Duration.ofMinutes(30));
if (stuckSagas > 100) {
builder.down()
.withDetail("stuckSagas", stuckSagas)
.withDetail("severity", "critical");
} else if (stuckSagas > 10) {
// Health.Builder has no degraded() method; a custom status code is set via status(...)
builder.status("DEGRADED")
.withDetail("stuckSagas", stuckSagas)
.withDetail("severity", "warning");
} else {
builder.up()
.withDetail("stuckSagas", stuckSagas);
}
}
}
```
### Prevention Checklist
- ✓ Deploy orchestrator in cluster with shared state
- ✓ Use distributed coordination (ZooKeeper, Consul)
- ✓ Implement heartbeat monitoring
- ✓ Set up automatic failover
- ✓ Use circuit breakers for service calls
---
## Pitfall 5: Non-Idempotent Compensations
### Problem
Compensation logic fails on retry because it's not idempotent, leaving the system in an inconsistent state.
### Solution
Design all compensations to be idempotent:
```java
// Bad - Not idempotent
@Service
public class BadPaymentService {
public void refundPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId).orElseThrow();
payment.setStatus(PaymentStatus.REFUNDED);
paymentRepository.save(payment);
// If this fails partway, retry causes problems
externalPaymentGateway.refund(payment.getTransactionId());
}
}
// Good - Idempotent
@Service
public class GoodPaymentService {
public void refundPayment(String paymentId) {
Payment payment = paymentRepository.findById(paymentId)
.orElse(null);
if (payment == null) {
// Already deleted or doesn't exist
logger.info("Payment {} not found, skipping refund", paymentId);
return;
}
if (payment.getStatus() == PaymentStatus.REFUNDED) {
// Already refunded
logger.info("Payment {} already refunded", paymentId);
return;
}
try {
externalPaymentGateway.refund(payment.getTransactionId());
payment.setStatus(PaymentStatus.REFUNDED);
paymentRepository.save(payment);
} catch (Exception e) {
logger.error("Refund failed, will retry", e);
throw e;
}
}
}
```
### Prevention Checklist
- ✓ Check current state before making changes
- ✓ Use status flags to track compensation completion
- ✓ Make database updates idempotent
- ✓ Test compensation with replays
- ✓ Document compensation logic
---
## Pitfall 6: Missing Timeouts
### Problem
Sagas hang indefinitely waiting for events that never arrive due to service failures.
### Solution
Implement timeout mechanisms:
```java
@Configuration
public class SagaTimeoutConfig {
@Bean
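// Illustrative callback only: Axon's SagaLifecycle exposes static helpers such as
// end() and associateWith(...); it does not define an onSagaFinished hook.
public SagaLifecycle sagaLifecycle(SagaStateRepository repository) {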
return new SagaLifecycle() {
@Override
public void onSagaFinished(Saga saga) {
// Update saga state
}
};
}
}
@Saga
public class OrderSaga {
@Autowired
private transient CommandGateway commandGateway;
// Resources injected into a saga must be transient so they are not serialized with its state
@Autowired
private transient DeadlineManager deadlineManager;
private String orderId;
private String paymentId;
private String paymentDeadlineId;
@StartSaga
@SagaEventHandler(associationProperty = "orderId")
public void handle(OrderCreatedEvent event) {
this.orderId = event.orderId();
// Schedule timeout for payment processing and keep the schedule ID for cancellation
this.paymentDeadlineId = deadlineManager.schedule(
Duration.ofSeconds(30),
"PaymentTimeout",
orderId
);
commandGateway.send(new ProcessPaymentCommand(...));
}
@DeadlineHandler(deadlineName = "PaymentTimeout")
public void handlePaymentTimeout() {
logger.warn("Payment processing timed out for order {}", orderId);
// Compensate
commandGateway.send(new CancelOrderCommand(orderId));
SagaLifecycle.end();
}
@SagaEventHandler(associationProperty = "orderId")
public void handle(PaymentProcessedEvent event) {
// Cancel timeout
deadlineManager.cancelSchedule("PaymentTimeout", paymentDeadlineId);
// Continue saga...
}
}
```
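The schedule-then-cancel mechanics can also be illustrated without a framework. The sketch below uses only the JDK's `ScheduledExecutorService`. `StepDeadlines` is a hypothetical class and is not Axon's `DeadlineManager`. In particular, its deadlines live in memory and do not survive a restart.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory deadline tracker, one pending deadline per saga ID.
class StepDeadlines implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "saga-deadlines");
                thread.setDaemon(true); // do not keep the JVM alive for pending deadlines
                return thread;
            });
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();

    /** Schedules onTimeout to run after the timeout, replacing any earlier deadline for the saga. */
    void schedule(String sagaId, Duration timeout, Runnable onTimeout) {
        ScheduledFuture<?> future =
                scheduler.schedule(onTimeout, timeout.toMillis(), TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = pending.put(sagaId, future);
        if (previous != null) {
            previous.cancel(false);
        }
    }

    /** Returns true if a deadline was still pending and has now been cancelled. */
    boolean cancel(String sagaId) {
        ScheduledFuture<?> future = pending.remove(sagaId);
        return future != null && future.cancel(false);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The step handler calls `cancel` when the expected event arrives, and the timeout callback triggers compensation otherwise. Entries for deadlines that have already fired stay in the map until they are cancelled or replaced, which a production version would need to clean up.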
### Prevention Checklist
- ✓ Set timeout for each saga step
- ✓ Use deadline manager to track timeouts
- ✓ Cancel timeouts when step completes
- ✓ Log timeout events
- ✓ Alert operations on repeated timeouts
---
## Pitfall 7: Tight Coupling Between Services
### Problem
Saga logic couples services tightly, making independent deployment impossible.
### Solution
Use event-driven communication:
```java
// Bad - Tight coupling
@Service
public class TightlyCoupledOrderService {
public void createOrder(OrderRequest request) {
Order order = orderRepository.save(new Order(...));
// Direct coupling to payment service
paymentService.processPayment(order.getId(), request.getAmount());
}
}
// Good - Event-driven
@Service
public class LooselyCoupledOrderService {
public void createOrder(OrderRequest request) {
Order order = orderRepository.save(new Order(...));
// Publish event - services listen independently
eventPublisher.publish(new OrderCreatedEvent(
order.getId(),
request.getAmount()
));
}
}
@Component
public class PaymentServiceListener {
@Bean
public Consumer<OrderCreatedEvent> handleOrderCreated() {
return event -> {
// Payment service can be deployed independently
paymentService.processPayment(
event.orderId(),
event.amount()
);
};
}
}
```
### Prevention Checklist
- ✓ Use events for inter-service communication
- ✓ Avoid direct service-to-service calls
- ✓ Define clear contracts for events
- ✓ Version events for backward compatibility
- ✓ Deploy services independently
---
## Pitfall 8: Inadequate Monitoring
### Problem
Sagas fail silently or get stuck without visibility, making troubleshooting impossible.
### Solution
Implement comprehensive monitoring:
```java
@Component
public class SagaMonitoring {
private final MeterRegistry meterRegistry;
@Bean
public MeterBinder sagaMetrics(SagaStateRepository repository) {
return (registry) -> {
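// Gauge.builder expects a no-argument supplier; countByStatus(SagaStatus) is assumed here
Gauge.builder("saga.active", () -> repository.countByStatus(SagaStatus.PROCESSING))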
.description("Number of active sagas")
.register(registry);
Gauge.builder("saga.stuck", () ->
repository.countStuckSagas(Duration.ofMinutes(30)))
.description("Number of stuck sagas")
.register(registry);
};
}
public void recordSagaStart(String sagaType) {
Counter.builder("saga.started")
.tag("type", sagaType)
.register(meterRegistry)
.increment();
}
public void recordSagaCompletion(String sagaType, long durationMs) {
Timer.builder("saga.duration")
.tag("type", sagaType)
.publishPercentiles(0.5, 0.95, 0.99)
.register(meterRegistry)
.record(Duration.ofMillis(durationMs));
}
public void recordSagaFailure(String sagaType, String reason) {
Counter.builder("saga.failed")
.tag("type", sagaType)
.tag("reason", reason)
.register(meterRegistry)
.increment();
}
}
```
### Prevention Checklist
- ✓ Track saga state transitions
- ✓ Monitor step execution times
- ✓ Alert on stuck sagas
- ✓ Log all failures with details
- ✓ Use distributed tracing (Sleuth, Zipkin)
- ✓ Create dashboards for visibility
