Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:33:10 +08:00
commit 3fb1c70e0e
21 changed files with 6250 additions and 0 deletions

View File

@@ -0,0 +1,527 @@
---
name: api-design-principles
description: Master REST and GraphQL API design principles to build intuitive, scalable, and maintainable APIs that delight developers. Use when designing new APIs, reviewing API specifications, or establishing API design standards.
---
# API Design Principles
Master REST and GraphQL API design principles to build intuitive, scalable, and maintainable APIs that delight developers and stand the test of time.
## When to Use This Skill
- Designing new REST or GraphQL APIs
- Refactoring existing APIs for better usability
- Establishing API design standards for your team
- Reviewing API specifications before implementation
- Migrating between API paradigms (REST to GraphQL, etc.)
- Creating developer-friendly API documentation
- Optimizing APIs for specific use cases (mobile, third-party integrations)
## Core Concepts
### 1. RESTful Design Principles
**Resource-Oriented Architecture**
- Resources are nouns (users, orders, products), not verbs
- Use HTTP methods for actions (GET, POST, PUT, PATCH, DELETE)
- URLs represent resource hierarchies
- Consistent naming conventions
**HTTP Methods Semantics:**
- `GET`: Retrieve resources (idempotent, safe)
- `POST`: Create new resources
- `PUT`: Replace entire resource (idempotent)
- `PATCH`: Partial resource updates
- `DELETE`: Remove resources (idempotent)
### 2. GraphQL Design Principles
**Schema-First Development**
- Types define your domain model
- Queries for reading data
- Mutations for modifying data
- Subscriptions for real-time updates
**Query Structure:**
- Clients request exactly what they need
- Single endpoint, multiple operations
- Strongly typed schema
- Introspection built-in
### 3. API Versioning Strategies
**URL Versioning:**
```
/api/v1/users
/api/v2/users
```
**Header Versioning:**
```
Accept: application/vnd.api+json; version=1
```
**Query Parameter Versioning:**
```
/api/users?version=1
```
## REST API Design Patterns
### Pattern 1: Resource Collection Design
```python
# Good: Resource-oriented endpoints
GET /api/users # List users (with pagination)
POST /api/users # Create user
GET /api/users/{id} # Get specific user
PUT /api/users/{id} # Replace user
PATCH /api/users/{id} # Update user fields
DELETE /api/users/{id} # Delete user
# Nested resources
GET /api/users/{id}/orders # Get user's orders
POST /api/users/{id}/orders # Create order for user
# Bad: Action-oriented endpoints (avoid)
POST /api/createUser
POST /api/getUserById
POST /api/deleteUser
```
### Pattern 2: Pagination and Filtering
```python
from typing import List, Optional
from pydantic import BaseModel, Field
class PaginationParams(BaseModel):
page: int = Field(1, ge=1, description="Page number")
page_size: int = Field(20, ge=1, le=100, description="Items per page")
class FilterParams(BaseModel):
status: Optional[str] = None
created_after: Optional[str] = None
search: Optional[str] = None
class PaginatedResponse(BaseModel):
items: List[dict]
total: int
page: int
page_size: int
pages: int
@property
def has_next(self) -> bool:
return self.page < self.pages
@property
def has_prev(self) -> bool:
return self.page > 1
# FastAPI endpoint example
from fastapi import FastAPI, Query, Depends
app = FastAPI()
@app.get("/api/users", response_model=PaginatedResponse)
async def list_users(
page: int = Query(1, ge=1),
page_size: int = Query(20, ge=1, le=100),
status: Optional[str] = Query(None),
search: Optional[str] = Query(None)
):
# Apply filters
query = build_query(status=status, search=search)
# Count total
total = await count_users(query)
# Fetch page
offset = (page - 1) * page_size
users = await fetch_users(query, limit=page_size, offset=offset)
return PaginatedResponse(
items=users,
total=total,
page=page,
page_size=page_size,
pages=(total + page_size - 1) // page_size
)
```
### Pattern 3: Error Handling and Status Codes
```python
from fastapi import HTTPException, status
from pydantic import BaseModel
class ErrorResponse(BaseModel):
error: str
message: str
details: Optional[dict] = None
timestamp: str
path: str
class ValidationErrorDetail(BaseModel):
field: str
message: str
value: Any
# Consistent error responses
STATUS_CODES = {
"success": 200,
"created": 201,
"no_content": 204,
"bad_request": 400,
"unauthorized": 401,
"forbidden": 403,
"not_found": 404,
"conflict": 409,
"unprocessable": 422,
"internal_error": 500
}
def raise_not_found(resource: str, id: str):
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail={
"error": "NotFound",
"message": f"{resource} not found",
"details": {"id": id}
}
)
def raise_validation_error(errors: List[ValidationErrorDetail]):
raise HTTPException(
status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
detail={
"error": "ValidationError",
"message": "Request validation failed",
"details": {"errors": [e.dict() for e in errors]}
}
)
# Example usage
@app.get("/api/users/{user_id}")
async def get_user(user_id: str):
user = await fetch_user(user_id)
if not user:
raise_not_found("User", user_id)
return user
```
### Pattern 4: HATEOAS (Hypermedia as the Engine of Application State)
```python
class UserResponse(BaseModel):
id: str
name: str
email: str
_links: dict
@classmethod
def from_user(cls, user: User, base_url: str):
return cls(
id=user.id,
name=user.name,
email=user.email,
_links={
"self": {"href": f"{base_url}/api/users/{user.id}"},
"orders": {"href": f"{base_url}/api/users/{user.id}/orders"},
"update": {
"href": f"{base_url}/api/users/{user.id}",
"method": "PATCH"
},
"delete": {
"href": f"{base_url}/api/users/{user.id}",
"method": "DELETE"
}
}
)
```
## GraphQL Design Patterns
### Pattern 1: Schema Design
```graphql
# schema.graphql
# Clear type definitions
type User {
id: ID!
email: String!
name: String!
createdAt: DateTime!
# Relationships
orders(
first: Int = 20
after: String
status: OrderStatus
): OrderConnection!
profile: UserProfile
}
type Order {
id: ID!
status: OrderStatus!
total: Money!
items: [OrderItem!]!
createdAt: DateTime!
# Back-reference
user: User!
}
# Pagination pattern (Relay-style)
type OrderConnection {
edges: [OrderEdge!]!
pageInfo: PageInfo!
totalCount: Int!
}
type OrderEdge {
node: Order!
cursor: String!
}
type PageInfo {
hasNextPage: Boolean!
hasPreviousPage: Boolean!
startCursor: String
endCursor: String
}
# Enums for type safety
enum OrderStatus {
PENDING
CONFIRMED
SHIPPED
DELIVERED
CANCELLED
}
# Custom scalars
scalar DateTime
scalar Money
# Query root
type Query {
user(id: ID!): User
users(
first: Int = 20
after: String
search: String
): UserConnection!
order(id: ID!): Order
}
# Mutation root
type Mutation {
createUser(input: CreateUserInput!): CreateUserPayload!
updateUser(input: UpdateUserInput!): UpdateUserPayload!
deleteUser(id: ID!): DeleteUserPayload!
createOrder(input: CreateOrderInput!): CreateOrderPayload!
}
# Input types for mutations
input CreateUserInput {
email: String!
name: String!
password: String!
}
# Payload types for mutations
type CreateUserPayload {
user: User
errors: [Error!]
}
type Error {
field: String
message: String!
}
```
### Pattern 2: Resolver Design
```python
from typing import Optional, List
from ariadne import QueryType, MutationType, ObjectType
from dataclasses import dataclass
query = QueryType()
mutation = MutationType()
user_type = ObjectType("User")
@query.field("user")
async def resolve_user(obj, info, id: str) -> Optional[dict]:
"""Resolve single user by ID."""
return await fetch_user_by_id(id)
@query.field("users")
async def resolve_users(
obj,
info,
first: int = 20,
after: Optional[str] = None,
search: Optional[str] = None
) -> dict:
"""Resolve paginated user list."""
# Decode cursor
offset = decode_cursor(after) if after else 0
# Fetch users
users = await fetch_users(
limit=first + 1, # Fetch one extra to check hasNextPage
offset=offset,
search=search
)
# Pagination
has_next = len(users) > first
if has_next:
users = users[:first]
edges = [
{
"node": user,
"cursor": encode_cursor(offset + i)
}
for i, user in enumerate(users)
]
return {
"edges": edges,
"pageInfo": {
"hasNextPage": has_next,
"hasPreviousPage": offset > 0,
"startCursor": edges[0]["cursor"] if edges else None,
"endCursor": edges[-1]["cursor"] if edges else None
},
"totalCount": await count_users(search=search)
}
@user_type.field("orders")
async def resolve_user_orders(user: dict, info, first: int = 20) -> dict:
"""Resolve user's orders (N+1 prevention with DataLoader)."""
# Use DataLoader to batch requests
loader = info.context["loaders"]["orders_by_user"]
orders = await loader.load(user["id"])
return paginate_orders(orders, first)
@mutation.field("createUser")
async def resolve_create_user(obj, info, input: dict) -> dict:
"""Create new user."""
try:
# Validate input
validate_user_input(input)
# Create user
user = await create_user(
email=input["email"],
name=input["name"],
password=hash_password(input["password"])
)
return {
"user": user,
"errors": []
}
except ValidationError as e:
return {
"user": None,
"errors": [{"field": e.field, "message": e.message}]
}
```
### Pattern 3: DataLoader (N+1 Problem Prevention)
```python
from aiodataloader import DataLoader
from typing import List, Optional
class UserLoader(DataLoader):
"""Batch load users by ID."""
async def batch_load_fn(self, user_ids: List[str]) -> List[Optional[dict]]:
"""Load multiple users in single query."""
users = await fetch_users_by_ids(user_ids)
# Map results back to input order
user_map = {user["id"]: user for user in users}
return [user_map.get(user_id) for user_id in user_ids]
class OrdersByUserLoader(DataLoader):
"""Batch load orders by user ID."""
async def batch_load_fn(self, user_ids: List[str]) -> List[List[dict]]:
"""Load orders for multiple users in single query."""
orders = await fetch_orders_by_user_ids(user_ids)
# Group orders by user_id
orders_by_user = {}
for order in orders:
user_id = order["user_id"]
if user_id not in orders_by_user:
orders_by_user[user_id] = []
orders_by_user[user_id].append(order)
# Return in input order
return [orders_by_user.get(user_id, []) for user_id in user_ids]
# Context setup
def create_context():
return {
"loaders": {
"user": UserLoader(),
"orders_by_user": OrdersByUserLoader()
}
}
```
## Best Practices
### REST APIs
1. **Consistent Naming**: Use plural nouns for collections (`/users`, not `/user`)
2. **Stateless**: Each request contains all necessary information
3. **Use HTTP Status Codes Correctly**: 2xx success, 4xx client errors, 5xx server errors
4. **Version Your API**: Plan for breaking changes from day one
5. **Pagination**: Always paginate large collections
6. **Rate Limiting**: Protect your API with rate limits
7. **Documentation**: Use OpenAPI/Swagger for interactive docs
### GraphQL APIs
1. **Schema First**: Design schema before writing resolvers
2. **Avoid N+1**: Use DataLoaders for efficient data fetching
3. **Input Validation**: Validate at schema and resolver levels
4. **Error Handling**: Return structured errors in mutation payloads
5. **Pagination**: Use cursor-based pagination (Relay spec)
6. **Deprecation**: Use `@deprecated` directive for gradual migration
7. **Monitoring**: Track query complexity and execution time
## Common Pitfalls
- **Over-fetching/Under-fetching (REST)**: Fixed in GraphQL but requires DataLoaders
- **Breaking Changes**: Version APIs or use deprecation strategies
- **Inconsistent Error Formats**: Standardize error responses
- **Missing Rate Limits**: APIs without limits are vulnerable to abuse
- **Poor Documentation**: Undocumented APIs frustrate developers
- **Ignoring HTTP Semantics**: POST for idempotent operations breaks expectations
- **Tight Coupling**: API structure shouldn't mirror database schema
## Resources
- **references/rest-best-practices.md**: Comprehensive REST API design guide
- **references/graphql-schema-design.md**: GraphQL schema patterns and anti-patterns
- **references/api-versioning-strategies.md**: Versioning approaches and migration paths
- **assets/rest-api-template.py**: FastAPI REST API template
- **assets/graphql-schema-template.graphql**: Complete GraphQL schema example
- **assets/api-design-checklist.md**: Pre-implementation review checklist
- **scripts/openapi-generator.py**: Generate OpenAPI specs from code

View File

@@ -0,0 +1,136 @@
# API Design Checklist
## Pre-Implementation Review
### Resource Design
- [ ] Resources are nouns, not verbs
- [ ] Plural names for collections
- [ ] Consistent naming across all endpoints
- [ ] Clear resource hierarchy (avoid deep nesting >2 levels)
- [ ] All CRUD operations properly mapped to HTTP methods
### HTTP Methods
- [ ] GET for retrieval (safe, idempotent)
- [ ] POST for creation
- [ ] PUT for full replacement (idempotent)
- [ ] PATCH for partial updates
- [ ] DELETE for removal (idempotent)
### Status Codes
- [ ] 200 OK for successful GET/PATCH/PUT
- [ ] 201 Created for POST
- [ ] 204 No Content for DELETE
- [ ] 400 Bad Request for malformed requests
- [ ] 401 Unauthorized for missing auth
- [ ] 403 Forbidden for insufficient permissions
- [ ] 404 Not Found for missing resources
- [ ] 422 Unprocessable Entity for validation errors
- [ ] 429 Too Many Requests for rate limiting
- [ ] 500 Internal Server Error for server issues
### Pagination
- [ ] All collection endpoints paginated
- [ ] Default page size defined (e.g., 20)
- [ ] Maximum page size enforced (e.g., 100)
- [ ] Pagination metadata included (total, pages, etc.)
- [ ] Cursor-based or offset-based pattern chosen
### Filtering & Sorting
- [ ] Query parameters for filtering
- [ ] Sort parameter supported
- [ ] Search parameter for full-text search
- [ ] Field selection supported (sparse fieldsets)
### Versioning
- [ ] Versioning strategy defined (URL/header/query)
- [ ] Version included in all endpoints
- [ ] Deprecation policy documented
### Error Handling
- [ ] Consistent error response format
- [ ] Detailed error messages
- [ ] Field-level validation errors
- [ ] Error codes for client handling
- [ ] Timestamps in error responses
### Authentication & Authorization
- [ ] Authentication method defined (Bearer token, API key)
- [ ] Authorization checks on all endpoints
- [ ] 401 vs 403 used correctly
- [ ] Token expiration handled
### Rate Limiting
- [ ] Rate limits defined per endpoint/user
- [ ] Rate limit headers included
- [ ] 429 status code for exceeded limits
- [ ] Retry-After header provided
### Documentation
- [ ] OpenAPI/Swagger spec generated
- [ ] All endpoints documented
- [ ] Request/response examples provided
- [ ] Error responses documented
- [ ] Authentication flow documented
### Testing
- [ ] Unit tests for business logic
- [ ] Integration tests for endpoints
- [ ] Error scenarios tested
- [ ] Edge cases covered
- [ ] Performance tests for heavy endpoints
### Security
- [ ] Input validation on all fields
- [ ] SQL injection prevention
- [ ] XSS prevention
- [ ] CORS configured correctly
- [ ] HTTPS enforced
- [ ] Sensitive data not in URLs
- [ ] No secrets in responses
### Performance
- [ ] Database queries optimized
- [ ] N+1 queries prevented
- [ ] Caching strategy defined
- [ ] Cache headers set appropriately
- [ ] Large responses paginated
### Monitoring
- [ ] Logging implemented
- [ ] Error tracking configured
- [ ] Performance metrics collected
- [ ] Health check endpoint available
- [ ] Alerts configured for errors
## GraphQL-Specific Checks
### Schema Design
- [ ] Schema-first approach used
- [ ] Types properly defined
- [ ] Non-null vs nullable decided
- [ ] Interfaces/unions used appropriately
- [ ] Custom scalars defined
### Queries
- [ ] Query depth limiting
- [ ] Query complexity analysis
- [ ] DataLoaders prevent N+1
- [ ] Pagination pattern chosen (Relay/offset)
### Mutations
- [ ] Input types defined
- [ ] Payload types with errors
- [ ] Optimistic response support
- [ ] Idempotency considered
### Performance
- [ ] DataLoader for all relationships
- [ ] Query batching enabled
- [ ] Persisted queries considered
- [ ] Response caching implemented
### Documentation
- [ ] All fields documented
- [ ] Deprecations marked
- [ ] Examples provided
- [ ] Schema introspection enabled

View File

@@ -0,0 +1,165 @@
"""
Production-ready REST API template using FastAPI.
Includes pagination, filtering, error handling, and best practices.
"""
from fastapi import FastAPI, HTTPException, Query, Path, Depends, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, Field, EmailStr
from typing import Optional, List, Any
from datetime import datetime
from enum import Enum
app = FastAPI(
title="API Template",
version="1.0.0",
docs_url="/api/docs"
)
# Models
class UserStatus(str, Enum):
ACTIVE = "active"
INACTIVE = "inactive"
SUSPENDED = "suspended"
class UserBase(BaseModel):
email: EmailStr
name: str = Field(..., min_length=1, max_length=100)
status: UserStatus = UserStatus.ACTIVE
class UserCreate(UserBase):
password: str = Field(..., min_length=8)
class UserUpdate(BaseModel):
email: Optional[EmailStr] = None
name: Optional[str] = Field(None, min_length=1, max_length=100)
status: Optional[UserStatus] = None
class User(UserBase):
id: str
created_at: datetime
updated_at: datetime
class Config:
from_attributes = True
# Pagination
class PaginationParams(BaseModel):
page: int = Field(1, ge=1)
page_size: int = Field(20, ge=1, le=100)
class PaginatedResponse(BaseModel):
items: List[Any]
total: int
page: int
page_size: int
pages: int
# Error handling
class ErrorDetail(BaseModel):
field: Optional[str] = None
message: str
code: str
class ErrorResponse(BaseModel):
error: str
message: str
details: Optional[List[ErrorDetail]] = None
@app.exception_handler(HTTPException)
async def http_exception_handler(request, exc):
return JSONResponse(
status_code=exc.status_code,
content=ErrorResponse(
error=exc.__class__.__name__,
message=exc.detail if isinstance(exc.detail, str) else exc.detail.get("message", "Error"),
details=exc.detail.get("details") if isinstance(exc.detail, dict) else None
).dict()
)
# Endpoints
@app.get("/api/users", response_model=PaginatedResponse, tags=["Users"])
async def list_users(
page: int = Query(1, ge=1),
page_size: int = Query(20, ge=1, le=100),
status: Optional[UserStatus] = Query(None),
search: Optional[str] = Query(None)
):
"""List users with pagination and filtering."""
# Mock implementation
total = 100
items = [
User(
id=str(i),
email=f"user{i}@example.com",
name=f"User {i}",
status=UserStatus.ACTIVE,
created_at=datetime.now(),
updated_at=datetime.now()
).dict()
for i in range((page-1)*page_size, min(page*page_size, total))
]
return PaginatedResponse(
items=items,
total=total,
page=page,
page_size=page_size,
pages=(total + page_size - 1) // page_size
)
@app.post("/api/users", response_model=User, status_code=status.HTTP_201_CREATED, tags=["Users"])
async def create_user(user: UserCreate):
"""Create a new user."""
# Mock implementation
return User(
id="123",
email=user.email,
name=user.name,
status=user.status,
created_at=datetime.now(),
updated_at=datetime.now()
)
@app.get("/api/users/{user_id}", response_model=User, tags=["Users"])
async def get_user(user_id: str = Path(..., description="User ID")):
"""Get user by ID."""
# Mock: Check if exists
if user_id == "999":
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail={"message": "User not found", "details": {"id": user_id}}
)
return User(
id=user_id,
email="user@example.com",
name="User Name",
status=UserStatus.ACTIVE,
created_at=datetime.now(),
updated_at=datetime.now()
)
@app.patch("/api/users/{user_id}", response_model=User, tags=["Users"])
async def update_user(user_id: str, update: UserUpdate):
"""Partially update user."""
# Validate user exists
existing = await get_user(user_id)
# Apply updates
update_data = update.dict(exclude_unset=True)
for field, value in update_data.items():
setattr(existing, field, value)
existing.updated_at = datetime.now()
return existing
@app.delete("/api/users/{user_id}", status_code=status.HTTP_204_NO_CONTENT, tags=["Users"])
async def delete_user(user_id: str):
"""Delete user."""
await get_user(user_id) # Verify exists
return None
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)

View File

@@ -0,0 +1,566 @@
# GraphQL Schema Design Patterns
## Schema Organization
### Modular Schema Structure
```graphql
# user.graphql
type User {
id: ID!
email: String!
name: String!
posts: [Post!]!
}
extend type Query {
user(id: ID!): User
users(first: Int, after: String): UserConnection!
}
extend type Mutation {
createUser(input: CreateUserInput!): CreateUserPayload!
}
# post.graphql
type Post {
id: ID!
title: String!
content: String!
author: User!
}
extend type Query {
post(id: ID!): Post
}
```
## Type Design Patterns
### 1. Non-Null Types
```graphql
type User {
id: ID! # Always required
email: String! # Required
phone: String # Optional (nullable)
posts: [Post!]! # Non-null array of non-null posts
tags: [String!] # Nullable array of non-null strings
}
```
### 2. Interfaces for Polymorphism
```graphql
interface Node {
id: ID!
createdAt: DateTime!
}
type User implements Node {
id: ID!
createdAt: DateTime!
email: String!
}
type Post implements Node {
id: ID!
createdAt: DateTime!
title: String!
}
type Query {
node(id: ID!): Node
}
```
### 3. Unions for Heterogeneous Results
```graphql
union SearchResult = User | Post | Comment
type Query {
search(query: String!): [SearchResult!]!
}
# Query example
{
search(query: "graphql") {
... on User {
name
email
}
... on Post {
title
content
}
... on Comment {
text
author { name }
}
}
}
```
### 4. Input Types
```graphql
input CreateUserInput {
email: String!
name: String!
password: String!
profileInput: ProfileInput
}
input ProfileInput {
bio: String
avatar: String
website: String
}
input UpdateUserInput {
id: ID!
email: String
name: String
profileInput: ProfileInput
}
```
## Pagination Patterns
### Relay Cursor Pagination (Recommended)
```graphql
type UserConnection {
edges: [UserEdge!]!
pageInfo: PageInfo!
totalCount: Int!
}
type UserEdge {
node: User!
cursor: String!
}
type PageInfo {
hasNextPage: Boolean!
hasPreviousPage: Boolean!
startCursor: String
endCursor: String
}
type Query {
users(
first: Int
after: String
last: Int
before: String
): UserConnection!
}
# Usage
{
users(first: 10, after: "cursor123") {
edges {
cursor
node {
id
name
}
}
pageInfo {
hasNextPage
endCursor
}
}
}
```
### Offset Pagination (Simpler)
```graphql
type UserList {
items: [User!]!
total: Int!
page: Int!
pageSize: Int!
}
type Query {
users(page: Int = 1, pageSize: Int = 20): UserList!
}
```
## Mutation Design Patterns
### 1. Input/Payload Pattern
```graphql
input CreatePostInput {
title: String!
content: String!
tags: [String!]
}
type CreatePostPayload {
post: Post
errors: [Error!]
success: Boolean!
}
type Error {
field: String
message: String!
code: String!
}
type Mutation {
createPost(input: CreatePostInput!): CreatePostPayload!
}
```
### 2. Optimistic Response Support
```graphql
type UpdateUserPayload {
user: User
clientMutationId: String
errors: [Error!]
}
input UpdateUserInput {
id: ID!
name: String
clientMutationId: String
}
type Mutation {
updateUser(input: UpdateUserInput!): UpdateUserPayload!
}
```
### 3. Batch Mutations
```graphql
input BatchCreateUserInput {
users: [CreateUserInput!]!
}
type BatchCreateUserPayload {
results: [CreateUserResult!]!
successCount: Int!
errorCount: Int!
}
type CreateUserResult {
user: User
errors: [Error!]
index: Int!
}
type Mutation {
batchCreateUsers(input: BatchCreateUserInput!): BatchCreateUserPayload!
}
```
## Field Design
### Arguments and Filtering
```graphql
type Query {
posts(
# Pagination
first: Int = 20
after: String
# Filtering
status: PostStatus
authorId: ID
tag: String
# Sorting
orderBy: PostOrderBy = CREATED_AT
orderDirection: OrderDirection = DESC
# Searching
search: String
): PostConnection!
}
enum PostStatus {
DRAFT
PUBLISHED
ARCHIVED
}
enum PostOrderBy {
CREATED_AT
UPDATED_AT
TITLE
}
enum OrderDirection {
ASC
DESC
}
```
### Computed Fields
```graphql
type User {
firstName: String!
lastName: String!
fullName: String! # Computed in resolver
posts: [Post!]!
postCount: Int! # Computed, doesn't load all posts
}
type Post {
likeCount: Int!
commentCount: Int!
isLikedByViewer: Boolean! # Context-dependent
}
```
## Subscriptions
```graphql
type Subscription {
postAdded: Post!
postUpdated(postId: ID!): Post!
userStatusChanged(userId: ID!): UserStatus!
}
type UserStatus {
userId: ID!
online: Boolean!
lastSeen: DateTime!
}
# Client usage
subscription {
postAdded {
id
title
author {
name
}
}
}
```
## Custom Scalars
```graphql
scalar DateTime
scalar Email
scalar URL
scalar JSON
scalar Money
type User {
email: Email!
website: URL
createdAt: DateTime!
metadata: JSON
}
type Product {
price: Money!
}
```
## Directives
### Built-in Directives
```graphql
type User {
name: String!
email: String! @deprecated(reason: "Use emails field instead")
emails: [String!]!
# Conditional inclusion
privateData: PrivateData @include(if: $isOwner)
}
# Query
query GetUser($isOwner: Boolean!) {
user(id: "123") {
name
privateData @include(if: $isOwner) {
ssn
}
}
}
```
### Custom Directives
```graphql
directive @auth(requires: Role = USER) on FIELD_DEFINITION
enum Role {
USER
ADMIN
MODERATOR
}
type Mutation {
deleteUser(id: ID!): Boolean! @auth(requires: ADMIN)
updateProfile(input: ProfileInput!): User! @auth
}
```
## Error Handling
### Union Error Pattern
```graphql
type User {
id: ID!
email: String!
}
type ValidationError {
field: String!
message: String!
}
type NotFoundError {
message: String!
resourceType: String!
resourceId: ID!
}
type AuthorizationError {
message: String!
}
union UserResult = User | ValidationError | NotFoundError | AuthorizationError
type Query {
user(id: ID!): UserResult!
}
# Usage
{
user(id: "123") {
... on User {
id
email
}
... on NotFoundError {
message
resourceType
}
... on AuthorizationError {
message
}
}
}
```
### Errors in Payload
```graphql
type CreateUserPayload {
user: User
errors: [Error!]
success: Boolean!
}
type Error {
field: String
message: String!
code: ErrorCode!
}
enum ErrorCode {
VALIDATION_ERROR
UNAUTHORIZED
NOT_FOUND
INTERNAL_ERROR
}
```
## N+1 Query Problem Solutions
### DataLoader Pattern
```python
from aiodataloader import DataLoader
class PostLoader(DataLoader):
async def batch_load_fn(self, post_ids):
posts = await db.posts.find({"id": {"$in": post_ids}})
post_map = {post["id"]: post for post in posts}
return [post_map.get(pid) for pid in post_ids]
# Resolver
@user_type.field("posts")
async def resolve_posts(user, info):
loader = info.context["loaders"]["post"]
return await loader.load_many(user["post_ids"])
```
### Query Depth Limiting
```python
from graphql import GraphQLError
def depth_limit_validator(max_depth: int):
def validate(context, node, ancestors):
depth = len(ancestors)
if depth > max_depth:
raise GraphQLError(
f"Query depth {depth} exceeds maximum {max_depth}"
)
return validate
```
### Query Complexity Analysis
```python
def complexity_limit_validator(max_complexity: int):
def calculate_complexity(node):
# Each field = 1, lists multiply
complexity = 1
if is_list_field(node):
complexity *= get_list_size_arg(node)
return complexity
return validate_complexity
```
## Schema Versioning
### Field Deprecation
```graphql
type User {
name: String! @deprecated(reason: "Use firstName and lastName")
firstName: String!
lastName: String!
}
```
### Schema Evolution
```graphql
# v1 - Initial
type User {
name: String!
}
# v2 - Add optional field (backward compatible)
type User {
name: String!
email: String
}
# v3 - Deprecate and add new field
type User {
name: String! @deprecated(reason: "Use firstName/lastName")
firstName: String!
lastName: String!
email: String
}
```
## Best Practices Summary
1. **Nullable vs Non-Null**: Start nullable, make non-null when guaranteed
2. **Input Types**: Always use input types for mutations
3. **Payload Pattern**: Return errors in mutation payloads
4. **Pagination**: Use cursor-based for infinite scroll, offset for simple cases
5. **Naming**: Use camelCase for fields, PascalCase for types
6. **Deprecation**: Use `@deprecated` instead of removing fields
7. **DataLoaders**: Always use for relationships to prevent N+1
8. **Complexity Limits**: Protect against expensive queries
9. **Custom Scalars**: Use for domain-specific types (Email, DateTime)
10. **Documentation**: Document all fields with descriptions

View File

@@ -0,0 +1,385 @@
# REST API Best Practices
## URL Structure
### Resource Naming
```
# Good - Plural nouns
GET /api/users
GET /api/orders
GET /api/products
# Bad - Verbs or mixed conventions
GET /api/getUser
GET /api/user (inconsistent singular)
POST /api/createOrder
```
### Nested Resources
```
# Shallow nesting (preferred)
GET /api/users/{id}/orders
GET /api/orders/{id}
# Deep nesting (avoid)
GET /api/users/{id}/orders/{orderId}/items/{itemId}/reviews
# Better:
GET /api/order-items/{id}/reviews
```
## HTTP Methods and Status Codes
### GET - Retrieve Resources
```
GET /api/users → 200 OK (with list)
GET /api/users/{id} → 200 OK or 404 Not Found
GET /api/users?page=2 → 200 OK (paginated)
```
### POST - Create Resources
```
POST /api/users
Body: {"name": "John", "email": "john@example.com"}
→ 201 Created
Location: /api/users/123
Body: {"id": "123", "name": "John", ...}
POST /api/users (validation error)
→ 422 Unprocessable Entity
Body: {"errors": [...]}
```
### PUT - Replace Resources
```
PUT /api/users/{id}
Body: {complete user object}
→ 200 OK (updated)
→ 404 Not Found (doesn't exist)
# Must include ALL fields
```
### PATCH - Partial Update
```
PATCH /api/users/{id}
Body: {"name": "Jane"} (only changed fields)
→ 200 OK
→ 404 Not Found
```
### DELETE - Remove Resources
```
DELETE /api/users/{id}
→ 204 No Content (deleted)
→ 404 Not Found
→ 409 Conflict (can't delete due to references)
```
## Filtering, Sorting, and Searching
### Query Parameters
```
# Filtering
GET /api/users?status=active
GET /api/users?role=admin&status=active
# Sorting
GET /api/users?sort=created_at
GET /api/users?sort=-created_at (descending)
GET /api/users?sort=name,created_at
# Searching
GET /api/users?search=john
GET /api/users?q=john
# Field selection (sparse fieldsets)
GET /api/users?fields=id,name,email
```
## Pagination Patterns
### Offset-Based Pagination
```python
GET /api/users?page=2&page_size=20
Response:
{
"items": [...],
"page": 2,
"page_size": 20,
"total": 150,
"pages": 8
}
```
### Cursor-Based Pagination (for large datasets)
```python
GET /api/users?limit=20&cursor=eyJpZCI6MTIzfQ
Response:
{
"items": [...],
"next_cursor": "eyJpZCI6MTQzfQ",
"has_more": true
}
```
### Link Header Pagination (RESTful)
```
GET /api/users?page=2
Response Headers:
Link: <https://api.example.com/users?page=3>; rel="next",
<https://api.example.com/users?page=1>; rel="prev",
<https://api.example.com/users?page=1>; rel="first",
<https://api.example.com/users?page=8>; rel="last"
```
## Versioning Strategies
### URL Versioning (Recommended)
```
/api/v1/users
/api/v2/users
Pros: Clear, easy to route
Cons: Multiple URLs for same resource
```
### Header Versioning
```
GET /api/users
Accept: application/vnd.api+json; version=2
Pros: Clean URLs
Cons: Less visible, harder to test
```
### Query Parameter
```
GET /api/users?version=2
Pros: Easy to test
Cons: Optional parameter can be forgotten
```
## Rate Limiting
### Headers
```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 742
X-RateLimit-Reset: 1640000000
Response when limited:
429 Too Many Requests
Retry-After: 3600
```
### Implementation Pattern
```python
from fastapi import HTTPException, Request
from datetime import datetime, timedelta
class RateLimiter:
def __init__(self, calls: int, period: int):
self.calls = calls
self.period = period
self.cache = {}
def check(self, key: str) -> bool:
now = datetime.now()
if key not in self.cache:
self.cache[key] = []
# Remove old requests
self.cache[key] = [
ts for ts in self.cache[key]
if now - ts < timedelta(seconds=self.period)
]
if len(self.cache[key]) >= self.calls:
return False
self.cache[key].append(now)
return True
limiter = RateLimiter(calls=100, period=60)
@app.get("/api/users")
async def get_users(request: Request):
if not limiter.check(request.client.host):
raise HTTPException(
status_code=429,
headers={"Retry-After": "60"}
)
return {"users": [...]}
```
## Authentication and Authorization
### Bearer Token
```
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
401 Unauthorized - Missing/invalid token
403 Forbidden - Valid token, insufficient permissions
```
### API Keys
```
X-API-Key: your-api-key-here
```
## Error Response Format
### Consistent Structure
```json
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Request validation failed",
"details": [
{
"field": "email",
"message": "Invalid email format",
"value": "not-an-email"
}
],
"timestamp": "2025-10-16T12:00:00Z",
"path": "/api/users"
}
}
```
### Status Code Guidelines
- `200 OK`: Successful GET, PATCH, PUT
- `201 Created`: Successful POST
- `204 No Content`: Successful DELETE
- `400 Bad Request`: Malformed request
- `401 Unauthorized`: Authentication required
- `403 Forbidden`: Authenticated but not authorized
- `404 Not Found`: Resource doesn't exist
- `409 Conflict`: State conflict (duplicate email, etc.)
- `422 Unprocessable Entity`: Validation errors
- `429 Too Many Requests`: Rate limited
- `500 Internal Server Error`: Server error
- `503 Service Unavailable`: Temporary downtime
## Caching
### Cache Headers
```
# Client caching
Cache-Control: public, max-age=3600
# No caching
Cache-Control: no-cache, no-store, must-revalidate
# Conditional requests
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
→ 304 Not Modified
```
## Bulk Operations
### Batch Endpoints
```python
POST /api/users/batch
{
"items": [
{"name": "User1", "email": "user1@example.com"},
{"name": "User2", "email": "user2@example.com"}
]
}
Response:
{
"results": [
{"id": "1", "status": "created"},
{"id": null, "status": "failed", "error": "Email already exists"}
]
}
```
## Idempotency
### Idempotency Keys
```
POST /api/orders
Idempotency-Key: unique-key-123
If duplicate request:
→ 200 OK (return cached response)
```
## CORS Configuration
```python
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
CORSMiddleware,
allow_origins=["https://example.com"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
```
## Documentation with OpenAPI
```python
from fastapi import FastAPI
app = FastAPI(
title="My API",
description="API for managing users",
version="1.0.0",
docs_url="/docs",
redoc_url="/redoc"
)
@app.get(
"/api/users/{user_id}",
summary="Get user by ID",
response_description="User details",
tags=["Users"]
)
async def get_user(
user_id: str = Path(..., description="The user ID")
):
"""
Retrieve user by ID.
Returns full user profile including:
- Basic information
- Contact details
- Account status
"""
pass
```
## Health and Monitoring Endpoints
```python
@app.get("/health")
async def health_check():
return {
"status": "healthy",
"version": "1.0.0",
"timestamp": datetime.now().isoformat()
}
@app.get("/health/detailed")
async def detailed_health():
return {
"status": "healthy",
"checks": {
"database": await check_database(),
"redis": await check_redis(),
"external_api": await check_external_api()
}
}
```

View File

@@ -0,0 +1,487 @@
---
name: architecture-patterns
description: Implement proven backend architecture patterns including Clean Architecture, Hexagonal Architecture, and Domain-Driven Design. Use when architecting complex backend systems or refactoring existing applications for better maintainability.
---
# Architecture Patterns
Master proven backend architecture patterns including Clean Architecture, Hexagonal Architecture, and Domain-Driven Design to build maintainable, testable, and scalable systems.
## When to Use This Skill
- Designing new backend systems from scratch
- Refactoring monolithic applications for better maintainability
- Establishing architecture standards for your team
- Migrating from tightly coupled to loosely coupled architectures
- Implementing domain-driven design principles
- Creating testable and mockable codebases
- Planning microservices decomposition
## Core Concepts
### 1. Clean Architecture (Uncle Bob)
**Layers (dependency flows inward):**
- **Entities**: Core business models
- **Use Cases**: Application business rules
- **Interface Adapters**: Controllers, presenters, gateways
- **Frameworks & Drivers**: UI, database, external services
**Key Principles:**
- Dependencies point inward
- Inner layers know nothing about outer layers
- Business logic independent of frameworks
- Testable without UI, database, or external services
### 2. Hexagonal Architecture (Ports and Adapters)
**Components:**
- **Domain Core**: Business logic
- **Ports**: Interfaces defining interactions
- **Adapters**: Implementations of ports (database, REST, message queue)
**Benefits:**
- Swap implementations easily (mock for testing)
- Technology-agnostic core
- Clear separation of concerns
### 3. Domain-Driven Design (DDD)
**Strategic Patterns:**
- **Bounded Contexts**: Separate models for different domains
- **Context Mapping**: How contexts relate
- **Ubiquitous Language**: Shared terminology
**Tactical Patterns:**
- **Entities**: Objects with identity
- **Value Objects**: Immutable objects defined by attributes
- **Aggregates**: Consistency boundaries
- **Repositories**: Data access abstraction
- **Domain Events**: Things that happened
## Clean Architecture Pattern
### Directory Structure
```
app/
├── domain/ # Entities & business rules
│ ├── entities/
│ │ ├── user.py
│ │ └── order.py
│ ├── value_objects/
│ │ ├── email.py
│ │ └── money.py
│ └── interfaces/ # Abstract interfaces
│ ├── user_repository.py
│ └── payment_gateway.py
├── use_cases/ # Application business rules
│ ├── create_user.py
│ ├── process_order.py
│ └── send_notification.py
├── adapters/ # Interface implementations
│ ├── repositories/
│ │ ├── postgres_user_repository.py
│ │ └── redis_cache_repository.py
│ ├── controllers/
│ │ └── user_controller.py
│ └── gateways/
│ ├── stripe_payment_gateway.py
│ └── sendgrid_email_gateway.py
└── infrastructure/ # Framework & external concerns
├── database.py
├── config.py
└── logging.py
```
### Implementation Example
```python
# domain/entities/user.py
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
@dataclass
class User:
"""Core user entity - no framework dependencies."""
id: str
email: str
name: str
created_at: datetime
is_active: bool = True
def deactivate(self):
"""Business rule: deactivating user."""
self.is_active = False
def can_place_order(self) -> bool:
"""Business rule: active users can order."""
return self.is_active
# domain/interfaces/user_repository.py
from abc import ABC, abstractmethod
from typing import Optional, List
from domain.entities.user import User
class IUserRepository(ABC):
"""Port: defines contract, no implementation."""
@abstractmethod
async def find_by_id(self, user_id: str) -> Optional[User]:
pass
@abstractmethod
async def find_by_email(self, email: str) -> Optional[User]:
pass
@abstractmethod
async def save(self, user: User) -> User:
pass
@abstractmethod
async def delete(self, user_id: str) -> bool:
pass
# use_cases/create_user.py
from domain.entities.user import User
from domain.interfaces.user_repository import IUserRepository
from dataclasses import dataclass
from datetime import datetime
import uuid
@dataclass
class CreateUserRequest:
email: str
name: str
@dataclass
class CreateUserResponse:
user: User
success: bool
error: Optional[str] = None
class CreateUserUseCase:
"""Use case: orchestrates business logic."""
def __init__(self, user_repository: IUserRepository):
self.user_repository = user_repository
async def execute(self, request: CreateUserRequest) -> CreateUserResponse:
# Business validation
existing = await self.user_repository.find_by_email(request.email)
if existing:
return CreateUserResponse(
user=None,
success=False,
error="Email already exists"
)
# Create entity
user = User(
id=str(uuid.uuid4()),
email=request.email,
name=request.name,
created_at=datetime.now(),
is_active=True
)
# Persist
saved_user = await self.user_repository.save(user)
return CreateUserResponse(
user=saved_user,
success=True
)
# adapters/repositories/postgres_user_repository.py
from domain.interfaces.user_repository import IUserRepository
from domain.entities.user import User
from typing import Optional
import asyncpg
class PostgresUserRepository(IUserRepository):
"""Adapter: PostgreSQL implementation."""
def __init__(self, pool: asyncpg.Pool):
self.pool = pool
async def find_by_id(self, user_id: str) -> Optional[User]:
async with self.pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT * FROM users WHERE id = $1", user_id
)
return self._to_entity(row) if row else None
async def find_by_email(self, email: str) -> Optional[User]:
async with self.pool.acquire() as conn:
row = await conn.fetchrow(
"SELECT * FROM users WHERE email = $1", email
)
return self._to_entity(row) if row else None
async def save(self, user: User) -> User:
async with self.pool.acquire() as conn:
await conn.execute(
"""
INSERT INTO users (id, email, name, created_at, is_active)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (id) DO UPDATE
SET email = $2, name = $3, is_active = $5
""",
user.id, user.email, user.name, user.created_at, user.is_active
)
return user
async def delete(self, user_id: str) -> bool:
async with self.pool.acquire() as conn:
result = await conn.execute(
"DELETE FROM users WHERE id = $1", user_id
)
return result == "DELETE 1"
def _to_entity(self, row) -> User:
"""Map database row to entity."""
return User(
id=row["id"],
email=row["email"],
name=row["name"],
created_at=row["created_at"],
is_active=row["is_active"]
)
# adapters/controllers/user_controller.py
from fastapi import APIRouter, Depends, HTTPException
from use_cases.create_user import CreateUserUseCase, CreateUserRequest
from pydantic import BaseModel
router = APIRouter()
class CreateUserDTO(BaseModel):
email: str
name: str
@router.post("/users")
async def create_user(
dto: CreateUserDTO,
use_case: CreateUserUseCase = Depends(get_create_user_use_case)
):
"""Controller: handles HTTP concerns only."""
request = CreateUserRequest(email=dto.email, name=dto.name)
response = await use_case.execute(request)
if not response.success:
raise HTTPException(status_code=400, detail=response.error)
return {"user": response.user}
```
## Hexagonal Architecture Pattern
```python
# Core domain (hexagon center)
class OrderService:
"""Domain service - no infrastructure dependencies."""
def __init__(
self,
order_repository: OrderRepositoryPort,
payment_gateway: PaymentGatewayPort,
notification_service: NotificationPort
):
self.orders = order_repository
self.payments = payment_gateway
self.notifications = notification_service
async def place_order(self, order: Order) -> OrderResult:
# Business logic
if not order.is_valid():
return OrderResult(success=False, error="Invalid order")
# Use ports (interfaces)
payment = await self.payments.charge(
amount=order.total,
customer=order.customer_id
)
if not payment.success:
return OrderResult(success=False, error="Payment failed")
order.mark_as_paid()
saved_order = await self.orders.save(order)
await self.notifications.send(
to=order.customer_email,
subject="Order confirmed",
body=f"Order {order.id} confirmed"
)
return OrderResult(success=True, order=saved_order)
# Ports (interfaces)
class OrderRepositoryPort(ABC):
@abstractmethod
async def save(self, order: Order) -> Order:
pass
class PaymentGatewayPort(ABC):
@abstractmethod
async def charge(self, amount: Money, customer: str) -> PaymentResult:
pass
class NotificationPort(ABC):
@abstractmethod
async def send(self, to: str, subject: str, body: str):
pass
# Adapters (implementations)
class StripePaymentAdapter(PaymentGatewayPort):
"""Primary adapter: connects to Stripe API."""
def __init__(self, api_key: str):
self.stripe = stripe
self.stripe.api_key = api_key
async def charge(self, amount: Money, customer: str) -> PaymentResult:
try:
charge = self.stripe.Charge.create(
amount=amount.cents,
currency=amount.currency,
customer=customer
)
return PaymentResult(success=True, transaction_id=charge.id)
except stripe.error.CardError as e:
return PaymentResult(success=False, error=str(e))
class MockPaymentAdapter(PaymentGatewayPort):
"""Test adapter: no external dependencies."""
async def charge(self, amount: Money, customer: str) -> PaymentResult:
return PaymentResult(success=True, transaction_id="mock-123")
```
## Domain-Driven Design Pattern
```python
# Value Objects (immutable)
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)
class Email:
"""Value object: validated email."""
value: str
def __post_init__(self):
if "@" not in self.value:
raise ValueError("Invalid email")
@dataclass(frozen=True)
class Money:
"""Value object: amount with currency."""
amount: int # cents
currency: str
def add(self, other: "Money") -> "Money":
if self.currency != other.currency:
raise ValueError("Currency mismatch")
return Money(self.amount + other.amount, self.currency)
# Entities (with identity)
class Order:
"""Entity: has identity, mutable state."""
def __init__(self, id: str, customer: Customer):
self.id = id
self.customer = customer
self.items: List[OrderItem] = []
self.status = OrderStatus.PENDING
self._events: List[DomainEvent] = []
def add_item(self, product: Product, quantity: int):
"""Business logic in entity."""
item = OrderItem(product, quantity)
self.items.append(item)
self._events.append(ItemAddedEvent(self.id, item))
def total(self) -> Money:
"""Calculated property."""
return sum(item.subtotal() for item in self.items)
def submit(self):
"""State transition with business rules."""
if not self.items:
raise ValueError("Cannot submit empty order")
if self.status != OrderStatus.PENDING:
raise ValueError("Order already submitted")
self.status = OrderStatus.SUBMITTED
self._events.append(OrderSubmittedEvent(self.id))
# Aggregates (consistency boundary)
class Customer:
"""Aggregate root: controls access to entities."""
def __init__(self, id: str, email: Email):
self.id = id
self.email = email
self._addresses: List[Address] = []
self._orders: List[str] = [] # Order IDs, not full objects
def add_address(self, address: Address):
"""Aggregate enforces invariants."""
if len(self._addresses) >= 5:
raise ValueError("Maximum 5 addresses allowed")
self._addresses.append(address)
@property
def primary_address(self) -> Optional[Address]:
return next((a for a in self._addresses if a.is_primary), None)
# Domain Events
@dataclass
class OrderSubmittedEvent:
order_id: str
occurred_at: datetime = field(default_factory=datetime.now)
# Repository (aggregate persistence)
class OrderRepository:
"""Repository: persist/retrieve aggregates."""
async def find_by_id(self, order_id: str) -> Optional[Order]:
"""Reconstitute aggregate from storage."""
pass
async def save(self, order: Order):
"""Persist aggregate and publish events."""
await self._persist(order)
await self._publish_events(order._events)
order._events.clear()
```
## Resources
- **references/clean-architecture-guide.md**: Detailed layer breakdown
- **references/hexagonal-architecture-guide.md**: Ports and adapters patterns
- **references/ddd-tactical-patterns.md**: Entities, value objects, aggregates
- **assets/clean-architecture-template/**: Complete project structure
- **assets/ddd-examples/**: Domain modeling examples
## Best Practices
1. **Dependency Rule**: Dependencies always point inward
2. **Interface Segregation**: Small, focused interfaces
3. **Business Logic in Domain**: Keep frameworks out of core
4. **Test Independence**: Core testable without infrastructure
5. **Bounded Contexts**: Clear domain boundaries
6. **Ubiquitous Language**: Consistent terminology
7. **Thin Controllers**: Delegate to use cases
8. **Rich Domain Models**: Behavior with data
## Common Pitfalls
- **Anemic Domain**: Entities with only data, no behavior
- **Framework Coupling**: Business logic depends on frameworks
- **Fat Controllers**: Business logic in controllers
- **Repository Leakage**: Exposing ORM objects
- **Missing Abstractions**: Concrete dependencies in core
- **Over-Engineering**: Clean architecture for simple CRUD

View File

@@ -0,0 +1,585 @@
---
name: microservices-patterns
description: Design microservices architectures with service boundaries, event-driven communication, and resilience patterns. Use when building distributed systems, decomposing monoliths, or implementing microservices.
---
# Microservices Patterns
Master microservices architecture patterns including service boundaries, inter-service communication, data management, and resilience patterns for building distributed systems.
## When to Use This Skill
- Decomposing monoliths into microservices
- Designing service boundaries and contracts
- Implementing inter-service communication
- Managing distributed data and transactions
- Building resilient distributed systems
- Implementing service discovery and load balancing
- Designing event-driven architectures
## Core Concepts
### 1. Service Decomposition Strategies
**By Business Capability**
- Organize services around business functions
- Each service owns its domain
- Example: OrderService, PaymentService, InventoryService
**By Subdomain (DDD)**
- Core domain, supporting subdomains
- Bounded contexts map to services
- Clear ownership and responsibility
**Strangler Fig Pattern**
- Gradually extract from monolith
- New functionality as microservices
- Proxy routes to old/new systems
### 2. Communication Patterns
**Synchronous (Request/Response)**
- REST APIs
- gRPC
- GraphQL
**Asynchronous (Events/Messages)**
- Event streaming (Kafka)
- Message queues (RabbitMQ, SQS)
- Pub/Sub patterns
### 3. Data Management
**Database Per Service**
- Each service owns its data
- No shared databases
- Loose coupling
**Saga Pattern**
- Distributed transactions
- Compensating actions
- Eventual consistency
### 4. Resilience Patterns
**Circuit Breaker**
- Fail fast on repeated errors
- Prevent cascade failures
**Retry with Backoff**
- Transient fault handling
- Exponential backoff
**Bulkhead**
- Isolate resources
- Limit impact of failures
## Service Decomposition Patterns
### Pattern 1: By Business Capability
```python
# E-commerce example
# Order Service
class OrderService:
"""Handles order lifecycle."""
async def create_order(self, order_data: dict) -> Order:
order = Order.create(order_data)
# Publish event for other services
await self.event_bus.publish(
OrderCreatedEvent(
order_id=order.id,
customer_id=order.customer_id,
items=order.items,
total=order.total
)
)
return order
# Payment Service (separate service)
class PaymentService:
"""Handles payment processing."""
async def process_payment(self, payment_request: PaymentRequest) -> PaymentResult:
# Process payment
result = await self.payment_gateway.charge(
amount=payment_request.amount,
customer=payment_request.customer_id
)
if result.success:
await self.event_bus.publish(
PaymentCompletedEvent(
order_id=payment_request.order_id,
transaction_id=result.transaction_id
)
)
return result
# Inventory Service (separate service)
class InventoryService:
"""Handles inventory management."""
async def reserve_items(self, order_id: str, items: List[OrderItem]) -> ReservationResult:
# Check availability
for item in items:
available = await self.inventory_repo.get_available(item.product_id)
if available < item.quantity:
return ReservationResult(
success=False,
error=f"Insufficient inventory for {item.product_id}"
)
# Reserve items
reservation = await self.create_reservation(order_id, items)
await self.event_bus.publish(
InventoryReservedEvent(
order_id=order_id,
reservation_id=reservation.id
)
)
return ReservationResult(success=True, reservation=reservation)
```
### Pattern 2: API Gateway
```python
from fastapi import FastAPI, HTTPException, Depends
import httpx
from circuitbreaker import circuit
app = FastAPI()
class APIGateway:
"""Central entry point for all client requests."""
def __init__(self):
self.order_service_url = "http://order-service:8000"
self.payment_service_url = "http://payment-service:8001"
self.inventory_service_url = "http://inventory-service:8002"
self.http_client = httpx.AsyncClient(timeout=5.0)
@circuit(failure_threshold=5, recovery_timeout=30)
async def call_order_service(self, path: str, method: str = "GET", **kwargs):
"""Call order service with circuit breaker."""
response = await self.http_client.request(
method,
f"{self.order_service_url}{path}",
**kwargs
)
response.raise_for_status()
return response.json()
async def create_order_aggregate(self, order_id: str) -> dict:
"""Aggregate data from multiple services."""
# Parallel requests
order, payment, inventory = await asyncio.gather(
self.call_order_service(f"/orders/{order_id}"),
self.call_payment_service(f"/payments/order/{order_id}"),
self.call_inventory_service(f"/reservations/order/{order_id}"),
return_exceptions=True
)
# Handle partial failures
result = {"order": order}
if not isinstance(payment, Exception):
result["payment"] = payment
if not isinstance(inventory, Exception):
result["inventory"] = inventory
return result
@app.post("/api/orders")
async def create_order(
order_data: dict,
gateway: APIGateway = Depends()
):
"""API Gateway endpoint."""
try:
# Route to order service
order = await gateway.call_order_service(
"/orders",
method="POST",
json=order_data
)
return {"order": order}
except httpx.HTTPError as e:
raise HTTPException(status_code=503, detail="Order service unavailable")
```
## Communication Patterns
### Pattern 1: Synchronous REST Communication
```python
# Service A calls Service B
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential
class ServiceClient:
"""HTTP client with retries and timeout."""
def __init__(self, base_url: str):
self.base_url = base_url
self.client = httpx.AsyncClient(
timeout=httpx.Timeout(5.0, connect=2.0),
limits=httpx.Limits(max_keepalive_connections=20)
)
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def get(self, path: str, **kwargs):
"""GET with automatic retries."""
response = await self.client.get(f"{self.base_url}{path}", **kwargs)
response.raise_for_status()
return response.json()
async def post(self, path: str, **kwargs):
"""POST request."""
response = await self.client.post(f"{self.base_url}{path}", **kwargs)
response.raise_for_status()
return response.json()
# Usage
payment_client = ServiceClient("http://payment-service:8001")
result = await payment_client.post("/payments", json=payment_data)
```
### Pattern 2: Asynchronous Event-Driven
```python
# Event-driven communication with Kafka
from aiokafka import AIOKafkaProducer, AIOKafkaConsumer
import json
from dataclasses import dataclass, asdict
from datetime import datetime
@dataclass
class DomainEvent:
event_id: str
event_type: str
aggregate_id: str
occurred_at: datetime
data: dict
class EventBus:
"""Event publishing and subscription."""
def __init__(self, bootstrap_servers: List[str]):
self.bootstrap_servers = bootstrap_servers
self.producer = None
async def start(self):
self.producer = AIOKafkaProducer(
bootstrap_servers=self.bootstrap_servers,
value_serializer=lambda v: json.dumps(v).encode()
)
await self.producer.start()
async def publish(self, event: DomainEvent):
"""Publish event to Kafka topic."""
topic = event.event_type
await self.producer.send_and_wait(
topic,
value=asdict(event),
key=event.aggregate_id.encode()
)
async def subscribe(self, topic: str, handler: callable):
"""Subscribe to events."""
consumer = AIOKafkaConsumer(
topic,
bootstrap_servers=self.bootstrap_servers,
value_deserializer=lambda v: json.loads(v.decode()),
group_id="my-service"
)
await consumer.start()
try:
async for message in consumer:
event_data = message.value
await handler(event_data)
finally:
await consumer.stop()
# Order Service publishes event
async def create_order(order_data: dict):
order = await save_order(order_data)
event = DomainEvent(
event_id=str(uuid.uuid4()),
event_type="OrderCreated",
aggregate_id=order.id,
occurred_at=datetime.now(),
data={
"order_id": order.id,
"customer_id": order.customer_id,
"total": order.total
}
)
await event_bus.publish(event)
# Inventory Service listens for OrderCreated
async def handle_order_created(event_data: dict):
"""React to order creation."""
order_id = event_data["data"]["order_id"]
items = event_data["data"]["items"]
# Reserve inventory
await reserve_inventory(order_id, items)
```
### Pattern 3: Saga Pattern (Distributed Transactions)
```python
# Saga orchestration for order fulfillment
from enum import Enum
from typing import List, Callable
class SagaStep:
"""Single step in saga."""
def __init__(
self,
name: str,
action: Callable,
compensation: Callable
):
self.name = name
self.action = action
self.compensation = compensation
class SagaStatus(Enum):
PENDING = "pending"
COMPLETED = "completed"
COMPENSATING = "compensating"
FAILED = "failed"
class OrderFulfillmentSaga:
"""Orchestrated saga for order fulfillment."""
def __init__(self):
self.steps: List[SagaStep] = [
SagaStep(
"create_order",
action=self.create_order,
compensation=self.cancel_order
),
SagaStep(
"reserve_inventory",
action=self.reserve_inventory,
compensation=self.release_inventory
),
SagaStep(
"process_payment",
action=self.process_payment,
compensation=self.refund_payment
),
SagaStep(
"confirm_order",
action=self.confirm_order,
compensation=self.cancel_order_confirmation
)
]
async def execute(self, order_data: dict) -> SagaResult:
"""Execute saga steps."""
completed_steps = []
context = {"order_data": order_data}
try:
for step in self.steps:
# Execute step
result = await step.action(context)
if not result.success:
# Compensate
await self.compensate(completed_steps, context)
return SagaResult(
status=SagaStatus.FAILED,
error=result.error
)
completed_steps.append(step)
context.update(result.data)
return SagaResult(status=SagaStatus.COMPLETED, data=context)
except Exception as e:
# Compensate on error
await self.compensate(completed_steps, context)
return SagaResult(status=SagaStatus.FAILED, error=str(e))
async def compensate(self, completed_steps: List[SagaStep], context: dict):
"""Execute compensating actions in reverse order."""
for step in reversed(completed_steps):
try:
await step.compensation(context)
except Exception as e:
# Log compensation failure
print(f"Compensation failed for {step.name}: {e}")
# Step implementations
async def create_order(self, context: dict) -> StepResult:
order = await order_service.create(context["order_data"])
return StepResult(success=True, data={"order_id": order.id})
async def cancel_order(self, context: dict):
await order_service.cancel(context["order_id"])
async def reserve_inventory(self, context: dict) -> StepResult:
result = await inventory_service.reserve(
context["order_id"],
context["order_data"]["items"]
)
return StepResult(
success=result.success,
data={"reservation_id": result.reservation_id}
)
async def release_inventory(self, context: dict):
await inventory_service.release(context["reservation_id"])
async def process_payment(self, context: dict) -> StepResult:
result = await payment_service.charge(
context["order_id"],
context["order_data"]["total"]
)
return StepResult(
success=result.success,
data={"transaction_id": result.transaction_id},
error=result.error
)
async def refund_payment(self, context: dict):
await payment_service.refund(context["transaction_id"])
```
## Resilience Patterns
### Circuit Breaker Pattern
```python
from enum import Enum
from datetime import datetime, timedelta
from typing import Callable, Any
class CircuitState(Enum):
CLOSED = "closed" # Normal operation
OPEN = "open" # Failing, reject requests
HALF_OPEN = "half_open" # Testing if recovered
class CircuitBreaker:
"""Circuit breaker for service calls."""
def __init__(
self,
failure_threshold: int = 5,
recovery_timeout: int = 30,
success_threshold: int = 2
):
self.failure_threshold = failure_threshold
self.recovery_timeout = recovery_timeout
self.success_threshold = success_threshold
self.failure_count = 0
self.success_count = 0
self.state = CircuitState.CLOSED
self.opened_at = None
async def call(self, func: Callable, *args, **kwargs) -> Any:
"""Execute function with circuit breaker."""
if self.state == CircuitState.OPEN:
if self._should_attempt_reset():
self.state = CircuitState.HALF_OPEN
else:
raise CircuitBreakerOpenError("Circuit breaker is open")
try:
result = await func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure()
raise
def _on_success(self):
"""Handle successful call."""
self.failure_count = 0
if self.state == CircuitState.HALF_OPEN:
self.success_count += 1
if self.success_count >= self.success_threshold:
self.state = CircuitState.CLOSED
self.success_count = 0
def _on_failure(self):
"""Handle failed call."""
self.failure_count += 1
if self.failure_count >= self.failure_threshold:
self.state = CircuitState.OPEN
self.opened_at = datetime.now()
if self.state == CircuitState.HALF_OPEN:
self.state = CircuitState.OPEN
self.opened_at = datetime.now()
def _should_attempt_reset(self) -> bool:
"""Check if enough time passed to try again."""
return (
datetime.now() - self.opened_at
> timedelta(seconds=self.recovery_timeout)
)
# Usage
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30)
async def call_payment_service(payment_data: dict):
return await breaker.call(
payment_client.process_payment,
payment_data
)
```
## Resources
- **references/service-decomposition-guide.md**: Breaking down monoliths
- **references/communication-patterns.md**: Sync vs async patterns
- **references/saga-implementation.md**: Distributed transactions
- **assets/circuit-breaker.py**: Production circuit breaker
- **assets/event-bus-template.py**: Kafka event bus implementation
- **assets/api-gateway-template.py**: Complete API gateway
## Best Practices
1. **Service Boundaries**: Align with business capabilities
2. **Database Per Service**: No shared databases
3. **API Contracts**: Versioned, backward compatible
4. **Async When Possible**: Events over direct calls
5. **Circuit Breakers**: Fail fast on service failures
6. **Distributed Tracing**: Track requests across services
7. **Service Registry**: Dynamic service discovery
8. **Health Checks**: Liveness and readiness probes
## Common Pitfalls
- **Distributed Monolith**: Tightly coupled services
- **Chatty Services**: Too many inter-service calls
- **Shared Databases**: Tight coupling through data
- **No Circuit Breakers**: Cascade failures
- **Synchronous Everything**: Tight coupling, poor resilience
- **Premature Microservices**: Starting with microservices
- **Ignoring Network Failures**: Assuming reliable network
- **No Compensation Logic**: Can't undo failed transactions

View File

@@ -0,0 +1,146 @@
---
name: temporal-python-testing
description: Test Temporal workflows with pytest, time-skipping, and mocking strategies. Covers unit testing, integration testing, replay testing, and local development setup. Use when implementing Temporal workflow tests or debugging test failures.
---
# Temporal Python Testing Strategies
Comprehensive testing approaches for Temporal workflows using pytest, progressive disclosure resources for specific testing scenarios.
## When to Use This Skill
- **Unit testing workflows** - Fast tests with time-skipping
- **Integration testing** - Workflows with mocked activities
- **Replay testing** - Validate determinism against production histories
- **Local development** - Set up Temporal server and pytest
- **CI/CD integration** - Automated testing pipelines
- **Coverage strategies** - Achieve ≥80% test coverage
## Testing Philosophy
**Recommended Approach** (Source: docs.temporal.io/develop/python/testing-suite):
- Write majority as integration tests
- Use pytest with async fixtures
- Time-skipping enables fast feedback (month-long workflows → seconds)
- Mock activities to isolate workflow logic
- Validate determinism with replay testing
**Three Test Types**:
1. **Unit**: Workflows with time-skipping, activities with ActivityEnvironment
2. **Integration**: Workers with mocked activities
3. **End-to-end**: Full Temporal server with real activities (use sparingly)
## Available Resources
This skill provides detailed guidance through progressive disclosure. Load specific resources based on your testing needs:
### Unit Testing Resources
**File**: `resources/unit-testing.md`
**When to load**: Testing individual workflows or activities in isolation
**Contains**:
- WorkflowEnvironment with time-skipping
- ActivityEnvironment for activity testing
- Fast execution of long-running workflows
- Manual time advancement patterns
- pytest fixtures and patterns
### Integration Testing Resources
**File**: `resources/integration-testing.md`
**When to load**: Testing workflows with mocked external dependencies
**Contains**:
- Activity mocking strategies
- Error injection patterns
- Multi-activity workflow testing
- Signal and query testing
- Coverage strategies
### Replay Testing Resources
**File**: `resources/replay-testing.md`
**When to load**: Validating determinism or deploying workflow changes
**Contains**:
- Determinism validation
- Production history replay
- CI/CD integration patterns
- Version compatibility testing
### Local Development Resources
**File**: `resources/local-setup.md`
**When to load**: Setting up development environment
**Contains**:
- Docker Compose configuration
- pytest setup and configuration
- Coverage tool integration
- Development workflow
## Quick Start Guide
### Basic Workflow Test
```python
import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker
@pytest.fixture
async def workflow_env():
env = await WorkflowEnvironment.start_time_skipping()
yield env
await env.shutdown()
@pytest.mark.asyncio
async def test_workflow(workflow_env):
async with Worker(
workflow_env.client,
task_queue="test-queue",
workflows=[YourWorkflow],
activities=[your_activity],
):
result = await workflow_env.client.execute_workflow(
YourWorkflow.run,
args,
id="test-wf-id",
task_queue="test-queue",
)
assert result == expected
```
### Basic Activity Test
```python
from temporalio.testing import ActivityEnvironment
async def test_activity():
env = ActivityEnvironment()
result = await env.run(your_activity, "test-input")
assert result == expected_output
```
## Coverage Targets
**Recommended Coverage** (Source: docs.temporal.io best practices):
- **Workflows**: ≥80% logic coverage
- **Activities**: ≥80% logic coverage
- **Integration**: Critical paths with mocked activities
- **Replay**: All workflow versions before deployment
## Key Testing Principles
1. **Time-Skipping** - Month-long workflows test in seconds
2. **Mock Activities** - Isolate workflow logic from external dependencies
3. **Replay Testing** - Validate determinism before deployment
4. **High Coverage** - ≥80% target for production workflows
5. **Fast Feedback** - Unit tests run in milliseconds
## How to Use Resources
**Load specific resource when needed**:
- "Show me unit testing patterns" → Load `resources/unit-testing.md`
- "How do I mock activities?" → Load `resources/integration-testing.md`
- "Setup local Temporal server" → Load `resources/local-setup.md`
- "Validate determinism" → Load `resources/replay-testing.md`
## Additional References
- Python SDK Testing: docs.temporal.io/develop/python/testing-suite
- Testing Patterns: github.com/temporalio/temporal/blob/main/docs/development/testing.md
- Python Samples: github.com/temporalio/samples-python

View File

@@ -0,0 +1,452 @@
# Integration Testing with Mocked Activities
Comprehensive patterns for testing workflows with mocked external dependencies, error injection, and complex scenarios.
## Activity Mocking Strategy
**Purpose**: Test workflow orchestration logic without calling real external services
### Basic Mock Pattern
```python
import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker
from unittest.mock import Mock
@pytest.mark.asyncio
async def test_workflow_with_mocked_activity(workflow_env):
"""Mock activity to test workflow logic"""
# Create mock activity
mock_activity = Mock(return_value="mocked-result")
@workflow.defn
class WorkflowWithActivity:
@workflow.run
async def run(self, input: str) -> str:
result = await workflow.execute_activity(
process_external_data,
input,
start_to_close_timeout=timedelta(seconds=10),
)
return f"processed: {result}"
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[WorkflowWithActivity],
activities=[mock_activity], # Use mock instead of real activity
):
result = await workflow_env.client.execute_workflow(
WorkflowWithActivity.run,
"test-input",
id="wf-mock",
task_queue="test",
)
assert result == "processed: mocked-result"
mock_activity.assert_called_once()
```
### Dynamic Mock Responses
**Scenario-Based Mocking**:
```python
@pytest.mark.asyncio
async def test_workflow_multiple_mock_scenarios(workflow_env):
"""Test different workflow paths with dynamic mocks"""
# Mock returns different values based on input
def dynamic_activity(input: str) -> str:
if input == "error-case":
raise ApplicationError("Validation failed", non_retryable=True)
return f"processed-{input}"
@workflow.defn
class DynamicWorkflow:
@workflow.run
async def run(self, input: str) -> str:
try:
result = await workflow.execute_activity(
dynamic_activity,
input,
start_to_close_timeout=timedelta(seconds=10),
)
return f"success: {result}"
except ApplicationError as e:
return f"error: {e.message}"
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[DynamicWorkflow],
activities=[dynamic_activity],
):
# Test success path
result_success = await workflow_env.client.execute_workflow(
DynamicWorkflow.run,
"valid-input",
id="wf-success",
task_queue="test",
)
assert result_success == "success: processed-valid-input"
# Test error path
result_error = await workflow_env.client.execute_workflow(
DynamicWorkflow.run,
"error-case",
id="wf-error",
task_queue="test",
)
assert "Validation failed" in result_error
```
## Error Injection Patterns
### Testing Transient Failures
**Retry Behavior**:
```python
@pytest.mark.asyncio
async def test_workflow_transient_errors(workflow_env):
"""Test retry logic with controlled failures"""
attempt_count = 0
@activity.defn
async def transient_activity() -> str:
nonlocal attempt_count
attempt_count += 1
if attempt_count < 3:
raise Exception(f"Transient error {attempt_count}")
return "success-after-retries"
@workflow.defn
class RetryWorkflow:
@workflow.run
async def run(self) -> str:
return await workflow.execute_activity(
transient_activity,
start_to_close_timeout=timedelta(seconds=10),
retry_policy=RetryPolicy(
initial_interval=timedelta(milliseconds=10),
maximum_attempts=5,
backoff_coefficient=1.0,
),
)
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[RetryWorkflow],
activities=[transient_activity],
):
result = await workflow_env.client.execute_workflow(
RetryWorkflow.run,
id="retry-wf",
task_queue="test",
)
assert result == "success-after-retries"
assert attempt_count == 3
```
### Testing Non-Retryable Errors
**Business Validation Failures**:
```python
@pytest.mark.asyncio
async def test_workflow_non_retryable_error(workflow_env):
"""Test handling of permanent failures"""
@activity.defn
async def validation_activity(input: dict) -> str:
if not input.get("valid"):
raise ApplicationError(
"Invalid input",
non_retryable=True, # Don't retry validation errors
)
return "validated"
@workflow.defn
class ValidationWorkflow:
@workflow.run
async def run(self, input: dict) -> str:
try:
return await workflow.execute_activity(
validation_activity,
input,
start_to_close_timeout=timedelta(seconds=10),
)
except ApplicationError as e:
return f"validation-failed: {e.message}"
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[ValidationWorkflow],
activities=[validation_activity],
):
result = await workflow_env.client.execute_workflow(
ValidationWorkflow.run,
{"valid": False},
id="validation-wf",
task_queue="test",
)
assert "validation-failed" in result
```
## Multi-Activity Workflow Testing
### Sequential Activity Pattern
```python
@pytest.mark.asyncio
async def test_workflow_sequential_activities(workflow_env):
"""Test workflow orchestrating multiple activities"""
activity_calls = []
@activity.defn
async def step_1(input: str) -> str:
activity_calls.append("step_1")
return f"{input}-step1"
@activity.defn
async def step_2(input: str) -> str:
activity_calls.append("step_2")
return f"{input}-step2"
@activity.defn
async def step_3(input: str) -> str:
activity_calls.append("step_3")
return f"{input}-step3"
@workflow.defn
class SequentialWorkflow:
@workflow.run
async def run(self, input: str) -> str:
result_1 = await workflow.execute_activity(
step_1,
input,
start_to_close_timeout=timedelta(seconds=10),
)
result_2 = await workflow.execute_activity(
step_2,
result_1,
start_to_close_timeout=timedelta(seconds=10),
)
result_3 = await workflow.execute_activity(
step_3,
result_2,
start_to_close_timeout=timedelta(seconds=10),
)
return result_3
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[SequentialWorkflow],
activities=[step_1, step_2, step_3],
):
result = await workflow_env.client.execute_workflow(
SequentialWorkflow.run,
"start",
id="seq-wf",
task_queue="test",
)
assert result == "start-step1-step2-step3"
assert activity_calls == ["step_1", "step_2", "step_3"]
```
### Parallel Activity Pattern
```python
@pytest.mark.asyncio
async def test_workflow_parallel_activities(workflow_env):
"""Test concurrent activity execution"""
@activity.defn
async def parallel_task(task_id: int) -> str:
return f"task-{task_id}"
@workflow.defn
class ParallelWorkflow:
@workflow.run
async def run(self, task_count: int) -> list[str]:
# Execute activities in parallel
tasks = [
workflow.execute_activity(
parallel_task,
i,
start_to_close_timeout=timedelta(seconds=10),
)
for i in range(task_count)
]
return await asyncio.gather(*tasks)
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[ParallelWorkflow],
activities=[parallel_task],
):
result = await workflow_env.client.execute_workflow(
ParallelWorkflow.run,
3,
id="parallel-wf",
task_queue="test",
)
assert result == ["task-0", "task-1", "task-2"]
```
## Signal and Query Testing
### Signal Handlers
```python
@pytest.mark.asyncio
async def test_workflow_signals(workflow_env):
"""Test workflow signal handling"""
@workflow.defn
class SignalWorkflow:
def __init__(self) -> None:
self._status = "initialized"
@workflow.run
async def run(self) -> str:
# Wait for completion signal
await workflow.wait_condition(lambda: self._status == "completed")
return self._status
@workflow.signal
async def update_status(self, new_status: str) -> None:
self._status = new_status
@workflow.query
def get_status(self) -> str:
return self._status
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[SignalWorkflow],
):
# Start workflow
handle = await workflow_env.client.start_workflow(
SignalWorkflow.run,
id="signal-wf",
task_queue="test",
)
# Verify initial state via query
initial_status = await handle.query(SignalWorkflow.get_status)
assert initial_status == "initialized"
# Send signal
await handle.signal(SignalWorkflow.update_status, "processing")
# Verify updated state
updated_status = await handle.query(SignalWorkflow.get_status)
assert updated_status == "processing"
# Complete workflow
await handle.signal(SignalWorkflow.update_status, "completed")
result = await handle.result()
assert result == "completed"
```
## Coverage Strategies
### Workflow Logic Coverage
**Target**: ≥80% coverage of workflow decision logic
```python
# Test all branches
@pytest.mark.parametrize("condition,expected", [
(True, "branch-a"),
(False, "branch-b"),
])
async def test_workflow_branches(workflow_env, condition, expected):
"""Ensure all code paths are tested"""
# Test implementation
pass
```
### Activity Coverage
**Target**: ≥80% coverage of activity logic
```python
# Test activity edge cases
@pytest.mark.parametrize("input,expected", [
("valid", "success"),
("", "empty-input-error"),
(None, "null-input-error"),
])
async def test_activity_edge_cases(activity_env, input, expected):
"""Test activity error handling"""
# Test implementation
pass
```
## Integration Test Organization
### Test Structure
```
tests/
├── integration/
│ ├── conftest.py # Shared fixtures
│ ├── test_order_workflow.py # Order processing tests
│ ├── test_payment_workflow.py # Payment tests
│ └── test_fulfillment_workflow.py
├── unit/
│ ├── test_order_activities.py
│ └── test_payment_activities.py
└── fixtures/
└── test_data.py # Test data builders
```
### Shared Fixtures
```python
# conftest.py
import pytest
from temporalio.testing import WorkflowEnvironment
@pytest.fixture(scope="session")
async def workflow_env():
"""Session-scoped environment for integration tests"""
env = await WorkflowEnvironment.start_time_skipping()
yield env
await env.shutdown()
@pytest.fixture
def mock_payment_service():
"""Mock external payment service"""
return Mock()
@pytest.fixture
def mock_inventory_service():
"""Mock external inventory service"""
return Mock()
```
## Best Practices
1. **Mock External Dependencies**: Never call real APIs in tests
2. **Test Error Scenarios**: Verify compensation and retry logic
3. **Parallel Testing**: Use pytest-xdist for faster test runs
4. **Isolated Tests**: Each test should be independent
5. **Clear Assertions**: Verify both results and side effects
6. **Coverage Target**: ≥80% for critical workflows
7. **Fast Execution**: Use time-skipping, avoid real delays
## Additional Resources
- Mocking Strategies: docs.temporal.io/develop/python/testing-suite
- pytest Best Practices: docs.pytest.org/en/stable/goodpractices.html
- Python SDK Samples: github.com/temporalio/samples-python

View File

@@ -0,0 +1,550 @@
# Local Development Setup for Temporal Python Testing
Comprehensive guide for setting up local Temporal development environment with pytest integration and coverage tracking.
## Temporal Server Setup with Docker Compose
### Basic Docker Compose Configuration
```yaml
# docker-compose.yml
version: "3.8"
services:
temporal:
image: temporalio/auto-setup:latest
container_name: temporal-dev
ports:
- "7233:7233" # Temporal server
- "8233:8233" # Web UI
environment:
- DB=postgresql
- POSTGRES_USER=temporal
- POSTGRES_PWD=temporal
- POSTGRES_SEEDS=postgresql
- DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development-sql.yaml
depends_on:
- postgresql
postgresql:
image: postgres:14-alpine
container_name: temporal-postgres
environment:
- POSTGRES_USER=temporal
- POSTGRES_PASSWORD=temporal
- POSTGRES_DB=temporal
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
temporal-ui:
image: temporalio/ui:latest
container_name: temporal-ui
depends_on:
- temporal
environment:
- TEMPORAL_ADDRESS=temporal:7233
- TEMPORAL_CORS_ORIGINS=http://localhost:3000
ports:
- "8080:8080"
volumes:
postgres_data:
```
### Starting Local Server
```bash
# Start Temporal server
docker-compose up -d
# Verify server is running
docker-compose ps
# View logs
docker-compose logs -f temporal
# Access Temporal Web UI
open http://localhost:8080
# Stop server
docker-compose down
# Reset data (clean slate)
docker-compose down -v
```
### Health Check Script
```python
# scripts/health_check.py
import asyncio
from temporalio.client import Client
async def check_temporal_health():
"""Verify Temporal server is accessible"""
try:
client = await Client.connect("localhost:7233")
print("✓ Connected to Temporal server")
# Test workflow execution
from temporalio.worker import Worker
@workflow.defn
class HealthCheckWorkflow:
@workflow.run
async def run(self) -> str:
return "healthy"
async with Worker(
client,
task_queue="health-check",
workflows=[HealthCheckWorkflow],
):
result = await client.execute_workflow(
HealthCheckWorkflow.run,
id="health-check",
task_queue="health-check",
)
print(f"✓ Workflow execution successful: {result}")
return True
except Exception as e:
print(f"✗ Health check failed: {e}")
return False
if __name__ == "__main__":
asyncio.run(check_temporal_health())
```
## pytest Configuration
### Project Structure
```
temporal-project/
├── docker-compose.yml
├── pyproject.toml
├── pytest.ini
├── requirements.txt
├── src/
│ ├── workflows/
│ │ ├── __init__.py
│ │ ├── order_workflow.py
│ │ └── payment_workflow.py
│ └── activities/
│ ├── __init__.py
│ ├── payment_activities.py
│ └── inventory_activities.py
├── tests/
│ ├── conftest.py
│ ├── unit/
│ │ ├── test_workflows.py
│ │ └── test_activities.py
│ ├── integration/
│ │ └── test_order_flow.py
│ └── replay/
│ └── test_workflow_replay.py
└── scripts/
├── health_check.py
└── export_histories.py
```
### pytest Configuration
```ini
# pytest.ini
[pytest]
asyncio_mode = auto
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
# Markers for test categorization
markers =
unit: Unit tests (fast, isolated)
integration: Integration tests (require Temporal server)
replay: Replay tests (require production histories)
slow: Slow running tests
# Coverage settings
addopts =
--verbose
--strict-markers
--cov=src
--cov-report=term-missing
--cov-report=html
--cov-fail-under=80
# Async test timeout
asyncio_default_fixture_loop_scope = function
```
### Shared Test Fixtures
```python
# tests/conftest.py
import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.client import Client
@pytest.fixture(scope="session")
def event_loop():
"""Provide event loop for async fixtures"""
import asyncio
loop = asyncio.get_event_loop_policy().new_event_loop()
yield loop
loop.close()
@pytest.fixture(scope="session")
async def temporal_client():
"""Provide Temporal client connected to local server"""
client = await Client.connect("localhost:7233")
yield client
await client.close()
@pytest.fixture(scope="module")
async def workflow_env():
"""Module-scoped time-skipping environment"""
env = await WorkflowEnvironment.start_time_skipping()
yield env
await env.shutdown()
@pytest.fixture
def activity_env():
"""Function-scoped activity environment"""
from temporalio.testing import ActivityEnvironment
return ActivityEnvironment()
@pytest.fixture
async def test_worker(temporal_client, workflow_env):
"""Pre-configured test worker"""
from temporalio.worker import Worker
from src.workflows import OrderWorkflow, PaymentWorkflow
from src.activities import process_payment, update_inventory
return Worker(
workflow_env.client,
task_queue="test-queue",
workflows=[OrderWorkflow, PaymentWorkflow],
activities=[process_payment, update_inventory],
)
```
### Dependencies
```txt
# requirements.txt
temporalio>=1.5.0
pytest>=7.4.0
pytest-asyncio>=0.21.0
pytest-cov>=4.1.0
pytest-xdist>=3.3.0 # Parallel test execution
```
```toml
# pyproject.toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_backend"
[project]
name = "temporal-project"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
"temporalio>=1.5.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.4.0",
"pytest-asyncio>=0.21.0",
"pytest-cov>=4.1.0",
"pytest-xdist>=3.3.0",
]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
```
## Coverage Configuration
### Coverage Settings
```ini
# .coveragerc
[run]
source = src
omit =
*/tests/*
*/venv/*
*/__pycache__/*
[report]
exclude_lines =
# Exclude type checking blocks
if TYPE_CHECKING:
# Exclude debug code
def __repr__
# Exclude abstract methods
@abstractmethod
# Exclude pass statements
pass
[html]
directory = htmlcov
```
### Running Tests with Coverage
```bash
# Run all tests with coverage
pytest --cov=src --cov-report=term-missing
# Generate HTML coverage report
pytest --cov=src --cov-report=html
open htmlcov/index.html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m "not slow" # Skip slow tests
# Parallel execution (faster)
pytest -n auto # Use all CPU cores
# Fail if coverage below threshold
pytest --cov=src --cov-fail-under=80
```
### Coverage Report Example
```
---------- coverage: platform darwin, python 3.11.5 -----------
Name Stmts Miss Cover Missing
-----------------------------------------------------------------
src/__init__.py 0 0 100%
src/activities/__init__.py 2 0 100%
src/activities/inventory.py 45 3 93% 78-80
src/activities/payment.py 38 0 100%
src/workflows/__init__.py 2 0 100%
src/workflows/order_workflow.py 67 5 93% 45-49
src/workflows/payment_workflow.py 52 0 100%
-----------------------------------------------------------------
TOTAL 206 8 96%
10 files skipped due to complete coverage.
```
## Development Workflow
### Daily Development Flow
```bash
# 1. Start Temporal server
docker-compose up -d
# 2. Verify server health
python scripts/health_check.py
# 3. Run tests during development
pytest tests/unit/ --verbose
# 4. Run full test suite before commit
pytest --cov=src --cov-report=term-missing
# 5. Check coverage
open htmlcov/index.html
# 6. Stop server
docker-compose down
```
### Pre-Commit Hook
```bash
# .git/hooks/pre-commit
#!/bin/bash
echo "Running tests..."
pytest --cov=src --cov-fail-under=80
if [ $? -ne 0 ]; then
echo "Tests failed. Commit aborted."
exit 1
fi
echo "All tests passed!"
```
### Makefile for Common Tasks
```makefile
# Makefile
.PHONY: setup test test-unit test-integration coverage clean
setup:
docker-compose up -d
pip install -r requirements.txt
python scripts/health_check.py
test:
pytest --cov=src --cov-report=term-missing
test-unit:
pytest -m unit --verbose
test-integration:
pytest -m integration --verbose
test-replay:
pytest -m replay --verbose
test-parallel:
pytest -n auto --cov=src
coverage:
pytest --cov=src --cov-report=html
open htmlcov/index.html
clean:
docker-compose down -v
rm -rf .pytest_cache htmlcov .coverage
ci:
docker-compose up -d
sleep 10 # Wait for Temporal to start
pytest --cov=src --cov-fail-under=80
docker-compose down
```
### CI/CD Example
```yaml
# .github/workflows/test.yml
name: Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Start Temporal server
run: docker-compose up -d
- name: Wait for Temporal
run: sleep 10
- name: Install dependencies
run: |
pip install -r requirements.txt
- name: Run tests with coverage
run: |
pytest --cov=src --cov-report=xml --cov-fail-under=80
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
- name: Cleanup
if: always()
run: docker-compose down
```
## Debugging Tips
### Enable Temporal SDK Logging
```python
import logging
# Enable debug logging for Temporal SDK
logging.basicConfig(level=logging.DEBUG)
temporal_logger = logging.getLogger("temporalio")
temporal_logger.setLevel(logging.DEBUG)
```
### Interactive Debugging
```python
# Add breakpoint in test
@pytest.mark.asyncio
async def test_workflow_with_breakpoint(workflow_env):
import pdb; pdb.set_trace() # Debug here
async with Worker(...):
result = await workflow_env.client.execute_workflow(...)
```
### Temporal Web UI
```bash
# Access Web UI at http://localhost:8080
# - View workflow executions
# - Inspect event history
# - Replay workflows
# - Monitor workers
```
## Best Practices
1. **Isolated Environment**: Use Docker Compose for reproducible local setup
2. **Health Checks**: Always verify Temporal server before running tests
3. **Fast Feedback**: Use pytest markers to run unit tests quickly
4. **Coverage Targets**: Maintain ≥80% code coverage
5. **Parallel Testing**: Use pytest-xdist for faster test runs
6. **CI/CD Integration**: Automated testing on every commit
7. **Cleanup**: Clear Docker volumes between test runs if needed
## Troubleshooting
**Issue: Temporal server not starting**
```bash
# Check logs
docker-compose logs temporal
# Reset database
docker-compose down -v
docker-compose up -d
```
**Issue: Tests timing out**
```python
# Increase timeout in pytest.ini
asyncio_default_timeout = 30
```
**Issue: Port already in use**
```bash
# Find process using port 7233
lsof -i :7233
# Kill process or change port in docker-compose.yml
```
## Additional Resources
- Temporal Local Development: docs.temporal.io/develop/python/local-dev
- pytest Documentation: docs.pytest.org
- Docker Compose: docs.docker.com/compose
- pytest-asyncio: github.com/pytest-dev/pytest-asyncio

View File

@@ -0,0 +1,455 @@
# Replay Testing for Determinism and Compatibility
Comprehensive guide for validating workflow determinism and ensuring safe code changes using replay testing.
## What is Replay Testing?
**Purpose**: Verify that workflow code changes are backward-compatible with existing workflow executions
**How it works**:
1. Temporal records every workflow decision as Event History
2. Replay testing re-executes workflow code against recorded history
3. If new code makes same decisions → deterministic (safe to deploy)
4. If decisions differ → non-deterministic (breaking change)
**Critical Use Cases**:
- Deploying workflow code changes to production
- Validating refactoring doesn't break running workflows
- CI/CD automated compatibility checks
- Version migration validation
## Basic Replay Testing
### Replayer Setup
```python
from temporalio.worker import Replayer
from temporalio.client import Client
async def test_workflow_replay():
"""Test workflow against production history"""
# Connect to Temporal server
client = await Client.connect("localhost:7233")
# Create replayer with current workflow code
replayer = Replayer(
workflows=[OrderWorkflow, PaymentWorkflow]
)
# Fetch workflow history from production
handle = client.get_workflow_handle("order-123")
history = await handle.fetch_history()
# Replay history with current code
await replayer.replay_workflow(history)
# Success = deterministic, Exception = breaking change
```
### Testing Against Multiple Histories
```python
import pytest
from temporalio.worker import Replayer
@pytest.mark.asyncio
async def test_replay_multiple_workflows():
"""Replay against multiple production histories"""
replayer = Replayer(workflows=[OrderWorkflow])
# Test against different workflow executions
workflow_ids = [
"order-success-123",
"order-cancelled-456",
"order-retry-789",
]
for workflow_id in workflow_ids:
handle = client.get_workflow_handle(workflow_id)
history = await handle.fetch_history()
# Replay should succeed for all variants
await replayer.replay_workflow(history)
```
## Determinism Validation
### Common Non-Deterministic Patterns
**Problem: Random Number Generation**
```python
# ❌ Non-deterministic (breaks replay)
@workflow.defn
class BadWorkflow:
@workflow.run
async def run(self) -> int:
return random.randint(1, 100) # Different on replay!
# ✅ Deterministic (safe for replay)
@workflow.defn
class GoodWorkflow:
@workflow.run
async def run(self) -> int:
return workflow.random().randint(1, 100) # Deterministic random
```
**Problem: Current Time**
```python
# ❌ Non-deterministic
@workflow.defn
class BadWorkflow:
@workflow.run
async def run(self) -> str:
now = datetime.now() # Different on replay!
return now.isoformat()
# ✅ Deterministic
@workflow.defn
class GoodWorkflow:
@workflow.run
async def run(self) -> str:
now = workflow.now() # Deterministic time
return now.isoformat()
```
**Problem: Direct External Calls**
```python
# ❌ Non-deterministic
@workflow.defn
class BadWorkflow:
@workflow.run
async def run(self) -> dict:
response = requests.get("https://api.example.com/data") # External call!
return response.json()
# ✅ Deterministic
@workflow.defn
class GoodWorkflow:
@workflow.run
async def run(self) -> dict:
# Use activity for external calls
return await workflow.execute_activity(
fetch_external_data,
start_to_close_timeout=timedelta(seconds=30),
)
```
### Testing Determinism
```python
@pytest.mark.asyncio
async def test_workflow_determinism():
"""Verify workflow produces same output on multiple runs"""
@workflow.defn
class DeterministicWorkflow:
@workflow.run
async def run(self, seed: int) -> list[int]:
# Use workflow.random() for determinism
rng = workflow.random()
rng.seed(seed)
return [rng.randint(1, 100) for _ in range(10)]
env = await WorkflowEnvironment.start_time_skipping()
# Run workflow twice with same input
results = []
for i in range(2):
async with Worker(
env.client,
task_queue="test",
workflows=[DeterministicWorkflow],
):
result = await env.client.execute_workflow(
DeterministicWorkflow.run,
42, # Same seed
id=f"determinism-test-{i}",
task_queue="test",
)
results.append(result)
await env.shutdown()
# Verify identical outputs
assert results[0] == results[1]
```
## Production History Replay
### Exporting Workflow History
```python
from temporalio.client import Client
async def export_workflow_history(workflow_id: str, output_file: str):
"""Export workflow history for replay testing"""
client = await Client.connect("production.temporal.io:7233")
# Fetch workflow history
handle = client.get_workflow_handle(workflow_id)
history = await handle.fetch_history()
# Save to file for replay testing
with open(output_file, "wb") as f:
f.write(history.SerializeToString())
print(f"Exported history to {output_file}")
```
### Replaying from File
```python
from temporalio.worker import Replayer
from temporalio.api.history.v1 import History
async def test_replay_from_file():
"""Replay workflow from exported history file"""
# Load history from file
with open("workflow_histories/order-123.pb", "rb") as f:
history = History.FromString(f.read())
# Replay with current workflow code
replayer = Replayer(workflows=[OrderWorkflow])
await replayer.replay_workflow(history)
# Success = safe to deploy
```
## CI/CD Integration Patterns
### GitHub Actions Example
```yaml
# .github/workflows/replay-tests.yml
name: Replay Tests
on:
pull_request:
branches: [main]
jobs:
replay-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.11"
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install pytest pytest-asyncio
- name: Download production histories
run: |
# Fetch recent workflow histories from production
python scripts/export_histories.py
- name: Run replay tests
run: |
pytest tests/replay/ --verbose
- name: Upload results
if: failure()
uses: actions/upload-artifact@v3
with:
name: replay-failures
path: replay-failures/
```
### Automated History Export
```python
# scripts/export_histories.py
import asyncio
from temporalio.client import Client
from datetime import datetime, timedelta
async def export_recent_histories():
"""Export recent production workflow histories"""
client = await Client.connect("production.temporal.io:7233")
# Query recent completed workflows
workflows = client.list_workflows(
query="WorkflowType='OrderWorkflow' AND CloseTime > '7 days ago'"
)
count = 0
async for workflow in workflows:
# Export history
history = await workflow.fetch_history()
# Save to file
filename = f"workflow_histories/{workflow.id}.pb"
with open(filename, "wb") as f:
f.write(history.SerializeToString())
count += 1
if count >= 100: # Limit to 100 most recent
break
print(f"Exported {count} workflow histories")
if __name__ == "__main__":
asyncio.run(export_recent_histories())
```
### Replay Test Suite
```python
# tests/replay/test_workflow_replay.py
import pytest
import glob
from temporalio.worker import Replayer
from temporalio.api.history.v1 import History
from workflows import OrderWorkflow, PaymentWorkflow
@pytest.mark.asyncio
async def test_replay_all_histories():
"""Replay all production histories"""
replayer = Replayer(
workflows=[OrderWorkflow, PaymentWorkflow]
)
# Load all history files
history_files = glob.glob("workflow_histories/*.pb")
failures = []
for history_file in history_files:
try:
with open(history_file, "rb") as f:
history = History.FromString(f.read())
await replayer.replay_workflow(history)
print(f"{history_file}")
except Exception as e:
failures.append((history_file, str(e)))
print(f"{history_file}: {e}")
# Report failures
if failures:
pytest.fail(
f"Replay failed for {len(failures)} workflows:\n"
+ "\n".join(f" {file}: {error}" for file, error in failures)
)
```
## Version Compatibility Testing
### Testing Code Evolution
```python
@pytest.mark.asyncio
async def test_workflow_version_compatibility():
"""Test workflow with version changes"""
@workflow.defn
class EvolvingWorkflow:
@workflow.run
async def run(self) -> str:
# Use versioning for safe code evolution
version = workflow.get_version("feature-flag", 1, 2)
if version == 1:
# Old behavior
return "version-1"
else:
# New behavior
return "version-2"
env = await WorkflowEnvironment.start_time_skipping()
# Test version 1 behavior
async with Worker(
env.client,
task_queue="test",
workflows=[EvolvingWorkflow],
):
result_v1 = await env.client.execute_workflow(
EvolvingWorkflow.run,
id="evolving-v1",
task_queue="test",
)
assert result_v1 == "version-1"
# Simulate workflow executing again with version 2
result_v2 = await env.client.execute_workflow(
EvolvingWorkflow.run,
id="evolving-v2",
task_queue="test",
)
# New workflows use version 2
assert result_v2 == "version-2"
await env.shutdown()
```
### Migration Strategy
```python
# Phase 1: Add version check
@workflow.defn
class MigratingWorkflow:
@workflow.run
async def run(self) -> dict:
version = workflow.get_version("new-logic", 1, 2)
if version == 1:
# Old logic (existing workflows)
return await self._old_implementation()
else:
# New logic (new workflows)
return await self._new_implementation()
# Phase 2: After all old workflows complete, remove old code
@workflow.defn
class MigratedWorkflow:
@workflow.run
async def run(self) -> dict:
# Only new logic remains
return await self._new_implementation()
```
## Best Practices
1. **Replay Before Deploy**: Always run replay tests before deploying workflow changes
2. **Export Regularly**: Continuously export production histories for testing
3. **CI/CD Integration**: Automated replay testing in pull request checks
4. **Version Tracking**: Use workflow.get_version() for safe code evolution
5. **History Retention**: Keep representative workflow histories for regression testing
6. **Determinism**: Never use random(), datetime.now(), or direct external calls
7. **Comprehensive Testing**: Test against various workflow execution paths
## Common Replay Errors
**Non-Deterministic Error**:
```
WorkflowNonDeterministicError: Workflow command mismatch at position 5
Expected: ScheduleActivityTask(activity_id='activity-1')
Got: ScheduleActivityTask(activity_id='activity-2')
```
**Solution**: Code change altered workflow decision sequence
**Version Mismatch Error**:
```
WorkflowVersionError: Workflow version changed from 1 to 2 without using get_version()
```
**Solution**: Use workflow.get_version() for backward-compatible changes
## Additional Resources
- Replay Testing: docs.temporal.io/develop/python/testing-suite#replay-testing
- Workflow Versioning: docs.temporal.io/workflows#versioning
- Determinism Guide: docs.temporal.io/workflows#deterministic-constraints
- CI/CD Integration: github.com/temporalio/samples-python/tree/main/.github/workflows

View File

@@ -0,0 +1,320 @@
# Unit Testing Temporal Workflows and Activities
Focused guide for testing individual workflows and activities in isolation using WorkflowEnvironment and ActivityEnvironment.
## WorkflowEnvironment with Time-Skipping
**Purpose**: Test workflows in isolation with instant time progression (month-long workflows → seconds)
### Basic Setup Pattern
```python
import pytest
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker
@pytest.fixture
async def workflow_env():
"""Reusable time-skipping test environment"""
env = await WorkflowEnvironment.start_time_skipping()
yield env
await env.shutdown()
@pytest.mark.asyncio
async def test_workflow_execution(workflow_env):
"""Test workflow with time-skipping"""
async with Worker(
workflow_env.client,
task_queue="test-queue",
workflows=[YourWorkflow],
activities=[your_activity],
):
result = await workflow_env.client.execute_workflow(
YourWorkflow.run,
"test-input",
id="test-wf-id",
task_queue="test-queue",
)
assert result == "expected-output"
```
**Key Benefits**:
- `workflow.sleep(timedelta(days=30))` completes instantly
- Fast feedback loop (milliseconds vs hours)
- Deterministic test execution
### Time-Skipping Examples
**Sleep Advancement**:
```python
@pytest.mark.asyncio
async def test_workflow_with_delays(workflow_env):
"""Workflow sleeps are instant in time-skipping mode"""
@workflow.defn
class DelayedWorkflow:
@workflow.run
async def run(self) -> str:
await workflow.sleep(timedelta(hours=24)) # Instant in tests
return "completed"
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[DelayedWorkflow],
):
result = await workflow_env.client.execute_workflow(
DelayedWorkflow.run,
id="delayed-wf",
task_queue="test",
)
assert result == "completed"
```
**Manual Time Control**:
```python
@pytest.mark.asyncio
async def test_workflow_manual_time(workflow_env):
"""Manually advance time for precise control"""
handle = await workflow_env.client.start_workflow(
TimeBasedWorkflow.run,
id="time-wf",
task_queue="test",
)
# Advance time by specific amount
await workflow_env.sleep(timedelta(hours=1))
# Verify intermediate state via query
state = await handle.query(TimeBasedWorkflow.get_state)
assert state == "processing"
# Advance to completion
await workflow_env.sleep(timedelta(hours=23))
result = await handle.result()
assert result == "completed"
```
### Testing Workflow Logic
**Decision Testing**:
```python
@pytest.mark.asyncio
async def test_workflow_branching(workflow_env):
"""Test different execution paths"""
@workflow.defn
class ConditionalWorkflow:
@workflow.run
async def run(self, condition: bool) -> str:
if condition:
return "path-a"
return "path-b"
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[ConditionalWorkflow],
):
# Test true path
result_a = await workflow_env.client.execute_workflow(
ConditionalWorkflow.run,
True,
id="cond-wf-true",
task_queue="test",
)
assert result_a == "path-a"
# Test false path
result_b = await workflow_env.client.execute_workflow(
ConditionalWorkflow.run,
False,
id="cond-wf-false",
task_queue="test",
)
assert result_b == "path-b"
```
## ActivityEnvironment Testing
**Purpose**: Test activities in isolation without workflows or Temporal server
### Basic Activity Test
```python
from temporalio.testing import ActivityEnvironment
async def test_activity_basic():
"""Test activity without workflow context"""
@activity.defn
async def process_data(input: str) -> str:
return input.upper()
env = ActivityEnvironment()
result = await env.run(process_data, "test")
assert result == "TEST"
```
### Testing Activity Context
**Heartbeat Testing**:
```python
async def test_activity_heartbeat():
"""Verify heartbeat calls"""
@activity.defn
async def long_running_activity(total_items: int) -> int:
for i in range(total_items):
activity.heartbeat(i) # Report progress
await asyncio.sleep(0.1)
return total_items
env = ActivityEnvironment()
result = await env.run(long_running_activity, 10)
assert result == 10
```
**Cancellation Testing**:
```python
async def test_activity_cancellation():
"""Test activity cancellation handling"""
@activity.defn
async def cancellable_activity() -> str:
try:
while True:
if activity.is_cancelled():
return "cancelled"
await asyncio.sleep(0.1)
except asyncio.CancelledError:
return "cancelled"
env = ActivityEnvironment(cancellation_reason="test-cancel")
result = await env.run(cancellable_activity)
assert result == "cancelled"
```
### Testing Error Handling
**Exception Propagation**:
```python
async def test_activity_error():
"""Test activity error handling"""
@activity.defn
async def failing_activity(should_fail: bool) -> str:
if should_fail:
raise ApplicationError("Validation failed", non_retryable=True)
return "success"
env = ActivityEnvironment()
# Test success path
result = await env.run(failing_activity, False)
assert result == "success"
# Test error path
with pytest.raises(ApplicationError) as exc_info:
await env.run(failing_activity, True)
assert "Validation failed" in str(exc_info.value)
```
## Pytest Integration Patterns
### Shared Fixtures
```python
# conftest.py
import pytest
from temporalio.testing import WorkflowEnvironment
@pytest.fixture(scope="module")
async def workflow_env():
"""Module-scoped environment (reused across tests)"""
env = await WorkflowEnvironment.start_time_skipping()
yield env
await env.shutdown()
@pytest.fixture
def activity_env():
"""Function-scoped environment (fresh per test)"""
return ActivityEnvironment()
```
### Parameterized Tests
```python
@pytest.mark.parametrize("input,expected", [
("test", "TEST"),
("hello", "HELLO"),
("123", "123"),
])
async def test_activity_parameterized(activity_env, input, expected):
"""Test multiple input scenarios"""
result = await activity_env.run(process_data, input)
assert result == expected
```
## Best Practices
1. **Fast Execution**: Use time-skipping for all workflow tests
2. **Isolation**: Test workflows and activities separately
3. **Shared Fixtures**: Reuse WorkflowEnvironment across related tests
4. **Coverage Target**: ≥80% for workflow logic
5. **Mock Activities**: Use ActivityEnvironment for activity-specific logic
6. **Determinism**: Ensure test results are consistent across runs
7. **Error Cases**: Test both success and failure scenarios
## Common Patterns
**Testing Retry Logic**:
```python
@pytest.mark.asyncio
async def test_workflow_with_retries(workflow_env):
"""Test activity retry behavior"""
call_count = 0
@activity.defn
async def flaky_activity() -> str:
nonlocal call_count
call_count += 1
if call_count < 3:
raise Exception("Transient error")
return "success"
@workflow.defn
class RetryWorkflow:
@workflow.run
async def run(self) -> str:
return await workflow.execute_activity(
flaky_activity,
start_to_close_timeout=timedelta(seconds=10),
retry_policy=RetryPolicy(
initial_interval=timedelta(milliseconds=1),
maximum_attempts=5,
),
)
async with Worker(
workflow_env.client,
task_queue="test",
workflows=[RetryWorkflow],
activities=[flaky_activity],
):
result = await workflow_env.client.execute_workflow(
RetryWorkflow.run,
id="retry-wf",
task_queue="test",
)
assert result == "success"
assert call_count == 3 # Verify retry attempts
```
## Additional Resources
- Python SDK Testing: docs.temporal.io/develop/python/testing-suite
- pytest Documentation: docs.pytest.org
- Temporal Samples: github.com/temporalio/samples-python

View File

@@ -0,0 +1,286 @@
---
name: workflow-orchestration-patterns
description: Design durable workflows with Temporal for distributed systems. Covers workflow vs activity separation, saga patterns, state management, and determinism constraints. Use when building long-running processes, distributed transactions, or microservice orchestration.
---
# Workflow Orchestration Patterns
Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.
## When to Use Workflow Orchestration
### Ideal Use Cases (Source: docs.temporal.io)
- **Multi-step processes** spanning machines/services/databases
- **Distributed transactions** requiring all-or-nothing semantics
- **Long-running workflows** (hours to years) with automatic state persistence
- **Failure recovery** that must resume from last successful step
- **Business processes**: bookings, orders, campaigns, approvals
- **Entity lifecycle management**: inventory tracking, account management, cart workflows
- **Infrastructure automation**: CI/CD pipelines, provisioning, deployments
- **Human-in-the-loop** systems requiring timeouts and escalations
### When NOT to Use
- Simple CRUD operations (use direct API calls)
- Pure data processing pipelines (use Airflow, batch processing)
- Stateless request/response (use standard APIs)
- Real-time streaming (use Kafka, event processors)
## Critical Design Decision: Workflows vs Activities
**The Fundamental Rule** (Source: temporal.io/blog/workflow-engine-principles):
- **Workflows** = Orchestration logic and decision-making
- **Activities** = External interactions (APIs, databases, network calls)
### Workflows (Orchestration)
**Characteristics:**
- Contain business logic and coordination
- **MUST be deterministic** (same inputs → same outputs)
- **Cannot** perform direct external calls
- State automatically preserved across failures
- Can run for years despite infrastructure failures
**Example workflow tasks:**
- Decide which steps to execute
- Handle compensation logic
- Manage timeouts and retries
- Coordinate child workflows
### Activities (External Interactions)
**Characteristics:**
- Handle all external system interactions
- Can be non-deterministic (API calls, DB writes)
- Include built-in timeouts and retry logic
- **Must be idempotent** (calling N times = calling once)
- Short-lived (seconds to minutes typically)
**Example activity tasks:**
- Call payment gateway API
- Write to database
- Send emails or notifications
- Query external services
### Design Decision Framework
```
Does it touch external systems? → Activity
Is it orchestration/decision logic? → Workflow
```
## Core Workflow Patterns
### 1. Saga Pattern with Compensation
**Purpose**: Implement distributed transactions with rollback capability
**Pattern** (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):
```
For each step:
1. Register compensation BEFORE executing
2. Execute the step (via activity)
3. On failure, run all compensations in reverse order (LIFO)
```
**Example: Payment Workflow**
1. Reserve inventory (compensation: release inventory)
2. Charge payment (compensation: refund payment)
3. Fulfill order (compensation: cancel fulfillment)
**Critical Requirements:**
- Compensations must be idempotent
- Register compensation BEFORE executing step
- Run compensations in reverse order
- Handle partial failures gracefully
### 2. Entity Workflows (Actor Model)
**Purpose**: Long-lived workflow representing single entity instance
**Pattern** (Source: docs.temporal.io/evaluate/use-cases-design-patterns):
- One workflow execution = one entity (cart, account, inventory item)
- Workflow persists for entity lifetime
- Receives signals for state changes
- Supports queries for current state
**Example Use Cases:**
- Shopping cart (add items, checkout, expiration)
- Bank account (deposits, withdrawals, balance checks)
- Product inventory (stock updates, reservations)
**Benefits:**
- Encapsulates entity behavior
- Guarantees consistency per entity
- Natural event sourcing
### 3. Fan-Out/Fan-In (Parallel Execution)
**Purpose**: Execute multiple tasks in parallel, aggregate results
**Pattern:**
- Spawn child workflows or parallel activities
- Wait for all to complete
- Aggregate results
- Handle partial failures
**Scaling Rule** (Source: temporal.io/blog/workflow-engine-principles):
- Don't scale individual workflows
- For 1M tasks: spawn 1K child workflows × 1K tasks each
- Keep each workflow bounded
### 4. Async Callback Pattern
**Purpose**: Wait for external event or human approval
**Pattern:**
- Workflow sends request and waits for signal
- External system processes asynchronously
- Sends signal to resume workflow
- Workflow continues with response
**Use Cases:**
- Human approval workflows
- Webhook callbacks
- Long-running external processes
## State Management and Determinism
### Automatic State Preservation
**How Temporal Works** (Source: docs.temporal.io/workflows):
- Complete program state preserved automatically
- Event History records every command and event
- Seamless recovery from crashes
- Applications restore pre-failure state
### Determinism Constraints
**Workflows Execute as State Machines**:
- Replay behavior must be consistent
- Same inputs → identical outputs every time
**Prohibited in Workflows** (Source: docs.temporal.io/workflows):
- ❌ Threading, locks, synchronization primitives
- ❌ Random number generation (`random()`)
- ❌ Global state or static variables
- ❌ System time (`datetime.now()`)
- ❌ Direct file I/O or network calls
- ❌ Non-deterministic libraries
**Allowed in Workflows**:
-`workflow.now()` (deterministic time)
-`workflow.random()` (deterministic random)
- ✅ Pure functions and calculations
- ✅ Calling activities (non-deterministic operations)
### Versioning Strategies
**Challenge**: Changing workflow code while old executions still running
**Solutions**:
1. **Versioning API**: Use `workflow.get_version()` for safe changes
2. **New Workflow Type**: Create new workflow, route new executions to it
3. **Backward Compatibility**: Ensure old events replay correctly
## Resilience and Error Handling
### Retry Policies
**Default Behavior**: Temporal retries activities forever
**Configure Retry**:
- Initial retry interval
- Backoff coefficient (exponential backoff)
- Maximum interval (cap retry delay)
- Maximum attempts (eventually fail)
**Non-Retryable Errors**:
- Invalid input (validation failures)
- Business rule violations
- Permanent failures (resource not found)
### Idempotency Requirements
**Why Critical** (Source: docs.temporal.io/activities):
- Activities may execute multiple times
- Network failures trigger retries
- Duplicate execution must be safe
**Implementation Strategies**:
- Idempotency keys (deduplication)
- Check-then-act with unique constraints
- Upsert operations instead of insert
- Track processed request IDs
### Activity Heartbeats
**Purpose**: Detect stalled long-running activities
**Pattern**:
- Activity sends periodic heartbeat
- Includes progress information
- Timeout if no heartbeat received
- Enables progress-based retry
## Best Practices
### Workflow Design
1. **Keep workflows focused** - Single responsibility per workflow
2. **Small workflows** - Use child workflows for scalability
3. **Clear boundaries** - Workflow orchestrates, activities execute
4. **Test locally** - Use time-skipping test environment
### Activity Design
1. **Idempotent operations** - Safe to retry
2. **Short-lived** - Seconds to minutes, not hours
3. **Timeout configuration** - Always set timeouts
4. **Heartbeat for long tasks** - Report progress
5. **Error handling** - Distinguish retryable vs non-retryable
### Common Pitfalls
**Workflow Violations**:
- Using `datetime.now()` instead of `workflow.now()`
- Threading or async operations in workflow code
- Calling external APIs directly from workflow
- Non-deterministic logic in workflows
**Activity Mistakes**:
- Non-idempotent operations (can't handle retries)
- Missing timeouts (activities run forever)
- No error classification (retry validation errors)
- Ignoring payload limits (2MB per argument)
### Operational Considerations
**Monitoring**:
- Workflow execution duration
- Activity failure rates
- Retry attempts and backoff
- Pending workflow counts
**Scalability**:
- Horizontal scaling with workers
- Task queue partitioning
- Child workflow decomposition
- Activity batching when appropriate
## Additional Resources
**Official Documentation**:
- Temporal Core Concepts: docs.temporal.io/workflows
- Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns
- Best Practices: docs.temporal.io/develop/best-practices
- Saga Pattern: temporal.io/blog/saga-pattern-made-easy
**Key Principles**:
1. Workflows = orchestration, Activities = external calls
2. Determinism is non-negotiable for workflows
3. Idempotency is critical for activities
4. State preservation is automatic
5. Design for failure and recovery