# MCP Code Execution Best Practices

This reference document provides detailed guidance on implementing efficient MCP integrations using code execution patterns, based on [Anthropic's MCP engineering blog post](https://www.anthropic.com/engineering/code-execution-with-mcp).

## Core Principles

### 1. Progressive Disclosure

**Problem**: Loading all MCP tool definitions upfront wastes context window space.

**Solution**: Present tools as code APIs on a filesystem, allowing models to load only what they need.

```
scripts/
├── tools/
│   ├── google-drive/
│   │   ├── getDocument.ts
│   │   ├── listFiles.ts
│   │   └── index.ts
│   └── salesforce/
│       ├── updateRecord.ts
│       └── index.ts
```

**Benefits**:
- Reduces initial context from 150,000 tokens to 2,000 tokens (a 98.7% reduction)
- Scales to thousands of tools without overwhelming the model
- Tools are loaded on demand, only when needed

**Implementation**:
```python
import os

# Agent explores the filesystem to discover available tools
tools_available = os.listdir('scripts/tools/google-drive/')

# Agent reads only the tool definitions it needs
with open('scripts/tools/google-drive/getDocument.ts') as f:
    tool_code = f.read()
```

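Each file in the tree above is a thin wrapper that forwards a typed call to the MCP client. A minimal sketch of a Python equivalent of the TypeScript wrappers shown in the tree, assuming a hypothetical `call_mcp_tool` helper exposed by the harness (the module path and tool name here are illustrative):

```python
# scripts/tools/google-drive/get_document.py - a hypothetical wrapper.
# call_mcp_tool is an assumed harness helper, not a real client API.
from scripts.mcp_client import call_mcp_tool

async def get_document(document_id: str) -> dict:
    """Typed wrapper around the google-drive getDocument MCP tool."""
    return await call_mcp_tool('google_drive__get_document', {
        'documentId': document_id,
    })
```
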
### 2. Context-Efficient Data Handling

**Problem**: Intermediate results flowing through the context window consume excessive tokens.

**Bad Example**:
```
# Without code execution - all data flows through context
TOOL CALL: gdrive.getSheet(sheetId: 'abc123')
→ returns 10,000 rows to model
→ model filters in context
→ passes filtered data to next tool
```

**Good Example**:
```python
# With code execution - filter in the execution environment
sheet_data = await gdrive.getSheet({'sheetId': 'abc123'})

# Filter in the execution environment (no context cost)
pending_orders = [
    row for row in sheet_data
    if row['Status'] == 'pending' and row['Amount'] > 1000
]

# Only return a summary to the model
print(f"Found {len(pending_orders)} high-value pending orders")
print(pending_orders[:5])  # Show first 5 for review
```

**Benefits**:
- Processes 10,000 rows but only sends 5 to the model
- Reduces token usage by 99.5%
- Faster execution, lower costs

### 3. Parallel Execution

**Problem**: Sequential tool calls waste time when operations are independent.

**Bad Example**:
```python
# Sequential execution
twitter_data = await x_com.search_tweets(query)
# Wait for Twitter...
reddit_data = await reddit.search_discussions(query)
# Wait for Reddit...
```

**Good Example**:
```python
import asyncio

# Parallel execution with asyncio.gather()
twitter_task = x_com.search_tweets(query)
reddit_task = reddit.search_discussions(query)
producthunt_task = producthunt.search(query)

# Execute all concurrently
results = await asyncio.gather(
    twitter_task,
    reddit_task,
    producthunt_task
)

twitter_data, reddit_data, ph_data = results
```

**Benefits**:
- Roughly 3x faster execution (when all APIs take similar time)
- Better user experience
- Efficient resource utilization

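When sources can fail independently, `asyncio.gather(..., return_exceptions=True)` keeps one failed API from discarding the others' results. A minimal sketch, reusing the hypothetical wrappers above:

```python
import asyncio

# return_exceptions=True yields the exception object in place of a result
# instead of aborting the whole batch on the first failure.
results = await asyncio.gather(
    x_com.search_tweets(query),
    reddit.search_discussions(query),
    producthunt.search(query),
    return_exceptions=True,
)

succeeded = [r for r in results if not isinstance(r, Exception)]
print(f"{len(succeeded)}/{len(results)} sources responded")
```
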
### 4. Complex Control Flow

**Problem**: Implementing loops and conditionals via sequential tool calls is inefficient.

**Bad Example**:
```
# Agent alternates between tool calls and sleep
TOOL CALL: slack.getMessages()
→ no deployment message
SLEEP: 5 seconds
TOOL CALL: slack.getMessages()
→ no deployment message
SLEEP: 5 seconds
# ... repeat many times
```

**Good Example**:
```python
import asyncio
import time

# Implement control flow in code
async def wait_for_deployment(channel: str, timeout: int = 300):
    start_time = time.time()

    while time.time() - start_time < timeout:
        messages = await slack.getChannelHistory(channel, limit=10)

        if any('deployment complete' in m['text'].lower() for m in messages):
            return {'status': 'success', 'message': messages[0]}

        await asyncio.sleep(10)

    return {'status': 'timeout'}
```

**Benefits**:
- Single code execution instead of 60+ tool calls
- Faster end-to-end completion (no model round trip per poll)
- More reliable error handling

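A single call then replaces the entire polling exchange; a hypothetical invocation:

```python
# '#deploys' is an illustrative channel name, not from the original example.
result = await wait_for_deployment('#deploys', timeout=600)
print(result['status'])
```
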
### 5. Privacy-Preserving Operations

**Problem**: Sensitive data flowing through model context raises privacy concerns.

**Solution**: Keep sensitive data in the execution environment; only share summaries.

```python
# Load sensitive customer data
customers = await gdrive.getSheet({'sheetId': 'customer_contacts'})

# Process PII in the execution environment (never shown to the model)
for customer in customers:
    await salesforce.updateRecord({
        'objectType': 'Lead',
        'recordId': customer['salesforce_id'],
        'data': {
            'Email': customer['email'],  # PII stays in execution env
            'Phone': customer['phone'],  # PII stays in execution env
            'Name': customer['name']     # PII stays in execution env
        }
    })

# Only a summary goes to the model
print(f"Updated {len(customers)} customer records")
print("✓ All contact information synchronized")
```

**Optional Enhancement**: Tokenize PII automatically in the MCP client:
```python
# What the model sees (if PII is tokenized):
[
    {'email': '[EMAIL_1]', 'phone': '[PHONE_1]', 'name': '[NAME_1]'},
    {'email': '[EMAIL_2]', 'phone': '[PHONE_2]', 'name': '[NAME_2]'}
]

# Real data flows Google Sheets → Salesforce without entering model context
```

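A minimal tokenization sketch, assuming the MCP client intercepts tool results before they reach the model; the regexes and token naming scheme here are illustrative, not a real client feature:

```python
import re

# Maps placeholder tokens back to real values; kept client-side only.
_lookup: dict = {}
_counters = {'EMAIL': 0, 'PHONE': 0}

def tokenize_pii(text: str) -> str:
    """Replace emails and phone numbers with stable placeholder tokens."""
    patterns = {
        'EMAIL': r'[\w.+-]+@[\w-]+\.[\w.]+',
        'PHONE': r'\+?\d[\d\s().-]{7,}\d',
    }
    for kind, pattern in patterns.items():
        def substitute(match, kind=kind):
            _counters[kind] += 1
            token = f'[{kind}_{_counters[kind]}]'
            _lookup[token] = match.group(0)
            return token
        text = re.sub(pattern, substitute, text)
    return text

def detokenize_pii(text: str) -> str:
    """Restore real values before forwarding arguments to the next tool."""
    for token, value in _lookup.items():
        text = text.replace(token, value)
    return text
```

Stable tokens let the model refer to `[EMAIL_1]` in a later tool call; the client detokenizes the arguments on the way out, so real values never enter model context.
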
### 6. State Persistence and Skills

**Problem**: Agents cannot build on previous work without memory.

**Solution**: Use the filesystem to persist intermediate results and reusable functions.

**State Persistence**:
```python
# Save intermediate results
import json

intermediate_data = await fetch_and_process()

with open('./workspace/state.json', 'w') as f:
    json.dump(intermediate_data, f)

# A later execution picks up where it left off
with open('./workspace/state.json') as f:
    state = json.load(f)
```

**Skill Evolution**:
```python
# Save a reusable function as a skill
# In ./skills/save_sheet_as_csv.py (underscores, so the module stays importable)
import pandas as pd
from scripts.tools import gdrive

async def save_sheet_as_csv(sheet_id: str, output_path: str):
    """Reusable function to export a Google Sheet as CSV"""
    data = await gdrive.getSheet({'sheetId': sheet_id})
    df = pd.DataFrame(data)
    df.to_csv(output_path, index=False)
    return output_path

# Later, in any workflow:
from skills.save_sheet_as_csv import save_sheet_as_csv

csv_path = await save_sheet_as_csv('abc123', './data/export.csv')
```

**Add a SKILL.md** to create a structured skill:
````markdown
---
name: sheet-csv-exporter
description: Export Google Sheets to CSV format
---

# Sheet CSV Exporter

Provides a reusable function for exporting Google Sheets to CSV files.

## Usage

```python
from skills.save_sheet_as_csv import save_sheet_as_csv

csv_path = await save_sheet_as_csv(
    sheet_id='your-sheet-id',
    output_path='./output/data.csv'
)
```
````

## Token Usage Comparison

| Approach | Token Usage | Latency | Privacy |
|----------|-------------|---------|---------|
| **Direct Tool Calls** | 150,000+ tokens (all tool definitions loaded) | High (sequential calls) | ⚠️ All data through context |
| **Code Execution with MCP** | ~2,000 tokens (loaded on demand) | Low (parallel execution) | ✅ Data filtered/tokenized |

**Savings**: 98.7% token reduction, 3-5x faster execution

## When to Use Code Execution

✅ **Use code execution when**:
- Working with many MCP tools (>10 tools)
- Processing large datasets (>1,000 rows)
- You need parallel API calls
- The workflow involves loops or conditionals
- Privacy concerns apply to sensitive data
- Building reusable workflows

❌ **Avoid code execution when**:
- A single tool call suffices
- Data volumes are small
- The task is quick and ad hoc
- There are no performance concerns
- No execution environment is available

## Implementation Considerations

### Security
- Sandbox the execution environment properly
- Limit resource usage (CPU, memory, time; see the sketch after this list)
- Monitor for malicious code patterns
- Validate all inputs

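A minimal resource-limiting sketch for a POSIX host, assuming agent-written scripts run as child processes; production deployments would typically use containers or a dedicated sandbox instead:

```python
import resource
import subprocess

def run_sandboxed(script_path: str, timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Run an agent-written script with CPU, memory, and wall-clock caps."""
    def limit_resources():
        # Applied in the child before exec: cap CPU seconds and address space.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 ** 2, 512 * 1024 ** 2))

    return subprocess.run(
        ['python', script_path],
        preexec_fn=limit_resources,  # POSIX-only
        capture_output=True,
        timeout=timeout_s,           # wall-clock cap enforced by the parent
    )
```
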
### Error Handling
```python
import logging

logger = logging.getLogger(__name__)

try:
    result = await mcp_tool(params)
except Exception as e:
    # Log the error
    logger.error(f"MCP tool failed: {e}")
    # Return a graceful fallback
    return {'error': str(e), 'status': 'failed'}
```

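Transient failures such as rate limits and timeouts are often worth retrying before falling back. A hedged sketch of a retry wrapper, where `tool` is any awaitable MCP wrapper from the examples above:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def call_with_retries(tool, params, attempts: int = 3, base_delay: float = 1.0):
    """Retry an MCP tool call with exponential backoff, then fall back."""
    for attempt in range(attempts):
        try:
            return await tool(params)
        except Exception as e:
            logger.warning(f"MCP call failed ({attempt + 1}/{attempts}): {e}")
            if attempt == attempts - 1:
                return {'error': str(e), 'status': 'failed'}
            await asyncio.sleep(base_delay * 2 ** attempt)
```
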
### Testing
- Test scripts in isolation
- Mock MCP tool responses (see the sketch after this list)
- Verify error handling
- Check performance gains

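A minimal mocking sketch using only the standard library; `gdrive` here stands in for the hypothetical wrapper modules used throughout this document:

```python
import unittest
from unittest.mock import AsyncMock

class TestPdfFiltering(unittest.IsolatedAsyncioTestCase):
    async def test_only_pdfs_survive_filtering(self):
        # Stand-in for the real MCP wrapper; returns canned data.
        gdrive = AsyncMock()
        gdrive.listFiles.return_value = [
            {'id': '1', 'type': 'pdf'},
            {'id': '2', 'type': 'docx'},
        ]

        files = await gdrive.listFiles({'folderId': 'folder-1'})
        pdf_files = [f for f in files if f['type'] == 'pdf']

        self.assertEqual([f['id'] for f in pdf_files], ['1'])
        gdrive.listFiles.assert_awaited_once()

if __name__ == '__main__':
    unittest.main()
```
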
## Examples from Production

### Example 1: Document Processing Pipeline
```python
async def process_contracts(folder_id: str):
    """Process all contracts in a folder"""
    # 1. List all files (single MCP call)
    files = await gdrive.listFiles({'folderId': folder_id})

    # 2. Filter in the execution environment
    pdf_files = [f for f in files if f['type'] == 'pdf']

    # 3. Process in parallel (extract_contract_data is a workflow helper)
    results = await asyncio.gather(*[
        extract_contract_data(f['id'])
        for f in pdf_files
    ])

    # 4. Aggregate and save
    summary = aggregate_contract_summary(results)

    # Only the summary goes to the model
    return {
        'total_contracts': len(pdf_files),
        'processed': len(results),
        'summary': summary[:500]  # Truncate for context
    }
```

### Example 2: Social Media Monitoring
```python
async def monitor_brand_mentions(brand: str):
    """Monitor brand mentions across multiple platforms"""
    # Fetch from multiple sources in parallel
    twitter_task = x_com.search_tweets(f'"{brand}"')
    reddit_task = reddit.search(brand, subreddits=['technology'])
    hn_task = hackernews.search(brand)

    mentions = await asyncio.gather(
        twitter_task, reddit_task, hn_task
    )

    # Sentiment analysis in the execution environment
    sentiment = analyze_sentiment_batch(mentions)

    # Filter and aggregate (helpers defined elsewhere in the workflow)
    recent_mentions = filter_last_24h(mentions)
    key_insights = extract_key_insights(recent_mentions)

    return {
        'mention_count': len(recent_mentions),
        'sentiment': sentiment,
        'key_insights': key_insights,
        'platforms': {
            'twitter': len(mentions[0]),
            'reddit': len(mentions[1]),
            'hackernews': len(mentions[2])
        }
    }
```

## Further Reading

- [MCP Official Documentation](https://modelcontextprotocol.io/)
- [Anthropic MCP Engineering Blog](https://www.anthropic.com/engineering/code-execution-with-mcp)
- [Cloudflare Code Mode](https://blog.cloudflare.com/code-mode/)