# Code Interpreter Guide

Complete guide to using Code Interpreter with the Assistants API.

---

## What is Code Interpreter?

A built-in tool that executes Python code in a sandboxed environment, enabling:
- Data analysis and processing
- Mathematical computations
- Chart and graph generation
- File parsing (CSV, JSON, Excel, etc.)
- Data transformations

---

## Setup

```typescript
import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();

const assistant = await openai.beta.assistants.create({
  name: "Data Analyst",
  instructions: "You analyze data and create visualizations.",
  tools: [{ type: "code_interpreter" }],
  model: "gpt-4o",
});
```

---

## File Uploads

### Upload Data Files

```typescript
import fs from "fs";

// Upload the file once; the returned file.id is what gets attached to messages
const file = await openai.files.create({
  file: fs.createReadStream("data.csv"),
  purpose: "assistants",
});
```

### Attach to Messages

```typescript
// Listing code_interpreter under tools makes the file available inside the sandbox
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Analyze this sales data",
  attachments: [{
    file_id: file.id,
    tools: [{ type: "code_interpreter" }],
  }],
});
```

---

## Supported File Formats

**Data Files**:
- `.csv`, `.json`, `.xlsx` - Tabular data
- `.txt`, `.md` - Text files
- `.pdf`, `.docx`, `.pptx` - Documents (text extraction)

**Code Files**:
- `.py`, `.js`, `.ts`, `.java`, `.cpp` - Source code

**Images** (for processing, not vision):
- `.png`, `.jpg`, `.jpeg`, `.gif` - Image manipulation

**Archives**:
- `.zip`, `.tar` - Compressed files

**Size Limit**: 512 MB per file

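
A check before uploading can catch unsupported or oversized files before they reach the API. The following is a minimal sketch, not part of the SDK: `checkUpload`, `SUPPORTED_EXTENSIONS`, and `MAX_FILE_BYTES` are hypothetical names, and the extension list simply mirrors the formats listed in this guide rather than an official list. An empty result means the file passed both checks.

```typescript
// Extensions copied from the list in this guide (an assumption, not an official list).
const SUPPORTED_EXTENSIONS: string[] = [
  ".csv", ".json", ".xlsx", ".txt", ".md", ".pdf", ".docx", ".pptx",
  ".py", ".js", ".ts", ".java", ".cpp",
  ".png", ".jpg", ".jpeg", ".gif",
  ".zip", ".tar",
];

// 512 MB per file, as stated in this guide.
const MAX_FILE_BYTES = 512 * 1024 * 1024;

// Hypothetical helper: returns a list of problems found.
// An empty list means the file looks uploadable.
function checkUpload(filename: string, sizeBytes: number): string[] {
  const problems: string[] = [];
  const dot = filename.lastIndexOf(".");
  // Compare extensions case-insensitively; a name without a dot has no extension.
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    problems.push("unsupported extension: " + (ext || "(none)"));
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push("file is " + sizeBytes + " bytes, over the " + MAX_FILE_BYTES + "-byte limit");
  }
  return problems;
}

console.log(checkUpload("sales.csv", 1024));
console.log(checkUpload("dump.parquet", 600 * 1024 * 1024));
```
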

---

## Common Use Cases

### 1. Data Analysis

```typescript
const thread = await openai.beta.threads.create({
  messages: [{
    role: "user",
    content: "Calculate the average, median, and standard deviation of the revenue column",
    attachments: [{
      file_id: csvFileId,
      tools: [{ type: "code_interpreter" }],
    }],
  }],
});
```

### 2. Data Visualization

```typescript
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Create a line chart showing revenue over time",
});

// After the run completes, download the generated image.
// messages.list returns newest first by default, so data[0] is the latest reply.
const messages = await openai.beta.threads.messages.list(thread.id);
for (const content of messages.data[0].content) {
  if (content.type === 'image_file') {
    const imageData = await openai.files.content(content.image_file.file_id);
    const buffer = Buffer.from(await imageData.arrayBuffer());
    fs.writeFileSync('chart.png', buffer);
  }
}
```

### 3. File Conversion

```typescript
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Convert this Excel file to CSV format",
  attachments: [{
    file_id: excelFileId,
    tools: [{ type: "code_interpreter" }],
  }],
});
```

---

## Retrieving Outputs

### Text Output

```typescript
const messages = await openai.beta.threads.messages.list(thread.id);
const response = messages.data[0];

for (const content of response.content) {
  if (content.type === 'text') {
    console.log(content.text.value);
  }
}
```

### Generated Files (Charts, CSVs)

```typescript
// `response` is the latest message (messages.data[0]) from the Text Output example
for (const content of response.content) {
  if (content.type === 'image_file') {
    const fileId = content.image_file.file_id;
    const data = await openai.files.content(fileId);
    const buffer = Buffer.from(await data.arrayBuffer());
    fs.writeFileSync(`output_${fileId}.png`, buffer);
  }
}
```

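
The loop above only handles `image_file` content. Non-image files that Code Interpreter generates (such as a CSV export) are referenced through `file_path` annotations on text content instead. Below is a minimal sketch of collecting both kinds of file ID, assuming trimmed-down message shapes; `collectGeneratedFileIds`, the type names, and the sample IDs are hypothetical. Each returned ID can then be passed to `openai.files.content` as in the example above.

```typescript
// Trimmed-down shapes for the parts of a message used here (an assumption;
// the real API objects carry more fields).
type Annotation =
  | { type: "file_path"; text: string; file_path: { file_id: string } }
  | { type: "file_citation"; text: string };

type ContentPart =
  | { type: "image_file"; image_file: { file_id: string } }
  | { type: "text"; text: { value: string; annotations: Annotation[] } };

interface GeneratedFiles {
  images: string[]; // file IDs of generated charts
  files: string[];  // file IDs of other generated files, e.g. CSV exports
}

// Hypothetical helper: gather every downloadable file ID from one message's content.
function collectGeneratedFileIds(content: ContentPart[]): GeneratedFiles {
  const result: GeneratedFiles = { images: [], files: [] };
  for (const part of content) {
    if (part.type === "image_file") {
      result.images.push(part.image_file.file_id);
    } else {
      // Text parts may link to sandbox files through file_path annotations.
      for (const annotation of part.text.annotations) {
        if (annotation.type === "file_path") {
          result.files.push(annotation.file_path.file_id);
        }
      }
    }
  }
  return result;
}

// Example with hand-written data (the file IDs are made up).
const sampleContent: ContentPart[] = [
  { type: "image_file", image_file: { file_id: "file-chart-1" } },
  {
    type: "text",
    text: {
      value: "Download the result [here](sandbox:/mnt/data/result.csv).",
      annotations: [
        { type: "file_path", text: "sandbox:/mnt/data/result.csv", file_path: { file_id: "file-csv-1" } },
      ],
    },
  },
];
console.log(collectGeneratedFileIds(sampleContent));
```
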
### Execution Logs

```typescript
const runSteps = await openai.beta.threads.runs.steps.list(thread.id, run.id);

for (const step of runSteps.data) {
  if (step.step_details.type === 'tool_calls') {
    for (const toolCall of step.step_details.tool_calls) {
      if (toolCall.type === 'code_interpreter') {
        console.log('Code:', toolCall.code_interpreter.input);
        console.log('Output:', toolCall.code_interpreter.outputs);
      }
    }
  }
}
```

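
Logging `outputs` directly prints an array of objects, where each entry is either a `logs` output or an `image` output. The sketch below flattens run steps into one summary per Code Interpreter call. It assumes trimmed-down step shapes, and `summarizeCodeCalls` and `CodeCallSummary` are hypothetical names, not SDK exports. Passing `runSteps.data` from the example above is the intended use.

```typescript
// Trimmed-down shapes for run steps (an assumption; real objects carry more fields).
type CodeOutput =
  | { type: "logs"; logs: string }
  | { type: "image"; image: { file_id: string } };

type ToolCall =
  | { type: "code_interpreter"; code_interpreter: { input: string; outputs: CodeOutput[] } }
  | { type: "function" }
  | { type: "file_search" };

type StepDetails =
  | { type: "tool_calls"; tool_calls: ToolCall[] }
  | { type: "message_creation" };

interface RunStep {
  step_details: StepDetails;
}

interface CodeCallSummary {
  code: string;           // the Python source that was executed
  logs: string;           // all log outputs joined with newlines
  imageFileIds: string[]; // file IDs of images produced by this call
}

// Hypothetical helper: one summary per code_interpreter tool call, in step order.
function summarizeCodeCalls(steps: RunStep[]): CodeCallSummary[] {
  const summaries: CodeCallSummary[] = [];
  for (const step of steps) {
    const details = step.step_details;
    if (details.type !== "tool_calls") continue;
    for (const call of details.tool_calls) {
      if (call.type !== "code_interpreter") continue;
      const logs: string[] = [];
      const imageFileIds: string[] = [];
      for (const output of call.code_interpreter.outputs) {
        if (output.type === "logs") {
          logs.push(output.logs);
        } else {
          imageFileIds.push(output.image.file_id);
        }
      }
      summaries.push({ code: call.code_interpreter.input, logs: logs.join("\n"), imageFileIds });
    }
  }
  return summaries;
}
```
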

---

## Python Environment

### Available Libraries

The Code Interpreter sandbox includes common libraries:
- **Data**: pandas, numpy
- **Math**: scipy, sympy
- **Plotting**: matplotlib, seaborn
- **ML**: scikit-learn (limited)
- **Utils**: requests, PIL, csv, json

**Note**: Not all PyPI packages are available. Use the standard library where possible.

### Environment Limits

- **Execution Time**: Counts toward the 10-minute run limit
- **Memory**: Limited (exact amount not documented)
- **Disk Space**: Files persist during the run only
- **Network**: No outbound internet access

---

## Best Practices

### 1. Clear Instructions

```typescript
// ❌ Vague
"Analyze the data"

// ✅ Specific
"Calculate the mean, median, and mode for each numeric column. Create a bar chart comparing these metrics."
```

### 2. Download Files Immediately

```typescript
// Helper: fetch a generated file and save it to disk
async function downloadFile(fileId: string) {
  const data = await openai.files.content(fileId);
  fs.writeFileSync(`output_${fileId}.png`, Buffer.from(await data.arrayBuffer()));
}

// Generated files are temporary - download right after completion
if (run.status === 'completed') {
  const messages = await openai.beta.threads.messages.list(thread.id);
  // Download all image files immediately
  for (const message of messages.data) {
    for (const content of message.content) {
      if (content.type === 'image_file') {
        await downloadFile(content.image_file.file_id);
      }
    }
  }
}
```

### 3. Error Handling

```typescript
const runSteps = await openai.beta.threads.runs.steps.list(thread.id, run.id);

// Python exceptions raised in the sandbox appear in the `logs` outputs of a tool call
for (const step of runSteps.data) {
  if (step.step_details.type === 'tool_calls') {
    for (const toolCall of step.step_details.tool_calls) {
      if (toolCall.type === 'code_interpreter') {
        const outputs = toolCall.code_interpreter.outputs;
        for (const output of outputs) {
          if (output.type === 'logs' && output.logs.includes('Error')) {
            console.error('Execution error:', output.logs);
          }
        }
      }
    }
  }
}
```

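
The loop above only catches Python errors that reach the logs. A run can also end without completing at all, in which case the run object's `status` and `last_error` fields are the place to look. The following is a minimal sketch assuming a trimmed-down run shape; `describeRunProblem` and `RunLike` are hypothetical names, and the returned messages are illustrative wording rather than API output.

```typescript
type RunStatus =
  | "queued" | "in_progress" | "requires_action" | "cancelling"
  | "cancelled" | "failed" | "completed" | "incomplete" | "expired";

// Trimmed-down view of a run (an assumption; the real object carries more fields).
interface RunLike {
  status: RunStatus;
  last_error: { code: string; message: string } | null;
}

// Hypothetical helper: returns null when the run completed,
// otherwise a short human-readable description of what went wrong.
function describeRunProblem(run: RunLike): string | null {
  switch (run.status) {
    case "completed":
      return null;
    case "failed":
      return run.last_error
        ? "failed (" + run.last_error.code + "): " + run.last_error.message
        : "failed without error details";
    case "expired":
      return "expired before finishing";
    case "cancelling":
    case "cancelled":
      return "cancelled";
    case "incomplete":
      return "ended incomplete";
    default:
      // queued, in_progress, requires_action: the run has not finished yet
      return "not finished (status: " + run.status + ")";
  }
}

console.log(describeRunProblem({ status: "failed", last_error: { code: "server_error", message: "Something went wrong" } }));
```
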

---

## Common Patterns

### Pattern: Iterative Analysis

```typescript
// sendMessage is a helper you define: it posts a user message to the thread,
// starts a run, and waits for the run to complete.

// 1. Upload data
const file = await openai.files.create({...});

// 2. Initial analysis
await sendMessage("What are the columns and data types?");

// 3. Follow-up based on results
await sendMessage("Show the distribution of the 'category' column");

// 4. Visualization
await sendMessage("Create a heatmap of correlations between numeric columns");
```

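
The `sendMessage` helper used above is not part of the SDK. One possible shape for it is sketched below, under these assumptions: `ThreadsApi` is a hypothetical, narrowed interface standing in for `openai.beta.threads`, `makeSendMessage` is a hypothetical factory, and the run is awaited with the SDK's `createAndPoll` helper. The interface is kept narrow so the function can be exercised with a fake client; whether the real client satisfies it without a cast depends on the SDK version.

```typescript
// Narrow view of the client surface this sketch relies on (an assumption;
// the real SDK client exposes far more).
interface ThreadsApi {
  messages: {
    create(threadId: string, body: { role: "user"; content: string }): Promise<unknown>;
    list(threadId: string): Promise<{
      data: Array<{ content: Array<{ type: string; text?: { value: string } }> }>;
    }>;
  };
  runs: {
    createAndPoll(threadId: string, body: { assistant_id: string }): Promise<{ status: string }>;
  };
}

// Hypothetical factory: binds a thread and an assistant, returns a sendMessage function.
function makeSendMessage(threads: ThreadsApi, threadId: string, assistantId: string) {
  return async function sendMessage(content: string): Promise<string> {
    // 1. Post the user message.
    await threads.messages.create(threadId, { role: "user", content });

    // 2. Start a run and wait until it reaches a terminal state.
    const run = await threads.runs.createAndPoll(threadId, { assistant_id: assistantId });
    if (run.status !== "completed") {
      throw new Error("Run ended with status: " + run.status);
    }

    // 3. Return the text of the newest message (the list is newest first by default).
    const messages = await threads.messages.list(threadId);
    const latest = messages.data[0];
    const texts: string[] = [];
    for (const part of latest ? latest.content : []) {
      if (part.type === "text" && part.text) {
        texts.push(part.text.value);
      }
    }
    return texts.join("\n");
  };
}
```
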
### Pattern: Multi-File Processing

```typescript
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Merge these two CSV files on the 'id' column",
  attachments: [
    { file_id: file1Id, tools: [{ type: "code_interpreter" }] },
    { file_id: file2Id, tools: [{ type: "code_interpreter" }] },
  ],
});
```

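
When the number of files varies, the `attachments` array can be built from a list of file IDs instead of being written out by hand. A small sketch follows; `toAttachments` and `CodeInterpreterAttachment` are hypothetical names, and dropping duplicate IDs is a design choice of this sketch rather than an API requirement.

```typescript
interface CodeInterpreterAttachment {
  file_id: string;
  tools: Array<{ type: "code_interpreter" }>;
}

// Hypothetical helper: one attachment per distinct file ID, in first-seen order.
function toAttachments(fileIds: string[]): CodeInterpreterAttachment[] {
  const unique = fileIds.filter((id, index) => fileIds.indexOf(id) === index);
  return unique.map((id): CodeInterpreterAttachment => ({
    file_id: id,
    tools: [{ type: "code_interpreter" }],
  }));
}

// The file IDs here are made up for illustration.
console.log(JSON.stringify(toAttachments(["file-1", "file-2", "file-1"])));
```
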

---

## Troubleshooting

### Issue: Code Execution Fails

**Symptoms**: Run completes but produces no output, or the logs show an error

**Solutions**:
- Check file format compatibility
- Verify the file isn't corrupted
- Ensure data is in the expected format (headers, encoding)
- Try a simpler request first to verify setup

### Issue: Generated Files Not Found

**Symptoms**: The file referenced by `image_file.file_id` can't be found

**Solutions**:
- Download immediately after the run completes
- Check run steps for actual outputs
- Verify code execution succeeded

### Issue: Timeout on Large Files

**Symptoms**: Run exceeds 10-minute limit

**Solutions**:
- Split large files into smaller chunks
- Request specific analysis (not "analyze everything")
- Use sampling for exploratory analysis

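
Splitting can be done locally before upload. The sketch below breaks CSV text into chunks of at most `maxRows` data rows and repeats the header in each chunk, so every chunk is a valid CSV on its own. `splitCsv` is a hypothetical helper, and it assumes one record per line (no quoted fields that contain newlines). Taking only the first chunk is also a cheap way to get a sample for exploratory analysis.

```typescript
// Hypothetical helper: split CSV text into chunks of at most maxRows data rows.
// The header line is repeated at the top of every chunk.
// Assumes one record per line; a header-only input yields no chunks.
function splitCsv(csvText: string, maxRows: number): string[] {
  if (maxRows < 1) {
    throw new Error("maxRows must be at least 1");
  }
  // Accept both \n and \r\n line endings and drop empty lines.
  const lines = csvText.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) {
    return [];
  }
  const header = lines[0];
  const rows = lines.slice(1);
  const chunks: string[] = [];
  for (let start = 0; start < rows.length; start += maxRows) {
    chunks.push([header].concat(rows.slice(start, start + maxRows)).join("\n"));
  }
  return chunks;
}

const parts = splitCsv("id,value\n1,a\n2,b\n3,c\n", 2);
console.log(parts.length, "chunks");
```
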

---

## Example Prompts

**Data Exploration**:
- "Summarize this dataset: shape, columns, data types, missing values"
- "Show the first 10 rows"
- "What are the unique values in the 'status' column?"

**Statistical Analysis**:
- "Calculate descriptive statistics for all numeric columns"
- "Perform correlation analysis between price and quantity"
- "Detect outliers using the IQR method"

**Visualization**:
- "Create a histogram of the 'age' distribution"
- "Plot revenue trends over time with a moving average"
- "Generate a scatter plot of height vs weight, colored by gender"

**Data Transformation**:
- "Remove rows with missing values"
- "Normalize the 'sales' column to 0-1 range"
- "Convert dates to YYYY-MM-DD format"

---

**Last Updated**: 2025-10-25