Initial commit
This commit is contained in:
30
skills/xlsx/LICENSE.txt
Normal file
30
skills/xlsx/LICENSE.txt
Normal file
@@ -0,0 +1,30 @@
|
||||
© 2025 Anthropic, PBC. All rights reserved.
|
||||
|
||||
LICENSE: Use of these materials (including all code, prompts, assets, files,
|
||||
and other components of this Skill) is governed by your agreement with
|
||||
Anthropic regarding use of Anthropic's services. If no separate agreement
|
||||
exists, use is governed by Anthropic's Consumer Terms of Service or
|
||||
Commercial Terms of Service, as applicable:
|
||||
https://www.anthropic.com/legal/consumer-terms
|
||||
https://www.anthropic.com/legal/commercial-terms
|
||||
Your applicable agreement is referred to as the "Agreement." "Services" are
|
||||
as defined in the Agreement.
|
||||
|
||||
ADDITIONAL RESTRICTIONS: Notwithstanding anything in the Agreement to the
|
||||
contrary, users may not:
|
||||
|
||||
- Extract these materials from the Services or retain copies of these
|
||||
materials outside the Services
|
||||
- Reproduce or copy these materials, except for temporary copies created
|
||||
automatically during authorized use of the Services
|
||||
- Create derivative works based on these materials
|
||||
- Distribute, sublicense, or transfer these materials to any third party
|
||||
- Make, offer to sell, sell, or import any inventions embodied in these
|
||||
materials
|
||||
- Reverse engineer, decompile, or disassemble these materials
|
||||
|
||||
The receipt, viewing, or possession of these materials does not convey or
|
||||
imply any license or right beyond those expressly granted above.
|
||||
|
||||
Anthropic retains all right, title, and interest in these materials,
|
||||
including all copyrights, patents, and other intellectual property rights.
|
||||
289
skills/xlsx/SKILL.md
Normal file
289
skills/xlsx/SKILL.md
Normal file
@@ -0,0 +1,289 @@
|
||||
---
|
||||
name: xlsx
|
||||
description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas"
|
||||
license: Proprietary. LICENSE.txt has complete terms
|
||||
---
|
||||
|
||||
# Requirements for Outputs
|
||||
|
||||
## All Excel files
|
||||
|
||||
### Zero Formula Errors
|
||||
- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
|
||||
|
||||
### Preserve Existing Templates (when updating templates)
|
||||
- Study and EXACTLY match existing format, style, and conventions when modifying files
|
||||
- Never impose standardized formatting on files with established patterns
|
||||
- Existing template conventions ALWAYS override these guidelines
|
||||
|
||||
## Financial models
|
||||
|
||||
### Color Coding Standards
|
||||
Unless otherwise stated by the user or existing template
|
||||
|
||||
#### Industry-Standard Color Conventions
|
||||
- **Blue text (RGB: 0,0,255)**: Hardcoded inputs, and numbers users will change for scenarios
|
||||
- **Black text (RGB: 0,0,0)**: ALL formulas and calculations
|
||||
- **Green text (RGB: 0,128,0)**: Links pulling from other worksheets within same workbook
|
||||
- **Red text (RGB: 255,0,0)**: External links to other files
|
||||
- **Yellow background (RGB: 255,255,0)**: Key assumptions needing attention or cells that need to be updated
|
||||
|
||||
### Number Formatting Standards
|
||||
|
||||
#### Required Format Rules
|
||||
- **Years**: Format as text strings (e.g., "2024" not "2,024")
|
||||
- **Currency**: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
|
||||
- **Zeros**: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
|
||||
- **Percentages**: Default to 0.0% format (one decimal)
|
||||
- **Multiples**: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
|
||||
- **Negative numbers**: Use parentheses (123) not minus -123
|
||||
|
||||
### Formula Construction Rules
|
||||
|
||||
#### Assumptions Placement
|
||||
- Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
|
||||
- Use cell references instead of hardcoded values in formulas
|
||||
- Example: Use =B5*(1+$B$6) instead of =B5*1.05
|
||||
|
||||
#### Formula Error Prevention
|
||||
- Verify all cell references are correct
|
||||
- Check for off-by-one errors in ranges
|
||||
- Ensure consistent formulas across all projection periods
|
||||
- Test with edge cases (zero values, negative numbers)
|
||||
- Verify no unintended circular references
|
||||
|
||||
#### Documentation Requirements for Hardcodes
|
||||
- Comment or in cells beside (if end of table). Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
|
||||
- Examples:
|
||||
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
|
||||
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
|
||||
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
|
||||
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
|
||||
|
||||
# XLSX creation, editing, and analysis
|
||||
|
||||
## Overview
|
||||
|
||||
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
|
||||
|
||||
## Important Requirements
|
||||
|
||||
**LibreOffice Required for Formula Recalculation**: You can assume LibreOffice is installed for recalculating formula values using the `recalc.py` script. The script automatically configures LibreOffice on first run
|
||||
|
||||
## Reading and analyzing data
|
||||
|
||||
### Data analysis with pandas
|
||||
For data analysis, visualization, and basic operations, use **pandas** which provides powerful data manipulation capabilities:
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
|
||||
# Read Excel
|
||||
df = pd.read_excel('file.xlsx') # Default: first sheet
|
||||
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
|
||||
|
||||
# Analyze
|
||||
df.head() # Preview data
|
||||
df.info() # Column info
|
||||
df.describe() # Statistics
|
||||
|
||||
# Write Excel
|
||||
df.to_excel('output.xlsx', index=False)
|
||||
```
|
||||
|
||||
## Excel File Workflows
|
||||
|
||||
## CRITICAL: Use Formulas, Not Hardcoded Values
|
||||
|
||||
**Always use Excel formulas instead of calculating values in Python and hardcoding them.** This ensures the spreadsheet remains dynamic and updateable.
|
||||
|
||||
### ❌ WRONG - Hardcoding Calculated Values
|
||||
```python
|
||||
# Bad: Calculating in Python and hardcoding result
|
||||
total = df['Sales'].sum()
|
||||
sheet['B10'] = total # Hardcodes 5000
|
||||
|
||||
# Bad: Computing growth rate in Python
|
||||
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
|
||||
sheet['C5'] = growth # Hardcodes 0.15
|
||||
|
||||
# Bad: Python calculation for average
|
||||
avg = sum(values) / len(values)
|
||||
sheet['D20'] = avg # Hardcodes 42.5
|
||||
```
|
||||
|
||||
### ✅ CORRECT - Using Excel Formulas
|
||||
```python
|
||||
# Good: Let Excel calculate the sum
|
||||
sheet['B10'] = '=SUM(B2:B9)'
|
||||
|
||||
# Good: Growth rate as Excel formula
|
||||
sheet['C5'] = '=(C4-C2)/C2'
|
||||
|
||||
# Good: Average using Excel function
|
||||
sheet['D20'] = '=AVERAGE(D2:D19)'
|
||||
```
|
||||
|
||||
This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
|
||||
|
||||
## Common Workflow
|
||||
1. **Choose tool**: pandas for data, openpyxl for formulas/formatting
|
||||
2. **Create/Load**: Create new workbook or load existing file
|
||||
3. **Modify**: Add/edit data, formulas, and formatting
|
||||
4. **Save**: Write to file
|
||||
5. **Recalculate formulas (MANDATORY IF USING FORMULAS)**: Use the recalc.py script
|
||||
```bash
|
||||
python recalc.py output.xlsx
|
||||
```
|
||||
6. **Verify and fix any errors**:
|
||||
- The script returns JSON with error details
|
||||
- If `status` is `errors_found`, check `error_summary` for specific error types and locations
|
||||
- Fix the identified errors and recalculate again
|
||||
- Common errors to fix:
|
||||
- `#REF!`: Invalid cell references
|
||||
- `#DIV/0!`: Division by zero
|
||||
- `#VALUE!`: Wrong data type in formula
|
||||
- `#NAME?`: Unrecognized formula name
|
||||
|
||||
### Creating new Excel files
|
||||
|
||||
```python
|
||||
# Using openpyxl for formulas and formatting
|
||||
from openpyxl import Workbook
|
||||
from openpyxl.styles import Font, PatternFill, Alignment
|
||||
|
||||
wb = Workbook()
|
||||
sheet = wb.active
|
||||
|
||||
# Add data
|
||||
sheet['A1'] = 'Hello'
|
||||
sheet['B1'] = 'World'
|
||||
sheet.append(['Row', 'of', 'data'])
|
||||
|
||||
# Add formula
|
||||
sheet['B2'] = '=SUM(A1:A10)'
|
||||
|
||||
# Formatting
|
||||
sheet['A1'].font = Font(bold=True, color='FF0000')
|
||||
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
|
||||
sheet['A1'].alignment = Alignment(horizontal='center')
|
||||
|
||||
# Column width
|
||||
sheet.column_dimensions['A'].width = 20
|
||||
|
||||
wb.save('output.xlsx')
|
||||
```
|
||||
|
||||
### Editing existing Excel files
|
||||
|
||||
```python
|
||||
# Using openpyxl to preserve formulas and formatting
|
||||
from openpyxl import load_workbook
|
||||
|
||||
# Load existing file
|
||||
wb = load_workbook('existing.xlsx')
|
||||
sheet = wb.active # or wb['SheetName'] for specific sheet
|
||||
|
||||
# Working with multiple sheets
|
||||
for sheet_name in wb.sheetnames:
|
||||
sheet = wb[sheet_name]
|
||||
print(f"Sheet: {sheet_name}")
|
||||
|
||||
# Modify cells
|
||||
sheet['A1'] = 'New Value'
|
||||
sheet.insert_rows(2) # Insert row at position 2
|
||||
sheet.delete_cols(3) # Delete column 3
|
||||
|
||||
# Add new sheet
|
||||
new_sheet = wb.create_sheet('NewSheet')
|
||||
new_sheet['A1'] = 'Data'
|
||||
|
||||
wb.save('modified.xlsx')
|
||||
```
|
||||
|
||||
## Recalculating formulas
|
||||
|
||||
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided `recalc.py` script to recalculate formulas:
|
||||
|
||||
```bash
|
||||
python recalc.py <excel_file> [timeout_seconds]
|
||||
```
|
||||
|
||||
Example:
|
||||
```bash
|
||||
python recalc.py output.xlsx 30
|
||||
```
|
||||
|
||||
The script:
|
||||
- Automatically sets up LibreOffice macro on first run
|
||||
- Recalculates all formulas in all sheets
|
||||
- Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
|
||||
- Returns JSON with detailed error locations and counts
|
||||
- Works on both Linux and macOS
|
||||
|
||||
## Formula Verification Checklist
|
||||
|
||||
Quick checks to ensure formulas work correctly:
|
||||
|
||||
### Essential Verification
|
||||
- [ ] **Test 2-3 sample references**: Verify they pull correct values before building full model
|
||||
- [ ] **Column mapping**: Confirm Excel columns match (e.g., column 64 = BL, not BK)
|
||||
- [ ] **Row offset**: Remember Excel rows are 1-indexed (DataFrame row 5 = Excel row 6)
|
||||
|
||||
### Common Pitfalls
|
||||
- [ ] **NaN handling**: Check for null values with `pd.notna()`
|
||||
- [ ] **Far-right columns**: FY data often in columns 50+
|
||||
- [ ] **Multiple matches**: Search all occurrences, not just first
|
||||
- [ ] **Division by zero**: Check denominators before using `/` in formulas (#DIV/0!)
|
||||
- [ ] **Wrong references**: Verify all cell references point to intended cells (#REF!)
|
||||
- [ ] **Cross-sheet references**: Use correct format (Sheet1!A1) for linking sheets
|
||||
|
||||
### Formula Testing Strategy
|
||||
- [ ] **Start small**: Test formulas on 2-3 cells before applying broadly
|
||||
- [ ] **Verify dependencies**: Check all cells referenced in formulas exist
|
||||
- [ ] **Test edge cases**: Include zero, negative, and very large values
|
||||
|
||||
### Interpreting recalc.py Output
|
||||
The script returns JSON with error details:
|
||||
```json
|
||||
{
|
||||
"status": "success", // or "errors_found"
|
||||
"total_errors": 0, // Total error count
|
||||
"total_formulas": 42, // Number of formulas in file
|
||||
"error_summary": { // Only present if errors found
|
||||
"#REF!": {
|
||||
"count": 2,
|
||||
"locations": ["Sheet1!B5", "Sheet1!C10"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Library Selection
|
||||
- **pandas**: Best for data analysis, bulk operations, and simple data export
|
||||
- **openpyxl**: Best for complex formatting, formulas, and Excel-specific features
|
||||
|
||||
### Working with openpyxl
|
||||
- Cell indices are 1-based (row=1, column=1 refers to cell A1)
|
||||
- Use `data_only=True` to read calculated values: `load_workbook('file.xlsx', data_only=True)`
|
||||
- **Warning**: If opened with `data_only=True` and saved, formulas are replaced with values and permanently lost
|
||||
- For large files: Use `read_only=True` for reading or `write_only=True` for writing
|
||||
- Formulas are preserved but not evaluated - use recalc.py to update values
|
||||
|
||||
### Working with pandas
|
||||
- Specify data types to avoid inference issues: `pd.read_excel('file.xlsx', dtype={'id': str})`
|
||||
- For large files, read specific columns: `pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])`
|
||||
- Handle dates properly: `pd.read_excel('file.xlsx', parse_dates=['date_column'])`
|
||||
|
||||
## Code Style Guidelines
|
||||
**IMPORTANT**: When generating Python code for Excel operations:
|
||||
- Write minimal, concise Python code without unnecessary comments
|
||||
- Avoid verbose variable names and redundant operations
|
||||
- Avoid unnecessary print statements
|
||||
|
||||
**For Excel files themselves**:
|
||||
- Add comments to cells with complex formulas or important assumptions
|
||||
- Document data sources for hardcoded values
|
||||
- Include notes for key calculations and model sections
|
||||
178
skills/xlsx/recalc.py
Normal file
178
skills/xlsx/recalc.py
Normal file
@@ -0,0 +1,178 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Excel Formula Recalculation Script
|
||||
Recalculates all formulas in an Excel file using LibreOffice
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
import subprocess
|
||||
import os
|
||||
import platform
|
||||
from pathlib import Path
|
||||
from openpyxl import load_workbook
|
||||
|
||||
|
||||
def setup_libreoffice_macro():
|
||||
"""Setup LibreOffice macro for recalculation if not already configured"""
|
||||
if platform.system() == 'Darwin':
|
||||
macro_dir = os.path.expanduser('~/Library/Application Support/LibreOffice/4/user/basic/Standard')
|
||||
else:
|
||||
macro_dir = os.path.expanduser('~/.config/libreoffice/4/user/basic/Standard')
|
||||
|
||||
macro_file = os.path.join(macro_dir, 'Module1.xba')
|
||||
|
||||
if os.path.exists(macro_file):
|
||||
with open(macro_file, 'r') as f:
|
||||
if 'RecalculateAndSave' in f.read():
|
||||
return True
|
||||
|
||||
if not os.path.exists(macro_dir):
|
||||
subprocess.run(['soffice', '--headless', '--terminate_after_init'],
|
||||
capture_output=True, timeout=10)
|
||||
os.makedirs(macro_dir, exist_ok=True)
|
||||
|
||||
macro_content = '''<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
|
||||
<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Module1" script:language="StarBasic">
|
||||
Sub RecalculateAndSave()
|
||||
ThisComponent.calculateAll()
|
||||
ThisComponent.store()
|
||||
ThisComponent.close(True)
|
||||
End Sub
|
||||
</script:module>'''
|
||||
|
||||
try:
|
||||
with open(macro_file, 'w') as f:
|
||||
f.write(macro_content)
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def recalc(filename, timeout=30):
|
||||
"""
|
||||
Recalculate formulas in Excel file and report any errors
|
||||
|
||||
Args:
|
||||
filename: Path to Excel file
|
||||
timeout: Maximum time to wait for recalculation (seconds)
|
||||
|
||||
Returns:
|
||||
dict with error locations and counts
|
||||
"""
|
||||
if not Path(filename).exists():
|
||||
return {'error': f'File {filename} does not exist'}
|
||||
|
||||
abs_path = str(Path(filename).absolute())
|
||||
|
||||
if not setup_libreoffice_macro():
|
||||
return {'error': 'Failed to setup LibreOffice macro'}
|
||||
|
||||
cmd = [
|
||||
'soffice', '--headless', '--norestore',
|
||||
'vnd.sun.star.script:Standard.Module1.RecalculateAndSave?language=Basic&location=application',
|
||||
abs_path
|
||||
]
|
||||
|
||||
# Handle timeout command differences between Linux and macOS
|
||||
if platform.system() != 'Windows':
|
||||
timeout_cmd = 'timeout' if platform.system() == 'Linux' else None
|
||||
if platform.system() == 'Darwin':
|
||||
# Check if gtimeout is available on macOS
|
||||
try:
|
||||
subprocess.run(['gtimeout', '--version'], capture_output=True, timeout=1, check=False)
|
||||
timeout_cmd = 'gtimeout'
|
||||
except (FileNotFoundError, subprocess.TimeoutExpired):
|
||||
pass
|
||||
|
||||
if timeout_cmd:
|
||||
cmd = [timeout_cmd, str(timeout)] + cmd
|
||||
|
||||
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||
|
||||
if result.returncode != 0 and result.returncode != 124: # 124 is timeout exit code
|
||||
error_msg = result.stderr or 'Unknown error during recalculation'
|
||||
if 'Module1' in error_msg or 'RecalculateAndSave' not in error_msg:
|
||||
return {'error': 'LibreOffice macro not configured properly'}
|
||||
else:
|
||||
return {'error': error_msg}
|
||||
|
||||
# Check for Excel errors in the recalculated file - scan ALL cells
|
||||
try:
|
||||
wb = load_workbook(filename, data_only=True)
|
||||
|
||||
excel_errors = ['#VALUE!', '#DIV/0!', '#REF!', '#NAME?', '#NULL!', '#NUM!', '#N/A']
|
||||
error_details = {err: [] for err in excel_errors}
|
||||
total_errors = 0
|
||||
|
||||
for sheet_name in wb.sheetnames:
|
||||
ws = wb[sheet_name]
|
||||
# Check ALL rows and columns - no limits
|
||||
for row in ws.iter_rows():
|
||||
for cell in row:
|
||||
if cell.value is not None and isinstance(cell.value, str):
|
||||
for err in excel_errors:
|
||||
if err in cell.value:
|
||||
location = f"{sheet_name}!{cell.coordinate}"
|
||||
error_details[err].append(location)
|
||||
total_errors += 1
|
||||
break
|
||||
|
||||
wb.close()
|
||||
|
||||
# Build result summary
|
||||
result = {
|
||||
'status': 'success' if total_errors == 0 else 'errors_found',
|
||||
'total_errors': total_errors,
|
||||
'error_summary': {}
|
||||
}
|
||||
|
||||
# Add non-empty error categories
|
||||
for err_type, locations in error_details.items():
|
||||
if locations:
|
||||
result['error_summary'][err_type] = {
|
||||
'count': len(locations),
|
||||
'locations': locations[:20] # Show up to 20 locations
|
||||
}
|
||||
|
||||
# Add formula count for context - also check ALL cells
|
||||
wb_formulas = load_workbook(filename, data_only=False)
|
||||
formula_count = 0
|
||||
for sheet_name in wb_formulas.sheetnames:
|
||||
ws = wb_formulas[sheet_name]
|
||||
for row in ws.iter_rows():
|
||||
for cell in row:
|
||||
if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
|
||||
formula_count += 1
|
||||
wb_formulas.close()
|
||||
|
||||
result['total_formulas'] = formula_count
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
return {'error': str(e)}
|
||||
|
||||
|
||||
def main():
|
||||
if len(sys.argv) < 2:
|
||||
print("Usage: python recalc.py <excel_file> [timeout_seconds]")
|
||||
print("\nRecalculates all formulas in an Excel file using LibreOffice")
|
||||
print("\nReturns JSON with error details:")
|
||||
print(" - status: 'success' or 'errors_found'")
|
||||
print(" - total_errors: Total number of Excel errors found")
|
||||
print(" - total_formulas: Number of formulas in the file")
|
||||
print(" - error_summary: Breakdown by error type with locations")
|
||||
print(" - #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A")
|
||||
sys.exit(1)
|
||||
|
||||
filename = sys.argv[1]
|
||||
timeout = int(sys.argv[2]) if len(sys.argv) > 2 else 30
|
||||
|
||||
result = recalc(filename, timeout)
|
||||
print(json.dumps(result, indent=2))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
Reference in New Issue
Block a user