# Mandatory Quality Standards
## Fundamental Principles
**Production-Ready, Not Prototype**
- Code must work without modifications
- Must not require a follow-up "now implement X" step
- Can be used immediately
**Functional, Not Placeholder**
- Complete code in all functions
- No TODO, pass, NotImplementedError
- Robust error handling
**Useful, Not Generic**
- Specific and detailed content
- Concrete examples, not abstract
- Not just external links
---
## Standards by File Type
### Python Scripts
#### ✅ MANDATORY
**1. Complete structure**:
```python
#!/usr/bin/env python3
"""Module docstring"""

# Imports
import ...

# Constants
CONST = value

# Classes/Functions
class/def ...

# Main
def main():
    ...


if __name__ == "__main__":
    main()
```
**2. Docstrings**:
- Module docstring: 3-5 lines
- Class docstring: Description + Example
- Method docstring: Args, Returns, Raises, Example
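A minimal sketch of the module and class levels (the `QuickStatsClient` name and its attributes are illustrative, not taken from any specific agent):
```python
#!/usr/bin/env python3
"""
Fetch and cache NASS USDA Quick Stats data.

Thin client around the Quick Stats API plus helpers for
caching responses locally.
"""
import os


class QuickStatsClient:
    """
    Minimal client for the NASS Quick Stats API.

    Attributes:
        api_key: API key read from the NASS_API_KEY environment variable.
        timeout: Request timeout in seconds.

    Example:
        >>> client = QuickStatsClient(timeout=30)
        >>> client.timeout
        30
    """

    def __init__(self, timeout: int = 30) -> None:
        # Illustrative only: real scripts should fail loudly if the key is missing
        self.api_key = os.environ.get("NASS_API_KEY", "")
        self.timeout = timeout
```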
**3. Type hints**:
```python
def function(param1: str, param2: int = 10) -> Dict[str, Any]:
    ...
```
**4. Error handling**:
```python
try:
    result = risky_operation()
except SpecificError as e:
    # Handle specifically
    log_error(e)
    raise CustomError(f"Context: {e}")
```
**5. Validations**:
```python
def process(data: Dict) -> pd.DataFrame:
    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")
    if 'required_field' not in data:
        raise ValueError("Missing required field")

    # Process
    ...

    # Validate output
    assert len(result) > 0, "Result cannot be empty"
    assert result['value'].notna().all(), "No null values allowed"
    return result
```
**6. Appropriate logging**:
```python
import logging

logger = logging.getLogger(__name__)


def fetch_data():
    logger.info("Fetching data from API...")
    # ...
    logger.debug(f"Received {len(data)} records")
    # ...
    logger.error(f"API error: {e}")
```
#### ❌ FORBIDDEN
```python
# ❌ DON'T DO THIS:
def analyze():
    # TODO: implement analysis
    pass


def process(data):       # ❌ No type hints
    # ❌ No docstring
    result = data        # ❌ No real logic
    return result        # ❌ No validation


def fetch_api(url):
    response = requests.get(url)  # ❌ No timeout
    return response.json()        # ❌ No error handling
```
#### ✅ DO THIS:
```python
from typing import Dict

import pandas as pd


def analyze_yoy(df: pd.DataFrame, commodity: str, year1: int, year2: int) -> Dict:
    """
    Perform year-over-year analysis

    Args:
        df: DataFrame with parsed data
        commodity: Commodity name (e.g., "CORN")
        year1: Current year
        year2: Previous year

    Returns:
        Dict with keys:
        - production_current: float
        - production_previous: float
        - change_percent: float
        - interpretation: str

    Raises:
        ValueError: If data not found for specified years
        DataQualityError: If data fails validation

    Example:
        >>> analyze_yoy(df, "CORN", 2023, 2022)
        {'production_current': 15.3, 'change_percent': 11.7, ...}
    """
    # Validate inputs
    if commodity not in df['commodity'].unique():
        raise ValueError(f"Commodity {commodity} not found in data")

    # Filter data
    df1 = df[(df['commodity'] == commodity) & (df['year'] == year1)]
    df2 = df[(df['commodity'] == commodity) & (df['year'] == year2)]
    if len(df1) == 0 or len(df2) == 0:
        raise ValueError(f"Data not found for {commodity} in {year1} or {year2}")

    # Extract values
    prod1 = df1['production'].iloc[0]
    prod2 = df2['production'].iloc[0]

    # Calculate
    change = prod1 - prod2
    change_pct = (change / prod2) * 100

    # Interpret
    if abs(change_pct) < 2:
        interpretation = "stable"
    elif change_pct > 10:
        interpretation = "significant_increase"
    elif change_pct > 2:
        interpretation = "moderate_increase"
    elif change_pct < -10:
        interpretation = "significant_decrease"
    else:
        interpretation = "moderate_decrease"

    # Return
    return {
        "commodity": commodity,
        "production_current": round(prod1, 1),
        "production_previous": round(prod2, 1),
        "change_absolute": round(change, 1),
        "change_percent": round(change_pct, 1),
        "interpretation": interpretation
    }
```
---
### SKILL.md
#### ✅ MANDATORY
**1. Valid frontmatter**:
```yaml
---
name: agent-name
description: [150-250 characters with keywords]
---
```
**2. Size**: 5000-7000 words
**3. Mandatory sections**:
- When to use (specific triggers)
- Data source (detailed API)
- Workflows (complete step-by-step)
- Scripts (each one explained)
- Analyses (methodologies)
- Errors (complete handling)
- Validations (mandatory)
- Keywords (complete list)
- Examples (5+ complete)
**4. Detailed workflows**:
**GOOD**:
````markdown
### Workflow: YoY Comparison

1. **Identify question parameters**
   - Commodity: [extract from question]
   - Years: Current vs previous (or specified)

2. **Fetch data**
   ```bash
   python scripts/fetch_nass.py \
     --commodity CORN \
     --years 2023,2022 \
     --output data/raw/corn_2023_2022.json
   ```

3. **Parse**
   ```bash
   python scripts/parse_nass.py \
     --input data/raw/corn_2023_2022.json \
     --output data/processed/corn.csv
   ```

4. **Analyze**
   ```bash
   python scripts/analyze_nass.py \
     --input data/processed/corn.csv \
     --analysis yoy \
     --commodity CORN \
     --year1 2023 \
     --year2 2022 \
     --output data/analysis/corn_yoy.json
   ```

5. **Interpret results**

   File `data/analysis/corn_yoy.json` contains:
   ```json
   {
     "production_current": 15.3,
     "change_percent": 11.7,
     "interpretation": "significant_increase"
   }
   ```

   Respond to user:
   "Corn production grew 11.7% in 2023..."
````
**BAD**:
```markdown
### Workflow: Comparison
1. Get data
2. Compare
3. Return result
```
**5. Complete examples**:
**GOOD**:
```markdown
### Example 1: YoY Comparison
**Question**: "How's corn production compared to last year?"
**Executed flow**:
[Specific commands with outputs]
**Generated answer**:
"Corn production in 2023 is 15.3 billion bushels,
growth of 11.7% vs 2022 (13.7 billion). Growth
comes mainly from area increase (+8%) with stable yield."
```
**BAD**:
```markdown
### Example: Comparison
User asks about comparison. Agent compares and responds.
```
#### ❌ FORBIDDEN
- Empty sections
- "See documentation"
- Workflows without specific commands
- Generic examples
---
### References
#### ✅ MANDATORY
**1. Useful and self-contained content**:
**GOOD** (references/api-guide.md):
````markdown
## Endpoint: Get Production Data

**URL**: `GET https://quickstats.nass.usda.gov/api/api_GET/`

**Parameters**:
- `commodity_desc`: Commodity name
  - Example: "CORN", "SOYBEANS"
  - Case-sensitive
- `year`: Desired year
  - Example: 2023
  - Range: 1866-present

**Complete request example**:
```bash
curl -H "X-Api-Key: YOUR_KEY" \
  "https://quickstats.nass.usda.gov/api/api_GET/?commodity_desc=CORN&year=2023&format=JSON"
```

**Expected response**:
```json
{
  "data": [
    {
      "year": 2023,
      "commodity_desc": "CORN",
      "value": "15,300,000,000",
      "unit_desc": "BU"
    }
  ]
}
```

**Important fields**:
- `value`: Comes as STRING with commas
  - Solution: `value.replace(',', '')`
  - Convert to float after
````
❌ **BAD**:
```markdown
## API Endpoint
For details on how to use the API, consult the official documentation at:
https://quickstats.nass.usda.gov/api
[End of file]
```
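The `value` handling called out under "Important fields" is worth spelling out as executable code in the reference itself; a minimal sketch (the `parse_value` helper name is illustrative):
```python
def parse_value(raw: str) -> float:
    """Convert a Quick Stats 'value' string such as "15,300,000,000" to a float.

    The API returns numeric values as strings with thousands separators,
    so the commas must be stripped before conversion.
    """
    cleaned = raw.replace(",", "").strip()
    if not cleaned:
        raise ValueError("Empty value string")
    return float(cleaned)


# parse_value("15,300,000,000") -> 15300000000.0
```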
**2. Adequate size**:
- API guide: 1500-2000 words
- Analysis methods: 2000-3000 words
- Troubleshooting: 1000-1500 words
**3. Concrete examples**:
- Always include examples with real values
- Executable code blocks
- Expected outputs
#### ❌ FORBIDDEN
- "For more information, see [link]"
- Sections with only 2-3 lines
- Lists without details
- Circular references (doc A points to doc B, which points back to doc A)
---
### Assets (Configs)
#### ✅ MANDATORY
**1. Syntactically valid JSON**:
```bash
# ALWAYS validate:
python -c "import json; json.load(open('config.json'))"
```
**2. Real values**:
**GOOD**:
```json
{
  "api": {
    "base_url": "https://quickstats.nass.usda.gov/api",
    "api_key_env": "NASS_API_KEY",
    "_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration",
    "rate_limit_per_day": 1000,
    "timeout_seconds": 30
  }
}
```
**BAD**:
```json
{
  "api": {
    "base_url": "YOUR_API_URL_HERE",
    "api_key": "YOUR_KEY_HERE"
  }
}
```
**3. Inline comments** (using `_comment` or `_note`):
```json
{
  "_comment": "Differentiated TTL by data type",
  "cache": {
    "ttl_historical_days": 365,
    "_note_historical": "Historical data doesn't change",
    "ttl_current_days": 7,
    "_note_current": "Current year data may be revised"
  }
}
```
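Because `_comment`/`_note` entries are ordinary JSON keys, loading code should simply skip anything prefixed with an underscore; a minimal sketch, assuming the `assets/config.json` layout shown above:
```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(path: str = "assets/config.json") -> Dict[str, Any]:
    """Load a JSON config, dropping underscore-prefixed comment keys."""
    def strip_comments(node: Any) -> Any:
        if isinstance(node, dict):
            return {
                key: strip_comments(value)
                for key, value in node.items()
                if not key.startswith("_")
            }
        if isinstance(node, list):
            return [strip_comments(item) for item in node]
        return node

    raw = json.loads(Path(path).read_text())
    return strip_comments(raw)
```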
---
### README.md
#### ✅ MANDATORY
**1. Complete installation instructions**:
**GOOD**:
````markdown
## Installation

### 1. Get API Key (Free)

1. Access https://quickstats.nass.usda.gov/api#registration
2. Fill form:
   - Name: [your name]
   - Email: [your email]
   - Purpose: "Personal research"
3. Click "Submit"
4. You'll receive an email with the API key in ~1 minute
5. Key format: `A1B2C3D4-E5F6-G7H8-I9J0-K1L2M3N4O5P6`

### 2. Configure Environment

**Option A - Export** (temporary):
```bash
export NASS_API_KEY="your_key_here"
```

**Option B - .bashrc/.zshrc** (permanent):
```bash
echo 'export NASS_API_KEY="your_key_here"' >> ~/.bashrc
source ~/.bashrc
```

**Option C - .env file** (per project):
```bash
echo "NASS_API_KEY=your_key_here" > .env
```

### 3. Install Dependencies

```bash
cd nass-usda-agriculture
pip install -r requirements.txt
```

Requirements:
- requests
- pandas
- numpy
````
❌ **BAD**:
```markdown
## Installation
1. Get API key from the official website
2. Configure environment
3. Install dependencies
4. Done!
```
**2. Concrete usage examples**:
**GOOD**:
````markdown
## Examples
### Example 1: Current Production
```
You: "What's US corn production in 2023?"
Claude: "Corn production in 2023 was 15.3 billion
bushels (389 million metric tons)..."
```
### Example 2: YoY Comparison
```
You: "Compare soybeans this year vs last year"
Claude: "Soybean production in 2023 is 2.6% below 2022:
- 2023: 4.165 billion bushels
- 2022: 4.276 billion bushels
- Drop from area (-4.5%), yield improved (+0.8%)"
```
[3-5 more examples]
````
**BAD**:
```markdown
## Usage
Ask questions about agriculture and the agent will respond.
```
**3. Specific troubleshooting**:
**GOOD**:
````markdown
### Error: "NASS_API_KEY environment variable not found"

**Cause**: API key not configured

**Step-by-step solution**:
1. Verify key was obtained: https://...
2. Configure environment:
   ```bash
   export NASS_API_KEY="your_key_here"
   ```
3. Verify:
   ```bash
   echo $NASS_API_KEY
   ```
4. It should show your key
5. If it doesn't work, restart the terminal

**Still not working?**
- Check for extra spaces in key
- Verify key hasn't expired (validity: 1 year)
- Re-generate key if needed
````
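Agent scripts can surface this exact error with an up-front check; a minimal sketch (the `require_api_key` helper name is illustrative):
```python
import os


def require_api_key() -> str:
    """Return the NASS API key or fail with an actionable message."""
    api_key = os.environ.get("NASS_API_KEY")
    if not api_key:
        raise EnvironmentError(
            "NASS_API_KEY environment variable not found. "
            'Run: export NASS_API_KEY="your_key_here" and restart the terminal.'
        )
    return api_key
```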
---
## Quality Checklist
### Per Python Script
- [ ] Shebang: `#!/usr/bin/env python3`
- [ ] Module docstring (3-5 lines)
- [ ] Organized imports (stdlib, 3rd party, local)
- [ ] Constants at top (if applicable)
- [ ] Type hints in all public functions
- [ ] Docstrings in classes (description + attributes + example)
- [ ] Docstrings in methods (Args, Returns, Raises, Example)
- [ ] Error handling for risky operations
- [ ] Input validations
- [ ] Output validations
- [ ] Appropriate logging
- [ ] Main function with argparse
- [ ] if __name__ == "__main__"
- [ ] Functional code (no TODO/pass)
- [ ] Valid syntax (test: `python -m py_compile script.py`)
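Several of these items (shebang, module docstring, type hints, argparse-based main, `__main__` guard) come together in a skeleton like the following minimal sketch; the script purpose and argument names are illustrative:
```python
#!/usr/bin/env python3
"""Fetch raw data for a commodity and write it to a JSON file."""
import argparse
import logging

logger = logging.getLogger(__name__)


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--commodity", required=True, help='Commodity name, e.g. "CORN"')
    parser.add_argument("--output", required=True, help="Path of the JSON file to write")
    args = parser.parse_args()

    logging.basicConfig(level=logging.INFO)
    logger.info("Fetching %s -> %s", args.commodity, args.output)
    # ... call the fetch/parse/analyze functions defined elsewhere ...
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```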
### Per SKILL.md
- [ ] Frontmatter with name and description
- [ ] Description 150-250 characters with keywords
- [ ] Size 5000+ words
- [ ] "When to Use" section with specific triggers
- [ ] "Data Source" section detailed
- [ ] Step-by-step workflows with commands
- [ ] Scripts explained individually
- [ ] Analyses documented (objective, methodology)
- [ ] Errors handled (all expected)
- [ ] Validations listed
- [ ] Performance/cache explained
- [ ] Complete keywords
- [ ] Complete examples (5+)
### Per Reference File
- [ ] 1000+ words
- [ ] Useful content (not just links)
- [ ] Concrete examples with real values
- [ ] Executable code blocks
- [ ] Well structured (headings, lists)
- [ ] No empty sections
- [ ] No "TODO: write"
### Per Asset (Config)
- [ ] Syntactically valid JSON (validate!)
- [ ] Real values (not "YOUR_X_HERE" without context)
- [ ] Inline comments (_comment, _note)
- [ ] Instructions for values user must fill
- [ ] Logical and organized structure
### Per README.md
- [ ] Step-by-step installation
- [ ] How to get API key (detailed)
- [ ] How to configure (3 options)
- [ ] How to install dependencies
- [ ] How to install in Claude Code
- [ ] Usage examples (5+)
- [ ] Troubleshooting (10+ problems)
- [ ] License
- [ ] Contact/contribution (if applicable)
### Complete Agent
- [ ] DECISIONS.md documents all choices
- [ ] **VERSION** file created (e.g. 1.0.0)
- [ ] **CHANGELOG.md** created with complete v1.0.0 entry
- [ ] **INSTALACAO.md** with complete didactic tutorial
- [ ] **comprehensive_{domain}_report()** implemented
- [ ] marketplace.json with version field
- [ ] 18+ files created
- [ ] ~1500+ lines of Python code
- [ ] ~10,000+ words of documentation
- [ ] 2+ configs
- [ ] requirements.txt
- [ ] .gitignore (if needed)
- [ ] No placeholder/TODO
- [ ] Valid syntax (Python, JSON, YAML)
- [ ] Ready to use (production-ready)
---
## Quality Examples
### Example: Error Handling
**BAD**:
```python
def fetch(url):
    return requests.get(url).json()
```
**GOOD**:
```python
from typing import Dict

import requests

# NetworkError, APIError, RateLimitError are project-defined exceptions
# (see the sketch after this example).


def fetch(url: str, timeout: int = 30) -> Dict:
    """
    Fetch data from URL with error handling

    Args:
        url: URL to fetch
        timeout: Timeout in seconds

    Returns:
        JSON response as dict

    Raises:
        NetworkError: If connection fails
        TimeoutError: If request times out
        APIError: If API returns error
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        data = response.json()
        if 'error' in data:
            raise APIError(f"API error: {data['error']}")
        return data
    except requests.Timeout:
        raise TimeoutError(f"Request timed out after {timeout}s")
    except requests.ConnectionError as e:
        raise NetworkError(f"Connection failed: {e}")
    except requests.HTTPError as e:
        if e.response.status_code == 429:
            raise RateLimitError("Rate limit exceeded")
        else:
            raise APIError(f"HTTP {e.response.status_code}: {e}")
```
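The custom exceptions raised above (`NetworkError`, `APIError`, `RateLimitError`) are assumed to be defined by the project, for example in an `errors.py` module; a minimal sketch of one possible hierarchy:
```python
class APIError(Exception):
    """The API returned an error payload or an unexpected HTTP status."""


class NetworkError(APIError):
    """The connection to the API could not be established."""


class RateLimitError(APIError):
    """The API's daily request limit was exceeded (HTTP 429)."""
```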
### Example: Validations
**BAD**:
```python
def parse(data):
    df = pd.DataFrame(data)
    return df
```
**GOOD**:
```python
from datetime import datetime
from typing import Dict, List

import pandas as pd


def parse(data: List[Dict]) -> pd.DataFrame:
    """Parse and validate data"""
    # Validate input
    if not data:
        raise ValueError("Data cannot be empty")
    if not isinstance(data, list):
        raise TypeError(f"Expected list, got {type(data)}")

    # Parse
    df = pd.DataFrame(data)

    # Validate schema
    required_cols = ['year', 'commodity', 'value']
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Validate types
    df['year'] = pd.to_numeric(df['year'], errors='raise')
    df['value'] = pd.to_numeric(df['value'], errors='raise')

    # Validate ranges
    current_year = datetime.now().year
    if (df['year'] > current_year).any():
        raise ValueError(f"Future years found (max allowed: {current_year})")
    if (df['value'] < 0).any():
        raise ValueError("Negative values found")

    # Validate no duplicates
    if df.duplicated(subset=['year', 'commodity']).any():
        raise ValueError("Duplicate records found")

    return df
```
### Example: Docstrings
**BAD**:
```python
def analyze(df, commodity):
    """Analyze data"""
    # ...
```
**GOOD**:
```python
def analyze_yoy(
    df: pd.DataFrame,
    commodity: str,
    year1: int,
    year2: int
) -> Dict[str, Any]:
    """
    Perform year-over-year comparison analysis

    Compares production, area, and yield between two years
    and decomposes growth into area vs yield contributions.

    Args:
        df: DataFrame with columns ['year', 'commodity', 'production', 'area', 'yield']
        commodity: Commodity name (e.g., "CORN", "SOYBEANS")
        year1: Current year to compare
        year2: Previous year to compare against

    Returns:
        Dict containing:
        - production_current (float): Production in year1 (million units)
        - production_previous (float): Production in year2
        - change_absolute (float): Absolute change
        - change_percent (float): Percent change
        - decomposition (dict): Area vs yield contribution
        - interpretation (str): "increase", "decrease", or "stable"

    Raises:
        ValueError: If commodity not found in data
        ValueError: If either year not found in data
        DataQualityError: If production != area * yield (tolerance > 1%)

    Example:
        >>> df = pd.DataFrame([
        ...     {'year': 2023, 'commodity': 'CORN', 'production': 15.3, 'area': 94.6, 'yield': 177},
        ...     {'year': 2022, 'commodity': 'CORN', 'production': 13.7, 'area': 89.2, 'yield': 173}
        ... ])
        >>> result = analyze_yoy(df, "CORN", 2023, 2022)
        >>> result['change_percent']
        11.7
    """
    # [Complete implementation]
```
---
## Anti-Patterns
### Anti-Pattern 1: Partial Implementation
**NO**:
```python
def yoy_comparison(df, commodity, year1, year2):
    # Implement YoY comparison
    pass


def state_ranking(df, commodity):
    # TODO: implement ranking
    raise NotImplementedError()
```
**YES**:
```python
# [Complete and functional code for BOTH functions]
```
### Anti-Pattern 2: Empty References
**NO**:
```markdown
# Analysis Methods
## YoY Comparison
This method compares two years.
## Ranking
This method ranks states.
```
**YES**:
````markdown
# Analysis Methods
## YoY Comparison
### Objective
Compare metrics between current and previous year...
### Detailed Methodology
**Formulas**:
```
Δ X = X(t) - X(t-1)
Δ X% = (Δ X / X(t-1)) × 100
```
**Decomposition** (for production):
[Complete mathematics]
**Interpretation**:
- |Δ| < 2%: Stable
- Δ > 10%: Significant increase
[...]
### Validations
[List]
### Complete Numerical Example
[With real values]
````
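The "[Complete mathematics]" placeholder refers to the area/yield decomposition used elsewhere in this document. Since production = area × yield, one standard way to write it (a sketch, not the agent's exact formula) is:
```latex
P_t = A_t \times Y_t
\quad\Rightarrow\quad
\frac{P_t}{P_{t-1}} = \frac{A_t}{A_{t-1}} \cdot \frac{Y_t}{Y_{t-1}}

\Delta P\% \approx \Delta A\% + \Delta Y\%
\qquad \text{(exact: } \Delta P\% = \Delta A\% + \Delta Y\% + \tfrac{\Delta A\% \cdot \Delta Y\%}{100}\text{)}
```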
### Anti-Pattern 3: Useless Configs
**NO**:
```json
{
  "api_url": "INSERT_URL",
  "api_key": "INSERT_KEY"
}
```
**YES**:
```json
{
  "_comment": "Configuration for NASS USDA Agent",
  "api": {
    "base_url": "https://quickstats.nass.usda.gov/api",
    "_note": "This is the official USDA NASS API base URL",
    "api_key_env": "NASS_API_KEY",
    "_key_instructions": "Get free API key from: https://quickstats.nass.usda.gov/api#registration"
  }
}
```
---
## Final Validation
Before delivering to user, verify:
### Sanity Test
```bash
# 1. Python syntax
find scripts -name "*.py" -exec python -m py_compile {} \;
# 2. JSON syntax
python -c "import json; json.load(open('assets/config.json'))"
# 3. Imports make sense
grep -r "^import\|^from" scripts/*.py | sort | uniq
# Verify all libs are: stdlib, requests, pandas, numpy
# No imports of uninstalled libs
# 4. SKILL.md has frontmatter
head -5 SKILL.md | grep "^---$"
# 5. SKILL.md size
wc -w SKILL.md
# Should be > 5000 words
```
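Checks 4 and 5 can also be automated with a short Python helper; a minimal sketch, assuming SKILL.md is in the current directory:
```python
#!/usr/bin/env python3
"""Sanity check SKILL.md: frontmatter present, required keys, minimum word count."""
from pathlib import Path

MIN_WORDS = 5000

text = Path("SKILL.md").read_text()

# Frontmatter must open the file with a '---' delimiter
if not text.startswith("---"):
    raise SystemExit("FAIL: SKILL.md is missing the YAML frontmatter block")

# The frontmatter must declare at least 'name' and 'description'
frontmatter = text.split("---", 2)[1]
for key in ("name:", "description:"):
    if key not in frontmatter:
        raise SystemExit(f"FAIL: frontmatter is missing '{key.rstrip(':')}'")

# Overall size check
word_count = len(text.split())
if word_count < MIN_WORDS:
    raise SystemExit(f"FAIL: SKILL.md has {word_count} words (minimum {MIN_WORDS})")

print(f"OK: frontmatter present, {word_count} words")
```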
### Final Checklist
- [ ] Syntax check passed (Python, JSON)
- [ ] No import of non-existent lib
- [ ] No TODO or pass
- [ ] SKILL.md > 5000 words
- [ ] References with content
- [ ] README with complete instructions
- [ ] DECISIONS.md created
- [ ] requirements.txt created