# Phase 1: Discovery and API Research ## Objective Research and **DECIDE** autonomously which API or data source to use for the agent. ## Detailed Process ### Step 1: Identify Domain From user input, extract the main domain: | User Input | Identified Domain | |------------------|---------------------| | "US crop data" | Agriculture (US) | | "stock market analysis" | Finance / Stock Market | | "global climate data" | Climate / Meteorology | | "economic indicators" | Economy / Macro | | "commodity data" | Trading / Commodities | ### Step 2: Search Available APIs For the identified domain, use WebSearch to find public APIs: **Search queries**: ``` "[domain] API free public data" "[domain] government API documentation" "best API for [domain] historical data" "[domain] open data sources" ``` **Example (US agriculture)**: ```bash WebSearch: "US agriculture API free historical data" WebSearch: "USDA API documentation" WebSearch: "agricultural statistics API United States" ``` **Typical result**: 5-10 candidate APIs ### Step 3: Research Documentation For each candidate API, use WebFetch to load: - Homepage/overview - Getting started guide - API reference - Rate limits and pricing **Extract information**: ```markdown ## API 1: [Name] **URL**: [base URL] **Docs**: [docs URL] **Authentication**: - Type: API key / OAuth / None - Cost: Free / Paid - How to obtain: [steps] **Available Data**: - Temporal coverage: [from when to when] - Geographic coverage: [countries, regions] - Metrics: [list] - Granularity: [daily, monthly, annual] **Limitations**: - Rate limit: [requests per day/hour] - Max records: [per request] - Throttling: [yes/no] **Quality**: - Source: [official government / private] - Reliability: [high/medium/low] - Update frequency: [frequency] **Documentation**: - Quality: [excellent/good/poor] ### Step 4: API Capability Inventory (NEW v2.0 - CRITICAL!) **OBJECTIVE:** Ensure the skill uses 100% of API capabilities, not just the basics! **LEARNING:** us-crop-monitor v1.0 used only CONDITION (1 of 5 NASS metrics). v2.0 had to add PROGRESS, YIELD, PRODUCTION, AREA (+3,500 lines of rework). **Process:** **Step 4.1: Complete Inventory** For the chosen API, catalog ALL data types: ```markdown ## Complete Inventory - {API Name} **Available Metrics/Endpoints:** | Endpoint/Metric | Returns | Granularity | Coverage | Value | |-----------------|---------------|---------------|-----------|-------| | {metric1} | {description} | {daily/weekly}| {geo} | ⭐⭐⭐⭐⭐ | | {metric2} | {description} | {monthly} | {geo} | ⭐⭐⭐⭐⭐ | | {metric3} | {description} | {annual} | {geo} | ⭐⭐⭐⭐ | ... **Real Example (NASS):** | Metric Type | Data | Frequency | Value | Implement? | |----------------|--------------------| ----------|----------|------------| | CONDITION | Quality ratings | Weekly | ⭐⭐⭐⭐⭐ | ✅ YES | | PROGRESS | % planted/harvested| Weekly | ⭐⭐⭐⭐⭐ | ✅ YES | | YIELD | Bu/acre | Monthly | ⭐⭐⭐⭐⭐ | ✅ YES | | PRODUCTION | Total bushels | Monthly | ⭐⭐⭐⭐⭐ | ✅ YES | | AREA | Acres planted | Annual | ⭐⭐⭐⭐ | ✅ YES | | PRICE | $/bushel | Monthly | ⭐⭐⭐ | ⚪ v2.0 | ``` **Step 4.2: Coverage Decision** **GOLDEN RULE:** - If metric has ⭐⭐⭐⭐ or ⭐⭐⭐⭐⭐ value → Implement in v1.0 - If API has 5 high-value metrics → Implement all 5! - Never leave >50% of API unused without strong justification **Step 4.3: Document Decision** In DECISIONS.md: ```markdown ## API Coverage Decision API {name} offers {N} types of metrics. **Implemented in v1.0 ({X} of {N}):** - {metric1} - {justification} - {metric2} - {justification} ... **Not implemented ({Y} of {N}):** - {metricZ} - {why not} (planned for v2.0) **Coverage:** {X/N * 100}% = {evaluation} - If < 70%: Clearly explain why low coverage - If > 70%: ✅ Good coverage ``` **Output of this phase:** Exact list of all `get_*()` methods to implement - Examples: [many/few/none] - SDKs: [Python/R/None] **Ease of Use**: - Format: JSON / CSV / XML - Structure: [simple/complex] - Quirks: [any strange behavior?] ``` ### Step 4: Compare Options Create comparison table: | API | Coverage | Cost | Rate Limit | Quality | Docs | Ease | Score | |-----|-----------|-------|------------|-----------|------|------------|-------| | API 1 | ⭐⭐⭐⭐⭐ | Free | 1000/day | Official | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 9.2/10 | | API 2 | ⭐⭐⭐⭐ | $49/mo | Unlimited | Private | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 7.8/10 | | API 3 | ⭐⭐⭐ | Free | 100/day | Private | ⭐⭐ | ⭐⭐⭐ | 5.5/10 | **Scoring criteria**: - Coverage (fit with need): 30% weight - Cost (prefer free): 20% weight - Rate limit (sufficient?): 15% weight - Quality (official > private): 15% weight - Documentation (facilitates implementation): 10% weight - Ease of use (format, structure): 10% weight ### Step 5: DECIDE **Consider user constraints**: - Mentioned "free"? → Eliminate paid options - Mentioned "10+ years historical data"? → Check coverage - Mentioned "real-time"? → Prioritize streaming APIs **Apply logic**: 1. Eliminate APIs that violate constraints 2. Of remaining, choose highest score 3. If tie, prefer: - Official > private - Better documentation - Easier to use **FINAL DECISION**: ```markdown ## Selected API: [API Name] **Score**: X.X/10 **Justification**: - ✅ Coverage: [specific details] - ✅ Cost: [free/paid + details] - ✅ Rate limit: [number] requests/day (sufficient for [estimated usage]) - ✅ Quality: [official/private + reliability] - ✅ Documentation: [quality + examples] - ✅ Ease of use: [format, structure] **Fit with requirements**: - Constraint 1 (e.g., free): ✅ Met - Constraint 2 (e.g., 10+ years history): ✅ Met (since [year]) - Primary need (e.g., crop production): ✅ Covered **Alternatives Considered**: **API X**: Score 7.5/10 - Rejected because: [specific reason] - Trade-off: [what we lose vs gain] **API Y**: Score 6.2/10 - Rejected because: [reason] **Conclusion**: [API Name] is the best option because [1-2 sentence synthesis]. ``` ### Step 6: Research Technical Details After deciding, dive deep into documentation: **Load via WebFetch**: - Getting started guide - Complete API reference - Authentication guide - Rate limiting details - Best practices **Extract for implementation**: ```markdown ## Technical Details - [API] ### Authentication **Method**: API key in header **Header**: `X-Api-Key: YOUR_KEY` **Obtaining key**: 1. [step 1] 2. [step 2] 3. [step 3] ### Main Endpoints **Endpoint 1**: [Name] - **URL**: `GET https://api.example.com/v1/endpoint` - **Parameters**: - `param1` (required): [description, type, example] - `param2` (optional): [description, type, default] - **Response** (200 OK): ```json { "data": [...], "meta": {...} } ``` - **Errors**: - 400: [when occurs, how to handle] - 401: [when occurs, how to handle] - 429: [rate limit, how to handle] **Example request**: ```bash curl -H "X-Api-Key: YOUR_KEY" \ "https://api.example.com/v1/endpoint?param1=value" ``` [Repeat for all relevant endpoints] ### Rate Limiting - Limit: [number] requests per [period] - Response headers: - `X-RateLimit-Limit`: Total limit - `X-RateLimit-Remaining`: Remaining requests - `X-RateLimit-Reset`: Reset timestamp - Behavior when exceeded: [429 error, throttling, ban?] - Best practice: [how to implement rate limiting] ### Quirks and Gotchas **Quirk 1**: Values come as strings with formatting - Example: `"2,525,000"` instead of `2525000` - Solution: Remove commas before converting **Quirk 2**: Suppressed data marked as "(D)" - Meaning: Withheld to avoid disclosing data - Solution: Treat as NULL, signal to user **Quirk 3**: [other non-obvious behavior] - Solution: [how to handle] ### Performance Tips - Historical data doesn't change → cache permanently - Recent data may be revised → short cache (7 days) - Use pagination parameters if large response - Make parallel requests when possible (respecting rate limit) ``` ### Step 7: Document for Later Use Save everything in `references/api-guide.md` of the agent to be created. ## Discovery Examples ### Example 1: US Agriculture **Input**: "US crop data" **Research**: ``` WebSearch: "USDA API agricultural data" → Found: NASS QuickStats, ERS, FAS WebFetch: https://quickstats.nass.usda.gov/api → Free, data since 1866, 1000/day rate limit WebFetch: https://www.ers.usda.gov/developer/ → Free, economic focus, less granular WebFetch: https://apps.fas.usda.gov/api → International focus, not domestic ``` **Comparison**: | API | Coverage (US domestic) | Cost | Production Data | Score | |-----|---------------------------|-------|-------------------|-------| | NASS | ⭐⭐⭐⭐⭐ (excellent) | Free | ⭐⭐⭐⭐⭐ | 9.5/10 | | ERS | ⭐⭐⭐⭐ (good) | Free | ⭐⭐⭐ (economic) | 7.0/10 | | FAS | ⭐⭐ (international) | Free | ⭐⭐ (global) | 4.0/10 | **DECISION**: NASS QuickStats API - Best coverage for US domestic agriculture - Free with reasonable rate limit - Complete production, area, yield data ### Example 2: Stock Market **Input**: "technical stock analysis" **Research**: ``` WebSearch: "stock market API free historical data" → Alpha Vantage, Yahoo Finance, IEX Cloud, Polygon.io WebFetch: Alpha Vantage docs → Free, 5 requests/min, 500/day WebFetch: Yahoo Finance (yfinance) → Free, unlimited but unofficial WebFetch: IEX Cloud → Freemium, good docs, 50k free credits/month ``` **Comparison**: | API | Data | Cost | Rate Limit | Official | Score | |-----|-------|-------|------------|---------|-------| | Alpha Vantage | Complete | Free | 500/day | ⭐⭐⭐ | 8.0/10 | | Yahoo Finance | Complete | Free | Unlimited | ❌ Unofficial | 7.5/10 | | IEX Cloud | Excellent | Freemium | 50k/month | ⭐⭐⭐⭐ | 8.5/10 | **DECISION**: IEX Cloud (free tier) - Official and reliable - 50k requests/month sufficient - Excellent documentation - Complete data (OHLCV + volume) ### Example 3: Global Climate **Input**: "global climate data" **Research**: ``` WebSearch: "weather API historical data global" → NOAA, OpenWeather, Weather.gov, Meteostat [Research each one...] ``` **DECISION**: NOAA Climate Data Online (CDO) API - Official (US government) - Free - Global and historical coverage (1900+) - Rate limit: 1000/day ## Decision Documentation Create `DECISIONS.md` file in agent: ```markdown # Architecture Decisions ## Date: [creation date] ## Phase 1: API Selection ### Chosen API **[API Name]** ### Selection Process **APIs Researched**: [list] **Evaluation Criteria**: 1. Data coverage (fit with need) 2. Cost (preference for free) 3. Rate limits (viability) 4. Quality (official > private) 5. Documentation (facilitates development) ### Comparison [Comparison table] ### Final Justification [2-3 paragraphs explaining why this API was chosen] ### Trade-offs **What we gain**: - [benefit 1] - [benefit 2] **What we lose** (vs alternatives): - [accepted limitation 1] - [accepted limitation 2] ### Technical Details [Summary of endpoints, authentication, rate limits, etc] **Complete documentation**: See `references/api-guide.md` ``` ## Phase 1 Checklist Before proceeding to Phase 2, verify: - [ ] Research completed (WebSearch + WebFetch) - [ ] Minimum 3 APIs compared - [ ] Decision made with clear justification - [ ] User constraints respected - [ ] Technical details extracted - [ ] DECISIONS.md created - [ ] Ready for analysis design