5.9 KiB
Phase 2: Analysis Design
Objective
DEFINE autonomously which analyses the agent will perform and how.
Detailed Process
Step 1: Brainstorm Use Cases
From the workflow described by the user, think of typical questions they will ask.
Technique: "If I were this user, what would I ask?"
Example (US agriculture):
User said: "download crop data, compare year vs year, make rankings"
Questions they likely ask:
- "What's the corn production in 2023?"
- "How's soybean compared to last year?"
- "Did production grow or fall?"
- "How much did it grow?"
- "Does growth come from area or productivity?"
- "Which states produce most wheat?"
- "Top 5 soybean producers"
- "Did the ranking change vs last year?"
- "Production trend last 5 years?"
- "Forecast for next year?"
- "Average US yield"
- "Which state has best productivity?"
- "Did planted area increase?"
- "Compare Midwest vs South"
- "Production by region"
Objective: List 15-20 typical questions
Step 2: Group by Analysis Type
Group similar questions:
Group 1: Simple Queries (fetching + formatting)
- Questions: 1, 11, 13
- Required analysis: Data Retrieval
- Complexity: Low
Group 2: Temporal Comparisons (YoY)
- Questions: 2, 3, 4, 5
- Required analysis: YoY Comparison + Decomposition
- Complexity: Medium
Group 3: Rankings (sorting + share)
- Questions: 6, 7, 8
- Required analysis: State Ranking
- Complexity: Medium
Group 4: Trends (time series)
- Questions: 9
- Required analysis: Trend Analysis
- Complexity: Medium-High
Group 5: Projections (forecasting)
- Questions: 10
- Required analysis: Forecasting
- Complexity: High
Group 6: Geographic Aggregations
- Questions: 12, 14, 15
- Required analysis: Regional Aggregation
- Complexity: Medium
Step 3: Prioritize Analyses
Prioritization criteria:
- Frequency of use (based on described workflow)
- Analytical value (insight vs effort)
- Implementation complexity (easier first)
- Dependencies (does one analysis depend on another?)
Scoring:
| Analysis | Frequency | Value | Ease | Score |
|---|---|---|---|---|
| YoY Comparison | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 9.3/10 |
| State Ranking | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 9.3/10 |
| Regional Agg | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 8.0/10 |
| Trend Analysis | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 7.3/10 |
| Data Retrieval | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 8.3/10 |
| Forecasting | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | 5.3/10 |
DECISION: Implement top 5
- YoY Comparison (9.3)
- State Ranking (9.3)
- Data Retrieval (8.3)
- Regional Aggregation (8.0)
- Trend Analysis (7.3)
Don't implement initially (can add later):
- Forecasting (5.3) - complex, occasional use
Step 4: Specify Each Analysis
For each selected analysis:
## Analysis: [Name]
**Objective**: [What it does in 1 sentence]
**When to use**: [Types of questions that trigger it]
**Required inputs**:
- Input 1: [type, description]
- Input 2: [type, description]
**Expected outputs**:
- Output 1: [type, description]
- Output 2: [type, description]
**Methodology**:
[Explanation in natural language]
**Formulas**:
Formula 1 = ... Formula 2 = ...
**Data transformations**:
1. [Transformation 1]
2. [Transformation 2]
**Validations**:
- Validation 1: [criteria]
- Validation 2: [criteria]
**Interpretation**:
- If result > X: [interpretation]
- If result < Y: [interpretation]
**Concrete example**:
Input:
- Commodity: Corn
- Year current: 2023
- Year previous: 2022
Processing:
- Fetch 2023 production: 15.3B bu
- Fetch 2022 production: 13.7B bu
- Calculate: (15.3 - 13.7) / 13.7 = +11.7%
Output:
```json
{
"commodity": "CORN",
"year_current": 2023,
"year_previous": 2022,
"production_current": 15.3,
"production_previous": 13.7,
"change_absolute": 1.6,
"change_percent": 11.7,
"interpretation": "significant_increase"
}
Response to user: "Corn production grew 11.7% in 2023 vs 2022 (15.3B bu vs 13.7B bu)."
### Step 5: Specify Methodologies
For quantitative analyses, detail methodology:
**Example: YoY Decomposition**
```markdown
### Growth Decomposition
**Objective**: Understand how much of production growth comes from:
- Planted area increase (extensive)
- Productivity/yield increase (intensive)
**Mathematics**:
Production = Area × Yield
Δ Production = Δ Area × Yield(t-1) + Area(t-1) × Δ Yield + Δ Area × Δ Yield
Interaction term usually small, so approximation:
Δ Production ≈ Δ Area × Yield(t-1) + Area(t-1) × Δ Yield
**Percentage contributions**:
Contrib_Area = (Δ Area% / Δ Production%) × 100
Contrib_Yield = (Δ Yield% / Δ Production%) × 100
**Interpretation**:
- Contrib_Area > 60%: **Extensive growth**
→ Area expansion is main driver
→ Agricultural frontier expanding
- Contrib_Yield > 60%: **Intensive growth**
→ Technology improvement is main driver
→ Productivity/ha increasing
- Both ~50%: **Balanced growth**
**Validation**:
Check: Production(t) ≈ Area(t) × Yield(t) (margin 1%)
Check: Contrib_Area + Contrib_Yield ≈ 100% (margin 5%)
**Example**:
Soybeans 2023 vs 2022:
- Δ Production: +12.4%
- Δ Area: +6.1%
- Δ Yield: +7.6%
Contrib_Area = (6.1 / 12.4) × 100 = 49%
Contrib_Yield = (7.6 / 12.4) × 100 = 61%
**Interpretation**: Intensive growth (61% from yield).
Technology and management improving.
Step 6: Document Analyses
Save all specifications in references/analysis-methods.md of the agent.
Phase 2 Checklist
- 15+ typical questions listed
- Questions grouped by analysis type
- 4-6 analyses prioritized (with scoring)
- Each analysis specified (objective, inputs, outputs, methodology)
- Methodologies detailed with formulas
- Validations defined
- Interpretations specified
- Concrete examples included