Initial commit

This commit is contained in:
Zhongwei Li
2025-11-29 18:27:28 +08:00
commit 8db9c44dd8
79 changed files with 37715 additions and 0 deletions

View File

@@ -0,0 +1,244 @@
# Phase 2: Analysis Design
## Objective
**DEFINE** autonomously which analyses the agent will perform and how.
## Detailed Process
### Step 1: Brainstorm Use Cases
From the workflow described by the user, think of typical questions they will ask.
**Technique**: "If I were this user, what would I ask?"
**Example (US agriculture)**:
User said: "download crop data, compare year vs year, make rankings"
**Questions they likely ask**:
1. "What's the corn production in 2023?"
2. "How's soybean compared to last year?"
3. "Did production grow or fall?"
4. "How much did it grow?"
5. "Does growth come from area or productivity?"
6. "Which states produce most wheat?"
7. "Top 5 soybean producers"
8. "Did the ranking change vs last year?"
9. "Production trend last 5 years?"
10. "Forecast for next year?"
11. "Average US yield"
12. "Which state has best productivity?"
13. "Did planted area increase?"
14. "Compare Midwest vs South"
15. "Production by region"
**Objective**: List 15-20 typical questions
### Step 2: Group by Analysis Type
Group similar questions:
**Group 1: Simple Queries** (fetching + formatting)
- Questions: 1, 11, 13
- Required analysis: **Data Retrieval**
- Complexity: Low
**Group 2: Temporal Comparisons** (YoY)
- Questions: 2, 3, 4, 5
- Required analysis: **YoY Comparison + Decomposition**
- Complexity: Medium
**Group 3: Rankings** (sorting + share)
- Questions: 6, 7, 8
- Required analysis: **State Ranking**
- Complexity: Medium
**Group 4: Trends** (time series)
- Questions: 9
- Required analysis: **Trend Analysis**
- Complexity: Medium-High
**Group 5: Projections** (forecasting)
- Questions: 10
- Required analysis: **Forecasting**
- Complexity: High
**Group 6: Geographic Aggregations**
- Questions: 12, 14, 15
- Required analysis: **Regional Aggregation**
- Complexity: Medium
### Step 3: Prioritize Analyses
**Prioritization criteria**:
1. **Frequency of use** (based on described workflow)
2. **Analytical value** (insight vs effort)
3. **Implementation complexity** (easier first)
4. **Dependencies** (does one analysis depend on another?)
**Scoring**:
| Analysis | Frequency | Value | Ease | Score |
|---------|------------|-------|------------|-------|
| YoY Comparison | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 9.3/10 |
| State Ranking | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 9.3/10 |
| Regional Agg | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 8.0/10 |
| Trend Analysis | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 7.3/10 |
| Data Retrieval | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 8.3/10 |
| Forecasting | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | 5.3/10 |
**DECISION**: Implement top 5
1. YoY Comparison (9.3)
2. State Ranking (9.3)
3. Data Retrieval (8.3)
4. Regional Aggregation (8.0)
5. Trend Analysis (7.3)
**Don't implement initially** (can add later):
- Forecasting (5.3) - complex, occasional use
### Step 4: Specify Each Analysis
For each selected analysis:
```markdown
## Analysis: [Name]
**Objective**: [What it does in 1 sentence]
**When to use**: [Types of questions that trigger it]
**Required inputs**:
- Input 1: [type, description]
- Input 2: [type, description]
**Expected outputs**:
- Output 1: [type, description]
- Output 2: [type, description]
**Methodology**:
[Explanation in natural language]
**Formulas**:
```
Formula 1 = ...
Formula 2 = ...
```
**Data transformations**:
1. [Transformation 1]
2. [Transformation 2]
**Validations**:
- Validation 1: [criteria]
- Validation 2: [criteria]
**Interpretation**:
- If result > X: [interpretation]
- If result < Y: [interpretation]
**Concrete example**:
Input:
- Commodity: Corn
- Year current: 2023
- Year previous: 2022
Processing:
- Fetch 2023 production: 15.3B bu
- Fetch 2022 production: 13.7B bu
- Calculate: (15.3 - 13.7) / 13.7 = +11.7%
Output:
```json
{
"commodity": "CORN",
"year_current": 2023,
"year_previous": 2022,
"production_current": 15.3,
"production_previous": 13.7,
"change_absolute": 1.6,
"change_percent": 11.7,
"interpretation": "significant_increase"
}
```
Response to user:
"Corn production grew 11.7% in 2023 vs 2022 (15.3B bu vs 13.7B bu)."
```
### Step 5: Specify Methodologies
For quantitative analyses, detail methodology:
**Example: YoY Decomposition**
```markdown
### Growth Decomposition
**Objective**: Understand how much of production growth comes from:
- Planted area increase (extensive)
- Productivity/yield increase (intensive)
**Mathematics**:
Production = Area × Yield
Δ Production = Δ Area × Yield(t-1) + Area(t-1) × Δ Yield + Δ Area × Δ Yield
Interaction term usually small, so approximation:
Δ Production ≈ Δ Area × Yield(t-1) + Area(t-1) × Δ Yield
**Percentage contributions**:
Contrib_Area = (Δ Area% / Δ Production%) × 100
Contrib_Yield = (Δ Yield% / Δ Production%) × 100
**Interpretation**:
- Contrib_Area > 60%: **Extensive growth**
→ Area expansion is main driver
→ Agricultural frontier expanding
- Contrib_Yield > 60%: **Intensive growth**
→ Technology improvement is main driver
→ Productivity/ha increasing
- Both ~50%: **Balanced growth**
**Validation**:
Check: Production(t) ≈ Area(t) × Yield(t) (margin 1%)
Check: Contrib_Area + Contrib_Yield ≈ 100% (margin 5%)
**Example**:
Soybeans 2023 vs 2022:
- Δ Production: +12.4%
- Δ Area: +6.1%
- Δ Yield: +7.6%
Contrib_Area = (6.1 / 12.4) × 100 = 49%
Contrib_Yield = (7.6 / 12.4) × 100 = 61%
**Interpretation**: Intensive growth (61% from yield).
Technology and management improving.
```
### Step 6: Document Analyses
Save all specifications in `references/analysis-methods.md` of the agent.
## Phase 2 Checklist
- [ ] 15+ typical questions listed
- [ ] Questions grouped by analysis type
- [ ] 4-6 analyses prioritized (with scoring)
- [ ] Each analysis specified (objective, inputs, outputs, methodology)
- [ ] Methodologies detailed with formulas
- [ ] Validations defined
- [ ] Interpretations specified
- [ ] Concrete examples included