Files
gh-francyjglisboa-agent-ski…/skills/FrancyJGLisboa__agent-skill-creator/references/phase3-architecture.md
2025-11-29 18:27:28 +08:00

7.7 KiB

Phase 3: Architecture and Structuring

Objective

STRUCTURE the agent optimally: folders, files, responsibilities, cache, performance.

Detailed Process

Step 1: Define Agent Name

Based on domain and objective, create descriptive name:

Format: domain-objective-type

Examples:

  • US Agriculture → nass-usda-agriculture
  • Stock analysis → stock-technical-analysis
  • Global climate → noaa-climate-analysis
  • Brazil CONAB → conab-crop-yield-analysis

Rules:

  • lowercase
  • hyphens to separate words
  • maximum 50 characters
  • descriptive but concise

Step 2: Directory Structure

Decision: How many levels of organization?

Option A - Simple (small agents):

agent-name/
├── .claude-plugin/
│   └── marketplace.json    ← REQUIRED for installation!
├── SKILL.md
├── scripts/
│   └── main.py
├── references/
│   └── guide.md
└── assets/
    └── config.json

Option B - Organized (medium agents):

agent-name/
├── .claude-plugin/
│   └── marketplace.json    ← REQUIRED for installation!
├── SKILL.md
├── scripts/
│   ├── fetch.py
│   ├── parse.py
│   ├── analyze.py
│   └── utils/
│       ├── cache.py
│       └── validators.py
├── references/
│   ├── api-guide.md
│   └── methodology.md
└── assets/
    └── config.json

Option C - Complete (complex agents):

agent-name/
├── .claude-plugin/
│   └── marketplace.json    ← REQUIRED for installation!
├── SKILL.md
├── scripts/
│   ├── core/
│   │   ├── fetch_[source].py
│   │   ├── parse_[source].py
│   │   └── analyze_[source].py
│   ├── models/
│   │   ├── forecasting.py
│   │   └── ml_models.py
│   └── utils/
│       ├── cache_manager.py
│       ├── rate_limiter.py
│       └── validators.py
├── references/
│   ├── api/
│   │   └── [api-name]-guide.md
│   ├── methods/
│   │   └── analysis-methods.md
│   └── troubleshooting.md
├── assets/
│   ├── config.json
│   └── metadata.json
└── data/
    ├── raw/
    ├── processed/
    ├── cache/
    └── analysis/

Choose based on:

  • Number of scripts (1-2 → A, 3-5 → B, 6+ → C)
  • Analysis complexity
  • Prefer starting with B, expand to C if needed

Step 3: Define Script Responsibilities

Principle: Separation of Concerns

Typical scripts:

1. fetch_[source].py

  • Responsibility: API requests, authentication, rate limiting
  • Input: Query parameters (commodity, year, etc)
  • Output: Raw JSON from API
  • Does NOT: Parse, transform, analyze
  • Size: 200-300 lines

2. parse_[source].py

  • Responsibility: Parsing, cleaning, validation
  • Input: API JSON
  • Output: Structured DataFrame
  • Does NOT: Fetch, analyze
  • Size: 150-200 lines

3. analyze_[source].py

  • Responsibility: All analyses (YoY, ranking, etc)
  • Input: Clean DataFrame
  • Output: Dict with results
  • Does NOT: Fetch, parse
  • Size: 300-500 lines (all analyses)

Typical utils:

cache_manager.py:

  • Manage response cache
  • Differentiated TTL
  • ~100-150 lines

rate_limiter.py:

  • Control rate limit
  • Persistent counter
  • Alerts
  • ~100-150 lines

validators.py:

  • Data validations
  • Consistency checks
  • ~100-150 lines

unit_converter.py (if needed):

  • Unit conversions
  • ~50-100 lines

Step 4: Plan References

Typical files:

api-guide.md (~1500 words):

  • How to get API key
  • Main endpoints with examples
  • Important parameters
  • Response format
  • Limitations and quirks
  • API troubleshooting

analysis-methods.md (~2000 words):

  • Each analysis explained
  • Mathematical formulas
  • Interpretations
  • Validations
  • Concrete examples

troubleshooting.md (~1000 words):

  • Common problems
  • Step-by-step solutions
  • FAQs

domain-context.md (optional, ~1000 words):

  • Domain context
  • Terminology
  • Important concepts
  • Benchmarks

Step 5: Plan Assets

config.json:

  • API settings (URL, rate limits, timeouts)
  • Cache settings (TTLs, directories)
  • Analysis defaults
  • Validation thresholds

Typical structure:

{
  "api": {
    "base_url": "...",
    "api_key_env": "VAR_NAME",
    "rate_limit_per_day": 1000,
    "timeout_seconds": 30
  },
  "cache": {
    "enabled": true,
    "dir": "data/cache",
    "ttl_historical_days": 365,
    "ttl_current_days": 7
  },
  "defaults": {
    "param1": "value1"
  },
  "validation": {
    "threshold1": 0.01
  }
}

metadata.json (if needed):

  • Domain-specific metadata
  • Mappings (aliases)
  • Conversions (units)
  • Groupings (regions)

Example:

{
  "commodities": {
    "CORN": {
      "aliases": ["corn", "maize"],
      "unit": "BU",
      "conversion_to_mt": 0.0254
    }
  },
  "regions": {
    "MIDWEST": ["IA", "IL", "IN", "OH"]
  }
}

Step 6: Cache Strategy

Decision: What to cache and for how long?

General rule:

  • Historical data (year < current): Permanent cache (365+ days)
  • Current year data: Short cache (7 days - may be revised)
  • Metadata (commodity lists, states): Permanent cache

Implementation:

  • Cache by key (parameter hash)
  • Check age before using
  • Fallback to expired cache if API fails

Example:

def get_cache_ttl(year: int) -> timedelta:
    """Determine cache TTL based on year"""
    current_year = datetime.now().year

    if year < current_year:
        # Historical: cache for 1 year (effectively permanent)
        return timedelta(days=365)
    else:
        # Current year: cache for 7 days (may be revised)
        return timedelta(days=7)

Step 7: Rate Limiting Strategy

Decision: How to control rate limits?

Components:

  1. Persistent counter (file/DB)
  2. Pre-request verification
  3. Alerts (when near limit)
  4. Blocking (when limit reached)

Implementation:

class RateLimiter:
    def __init__(self, max_requests: int, period_seconds: int):
        self.max_requests = max_requests
        self.period = period_seconds
        self.counter_file = Path("data/cache/rate_limit_counter.json")

    def allow_request(self) -> bool:
        """Check if request is allowed"""
        count = self._get_current_count()

        if count >= self.max_requests:
            return False

        # Warn when near limit
        if count > self.max_requests * 0.9:
            print(f"⚠️ Rate limit: {count}/{self.max_requests} requests used")

        return True

    def record_request(self):
        """Record that request was made"""
        count = self._get_current_count()
        self._save_count(count + 1)

Step 8: Document Architecture

Create section in DECISIONS.md:

## Phase 3: Architecture

### Chosen Structure

[Directory tree]


**Justification**:
- Separate scripts for modularity
- Utils for reusable code
- References for progressive disclosure
- Data/ to separate raw vs processed

### Defined Scripts

**fetch_[source].py** (280 lines estimated):
- Responsibility: [...]
- Input/Output: [...]

[For each script...]

### Cache Strategy

- Historical: Permanent cache
- Current: 7 day cache
- Justification: [historical data doesn't change]

### Rate Limiting

- Method: [persistent counter]
- Limits: [1000/day]
- Alerts: [>90% usage]

Phase 3 Checklist

  • Agent name defined
  • Directory structure chosen (A/B/C)
  • Responsibilities of each script defined
  • References planned (which files, content)
  • Assets planned (which configs, structure)
  • Cache strategy defined (what, TTL)
  • Rate limiting strategy defined
  • Architecture documented in DECISIONS.md