# Project Selection Guide

Decision tree and heuristics for selecting the right MXCP approach and templates based on **technical requirements**.

**Scope**: This guide helps select implementation patterns (SQL vs Python, template selection, architecture patterns) based on data sources, authentication mechanisms, and technical constraints. It does NOT help define business requirements or determine what features to build.

## Decision Tree

Use this decision tree to determine the appropriate MXCP implementation approach:

```
User Request
├─ Data File
│  ├─ CSV file
│  │  ├─ Static data → dbt seed + SQL tool
│  │  ├─ Needs transformation → dbt seed + dbt model + SQL tool
│  │  └─ Large file (>100MB) → Convert to Parquet + dbt model
│  ├─ Excel file (.xlsx, .xls)
│  │  ├─ Static/one-time → Convert to CSV + dbt seed
│  │  ├─ User upload (dynamic) → Python tool with pandas + DuckDB table
│  │  └─ Multi-sheet → Python tool to load all sheets as tables
│  ├─ JSON/Parquet
│  │  └─ DuckDB read_json/read_parquet directly in SQL tool
│  └─ Synthetic data needed
│     ├─ For testing → dbt model with GENERATE_SERIES
│     ├─ Dynamic generation → Python tool with parameters
│     └─ With statistics → Generate + analyze in single tool
│
├─ External API Integration
│  ├─ OAuth required
│  │  ├─ Google (Calendar, Sheets, etc.) → google-calendar template
│  │  ├─ Jira Cloud → jira-oauth template
│  │  ├─ Salesforce → salesforce-oauth template
│  │  └─ Other OAuth → Adapt google-calendar template
│  │
│  ├─ API Token/Basic Auth
│  │  ├─ Jira → jira template
│  │  ├─ Confluence → confluence template
│  │  ├─ Salesforce → salesforce template
│  │  ├─ Custom API → python-demo template
│  │  └─ REST API → Create new Python tool
│  │
│  └─ Public API (no auth)
│     └─ Create SQL tool with read_json/read_csv from URL
│
├─ Database Connection
│  ├─ PostgreSQL
│  │  ├─ Direct query → DuckDB ATTACH + SQL tools
│  │  └─ Cache data → dbt source + model + SQL tools
│  ├─ MySQL
│  │  ├─ Direct query → DuckDB ATTACH + SQL tools
│  │  └─ Cache data → dbt source + model
│  ├─ SQLite → DuckDB ATTACH + SQL tools (simple)
│  ├─ SQL Server → DuckDB ATTACH + SQL tools
│  └─ Other/NoSQL → Create Python tool with connection library
│
├─ Complex Logic/Processing
│  ├─ Data transformation → dbt model
│  ├─ Business logic → Python tool
│  ├─ ML/AI processing → Python tool with libraries
│  └─ Async operations → Python tool with async/await
│
└─ Authentication/Security System
   ├─ Keycloak → keycloak template
   ├─ Custom SSO → Adapt keycloak template
   └─ Policy enforcement → Use MXCP policies
```

## Available Project Templates

### Data-Focused Templates

#### covid_owid

**Use when**: Working with external data sources, caching datasets

**Features**:
- dbt integration for data caching
- External CSV/JSON fetching
- Data quality tests
- Incremental updates

**Example use cases**:
- "Cache COVID statistics for offline analysis"
- "Query external datasets regularly"
- "Download and transform public data"

**Key files**:
- `models/` - dbt models for data transformation
- `tools/` - SQL tools querying cached data

#### earthquakes

**Use when**: Real-time data monitoring, geospatial data

**Features**:
- Real-time API queries
- Geospatial filtering
- Time-based queries

**Example use cases**:
- "Monitor earthquake activity"
- "Query geospatial data by region"
- "Real-time event tracking"
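The earthquakes template queries a live public feed, which is the same "Public API (no auth)" branch of the decision tree: DuckDB can read CSV/JSON straight from a URL, so no template is strictly required. A minimal sketch, assuming network access; the USGS feed URL and its columns (`time`, `place`, `mag`) are illustrative assumptions, not part of any template:

```python
# Sketch: query a public CSV feed directly with DuckDB (no auth needed).
# The feed URL and column names are illustrative; check them against the
# real source before relying on this.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # enables reading over HTTP(S)

rows = con.execute(
    """
    SELECT time, place, mag
    FROM read_csv_auto('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv')
    WHERE mag >= ?
    ORDER BY mag DESC
    LIMIT 10
    """,
    [4.0],
).fetchall()

for row in rows:
    print(row)
```

An SQL tool would embed only the SELECT statement; the Python wrapper here just makes the sketch runnable on its own.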
### API Integration Templates

#### google-calendar (OAuth)

**Use when**: Integrating with Google APIs or other OAuth 2.0 services

**Features**:
- OAuth 2.0 authentication flow
- Token management
- Google API client integration
- Python endpoints with async support

**Example use cases**:
- "Connect to Google Calendar"
- "Access Google Sheets data"
- "Integrate with Gmail"
- "Any OAuth 2.0 API integration"

**Adaptation guide**:
1. Replace Google API client with target API client
2. Update OAuth scopes and endpoints
3. Modify tool definitions for new API methods
4. Update configuration with new OAuth provider

#### jira (API Token)

**Use when**: Integrating with Jira using API tokens

**Features**:
- API token authentication
- JQL query support
- Issue, user, and project management
- Python HTTP client pattern

**Example use cases**:
- "Query Jira issues"
- "Get project information"
- "Search for users"

#### jira-oauth (OAuth)

**Use when**: Jira integration requiring OAuth

**Features**:
- OAuth 1.0a for Jira
- More secure than API tokens
- Full Jira REST API access

#### confluence

**Use when**: Atlassian Confluence integration

**Features**:
- Confluence REST API
- Page and space queries
- Content search

**Example use cases**:
- "Search Confluence pages"
- "Get page content"
- "List spaces"

#### salesforce / salesforce-oauth

**Use when**: Salesforce CRM integration

**Features**:
- Salesforce REST API
- SOQL queries
- OAuth or username/password auth

**Example use cases**:
- "Query Salesforce records"
- "Get account information"
- "Search opportunities"

### Development Templates

#### python-demo

**Use when**: Building custom Python-based tools

**Features**:
- Python endpoint patterns
- Async/await examples
- Database access patterns
- Error handling

**Example use cases**:
- "Create custom API integration"
- "Implement complex business logic"
- "Build ML/AI-powered tools"

**Key patterns**:
```python
import asyncio

# Sync endpoint
def simple_tool(param: str) -> dict:
    return {"result": param.upper()}

# Async endpoint (fetch_data is an application-provided coroutine)
async def async_tool(ids: list[str]) -> list[dict]:
    results = await asyncio.gather(*[fetch_data(id) for id in ids])
    return results

# Database access (db is the runtime-provided database handle)
def db_tool(query: str) -> list[dict]:
    return db.execute(query).fetchall()
```

### Infrastructure Templates

#### plugin

**Use when**: Extending DuckDB with custom functions

**Features**:
- DuckDB plugin development
- Custom SQL functions
- Compiled extensions

**Example use cases**:
- "Add custom SQL functions"
- "Integrate C/C++ libraries"
- "Optimize performance-critical operations"

#### keycloak

**Use when**: Enterprise authentication/authorization

**Features**:
- Keycloak integration
- SSO support
- Role-based access control

**Example use cases**:
- "Integrate with Keycloak SSO"
- "Implement role-based policies"
- "Enterprise user management"

#### squirro

**Use when**: Enterprise search and insights integration

**Features**:
- Squirro API integration
- Search and analytics
- Enterprise data access

## Common Scenarios and Heuristics

### Scenario 1: CSV File to Query

**User says**: "I need to connect my chat to a CSV file"

**Heuristic**:
1. **DO NOT** use existing templates
2. **CREATE** new MXCP project from scratch
3. **APPROACH**:
   - Place CSV in `seeds/` directory
   - Create `seeds/schema.yml` with schema definition and tests
   - Run `dbt seed` to load into DuckDB
   - Create SQL tool: `SELECT * FROM <table_name>`
   - Add parameters for filtering if needed

**Implementation steps**:
```bash
# 1. Initialize project
mkdir csv-server && cd csv-server
mxcp init --bootstrap

# 2. Setup dbt
mkdir seeds
cp /path/to/file.csv seeds/data.csv

# 3. Create schema
cat > seeds/schema.yml       # Define seed columns and tests

# 4. Load into DuckDB
dbt seed && dbt test

# 5. Create SQL tool
cat > tools/query_data.yml   # SELECT with optional filter parameters
```
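Once `dbt seed` has run, the SQL tool is just a parameterized SELECT over the seeded table. A sketch of the equivalent query in Python, assuming dbt's convention of naming the seed table after the CSV file (`data.csv` becomes table `data`); the filter column is a placeholder:

```python
# Sketch: the query a SQL tool would run against the seeded CSV.
# Table name `data` follows dbt's seed-naming convention; `category`
# is a placeholder for whatever filter column the file actually has.
import duckdb

con = duckdb.connect("data/db-default.duckdb")  # default path from config.yml
rows = con.execute(
    "SELECT * FROM data WHERE category = ? LIMIT 100",
    ["example"],
).fetchall()
print(rows)
```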
### Scenario 3: External API Integration (API Token)

**User says**: "Connect to this REST API with my API token"

**Heuristic**:
1. **Check** for a matching template (jira, confluence, salesforce)
2. **Otherwise** create a Python tool with httpx and a token secret

**Implementation steps**:
```bash
# 1. Create project
mkdir api-server && cd api-server
mxcp init --bootstrap

# 2. Create API client
cat > python/api_client.py <<EOF
import httpx

# get_secret: MXCP's secret-resolution helper (import per MXCP docs)
async def call_api(endpoint: str) -> dict:
    secret = get_secret("api_token")
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api.example.com/{endpoint}",
            headers={"Authorization": f"Bearer {secret['token']}"}
        )
        return response.json()
EOF

# 3. Create tool
# 4. Configure secret in config.yml
# 5. Test
```

### Scenario 4: Complex Data Transformation

**User says**: "Transform this data and provide analytics"

**Heuristic**:
1. **Use** dbt for transformations
2. **Use** SQL tools for queries
3. **Pattern**: seed → model → tool

**Implementation steps**:
```bash
# 1. Load source data (seed or external)
# 2. Create dbt model for transformation
cat > models/analytics.sql   # Aggregations/joins over the source data

# 3. Run and test the model
dbt run && dbt test

# 4. Create SQL tool over the model
```

### Scenario 5: Excel File Integration

**User says**: "Read this Excel file"

**Heuristic** (see the decision tree):
1. **If static/one-time**: convert to CSV + dbt seed
2. **If dynamic upload**: Python tool with pandas

**Implementation steps**:
```bash
# Option A: Static conversion → CSV + dbt seed
# (convert the .xlsx to CSV first, e.g. with pandas)
cat > seeds/schema.yml # Create schema
dbt seed

# Option B: Dynamic upload → Python tool
cat > python/excel_loader.py # Create loader
cat > tools/load_excel.yml # Create tool
pip install openpyxl pandas # Add dependencies
```

See **references/excel-integration.md** for complete patterns.

### Scenario 6: Synthetic Data Generation

**User says**: "Generate test data" or "Create synthetic customer records" or "I need dummy data for testing"

**Heuristic**:
1. **If persistent test data**: dbt model with GENERATE_SERIES
2. **If dynamic/parameterized**: Python tool
3. **If with analysis**: Generate + calculate statistics in one tool

**Implementation steps**:
```bash
# Option A: Persistent via dbt
cat > models/synthetic_customers.sql   # GENERATE_SERIES-based model
dbt run

# Option B: Dynamic → Python tool
cat > python/generate_data.py # Create generator
cat > tools/generate_test_data.yml # Create tool
```

See **references/synthetic-data-patterns.md** for complete patterns.

### Scenario 7: Python Library Wrapping

**User says**: "Wrap the Stripe API" or "Use pandas for analysis" or "Connect to Redis"

**Heuristic**:
1. **Check** if it's an API client library (stripe, twilio, etc.)
2. **Check** if it's a data/ML library (pandas, sklearn, etc.)
3. **Use** `python-demo` as base
4. **Add** library to requirements.txt
5. **Use** @on_init for initialization if stateful

**Implementation steps**:
```bash
# 1. Copy python-demo template
cp -r assets/project-templates/python-demo my-project

# 2. Install library
echo "stripe>=5.4.0" >> requirements.txt
pip install stripe

# 3. Create wrapper
cat > python/stripe_wrapper.py # Implement wrapper functions

# 4. Create tools
cat > tools/create_customer.yml # Map to wrapper functions

# 5. Create project config with secrets
cat > config.yml # Secret entries for the API key
```

### Scenario 8: ML/AI Processing

**User says**: "Classify these texts" or similar ML/AI requests

**Heuristic**:
1. **Use** `python-demo` as base
2. **Add** ML libraries to requirements.txt
3. **Wrap** the model in a Python tool

**Implementation steps**:
```bash
# 1. Copy python-demo template
# 2. Install ML libraries (e.g., transformers)
# 3. Create the model wrapper
cat > python/ml_tool.py <<EOF
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

def classify_texts(texts: list[str]) -> list[dict]:
    results = classifier(texts)
    return [{"text": t, **r} for t, r in zip(texts, results)]
EOF

# 4. Create tool definition
# 5. Test
```

### Scenario 9: External Database Connection

**User says**: "Connect to my PostgreSQL database" or "Query my MySQL production database"

**Heuristic**:
1. **Ask** if data can be exported to CSV (simpler approach)
2. **Ask** if they need real-time data or can cache it
3. **Decide**: Direct query (ATTACH) vs cached (dbt)

**Implementation steps - Direct Query (ATTACH)**:
```bash
# 1. Create project
mkdir db-connection && cd db-connection
mxcp init --bootstrap

# 2. Create config with credentials
cat > config.yml # Database credentials via env-var secrets

# 3. Create SQL tool using ATTACH
cat > tools/query_database.yml # ATTACH + SELECT against the live database
```

**Implementation steps - Cached (dbt)**:
```bash
# 1. Declare the external database as a dbt source
cat > models/sources.yml # dbt source definition

# 2. Create a caching model
cat > models/customer_cache.sql # SELECT from the source into DuckDB

# 3. Add schema and tests
cat > models/schema.yml # Model schema + tests

# 4. Build the cache
dbt run && dbt test

# 5. Create SQL tool over the cached table
cat > tools/query_cached.yml # Query the cached model
```
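For the direct-query path, the underlying mechanism is DuckDB's `ATTACH`. A minimal sketch in Python, assuming DuckDB's postgres extension and a read-only DSN supplied through an environment variable; the schema and table names are placeholders:

```python
# Sketch: attach a live PostgreSQL database read-only and query it.
# POSTGRES_DSN and pg.public.customers are illustrative placeholders.
import os
import duckdb

dsn = os.environ["POSTGRES_DSN"]  # e.g. "host=... dbname=... user=readonly"

con = duckdb.connect()
con.execute("INSTALL postgres; LOAD postgres;")
con.execute(f"ATTACH '{dsn}' AS pg (TYPE postgres, READ_ONLY)")

count = con.execute("SELECT COUNT(*) FROM pg.public.customers").fetchone()[0]
print(count)
```

`READ_ONLY` matters here: combined with a read-only database user, it keeps the MCP server from ever mutating the production database.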
## Questions to Ask When Requirements Are Unclear

### Data Source Unclear
- "How large is the data? (>100MB?)"
- "How often does the data update? (static, daily, real-time)"

### Security Requirements Unclear
- "Who should have access to this data? (everyone, specific roles, specific users)"
- "Are there any sensitive fields that need protection?"

### Functionality Unclear
- "What questions do you want to ask about this data?"
- "What operations should be available through the MCP server?"

## Heuristics When No Interaction Available

**If you cannot ask questions, use these defaults**:
1. **CSV file mentioned** → dbt seed + SQL tool with `SELECT *`
2. **Excel mentioned** → Convert to CSV + dbt seed OR Python pandas tool
3. **API mentioned** → Check for template, otherwise use Python tool with httpx
4. **OAuth mentioned** → Use google-calendar template as base
5. **Database mentioned** → DuckDB ATTACH for direct query OR dbt for caching
6. **PostgreSQL/MySQL mentioned** → Use ATTACH with read-only user
7. **Transformation needed** → dbt model
8. **Complex logic** → Python tool
9. **Security not mentioned** → No policies (user can add later)
10. **No auth mentioned for API** → Assume token/basic auth

## Configuration Management

### Project-Local Config (Recommended)

**ALWAYS create `config.yml` in the project directory, NOT `~/.mxcp/config.yml`**

**Why?**
- User maintains control over global config
- Project is self-contained and portable
- Safer for agents (no global config modification)
- User can review before copying to ~/.mxcp/

**Basic config.yml template**:
```yaml
# config.yml (in project root)
mxcp: 1
profiles:
  default:
    # Secrets via environment variables (recommended)
    secrets:
      - name: api_token
        type: env
        parameters:
          env_var: API_TOKEN
    # Database configuration (optional, default is data/db-default.duckdb)
    database:
      path: "data/db-default.duckdb"
    # Authentication (if needed)
    auth:
      provider: github  # or google, microsoft, etc.
  production:
    database:
      path: "prod.duckdb"
    audit:
      enabled: true
      path: "audit.jsonl"
```

**Usage options**:
```bash
# Option 1: Auto-discover (mxcp looks for ./config.yml)
mxcp serve

# Option 2: Explicit path via environment variable
MXCP_CONFIG=./config.yml mxcp serve

# Option 3: User manually copies to global location
cp config.yml ~/.mxcp/config.yml
mxcp serve
```

**In skill implementations**:
```bash
# CORRECT: Create local config
cat > config.yml # Project-local config as shown above

# WRONG: Do not write to ~/.mxcp/config.yml
```
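Because secrets are injected through environment variables, a quick preflight check before `mxcp serve` catches missing configuration early. A minimal sketch; the variable list mirrors the `env_var` entries in the config.yml above:

```python
# Sketch: verify that the environment variables referenced by config.yml
# (secrets with type: env) are set before starting the server.
import os
import sys

REQUIRED_ENV_VARS = ["API_TOKEN"]  # keep in sync with env_var entries in config.yml

missing = [name for name in REQUIRED_ENV_VARS if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("Environment OK. Run: MXCP_CONFIG=./config.yml mxcp serve")
```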
## Testing Checklist

- [ ] Run `mxcp validate`
- [ ] Run `mxcp test`
- [ ] Run `mxcp lint`
- [ ] Test with invalid inputs
- [ ] Test with edge cases (empty data, nulls, etc.)

## Summary

**Quick reference for common requests**:

| User Request | Approach | Template | Key Steps |
|--------------|----------|----------|-----------|
| "Query my CSV" | dbt seed + SQL tool | None | seed → schema.yml → dbt seed/test → SQL tool |
| "Read Excel file" | Convert to CSV + dbt seed OR pandas tool | None | Excel→CSV → seed OR pandas → DuckDB table |
| "Connect to PostgreSQL" | ATTACH + SQL tool OR dbt cache | None | ATTACH → SQL tool OR dbt source/model → SQL tool |
| "Connect to MySQL" | ATTACH + SQL tool OR dbt cache | None | ATTACH → SQL tool OR dbt source/model → SQL tool |
| "Generate test data" | dbt model or Python | None | GENERATE_SERIES → dbt model or Python tool |
| "Wrap library X" | Python wrapper | python-demo | Install lib → wrap functions → create tools |
| "Connect to Google Calendar" | OAuth + Python | google-calendar | Copy template → configure OAuth |
| "Connect to Jira" | Token + Python | jira or jira-oauth | Copy template → configure token |
| "Transform data" | dbt model | None | seed/source → model → schema.yml → dbt run/test → SQL tool |
| "Complex logic" | Python tool | python-demo | Copy template → implement function |
| "ML/AI task" | Python + libraries | python-demo | Add ML libs → implement model |
| "External API" | Python + httpx | python-demo | Implement client → create tool |

**Priority order**:
1. Security (auth, policies, validation)
2. Robustness (error handling, types, tests)
3. Testing (validate, test, lint)
4. Features (based on user needs)
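The priority order is concrete: validate inputs before anything else, then fail loudly and clearly. A sketch of what that looks like in a Python tool; `fetch_record` is a hypothetical data-access helper, not an MXCP API:

```python
# Sketch: priority order applied to a Python tool.
# 1. Security: validate inputs first. 2. Robustness: explicit error handling.
# `fetch_record` is a hypothetical stand-in for a real database or API lookup.

def fetch_record(record_id: str) -> dict:
    # Stand-in for a real lookup against a database or API
    return {"id": record_id}

def get_record(record_id: str) -> dict:
    # Security: reject malformed input before touching any data source
    if not record_id.isalnum():
        raise ValueError("record_id must be alphanumeric")
    # Robustness: surface low-level failures as clear, typed errors
    try:
        return fetch_record(record_id)
    except ConnectionError as exc:
        raise RuntimeError(f"Data source unavailable: {exc}") from exc
```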