Initial commit

Zhongwei Li
2025-11-30 08:37:55 +08:00
commit 506a828b22
59 changed files with 18515 additions and 0 deletions

skills/project-commands.md

---
name: project-commands
description: Complete reference for all make commands, development workflows, Azure operations, and database operations. Use when you need to know how to run specific operations.
---
# Project Commands Reference
Complete command reference for the Unify data migration project.
## Build & Test Commands
### Syntax Validation
```bash
python3 -m py_compile <file_path>
python3 -m py_compile python_files/utilities/session_optimiser.py
```
### Code Quality
```bash
ruff check python_files/ # Linting (must pass)
ruff format python_files/ # Auto-format code
```
### Testing
```bash
python -m pytest python_files/testing/ # All tests
python -m pytest python_files/testing/medallion_testing.py # Integration
```
## Pipeline Commands
### Complete Pipeline
```bash
make run_all # Executes: choice_list_mapper → bronze → silver → gold → build_duckdb
```
### Layer-Specific (WARNING: Deletes existing layer data)
```bash
make bronze # Bronze layer pipeline (deletes /workspaces/data/bronze_*)
make run_silver # Silver layer (includes choice_list_mapper, deletes /workspaces/data/silver_*)
make gold # Gold layer (includes DuckDB build, deletes /workspaces/data/gold_*)
```
### Specific Table Execution
```bash
# Run specific silver table
make silver_table FILE_READ_LAYER=silver PATH_DATABASE=silver_fvms RUN_FILE_NAME=s_fvms_incident
# Run specific gold table
make gold_table G_RUN_FILE_NAME=g_x_mg_statsclasscount
# Run currently open file (auto-detects layer and database)
make current_table # Requires: make install_file_tracker (run once, then reload VSCode)
```
## Development Workflow
### Interactive UI
```bash
make ui # Interactive menu for all commands
```
### Data Generation
```bash
make generate_data # Generate synthetic test data
```
## Spark Thrift Server
Enables JDBC/ODBC connections to local Spark data on port 10000:
```bash
make thrift-start # Start server
make thrift-status # Check if running
make thrift-stop # Stop server
# Connect via spark-sql CLI
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
spark-sql -e "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10;"
```
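Because the Thrift server speaks the HiveServer2 protocol, any JDBC client can connect on port 10000. A minimal sketch using `beeline` (an assumption: it ships with most Spark/Hive distributions but is not confirmed to be installed in this project):
```bash
# Connect to the local Thrift server over JDBC and list databases.
# Hypothetical usage: adjust host and credentials to your environment.
beeline -u "jdbc:hive2://localhost:10000" -e "SHOW DATABASES;"
```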
## Database Operations
### Database Inspection
```bash
make database-check # Check Hive databases and tables
# View schemas
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
```
### DuckDB Operations
```bash
make build_duckdb # Build local DuckDB database (/workspaces/data/warehouse.duckdb)
make harly # Open Harlequin TUI for interactive DuckDB queries
```
**DuckDB Benefits**:
- Fast local queries without Azure connection
- Data exploration and validation
- Report prototyping
- Testing query logic before deploying to Synapse
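For ad-hoc queries outside Harlequin, the `duckdb` CLI can open the warehouse file directly. A minimal sketch, assuming the CLI is installed and borrowing the table name from the Spark example above (the schema layout inside the DuckDB file may differ, so list tables first):
```bash
# Query the locally built warehouse without any Azure connection.
duckdb /workspaces/data/warehouse.duckdb -c "SHOW TABLES;"

# Table name borrowed from the Spark example; verify it exists first.
duckdb /workspaces/data/warehouse.duckdb \
  -c "SELECT * FROM g_x_mg_statsclasscount LIMIT 10;"
```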
## Azure Operations
### Authentication
```bash
make azure_login # Azure CLI login
```
### SharePoint Integration
```bash
# Download SharePoint files
make download_sharepoint SHAREPOINT_FILE_ID=<file-id>
# Convert Excel to JSON
make convert_excel_to_json
# Upload to Azure Storage
make upload_to_storage UPLOAD_FILE=<file-path>
```
### Complete Pipelines
```bash
# Offence mapping pipeline
make offence_mapping_build # download_sharepoint → convert_excel_to_json → upload_to_storage
# Table list management
make table_lists_pipeline # download_ors_table_mapping → generate_table_lists → upload_all_table_lists
make update_pipeline_variables # Update Azure Synapse pipeline variables
```
## AI Agent Integration
### User Story Processing
Automate ETL file generation from Azure DevOps user stories:
```bash
make user_story_build \
A_USER_STORY=44687 \
A_FILE_NAME=g_x_mg_statsclasscount \
A_READ_LAYER=silver \
A_WRITE_LAYER=gold
```
**What it does**:
- Reads user story requirements from Azure DevOps
- Generates ETL transformation code
- Creates appropriate tests
- Follows project coding standards
### Agent Session
```bash
make session # Start a persistent Claude Code session with --dangerously-skip-permissions
```
## Git Operations
### Branch Merging
```bash
make merge_staging # Merge from staging (adds all changes, commits, pulls with --no-ff)
make rebase_staging # Rebase from staging (adds all changes, commits, rebases)
```
## Environment Variables
### Required for Azure DevOps MCP
```bash
export AZURE_DEVOPS_PAT="<your-personal-access-token>"
export AZURE_DEVOPS_ORGANIZATION="emstas"
export AZURE_DEVOPS_PROJECT="Program Unify"
```
### Required for Azure Operations
See `configuration.yaml` for the complete list of Azure environment variables.
## Common Workflows
### Complete Development Cycle
```bash
# 1. Generate test data
make generate_data
# 2. Run full pipeline
make run_all
# 3. Explore results
make harly
# 4. Run tests
python -m pytest python_files/testing/
# 5. Quality checks
ruff check python_files/
ruff format python_files/
```
### Quick Table Development
```bash
# 1. Open file in VSCode
# 2. Run current file
make current_table
# 3. Check output in DuckDB
make harly
```
### Quality Gates Before Commit
```bash
# Must run these before committing
python3 -m py_compile <file> # 1. Syntax check
ruff check python_files/ # 2. Linting (must pass)
ruff format python_files/ # 3. Format code
```
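A convenience chain that runs all three gates and stops at the first failure (a sketch, assuming it is run from the repository root; `find` stands in for the single-file placeholder above):
```bash
# Byte-compile every Python file under python_files/, then lint and format.
find python_files -name '*.py' -exec python3 -m py_compile {} + \
  && ruff check python_files/ \
  && ruff format python_files/
```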
## Troubleshooting Commands
### Check Spark Session
```bash
spark-sql -e "SHOW DATABASES;"
```
### Verify Azure Connection
```bash
make azure_login
az account show
```
### Check Data Paths
```bash
ls -la /workspaces/data/
```
## File Tracker Setup
One-time setup for `make current_table`:
```bash
make install_file_tracker
# Then reload VSCode
```
## Notes
- **Data Deletion**: Layer-specific commands delete existing data before running
- **Thrift Server**: Port 10000 for JDBC/ODBC connections
- **DuckDB**: Local analysis without requiring an Azure connection
- **Quality Gates**: Always run before committing code