Initial commit

Zhongwei Li
2025-11-30 08:37:55 +08:00
commit 506a828b22
59 changed files with 18515 additions and 0 deletions

skills/project-commands.md

---
name: project-commands
description: Complete reference for all make commands, development workflows, Azure operations, and database operations. Use when you need to know how to run specific operations.
---
# Project Commands Reference
Complete command reference for the Unify data migration project.
## Build & Test Commands
### Syntax Validation
```bash
python3 -m py_compile <file_path>
python3 -m py_compile python_files/utilities/session_optimiser.py
```
### Code Quality
```bash
ruff check python_files/ # Linting (must pass)
ruff format python_files/ # Auto-format code
```
### Testing
```bash
python -m pytest python_files/testing/ # All tests
python -m pytest python_files/testing/medallion_testing.py # Integration
```
## Pipeline Commands
### Complete Pipeline
```bash
make run_all # Executes: choice_list_mapper → bronze → silver → gold → build_duckdb
```
### Layer-Specific (WARNING: Deletes existing layer data)
```bash
make bronze # Bronze layer pipeline (deletes /workspaces/data/bronze_*)
make run_silver # Silver layer (includes choice_list_mapper, deletes /workspaces/data/silver_*)
make gold # Gold layer (includes DuckDB build, deletes /workspaces/data/gold_*)
```
### Specific Table Execution
```bash
# Run specific silver table
make silver_table FILE_READ_LAYER=silver PATH_DATABASE=silver_fvms RUN_FILE_NAME=s_fvms_incident
# Run specific gold table
make gold_table G_RUN_FILE_NAME=g_x_mg_statsclasscount
# Run currently open file (auto-detects layer and database)
make current_table # Requires: make install_file_tracker (run once, then reload VSCode)
```
## Development Workflow
### Interactive UI
```bash
make ui # Interactive menu for all commands
```
### Data Generation
```bash
make generate_data # Generate synthetic test data
```
## Spark Thrift Server
Enables JDBC/ODBC connections to local Spark data on port 10000:
```bash
make thrift-start # Start server
make thrift-status # Check if running
make thrift-stop # Stop server
# Connect via spark-sql CLI
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
spark-sql -e "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10;"
```
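Because the Thrift server speaks the HiveServer2 protocol, any JDBC client can connect on port 10000. A minimal sketch using `beeline` (an assumption: it ships with most Spark/Hive distributions but is not confirmed to be installed in this project):
```bash
# Connect to the local Thrift server over JDBC and list databases.
# Hypothetical usage: adjust host and credentials to your environment.
beeline -u "jdbc:hive2://localhost:10000" -e "SHOW DATABASES;"
```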
## Database Operations
### Database Inspection
```bash
make database-check # Check Hive databases and tables
# View schemas
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
```
### DuckDB Operations
```bash
make build_duckdb # Build local DuckDB database (/workspaces/data/warehouse.duckdb)
make harly # Open Harlequin TUI for interactive DuckDB queries
```
**DuckDB Benefits**:
- Fast local queries without Azure connection
- Data exploration and validation
- Report prototyping
- Testing query logic before deploying to Synapse
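For ad-hoc queries outside Harlequin, the `duckdb` CLI can open the warehouse file directly. A minimal sketch, assuming the CLI is installed and borrowing the table name from the Spark example above (the schema layout inside the DuckDB file may differ, so list tables first):
```bash
# Query the locally built warehouse without any Azure connection.
duckdb /workspaces/data/warehouse.duckdb -c "SHOW TABLES;"

# Table name borrowed from the Spark example; verify it exists first.
duckdb /workspaces/data/warehouse.duckdb \
  -c "SELECT * FROM g_x_mg_statsclasscount LIMIT 10;"
```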
## Azure Operations
### Authentication
```bash
make azure_login # Azure CLI login
```
### SharePoint Integration
```bash
# Download SharePoint files
make download_sharepoint SHAREPOINT_FILE_ID=<file-id>
# Convert Excel to JSON
make convert_excel_to_json
# Upload to Azure Storage
make upload_to_storage UPLOAD_FILE=<file-path>
```
### Complete Pipelines
```bash
# Offence mapping pipeline
make offence_mapping_build # download_sharepoint → convert_excel_to_json → upload_to_storage
# Table list management
make table_lists_pipeline # download_ors_table_mapping → generate_table_lists → upload_all_table_lists
make update_pipeline_variables # Update Azure Synapse pipeline variables
```
## AI Agent Integration
### User Story Processing
Automate ETL file generation from Azure DevOps user stories:
```bash
make user_story_build \
A_USER_STORY=44687 \
A_FILE_NAME=g_x_mg_statsclasscount \
A_READ_LAYER=silver \
A_WRITE_LAYER=gold
```
**What it does**:
- Reads user story requirements from Azure DevOps
- Generates ETL transformation code
- Creates appropriate tests
- Follows project coding standards
### Agent Session
```bash
make session # Start a persistent Claude Code session with --dangerously-skip-permissions
```
## Git Operations
### Branch Merging
```bash
make merge_staging # Merge from staging (adds all changes, commits, pulls with --no-ff)
make rebase_staging # Rebase from staging (adds all changes, commits, rebases)
```
## Environment Variables
### Required for Azure DevOps MCP
```bash
export AZURE_DEVOPS_PAT="<your-personal-access-token>"
export AZURE_DEVOPS_ORGANIZATION="emstas"
export AZURE_DEVOPS_PROJECT="Program Unify"
```
### Required for Azure Operations
See `configuration.yaml` for the complete list of Azure environment variables.
## Common Workflows
### Complete Development Cycle
```bash
# 1. Generate test data
make generate_data
# 2. Run full pipeline
make run_all
# 3. Explore results
make harly
# 4. Run tests
python -m pytest python_files/testing/
# 5. Quality checks
ruff check python_files/
ruff format python_files/
```
### Quick Table Development
```bash
# 1. Open file in VSCode
# 2. Run current file
make current_table
# 3. Check output in DuckDB
make harly
```
### Quality Gates Before Commit
```bash
# Must run these before committing
python3 -m py_compile <file> # 1. Syntax check
ruff check python_files/ # 2. Linting (must pass)
ruff format python_files/ # 3. Format code
```
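A convenience chain that runs all three gates and stops at the first failure (a sketch, assuming it is run from the repository root; `find` stands in for the single-file placeholder above):
```bash
# Byte-compile every Python file under python_files/, then lint and format.
find python_files -name '*.py' -exec python3 -m py_compile {} + \
  && ruff check python_files/ \
  && ruff format python_files/
```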
## Troubleshooting Commands
### Check Spark Session
```bash
spark-sql -e "SHOW DATABASES;"
```
### Verify Azure Connection
```bash
make azure_login
az account show
```
### Check Data Paths
```bash
ls -la /workspaces/data/
```
## File Tracker Setup
One-time setup for `make current_table`:
```bash
make install_file_tracker
# Then reload VSCode
```
## Notes
- **Data Deletion**: Layer-specific commands delete existing data before running
- **Thrift Server**: Port 10000 for JDBC/ODBC connections
- **DuckDB**: Local analysis without requiring an Azure connection
- **Quality Gates**: Always run before committing code