---
name: project-commands
description: Complete reference for all make commands, development workflows, Azure operations, and database operations. Use when you need to know how to run specific operations.
---
# Project Commands Reference

Complete command reference for the Unify data migration project.

## Build & Test Commands

### Syntax Validation

```bash
python3 -m py_compile <file_path>
python3 -m py_compile python_files/utilities/session_optimiser.py
```
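To sweep the whole tree rather than one file at a time, the standard-library `compileall` module runs the same check in bulk. A convenience sketch, not a project command:

```python
import compileall
import sys

# Byte-compile every .py file under python_files/; quiet=1 prints errors only.
# compile_dir returns a true value only if every file compiled cleanly.
ok = compileall.compile_dir("python_files", quiet=1)
sys.exit(0 if ok else 1)
```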
### Code Quality

```bash
ruff check python_files/    # Linting (must pass)
ruff format python_files/   # Auto-format code
```

### Testing

```bash
python -m pytest python_files/testing/                       # All tests
python -m pytest python_files/testing/medallion_testing.py   # Integration
```
## Pipeline Commands

### Complete Pipeline

```bash
make run_all   # Executes: choice_list_mapper → bronze → silver → gold → build_duckdb
```

### Layer-Specific (WARNING: Deletes existing layer data)

```bash
make bronze       # Bronze layer pipeline (deletes /workspaces/data/bronze_*)
make run_silver   # Silver layer (includes choice_list_mapper, deletes /workspaces/data/silver_*)
make gold         # Gold layer (includes DuckDB build, deletes /workspaces/data/gold_*)
```

### Specific Table Execution

```bash
# Run specific silver table
make silver_table FILE_READ_LAYER=silver PATH_DATABASE=silver_fvms RUN_FILE_NAME=s_fvms_incident

# Run specific gold table
make gold_table G_RUN_FILE_NAME=g_x_mg_statsclasscount

# Run currently open file (auto-detects layer and database)
make current_table   # Requires: make install_file_tracker (run once, then reload VSCode)
```
## Development Workflow

### Interactive UI

```bash
make ui   # Interactive menu for all commands
```

### Data Generation

```bash
make generate_data   # Generate synthetic test data
```

### Spark Thrift Server

Enables JDBC/ODBC connections to local Spark data on port 10000:

```bash
make thrift-start    # Start server
make thrift-status   # Check if running
make thrift-stop     # Stop server

# Connect via spark-sql CLI
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
spark-sql -e "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10;"
```
## Database Operations

### Database Inspection

```bash
make database-check   # Check Hive databases and tables

# View schemas
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
```

### DuckDB Operations

```bash
make build_duckdb   # Build local DuckDB database (/workspaces/data/warehouse.duckdb)
make harly          # Open Harlequin TUI for interactive DuckDB queries
```
**DuckDB Benefits:**
- Fast local queries without Azure connection
- Data exploration and validation
- Report prototyping
- Testing query logic before deploying to Synapse
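The warehouse file can also be queried straight from Python with the `duckdb` package. A minimal sketch, assuming the gold tables keep the schema and table names shown in the Thrift examples above:

```python
import duckdb

# Open the warehouse produced by `make build_duckdb` without taking a write lock.
con = duckdb.connect("/workspaces/data/warehouse.duckdb", read_only=True)

# List what the build produced.
print(con.execute("SHOW ALL TABLES").fetchall())

# Assumption: the DuckDB build preserves the Spark schema/table naming.
rows = con.execute(
    "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10"
).fetchall()
print(rows)
con.close()
```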
## Azure Operations

### Authentication

```bash
make azure_login   # Azure CLI login
```

### SharePoint Integration

```bash
# Download SharePoint files
make download_sharepoint SHAREPOINT_FILE_ID=<file-id>

# Convert Excel to JSON
make convert_excel_to_json

# Upload to Azure Storage
make upload_to_storage UPLOAD_FILE=<file-path>
```

### Complete Pipelines

```bash
# Offence mapping pipeline
make offence_mapping_build       # download_sharepoint → convert_excel_to_json → upload_to_storage

# Table list management
make table_lists_pipeline        # download_ors_table_mapping → generate_table_lists → upload_all_table_lists
make update_pipeline_variables   # Update Azure Synapse pipeline variables
```
## AI Agent Integration

### User Story Processing

Automate ETL file generation from Azure DevOps user stories:

```bash
make user_story_build \
    A_USER_STORY=44687 \
    A_FILE_NAME=g_x_mg_statsclasscount \
    A_READ_LAYER=silver \
    A_WRITE_LAYER=gold
```
**What it does:**
- Reads user story requirements from Azure DevOps
- Generates ETL transformation code
- Creates appropriate tests
- Follows project coding standards
### Agent Session

```bash
make session   # Start persistent Claude Code session with --dangerously-skip-permissions
```
## Git Operations

### Branch Merging

```bash
make merge_staging    # Merge from staging (adds all changes, commits, pulls with --no-ff)
make rebase_staging   # Rebase from staging (adds all changes, commits, rebases)
```

## Environment Variables

### Required for Azure DevOps MCP

```bash
export AZURE_DEVOPS_PAT="<your-personal-access-token>"
export AZURE_DEVOPS_ORGANIZATION="emstas"
export AZURE_DEVOPS_PROJECT="Program Unify"
```
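A small fail-fast check can confirm these are set before starting the MCP tooling. A hypothetical helper, not part of the repo:

```python
import os
import sys

# Hypothetical helper: verify the Azure DevOps MCP variables before a session.
REQUIRED = ("AZURE_DEVOPS_PAT", "AZURE_DEVOPS_ORGANIZATION", "AZURE_DEVOPS_PROJECT")
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("Azure DevOps environment looks complete.")
```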
### Required for Azure Operations

See `configuration.yaml` for the complete list of Azure environment variables.
## Common Workflows

### Complete Development Cycle

```bash
# 1. Generate test data
make generate_data

# 2. Run full pipeline
make run_all

# 3. Explore results
make harly

# 4. Run tests
python -m pytest python_files/testing/

# 5. Quality checks
ruff check python_files/
ruff format python_files/
```
### Quick Table Development

```bash
# 1. Open file in VSCode
# 2. Run current file
make current_table

# 3. Check output in DuckDB
make harly
```
### Quality Gates Before Commit

```bash
# Must run these before committing
python3 -m py_compile <file>   # 1. Syntax check
ruff check python_files/       # 2. Linting (must pass)
ruff format python_files/      # 3. Format code
```
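These gates can be chained into one script so the first failure stops the commit. A minimal sketch (hypothetical, not wired into the repo), taking the file to syntax-check as a command-line argument:

```python
import subprocess
import sys

# Usage: python check_gates.py <file.py>   (check_gates.py is a hypothetical name)
file_path = sys.argv[1]
gates = [
    ["python3", "-m", "py_compile", file_path],  # 1. syntax check
    ["ruff", "check", "python_files/"],          # 2. linting (must pass)
    ["ruff", "format", "python_files/"],         # 3. formatting
]
# Run each quality gate in order; stop at the first failure.
for cmd in gates:
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"Gate failed: {' '.join(cmd)}")
print("All quality gates passed.")
```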
## Troubleshooting Commands

### Check Spark Session

```bash
spark-sql -e "SHOW DATABASES;"
```

### Verify Azure Connection

```bash
make azure_login
az account show
```

### Check Data Paths

```bash
ls -la /workspaces/data/
```

### File Tracker Setup

One-time setup for `make current_table`:

```bash
make install_file_tracker
# Then reload VSCode
```
## Notes

- **Data Deletion**: Layer-specific commands delete existing data before running
- **Thrift Server**: Port 10000 for JDBC/ODBC connections
- **DuckDB**: Local analysis without requiring an Azure connection
- **Quality Gates**: Always run before committing code