---
name: project-commands
description: Complete reference for all make commands, development workflows, Azure operations, and database operations. Use when you need to know how to run specific operations.
---
# Project Commands Reference

Complete command reference for the Unify data migration project.

## Build & Test Commands

### Syntax Validation

```bash
python3 -m py_compile <file_path>
python3 -m py_compile python_files/utilities/session_optimiser.py
```
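To sweep the whole tree rather than one file at a time, the standard-library `compileall` module runs the same check in bulk. A convenience sketch, not a project command:

```python
import compileall
import sys

# Byte-compile every .py file under python_files/; quiet=1 prints errors only.
# compile_dir returns a true value only if every file compiled cleanly.
ok = compileall.compile_dir("python_files", quiet=1)
sys.exit(0 if ok else 1)
```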
### Code Quality

```bash
ruff check python_files/    # Linting (must pass)
ruff format python_files/   # Auto-format code
```

### Testing

```bash
python -m pytest python_files/testing/                       # All tests
python -m pytest python_files/testing/medallion_testing.py   # Integration
```
## Pipeline Commands

### Complete Pipeline

```bash
make run_all   # Executes: choice_list_mapper → bronze → silver → gold → build_duckdb
```

### Layer-Specific (WARNING: Deletes existing layer data)

```bash
make bronze       # Bronze layer pipeline (deletes /workspaces/data/bronze_*)
make run_silver   # Silver layer (includes choice_list_mapper, deletes /workspaces/data/silver_*)
make gold         # Gold layer (includes DuckDB build, deletes /workspaces/data/gold_*)
```

### Specific Table Execution

```bash
# Run specific silver table
make silver_table FILE_READ_LAYER=silver PATH_DATABASE=silver_fvms RUN_FILE_NAME=s_fvms_incident

# Run specific gold table
make gold_table G_RUN_FILE_NAME=g_x_mg_statsclasscount

# Run currently open file (auto-detects layer and database)
make current_table   # Requires: make install_file_tracker (run once, then reload VSCode)
```
## Development Workflow

### Interactive UI

```bash
make ui   # Interactive menu for all commands
```

### Data Generation

```bash
make generate_data   # Generate synthetic test data
```

### Spark Thrift Server

Enables JDBC/ODBC connections to local Spark data on port 10000:

```bash
make thrift-start    # Start server
make thrift-status   # Check if running
make thrift-stop     # Stop server

# Connect via spark-sql CLI
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
spark-sql -e "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10;"
```
## Database Operations

### Database Inspection

```bash
make database-check   # Check Hive databases and tables

# View schemas
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
```

### DuckDB Operations

```bash
make build_duckdb   # Build local DuckDB database (/workspaces/data/warehouse.duckdb)
make harly          # Open Harlequin TUI for interactive DuckDB queries
```
**DuckDB Benefits:**
- Fast local queries without Azure connection
- Data exploration and validation
- Report prototyping
- Testing query logic before deploying to Synapse
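The warehouse file can also be queried straight from Python with the `duckdb` package. A minimal sketch, assuming the gold tables keep the schema and table names shown in the Thrift examples above:

```python
import duckdb

# Open the warehouse produced by `make build_duckdb` without taking a write lock.
con = duckdb.connect("/workspaces/data/warehouse.duckdb", read_only=True)

# List what the build produced.
print(con.execute("SHOW ALL TABLES").fetchall())

# Assumption: the DuckDB build preserves the Spark schema/table naming.
rows = con.execute(
    "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10"
).fetchall()
print(rows)
con.close()
```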
## Azure Operations

### Authentication

```bash
make azure_login   # Azure CLI login
```

### SharePoint Integration

```bash
# Download SharePoint files
make download_sharepoint SHAREPOINT_FILE_ID=<file-id>

# Convert Excel to JSON
make convert_excel_to_json

# Upload to Azure Storage
make upload_to_storage UPLOAD_FILE=<file-path>
```

### Complete Pipelines

```bash
# Offence mapping pipeline
make offence_mapping_build       # download_sharepoint → convert_excel_to_json → upload_to_storage

# Table list management
make table_lists_pipeline        # download_ors_table_mapping → generate_table_lists → upload_all_table_lists
make update_pipeline_variables   # Update Azure Synapse pipeline variables
```
## AI Agent Integration

### User Story Processing

Automate ETL file generation from Azure DevOps user stories:

```bash
make user_story_build \
    A_USER_STORY=44687 \
    A_FILE_NAME=g_x_mg_statsclasscount \
    A_READ_LAYER=silver \
    A_WRITE_LAYER=gold
```
**What it does:**
- Reads user story requirements from Azure DevOps
- Generates ETL transformation code
- Creates appropriate tests
- Follows project coding standards
### Agent Session

```bash
make session   # Start persistent Claude Code session with --dangerously-skip-permissions
```
## Git Operations

### Branch Merging

```bash
make merge_staging    # Merge from staging (adds all changes, commits, pulls with --no-ff)
make rebase_staging   # Rebase from staging (adds all changes, commits, rebases)
```

## Environment Variables

### Required for Azure DevOps MCP

```bash
export AZURE_DEVOPS_PAT="<your-personal-access-token>"
export AZURE_DEVOPS_ORGANIZATION="emstas"
export AZURE_DEVOPS_PROJECT="Program Unify"
```
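A small fail-fast check can confirm these are set before starting the MCP tooling. A hypothetical helper, not part of the repo:

```python
import os
import sys

# Hypothetical helper: verify the Azure DevOps MCP variables before a session.
REQUIRED = ("AZURE_DEVOPS_PAT", "AZURE_DEVOPS_ORGANIZATION", "AZURE_DEVOPS_PROJECT")
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("Azure DevOps environment looks complete.")
```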
### Required for Azure Operations

See `configuration.yaml` for the complete list of Azure environment variables.
## Common Workflows

### Complete Development Cycle

```bash
# 1. Generate test data
make generate_data

# 2. Run full pipeline
make run_all

# 3. Explore results
make harly

# 4. Run tests
python -m pytest python_files/testing/

# 5. Quality checks
ruff check python_files/
ruff format python_files/
```
### Quick Table Development

```bash
# 1. Open file in VSCode
# 2. Run current file
make current_table

# 3. Check output in DuckDB
make harly
```
### Quality Gates Before Commit

```bash
# Must run these before committing
python3 -m py_compile <file>   # 1. Syntax check
ruff check python_files/       # 2. Linting (must pass)
ruff format python_files/      # 3. Format code
```
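These gates can be chained into one script so the first failure stops the commit. A minimal sketch (hypothetical, not wired into the repo), taking the file to syntax-check as a command-line argument:

```python
import subprocess
import sys

# Usage: python check_gates.py <file.py>   (check_gates.py is a hypothetical name)
file_path = sys.argv[1]
gates = [
    ["python3", "-m", "py_compile", file_path],  # 1. syntax check
    ["ruff", "check", "python_files/"],          # 2. linting (must pass)
    ["ruff", "format", "python_files/"],         # 3. formatting
]
# Run each quality gate in order; stop at the first failure.
for cmd in gates:
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"Gate failed: {' '.join(cmd)}")
print("All quality gates passed.")
```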
## Troubleshooting Commands

### Check Spark Session

```bash
spark-sql -e "SHOW DATABASES;"
```

### Verify Azure Connection

```bash
make azure_login
az account show
```

### Check Data Paths

```bash
ls -la /workspaces/data/
```

### File Tracker Setup

One-time setup for `make current_table`:

```bash
make install_file_tracker
# Then reload VSCode
```
## Notes

- **Data Deletion**: Layer-specific commands delete existing data before running
- **Thrift Server**: Port 10000 for JDBC/ODBC connections
- **DuckDB**: Local analysis without requiring an Azure connection
- **Quality Gates**: Always run before committing code