---
name: project-commands
description: Complete reference for all make commands, development workflows, Azure operations, and database operations. Use when you need to know how to run specific operations.
---

# Project Commands Reference

Complete command reference for the Unify data migration project.

## Build & Test Commands

### Syntax Validation

```bash
# Compile a file to check its syntax, e.g.:
python3 -m py_compile python_files/utilities/session_optimiser.py
```

### Code Quality

```bash
ruff check python_files/    # Linting (must pass)
ruff format python_files/   # Auto-format code
```

### Testing

```bash
python -m pytest python_files/testing/                       # All tests
python -m pytest python_files/testing/medallion_testing.py   # Integration tests
```

## Pipeline Commands

### Complete Pipeline

```bash
make run_all   # Executes: choice_list_mapper → bronze → silver → gold → build_duckdb
```

### Layer-Specific (WARNING: Deletes existing layer data)

```bash
make bronze       # Bronze layer pipeline (deletes /workspaces/data/bronze_*)
make run_silver   # Silver layer (includes choice_list_mapper, deletes /workspaces/data/silver_*)
make gold         # Gold layer (includes DuckDB build, deletes /workspaces/data/gold_*)
```

### Specific Table Execution

```bash
# Run a specific silver table
make silver_table FILE_READ_LAYER=silver PATH_DATABASE=silver_fvms RUN_FILE_NAME=s_fvms_incident

# Run a specific gold table
make gold_table G_RUN_FILE_NAME=g_x_mg_statsclasscount

# Run the currently open file (auto-detects layer and database)
make current_table   # Requires: make install_file_tracker (run once, then reload VSCode)
```

## Development Workflow

### Interactive UI

```bash
make ui   # Interactive menu for all commands
```

### Data Generation

```bash
make generate_data   # Generate synthetic test data
```

## Spark Thrift Server

Enables JDBC/ODBC connections to local Spark data on port 10000 (see the JDBC example after the commands below):

```bash
make thrift-start    # Start server
make thrift-status   # Check if running
make thrift-stop     # Stop server

# Query via the spark-sql CLI
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
spark-sql -e "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10;"
```
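Because the server speaks JDBC, any JDBC client can attach to it on port 10000. A minimal sketch using beeline, the JDBC CLI that ships with Spark (it assumes beeline is on the PATH and that the server accepts anonymous local connections):

```bash
# Attach to the local Thrift server over JDBC
# (assumed: beeline on PATH, anonymous auth on the default port)
beeline -u "jdbc:hive2://localhost:10000" -e "SHOW DATABASES;"

# Or open an interactive JDBC session
beeline -u "jdbc:hive2://localhost:10000"
```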
## Database Operations

### Database Inspection

```bash
make database-check   # Check Hive databases and tables

# View schemas
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
```

### DuckDB Operations

```bash
make build_duckdb   # Build local DuckDB database (/workspaces/data/warehouse.duckdb)
make harly          # Open Harlequin TUI for interactive DuckDB queries
```

**DuckDB Benefits**:

- Fast local queries without an Azure connection
- Data exploration and validation
- Report prototyping
- Testing query logic before deploying to Synapse

## Azure Operations

### Authentication

```bash
make azure_login   # Azure CLI login
```

### SharePoint Integration

```bash
# Download SharePoint files
make download_sharepoint SHAREPOINT_FILE_ID=<file_id>

# Convert Excel to JSON
make convert_excel_to_json

# Upload to Azure Storage
make upload_to_storage UPLOAD_FILE=<path_to_file>
```

### Complete Pipelines

```bash
# Offence mapping pipeline
make offence_mapping_build   # download_sharepoint → convert_excel_to_json → upload_to_storage

# Table list management
make table_lists_pipeline        # download_ors_table_mapping → generate_table_lists → upload_all_table_lists
make update_pipeline_variables   # Update Azure Synapse pipeline variables
```

## AI Agent Integration

### User Story Processing

Automate ETL file generation from Azure DevOps user stories:

```bash
make user_story_build \
  A_USER_STORY=44687 \
  A_FILE_NAME=g_x_mg_statsclasscount \
  A_READ_LAYER=silver \
  A_WRITE_LAYER=gold
```

**What it does**:

- Reads user story requirements from Azure DevOps
- Generates ETL transformation code
- Creates appropriate tests
- Follows project coding standards

### Agent Session

```bash
make session   # Start a persistent Claude Code session with --dangerously-skip-permissions
```

## Git Operations

### Branch Merging

```bash
make merge_staging    # Merge from staging (adds all changes, commits, pulls with --no-ff)
make rebase_staging   # Rebase from staging (adds all changes, commits, rebases)
```

## Environment Variables

### Required for Azure DevOps MCP

```bash
export AZURE_DEVOPS_PAT=""   # personal access token
export AZURE_DEVOPS_ORGANIZATION="emstas"
export AZURE_DEVOPS_PROJECT="Program Unify"
```

### Required for Azure Operations

See `configuration.yaml` for the complete list of Azure environment variables.

## Common Workflows

### Complete Development Cycle

```bash
# 1. Generate test data
make generate_data

# 2. Run the full pipeline
make run_all

# 3. Explore results
make harly

# 4. Run tests
python -m pytest python_files/testing/

# 5. Quality checks
ruff check python_files/
ruff format python_files/
```

### Quick Table Development

```bash
# 1. Open the file in VSCode
# 2. Run the current file
make current_table

# 3. Check the output in DuckDB
make harly
```

### Quality Gates Before Commit

```bash
# Run all of these before committing
python3 -m py_compile <file.py>   # 1. Syntax check changed files
ruff check python_files/          # 2. Linting (must pass)
ruff format python_files/         # 3. Format code
```

## Troubleshooting Commands

### Check Spark Session

```bash
spark-sql -e "SHOW DATABASES;"
```

### Verify Azure Connection

```bash
make azure_login
az account show
```

### Check Data Paths

```bash
ls -la /workspaces/data/
```

## File Tracker Setup

One-time setup for `make current_table`:

```bash
make install_file_tracker
# Then reload VSCode
```

## Notes

- **Data Deletion**: Layer-specific commands delete existing data before running
- **Thrift Server**: Port 10000 for JDBC/ODBC connections
- **DuckDB**: Local analysis with no Azure connection required
- **Quality Gates**: Always run before committing code (see the pre-commit hook sketch below)
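Since the quality gates must run before every commit, one option is to wire them into a git pre-commit hook so they cannot be skipped by accident. A minimal sketch, assuming the standard `.git/hooks/pre-commit` location and that syntax-checking only the staged `.py` files is acceptable (neither detail is part of the project's documented tooling):

```bash
#!/usr/bin/env bash
# .git/hooks/pre-commit (assumed location; make executable with chmod +x)
# Runs the three quality gates and aborts the commit if any of them fail.
set -euo pipefail

# 1. Syntax-check the staged Python files
staged=$(git diff --cached --name-only --diff-filter=ACM -- '*.py')
if [ -n "$staged" ]; then
    # shellcheck disable=SC2086  # word-splitting is intentional here
    python3 -m py_compile $staged
fi

# 2. Linting (must pass)
ruff check python_files/

# 3. Formatting (use --check so the hook never rewrites files mid-commit)
ruff format --check python_files/
```

Note that the sketch uses `ruff format --check` rather than plain `ruff format`: a hook should verify formatting rather than modify files, so run `ruff format python_files/` from the gate list to actually apply fixes.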