---
name: project-commands
description: Complete reference for all make commands, development workflows, Azure operations, and database operations. Use when you need to know how to run specific operations.
---

# Project Commands Reference

Complete command reference for the Unify data migration project.

## Build & Test Commands

### Syntax Validation
```bash
python3 -m py_compile <file_path>
python3 -m py_compile python_files/utilities/session_optimiser.py
```
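
To check every file in one pass rather than one path at a time, a `find`-based sweep works as a sketch (assuming the standard layout under `python_files/`):

```bash
# Compile-check every Python file under python_files/ in a single invocation
find python_files -name '*.py' -exec python3 -m py_compile {} +
```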

### Code Quality
```bash
ruff check python_files/    # Linting (must pass)
ruff format python_files/   # Auto-format code
```

### Testing
```bash
python -m pytest python_files/testing/                        # All tests
python -m pytest python_files/testing/medallion_testing.py    # Integration
```

## Pipeline Commands

### Complete Pipeline
```bash
make run_all   # Executes: choice_list_mapper → bronze → silver → gold → build_duckdb
```

### Layer-Specific (WARNING: Deletes existing layer data)
```bash
make bronze       # Bronze layer pipeline (deletes /workspaces/data/bronze_*)
make run_silver   # Silver layer (includes choice_list_mapper, deletes /workspaces/data/silver_*)
make gold         # Gold layer (includes DuckDB build, deletes /workspaces/data/gold_*)
```

### Specific Table Execution
```bash
# Run specific silver table
make silver_table FILE_READ_LAYER=silver PATH_DATABASE=silver_fvms RUN_FILE_NAME=s_fvms_incident

# Run specific gold table
make gold_table G_RUN_FILE_NAME=g_x_mg_statsclasscount

# Run currently open file (auto-detects layer and database)
make current_table   # Requires: make install_file_tracker (run once, then reload VSCode)
```

## Development Workflow

### Interactive UI
```bash
make ui   # Interactive menu for all commands
```

### Data Generation
```bash
make generate_data   # Generate synthetic test data
```

## Spark Thrift Server

Enables JDBC/ODBC connections to local Spark data on port 10000:

```bash
make thrift-start    # Start server
make thrift-status   # Check if running
make thrift-stop     # Stop server

# Connect via spark-sql CLI
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
spark-sql -e "SELECT * FROM gold_data_model.g_x_mg_statsclasscount LIMIT 10;"
```

## Database Operations

### Database Inspection
```bash
make database-check   # Check Hive databases and tables

# View schemas
spark-sql -e "SHOW DATABASES; SHOW TABLES;"
```

### DuckDB Operations
```bash
make build_duckdb   # Build local DuckDB database (/workspaces/data/warehouse.duckdb)
make harly          # Open Harlequin TUI for interactive DuckDB queries
```

**DuckDB Benefits** (see the query sketch below):
- Fast local queries without Azure connection
- Data exploration and validation
- Report prototyping
- Testing query logic before deploying to Synapse

## Azure Operations

### Authentication
```bash
make azure_login   # Azure CLI login
```

### SharePoint Integration
```bash
# Download SharePoint files
make download_sharepoint SHAREPOINT_FILE_ID=<file-id>

# Convert Excel to JSON
make convert_excel_to_json

# Upload to Azure Storage
make upload_to_storage UPLOAD_FILE=<file-path>
```

### Complete Pipelines
```bash
# Offence mapping pipeline
make offence_mapping_build   # download_sharepoint → convert_excel_to_json → upload_to_storage

# Table list management
make table_lists_pipeline        # download_ors_table_mapping → generate_table_lists → upload_all_table_lists
make update_pipeline_variables   # Update Azure Synapse pipeline variables
```

## AI Agent Integration

### User Story Processing
Automate ETL file generation from Azure DevOps user stories:

```bash
make user_story_build \
    A_USER_STORY=44687 \
    A_FILE_NAME=g_x_mg_statsclasscount \
    A_READ_LAYER=silver \
    A_WRITE_LAYER=gold
```

**What it does**:
- Reads user story requirements from Azure DevOps
- Generates ETL transformation code
- Creates appropriate tests
- Follows project coding standards

### Agent Session
```bash
make session   # Start persistent Claude Code session with --dangerously-skip-permissions
```

## Git Operations

### Branch Merging
```bash
make merge_staging    # Merge from staging (adds all changes, commits, pulls with --no-ff)
make rebase_staging   # Rebase from staging (adds all changes, commits, rebases)
```

## Environment Variables

### Required for Azure DevOps MCP
```bash
export AZURE_DEVOPS_PAT="<your-personal-access-token>"
export AZURE_DEVOPS_ORGANIZATION="emstas"
export AZURE_DEVOPS_PROJECT="Program Unify"
```

### Required for Azure Operations
See `configuration.yaml` for the complete list of Azure environment variables.

## Common Workflows

### Complete Development Cycle
```bash
# 1. Generate test data
make generate_data

# 2. Run full pipeline
make run_all

# 3. Explore results
make harly

# 4. Run tests
python -m pytest python_files/testing/

# 5. Quality checks
ruff check python_files/
ruff format python_files/
```

### Quick Table Development
```bash
# 1. Open file in VSCode
# 2. Run current file
make current_table

# 3. Check output in DuckDB
make harly
```

### Quality Gates Before Commit
```bash
# Must run these before committing
python3 -m py_compile <file>   # 1. Syntax check
ruff check python_files/       # 2. Linting (must pass)
ruff format python_files/      # 3. Format code
```

## Troubleshooting Commands

### Check Spark Session
```bash
spark-sql -e "SHOW DATABASES;"
```

### Verify Azure Connection
```bash
make azure_login
az account show
```

### Check Data Paths
```bash
ls -la /workspaces/data/
```

## File Tracker Setup

One-time setup for `make current_table`:
```bash
make install_file_tracker
# Then reload VSCode
```

## Notes

- **Data Deletion**: Layer-specific commands delete existing data before running
- **Thrift Server**: Port 10000 for JDBC/ODBC connections
- **DuckDB**: Local analysis without requiring an Azure connection
- **Quality Gates**: Always run before committing code