# Notebook Operations

Comprehensive guide for working with Fabric notebooks using the Fabric CLI.

## Overview

Fabric notebooks are interactive documents for data engineering, data science, and analytics. They can be executed, scheduled, and managed via the CLI.

## Getting Notebook Information

### Basic Notebook Info

```bash
# Check if notebook exists
fab exists "Production.Workspace/ETL Pipeline.Notebook"

# Get notebook properties
fab get "Production.Workspace/ETL Pipeline.Notebook"

# Get with verbose details
fab get "Production.Workspace/ETL Pipeline.Notebook" -v

# Get only notebook ID
fab get "Production.Workspace/ETL Pipeline.Notebook" -q "id"
```

### Get Notebook Definition

```bash
# Get full notebook definition
fab get "Production.Workspace/ETL Pipeline.Notebook" -q "definition"

# Save definition to file
fab get "Production.Workspace/ETL Pipeline.Notebook" -q "definition" -o /tmp/notebook-def.json

# Get notebook content (cells)
fab get "Production.Workspace/ETL Pipeline.Notebook" -q "definition.parts[?path=='notebook-content.py'].payload | [0]"
```

## Exporting Notebooks

### Export as IPYNB

```bash
# Export notebook
fab export "Production.Workspace/ETL Pipeline.Notebook" -o /tmp/notebooks

# This creates:
# /tmp/notebooks/ETL Pipeline.Notebook/
# ├── notebook-content.py (or .ipynb)
# └── metadata files
```

### Export All Notebooks from Workspace

```bash
# Export all notebooks
WS_ID=$(fab get "Production.Workspace" -q "id")
NOTEBOOKS=$(fab api "workspaces/$WS_ID/items" -q "value[?type=='Notebook'].displayName")

for NOTEBOOK in $NOTEBOOKS; do
  fab export "Production.Workspace/$NOTEBOOK.Notebook" -o /tmp/notebooks
done
```
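Note that unquoted word splitting in the loop above breaks on display names that contain spaces (such as `ETL Pipeline`). A sketch that reads one name per line instead, assuming the JMESPath query returns a JSON array that `jq` can unpack:

```bash
# Hedged variant: iterate line by line so names with spaces stay intact
WS_ID=$(fab get "Production.Workspace" -q "id")

fab api "workspaces/$WS_ID/items" -q "value[?type=='Notebook'].displayName" \
  | jq -r '.[]' \
  | while IFS= read -r NOTEBOOK; do
      fab export "Production.Workspace/$NOTEBOOK.Notebook" -o /tmp/notebooks
    done
```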
## Importing Notebooks

### Import from Local

```bash
# Import notebook from .ipynb format (default)
fab import "Production.Workspace/New ETL.Notebook" -i /tmp/notebooks/ETL\ Pipeline.Notebook

# Import from .py format
fab import "Production.Workspace/Script.Notebook" -i /tmp/script.py --format py
```

### Copy Between Workspaces

```bash
# Copy notebook
fab cp "Dev.Workspace/ETL.Notebook" "Production.Workspace"

# Copy with new name
fab cp "Dev.Workspace/ETL.Notebook" "Production.Workspace/Prod ETL.Notebook"
```

## Creating Notebooks

### Create Blank Notebook

```bash
# Get workspace ID first
fab get "Production.Workspace" -q "id"

# Create via API
fab api -X post "workspaces/<workspace-id>/notebooks" -i '{"displayName": "New Data Processing"}'
```

### Create and Configure Query Notebook

Use this workflow to create a notebook for querying lakehouse tables with Spark SQL.

#### Step 1: Create the notebook

```bash
# Get workspace ID
fab get "Sales.Workspace" -q "id"
# Returns: 4caf7825-81ac-4c94-9e46-306b4c20a4d5

# Create notebook
fab api -X post "workspaces/4caf7825-81ac-4c94-9e46-306b4c20a4d5/notebooks" -i '{"displayName": "Data Query"}'
# Returns notebook ID: 97bbd18d-c293-46b8-8536-82fb8bc9bd58
```
#### Step 2: Get lakehouse ID (required for notebook metadata)

```bash
fab get "Sales.Workspace/SalesLH.Lakehouse" -q "id"
# Returns: ddbcc575-805b-4922-84db-ca451b318755
```

#### Step 3: Create notebook code in Fabric format

```bash
cat > /tmp/notebook.py <<'EOF'
# Fabric notebook source

# METADATA ********************

# META {
# META   "kernel_info": {
# META     "name": "synapse_pyspark"
# META   },
# META   "dependencies": {
# META     "lakehouse": {
# META       "default_lakehouse": "ddbcc575-805b-4922-84db-ca451b318755",
# META       "default_lakehouse_name": "SalesLH",
# META       "default_lakehouse_workspace_id": "4caf7825-81ac-4c94-9e46-306b4c20a4d5"
# META     }
# META   }
# META }

# CELL ********************

# Query lakehouse table
df = spark.sql("""
    SELECT
        date_key,
        COUNT(*) as num_records
    FROM gold.sets
    GROUP BY date_key
    ORDER BY date_key DESC
    LIMIT 10
""")

# IMPORTANT: Convert to pandas and print to capture output
# display(df) will NOT show results via API
pandas_df = df.toPandas()
print(pandas_df)
print(f"\nLatest date: {pandas_df.iloc[0]['date_key']}")
EOF
```
#### Step 4: Base64 encode and create update definition

```bash
base64 -i /tmp/notebook.py > /tmp/notebook-b64.txt

cat > /tmp/update.json <<EOF
{
  "definition": {
    "parts": [
      {
        "path": "notebook-content.py",
        "payload": "$(cat /tmp/notebook-b64.txt)",
        "payloadType": "InlineBase64"
      }
    ]
  }
}
EOF
```
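The `-i` flag above is the macOS form of `base64`. On Linux, GNU coreutils `base64` wraps its output at 76 characters by default, which would split the inline payload across lines and break the JSON. A hedged variant for that case, plus a quick validity check with `jq` (already used elsewhere in this guide):

```bash
# GNU base64: -w 0 disables line wrapping so the payload stays on one line
base64 -w 0 /tmp/notebook.py > /tmp/notebook-b64.txt

# Sanity-check the request body before posting it
jq . /tmp/update.json > /dev/null && echo "update.json is valid JSON"
```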
#### Step 5: Update notebook with code

```bash
fab api -X post "workspaces/4caf7825-81ac-4c94-9e46-306b4c20a4d5/notebooks/97bbd18d-c293-46b8-8536-82fb8bc9bd58/updateDefinition" -i /tmp/update.json --show_headers
# Returns operation ID in Location header
```

#### Step 6: Check update completed

```bash
fab api "operations/<operation-id>"
# Wait for status: "Succeeded"
```

#### Step 7: Run the notebook

```bash
fab job start "Sales.Workspace/Data Query.Notebook"
# Returns job instance ID
```

#### Step 8: Check execution status

```bash
fab job run-status "Sales.Workspace/Data Query.Notebook" --id <job-id>
# Wait for status: "Completed"
```

#### Step 9: Get results (download from Fabric UI)

- Open the notebook in the Fabric UI after execution
- Print output is visible in the cell outputs
- Download the .ipynb file to see printed results locally

#### Critical Requirements

1. **File format**: Must be `notebook-content.py` (NOT `.ipynb`)
2. **Lakehouse ID**: Must include the `default_lakehouse` ID in the metadata (not just the name)
3. **Spark session**: The `spark` session is available automatically when a lakehouse is attached
4. **Capturing output**: Use `df.toPandas()` and `print()` - `display()` output is not returned via the API
5. **Results location**: Print output is visible in the UI and the downloaded .ipynb, NOT in the definition

#### Common Issues

- `NameError: name 'spark' is not defined` - lakehouse not attached (missing `default_lakehouse` ID)
- Job "Completed" but no results - used `display()` instead of `print()`
- Update fails - used an `.ipynb` path instead of `.py`
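A quick CLI check for the first two issues, reusing the definition query shown later under "Query Notebook Content" (the notebook path is a placeholder):

```bash
NOTEBOOK="Sales.Workspace/Data Query.Notebook"   # placeholder path

# 1. Confirm a default_lakehouse ID is present in the notebook metadata
fab get "$NOTEBOOK" -q "definition.parts[?path=='notebook-content.py'].payload | [0]" \
  | base64 -d | grep 'default_lakehouse'

# 2. Confirm output is produced with print() rather than display()
fab get "$NOTEBOOK" -q "definition.parts[?path=='notebook-content.py'].payload | [0]" \
  | base64 -d | grep -nE 'display\(|print\('
```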
### Create from Template

```bash
# Export template
fab export "Templates.Workspace/Template Notebook.Notebook" -o /tmp/templates

# Import as new notebook
fab import "Production.Workspace/Custom Notebook.Notebook" -i /tmp/templates/Template\ Notebook.Notebook
```

## Running Notebooks

### Run Synchronously (Wait for Completion)

```bash
# Run notebook and wait
fab job run "Production.Workspace/ETL Pipeline.Notebook"

# Run with timeout (seconds)
fab job run "Production.Workspace/Long Process.Notebook" --timeout 600
```

### Run with Parameters

```bash
# Run with basic parameters
fab job run "Production.Workspace/ETL Pipeline.Notebook" -P \
  date:string=2024-01-01,\
  batch_size:int=1000,\
  debug_mode:bool=false,\
  threshold:float=0.95

# Parameters must match types defined in notebook
# Supported types: string, int, float, bool
```
### Run with Spark Configuration

```bash
# Run with custom Spark settings
fab job run "Production.Workspace/Big Data Processing.Notebook" -C '{
  "conf": {
    "spark.executor.memory": "8g",
    "spark.executor.cores": "4",
    "spark.dynamicAllocation.enabled": "true"
  },
  "environment": {
    "id": "<environment-id>",
    "name": "Production Environment"
  }
}'

# Run with default lakehouse
fab job run "Production.Workspace/Data Ingestion.Notebook" -C '{
  "defaultLakehouse": {
    "name": "MainLakehouse",
    "id": "<lakehouse-id>",
    "workspaceId": "<workspace-id>"
  }
}'

# Run with workspace Spark pool
fab job run "Production.Workspace/Analytics.Notebook" -C '{
  "useStarterPool": false,
  "useWorkspacePool": "HighMemoryPool"
}'
```

### Run with Combined Parameters and Configuration

```bash
# Combine parameters and configuration
fab job run "Production.Workspace/ETL Pipeline.Notebook" \
  -P date:string=2024-01-01,batch:int=500 \
  -C '{
    "defaultLakehouse": {"name": "StagingLH", "id": "<lakehouse-id>"},
    "conf": {"spark.sql.shuffle.partitions": "200"}
  }'
```

### Run Asynchronously

```bash
# Start notebook and return immediately
JOB_ID=$(fab job start "Production.Workspace/ETL Pipeline.Notebook" | grep -o '"id": "[^"]*"' | cut -d'"' -f4)

# Check status later
fab job run-status "Production.Workspace/ETL Pipeline.Notebook" --id "$JOB_ID"
```
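If the `fab job start` output is JSON, `jq` (used elsewhere in this guide) extracts the ID more robustly than `grep`/`cut`. A sketch, assuming the job instance JSON has a top-level `id` field:

```bash
# Assumes the returned job instance JSON exposes a top-level "id" field
JOB_ID=$(fab job start "Production.Workspace/ETL Pipeline.Notebook" | jq -r '.id')
echo "Started job: $JOB_ID"
```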
## Monitoring Notebook Executions

### Get Job Status

```bash
# Check specific job
fab job run-status "Production.Workspace/ETL Pipeline.Notebook" --id <job-id>

# Get detailed status via API
WS_ID=$(fab get "Production.Workspace" -q "id")
NOTEBOOK_ID=$(fab get "Production.Workspace/ETL Pipeline.Notebook" -q "id")
fab api "workspaces/$WS_ID/items/$NOTEBOOK_ID/jobs/instances/<job-id>"
```

### List Execution History

```bash
# List all job runs
fab job run-list "Production.Workspace/ETL Pipeline.Notebook"

# List only scheduled runs
fab job run-list "Production.Workspace/ETL Pipeline.Notebook" --schedule

# Get latest run status
fab job run-list "Production.Workspace/ETL Pipeline.Notebook" | head -n 1
```

### Cancel Running Job

```bash
fab job run-cancel "Production.Workspace/ETL Pipeline.Notebook" --id <job-id>
```
## Scheduling Notebooks

### Create Cron Schedule

```bash
# Run every 30 minutes
fab job run-sch "Production.Workspace/ETL Pipeline.Notebook" \
  --type cron \
  --interval 30 \
  --start 2024-11-15T00:00:00 \
  --end 2025-12-31T23:59:00 \
  --enable
```

### Create Daily Schedule

```bash
# Run daily at 2 AM and 2 PM
fab job run-sch "Production.Workspace/ETL Pipeline.Notebook" \
  --type daily \
  --interval 02:00,14:00 \
  --start 2024-11-15T00:00:00 \
  --end 2025-12-31T23:59:00 \
  --enable
```

### Create Weekly Schedule

```bash
# Run Monday and Friday at 9 AM
fab job run-sch "Production.Workspace/Weekly Report.Notebook" \
  --type weekly \
  --interval 09:00 \
  --days Monday,Friday \
  --start 2024-11-15T00:00:00 \
  --enable
```

### Update Schedule

```bash
# Modify existing schedule
fab job run-update "Production.Workspace/ETL Pipeline.Notebook" \
  --id <schedule-id> \
  --type daily \
  --interval 03:00 \
  --enable

# Disable schedule
fab job run-update "Production.Workspace/ETL Pipeline.Notebook" \
  --id <schedule-id> \
  --disable
```
## Notebook Configuration

### Set Default Lakehouse

```bash
# Via notebook properties
fab set "Production.Workspace/ETL.Notebook" -q lakehouse -i '{
  "known_lakehouses": [{"id": "<lakehouse-id>"}],
  "default_lakehouse": "<lakehouse-id>",
  "default_lakehouse_name": "MainLakehouse",
  "default_lakehouse_workspace_id": "<workspace-id>"
}'
```

### Set Default Environment

```bash
fab set "Production.Workspace/ETL.Notebook" -q environment -i '{
  "environmentId": "<environment-id>",
  "workspaceId": "<workspace-id>"
}'
```

### Set Default Warehouse

```bash
fab set "Production.Workspace/Analytics.Notebook" -q warehouse -i '{
  "known_warehouses": [{"id": "<warehouse-id>", "type": "Datawarehouse"}],
  "default_warehouse": "<warehouse-id>"
}'
```

## Updating Notebooks

### Update Display Name

```bash
fab set "Production.Workspace/ETL.Notebook" -q displayName -i "ETL Pipeline v2"
```

### Update Description

```bash
fab set "Production.Workspace/ETL.Notebook" -q description -i "Daily ETL pipeline for sales data ingestion and transformation"
```

## Deleting Notebooks

```bash
# Delete with confirmation (interactive)
fab rm "Dev.Workspace/Old Notebook.Notebook"

# Force delete without confirmation
fab rm "Dev.Workspace/Old Notebook.Notebook" -f
```
## Advanced Workflows

### Parameterized Notebook Execution

```python
# Create a parameterized notebook with a cell tagged as "parameters".
# In the notebook, create the cell:
date = "2024-01-01"  # default
batch_size = 1000    # default
debug = False        # default
```

```bash
# Execute with different parameters
fab job run "Production.Workspace/Parameterized.Notebook" -P \
  date:string=2024-02-15,\
  batch_size:int=2000,\
  debug:bool=true
```
### Notebook Orchestration Pipeline

```bash
#!/bin/bash

WORKSPACE="Production.Workspace"
DATE=$(date +%Y-%m-%d)

# 1. Run ingestion notebook
echo "Starting data ingestion..."
fab job run "$WORKSPACE/1_Ingest_Data.Notebook" -P date:string=$DATE

# 2. Run transformation notebook
echo "Running transformations..."
fab job run "$WORKSPACE/2_Transform_Data.Notebook" -P date:string=$DATE

# 3. Run analytics notebook
echo "Generating analytics..."
fab job run "$WORKSPACE/3_Analytics.Notebook" -P date:string=$DATE

# 4. Run reporting notebook
echo "Creating reports..."
fab job run "$WORKSPACE/4_Reports.Notebook" -P date:string=$DATE

echo "Pipeline completed for $DATE"
```
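The script above runs every stage even if an earlier one fails. If `fab job run` exits non-zero on failure (an assumption worth verifying in your environment), a one-line guard stops the pipeline at the first broken stage:

```bash
# Add near the top of the script: abort on the first failing command,
# on use of an unset variable, or on a failure inside a pipeline
set -euo pipefail
```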
### Monitoring Long-Running Notebook

```bash
#!/bin/bash

NOTEBOOK="Production.Workspace/Long Process.Notebook"

# Start job
JOB_ID=$(fab job start "$NOTEBOOK" -P date:string=$(date +%Y-%m-%d) | \
  grep -o '"id": "[^"]*"' | head -1 | cut -d'"' -f4)

echo "Started job: $JOB_ID"

# Poll status every 30 seconds
while true; do
  STATUS=$(fab job run-status "$NOTEBOOK" --id "$JOB_ID" | \
    grep -o '"status": "[^"]*"' | cut -d'"' -f4)

  echo "[$(date +%H:%M:%S)] Status: $STATUS"

  if [[ "$STATUS" == "Completed" ]] || [[ "$STATUS" == "Failed" ]]; then
    break
  fi

  sleep 30
done

if [[ "$STATUS" == "Completed" ]]; then
  echo "Job completed successfully"
  exit 0
else
  echo "Job failed"
  exit 1
fi
```

### Conditional Notebook Execution

```bash
#!/bin/bash

WORKSPACE="Production.Workspace"

# Check if data is ready
DATA_READY=$(fab api "workspaces/<ws-id>/lakehouses/<lh-id>/Files/ready.flag" 2>&1 | grep -c "200")

if [ "$DATA_READY" -eq 1 ]; then
  echo "Data ready, running notebook..."
  fab job run "$WORKSPACE/Process Data.Notebook" -P date:string=$(date +%Y-%m-%d)
else
  echo "Data not ready, skipping execution"
fi
```
## Notebook Definition Structure

The notebook definition contains:

```
NotebookName.Notebook/
├── .platform             # Git integration metadata
├── notebook-content.py   # Python code (or .ipynb format)
└── metadata.json         # Notebook metadata
```

### Query Notebook Content

```bash
NOTEBOOK="Production.Workspace/ETL.Notebook"

# Get Python code content
fab get "$NOTEBOOK" -q "definition.parts[?path=='notebook-content.py'].payload | [0]" | base64 -d

# Get metadata
fab get "$NOTEBOOK" -q "definition.parts[?path=='metadata.json'].payload | [0]" | base64 -d | jq .
```
## Troubleshooting

### Notebook Execution Failures

```bash
# Check recent execution
fab job run-list "Production.Workspace/ETL.Notebook" | head -n 5

# Get detailed error
fab job run-status "Production.Workspace/ETL.Notebook" --id <job-id> -q "error"

# Common issues:
# - Lakehouse not attached
# - Invalid parameters
# - Spark configuration errors
# - Missing dependencies
```

### Parameter Type Mismatches

```bash
# Parameters must match expected types
# ❌ Wrong: -P count:string=100 (should be int)
# ✅ Right: -P count:int=100

# Check notebook definition for parameter types
fab get "Production.Workspace/ETL.Notebook" -q "definition.parts[?path=='notebook-content.py']"
```

### Lakehouse Access Issues

```bash
# Verify lakehouse exists and is accessible
fab exists "Production.Workspace/MainLakehouse.Lakehouse"

# Check notebook's lakehouse configuration
fab get "Production.Workspace/ETL.Notebook" -q "properties.lakehouse"

# Re-attach lakehouse
fab set "Production.Workspace/ETL.Notebook" -q lakehouse -i '{
  "known_lakehouses": [{"id": "<lakehouse-id>"}],
  "default_lakehouse": "<lakehouse-id>",
  "default_lakehouse_name": "MainLakehouse",
  "default_lakehouse_workspace_id": "<workspace-id>"
}'
```
## Performance Tips

1. **Use workspace pools**: Faster startup than the starter pool
2. **Cache data in lakehouses**: Avoid re-fetching data
3. **Parameterize notebooks**: Reuse logic with different inputs
4. **Monitor execution time**: Set appropriate timeouts
5. **Use async execution**: Don't block on long-running notebooks
6. **Optimize Spark config**: Tune for specific workloads

## Best Practices

1. **Tag parameter cells**: Use the "parameters" tag for injected params
2. **Handle failures gracefully**: Add error handling and logging
3. **Version control notebooks**: Export and commit to Git (see the sketch after this list)
4. **Use descriptive names**: Clear naming for scheduled jobs
5. **Document parameters**: Add comments explaining expected inputs
6. **Test locally first**: Validate in a development workspace
7. **Monitor schedules**: Review execution history regularly
8. **Clean up old notebooks**: Remove unused notebooks
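For practice 3, a minimal version-control sketch that pairs `fab export` with Git (the repository path is a placeholder):

```bash
# Export the notebook definition into a Git working tree and commit it
REPO=/path/to/fabric-repo   # placeholder: local clone of your Git repo
fab export "Production.Workspace/ETL Pipeline.Notebook" -o "$REPO/notebooks"

cd "$REPO"
git add notebooks/
git commit -m "Update ETL Pipeline notebook definition"
```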
## Security Considerations

1. **Credential management**: Use Key Vault for secrets (see the sketch after this list)
2. **Workspace permissions**: Control who can execute notebooks
3. **Parameter validation**: Sanitize inputs in notebook code
4. **Data access**: Respect lakehouse/warehouse permissions
5. **Logging**: Don't log sensitive information
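For point 1, a hedged in-notebook sketch of reading a secret from Azure Key Vault via `notebookutils` instead of hard-coding it; the vault URL and secret name are placeholders, and the exact utility surface may vary by runtime:

```python
# Inside a notebook cell: fetch a secret from Azure Key Vault at run time.
# Assumes notebookutils is available in the Fabric Spark runtime.
connection_secret = notebookutils.credentials.getSecret(
    "https://my-keyvault.vault.azure.net/",  # Key Vault URI (placeholder)
    "storage-account-key",                   # secret name (placeholder)
)
```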
## Related Scripts

- `scripts/run_notebook_pipeline.py` - Orchestrate multiple notebooks
- `scripts/monitor_notebook.py` - Monitor long-running executions
- `scripts/export_notebook.py` - Export with validation
- `scripts/schedule_notebook.py` - Simplified scheduling interface