---
name: trino-cli
description: Expert assistance for using the Trino CLI to query Treasure Data interactively from the command line. Use this skill when users need help with trino command-line tool, interactive query execution, connecting to TD via CLI, or terminal-based data exploration.
---
# Trino CLI for Treasure Data
Expert assistance for using the Trino CLI to query and explore Treasure Data interactively from the command line.
## When to Use This Skill
Use this skill when:
- Running interactive queries against TD from the terminal
- Exploring TD databases, tables, and schemas via command line
- Quick ad-hoc data analysis without opening a web console
- Writing shell scripts that execute TD queries
- Debugging queries with immediate feedback
- Working in terminal-based workflows (SSH, tmux, screen)
- Executing batch queries from the command line
- Testing queries before integrating into applications
## Core Principles
### 1. Installation
**Download Trino CLI:**
```bash
# Download the CLI executable (version 477 shown; substitute the latest release)
curl -o trino https://repo1.maven.org/maven2/io/trino/trino-cli/477/trino-cli-477-executable.jar
# Make it executable
chmod +x trino
# Move to PATH (optional)
sudo mv trino /usr/local/bin/
# Verify installation
trino --version
```
**Requirements:**
- Java 11 or later (Java 22+ recommended)
- Network access to TD API endpoint
- TD API key
**Alternative for Windows:**
```powershell
# Run with Java directly
java -jar trino-cli-477-executable.jar --version
```
### 2. Connecting to Treasure Data
**Basic Connection:**
```bash
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user YOUR_TD_API_KEY \
--schema your_database
```
**Using Environment Variable:**
```bash
# Set TD API key as environment variable (recommended)
export TD_API_KEY="your_api_key_here"
# Connect using environment variable
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--schema sample_datasets
```
**Regional Endpoints:**
- **US**: `https://api-presto.treasuredata.com`
- **Tokyo**: `https://api-presto.treasuredata.co.jp`
- **EU**: `https://api-presto.eu01.treasuredata.com`
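If you work across regions, a small shell helper keeps the endpoint choice in one place. The sketch below assumes a `TD_REGION` variable of this example's own naming (it is not a TD or Trino convention) and uses the endpoints listed above:
```bash
# Pick the TD endpoint from a TD_REGION variable (hypothetical name); defaults to US
case "${TD_REGION:-us}" in
  tokyo) TD_SERVER="https://api-presto.treasuredata.co.jp" ;;
  eu)    TD_SERVER="https://api-presto.eu01.treasuredata.com" ;;
  *)     TD_SERVER="https://api-presto.treasuredata.com" ;;
esac

trino --server "$TD_SERVER" --catalog td --user "$TD_API_KEY" --schema sample_datasets
```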
### 3. Interactive Mode
Once connected, you enter an interactive SQL prompt:
```sql
trino:sample_datasets> SELECT COUNT(*) FROM nasdaq;
  _col0
---------
 8807790
(1 row)

Query 20250123_123456_00001_abcde, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.45 [8.81M rows, 0B] [19.6M rows/s, 0B/s]

trino:sample_datasets> SHOW TABLES;
   Table
------------
 nasdaq
 www_access
(2 rows)
```
**Interactive Commands:**
- `QUIT` or `EXIT` - Exit the CLI
- `CLEAR` - Clear the screen
- `HELP` - Show help information
- `HISTORY` - Show command history
- `USE schema_name` - Switch to different database
- `SHOW CATALOGS` - List available catalogs
- `SHOW SCHEMAS` - List databases
- `SHOW TABLES` - List tables in current schema
- `DESCRIBE table_name` - Show table structure
- `EXPLAIN query` - Show query execution plan
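These commands can also be scripted. As a rough sketch, the snippet below pipes a few metadata statements into the CLI over standard input to capture a quick schema snapshot; the table name is a placeholder, and each statement must end with a semicolon:
```bash
# Pipe metadata statements into the CLI; they run in order, like batch mode
cat <<'EOF' | trino \
  --server https://api-presto.treasuredata.com \
  --catalog td \
  --user $TD_API_KEY \
  --schema sample_datasets
SHOW TABLES;
DESCRIBE nasdaq;
EXPLAIN SELECT COUNT(*) FROM nasdaq;
EOF
```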
### 4. Batch Mode (Non-Interactive)
Execute queries from command line without entering interactive mode:
**Single Query:**
```bash
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--schema sample_datasets \
--execute "SELECT COUNT(*) FROM nasdaq"
```
**From File:**
```bash
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--schema sample_datasets \
--file queries.sql
```
**From stdin (pipe):**
```bash
echo "SELECT symbol, COUNT(*) as cnt FROM nasdaq GROUP BY symbol LIMIT 10" | \
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--schema sample_datasets
```
## Common Patterns
### Pattern 1: Interactive Data Exploration
```bash
# Connect to TD
export TD_API_KEY="your_api_key"
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--schema sample_datasets
# Then in the interactive prompt:
```
```sql
-- List all databases
trino:sample_datasets> SHOW SCHEMAS;
-- Switch to a different database
trino:sample_datasets> USE analytics;
-- List tables
trino:analytics> SHOW TABLES;
-- Describe table structure
trino:analytics> DESCRIBE user_events;
-- Preview data
trino:analytics> SELECT * FROM user_events LIMIT 10;
-- Quick aggregation
trino:analytics> SELECT
    event_name,
    COUNT(*) as cnt
FROM user_events
WHERE TD_INTERVAL(time, '-1d', 'JST')
GROUP BY event_name
ORDER BY cnt DESC
LIMIT 10;
-- Exit
trino:analytics> EXIT;
```
**Explanation:** Interactive mode is perfect for exploring data, testing queries, and understanding table structures with immediate feedback.
### Pattern 2: Scripted Query Execution
```bash
#!/bin/bash
# daily_report.sh - Generate daily report from TD
export TD_API_KEY="your_api_key"
TD_SERVER="https://api-presto.treasuredata.com"
DATABASE="analytics"
# Create SQL file
cat > /tmp/daily_report.sql <<'EOF'
SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST') as date,
  COUNT(*) as total_events,
  COUNT(DISTINCT user_id) as unique_users,
  APPROX_PERCENTILE(session_duration, 0.5) as median_duration
FROM user_events
WHERE TD_INTERVAL(time, '-1d', 'JST')
GROUP BY 1;
EOF
# Execute query and save results
trino \
--server $TD_SERVER \
--catalog td \
--user $TD_API_KEY \
--schema $DATABASE \
--file /tmp/daily_report.sql \
--output-format CSV_HEADER > daily_report_$(date +%Y%m%d).csv
echo "Report saved to daily_report_$(date +%Y%m%d).csv"
# Clean up
rm /tmp/daily_report.sql
```
**Explanation:** Batch mode is ideal for automation, scheduled reports, and integrating TD queries into shell scripts.
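To schedule a script like this, a cron entry is usually enough. The entry below is a sketch with assumed paths; it sources a profile file first so that `TD_API_KEY` is available to the non-interactive shell:
```bash
# Example crontab entry (edit with: crontab -e) running the report at 06:00 daily;
# script and log paths are illustrative
0 6 * * * . $HOME/.profile && /opt/scripts/daily_report.sh >> /var/log/daily_report.log 2>&1
```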
### Pattern 3: Multiple Queries with Error Handling
```bash
#!/bin/bash
# etl_pipeline.sh - Run multiple queries in sequence
export TD_API_KEY="your_api_key"
TD_SERVER="https://api-presto.treasuredata.com"
run_query() {
  local query="$1"
  local description="$2"
  echo "Running: $description"
  if trino \
    --server $TD_SERVER \
    --catalog td \
    --user $TD_API_KEY \
    --schema analytics \
    --execute "$query"; then
    echo "✓ Success: $description"
    return 0
  else
    echo "✗ Failed: $description"
    return 1
  fi
}
# Step 1: Create aggregated table
run_query "
CREATE TABLE IF NOT EXISTS daily_summary AS
SELECT
TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'JST') as date,
user_id,
COUNT(*) as event_count
FROM raw_events
WHERE TD_INTERVAL(time, '-1d', 'JST')
GROUP BY 1, 2
" "Create daily summary table" || exit 1
# Step 2: Validate row count
COUNT=$(trino \
  --server $TD_SERVER \
  --catalog td \
  --user $TD_API_KEY \
  --schema analytics \
  --execute "SELECT COUNT(*) FROM daily_summary" \
  --output-format CSV_UNQUOTED)
echo "Processed $COUNT rows"
if [ "$COUNT" -gt 0 ]; then
  echo "✓ Pipeline completed successfully"
else
  echo "✗ Warning: No data processed"
  exit 1
fi
```
**Explanation:** Demonstrates error handling, sequential query execution, and validation in shell scripts using Trino CLI.
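If you would rather keep the statements in one file than wrap each in a shell function, `--file` combined with `--ignore-errors` (see the options reference below) runs every statement and keeps going past individual failures. A minimal sketch with a placeholder file name:
```bash
# Run every statement in pipeline.sql, continuing past individual failures
trino \
  --server $TD_SERVER \
  --catalog td \
  --user $TD_API_KEY \
  --schema analytics \
  --file pipeline.sql \
  --ignore-errors
```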
### Pattern 4: Configuration File for Easy Access
```bash
# Create ~/.trino_config
cat > ~/.trino_config <<EOF
server=https://api-presto.treasuredata.com
catalog=td
user=$TD_API_KEY
schema=sample_datasets
output-format-interactive=ALIGNED
EOF
# Now you can simply run:
trino
# No need to specify server, user, etc. every time
```
**Alternative - Create a wrapper script:**
```bash
# Create ~/bin/td-trino
cat > ~/bin/td-trino <<'EOF'
#!/bin/bash
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user ${TD_API_KEY} \
--schema ${1:-sample_datasets}
EOF
chmod +x ~/bin/td-trino
# Usage:
td-trino # connects to sample_datasets
td-trino analytics # connects to analytics database
```
**Explanation:** Configuration files and wrapper scripts simplify repeated connections and reduce typing.
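The wrapper can also forward extra arguments to the CLI, so the same script covers both interactive sessions and one-off batch queries. This variant is a sketch; the `td-trino` name and argument handling are just this example's convention:
```bash
#!/bin/bash
# ~/bin/td-trino: first argument selects the schema; anything after it is
# forwarded to the trino CLI (e.g. --execute "...", --output-format CSV)
schema="${1:-sample_datasets}"
[ $# -gt 0 ] && shift
exec trino \
  --server https://api-presto.treasuredata.com \
  --catalog td \
  --user "${TD_API_KEY}" \
  --schema "$schema" \
  "$@"
```
Usage: `td-trino analytics --execute "SELECT COUNT(*) FROM user_events" --output-format CSV`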
### Pattern 5: Formatted Output for Different Use Cases
```bash
export TD_API_KEY="your_api_key"
TD_SERVER="https://api-presto.treasuredata.com"
DATABASE="sample_datasets"
QUERY="SELECT symbol, COUNT(*) as cnt FROM nasdaq GROUP BY symbol ORDER BY cnt DESC LIMIT 10"
# CSV for spreadsheets
trino \
--server $TD_SERVER \
--catalog td \
--user $TD_API_KEY \
--schema $DATABASE \
--execute "$QUERY" \
--output-format CSV_HEADER > results.csv
# JSON for APIs/applications
trino \
--server $TD_SERVER \
--catalog td \
--user $TD_API_KEY \
--schema $DATABASE \
--execute "$QUERY" \
--output-format JSON > results.json
# TSV for data processing
trino \
--server $TD_SERVER \
--catalog td \
--user $TD_API_KEY \
--schema $DATABASE \
--execute "$QUERY" \
--output-format TSV_HEADER > results.tsv
# Markdown for documentation
trino \
--server $TD_SERVER \
--catalog td \
--user $TD_API_KEY \
--schema $DATABASE \
--execute "$QUERY" \
--output-format MARKDOWN > results.md
```
**Explanation:** Different output formats enable integration with various downstream tools and workflows.
## Command-Line Options Reference
### Connection Options
| Option | Description | Example |
|--------|-------------|---------|
| `--server` | TD Presto endpoint | `https://api-presto.treasuredata.com` |
| `--catalog` | Catalog name | `td` |
| `--user` | TD API key | `$TD_API_KEY` |
| `--schema` | Default database | `sample_datasets` |
| `--password` | Enable password prompt | Not used for TD |
### Execution Options
| Option | Description |
|--------|-------------|
| `--execute "SQL"` | Execute single query and exit |
| `--file queries.sql` | Execute queries from file |
| `--ignore-errors` | Continue on error (batch mode) |
| `--client-request-timeout` | Query timeout (default: 2m) |
### Output Options
| Option | Description | Values |
|--------|-------------|--------|
| `--output-format` | Batch mode output format | CSV, JSON, TSV, MARKDOWN, etc. |
| `--output-format-interactive` | Interactive mode format | ALIGNED, VERTICAL, AUTO |
| `--no-progress` | Disable progress indicator | |
| `--pager` | Custom pager program | `less`, `more`, etc. |
### Display Options
| Option | Description |
|--------|-------------|
| `--debug` | Enable debug output |
| `--log-levels-file` | Custom logging configuration |
| `--disable-auto-suggestion` | Turn off autocomplete |
### Configuration
| Option | Description |
|--------|-------------|
| `--config` | Configuration file path (alternative to `~/.trino_config`) |
| `--session property=value` | Set session property |
| `--timezone` | Session timezone |
| `--client-tags` | Add metadata tags |
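As a rough illustration of how these options combine, the invocation below sets a session property, pins the session timezone, and tags the query for later identification. `query_max_run_time` is a standard Trino session property; whether TD honors a given property can vary, so treat this as a sketch:
```bash
# Combine session, timezone, and tagging options in a single batch invocation
trino \
  --server https://api-presto.treasuredata.com \
  --catalog td \
  --user $TD_API_KEY \
  --schema analytics \
  --session query_max_run_time=30m \
  --timezone Asia/Tokyo \
  --client-tags daily-report \
  --execute "SELECT COUNT(*) FROM user_events WHERE TD_INTERVAL(time, '-1d', 'JST')"
```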
## Output Formats
Available output formats:
### Batch Mode Formats
- **CSV** - Comma-separated, quoted strings (default for batch)
- **CSV_HEADER** - CSV with header row
- **CSV_UNQUOTED** - CSV without quotes
- **CSV_HEADER_UNQUOTED** - CSV with header, no quotes
- **TSV** - Tab-separated values
- **TSV_HEADER** - TSV with header row
- **JSON** - JSON array of objects
- **MARKDOWN** - Markdown table format
- **NULL** - Execute but discard output
### Interactive Mode Formats
- **ALIGNED** - Pretty-printed table (default)
- **VERTICAL** - One column per line
- **AUTO** - Automatic format selection
**Example:**
```bash
# CSV with header for Excel
trino --execute "SELECT * FROM table" --output-format CSV_HEADER
# JSON for jq processing
trino --execute "SELECT * FROM table" --output-format JSON | jq '.[] | .user_id'
# Aligned for terminal viewing
trino --output-format-interactive ALIGNED
```
## Best Practices
1. **Always Use Environment Variables for API Keys**
```bash
# In ~/.bashrc or ~/.zshrc
export TD_API_KEY="your_api_key"
```
Never hardcode API keys in scripts or commands
2. **Create Configuration File for Frequent Use**
```bash
# ~/.trino_config
server=https://api-presto.treasuredata.com
catalog=td
user=$TD_API_KEY
```
3. **Use TD Time Functions for Partition Pruning**
```sql
-- Good: Uses partition pruning
SELECT * FROM events WHERE TD_INTERVAL(time, '-1d', 'JST')
-- Bad: Scans entire table
SELECT * FROM events WHERE date = '2024-01-01'
```
4. **Add LIMIT for Exploratory Queries**
```sql
-- Safe exploratory query
SELECT * FROM large_table LIMIT 100;
```
5. **Use Batch Mode for Automation**
```bash
# Don't use interactive mode in cron jobs
trino --execute "SELECT ..." --output-format CSV > output.csv
```
6. **Enable Debug Mode for Troubleshooting**
```bash
trino --debug --execute "SELECT ..."
```
7. **Set Reasonable Timeouts**
```bash
# For long-running queries
trino --client-request-timeout 30m --execute "SELECT ..."
```
8. **Use Appropriate Output Format**
- CSV/TSV for data processing
- JSON for programmatic parsing
- ALIGNED for human viewing
- MARKDOWN for documentation
9. **Leverage History in Interactive Mode**
- Use ↑/↓ arrow keys to navigate history
- Use Ctrl+R for reverse search
- History saved in `~/.trino_history`
10. **Test Queries Interactively First**
Test complex queries in interactive mode before adding them to scripts (see the sketch after this list)
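Putting practices 8 and 10 together: iterate on a query at the interactive prompt, then promote the same SQL to a batch call with an output format suited to the destination. Table and column names below are placeholders:
```bash
# 1) Explore interactively until the query looks right
trino --server https://api-presto.treasuredata.com --catalog td \
  --user $TD_API_KEY --schema analytics

# 2) Promote the finished query to a scriptable batch call with CSV output
trino --server https://api-presto.treasuredata.com --catalog td \
  --user $TD_API_KEY --schema analytics \
  --execute "SELECT event_name, COUNT(*) AS cnt FROM user_events WHERE TD_INTERVAL(time, '-1d', 'JST') GROUP BY event_name" \
  --output-format CSV_HEADER > event_counts.csv
```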
## Common Issues and Solutions
### Issue: Connection Refused or Timeout
**Symptoms:**
- `Connection refused`
- `Read timed out`
- Cannot connect to server
**Solutions:**
1. **Verify Endpoint URL**
```bash
# Check you're using the correct regional endpoint
# US: https://api-presto.treasuredata.com
# Tokyo: https://api-presto.treasuredata.co.jp
# EU: https://api-presto.eu01.treasuredata.com
```
2. **Check Network Connectivity**
```bash
curl -I https://api-presto.treasuredata.com
```
3. **Verify Firewall/Proxy Settings**
```bash
# If behind proxy
trino --http-proxy proxy.example.com:8080 --server ...
```
4. **Increase Timeout**
```bash
trino --client-request-timeout 10m --server ...
```
### Issue: Authentication Errors
**Symptoms:**
- `Authentication failed`
- `Unauthorized`
- `403 Forbidden`
**Solutions:**
1. **Check API Key Format**
```bash
# Verify API key is set
echo $TD_API_KEY # Should display your API key
```
2. **Verify API Key is Set**
```bash
if [ -z "$TD_API_KEY" ]; then
  echo "TD_API_KEY is not set"
fi
```
3. **Test API Key with curl**
```bash
curl -H "Authorization: TD1 $TD_API_KEY" \
https://api.treasuredata.com/v3/database/list
```
4. **Regenerate API Key**
- Log in to TD console
- Generate new API key
- Update environment variable
### Issue: Query Timeout
**Symptoms:**
- Query runs but never completes
- `Query exceeded maximum time limit`
**Solutions:**
1. **Add Time Filter**
```sql
-- Add partition pruning
SELECT * FROM table
WHERE TD_INTERVAL(time, '-1d', 'JST')
```
2. **Increase Timeout**
```bash
trino --client-request-timeout 30m --execute "..."
```
3. **Use Aggregations Instead**
```sql
-- Instead of fetching all rows
SELECT * FROM huge_table
-- Aggregate first
SELECT date, COUNT(*) FROM huge_table GROUP BY date
```
4. **Add LIMIT Clause**
```sql
SELECT * FROM large_table LIMIT 10000
```
### Issue: Java Not Found
**Symptoms:**
- `java: command not found`
- `JAVA_HOME not set`
**Solutions:**
1. **Install Java**
```bash
# macOS
brew install openjdk@17
# Ubuntu/Debian
sudo apt-get install openjdk-17-jdk
# RHEL/CentOS
sudo yum install java-17-openjdk
```
2. **Set JAVA_HOME**
```bash
# Add to ~/.bashrc or ~/.zshrc
export JAVA_HOME=$(/usr/libexec/java_home -v 17) # macOS
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk # Linux
```
3. **Verify Java Version**
```bash
java -version # Should show 11 or higher
```
### Issue: Output Not Formatted Correctly
**Symptoms:**
- Broken table alignment
- Missing columns
- Garbled characters
**Solutions:**
1. **Specify Output Format Explicitly**
```bash
# For batch mode
trino --execute "..." --output-format CSV_HEADER
# For interactive mode
trino --output-format-interactive ALIGNED
```
2. **Check Terminal Width**
```bash
# Wider terminal for better formatting
stty size # Check current size
```
3. **Use VERTICAL Format for Wide Tables**
```bash
# Start the CLI with vertical output (one column per line) for wide tables
trino --output-format-interactive VERTICAL --server ... --catalog td --user $TD_API_KEY
```
4. **Disable Pager if Issues**
```bash
trino --pager='' # Disable pager
```
### Issue: History Not Working
**Symptoms:**
- Arrow keys don't show previous commands
- History not saved between sessions
**Solutions:**
1. **Check History File Permissions**
```bash
ls -la ~/.trino_history
chmod 600 ~/.trino_history
```
2. **Specify Custom History File**
```bash
trino --history-file ~/my_trino_history
```
3. **Check Disk Space**
```bash
df -h ~ # Ensure home directory has space
```
## Advanced Topics
### Session Properties
Set query-specific properties:
```bash
# Set query priority
trino \
--session query_priority=1 \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--execute "SELECT * FROM large_table"
# Set multiple properties
trino \
--session query_max_run_time=1h \
--session query_priority=2 \
--execute "SELECT ..."
```
### Using with jq for JSON Processing
```bash
# Query and process with jq
trino \
--server https://api-presto.treasuredata.com \
--catalog td \
--user $TD_API_KEY \
--schema sample_datasets \
--execute "SELECT symbol, COUNT(*) as cnt FROM nasdaq GROUP BY symbol LIMIT 10" \
--output-format JSON | \
jq '.[] | select(.cnt > 1000) | .symbol'
```
### Parallel Query Execution
```bash
#!/bin/bash
# Run multiple queries in parallel
export TD_API_KEY="your_api_key"
run_query() {
  local database=$1
  local output=$2
  trino \
    --server https://api-presto.treasuredata.com \
    --catalog td \
    --user $TD_API_KEY \
    --schema $database \
    --execute "SELECT COUNT(*) FROM events WHERE TD_INTERVAL(time, '-1d', 'JST')" \
    --output-format CSV > $output
}
# Run in parallel using background jobs
run_query "database1" "count1.csv" &
run_query "database2" "count2.csv" &
run_query "database3" "count3.csv" &
# Wait for all to complete
wait
echo "All queries completed"
```
### Integration with Other Tools
**With csvkit:**
```bash
trino --execute "SELECT * FROM table" --output-format CSV | \
csvstat
```
**With awk:**
```bash
trino --execute "SELECT symbol, cnt FROM nasdaq_summary" --output-format TSV | \
awk '$2 > 1000 { print $1 }'
```
**With Python:**
```bash
trino --execute "SELECT * FROM table" --output-format JSON | \
python -c "import sys, json; data = json.load(sys.stdin); print(len(data))"
```
## Interactive Commands Reference
Commands available in interactive mode:
| Command | Description |
|---------|-------------|
| `QUIT` or `EXIT` | Exit the CLI |
| `CLEAR` | Clear the screen |
| `HELP` | Show help information |
| `HISTORY` | Display command history |
| `USE schema` | Switch to different database |
| `SHOW CATALOGS` | List available catalogs |
| `SHOW SCHEMAS` | List all databases |
| `SHOW TABLES` | List tables in current schema |
| `SHOW COLUMNS FROM table` | Show table structure |
| `DESCRIBE table` | Show detailed table info |
| `EXPLAIN query` | Show query execution plan |
| `SHOW FUNCTIONS` | List available functions |
## Resources
- **Trino CLI Documentation**: https://trino.io/docs/current/client/cli.html
- **TD Presto Endpoints**:
- US: https://api-presto.treasuredata.com
- Tokyo: https://api-presto.treasuredata.co.jp
- EU: https://api-presto.eu01.treasuredata.com
- **TD Documentation**: https://docs.treasuredata.com/
- **Trino SQL Reference**: https://trino.io/docs/current/sql.html
## Related Skills
- **trino**: SQL query syntax and optimization for Trino
- **hive**: Understanding Hive SQL differences
- **pytd**: Python-based querying (alternative to CLI)
- **td-javascript-sdk**: Browser-based data collection
## Comparison with Other Tools
| Tool | Purpose | When to Use |
|------|---------|-------------|
| **Trino CLI** | Interactive command-line queries | Ad-hoc queries, exploration, shell scripts |
| **TD Console** | Web-based query interface | GUI preference, visualization, sharing |
| **pytd** | Python SDK | Complex ETL, pandas integration, notebooks |
| **TD Toolbelt** | TD-specific CLI | Bulk import, job management, administration |
**Recommendation:** Use Trino CLI for quick interactive queries and terminal-based workflows. Use TD Console for visualization and sharing. Use pytd for complex data pipelines.
---
*Last updated: 2025-01 | Trino CLI version: 477+*