Initial commit
541
skills/databento/references/api-parameters.md
Normal file
@@ -0,0 +1,541 @@
|
||||
# Databento API Parameters Reference
|
||||
|
||||
Complete parameter reference for all Databento MCP tools with accepted values, formats, and requirements.
|
||||
|
||||
## Date and Time Formats
|
||||
|
||||
### Date Format
|
||||
**Accepted formats:**
|
||||
- `YYYY-MM-DD` (e.g., "2024-01-15")
|
||||
- ISO 8601 with time (e.g., "2024-01-15T14:30:00Z")
|
||||
|
||||
**Important:**
|
||||
- Dates are in UTC timezone
|
||||
- Inclusive for `start`, exclusive for `end`
|
||||
- Time portion is optional
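Because `end` is exclusive, a request meant to cover a whole calendar day or month needs an `end` one day past the last day wanted. A minimal sketch of that adjustment using only the standard library; the helper name `exclusive_end` is illustrative and not part of any Databento tool:

```python
from datetime import date, timedelta


def exclusive_end(last_day_wanted: str) -> str:
    """Return the exclusive `end` date that still includes `last_day_wanted`."""
    last_day = date.fromisoformat(last_day_wanted)
    return (last_day + timedelta(days=1)).isoformat()


# To cover all of January 2024, pass start="2024-01-01" and this value as end.
print(exclusive_end("2024-01-31"))  # 2024-02-01
```

With `end="2024-01-31"` instead, records from January 31 itself would fall outside the range.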
|
||||
|
||||
### Timestamp Format
|
||||
**Accepted formats:**
|
||||
- ISO 8601 string: "2024-01-15T14:30:00Z"
|
||||
- Unix timestamp (seconds): 1705329000
|
||||
- Unix timestamp (nanoseconds): 1705329000000000000
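The three forms above describe the same instant. As a rough sketch (standard library only; `iso_to_unix` is a hypothetical helper name), an ISO 8601 string can be converted to the two Unix forms like this:

```python
from datetime import datetime


def iso_to_unix(iso_timestamp: str) -> tuple[int, int]:
    """Convert an ISO 8601 UTC string to (seconds, nanoseconds) since the epoch."""
    # Python < 3.11 does not parse a trailing "Z", so spell out the UTC offset.
    moment = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    seconds = int(moment.timestamp())
    return seconds, seconds * 1_000_000_000


print(iso_to_unix("2024-01-15T14:30:00Z"))
# (1705329000, 1705329000000000000)
```

The sketch keeps whole-second resolution only, and it expects the input to carry `Z` or an explicit offset; a string without one would be interpreted in the local timezone.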
|
||||
|
||||
## Schema Parameter
|
||||
|
||||
Valid schema values for historical data requests.
|
||||
|
||||
### OHLCV Schemas
|
||||
```
|
||||
"ohlcv-1s" # 1-second bars
|
||||
"ohlcv-1m" # 1-minute bars
|
||||
"ohlcv-1h" # 1-hour bars
|
||||
"ohlcv-1d" # Daily bars
|
||||
"ohlcv-eod" # End-of-day bars
|
||||
```
|
||||
|
||||
### Trade and Quote Schemas
|
||||
```
|
||||
"trades" # Individual trades
|
||||
"mbp-1" # Market by price - level 1 (top of book)
|
||||
"mbp-10" # Market by price - 10 levels of depth
|
||||
"mbo" # Market by order - level 3 (order-level)
|
||||
"tbbo" # Top of book best bid/offer
|
||||
```
|
||||
|
||||
### Metadata Schemas
|
||||
```
|
||||
"definition" # Instrument definitions and metadata
|
||||
"statistics" # Market statistics
|
||||
"status" # Trading status changes
|
||||
"imbalance" # Order imbalance data
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# timeseries_get_range
|
||||
schema="ohlcv-1h"
|
||||
|
||||
# batch_submit_job
|
||||
schema="trades"
|
||||
```
|
||||
|
||||
## Symbology Type (stype) Parameter
|
||||
|
||||
Used for symbol input and output format specification.
|
||||
|
||||
### stype_in (Input Symbol Type)
|
||||
|
||||
```
|
||||
"raw_symbol" # Native exchange symbols (ESH5, AAPL)
|
||||
"instrument_id" # Databento numeric IDs
|
||||
"continuous" # Continuous contracts (ES.c.0)
|
||||
"parent" # Parent symbols (ES, NQ)
|
||||
"nasdaq" # Nasdaq symbology
|
||||
"cms" # CMS symbology
|
||||
"bats" # BATS symbology
|
||||
"smart" # Smart routing symbols
|
||||
```
|
||||
|
||||
### stype_out (Output Symbol Type)
|
||||
|
||||
Same values as `stype_in`.
|
||||
|
||||
**Common Patterns:**
|
||||
```python
|
||||
# Continuous to instrument_id (most common)
|
||||
stype_in="continuous"
|
||||
stype_out="instrument_id"
|
||||
|
||||
# Raw symbol to instrument_id
|
||||
stype_in="raw_symbol"
|
||||
stype_out="instrument_id"
|
||||
|
||||
# Continuous to raw symbol (see current contract)
|
||||
stype_in="continuous"
|
||||
stype_out="raw_symbol"
|
||||
```
|
||||
|
||||
**Important:** Always match stype_in to your actual symbol format:
|
||||
- `"ES.c.0"` → stype_in="continuous"
|
||||
- `"ESH5"` → stype_in="raw_symbol"
|
||||
- `123456` → stype_in="instrument_id"
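The matching rule above can be sketched as a small heuristic. This is an illustration, not a Databento API: `guess_stype_in` is a hypothetical helper, and the continuous-symbol pattern (root, a one-letter roll rule, a rank) is an assumption generalized from the `ES.c.0` example.

```python
import re

# Assumed continuous pattern: ROOT.<roll rule>.<rank>, e.g. "ES.c.0".
_CONTINUOUS = re.compile(r"^[A-Za-z0-9]+\.[a-z]\.\d+$")


def guess_stype_in(symbol) -> str:
    """Guess the stype_in value for a single symbol."""
    # Numeric IDs may arrive as ints or as digit-only strings.
    if isinstance(symbol, int) or (isinstance(symbol, str) and symbol.isdigit()):
        return "instrument_id"
    if _CONTINUOUS.match(symbol):
        return "continuous"
    # Anything else is treated as a native exchange symbol.
    return "raw_symbol"


for example in ("ES.c.0", "ESH5", 123456):
    print(example, "->", guess_stype_in(example))
```

A heuristic like this cannot tell a parent symbol such as `ES` from a raw symbol, so set `stype_in="parent"` explicitly when that is what you mean.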
|
||||
|
||||
## Dataset Parameter
|
||||
|
||||
Dataset codes identify the data source and venue.
|
||||
|
||||
### Common Datasets
|
||||
|
||||
**Futures (CME):**
|
||||
```
|
||||
"GLBX.MDP3" # CME Globex - ES, NQ, and other CME futures
|
||||
```
|
||||
|
||||
**Equities:**
|
||||
```
|
||||
"XNAS.ITCH" # Nasdaq - all Nasdaq-listed stocks
|
||||
"XNYS.PILLAR" # NYSE - NYSE-listed stocks
|
||||
"XCHI.PILLAR" # Chicago Stock Exchange
|
||||
"BATS.PITCH" # BATS exchange
|
||||
"IEXG.TOPS" # IEX exchange
|
||||
```
|
||||
|
||||
**Options:**
|
||||
```
|
||||
"OPRA.PILLAR" # US equity options
|
||||
```
|
||||
|
||||
**Equities (Databento bundle):**
```
"DBEQ.BASIC"    # Databento equities (subset)
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# ES/NQ futures
|
||||
dataset="GLBX.MDP3"
|
||||
|
||||
# Nasdaq equities
|
||||
dataset="XNAS.ITCH"
|
||||
```
|
||||
|
||||
## Symbols Parameter
|
||||
|
||||
### Format Variations
|
||||
|
||||
**String (comma-separated):**
|
||||
```python
|
||||
symbols="ES.c.0,NQ.c.0,GC.c.0"
|
||||
```
|
||||
|
||||
**Array:**
|
||||
```python
|
||||
symbols=["ES.c.0", "NQ.c.0", "GC.c.0"]
|
||||
```
|
||||
|
||||
**Single symbol:**
|
||||
```python
|
||||
symbols="ES.c.0"
|
||||
# or
|
||||
symbols=["ES.c.0"]
|
||||
```
|
||||
|
||||
### Limits
|
||||
- Maximum: 2000 symbols per request
|
||||
- Must match stype_in format
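A symbol list longer than the limit has to be split across several requests. The sketch below accepts either input format (comma-separated string or list) and splits it into request-sized chunks; the helper names are hypothetical, and the limit is simply the 2000 stated above.

```python
MAX_SYMBOLS_PER_REQUEST = 2000  # limit stated in this section


def normalize_symbols(symbols) -> list[str]:
    """Accept a comma-separated string or a list and return a clean list."""
    if isinstance(symbols, str):
        symbols = symbols.split(",")
    return [s.strip() for s in symbols if s.strip()]


def chunk_symbols(symbols, size: int = MAX_SYMBOLS_PER_REQUEST) -> list[list[str]]:
    """Split symbols into consecutive chunks of at most `size` entries."""
    items = normalize_symbols(symbols)
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunk_symbols("ES.c.0, NQ.c.0, GC.c.0"))
# [['ES.c.0', 'NQ.c.0', 'GC.c.0']]
```

Each chunk can then be passed as the `symbols` argument of a separate request, joined back into a string with `",".join(chunk)` for tools that expect one.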
|
||||
|
||||
### Symbol Wildcards
|
||||
|
||||
Some endpoints support wildcards:
|
||||
```
|
||||
"ES*" # All ES contracts
|
||||
"*" # All instruments (use with caution)
|
||||
```
|
||||
|
||||
## Encoding Parameter (Batch Jobs)
|
||||
|
||||
Output format for batch download jobs.
|
||||
|
||||
```
|
||||
"dbn" # Databento Binary (native format, most efficient)
|
||||
"csv" # Comma-separated values
|
||||
"json" # JSON format
|
||||
```
|
||||
|
||||
**Recommendations:**
|
||||
- `"dbn"` - Best for large datasets, fastest processing
|
||||
- `"csv"` - Good for spreadsheet analysis
|
||||
- `"json"` - Good for custom parsing, human-readable
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# batch_submit_job
|
||||
encoding="dbn"
|
||||
```
|
||||
|
||||
## Compression Parameter (Batch Jobs)
|
||||
|
||||
Compression algorithm for batch downloads.
|
||||
|
||||
```
|
||||
"zstd" # Zstandard (default, best compression)
|
||||
"gzip" # Gzip (widely supported)
|
||||
"none" # No compression
|
||||
```
|
||||
|
||||
**Recommendations:**
|
||||
- `"zstd"` - Best compression ratio, fastest
|
||||
- `"gzip"` - Good compatibility
|
||||
- `"none"` - Only for small datasets or testing
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# batch_submit_job
|
||||
compression="zstd"
|
||||
```
|
||||
|
||||
## Limit Parameter
|
||||
|
||||
Maximum number of records to return.
|
||||
|
||||
**Default:** 100 (varies by tool)
|
||||
**Maximum:** No hard limit, but consider:
|
||||
- Timeseries: practical limit ~10M records
|
||||
- Batch jobs: unlimited but affects processing time
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# timeseries_get_range
|
||||
limit=1000 # Return up to 1000 records
|
||||
```
|
||||
|
||||
**Important:** For large datasets, use batch jobs instead of increasing limit.
|
||||
|
||||
## Timeframe Parameter (get_historical_bars)
|
||||
|
||||
Specific to the `get_historical_bars` convenience tool.
|
||||
|
||||
```
|
||||
"1h" # 1-hour bars
|
||||
"H4" # 4-hour bars (alternative notation)
|
||||
"1d" # Daily bars
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# get_historical_bars (ES/NQ only)
|
||||
timeframe="1h"
|
||||
count=100
|
||||
```
|
||||
|
||||
## Symbol Parameter (get_futures_quote)
|
||||
|
||||
Specific to the `get_futures_quote` tool.
|
||||
|
||||
```
|
||||
"ES" # E-mini S&P 500
|
||||
"NQ" # E-mini Nasdaq-100
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# get_futures_quote
|
||||
symbol="ES"
|
||||
```
|
||||
|
||||
**Note:** Uses root symbol only, not full contract code.
|
||||
|
||||
## Split Parameters (Batch Jobs)
|
||||
|
||||
Control how batch job output files are split.
|
||||
|
||||
### split_duration
|
||||
```
|
||||
"day" # One file per day
|
||||
"week" # One file per week
|
||||
"month" # One file per month
|
||||
"none" # Single file (default)
|
||||
```
|
||||
|
||||
### split_size
|
||||
```
|
||||
split_size=1000000000 # Split at 1GB
|
||||
split_size=5000000000 # Split at 5GB
|
||||
```
|
||||
|
||||
### split_symbols
|
||||
```
|
||||
split_symbols=True # One file per symbol
|
||||
split_symbols=False # All symbols in same file (default)
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# batch_submit_job
|
||||
split_duration="day" # Daily files
|
||||
split_symbols=True # Separate file per symbol
|
||||
```
|
||||
|
||||
## Filter Parameters
|
||||
|
||||
### State Filter (list_jobs)
|
||||
```
|
||||
states=["received", "queued", "processing", "done", "expired"]
|
||||
```
|
||||
|
||||
### Time Filter (list_jobs)
|
||||
```
|
||||
since="2024-01-01T00:00:00Z" # Jobs since this timestamp
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# batch_list_jobs
|
||||
states=["done", "processing"]
|
||||
since="2024-01-01"
|
||||
```
|
||||
|
||||
## Mode Parameter (get_cost)
|
||||
|
||||
Query mode for cost estimation.
|
||||
|
||||
```
|
||||
"historical" # Historical data (default)
|
||||
"historical-streaming" # Streaming historical
|
||||
"live" # Live data
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```python
|
||||
# metadata_get_cost
|
||||
mode="historical"
|
||||
```
|
||||
|
||||
## Complete Parameter Examples
|
||||
|
||||
### timeseries_get_range
|
||||
```python
|
||||
{
|
||||
"dataset": "GLBX.MDP3",
|
||||
"symbols": "ES.c.0,NQ.c.0",
|
||||
"schema": "ohlcv-1h",
|
||||
"start": "2024-01-01",
|
||||
"end": "2024-01-31",
|
||||
"stype_in": "continuous",
|
||||
"stype_out": "instrument_id",
|
||||
"limit": 1000
|
||||
}
|
||||
```
|
||||
|
||||
### batch_submit_job
|
||||
```python
|
||||
{
|
||||
"dataset": "GLBX.MDP3",
|
||||
"symbols": ["ES.c.0", "NQ.c.0"],
|
||||
"schema": "trades",
|
||||
"start": "2024-01-01",
|
||||
"end": "2024-12-31",
|
||||
"stype_in": "continuous",
|
||||
"stype_out": "instrument_id",
|
||||
"encoding": "dbn",
|
||||
"compression": "zstd",
|
||||
"split_duration": "day",
|
||||
"split_symbols": False
|
||||
}
|
||||
```
|
||||
|
||||
### symbology_resolve
|
||||
```python
|
||||
{
|
||||
"dataset": "GLBX.MDP3",
|
||||
"symbols": ["ES.c.0", "NQ.c.0"],
|
||||
"stype_in": "continuous",
|
||||
"stype_out": "instrument_id",
|
||||
"start_date": "2024-01-01",
|
||||
"end_date": "2024-12-31"
|
||||
}
|
||||
```
|
||||
|
||||
### metadata_get_cost
|
||||
```python
|
||||
{
|
||||
"dataset": "GLBX.MDP3",
|
||||
"start": "2024-01-01",
|
||||
"end": "2024-01-31",
|
||||
"symbols": "ES.c.0",
|
||||
"schema": "ohlcv-1h",
|
||||
"stype_in": "continuous",
|
||||
"mode": "historical"
|
||||
}
|
||||
```
|
||||
|
||||
### get_futures_quote
|
||||
```python
|
||||
{
|
||||
"symbol": "ES" # or "NQ"
|
||||
}
|
||||
```
|
||||
|
||||
### get_session_info
|
||||
```python
|
||||
{
|
||||
"timestamp": "2024-01-15T14:30:00Z" # Optional
|
||||
}
|
||||
```
|
||||
|
||||
### get_historical_bars
|
||||
```python
|
||||
{
|
||||
"symbol": "ES", # or "NQ"
|
||||
"timeframe": "1h",
|
||||
"count": 100
|
||||
}
|
||||
```
|
||||
|
||||
## Common Parameter Mistakes
|
||||
|
||||
### 1. Wrong stype_in for Symbol Format
|
||||
**Wrong:**
|
||||
```python
|
||||
symbols="ES.c.0"
|
||||
stype_in="raw_symbol" # WRONG!
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```python
|
||||
symbols="ES.c.0"
|
||||
stype_in="continuous"
|
||||
```
|
||||
|
||||
### 2. Date Format Errors
|
||||
**Wrong:**
|
||||
```python
|
||||
start="01/15/2024" # US date format - WRONG
|
||||
start="15-01-2024" # Non-ISO format - WRONG
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```python
|
||||
start="2024-01-15" # ISO format - CORRECT
|
||||
```
|
||||
|
||||
### 3. Missing Required Parameters
|
||||
**Wrong:**
|
||||
```python
|
||||
# metadata_get_cost
|
||||
dataset="GLBX.MDP3"
|
||||
start="2024-01-01"
|
||||
# Missing symbols and schema!
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```python
|
||||
dataset="GLBX.MDP3"
|
||||
start="2024-01-01"
|
||||
symbols="ES.c.0"
|
||||
schema="ohlcv-1h"
|
||||
```
|
||||
|
||||
### 4. Schema Typos
|
||||
**Wrong:**
|
||||
```python
|
||||
schema="OHLCV-1H" # Wrong case
|
||||
schema="ohlcv-1hour" # Wrong format
|
||||
schema="ohlcv_1h" # Wrong separator
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```python
|
||||
schema="ohlcv-1h" # Lowercase, hyphenated
|
||||
```
|
||||
|
||||
### 5. Symbol Array vs String Confusion
|
||||
**Wrong:**
|
||||
```python
|
||||
# batch_submit_job expects array
|
||||
symbols="ES.c.0,NQ.c.0" # WRONG for batch jobs
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```python
|
||||
# batch_submit_job
|
||||
symbols=["ES.c.0", "NQ.c.0"] # CORRECT
|
||||
```
|
||||
|
||||
### 6. Encoding/Compression Not Strings
|
||||
**Wrong:**
|
||||
```python
|
||||
encoding=dbn # Not a string
|
||||
compression=zstd # Not a string
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```python
|
||||
encoding="dbn"
|
||||
compression="zstd"
|
||||
```
|
||||
|
||||
## Parameter Validation Checklist
|
||||
|
||||
Before making API calls, verify:
|
||||
|
||||
- [ ] Date format is YYYY-MM-DD or ISO 8601
|
||||
- [ ] Dataset matches your data source (GLBX.MDP3 for ES/NQ)
|
||||
- [ ] Schema is valid and lowercase
|
||||
- [ ] stype_in matches symbol format
|
||||
- [ ] Symbols parameter matches tool expectation (string vs array)
|
||||
- [ ] All required parameters are present
|
||||
- [ ] Enum values are exact strings (case-sensitive)
|
||||
- [ ] Start is earlier than end (`end` is exclusive, so equal values select nothing)
|
||||
- [ ] limit is reasonable for dataset size
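Part of this checklist can be automated before a call is made. The following is a minimal sketch that covers only the date-format, schema-format, and ordering checks; `check_params` and `parse_utc` are hypothetical helpers, not part of the Databento tools, and they do not verify that a schema or dataset actually exists.

```python
import re
from datetime import datetime, timezone

_ISO_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def parse_utc(value: str):
    """Parse YYYY-MM-DD or ISO 8601; return None if the format is not accepted."""
    if not _ISO_PREFIX.match(value):
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Dates without an offset are treated as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def check_params(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    parsed = {}
    for name in ("start", "end"):
        if name in params:
            parsed[name] = parse_utc(params[name])
            if parsed[name] is None:
                problems.append(f"{name}: not YYYY-MM-DD or ISO 8601")
    schema = params.get("schema", "")
    if schema != schema.lower() or "_" in schema:
        problems.append("schema: must be lowercase and hyphenated")
    if parsed.get("start") and parsed.get("end") and parsed["start"] >= parsed["end"]:
        problems.append("start must be earlier than end (end is exclusive)")
    return problems


print(check_params({"start": "01/15/2024", "schema": "OHLCV-1H"}))
```

Run on the example above, the sketch reports both the US-style date and the uppercase schema name.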
|
||||
|
||||
## Quick Reference: Required Parameters
|
||||
|
||||
### timeseries_get_range
|
||||
**Required:** dataset, symbols, schema, start
|
||||
|
||||
**Optional:** end, stype_in, stype_out, limit
|
||||
|
||||
### batch_submit_job
|
||||
**Required:** dataset, symbols, schema, start
|
||||
|
||||
**Optional:** end, stype_in, stype_out, encoding, compression, split_duration, split_size, split_symbols, limit
|
||||
|
||||
### symbology_resolve
|
||||
**Required:** dataset, symbols, stype_in, stype_out, start_date
|
||||
|
||||
**Optional:** end_date
|
||||
|
||||
### metadata_get_cost
|
||||
**Required:** dataset, start
|
||||
|
||||
**Optional:** end, symbols, schema, stype_in, mode
|
||||
|
||||
### get_futures_quote
|
||||
**Required:** symbol
|
||||
|
||||
### get_session_info
|
||||
**Optional:** timestamp
|
||||
|
||||
### get_historical_bars
|
||||
**Required:** symbol, timeframe, count
|
||||
501
skills/databento/references/cost-optimization.md
Normal file
@@ -0,0 +1,501 @@
|
||||
# Databento Cost Optimization Guide
|
||||
|
||||
Strategies and best practices for minimizing costs when working with Databento market data.
|
||||
|
||||
## Databento Pricing Model
|
||||
|
||||
### Cost Components
|
||||
|
||||
1. **Databento Usage Fees** - Pay-per-use or subscription
|
||||
2. **Exchange License Fees** - Venue-dependent (varies by exchange)
|
||||
3. **Data Volume** - Amount of data retrieved
|
||||
|
||||
### Pricing Tiers
|
||||
|
||||
**Free Credits:**
|
||||
- $125 free credits for new users
|
||||
- Good for initial development and testing
|
||||
|
||||
**Usage-Based:**
|
||||
- Pay only for data you use
|
||||
- Varies by venue and data type
|
||||
- No minimum commitment
|
||||
|
||||
**Subscriptions:**
|
||||
- Basic Plan: $199/month
|
||||
- Corporate Actions/Security Master: $299/month
|
||||
- Flat-rate access to specific datasets
|
||||
|
||||
## Cost Estimation (ALWAYS Do This First)
|
||||
|
||||
### Use metadata_get_cost Before Every Request
|
||||
|
||||
**Always** estimate cost before fetching data:
|
||||
|
||||
```python
|
||||
mcp__databento__metadata_get_cost(
|
||||
dataset="GLBX.MDP3",
|
||||
start="2024-01-01",
|
||||
end="2024-01-31",
|
||||
symbols="ES.c.0",
|
||||
schema="ohlcv-1h"
|
||||
)
|
||||
```
|
||||
|
||||
**Returns:**
|
||||
- Estimated cost in USD
|
||||
- Data size estimate
|
||||
- Helps decide if request is reasonable
|
||||
|
||||
### When Cost Checks Matter Most
|
||||
|
||||
1. **Multi-day tick data** - Can be expensive
|
||||
2. **Multiple symbols** - Costs multiply
|
||||
3. **High-granularity schemas** - trades, mbp-1, mbo
|
||||
4. **Long date ranges** - Weeks or months of data
|
||||
|
||||
**Example Cost Check:**
|
||||
```python
|
||||
# Cheap: 1 month of daily bars
|
||||
cost_check(schema="ohlcv-1d", start="2024-01-01", end="2024-01-31")
|
||||
# Estimated: $0.10
|
||||
|
||||
# Expensive: 1 month of tick trades
|
||||
cost_check(schema="trades", start="2024-01-01", end="2024-01-31")
|
||||
# Estimated: $50-$200 (depends on volume)
|
||||
```
|
||||
|
||||
## Historical Data (T+1) - No Licensing Required
|
||||
|
||||
**Key Insight:** Historical data that is **24+ hours old (T+1)** does not require exchange licensing fees.
|
||||
|
||||
### Cost Breakdown
|
||||
|
||||
**Live/Recent Data (< 24 hours):**
|
||||
- Databento fees + Exchange licensing fees
|
||||
|
||||
**Historical Data (24+ hours old):**
|
||||
- Databento fees only (no exchange licensing)
|
||||
- Significantly cheaper
|
||||
|
||||
### Optimization Strategy
|
||||
|
||||
**For Development:**
|
||||
- Use T+1 data for strategy development
|
||||
- Switch to live data only for production
|
||||
|
||||
**For Backtesting:**
|
||||
- Always use historical (T+1) data
|
||||
- Much more cost-effective
|
||||
- Same data quality
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
# Expensive: Yesterday's data (< 24 hours)
|
||||
start="2024-11-05" # Requires licensing
|
||||
|
||||
# Cheap: 3 days ago (> 24 hours)
|
||||
start="2024-11-03" # No licensing required
|
||||
```
|
||||
|
||||
## Schema Selection for Cost
|
||||
|
||||
Different schemas have vastly different costs due to data volume.
|
||||
|
||||
### Schema Cost Hierarchy (Cheapest to Most Expensive)
|
||||
|
||||
1. **ohlcv-1d** (Cheapest)
|
||||
- ~100 bytes per record
|
||||
- ~250 records per symbol per year
|
||||
- **Best for:** Long-term backtesting
|
||||
|
||||
2. **ohlcv-1h**
|
||||
- ~100 bytes per record
|
||||
- ~6,000 records per symbol per year
|
||||
- **Best for:** Multi-day backtesting
|
||||
|
||||
3. **ohlcv-1m**
|
||||
- ~100 bytes per record
|
||||
- ~360,000 records per symbol per year
|
||||
- **Best for:** Intraday strategies
|
||||
|
||||
4. **trades**
|
||||
- ~50 bytes per record
|
||||
- ~100K-500K records per symbol per day (ES/NQ)
|
||||
- **Best for:** Tick analysis (use selectively)
|
||||
|
||||
5. **mbp-1**
|
||||
- ~150 bytes per record
|
||||
- ~1M-5M records per symbol per day
|
||||
- **Best for:** Order flow analysis (use selectively)
|
||||
|
||||
6. **mbp-10**
|
||||
- ~500 bytes per record
|
||||
- ~1M-5M records per symbol per day
|
||||
- **Best for:** Deep order book analysis (expensive!)
|
||||
|
||||
7. **mbo** (Most Expensive)
|
||||
- ~80 bytes per record
|
||||
- ~5M-20M records per symbol per day
|
||||
- **Best for:** Order-level research (very expensive!)
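To see why the hierarchy spans several orders of magnitude, the approximate figures above can be multiplied out. This is back-of-the-envelope arithmetic on raw record volume only, not a price quote; the 250 trading days per year is an assumption taken from the daily-bar count in the list, and actual billing should come from `metadata_get_cost`.

```python
TRADING_DAYS_PER_YEAR = 250  # assumption, matching the ~250 daily bars per year above


def estimate_bytes(bytes_per_record: int, records: int) -> int:
    """Rough raw size: record size times record count."""
    return bytes_per_record * records


daily_bars_year = estimate_bytes(100, 250)
trades_year_low = estimate_bytes(50, 100_000 * TRADING_DAYS_PER_YEAR)
mbp10_day_high = estimate_bytes(500, 5_000_000)

print(f"ohlcv-1d, 1 year:        {daily_bars_year:>13,} bytes")
print(f"trades, 1 year (low):    {trades_year_low:>13,} bytes")
print(f"mbp-10, 1 day (high):    {mbp10_day_high:>13,} bytes")
```

Under these assumptions a single busy day of `mbp-10` is about 100,000 times larger than a full year of daily bars.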
|
||||
|
||||
### Cost Optimization Strategy
|
||||
|
||||
**Start with lower granularity:**
|
||||
1. Develop strategy with ohlcv-1h or ohlcv-1d
|
||||
2. Validate with ohlcv-1m if needed
|
||||
3. Only use trades/mbp-1 if absolutely necessary
|
||||
4. Avoid mbp-10/mbo unless essential
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
# Cheap: Daily bars for 1 year
|
||||
schema="ohlcv-1d"
|
||||
start="2023-01-01"
|
||||
end="2023-12-31"
|
||||
# Cost: < $1
|
||||
|
||||
# Expensive: Trades for 1 year
|
||||
schema="trades"
|
||||
start="2023-01-01"
|
||||
end="2023-12-31"
|
||||
# Cost: $500-$2000 (depending on venue)
|
||||
```
|
||||
|
||||
## Symbol Selection
|
||||
|
||||
Fewer symbols = lower cost. Be selective.
|
||||
|
||||
### Strategies
|
||||
|
||||
**1. Start with Single Symbol**
|
||||
```python
|
||||
# Development
|
||||
symbols="ES.c.0" # Just ES
|
||||
|
||||
# After validation, expand
|
||||
symbols="ES.c.0,NQ.c.0" # Add NQ
|
||||
```
|
||||
|
||||
**2. Use Continuous Contracts**
|
||||
```python
|
||||
# Good: Single continuous contract
|
||||
symbols="ES.c.0" # Covers all front months
|
||||
|
||||
# Wasteful: Multiple specific contracts
|
||||
symbols="ESH5,ESM5,ESU5,ESZ5" # Includes back months you may not need, up to 4x the data
|
||||
```
|
||||
|
||||
**3. Avoid Symbol Wildcards**
|
||||
```python
|
||||
# Expensive: All instruments
|
||||
symbols="*" # Don't do this!
|
||||
|
||||
# Targeted: Just what you need
|
||||
symbols="ES.c.0,NQ.c.0" # Explicit
|
||||
```
|
||||
|
||||
## Date Range Optimization
|
||||
|
||||
Request only the data you need.
|
||||
|
||||
### Strategies
|
||||
|
||||
**1. Iterative Refinement**
|
||||
```python
|
||||
# First: Test with small range
|
||||
start="2024-01-01"
|
||||
end="2024-01-07" # Just 1 week
|
||||
|
||||
# Then: Expand after validation
|
||||
start="2024-01-01"
|
||||
end="2024-12-31" # Full year
|
||||
```
|
||||
|
||||
**2. Segment Long Ranges**
|
||||
```python
|
||||
# Instead of: 6 years at once
|
||||
start="2019-01-01"
|
||||
end="2024-12-31"
|
||||
|
||||
# Do: Segment by year
|
||||
start="2024-01-01"
|
||||
end="2024-12-31"
|
||||
# Process, then request next year if needed
|
||||
```
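Segmenting can be done mechanically. A minimal sketch, assuming the inclusive-start, exclusive-end convention for date ranges; `yearly_segments` is a hypothetical helper that only produces the `start`/`end` pairs and makes no requests itself:

```python
from datetime import date


def yearly_segments(start: str, end: str) -> list[tuple[str, str]]:
    """Split [start, end) into consecutive segments that break at each January 1."""
    cursor = date.fromisoformat(start)
    stop = date.fromisoformat(end)
    segments = []
    while cursor < stop:
        # The next boundary is the following January 1, capped at the overall end.
        next_boundary = min(date(cursor.year + 1, 1, 1), stop)
        segments.append((cursor.isoformat(), next_boundary.isoformat()))
        cursor = next_boundary
    return segments


for seg_start, seg_end in yearly_segments("2023-06-15", "2025-03-01"):
    print(seg_start, "->", seg_end)
```

Because each segment ends exactly where the next begins and `end` is exclusive, no record is requested twice.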
|
||||
|
||||
**3. Use Limit for Testing**
|
||||
```python
|
||||
# Test with small limit first
|
||||
limit=100 # Just 100 records
|
||||
|
||||
# After validation, increase or remove
|
||||
limit=10000 # Larger sample
|
||||
```
|
||||
|
||||
## Batch vs Timeseries Selection
|
||||
|
||||
Choose the right tool for the job.
|
||||
|
||||
### Timeseries (< 5GB)
|
||||
**When to use:**
|
||||
- Small to medium datasets
|
||||
- Quick exploration
|
||||
- <= 1 day of tick data
|
||||
- Any OHLCV data
|
||||
|
||||
**Benefits:**
|
||||
- Immediate results
|
||||
- No job management
|
||||
- Direct response
|
||||
|
||||
**Costs:**
|
||||
- Same per-record cost as batch
|
||||
|
||||
### Batch Downloads (> 5GB)
|
||||
**When to use:**
|
||||
- Large datasets (> 5GB)
|
||||
- Multi-day tick data
|
||||
- Multiple symbols over long periods
|
||||
- Production data pipelines
|
||||
|
||||
**Benefits:**
|
||||
- More efficient for large data
|
||||
- Can split output files
|
||||
- Asynchronous processing
|
||||
|
||||
**Costs:**
|
||||
- Same per-record cost as timeseries
|
||||
- No additional fees for batch processing
|
||||
|
||||
### Decision Matrix
|
||||
|
||||
| Data Type | Date Range | Method |
|-----------|------------|--------|
| ohlcv-1h | 1 year | Timeseries |
| ohlcv-1d | Any | Timeseries |
| trades | 1 day | Timeseries |
| trades | 1 week+ | Batch |
| mbp-1 | 1 day | Batch (safer) |
| mbp-1 | 1 week+ | Batch |
|
||||
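The matrix can be read as a simple rule: OHLCV data and up to one day of trades go through timeseries, and everything else goes through batch. A sketch of that rule follows; `choose_method` is a hypothetical helper, and treating every non-OHLCV, non-trades schema as batch is an extrapolation from the `mbp-1` rows.

```python
def choose_method(schema: str, days: float) -> str:
    """Pick "timeseries" or "batch" following the decision matrix."""
    if schema.startswith("ohlcv"):
        return "timeseries"  # any OHLCV range is small enough
    if schema == "trades" and days <= 1:
        return "timeseries"  # up to one day of tick data
    return "batch"  # multi-day ticks and order book schemas


print(choose_method("ohlcv-1h", 365))  # timeseries
print(choose_method("trades", 7))      # batch
print(choose_method("mbp-1", 1))       # batch
```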
|
||||
## DBEQ Bundle - Zero Exchange Fees
|
||||
|
||||
Databento offers a special bundle for US equities with **$0 exchange fees**.
|
||||
|
||||
### DBEQ.BASIC Dataset
|
||||
|
||||
**Coverage:**
|
||||
- US equity securities
|
||||
- Zero licensing fees
|
||||
- Databento usage fees only
|
||||
|
||||
**When to use:**
|
||||
- Equity market breadth for ES/NQ analysis
|
||||
- Testing equity strategies
|
||||
- Learning market data APIs
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
# Regular equity dataset (has exchange fees)
|
||||
dataset="XNAS.ITCH"
|
||||
# Cost: Databento + Nasdaq fees
|
||||
|
||||
# DBEQ bundle (no exchange fees)
|
||||
dataset="DBEQ.BASIC"
|
||||
# Cost: Databento fees only
|
||||
```
|
||||
|
||||
## Caching and Reuse
|
||||
|
||||
Don't fetch the same data multiple times.
|
||||
|
||||
### Strategies
|
||||
|
||||
**1. Cache Locally**
|
||||
```python
|
||||
# First request: Fetch and save
|
||||
data = fetch_data(...)
|
||||
save_to_disk(data, "ES_2024_ohlcv1h.csv")
|
||||
|
||||
# Subsequent runs: Load from disk
|
||||
data = load_from_disk("ES_2024_ohlcv1h.csv")
|
||||
```
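A slightly more concrete version of the same idea is sketched below. `fetch` stands in for whatever call actually retrieves the data (it is a placeholder you supply, not a Databento function), and the cache key is derived from the request parameters so that identical requests map to the same file.

```python
import hashlib
import json
from pathlib import Path


def cached_fetch(params: dict, fetch, cache_dir: str = "cache"):
    """Return cached records for `params`, calling `fetch(params)` only on a miss."""
    # Identical parameters always hash to the same file name.
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:16]
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    records = fetch(params)  # placeholder for your own data call
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records))
    return records
```

The sketch assumes the fetched records are JSON-serializable; for large tick datasets a binary format on disk would be a better fit than JSON.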
|
||||
|
||||
**2. Incremental Updates**
|
||||
```python
|
||||
# Initial: Fetch full history
|
||||
start="2023-01-01"
|
||||
end="2024-01-01"
|
||||
|
||||
# Later: Fetch only new data
|
||||
start="2024-01-01" # Resume from last fetch
|
||||
end="2024-12-31"
|
||||
```
|
||||
|
||||
**3. Share Data Across Analyses**
|
||||
```python
|
||||
# Fetch once
|
||||
historical_data = fetch_data(schema="ohlcv-1h", ...)
|
||||
|
||||
# Use multiple times
|
||||
backtest_strategy_a(historical_data)
|
||||
backtest_strategy_b(historical_data)
|
||||
backtest_strategy_c(historical_data)
|
||||
```
|
||||
|
||||
## Session-Based Analysis
|
||||
|
||||
For ES/NQ, consider filtering by trading session to reduce data volume.
|
||||
|
||||
### Sessions
|
||||
|
||||
- **Asian Session:** 6pm-2am ET
|
||||
- **London Session:** 2am-8am ET
|
||||
- **New York Session:** 8am-4pm ET
|
||||
|
||||
### Cost Benefit
|
||||
|
||||
**Full 24-hour data:**
|
||||
- Maximum data volume
|
||||
- Higher cost
|
||||
|
||||
**Session-filtered data:**
|
||||
- 1/3 to 1/2 the volume
|
||||
- Lower cost
|
||||
- May be sufficient for analysis
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
# Expensive: Full 24-hour data
|
||||
# Process all records
|
||||
|
||||
# Cheaper: NY session only
|
||||
# Filter records to 8am-4pm ET
|
||||
# ~1/3 the data volume
|
||||
```
|
||||
|
||||
Use `scripts/session_filter.py` to filter post-fetch, or request only specific hours.
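For orientation, a post-fetch filter for the New York session could look roughly like the sketch below. It is not the contents of `scripts/session_filter.py`; it assumes each record is a dict with a nanosecond `ts_event` field, and it ignores weekends and holidays.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def in_ny_session(ts_event_ns: int) -> bool:
    """True if a nanosecond UTC timestamp falls in 8am-4pm Eastern Time."""
    utc_time = datetime.fromtimestamp(ts_event_ns // 1_000_000_000, tz=timezone.utc)
    return 8 <= utc_time.astimezone(NEW_YORK).hour < 16


records = [
    {"ts_event": 1705329000000000000},  # 2024-01-15T14:30:00Z = 09:30 ET
    {"ts_event": 1705287600000000000},  # 2024-01-15T03:00:00Z = 22:00 ET (Jan 14)
]
ny_only = [r for r in records if in_ny_session(r["ts_event"])]
print(len(ny_only))  # 1
```

Converting through `America/New_York` rather than subtracting a fixed offset keeps the session boundaries correct across daylight saving changes.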
|
||||
|
||||
## Monitoring Usage
|
||||
|
||||
Track your usage to avoid surprises.
|
||||
|
||||
### Check Dashboard
|
||||
- Databento provides usage dashboard
|
||||
- Monitor monthly spend
|
||||
- Set alerts for limits
|
||||
|
||||
### Set Monthly Limits
|
||||
```python
# In account settings (illustrative value, not an API call)
monthly_limit = 500  # USD
```
|
||||
|
||||
### Review Costs Regularly
|
||||
- Check cost estimates vs actual
|
||||
- Identify expensive queries
|
||||
- Adjust strategies
|
||||
|
||||
## Cost Optimization Checklist
|
||||
|
||||
Before every data request:
|
||||
|
||||
- [ ] **Estimate cost first** - Use metadata_get_cost
|
||||
- [ ] **Use T+1 data** - Avoid < 24 hour data unless necessary
|
||||
- [ ] **Choose lowest granularity schema** - Start with ohlcv, not trades
|
||||
- [ ] **Minimize symbols** - Only request what you need
|
||||
- [ ] **Limit date range** - Test with small range first
|
||||
- [ ] **Use continuous contracts** - Avoid requesting multiple months
|
||||
- [ ] **Cache locally** - Don't re-fetch same data
|
||||
- [ ] **Consider DBEQ** - Use zero-fee dataset when applicable
|
||||
- [ ] **Filter by session** - Reduce volume if session-specific
|
||||
- [ ] **Use batch for large data** - More efficient for > 5GB
|
||||
|
||||
## Cost Examples
|
||||
|
||||
### Cheap Requests (< $1)
|
||||
|
||||
```python
|
||||
# Daily bars for 1 year
|
||||
dataset="GLBX.MDP3"
|
||||
symbols="ES.c.0"
|
||||
schema="ohlcv-1d"
|
||||
start="2023-01-01"
|
||||
end="2023-12-31"
|
||||
# Estimated cost: $0.10
|
||||
```
|
||||
|
||||
### Moderate Requests ($1-$10)
|
||||
|
||||
```python
|
||||
# Hourly bars for 1 year
|
||||
dataset="GLBX.MDP3"
|
||||
symbols="ES.c.0,NQ.c.0"
|
||||
schema="ohlcv-1h"
|
||||
start="2023-01-01"
|
||||
end="2023-12-31"
|
||||
# Estimated cost: $2-5
|
||||
```
|
||||
|
||||
### Expensive Requests ($10-$100)
|
||||
|
||||
```python
|
||||
# Trades for 1 month
|
||||
dataset="GLBX.MDP3"
|
||||
symbols="ES.c.0"
|
||||
schema="trades"
|
||||
start="2024-01-01"
|
||||
end="2024-01-31"
|
||||
# Estimated cost: $20-50
|
||||
```
|
||||
|
||||
### Very Expensive Requests ($100+)
|
||||
|
||||
```python
|
||||
# MBP-10 for 1 month
|
||||
dataset="GLBX.MDP3"
|
||||
symbols="ES.c.0,NQ.c.0"
|
||||
schema="mbp-10"
|
||||
start="2024-01-01"
|
||||
end="2024-01-31"
|
||||
# Estimated cost: $200-500
|
||||
```
|
||||
|
||||
## Free Credit Strategy
|
||||
|
||||
Make the most of your $125 free credits:
|
||||
|
||||
1. **Development Phase** - Use free credits for:
|
||||
- Testing API integration
|
||||
- Small-scale strategy development
|
||||
- Learning the platform
|
||||
|
||||
2. **Prioritize T+1 Data** - Stretch credits further:
|
||||
- Avoid real-time data during development
|
||||
- Use historical data (no licensing fees)
|
||||
|
||||
3. **Start with OHLCV** - Cheapest data:
|
||||
- Develop strategy with daily/hourly bars
|
||||
- Validate before moving to tick data
|
||||
|
||||
4. **Cache Everything** - Don't waste credits:
|
||||
- Save all fetched data locally
|
||||
- Reuse for multiple analyses
|
||||
|
||||
5. **Monitor Remaining Balance**:
|
||||
- Check credit usage regularly
|
||||
- Adjust requests to stay within budget
|
||||
|
||||
## Summary
|
||||
|
||||
**Most Important Cost-Saving Strategies:**
|
||||
|
||||
1. ✅ **Always check cost first** - Use metadata_get_cost
|
||||
2. ✅ **Use T+1 data** - 24+ hours old, no licensing fees
|
||||
3. ✅ **Start with OHLCV schemas** - Much cheaper than tick data
|
||||
4. ✅ **Cache and reuse data** - Don't fetch twice
|
||||
5. ✅ **Be selective with symbols** - Fewer symbols = lower cost
|
||||
6. ✅ **Test with small ranges** - Validate before large requests
|
||||
7. ✅ **Use continuous contracts** - One symbol instead of many
|
||||
8. ✅ **Monitor usage** - Track spending, set limits
|
||||
372
skills/databento/references/schemas.md
Normal file
@@ -0,0 +1,372 @@
|
||||
# Databento Schema Reference
|
||||
|
||||
Comprehensive documentation of Databento schemas with field-level details, data types, and usage guidance.
|
||||
|
||||
## Schema Overview
|
||||
|
||||
Databento provides 12+ schema types representing different granularity levels of market data. All schemas share common timestamp fields for consistency.
|
||||
|
||||
## Common Fields (All Schemas)
|
||||
|
||||
Every schema includes these timestamp fields:
|
||||
|
||||
| Field | Type | Description | Unit |
|-------|------|-------------|------|
| `ts_event` | uint64 | Event timestamp from venue | Nanoseconds (Unix epoch) |
| `ts_recv` | uint64 | Databento gateway receipt time | Nanoseconds (Unix epoch) |
|
||||
|
||||
**Important:** Databento provides up to 4 timestamps per event for sub-microsecond accuracy.
|
||||
|
||||
## OHLCV Schemas
|
||||
|
||||
Candlestick/bar data at various time intervals.
|
||||
|
||||
### ohlcv-1s (1 Second Bars)
|
||||
### ohlcv-1m (1 Minute Bars)
|
||||
### ohlcv-1h (1 Hour Bars)
|
||||
### ohlcv-1d (Daily Bars)
|
||||
### ohlcv-eod (End of Day)
|
||||
|
||||
**Common OHLCV Fields:**
|
||||
|
||||
| Field | Type | Description | Unit |
|-------|------|-------------|------|
| `open` | int64 | Opening price | Fixed-point (divide by 1e9 for decimal) |
| `high` | int64 | Highest price | Fixed-point (divide by 1e9 for decimal) |
| `low` | int64 | Lowest price | Fixed-point (divide by 1e9 for decimal) |
| `close` | int64 | Closing price | Fixed-point (divide by 1e9 for decimal) |
| `volume` | uint64 | Total volume | Contracts/shares |
|
||||
|
||||
**When to Use:**
|
||||
- **1h/1d**: Historical backtesting, multi-day analysis
|
||||
- **1m**: Intraday strategy development
|
||||
- **1s**: High-frequency analysis (use batch for large ranges)
|
||||
- **eod**: Long-term investment analysis
|
||||
|
||||
**Pricing Format:**
|
||||
Prices are in fixed-point notation. To convert to decimal:
|
||||
```
|
||||
decimal_price = int64_price / 1_000_000_000
|
||||
```
|
||||
|
||||
For ES futures at 4500.00, the value would be stored as `4500000000000`.
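A minimal sketch of the conversion, using `Decimal` so that the result is exact rather than a binary float approximation; the helper name `to_decimal_price` is illustrative only:

```python
from decimal import Decimal

PRICE_SCALE = Decimal(1_000_000_000)  # fixed-point scale: 1e9


def to_decimal_price(raw_price: int) -> Decimal:
    """Convert a fixed-point int64 price to an exact decimal value."""
    return Decimal(raw_price) / PRICE_SCALE


print(to_decimal_price(4500000000000))  # 4500
print(to_decimal_price(4500250000000))  # 4500.25
```

Plain float division (`raw_price / 1e9`) is usually adequate for analysis; `Decimal` is the safer choice when prices are compared for equality or summed many times.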
|
||||
|
||||
## Trades Schema
|
||||
|
||||
Individual trade executions with price, size, and side information.
|
||||
|
||||
| Field | Type | Description | Values |
|-------|------|-------------|--------|
| `price` | int64 | Trade execution price | Fixed-point (÷ 1e9) |
| `size` | uint32 | Trade size | Contracts/shares |
| `action` | char | Trade action | 'T' = trade, 'C' = cancel |
| `side` | char | Aggressor side | 'B' = buy aggressor, 'A' = sell aggressor, 'N' = none |
| `flags` | uint8 | Trade flags | Bitmask |
| `depth` | uint8 | Depth level | Usually 0 |
| `ts_in_delta` | int32 | Time delta | Nanoseconds |
| `sequence` | uint32 | Sequence number | Venue-specific |

**When to Use:**
- Intraday order flow analysis
- Tick-by-tick backtesting
- Market microstructure research
- Volume profile analysis

**Aggressor Side:**
- `B` = Buy-side aggressor (a market buy lifted the ask)
- `A` = Sell-side aggressor (a market sell hit the bid)
- `N` = Cannot be determined or not applicable
|
||||
|
||||
**Important:** For multi-day tick data, use batch downloads. Trades can generate millions of records per day.
|
||||
|
||||
## MBP-1 Schema (Market By Price - Top of Book)
|
||||
|
||||
Level 1 order book data showing best bid and ask.
|
||||
|
||||
| Field | Type | Description | Values |
|
||||
|-------|------|-------------|--------|
|
||||
| `price` | int64 | Reference price (usually last trade) | Fixed-point (÷ 1e9) |
|
||||
| `size` | uint32 | Reference size | Contracts/shares |
|
||||
| `action` | char | Book action | 'A' = add, 'C' = cancel, 'M' = modify, 'T' = trade |
|
||||
| `side` | char | Order side | 'B' = bid, 'A' = ask, 'N' = none |
|
||||
| `flags` | uint8 | Flags | Bitmask |
|
||||
| `depth` | uint8 | Depth level | Always 0 for MBP-1 |
|
||||
| `ts_in_delta` | int32 | Time delta | Nanoseconds |
|
||||
| `sequence` | uint32 | Sequence number | Venue-specific |
|
||||
| `bid_px_00` | int64 | Best bid price | Fixed-point (÷ 1e9) |
|
||||
| `ask_px_00` | int64 | Best ask price | Fixed-point (÷ 1e9) |
|
||||
| `bid_sz_00` | uint32 | Best bid size | Contracts/shares |
|
||||
| `ask_sz_00` | uint32 | Best ask size | Contracts/shares |
|
||||
| `bid_ct_00` | uint32 | Bid order count | Number of orders |
|
||||
| `ask_ct_00` | uint32 | Ask order count | Number of orders |
|
||||
|
||||
**When to Use:**
|
||||
- Bid/ask spread analysis
|
||||
- Liquidity analysis
|
||||
- Market microstructure studies
|
||||
- Quote-based strategies
|
||||
|
||||
**Key Metrics:**
|
||||
```
|
||||
spread = ask_px_00 - bid_px_00
|
||||
mid_price = (bid_px_00 + ask_px_00) / 2
|
||||
bid_ask_imbalance = (bid_sz_00 - ask_sz_00) / (bid_sz_00 + ask_sz_00)
|
||||
```
|
||||
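The metrics above operate on fixed-point prices, so they are easier to read after conversion. A minimal sketch, assuming one MBP-1 record is available as a dictionary keyed by the field names in the table; `top_of_book_metrics` is a hypothetical helper name.

```python
PRICE_SCALE = 1_000_000_000  # fixed-point prices carry 9 implied decimal places

def top_of_book_metrics(rec):
    """Return spread, mid price, and size imbalance for one MBP-1 record."""
    bid = rec["bid_px_00"] / PRICE_SCALE
    ask = rec["ask_px_00"] / PRICE_SCALE
    total_size = rec["bid_sz_00"] + rec["ask_sz_00"]
    # Guard against an empty top of book, where the imbalance is undefined.
    imbalance = (rec["bid_sz_00"] - rec["ask_sz_00"]) / total_size if total_size else None
    return {"spread": ask - bid, "mid_price": (bid + ask) / 2, "bid_ask_imbalance": imbalance}

# Illustrative record only, not real market data.
quote = {"bid_px_00": 4_500_000_000_000, "ask_px_00": 4_500_250_000_000,
         "bid_sz_00": 30, "ask_sz_00": 10}
metrics = top_of_book_metrics(quote)
```

The imbalance ranges from -1 (all size on the ask) to +1 (all size on the bid); the guard returns `None` when both sizes are zero.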
|
||||
## MBP-10 Schema (Market By Price - 10 Levels)
|
||||
|
||||
Level 2 order book data showing 10 levels of depth.
|
||||
|
||||
**Fields:** Same as MBP-1, plus 9 additional levels:
|
||||
- `bid_px_01` through `bid_px_09` (bid levels 2-10; together with `bid_px_00`, 10 bid levels)
|
||||
- `ask_px_01` through `ask_px_09` (ask levels 2-10; together with `ask_px_00`, 10 ask levels)
|
||||
- `bid_sz_01` through `bid_sz_09`
|
||||
- `ask_sz_01` through `ask_sz_09`
|
||||
- `bid_ct_01` through `bid_ct_09`
|
||||
- `ask_ct_01` through `ask_ct_09`
|
||||
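Because the level fields follow a regular `{side}_{kind}_{NN}` naming pattern, they can be generated rather than typed out. The sketch below assumes a record is a dictionary keyed by those field names; `level_fields` and `total_depth` are hypothetical helpers.

```python
def level_fields(prefix, levels=10):
    """Build field names such as bid_sz_00 ... bid_sz_09 for the given prefix."""
    return [f"{prefix}_{level:02d}" for level in range(levels)]

def total_depth(rec, side):
    """Sum resting size across all 10 levels for side 'bid' or 'ask'."""
    return sum(rec[name] for name in level_fields(f"{side}_sz"))

# Illustrative record: 10 bid levels with sizes 1..10, 10 ask levels of size 2 each.
book = {name: i + 1 for i, name in enumerate(level_fields("bid_sz"))}
book.update({name: 2 for name in level_fields("ask_sz")})
bid_depth, ask_depth = total_depth(book, "bid"), total_depth(book, "ask")
```

The same name generator works for the `px` and `ct` fields, for example `level_fields("ask_px")`.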
|
||||
**When to Use:**
|
||||
- Order book depth analysis
|
||||
- Liquidity beyond top of book
|
||||
- Order flow imbalance at multiple levels
|
||||
- Market impact modeling
|
||||
|
||||
**Important:** MBP-10 generates significantly more data than MBP-1. Use batch downloads for multi-day requests.
|
||||
|
||||
## MBO Schema (Market By Order)
|
||||
|
||||
Level 3 order-level data with individual order IDs - most granular.
|
||||
|
||||
| Field | Type | Description | Values |
|
||||
|-------|------|-------------|--------|
|
||||
| `order_id` | uint64 | Unique order ID | Venue-specific |
|
||||
| `price` | int64 | Order price | Fixed-point (÷ 1e9) |
|
||||
| `size` | uint32 | Order size | Contracts/shares |
|
||||
| `flags` | uint8 | Flags | Bitmask |
|
||||
| `channel_id` | uint8 | Channel ID | Venue-specific |
|
||||
| `action` | char | Order action | 'A' = add, 'C' = cancel, 'M' = modify, 'F' = fill, 'T' = trade |
|
||||
| `side` | char | Order side | 'B' = bid, 'A' = ask, 'N' = none |
|
||||
| `ts_in_delta` | int32 | Time delta | Nanoseconds |
|
||||
| `sequence` | uint32 | Sequence number | Venue-specific |
|
||||
|
||||
**When to Use:**
|
||||
- Highest granularity order flow analysis
|
||||
- Order-level reconstructions
|
||||
- Advanced market microstructure research
|
||||
- Queue position analysis
|
||||
|
||||
**Important:** MBO data is extremely granular and generates massive datasets. Always use batch downloads and carefully check costs.
|
||||
|
||||
## Definition Schema
|
||||
|
||||
Instrument metadata and definitions.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `ts_recv` | uint64 | Receipt timestamp |
|
||||
| `min_price_increment` | int64 | Minimum tick size |
|
||||
| `display_factor` | int64 | Display factor for prices |
|
||||
| `expiration` | uint64 | Contract expiration timestamp |
|
||||
| `activation` | uint64 | Contract activation timestamp |
|
||||
| `high_limit_price` | int64 | Upper price limit |
|
||||
| `low_limit_price` | int64 | Lower price limit |
|
||||
| `max_price_variation` | int64 | Maximum price move |
|
||||
| `trading_reference_price` | int64 | Reference price |
|
||||
| `unit_of_measure_qty` | int64 | Contract size |
|
||||
| `min_price_increment_amount` | int64 | Tick value |
|
||||
| `price_ratio` | int64 | Price ratio |
|
||||
| `inst_attrib_value` | int32 | Instrument attributes |
|
||||
| `underlying_id` | uint32 | Underlying instrument ID |
|
||||
| `raw_instrument_id` | uint32 | Raw instrument ID |
|
||||
| `market_depth_implied` | int32 | Implied depth |
|
||||
| `market_depth` | int32 | Market depth |
|
||||
| `market_segment_id` | uint32 | Market segment |
|
||||
| `max_trade_vol` | uint32 | Maximum trade volume |
|
||||
| `min_lot_size` | int32 | Minimum lot size |
|
||||
| `min_lot_size_block` | int32 | Block trade minimum |
|
||||
| `min_lot_size_round_lot` | int32 | Round lot minimum |
|
||||
| `min_trade_vol` | uint32 | Minimum trade volume |
|
||||
| `contract_multiplier` | int32 | Contract multiplier |
|
||||
| `decay_quantity` | int32 | Decay quantity |
|
||||
| `original_contract_size` | int32 | Original size |
|
||||
| `trading_reference_date` | uint16 | Reference date |
|
||||
| `appl_id` | int16 | Application ID |
|
||||
| `maturity_year` | uint16 | Year |
|
||||
| `decay_start_date` | uint16 | Decay start |
|
||||
| `channel_id` | uint16 | Channel |
|
||||
| `currency` | string | Currency code |
|
||||
| `settl_currency` | string | Settlement currency |
|
||||
| `secsubtype` | string | Security subtype |
|
||||
| `raw_symbol` | string | Raw symbol |
|
||||
| `group` | string | Instrument group |
|
||||
| `exchange` | string | Exchange code |
|
||||
| `asset` | string | Asset class |
|
||||
| `cfi` | string | CFI code |
|
||||
| `security_type` | string | Security type |
|
||||
| `unit_of_measure` | string | Unit of measure |
|
||||
| `underlying` | string | Underlying symbol |
|
||||
| `strike_price_currency` | string | Strike currency |
|
||||
| `instrument_class` | char | Class |
|
||||
| `strike_price` | int64 | Strike price (options) |
|
||||
| `match_algorithm` | char | Matching algorithm |
|
||||
| `md_security_trading_status` | uint8 | Trading status |
|
||||
| `main_fraction` | uint8 | Main fraction |
|
||||
| `price_display_format` | uint8 | Display format |
|
||||
| `settl_price_type` | uint8 | Settlement type |
|
||||
| `sub_fraction` | uint8 | Sub fraction |
|
||||
| `underlying_product` | uint8 | Underlying product |
|
||||
| `security_update_action` | char | Update action |
|
||||
| `maturity_month` | uint8 | Month |
|
||||
| `maturity_day` | uint8 | Day |
|
||||
| `maturity_week` | uint8 | Week |
|
||||
| `user_defined_instrument` | char | User-defined |
|
||||
| `contract_multiplier_unit` | int8 | Multiplier unit |
|
||||
| `flow_schedule_type` | int8 | Flow schedule |
|
||||
| `tick_rule` | uint8 | Tick rule |
|
||||
|
||||
**When to Use:**
|
||||
- Understanding instrument specifications
|
||||
- Calculating tick values
|
||||
- Contract expiration management
|
||||
- Symbol resolution and mapping
|
||||
|
||||
**Key Fields for ES/NQ:**
|
||||
- `min_price_increment`: Tick size (0.25 for ES, 0.25 for NQ)
|
||||
- `expiration`: Contract expiration timestamp
|
||||
- `raw_symbol`: Exchange symbol
|
||||
- `contract_multiplier`: Usually 50 for ES, 20 for NQ
|
||||
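Tick value follows from the two key fields: tick size multiplied by the contract multiplier. The sketch below hard-codes the ES and NQ values quoted in this reference as an assumption; in practice they would be read from definition records (after fixed-point conversion). The names `CONTRACT_SPECS`, `tick_value`, and `pnl` are hypothetical.

```python
from decimal import Decimal

# Values quoted in this reference; confirm against the definition schema before relying on them.
CONTRACT_SPECS = {
    "ES": {"tick_size": Decimal("0.25"), "multiplier": Decimal("50")},
    "NQ": {"tick_size": Decimal("0.25"), "multiplier": Decimal("20")},
}

def tick_value(root):
    """Dollar value of a one-tick move for one contract."""
    spec = CONTRACT_SPECS[root]
    return spec["tick_size"] * spec["multiplier"]

def pnl(root, entry, exit_price, contracts=1):
    """Profit or loss in dollars for a long position of the given size."""
    move = Decimal(str(exit_price)) - Decimal(str(entry))
    return move * CONTRACT_SPECS[root]["multiplier"] * contracts
```

For example, `tick_value("ES")` gives 12.50 and a one-point gain on one ES contract, `pnl("ES", 4500.00, 4501.00)`, gives 50.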
|
||||
## Statistics Schema
|
||||
|
||||
Market statistics and calculated metrics.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `ts_recv` | uint64 | Receipt timestamp |
|
||||
| `ts_ref` | uint64 | Reference timestamp |
|
||||
| `price` | int64 | Reference price |
|
||||
| `quantity` | int64 | Reference quantity |
|
||||
| `sequence` | uint32 | Sequence number |
|
||||
| `ts_in_delta` | int32 | Time delta |
|
||||
| `stat_type` | uint16 | Statistic type |
|
||||
| `channel_id` | uint16 | Channel ID |
|
||||
| `update_action` | uint8 | Update action |
|
||||
| `stat_flags` | uint8 | Statistic flags |
|
||||
|
||||
**Common Statistic Types:**
|
||||
- Opening price
|
||||
- Settlement price
|
||||
- High/low prices
|
||||
- Trading volume
|
||||
- Open interest
|
||||
|
||||
**When to Use:**
|
||||
- Official settlement prices
|
||||
- Open interest analysis
|
||||
- Exchange-calculated statistics
|
||||
|
||||
## Status Schema
|
||||
|
||||
Instrument trading status and state changes.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `ts_recv` | uint64 | Receipt timestamp |
|
||||
| `ts_event` | uint64 | Event timestamp |
|
||||
| `action` | uint16 | Status action |
|
||||
| `reason` | uint16 | Status reason |
|
||||
| `trading_event` | uint16 | Trading event |
|
||||
| `is_trading` | int8 | Trading flag (1 = trading, 0 = not trading) |
|
||||
| `is_quoting` | int8 | Quoting flag |
|
||||
| `is_short_sell_restricted` | int8 | Short sell flag |
|
||||
|
||||
**When to Use:**
|
||||
- Detecting trading halts
|
||||
- Understanding market status changes
|
||||
- Filtering data by trading status
|
||||
|
||||
## Imbalance Schema
|
||||
|
||||
Order imbalance data for auctions and closes.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `ts_recv` | uint64 | Receipt timestamp |
|
||||
| `ts_event` | uint64 | Event timestamp |
|
||||
| `ref_price` | int64 | Reference price |
|
||||
| `auction_time` | uint64 | Auction timestamp |
|
||||
| `cont_book_clr_price` | int64 | Continuous book clearing price |
|
||||
| `auct_interest_clr_price` | int64 | Auction interest clearing price |
|
||||
| `paired_qty` | uint64 | Paired quantity |
|
||||
| `total_imbalance_qty` | uint64 | Total imbalance |
|
||||
| `side` | char | Imbalance side ('B' or 'A') |
|
||||
| `significant_imbalance` | char | Significance flag |
|
||||
|
||||
**When to Use:**
|
||||
- Opening/closing auction analysis
|
||||
- Imbalance trading strategies
|
||||
- End-of-day positioning
|
||||
|
||||
## Schema Selection Decision Matrix
|
||||
|
||||
| Analysis Type | Recommended Schema | Alternative |
|
||||
|---------------|-------------------|-------------|
|
||||
| Daily backtesting | ohlcv-1d | ohlcv-1h |
|
||||
| Intraday backtesting | ohlcv-1h, ohlcv-1m | trades |
|
||||
| Spread analysis | mbp-1 | trades |
|
||||
| Order flow | trades | mbp-1 |
|
||||
| Market depth | mbp-10 | mbo |
|
||||
| Tick-by-tick | trades | mbo |
|
||||
| Liquidity analysis | mbp-1, mbp-10 | mbo |
|
||||
| Contract specifications | definition | - |
|
||||
| Settlement prices | statistics | definition |
|
||||
| Trading halts | status | - |
|
||||
| Auction analysis | imbalance | trades |
|
||||
|
||||
## Data Type Reference
|
||||
|
||||
### Fixed-Point Prices
|
||||
All price fields are stored as int64 in fixed-point notation with 9 decimal places of precision.
|
||||
|
||||
**Conversion:**
|
||||
```python
|
||||
decimal_price = int64_price / 1_000_000_000
|
||||
```
|
||||
|
||||
**Example:**
|
||||
- ES at 4500.25 → stored as 4500250000000
|
||||
- NQ at 15000.50 → stored as 15000500000000
|
||||
|
||||
### Timestamps
|
||||
All timestamps are uint64 nanoseconds since Unix epoch (1970-01-01 00:00:00 UTC).
|
||||
|
||||
**Conversion to datetime:**
|
||||
```python
|
||||
import datetime
|
||||
dt = datetime.datetime.fromtimestamp(ts_event / 1_000_000_000, tz=datetime.timezone.utc)
|
||||
```
|
||||
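Dividing by 1e9 as a float cannot represent every nanosecond value, because a 64-bit float carries only about 15 to 17 significant decimal digits. When sub-microsecond detail matters, one option is to split the integer first. A minimal sketch; `split_timestamp` is a hypothetical helper name.

```python
import datetime

NANOS_PER_SECOND = 1_000_000_000

def split_timestamp(ts_ns):
    """Split a nanosecond epoch timestamp into a UTC datetime (whole seconds) and leftover nanoseconds."""
    seconds, nanos = divmod(ts_ns, NANOS_PER_SECOND)
    dt = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    return dt, nanos

# 1705329000 seconds after the epoch is 2024-01-15 14:30:00 UTC; the trailing 123 is nanoseconds.
dt, nanos = split_timestamp(1_705_329_000_000_000_123)
```

The returned `datetime` is exact to the second, and the nanosecond remainder is kept as a separate integer so no precision is lost.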
|
||||
### Character Fields
|
||||
Single-character fields (char) represent enums:
|
||||
- Action: 'A' (add), 'C' (cancel), 'M' (modify), 'T' (trade), 'F' (fill)
|
||||
- Side: 'B' (bid), 'A' (ask), 'N' (none/unknown)
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Schema Size (Approximate bytes per record)
|
||||
|
||||
| Schema | Size | Records/GB |
|
||||
|--------|------|------------|
|
||||
| ohlcv-1d | ~100 | ~10M |
|
||||
| ohlcv-1h | ~100 | ~10M |
|
||||
| trades | ~50 | ~20M |
|
||||
| mbp-1 | ~150 | ~6.7M |
|
||||
| mbp-10 | ~500 | ~2M |
|
||||
| mbo | ~80 | ~12.5M |
|
||||
|
||||
**Planning requests:**
|
||||
- 1 day of ES trades ≈ 100K-500K records ≈ 5-25 MB
|
||||
- 1 day of ES mbp-1 ≈ 1M-5M records ≈ 150-750 MB
|
||||
- 1 year of ES ohlcv-1h ≈ 6K records ≈ 600 KB
|
||||
|
||||
Use these estimates to decide between timeseries (< 5GB) and batch downloads (> 5GB).
|
||||
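The planning arithmetic above can be sketched as a rough estimator. The byte sizes are copied from the approximate table in this section and the 5 GB cut-off from the sentence above; the function names are hypothetical, and actual request sizes depend on encoding and compression.

```python
# Approximate bytes per record, copied from the table above.
BYTES_PER_RECORD = {"ohlcv-1d": 100, "ohlcv-1h": 100, "trades": 50,
                    "mbp-1": 150, "mbp-10": 500, "mbo": 80}
BATCH_THRESHOLD_BYTES = 5 * 1_000_000_000  # the 5 GB cut-off suggested above

def estimate_bytes(schema, records_per_day, days):
    """Rough request size: records per day x days x bytes per record."""
    return BYTES_PER_RECORD[schema] * records_per_day * days

def suggest_method(schema, records_per_day, days):
    """Return 'batch' for requests estimated above 5 GB, otherwise 'timeseries'."""
    size = estimate_bytes(schema, records_per_day, days)
    return "batch" if size > BATCH_THRESHOLD_BYTES else "timeseries"
```

For instance, 20 days of ES mbp-1 at 5M records per day comes to about 15 GB, so `suggest_method("mbp-1", 5_000_000, 20)` returns `"batch"`.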
451
skills/databento/references/symbology.md
Normal file
@@ -0,0 +1,451 @@
|
||||
# Databento Symbology Reference
|
||||
|
||||
Comprehensive guide to Databento's symbology system including continuous contracts, symbol types, and resolution strategies.
|
||||
|
||||
## Symbol Types (stypes)
|
||||
|
||||
Databento supports multiple symbology naming conventions. Use `mcp__databento__symbology_resolve` to convert between types.
|
||||
|
||||
### raw_symbol
|
||||
Native exchange symbols as provided by the venue.
|
||||
|
||||
**Examples:**
|
||||
- `ESH5` - ES March 2025 contract
|
||||
- `NQM5` - NQ June 2025 contract
|
||||
- `AAPL` - Apple Inc. stock
|
||||
- `SPY` - SPDR S&P 500 ETF
|
||||
|
||||
**When to use:**
|
||||
- Working with specific contract months
|
||||
- Exact symbol from exchange documentation
|
||||
- Historical analysis of specific expirations
|
||||
|
||||
**Limitations:**
|
||||
- Requires knowing exact contract codes
|
||||
- Different venues use different conventions
|
||||
- Doesn't handle roll automatically
|
||||
|
||||
### instrument_id
|
||||
Databento's internal numeric identifier for each instrument.
|
||||
|
||||
**Examples:**
|
||||
- `123456789` - Unique ID for ESH5
|
||||
- `987654321` - Unique ID for NQM5
|
||||
|
||||
**When to use:**
|
||||
- After symbol resolution
|
||||
- Internally within Databento system
|
||||
- When guaranteed uniqueness is required
|
||||
|
||||
**Benefits:**
|
||||
- Globally unique across all venues
|
||||
- Never changes for a given instrument
|
||||
- Most efficient for API requests
|
||||
|
||||
**Limitations:**
|
||||
- Not human-readable
|
||||
- Requires resolution step to obtain
|
||||
|
||||
### continuous
|
||||
Continuous contract notation with automatic rolling for futures.
|
||||
|
||||
**Format:** `{ROOT}.{STRATEGY}.{OFFSET}`
|
||||
|
||||
**Examples:**
|
||||
- `ES.c.0` - ES front month, calendar roll
|
||||
- `NQ.n.0` - NQ front month, open interest roll
|
||||
- `ES.v.1` - ES second month, volume roll
|
||||
- `GC.c.0` - Gold front month, calendar roll
|
||||
|
||||
**When to use:**
|
||||
- Backtesting across multiple expirations
|
||||
- Avoiding roll gaps in analysis
|
||||
- Long-term continuous price series
|
||||
|
||||
**Benefits:**
|
||||
- Automatic roll handling
|
||||
- Consistent symbology across time
|
||||
- Ideal for backtesting
|
||||
|
||||
### parent
|
||||
Parent contract symbols for options or complex instruments.
|
||||
|
||||
**Examples:**
|
||||
- `ES.FUT` - Parent for all ES futures contracts
|
||||
- `NQ.FUT` - Parent for all NQ futures contracts
|
||||
|
||||
**When to use:**
|
||||
- Options underlying symbols
|
||||
- Querying all contracts in a family
|
||||
- Getting contract family metadata
|
||||
|
||||
## Continuous Contract Deep Dive
|
||||
|
||||
Continuous contracts let a single symbol track a futures series across expirations. They handle contract rolls automatically, using one of three roll strategies.
|
||||
|
||||
### Roll Strategies
|
||||
|
||||
#### Calendar Roll (.c.X)
|
||||
Rolls on fixed calendar dates regardless of market activity.
|
||||
|
||||
**Notation:** `ES.c.0`, `NQ.c.1`
|
||||
|
||||
**Roll Timing:**
|
||||
- ES: Rolls 8 days before contract expiration
|
||||
- NQ: Rolls 8 days before contract expiration
|
||||
|
||||
**When to use:**
|
||||
- Standard backtesting
|
||||
- Most predictable roll schedule
|
||||
- When roll timing is less critical
|
||||
|
||||
**Pros:**
|
||||
- Predictable roll dates
|
||||
- Consistent across instruments
|
||||
- Simple to understand
|
||||
|
||||
**Cons:**
|
||||
- May roll during low liquidity
|
||||
- Doesn't consider market dynamics
|
||||
|
||||
#### Open Interest Roll (.n.X)
|
||||
Rolls when open interest moves to the next contract.
|
||||
|
||||
**Notation:** `ES.n.0`, `NQ.n.1`
|
||||
|
||||
**Roll Timing:**
|
||||
- Switches when next contract's OI > current contract's OI
|
||||
|
||||
**When to use:**
|
||||
- Avoiding early rolls
|
||||
- Following market participants
|
||||
- When market dynamics matter
|
||||
|
||||
**Pros:**
|
||||
- Follows market behavior
|
||||
- Natural transition point
|
||||
- Avoids artificial timing
|
||||
|
||||
**Cons:**
|
||||
- Less predictable timing
|
||||
- Can be delayed during low volume
|
||||
- Different instruments roll at different times
|
||||
|
||||
#### Volume Roll (.v.X)
|
||||
Rolls when trading volume moves to the next contract.
|
||||
|
||||
**Notation:** `ES.v.0`, `NQ.v.1`
|
||||
|
||||
**Roll Timing:**
|
||||
- Switches when next contract's volume > current contract's volume
|
||||
|
||||
**When to use:**
|
||||
- Following most liquid contract
|
||||
- High-frequency analysis
|
||||
- When execution quality matters
|
||||
|
||||
**Pros:**
|
||||
- Always in most liquid contract
|
||||
- Best for execution
|
||||
- Real-time liquidity tracking
|
||||
|
||||
**Cons:**
|
||||
- Most variable timing
|
||||
- Can switch back and forth
|
||||
- Requires careful validation
|
||||
|
||||
### Offset Parameter (.X)
|
||||
|
||||
The offset determines which contract month in the series.
|
||||
|
||||
| Offset | Description | Example Usage |
|
||||
|--------|-------------|---------------|
|
||||
| `.0` | Front month | Primary trading contract |
|
||||
| `.1` | Second month | Spread analysis vs front |
|
||||
| `.2` | Third month | Deferred spread analysis |
|
||||
| `.3+` | Further months | Calendar spread strategies |
|
||||
|
||||
**Common Patterns:**
|
||||
- `ES.c.0` - Standard ES continuous (front month)
|
||||
- `ES.c.0,ES.c.1` - ES calendar spread (front vs back)
|
||||
- `ES.c.0,NQ.c.0` - ES/NQ pair analysis
|
||||
|
||||
## ES/NQ Specific Symbology
|
||||
|
||||
### ES (E-mini S&P 500)
|
||||
|
||||
**Contract Months:** H (Mar), M (Jun), U (Sep), Z (Dec)
|
||||
|
||||
**Raw Symbol Format:** `ES{MONTH}{YEAR}`
|
||||
- `ESH5` = March 2025
|
||||
- `ESM5` = June 2025
|
||||
- `ESU5` = September 2025
|
||||
- `ESZ5` = December 2025
|
||||
|
||||
**Continuous Contracts:**
|
||||
- `ES.c.0` - Front month (most common)
|
||||
- `ES.n.0` - OI-based front month
|
||||
- `ES.v.0` - Volume-based front month
|
||||
|
||||
**Tick Size:** 0.25 points ($12.50 per tick)
|
||||
**Contract Multiplier:** $50 per point
|
||||
**Trading Hours:** Nearly 24 hours (Sunday 6pm - Friday 5pm ET)
|
||||
|
||||
### NQ (E-mini Nasdaq-100)
|
||||
|
||||
**Contract Months:** H (Mar), M (Jun), U (Sep), Z (Dec)
|
||||
|
||||
**Raw Symbol Format:** `NQ{MONTH}{YEAR}`
|
||||
- `NQH5` = March 2025
|
||||
- `NQM5` = June 2025
|
||||
- `NQU5` = September 2025
|
||||
- `NQZ5` = December 2025
|
||||
|
||||
**Continuous Contracts:**
|
||||
- `NQ.c.0` - Front month (most common)
|
||||
- `NQ.n.0` - OI-based front month
|
||||
- `NQ.v.0` - Volume-based front month
|
||||
|
||||
**Tick Size:** 0.25 points ($5.00 per tick)
|
||||
**Contract Multiplier:** $20 per point
|
||||
**Trading Hours:** Nearly 24 hours (Sunday 6pm - Friday 5pm ET)
|
||||
|
||||
### Month Codes Reference
|
||||
|
||||
| Code | Month | Typical Expiration |
|
||||
|------|-------|-------------------|
|
||||
| F | January | 3rd Friday |
|
||||
| G | February | 3rd Friday |
|
||||
| H | March | 3rd Friday |
|
||||
| J | April | 3rd Friday |
|
||||
| K | May | 3rd Friday |
|
||||
| M | June | 3rd Friday |
|
||||
| N | July | 3rd Friday |
|
||||
| Q | August | 3rd Friday |
|
||||
| U | September | 3rd Friday |
|
||||
| V | October | 3rd Friday |
|
||||
| X | November | 3rd Friday |
|
||||
| Z | December | 3rd Friday |
|
||||
|
||||
**Note:** ES/NQ only trade quarterly contracts (H, M, U, Z).
|
||||
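The month codes combine with the raw symbol format to decode a contract. The sketch below is illustrative: `parse_raw_future` is a hypothetical helper, and because the single year digit repeats every decade, the caller must supply the decade (an assumption of this sketch, defaulting to the 2020s).

```python
import re

MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}

def parse_raw_future(symbol, decade_start=2020):
    """Split a symbol like 'ESH5' into (root, month, year).

    The single year digit is ambiguous across decades, so the caller supplies
    the first year of the intended decade.
    """
    match = re.fullmatch(r"([A-Z0-9]+?)([FGHJKMNQUVXZ])(\d)", symbol)
    if match is None:
        raise ValueError(f"not a futures raw symbol: {symbol!r}")
    root, code, digit = match.groups()
    return root, MONTH_CODES[code], decade_start + int(digit)
```

With the default decade, `parse_raw_future("ESH5")` yields `("ES", 3, 2025)`; an equity ticker such as `AAPL` raises `ValueError`.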
|
||||
## Symbol Resolution
|
||||
|
||||
Use `mcp__databento__symbology_resolve` to convert between symbol types.
|
||||
|
||||
### Common Resolution Patterns
|
||||
|
||||
**Continuous to Instrument ID:**
|
||||
```
|
||||
Input: ES.c.0
|
||||
stype_in: continuous
|
||||
stype_out: instrument_id
|
||||
Result: Maps to current front month's instrument_id
|
||||
```
|
||||
|
||||
**Raw Symbol to Instrument ID:**
|
||||
```
|
||||
Input: ESH5
|
||||
stype_in: raw_symbol
|
||||
stype_out: instrument_id
|
||||
Result: Specific instrument_id for ESH5
|
||||
```
|
||||
|
||||
**Continuous to Raw Symbol:**
|
||||
```
|
||||
Input: ES.c.0
|
||||
stype_in: continuous
|
||||
stype_out: raw_symbol
|
||||
Result: Current front month symbol (e.g., ESH5)
|
||||
```
|
||||
|
||||
### Time-Based Resolution
|
||||
|
||||
Symbol resolution is **date-dependent**. The same continuous contract resolves to different instruments across time.
|
||||
|
||||
**Example:**
|
||||
- `ES.c.0` on 2024-01-15 → ESH4 (March 2024)
|
||||
- `ES.c.0` on 2024-04-15 → ESM4 (June 2024)
|
||||
- `ES.c.0` on 2024-07-15 → ESU4 (September 2024)
|
||||
|
||||
**Important:** Always specify `start_date` and `end_date` when resolving symbols for historical analysis.
|
||||
|
||||
### Resolution Parameters
|
||||
|
||||
```
|
||||
mcp__databento__symbology_resolve
|
||||
- dataset: "GLBX.MDP3"
|
||||
- symbols: ["ES.c.0", "NQ.c.0"]
|
||||
- stype_in: "continuous"
|
||||
- stype_out: "instrument_id"
|
||||
- start_date: "2024-01-01"
|
||||
- end_date: "2024-12-31"
|
||||
```
|
||||
|
||||
Returns mapping of continuous symbols to instrument IDs for each day in the range.
|
||||
|
||||
## Expiration Handling
|
||||
|
||||
### Roll Dates
|
||||
|
||||
ES/NQ contracts expire on the **3rd Friday of the contract month** at 9:30 AM ET.
|
||||
|
||||
**Calendar Roll (.c.0) Schedule:**
|
||||
- Rolls **8 days before expiration**
|
||||
- Always rolls on the same relative day
|
||||
- Predictable for backtesting
|
||||
|
||||
**Example for ESH5 (March 2025):**
|
||||
- Expiration: Friday, March 21, 2025
|
||||
- Calendar roll: March 13, 2025 (8 days before)
|
||||
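The dates in this example can be reproduced with the standard library. This is a sketch of the rule as stated in this section (third Friday, roll 8 days earlier); it ignores exchange holidays and any venue-specific exceptions, and the function names are hypothetical.

```python
import datetime

def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday is 0, so Friday is 4.
    days_to_friday = (4 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_friday + 14)

def calendar_roll_date(year, month, days_before=8):
    """Roll date under the 8-days-before-expiration rule described above."""
    return third_friday(year, month) - datetime.timedelta(days=days_before)
```

For ESH5, `third_friday(2025, 3)` gives 2025-03-21 and `calendar_roll_date(2025, 3)` gives 2025-03-13, matching the example above.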
|
||||
### Roll Detection
|
||||
|
||||
To detect when a continuous contract rolled, compare instrument_id or raw_symbol across consecutive timestamps.
|
||||
|
||||
**Example:**
|
||||
```
|
||||
2025-03-12: ES.c.0 → ESH5
|
||||
2025-03-13: ES.c.0 → ESM5 (rolled!)
|
||||
```
|
||||
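The comparison can be sketched over a date-ordered list of resolved symbols. The input shape (a list of `(date, raw_symbol)` pairs) is an assumption for illustration, not the actual return format of the resolution tool, and `detect_rolls` is a hypothetical name.

```python
def detect_rolls(daily_mapping):
    """Return (date, old_symbol, new_symbol) for each change in the resolved symbol.

    `daily_mapping` is a list of (date_string, raw_symbol) pairs sorted by date.
    The reported date is the first observation that shows the new symbol.
    """
    rolls = []
    for (_, prev_symbol), (date, symbol) in zip(daily_mapping, daily_mapping[1:]):
        if symbol != prev_symbol:
            rolls.append((date, prev_symbol, symbol))
    return rolls

# Illustrative mapping built from the time-based resolution example earlier in this guide.
mapping = [("2024-01-15", "ESH4"), ("2024-04-15", "ESM4"), ("2024-07-15", "ESU4")]
rolls = detect_rolls(mapping)
```

With daily observations the reported date is the roll date itself; with sparse observations, as in this sample, it is only an upper bound.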
|
||||
### Handling Roll Gaps
|
||||
|
||||
Price discontinuities often occur at roll:
|
||||
|
||||
**Gap Detection:**
|
||||
```
|
||||
if abs(close_before_roll - open_after_roll) > threshold:
|
||||
# Roll gap detected
|
||||
```
|
||||
|
||||
**Adjustment Strategies:**
|
||||
1. **Ratio Adjustment:** Multiply historical prices by ratio
|
||||
2. **Difference Adjustment:** Add/subtract difference
|
||||
3. **No Adjustment:** Keep raw prices (most common for futures)
|
||||
|
||||
For ES/NQ futures, **no adjustment** is standard, since consecutive contracts track the same index and the gap at roll is typically small relative to price.
|
||||
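For cases where an adjusted series is wanted, the first two strategies can be sketched as follows. This is a simplified illustration: it measures the gap between the last price before the roll and the first price after it, as in the gap check above, whereas a careful implementation would compare both contracts at the same timestamp. `back_adjust` is a hypothetical helper.

```python
def back_adjust(prices, roll_index, method="difference"):
    """Adjust prices before `roll_index` so the series is continuous at the roll.

    `prices[roll_index - 1]` is the last close of the old contract and
    `prices[roll_index]` is the first price of the new contract.
    """
    old_close, new_open = prices[roll_index - 1], prices[roll_index]
    if method == "difference":
        gap = new_open - old_close
        adjusted = [p + gap for p in prices[:roll_index]]
    elif method == "ratio":
        ratio = new_open / old_close
        adjusted = [p * ratio for p in prices[:roll_index]]
    else:
        raise ValueError(f"unknown method: {method}")
    return adjusted + prices[roll_index:]

series = [4480.0, 4490.0, 4540.0, 4550.0]  # illustrative prices, roll at index 2
diff_adjusted = back_adjust(series, 2)
ratio_adjusted = back_adjust(series, 2, method="ratio")
```

Difference adjustment preserves point moves, while ratio adjustment preserves percentage returns; which one fits depends on how the strategy measures performance.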
|
||||
## Symbol Validation
|
||||
|
||||
### Valid Symbol Patterns
|
||||
|
||||
**Continuous:**
|
||||
- Must match: `{ROOT}.{c|n|v}.{OFFSET}`, where `OFFSET` is a non-negative integer (`0`, `1`, `2`, ...)
|
||||
- Examples: `ES.c.0`, `NQ.n.1`, `GC.v.0`
|
||||
|
||||
**Raw Symbols (Futures):**
|
||||
- Must match: `{ROOT}{MONTH_CODE}{YEAR}`, where `YEAR` is the last digit of the contract year
|
||||
- Examples: `ESH5`, `NQZ4`, `GCM6`
|
||||
|
||||
**Equity Symbols:**
|
||||
- 1-5 uppercase letters
|
||||
- Examples: `AAPL`, `MSFT`, `SPY`, `GOOGL`
|
||||
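The three patterns above can be checked with regular expressions before any API call is made. The sketch transcribes the rules as written in this section; real venues may allow other forms, and a syntactic match does not prove the symbol exists in a dataset. `SYMBOL_PATTERNS` and `classify_symbol` are hypothetical names.

```python
import re

# Patterns transcribed from the rules above; real venues may allow other forms.
SYMBOL_PATTERNS = {
    "continuous": re.compile(r"[A-Z0-9]+\.[cnv]\.\d+"),
    "raw_symbol_future": re.compile(r"[A-Z0-9]+[FGHJKMNQUVXZ]\d"),
    "equity": re.compile(r"[A-Z]{1,5}"),
}

def classify_symbol(symbol):
    """Return the name of the first pattern the symbol fully matches, or None."""
    for name, pattern in SYMBOL_PATTERNS.items():
        if pattern.fullmatch(symbol):
            return name
    return None
```

A result of `None` (for example for `ES.x.0` or a lowercase `es.c.0`) flags input that is worth rejecting before resolution is attempted.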
|
||||
### Symbol Existence Validation
|
||||
|
||||
Before using a symbol, validate it exists in the dataset:
|
||||
|
||||
1. Use `mcp__databento__symbology_resolve` to resolve
|
||||
2. Use `mcp__databento__reference_search_securities` for metadata
|
||||
3. Check definition schema for instrument details
|
||||
|
||||
## Common Symbol Pitfalls
|
||||
|
||||
### 1. Wrong stype_in for Continuous Contracts
|
||||
**Wrong:**
|
||||
```
|
||||
symbols: "ES.c.0"
|
||||
stype_in: "raw_symbol" # WRONG!
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```
|
||||
symbols: "ES.c.0"
|
||||
stype_in: "continuous" # CORRECT
|
||||
```
|
||||
|
||||
### 2. Forgetting Date Range for Resolution
|
||||
**Wrong:**
|
||||
```
|
||||
symbology_resolve(symbols=["ES.c.0"], start_date="2024-01-01")
|
||||
# Missing end_date - only resolves for one day
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```
|
||||
symbology_resolve(symbols=["ES.c.0"], start_date="2024-01-01", end_date="2024-12-31")
|
||||
# Resolves for entire year
|
||||
```
|
||||
|
||||
### 3. Using Expired Contracts
|
||||
**Wrong:**
|
||||
```
|
||||
# ESH4 expired in March 2024
|
||||
symbols: "ESH4"
|
||||
start_date: "2024-06-01" # After expiration!
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```
|
||||
# Use continuous contract
|
||||
symbols: "ES.c.0"
|
||||
start_date: "2024-06-01" # Automatically maps to ESM4
|
||||
```
|
||||
|
||||
### 4. Mixing Symbol Types
|
||||
**Wrong:**
|
||||
```
|
||||
symbols: "ES.c.0,ESH5,123456" # Mixed types!
|
||||
```
|
||||
|
||||
**Correct:**
|
||||
```
|
||||
# Resolve separately or use same type
|
||||
symbols: "ES.c.0,NQ.c.0" # All continuous
|
||||
```
|
||||
|
||||
## Symbol Best Practices
|
||||
|
||||
1. **Use continuous contracts for backtesting** - Avoids manual roll management
|
||||
2. **Prefer calendar rolls (.c.X) unless specific reason** - Most predictable
|
||||
3. **Always validate symbols exist** - Use symbology_resolve before fetching data
|
||||
4. **Specify date ranges for resolution** - Symbol meanings change over time
|
||||
5. **Use instrument_id after resolution** - Most efficient for API calls
|
||||
6. **Document roll strategy** - Know which roll type (.c/.n/.v) you're using
|
||||
7. **Test around roll dates** - Verify behavior during contract transitions
|
||||
8. **Cache symbol mappings** - Don't re-resolve repeatedly
|
||||
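Practices 4 and 8 fit together: a cache of symbol mappings should include the date range in its key, because the same continuous symbol resolves differently over time. A minimal sketch, in which the resolver is passed in as a callable; `make_cached_resolver` and `fake_resolve` are hypothetical, and the stand-in resolver does not call any real tool.

```python
def make_cached_resolver(resolve):
    """Wrap `resolve` so identical requests are answered from an in-memory cache."""
    cache = {}

    def cached(dataset, symbol, stype_in, stype_out, start_date, end_date):
        key = (dataset, symbol, stype_in, stype_out, start_date, end_date)
        if key not in cache:
            cache[key] = resolve(*key)
        return cache[key]

    return cached

# Stand-in resolver for demonstration; a real one would call the symbology tool.
calls = []
def fake_resolve(dataset, symbol, stype_in, stype_out, start_date, end_date):
    calls.append(symbol)
    return f"{symbol}@{start_date}..{end_date}"

resolver = make_cached_resolver(fake_resolve)
first = resolver("GLBX.MDP3", "ES.c.0", "continuous", "instrument_id", "2024-01-01", "2024-12-31")
second = resolver("GLBX.MDP3", "ES.c.0", "continuous", "instrument_id", "2024-01-01", "2024-12-31")
```

The second call returns the cached value without invoking the resolver again; a request with a different date range misses the cache and resolves afresh.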
|
||||
## Quick Reference: Common Symbols
|
||||
|
||||
### ES/NQ Continuous (Most Common)
|
||||
```
|
||||
ES.c.0 # ES front month, calendar roll
|
||||
NQ.c.0 # NQ front month, calendar roll
|
||||
ES.c.1 # ES second month
|
||||
NQ.c.1 # NQ second month
|
||||
```
|
||||
|
||||
### ES/NQ Specific Contracts (2025)
|
||||
```
|
||||
ESH5 # ES March 2025
|
||||
ESM5 # ES June 2025
|
||||
ESU5 # ES September 2025
|
||||
ESZ5 # ES December 2025
|
||||
|
||||
NQH5 # NQ March 2025
|
||||
NQM5 # NQ June 2025
|
||||
NQU5 # NQ September 2025
|
||||
NQZ5 # NQ December 2025
|
||||
```
|
||||
|
||||
### Equity Market Breadth (Supporting ES/NQ Analysis)
|
||||
```
|
||||
SPY # SPDR S&P 500 ETF
|
||||
QQQ # Invesco QQQ (Nasdaq-100 ETF)
|
||||
VIX # CBOE Volatility Index
|
||||
TICK # NYSE TICK
|
||||
VOLD # NYSE Volume Delta
|
||||
```
|
||||
|
||||
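For the ETF symbols (`SPY`, `QQQ`), use dataset `XNAS.ITCH` (Nasdaq) or another appropriate equity dataset. `VIX`, `TICK`, and `VOLD` are an index and market internals rather than listed equities, so confirm which dataset, if any, carries them before requesting data.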
|
||||