Databento Cost Optimization Guide
Strategies and best practices for minimizing costs when working with Databento market data.
Databento Pricing Model
Cost Components
- Databento Usage Fees - Pay-per-use or subscription
- Exchange License Fees - Venue-dependent (varies by exchange)
- Data Volume - Amount of data retrieved
Pricing Tiers
Free Credits:
- $125 free credits for new users
- Good for initial development and testing
Usage-Based:
- Pay only for data you use
- Varies by venue and data type
- No minimum commitment
Subscriptions:
- Basic Plan: $199/month
- Corporate Actions/Security Master: $299/month
- Flat-rate access to specific datasets
Cost Estimation (ALWAYS Do This First)
Use metadata_get_cost Before Every Request
Always estimate cost before fetching data:
mcp__databento__metadata_get_cost(
    dataset="GLBX.MDP3",
    start="2024-01-01",
    end="2024-01-31",
    symbols="ES.c.0",
    schema="ohlcv-1h"
)
Returns:
- Estimated cost in USD
- Data size estimate
- Helps decide if request is reasonable
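A simple way to make this habit mechanical is a budget gate that refuses to fetch when the estimate exceeds a per-request limit. The sketch below is a hypothetical wrapper: `get_cost` and `fetch` stand in for whatever client calls you actually use (for example, the MCP tools shown above), and the budget threshold is an assumption you should tune.

```python
# Hypothetical budget gate: estimate cost first, fetch only if it fits.
# `get_cost` and `fetch` are injected placeholders for your real client calls.

def fetch_if_affordable(get_cost, fetch, budget_usd, **request):
    """Estimate the request's cost; fetch only if it is within budget."""
    estimated = get_cost(**request)
    if estimated > budget_usd:
        raise RuntimeError(
            f"Estimated ${estimated:.2f} exceeds budget ${budget_usd:.2f}"
        )
    return fetch(**request)
```

In practice you would pass your cost-estimation and data-fetch callables once and reuse the gate for every request, so no query can slip through without an estimate.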
When Cost Checks Matter Most
- Multi-day tick data - Can be expensive
- Multiple symbols - Costs multiply
- High-granularity schemas - trades, mbp-1, mbo
- Long date ranges - Weeks or months of data
Example Cost Check:
# Cheap: 1 month of daily bars
cost_check(schema="ohlcv-1d", start="2024-01-01", end="2024-01-31")
# Estimated: $0.10
# Expensive: 1 month of tick trades
cost_check(schema="trades", start="2024-01-01", end="2024-01-31")
# Estimated: $50-$200 (depends on volume)
Historical Data (T+1) - No Licensing Required
Key Insight: Historical data that is 24+ hours old (T+1) does not require exchange licensing fees.
Cost Breakdown
Live/Recent Data (< 24 hours):
- Databento fees + Exchange licensing fees
Historical Data (24+ hours old):
- Databento fees only (no exchange licensing)
- Significantly cheaper
Optimization Strategy
For Development:
- Use T+1 data for strategy development
- Switch to live data only for production
For Backtesting:
- Always use historical (T+1) data
- Much more cost-effective
- Same data quality
Example:
# Expensive: Yesterday's data (< 24 hours)
start="2024-11-05" # Requires licensing
# Cheap: 3 days ago (> 24 hours)
start="2024-11-03" # No licensing required
Schema Selection for Cost
Different schemas have vastly different costs due to data volume.
Schema Cost Hierarchy (Cheapest to Most Expensive)
1. ohlcv-1d (Cheapest)
- ~100 bytes per record
- ~250 records per symbol per year
- Best for: Long-term backtesting
2. ohlcv-1h
- ~100 bytes per record
- ~6,000 records per symbol per year
- Best for: Multi-day backtesting
3. ohlcv-1m
- ~100 bytes per record
- ~360,000 records per symbol per year
- Best for: Intraday strategies
4. trades
- ~50 bytes per record
- ~100K-500K records per symbol per day (ES/NQ)
- Best for: Tick analysis (use selectively)
5. mbp-1
- ~150 bytes per record
- ~1M-5M records per symbol per day
- Best for: Order flow analysis (use selectively)
6. mbp-10
- ~500 bytes per record
- ~1M-5M records per symbol per day
- Best for: Deep order book analysis (expensive!)
7. mbo (Most Expensive)
- ~80 bytes per record
- ~5M-20M records per symbol per day
- Best for: Order-level research (very expensive!)
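The per-record sizes and record counts above can be turned into a rough size estimator. The figures baked into this sketch are the same ballpark numbers from the hierarchy (mid-range where a range was given), not quoted prices or guarantees; use metadata_get_cost for the real answer.

```python
# Rough data-volume estimator using the approximate per-record sizes
# and record counts from the hierarchy above (ballpark assumptions).

SCHEMA_PROFILE = {
    # schema: (bytes_per_record, approx_records_per_symbol_per_day)
    "ohlcv-1d": (100, 1),
    "ohlcv-1h": (100, 24),
    "ohlcv-1m": (100, 1_440),
    "trades":   (50, 300_000),     # midpoint of 100K-500K (ES/NQ)
    "mbp-1":    (150, 3_000_000),  # midpoint of 1M-5M
    "mbp-10":   (500, 3_000_000),
    "mbo":      (80, 12_000_000),  # midpoint of 5M-20M
}

def estimate_gb(schema, symbols, days):
    """Ballpark uncompressed payload size in GB for a request."""
    bytes_per_record, records_per_day = SCHEMA_PROFILE[schema]
    return bytes_per_record * records_per_day * symbols * days / 1e9
```

Even a crude estimate like this makes the hierarchy concrete: a year of daily bars is kilobytes, while a month of mbo across two symbols is tens of gigabytes.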
Cost Optimization Strategy
Start with lower granularity:
- Develop strategy with ohlcv-1h or ohlcv-1d
- Validate with ohlcv-1m if needed
- Only use trades/mbp-1 if absolutely necessary
- Avoid mbp-10/mbo unless essential
Example:
# Cheap: Daily bars for 1 year
schema="ohlcv-1d"
start="2023-01-01"
end="2023-12-31"
# Cost: < $1
# Expensive: Trades for 1 year
schema="trades"
start="2023-01-01"
end="2023-12-31"
# Cost: $500-$2000 (depending on venue)
Symbol Selection
Fewer symbols = lower cost. Be selective.
Strategies
1. Start with Single Symbol
# Development
symbols="ES.c.0" # Just ES
# After validation, expand
symbols="ES.c.0,NQ.c.0" # Add NQ
2. Use Continuous Contracts
# Good: Single continuous contract
symbols="ES.c.0" # Covers all front months
# Wasteful: Multiple specific contracts
symbols="ESH5,ESM5,ESU5,ESZ5" # Largely overlapping data, ~4x the cost
3. Avoid Symbol Wildcards
# Expensive: All instruments
symbols="*" # Don't do this!
# Targeted: Just what you need
symbols="ES.c.0,NQ.c.0" # Explicit
Date Range Optimization
Request only the data you need.
Strategies
1. Iterative Refinement
# First: Test with small range
start="2024-01-01"
end="2024-01-07" # Just 1 week
# Then: Expand after validation
start="2024-01-01"
end="2024-12-31" # Full year
2. Segment Long Ranges
# Instead of: 5 years at once
start="2019-01-01"
end="2024-12-31"
# Do: Segment by year
start="2024-01-01"
end="2024-12-31"
# Process, then request next year if needed
3. Use Limit for Testing
# Test with small limit first
limit=100 # Just 100 records
# After validation, increase or remove
limit=10000 # Larger sample
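Segmenting a long range (strategy 2 above) is easy to get wrong at year boundaries, so a small helper is worth having. This is a generic sketch in plain Python; the date-pair output format is an assumption chosen to match the start/end strings used throughout this guide.

```python
# Sketch: split a multi-year request into calendar-year segments so
# each chunk can be fetched, validated, and cached before the next.
from datetime import date

def year_segments(start, end):
    """Yield (start, end) ISO-date pairs, one per calendar year."""
    s, e = date.fromisoformat(start), date.fromisoformat(end)
    for year in range(s.year, e.year + 1):
        seg_start = max(s, date(year, 1, 1))
        seg_end = min(e, date(year, 12, 31))
        yield seg_start.isoformat(), seg_end.isoformat()
```

Iterate the segments, fetch one, inspect the results, and only then request the next year.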
Batch vs Timeseries Selection
Choose the right tool for the job.
Timeseries (< 5GB)
When to use:
- Small to medium datasets
- Quick exploration
- <= 1 day of tick data
- Any OHLCV data
Benefits:
- Immediate results
- No job management
- Direct response
Costs:
- Same per-record cost as batch
Batch Downloads (> 5GB)
When to use:
- Large datasets (> 5GB)
- Multi-day tick data
- Multiple symbols over long periods
- Production data pipelines
Benefits:
- More efficient for large data
- Can split output files
- Asynchronous processing
Costs:
- Same per-record cost as timeseries
- No additional fees for batch processing
Decision Matrix
| Data Type | Date Range | Method |
|---|---|---|
| ohlcv-1h | 1 year | Timeseries |
| ohlcv-1d | Any | Timeseries |
| trades | 1 day | Timeseries |
| trades | 1 week+ | Batch |
| mbp-1 | 1 day | Batch (safer) |
| mbp-1 | 1 week+ | Batch |
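The matrix above boils down to a size-based rule of thumb, which can be sketched as a small chooser. The 5 GB threshold comes from this guide; the extra caution for book-depth schemas ("Batch (safer)") is encoded as a lower threshold, which is a heuristic assumption, not a Databento rule.

```python
# Sketch applying the 5 GB rule of thumb from the decision matrix.
# The lower threshold for depth schemas mirrors the "Batch (safer)"
# row and is a heuristic assumption.

BATCH_THRESHOLD_GB = 5.0

def choose_method(estimated_gb, schema=""):
    """Return 'batch' or 'timeseries' for an estimated payload size."""
    # Book-depth schemas lean toward batch even well under 5 GB.
    if schema in ("mbp-1", "mbp-10", "mbo") and estimated_gb > 1.0:
        return "batch"
    return "batch" if estimated_gb > BATCH_THRESHOLD_GB else "timeseries"
```

Pair it with a size estimate (however rough) and the batch-vs-timeseries decision stops being a judgment call on every request.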
DBEQ Bundle - Zero Exchange Fees
Databento offers a special bundle for US equities with $0 exchange fees.
DBEQ.BASIC Dataset
Coverage:
- US equity securities
- Zero licensing fees
- Databento usage fees only
When to use:
- Equity market breadth for ES/NQ analysis
- Testing equity strategies
- Learning market data APIs
Example:
# Regular equity dataset (has exchange fees)
dataset="XNAS.ITCH"
# Cost: Databento + Nasdaq fees
# DBEQ bundle (no exchange fees)
dataset="DBEQ.BASIC"
# Cost: Databento fees only
Caching and Reuse
Don't fetch the same data multiple times.
Strategies
1. Cache Locally
# First request: Fetch and save
data = fetch_data(...)
save_to_disk(data, "ES_2024_ohlcv1h.csv")
# Subsequent runs: Load from disk
data = load_from_disk("ES_2024_ohlcv1h.csv")
2. Incremental Updates
# Initial: Fetch full history
start="2023-01-01"
end="2024-01-01"
# Later: Fetch only new data
start="2024-01-01" # Resume from last fetch
end="2024-12-31"
3. Share Data Across Analyses
# Fetch once
historical_data = fetch_data(schema="ohlcv-1h", ...)
# Use multiple times
backtest_strategy_a(historical_data)
backtest_strategy_b(historical_data)
backtest_strategy_c(historical_data)
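Strategy 1 above can be folded into a single cache-through helper so a cache miss is the only path that spends money. This is a minimal sketch assuming JSON-serializable data; `fetch_fn` is a placeholder for whatever call actually retrieves the data, and the cache layout is an illustrative choice.

```python
# Minimal disk-cache sketch: fetch once, reuse on every later run.
# `fetch_fn` is a placeholder for your real data-fetch call; data is
# assumed JSON-serializable for simplicity.
import json
from pathlib import Path

def cached_fetch(key, fetch_fn, cache_dir="data_cache"):
    """Return cached data for `key`, calling `fetch_fn` only on a miss."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    data = fetch_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

For real market data you would likely swap JSON for a columnar format, but the shape of the pattern is the same: key the cache on dataset, symbols, schema, and date range.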
Session-Based Analysis
For ES/NQ, consider filtering by trading session to reduce data volume.
Sessions
- Asian Session: 6pm-2am ET
- London Session: 2am-8am ET
- New York Session: 8am-4pm ET
Cost Benefit
Full 24-hour data:
- Maximum data volume
- Higher cost
Session-filtered data:
- 1/3 to 1/2 the volume
- Lower cost
- May be sufficient for analysis
Example:
# Expensive: Full 24-hour data
# Process all records
# Cheaper: NY session only
# Filter records to 8am-4pm ET
# ~1/3 the data volume
Use scripts/session_filter.py to filter post-fetch, or request only specific hours.
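A post-fetch session filter for the New York session can be sketched in a few lines (this is a generic illustration, not the contents of scripts/session_filter.py). The 8am-4pm ET window follows the session definitions above; converting each timestamp into the America/New_York zone handles daylight-saving shifts.

```python
# Sketch of post-fetch session filtering: keep only records whose
# timestamp falls in the New York session (8am-4pm ET, per the
# session definitions above). DST is handled by the ET zone lookup.
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")

def in_ny_session(ts_utc):
    """True if a timezone-aware UTC timestamp falls in 08:00-16:00 ET."""
    local = ts_utc.astimezone(ET)
    return time(8, 0) <= local.time() < time(16, 0)
```

Applied as `[r for r in records if in_ny_session(r.timestamp)]` (field name hypothetical), this cuts the retained volume to roughly a third of a 24-hour day.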
Monitoring Usage
Track your usage to avoid surprises.
Check Dashboard
- Databento provides usage dashboard
- Monitor monthly spend
- Set alerts for limits
Set Monthly Limits
# In account settings
monthly_limit=$500
Review Costs Regularly
- Check cost estimates vs actual
- Identify expensive queries
- Adjust strategies
Cost Optimization Checklist
Before every data request:
- Estimate cost first - Use metadata_get_cost
- Use T+1 data - Avoid < 24 hour data unless necessary
- Choose lowest granularity schema - Start with ohlcv, not trades
- Minimize symbols - Only request what you need
- Limit date range - Test with small range first
- Use continuous contracts - Avoid requesting multiple months
- Cache locally - Don't re-fetch same data
- Consider DBEQ - Use zero-fee dataset when applicable
- Filter by session - Reduce volume if session-specific
- Use batch for large data - More efficient for > 5GB
Cost Examples
Cheap Requests (< $1)
# Daily bars for 1 year
dataset="GLBX.MDP3"
symbols="ES.c.0"
schema="ohlcv-1d"
start="2023-01-01"
end="2023-12-31"
# Estimated cost: $0.10
Moderate Requests ($1-$10)
# Hourly bars for 1 year
dataset="GLBX.MDP3"
symbols="ES.c.0,NQ.c.0"
schema="ohlcv-1h"
start="2023-01-01"
end="2023-12-31"
# Estimated cost: $2-5
Expensive Requests ($10-$100)
# Trades for 1 month
dataset="GLBX.MDP3"
symbols="ES.c.0"
schema="trades"
start="2024-01-01"
end="2024-01-31"
# Estimated cost: $20-50
Very Expensive Requests ($100+)
# MBP-10 for 1 month
dataset="GLBX.MDP3"
symbols="ES.c.0,NQ.c.0"
schema="mbp-10"
start="2024-01-01"
end="2024-01-31"
# Estimated cost: $200-500
Free Credit Strategy
Make the most of your $125 free credits:
1. Development Phase - Use free credits for:
- Testing API integration
- Small-scale strategy development
- Learning the platform
2. Prioritize T+1 Data - Stretch credits further:
- Avoid real-time data during development
- Use historical data (no licensing fees)
3. Start with OHLCV - Cheapest data:
- Develop strategy with daily/hourly bars
- Validate before moving to tick data
4. Cache Everything - Don't waste credits:
- Save all fetched data locally
- Reuse for multiple analyses
5. Monitor Remaining Balance:
- Check credit usage regularly
- Adjust requests to stay within budget
Summary
Most Important Cost-Saving Strategies:
- ✅ Always check cost first - Use metadata_get_cost
- ✅ Use T+1 data - 24+ hours old, no licensing fees
- ✅ Start with OHLCV schemas - Much cheaper than tick data
- ✅ Cache and reuse data - Don't fetch twice
- ✅ Be selective with symbols - Fewer symbols = lower cost
- ✅ Test with small ranges - Validate before large requests
- ✅ Use continuous contracts - One symbol instead of many
- ✅ Monitor usage - Track spending, set limits