
Databento Cost Optimization Guide

Strategies and best practices for minimizing costs when working with Databento market data.

Databento Pricing Model

Cost Components

  1. Databento Usage Fees - Pay-per-use or subscription
  2. Exchange License Fees - Vary by venue
  3. Data Volume - Amount of data retrieved

Pricing Tiers

Free Credits:

  • $125 free credits for new users
  • Good for initial development and testing

Usage-Based:

  • Pay only for data you use
  • Varies by venue and data type
  • No minimum commitment

Subscriptions:

  • Basic Plan: $199/month
  • Corporate Actions/Security Master: $299/month
  • Flat-rate access to specific datasets

Cost Estimation (ALWAYS Do This First)

Use metadata_get_cost Before Every Request

Always estimate cost before fetching data:

mcp__databento__metadata_get_cost(
    dataset="GLBX.MDP3",
    start="2024-01-01",
    end="2024-01-31",
    symbols="ES.c.0",
    schema="ohlcv-1h"
)

Returns:

  • Estimated cost in USD
  • Estimated data size

Use the estimate to decide whether the request is reasonable before fetching.
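
For example, a minimal pre-flight check might gate the fetch on the estimate (a sketch; the numeric "cost" field on the response is an assumption, so verify the actual shape in your client):

# Sketch: skip the fetch when the estimate exceeds a budget.
MAX_COST_USD = 5.00

estimate = mcp__databento__metadata_get_cost(
    dataset="GLBX.MDP3",
    start="2024-01-01",
    end="2024-01-31",
    symbols="ES.c.0",
    schema="ohlcv-1h",
)
if estimate["cost"] > MAX_COST_USD:  # "cost" field is an assumption
    raise RuntimeError(f"Estimated ${estimate['cost']:.2f} exceeds budget")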

When Cost Checks Matter Most

  1. Multi-day tick data - Can be expensive
  2. Multiple symbols - Costs multiply
  3. High-granularity schemas - trades, mbp-1, mbo
  4. Long date ranges - Weeks or months of data

Example Cost Check:

# Cheap: 1 month of daily bars
# (cost_check below is shorthand for the metadata_get_cost call above)
cost_check(schema="ohlcv-1d", start="2024-01-01", end="2024-01-31")
# Estimated: ~$0.10

# Expensive: 1 month of tick trades
cost_check(schema="trades", start="2024-01-01", end="2024-01-31")
# Estimated: $50-$200 (depends on volume)

Historical Data (T+1) - No Licensing Required

Key Insight: Historical data that is 24+ hours old (T+1) does not require exchange licensing fees.

Cost Breakdown

Live/Recent Data (< 24 hours):

  • Databento fees + Exchange licensing fees

Historical Data (24+ hours old):

  • Databento fees only (no exchange licensing)
  • Significantly cheaper

Optimization Strategy

For Development:

  • Use T+1 data for strategy development
  • Switch to live data only for production

For Backtesting:

  • Always use historical (T+1) data
  • Much more cost-effective
  • Same data quality

Example:

# Expensive: yesterday's data (< 24 hours old; assume today is 2024-11-06)
start="2024-11-05"  # Requires exchange licensing

# Cheap: 3 days ago (> 24 hours old)
start="2024-11-03"  # No licensing required

Schema Selection for Cost

Different schemas have vastly different costs due to data volume.

Schema Cost Hierarchy (Cheapest to Most Expensive)

  1. ohlcv-1d (Cheapest)

    • ~100 bytes per record
    • ~250 records per symbol per year
    • Best for: Long-term backtesting
  2. ohlcv-1h

    • ~100 bytes per record
    • ~6,000 records per symbol per year
    • Best for: Multi-day backtesting
  3. ohlcv-1m

    • ~100 bytes per record
    • ~360,000 records per symbol per year
    • Best for: Intraday strategies
  4. trades

    • ~50 bytes per record
    • ~100K-500K records per symbol per day (ES/NQ)
    • Best for: Tick analysis (use selectively)
  5. mbp-1

    • ~150 bytes per record
    • ~1M-5M records per symbol per day
    • Best for: Order flow analysis (use selectively)
  6. mbp-10

    • ~500 bytes per record
    • ~1M-5M records per symbol per day
    • Best for: Deep order book analysis (expensive!)
  7. mbo (Most Expensive)

    • ~80 bytes per record
    • ~5M-20M records per symbol per day
    • Best for: Order-level research (very expensive!)

Cost Optimization Strategy

Start with lower granularity:

  1. Develop strategy with ohlcv-1h or ohlcv-1d
  2. Validate with ohlcv-1m if needed
  3. Only use trades/mbp-1 if absolutely necessary
  4. Avoid mbp-10/mbo unless essential

Example:

# Cheap: Daily bars for 1 year
schema="ohlcv-1d"
start="2023-01-01"
end="2023-12-31"
# Cost: < $1

# Expensive: Trades for 1 year
schema="trades"
start="2023-01-01"
end="2023-12-31"
# Cost: $500-$2000 (depending on venue)
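
To pick the cheapest schema that still fits the analysis, compare estimates across candidates in one loop before fetching (a sketch reusing the metadata_get_cost call from earlier; the "cost" response field is an assumption):

# Compare estimated cost across candidate schemas.
for schema in ["ohlcv-1d", "ohlcv-1h", "ohlcv-1m", "trades"]:
    estimate = mcp__databento__metadata_get_cost(
        dataset="GLBX.MDP3",
        symbols="ES.c.0",
        schema=schema,
        start="2023-01-01",
        end="2023-12-31",
    )
    print(f"{schema}: ~${estimate['cost']:.2f}")  # "cost" field assumed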

Symbol Selection

Fewer symbols = lower cost. Be selective.

Strategies

1. Start with Single Symbol

# Development
symbols="ES.c.0"  # Just ES

# After validation, expand
symbols="ES.c.0,NQ.c.0"  # Add NQ

2. Use Continuous Contracts

# Good: Single continuous contract
symbols="ES.c.0"  # Covers all front months

# Wasteful: Multiple specific contracts
symbols="ESH5,ESM5,ESU5,ESZ5"  # Same data, 4x cost

3. Avoid Symbol Wildcards

# Expensive: All instruments
symbols="*"  # Don't do this!

# Targeted: Just what you need
symbols="ES.c.0,NQ.c.0"  # Explicit

Date Range Optimization

Request only the data you need.

Strategies

1. Iterative Refinement

# First: Test with small range
start="2024-01-01"
end="2024-01-07"  # Just 1 week

# Then: Expand after validation
start="2024-01-01"
end="2024-12-31"  # Full year

2. Segment Long Ranges (see the sketch after this list)

# Instead of: 5 years at once
start="2019-01-01"
end="2024-12-31"

# Do: Segment by year
start="2024-01-01"
end="2024-12-31"
# Process, then request next year if needed

3. Use Limit for Testing

# Test with small limit first
limit=100  # Just 100 records

# After validation, increase or remove
limit=10000  # Larger sample
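
The segment-by-year pattern from item 2 reduces to a simple loop (a sketch; fetch_data and process stand in for your actual request and analysis calls):

# Request one year at a time instead of 5 years at once.
for year in range(2019, 2025):
    data = fetch_data(
        dataset="GLBX.MDP3",
        symbols="ES.c.0",
        schema="ohlcv-1h",
        start=f"{year}-01-01",
        end=f"{year}-12-31",
    )
    process(data)  # validate results before requesting the next year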

Batch vs Timeseries Selection

Choose the right tool for the job.

Timeseries (< 5GB)

When to use:

  • Small to medium datasets
  • Quick exploration
  • <= 1 day of tick data
  • Any OHLCV data

Benefits:

  • Immediate results
  • No job management
  • Direct response

Costs:

  • Same per-record cost as batch

Batch Downloads (> 5GB)

When to use:

  • Large datasets (> 5GB)
  • Multi-day tick data
  • Multiple symbols over long periods
  • Production data pipelines

Benefits:

  • More efficient for large data
  • Can split output files
  • Asynchronous processing

Costs:

  • Same per-record cost as timeseries
  • No additional fees for batch processing

Decision Matrix

  Data Type   Date Range   Method
  ohlcv-1h    1 year       Timeseries
  ohlcv-1d    Any          Timeseries
  trades      1 day        Timeseries
  trades      1 week+      Batch
  mbp-1       1 day        Batch (safer)
  mbp-1       1 week+      Batch
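
The matrix reduces to a small routing rule (a sketch; the thresholds mirror the guidance above and are not a Databento API):

# Rough routing rule: tick schemas over multi-day ranges go to batch.
def choose_method(schema: str, days: int) -> str:
    tick_schemas = {"trades", "mbp-1", "mbp-10", "mbo"}
    if schema not in tick_schemas:
        return "timeseries"  # OHLCV stays small at any range
    if schema == "trades" and days <= 1:
        return "timeseries"  # a single day of trades is manageable
    return "batch"           # multi-day tick data

choose_method("ohlcv-1h", 365)  # "timeseries"
choose_method("trades", 7)      # "batch"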

DBEQ Bundle - Zero Exchange Fees

Databento offers a special bundle for US equities with $0 exchange fees.

DBEQ.BASIC Dataset

Coverage:

  • US equity securities
  • Zero licensing fees
  • Databento usage fees only

When to use:

  • Equity market breadth for ES/NQ analysis
  • Testing equity strategies
  • Learning market data APIs

Example:

# Regular equity dataset (has exchange fees)
dataset="XNAS.ITCH"
# Cost: Databento + Nasdaq fees

# DBEQ bundle (no exchange fees)
dataset="DBEQ.BASIC"
# Cost: Databento fees only

Caching and Reuse

Don't fetch the same data multiple times.

Strategies

1. Cache Locally

import os
import pandas as pd

# First run: fetch and cache; later runs: load from disk.
path = "ES_2024_ohlcv1h.csv"
if os.path.exists(path):
    data = pd.read_csv(path)
else:
    data = fetch_data(...)  # your Databento request (assumed to return a DataFrame)
    data.to_csv(path, index=False)

2. Incremental Updates

# Initial: Fetch full history
start="2023-01-01"
end="2024-01-01"

# Later: Fetch only new data
start="2024-01-01"  # Resume from last fetch
end="2024-12-31"

3. Share Data Across Analyses

# Fetch once
historical_data = fetch_data(schema="ohlcv-1h", ...)

# Use multiple times
backtest_strategy_a(historical_data)
backtest_strategy_b(historical_data)
backtest_strategy_c(historical_data)

Session-Based Analysis

For ES/NQ, consider filtering by trading session to reduce data volume.

Sessions

  • Asian Session: 6pm-2am ET
  • London Session: 2am-8am ET
  • New York Session: 8am-4pm ET

Cost Benefit

Full 24-hour data:

  • Maximum data volume
  • Higher cost

Session-filtered data:

  • 1/3 to 1/2 the volume
  • Lower cost
  • May be sufficient for analysis

Example:

# Expensive: Full 24-hour data
# Process all records

# Cheaper: NY session only
# Filter records to 8am-4pm ET
# ~1/3 the data volume

Use scripts/session_filter.py to filter post-fetch, or request only specific hours.
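
The core of a post-fetch session filter is a single timezone-aware mask (a minimal pandas sketch, independent of scripts/session_filter.py; assumes the fetched records are already in a DataFrame df with a UTC ts_event column):

import pandas as pd

# df: DataFrame of fetched records (assumed).
# Keep only New York session records (8am-4pm ET).
ts = pd.to_datetime(df["ts_event"], utc=True)
et_hour = ts.dt.tz_convert("America/New_York").dt.hour
ny_session = df[(et_hour >= 8) & (et_hour < 16)]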

Monitoring Usage

Track your usage to avoid surprises.

Check Dashboard

  • Databento provides usage dashboard
  • Monitor monthly spend
  • Set alerts for limits

Set Monthly Limits

Set a monthly spending limit (for example, $500) in your Databento account settings so usage fees are capped automatically.

Review Costs Regularly

  • Check cost estimates vs actual
  • Identify expensive queries
  • Adjust strategies

Cost Optimization Checklist

Before every data request:

  • Estimate cost first - Use metadata_get_cost
  • Use T+1 data - Avoid < 24 hour data unless necessary
  • Choose lowest granularity schema - Start with ohlcv, not trades
  • Minimize symbols - Only request what you need
  • Limit date range - Test with small range first
  • Use continuous contracts - Avoid requesting multiple months
  • Cache locally - Don't re-fetch same data
  • Consider DBEQ - Use zero-fee dataset when applicable
  • Filter by session - Reduce volume if session-specific
  • Use batch for large data - More efficient for > 5GB

Cost Examples

Cheap Requests (< $1)

# Daily bars for 1 year
dataset="GLBX.MDP3"
symbols="ES.c.0"
schema="ohlcv-1d"
start="2023-01-01"
end="2023-12-31"
# Estimated cost: $0.10

Moderate Requests ($1-$10)

# Hourly bars for 1 year
dataset="GLBX.MDP3"
symbols="ES.c.0,NQ.c.0"
schema="ohlcv-1h"
start="2023-01-01"
end="2023-12-31"
# Estimated cost: $2-5

Expensive Requests ($10-$100)

# Trades for 1 month
dataset="GLBX.MDP3"
symbols="ES.c.0"
schema="trades"
start="2024-01-01"
end="2024-01-31"
# Estimated cost: $20-50

Very Expensive Requests ($100+)

# MBP-10 for 1 month
dataset="GLBX.MDP3"
symbols="ES.c.0,NQ.c.0"
schema="mbp-10"
start="2024-01-01"
end="2024-01-31"
# Estimated cost: $200-500

Free Credit Strategy

Make the most of your $125 free credits:

  1. Development Phase - Use free credits for:

    • Testing API integration
    • Small-scale strategy development
    • Learning the platform
  2. Prioritize T+1 Data - Stretch credits further:

    • Avoid real-time data during development
    • Use historical data (no licensing fees)
  3. Start with OHLCV - Cheapest data:

    • Develop strategy with daily/hourly bars
    • Validate before moving to tick data
  4. Cache Everything - Don't waste credits:

    • Save all fetched data locally
    • Reuse for multiple analyses
  5. Monitor Remaining Balance:

    • Check credit usage regularly
    • Adjust requests to stay within budget

Summary

Most Important Cost-Saving Strategies:

  1. Always check cost first - Use metadata_get_cost
  2. Use T+1 data - 24+ hours old, no licensing fees
  3. Start with OHLCV schemas - Much cheaper than tick data
  4. Cache and reuse data - Don't fetch twice
  5. Be selective with symbols - Fewer symbols = lower cost
  6. Test with small ranges - Validate before large requests
  7. Use continuous contracts - One symbol instead of many
  8. Monitor usage - Track spending, set limits