
Databento Cost Optimization Guide

Strategies and best practices for minimizing costs when working with Databento market data.

Databento Pricing Model

Cost Components

  1. Databento Usage Fees - Pay-per-use or subscription
  2. Exchange License Fees - Vary by venue
  3. Data Volume - Amount of data retrieved

Pricing Tiers

Free Credits:

  • $125 free credits for new users
  • Good for initial development and testing

Usage-Based:

  • Pay only for data you use
  • Varies by venue and data type
  • No minimum commitment

Subscriptions:

  • Basic Plan: $199/month
  • Corporate Actions/Security Master: $299/month
  • Flat-rate access to specific datasets

Cost Estimation (ALWAYS Do This First)

Use metadata_get_cost Before Every Request

Always estimate cost before fetching data:

mcp__databento__metadata_get_cost(
    dataset="GLBX.MDP3",
    start="2024-01-01",
    end="2024-01-31",
    symbols="ES.c.0",
    schema="ohlcv-1h"
)

Returns:

  • Estimated cost in USD
  • Estimated data size

Use the estimate to decide whether the request is reasonable before fetching.
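
For example, a minimal pre-flight check might gate the fetch on the estimate (a sketch; the numeric "cost" field on the response is an assumption, so verify the actual shape in your client):

# Sketch: skip the fetch when the estimate exceeds a budget.
MAX_COST_USD = 5.00

estimate = mcp__databento__metadata_get_cost(
    dataset="GLBX.MDP3",
    start="2024-01-01",
    end="2024-01-31",
    symbols="ES.c.0",
    schema="ohlcv-1h",
)
if estimate["cost"] > MAX_COST_USD:  # "cost" field is an assumption
    raise RuntimeError(f"Estimated ${estimate['cost']:.2f} exceeds budget")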

When Cost Checks Matter Most

  1. Multi-day tick data - Can be expensive
  2. Multiple symbols - Costs multiply
  3. High-granularity schemas - trades, mbp-1, mbo
  4. Long date ranges - Weeks or months of data

Example Cost Check:

# Cheap: 1 month of daily bars
# (cost_check below is shorthand for the metadata_get_cost call above)
cost_check(schema="ohlcv-1d", start="2024-01-01", end="2024-01-31")
# Estimated: ~$0.10

# Expensive: 1 month of tick trades
cost_check(schema="trades", start="2024-01-01", end="2024-01-31")
# Estimated: $50-$200 (depends on volume)

Historical Data (T+1) - No Licensing Required

Key Insight: Historical data that is 24+ hours old (T+1) does not require exchange licensing fees.

Cost Breakdown

Live/Recent Data (< 24 hours):

  • Databento fees + Exchange licensing fees

Historical Data (24+ hours old):

  • Databento fees only (no exchange licensing)
  • Significantly cheaper

Optimization Strategy

For Development:

  • Use T+1 data for strategy development
  • Switch to live data only for production

For Backtesting:

  • Always use historical (T+1) data
  • Much more cost-effective
  • Same data quality

Example:

# Expensive: yesterday's data (< 24 hours old; assume today is 2024-11-06)
start="2024-11-05"  # Requires exchange licensing

# Cheap: 3 days ago (> 24 hours old)
start="2024-11-03"  # No licensing required

Schema Selection for Cost

Different schemas have vastly different costs due to data volume.

Schema Cost Hierarchy (Cheapest to Most Expensive)

  1. ohlcv-1d (Cheapest)

    • ~100 bytes per record
    • ~250 records per symbol per year
    • Best for: Long-term backtesting
  2. ohlcv-1h

    • ~100 bytes per record
    • ~6,000 records per symbol per year
    • Best for: Multi-day backtesting
  3. ohlcv-1m

    • ~100 bytes per record
    • ~360,000 records per symbol per year
    • Best for: Intraday strategies
  4. trades

    • ~50 bytes per record
    • ~100K-500K records per symbol per day (ES/NQ)
    • Best for: Tick analysis (use selectively)
  5. mbp-1

    • ~150 bytes per record
    • ~1M-5M records per symbol per day
    • Best for: Order flow analysis (use selectively)
  6. mbp-10

    • ~500 bytes per record
    • ~1M-5M records per symbol per day
    • Best for: Deep order book analysis (expensive!)
  7. mbo (Most Expensive)

    • ~80 bytes per record
    • ~5M-20M records per symbol per day
    • Best for: Order-level research (very expensive!)

Cost Optimization Strategy

Start with lower granularity:

  1. Develop strategy with ohlcv-1h or ohlcv-1d
  2. Validate with ohlcv-1m if needed
  3. Only use trades/mbp-1 if absolutely necessary
  4. Avoid mbp-10/mbo unless essential

Example:

# Cheap: Daily bars for 1 year
schema="ohlcv-1d"
start="2023-01-01"
end="2023-12-31"
# Cost: < $1

# Expensive: Trades for 1 year
schema="trades"
start="2023-01-01"
end="2023-12-31"
# Cost: $500-$2000 (depending on venue)
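
To pick the cheapest schema that still fits the analysis, compare estimates across candidates in one loop before fetching (a sketch reusing the metadata_get_cost call from earlier; the "cost" response field is an assumption):

# Compare estimated cost across candidate schemas.
for schema in ["ohlcv-1d", "ohlcv-1h", "ohlcv-1m", "trades"]:
    estimate = mcp__databento__metadata_get_cost(
        dataset="GLBX.MDP3",
        symbols="ES.c.0",
        schema=schema,
        start="2023-01-01",
        end="2023-12-31",
    )
    print(f"{schema}: ~${estimate['cost']:.2f}")  # "cost" field assumed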

Symbol Selection

Fewer symbols = lower cost. Be selective.

Strategies

1. Start with Single Symbol

# Development
symbols="ES.c.0"  # Just ES

# After validation, expand
symbols="ES.c.0,NQ.c.0"  # Add NQ

2. Use Continuous Contracts

# Good: Single continuous contract
symbols="ES.c.0"  # Covers all front months

# Wasteful: Multiple specific contracts
symbols="ESH5,ESM5,ESU5,ESZ5"  # Same data, 4x cost

3. Avoid Symbol Wildcards

# Expensive: All instruments
symbols="*"  # Don't do this!

# Targeted: Just what you need
symbols="ES.c.0,NQ.c.0"  # Explicit

Date Range Optimization

Request only the data you need.

Strategies

1. Iterative Refinement

# First: Test with small range
start="2024-01-01"
end="2024-01-07"  # Just 1 week

# Then: Expand after validation
start="2024-01-01"
end="2024-12-31"  # Full year

2. Segment Long Ranges (see the sketch after this list)

# Instead of: 5 years at once
start="2019-01-01"
end="2024-12-31"

# Do: Segment by year
start="2024-01-01"
end="2024-12-31"
# Process, then request next year if needed

3. Use Limit for Testing

# Test with small limit first
limit=100  # Just 100 records

# After validation, increase or remove
limit=10000  # Larger sample
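
The segment-by-year pattern from item 2 reduces to a simple loop (a sketch; fetch_data and process stand in for your actual request and analysis calls):

# Request one year at a time instead of 5 years at once.
for year in range(2019, 2025):
    data = fetch_data(
        dataset="GLBX.MDP3",
        symbols="ES.c.0",
        schema="ohlcv-1h",
        start=f"{year}-01-01",
        end=f"{year}-12-31",
    )
    process(data)  # validate results before requesting the next year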

Batch vs Timeseries Selection

Choose the right tool for the job.

Timeseries (< 5GB)

When to use:

  • Small to medium datasets
  • Quick exploration
  • <= 1 day of tick data
  • Any OHLCV data

Benefits:

  • Immediate results
  • No job management
  • Direct response

Costs:

  • Same per-record cost as batch

Batch Downloads (> 5GB)

When to use:

  • Large datasets (> 5GB)
  • Multi-day tick data
  • Multiple symbols over long periods
  • Production data pipelines

Benefits:

  • More efficient for large data
  • Can split output files
  • Asynchronous processing

Costs:

  • Same per-record cost as timeseries
  • No additional fees for batch processing

Decision Matrix

  Data Type   Date Range   Method
  ohlcv-1h    1 year       Timeseries
  ohlcv-1d    Any          Timeseries
  trades      1 day        Timeseries
  trades      1 week+      Batch
  mbp-1       1 day        Batch (safer)
  mbp-1       1 week+      Batch
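
The matrix reduces to a small routing rule (a sketch; the thresholds mirror the guidance above and are not a Databento API):

# Rough routing rule: tick schemas over multi-day ranges go to batch.
def choose_method(schema: str, days: int) -> str:
    tick_schemas = {"trades", "mbp-1", "mbp-10", "mbo"}
    if schema not in tick_schemas:
        return "timeseries"  # OHLCV stays small at any range
    if schema == "trades" and days <= 1:
        return "timeseries"  # a single day of trades is manageable
    return "batch"           # multi-day tick data

choose_method("ohlcv-1h", 365)  # "timeseries"
choose_method("trades", 7)      # "batch"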

DBEQ Bundle - Zero Exchange Fees

Databento offers a special bundle for US equities with $0 exchange fees.

DBEQ.BASIC Dataset

Coverage:

  • US equity securities
  • Zero licensing fees
  • Databento usage fees only

When to use:

  • Equity market breadth for ES/NQ analysis
  • Testing equity strategies
  • Learning market data APIs

Example:

# Regular equity dataset (has exchange fees)
dataset="XNAS.ITCH"
# Cost: Databento + Nasdaq fees

# DBEQ bundle (no exchange fees)
dataset="DBEQ.BASIC"
# Cost: Databento fees only

Caching and Reuse

Don't fetch the same data multiple times.

Strategies

1. Cache Locally

import os
import pandas as pd

# First run: fetch and cache; later runs: load from disk.
path = "ES_2024_ohlcv1h.csv"
if os.path.exists(path):
    data = pd.read_csv(path)
else:
    data = fetch_data(...)  # your Databento request (assumed to return a DataFrame)
    data.to_csv(path, index=False)

2. Incremental Updates

# Initial: Fetch full history
start="2023-01-01"
end="2024-01-01"

# Later: Fetch only new data
start="2024-01-01"  # Resume from last fetch
end="2024-12-31"

3. Share Data Across Analyses

# Fetch once
historical_data = fetch_data(schema="ohlcv-1h", ...)

# Use multiple times
backtest_strategy_a(historical_data)
backtest_strategy_b(historical_data)
backtest_strategy_c(historical_data)

Session-Based Analysis

For ES/NQ, consider filtering by trading session to reduce data volume.

Sessions

  • Asian Session: 6pm-2am ET
  • London Session: 2am-8am ET
  • New York Session: 8am-4pm ET

Cost Benefit

Full 24-hour data:

  • Maximum data volume
  • Higher cost

Session-filtered data:

  • 1/3 to 1/2 the volume
  • Lower cost
  • May be sufficient for analysis

Example:

# Expensive: Full 24-hour data
# Process all records

# Cheaper: NY session only
# Filter records to 8am-4pm ET
# ~1/3 the data volume

Use scripts/session_filter.py to filter post-fetch, or request only specific hours.
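
The core of a post-fetch session filter is a single timezone-aware mask (a minimal pandas sketch, independent of scripts/session_filter.py; assumes the fetched records are already in a DataFrame df with a UTC ts_event column):

import pandas as pd

# df: DataFrame of fetched records (assumed).
# Keep only New York session records (8am-4pm ET).
ts = pd.to_datetime(df["ts_event"], utc=True)
et_hour = ts.dt.tz_convert("America/New_York").dt.hour
ny_session = df[(et_hour >= 8) & (et_hour < 16)]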

Monitoring Usage

Track your usage to avoid surprises.

Check Dashboard

  • Databento provides usage dashboard
  • Monitor monthly spend
  • Set alerts for limits

Set Monthly Limits

Set a monthly spending limit (for example, $500) in your Databento account settings so usage fees are capped automatically.

Review Costs Regularly

  • Check cost estimates vs actual
  • Identify expensive queries
  • Adjust strategies

Cost Optimization Checklist

Before every data request:

  • Estimate cost first - Use metadata_get_cost
  • Use T+1 data - Avoid < 24 hour data unless necessary
  • Choose lowest granularity schema - Start with ohlcv, not trades
  • Minimize symbols - Only request what you need
  • Limit date range - Test with small range first
  • Use continuous contracts - Avoid requesting multiple months
  • Cache locally - Don't re-fetch same data
  • Consider DBEQ - Use zero-fee dataset when applicable
  • Filter by session - Reduce volume if session-specific
  • Use batch for large data - More efficient for > 5GB

Cost Examples

Cheap Requests (< $1)

# Daily bars for 1 year
dataset="GLBX.MDP3"
symbols="ES.c.0"
schema="ohlcv-1d"
start="2023-01-01"
end="2023-12-31"
# Estimated cost: $0.10

Moderate Requests ($1-$10)

# Hourly bars for 1 year
dataset="GLBX.MDP3"
symbols="ES.c.0,NQ.c.0"
schema="ohlcv-1h"
start="2023-01-01"
end="2023-12-31"
# Estimated cost: $2-5

Expensive Requests ($10-$100)

# Trades for 1 month
dataset="GLBX.MDP3"
symbols="ES.c.0"
schema="trades"
start="2024-01-01"
end="2024-01-31"
# Estimated cost: $20-50

Very Expensive Requests ($100+)

# MBP-10 for 1 month
dataset="GLBX.MDP3"
symbols="ES.c.0,NQ.c.0"
schema="mbp-10"
start="2024-01-01"
end="2024-01-31"
# Estimated cost: $200-500

Free Credit Strategy

Make the most of your $125 free credits:

  1. Development Phase - Use free credits for:

    • Testing API integration
    • Small-scale strategy development
    • Learning the platform
  2. Prioritize T+1 Data - Stretch credits further:

    • Avoid real-time data during development
    • Use historical data (no licensing fees)
  3. Start with OHLCV - Cheapest data:

    • Develop strategy with daily/hourly bars
    • Validate before moving to tick data
  4. Cache Everything - Don't waste credits:

    • Save all fetched data locally
    • Reuse for multiple analyses
  5. Monitor Remaining Balance:

    • Check credit usage regularly
    • Adjust requests to stay within budget

Summary

Most Important Cost-Saving Strategies:

  1. Always check cost first - Use metadata_get_cost
  2. Use T+1 data - 24+ hours old, no licensing fees
  3. Start with OHLCV schemas - Much cheaper than tick data
  4. Cache and reuse data - Don't fetch twice
  5. Be selective with symbols - Fewer symbols = lower cost
  6. Test with small ranges - Validate before large requests
  7. Use continuous contracts - One symbol instead of many
  8. Monitor usage - Track spending, set limits