Files
gh-nice-wolf-studio-wolf-sk…/skills/databento/references/schemas.md
2025-11-30 08:43:40 +08:00

13 KiB

Databento Schema Reference

Comprehensive documentation of Databento schemas with field-level details, data types, and usage guidance.

Schema Overview

Databento provides 12+ schema types representing different granularity levels of market data. All schemas share common timestamp fields for consistency.

Common Fields (All Schemas)

Every schema includes these timestamp fields:

Field Type Description Unit
ts_event uint64 Event timestamp from venue Nanoseconds (Unix epoch)
ts_recv uint64 Databento gateway receipt time Nanoseconds (Unix epoch)

Important: Databento provides up to 4 timestamps per event for sub-microsecond accuracy.

OHLCV Schemas

Candlestick/bar data at various time intervals.

ohlcv-1s (1 Second Bars)

ohlcv-1m (1 Minute Bars)

ohlcv-1h (1 Hour Bars)

ohlcv-1d (Daily Bars)

ohlcv-eod (End of Day)

Common OHLCV Fields:

Field Type Description Unit
open int64 Opening price Fixed-point (divide by 1e9 for decimal)
high int64 Highest price Fixed-point (divide by 1e9 for decimal)
low int64 Lowest price Fixed-point (divide by 1e9 for decimal)
close int64 Closing price Fixed-point (divide by 1e9 for decimal)
volume uint64 Total volume Contracts/shares

When to Use:

  • 1h/1d: Historical backtesting, multi-day analysis
  • 1m: Intraday strategy development
  • 1s: High-frequency analysis (use batch for large ranges)
  • eod: Long-term investment analysis

Pricing Format: Prices are in fixed-point notation. To convert to decimal:

decimal_price = int64_price / 1_000_000_000

For ES futures at 4500.00, the value would be stored as 4500000000000.

Trades Schema

Individual trade executions with price, size, and side information.

Field Type Description Values
price int64 Trade execution price Fixed-point (÷ 1e9)
size uint32 Trade size Contracts/shares
action char Trade action 'T' = trade, 'C' = cancel
side char Aggressor side 'B' = buy, 'S' = sell, 'N' = none
flags uint8 Trade flags Bitmask
depth uint8 Depth level Usually 0
ts_in_delta int32 Time delta Nanoseconds
sequence uint32 Sequence number Venue-specific

When to Use:

  • Intraday order flow analysis
  • Tick-by-tick backtesting
  • Market microstructure research
  • Volume profile analysis

Aggressor Side:

  • B = Buy-side aggressor (market buy hit the ask)
  • S = Sell-side aggressor (market sell hit the bid)
  • N = Cannot be determined or not applicable

Important: For multi-day tick data, use batch downloads. Trades can generate millions of records per day.

MBP-1 Schema (Market By Price - Top of Book)

Level 1 order book data showing best bid and ask.

Field Type Description Values
price int64 Reference price (usually last trade) Fixed-point (÷ 1e9)
size uint32 Reference size Contracts/shares
action char Book action 'A' = add, 'C' = cancel, 'M' = modify, 'T' = trade
side char Order side 'B' = bid, 'A' = ask, 'N' = none
flags uint8 Flags Bitmask
depth uint8 Depth level Always 0 for MBP-1
ts_in_delta int32 Time delta Nanoseconds
sequence uint32 Sequence number Venue-specific
bid_px_00 int64 Best bid price Fixed-point (÷ 1e9)
ask_px_00 int64 Best ask price Fixed-point (÷ 1e9)
bid_sz_00 uint32 Best bid size Contracts/shares
ask_sz_00 uint32 Best ask size Contracts/shares
bid_ct_00 uint32 Bid order count Number of orders
ask_ct_00 uint32 Ask order count Number of orders

When to Use:

  • Bid/ask spread analysis
  • Liquidity analysis
  • Market microstructure studies
  • Quote-based strategies

Key Metrics:

spread = ask_px_00 - bid_px_00
mid_price = (bid_px_00 + ask_px_00) / 2
bid_ask_imbalance = (bid_sz_00 - ask_sz_00) / (bid_sz_00 + ask_sz_00)

MBP-10 Schema (Market By Price - 10 Levels)

Level 2 order book data showing 10 levels of depth.

Fields: Same as MBP-1, plus 9 additional levels:

  • bid_px_01 through bid_px_09 (10 bid levels)
  • ask_px_01 through ask_px_09 (10 ask levels)
  • bid_sz_01 through bid_sz_09
  • ask_sz_01 through ask_sz_09
  • bid_ct_01 through bid_ct_09
  • ask_ct_01 through ask_ct_09

When to Use:

  • Order book depth analysis
  • Liquidity beyond top of book
  • Order flow imbalance at multiple levels
  • Market impact modeling

Important: MBP-10 generates significantly more data than MBP-1. Use batch downloads for multi-day requests.

MBO Schema (Market By Order)

Level 3 order-level data with individual order IDs - most granular.

Field Type Description Values
order_id uint64 Unique order ID Venue-specific
price int64 Order price Fixed-point (÷ 1e9)
size uint32 Order size Contracts/shares
flags uint8 Flags Bitmask
channel_id uint8 Channel ID Venue-specific
action char Order action 'A' = add, 'C' = cancel, 'M' = modify, 'F' = fill, 'T' = trade
side char Order side 'B' = bid, 'A' = ask, 'N' = none
ts_in_delta int32 Time delta Nanoseconds
sequence uint32 Sequence number Venue-specific

When to Use:

  • Highest granularity order flow analysis
  • Order-level reconstructions
  • Advanced market microstructure research
  • Queue position analysis

Important: MBO data is extremely granular and generates massive datasets. Always use batch downloads and carefully check costs.

Definition Schema

Instrument metadata and definitions.

Field Type Description
ts_recv uint64 Receipt timestamp
min_price_increment int64 Minimum tick size
display_factor int64 Display factor for prices
expiration uint64 Contract expiration timestamp
activation uint64 Contract activation timestamp
high_limit_price int64 Upper price limit
low_limit_price int64 Lower price limit
max_price_variation int64 Maximum price move
trading_reference_price int64 Reference price
unit_of_measure_qty int64 Contract size
min_price_increment_amount int64 Tick value
price_ratio int64 Price ratio
inst_attrib_value int32 Instrument attributes
underlying_id uint32 Underlying instrument ID
raw_instrument_id uint32 Raw instrument ID
market_depth_implied int32 Implied depth
market_depth int32 Market depth
market_segment_id uint32 Market segment
max_trade_vol uint32 Maximum trade volume
min_lot_size int32 Minimum lot size
min_lot_size_block int32 Block trade minimum
min_lot_size_round_lot int32 Round lot minimum
min_trade_vol uint32 Minimum trade volume
contract_multiplier int32 Contract multiplier
decay_quantity int32 Decay quantity
original_contract_size int32 Original size
trading_reference_date uint16 Reference date
appl_id int16 Application ID
maturity_year uint16 Year
decay_start_date uint16 Decay start
channel_id uint16 Channel
currency string Currency code
settl_currency string Settlement currency
secsubtype string Security subtype
raw_symbol string Raw symbol
group string Instrument group
exchange string Exchange code
asset string Asset class
cfi string CFI code
security_type string Security type
unit_of_measure string Unit of measure
underlying string Underlying symbol
strike_price_currency string Strike currency
instrument_class char Class
strike_price int64 Strike price (options)
match_algorithm char Matching algorithm
md_security_trading_status uint8 Trading status
main_fraction uint8 Main fraction
price_display_format uint8 Display format
settl_price_type uint8 Settlement type
sub_fraction uint8 Sub fraction
underlying_product uint8 Underlying product
security_update_action char Update action
maturity_month uint8 Month
maturity_day uint8 Day
maturity_week uint8 Week
user_defined_instrument char User-defined
contract_multiplier_unit int8 Multiplier unit
flow_schedule_type int8 Flow schedule
tick_rule uint8 Tick rule

When to Use:

  • Understanding instrument specifications
  • Calculating tick values
  • Contract expiration management
  • Symbol resolution and mapping

Key Fields for ES/NQ:

  • min_price_increment: Tick size (0.25 for ES, 0.25 for NQ)
  • expiration: Contract expiration timestamp
  • raw_symbol: Exchange symbol
  • contract_multiplier: Usually 50 for ES, 20 for NQ

Statistics Schema

Market statistics and calculated metrics.

Field Type Description
ts_recv uint64 Receipt timestamp
ts_ref uint64 Reference timestamp
price int64 Reference price
quantity int64 Reference quantity
sequence uint32 Sequence number
ts_in_delta int32 Time delta
stat_type uint16 Statistic type
channel_id uint16 Channel ID
update_action uint8 Update action
stat_flags uint8 Statistic flags

Common Statistic Types:

  • Opening price
  • Settlement price
  • High/low prices
  • Trading volume
  • Open interest

When to Use:

  • Official settlement prices
  • Open interest analysis
  • Exchange-calculated statistics

Status Schema

Instrument trading status and state changes.

Field Type Description
ts_recv uint64 Receipt timestamp
ts_event uint64 Event timestamp
action uint16 Status action
reason uint16 Status reason
trading_event uint16 Trading event
is_trading int8 Trading flag (1 = trading, 0 = not trading)
is_quoting int8 Quoting flag
is_short_sell_restricted int8 Short sell flag

When to Use:

  • Detecting trading halts
  • Understanding market status changes
  • Filtering data by trading status

Imbalance Schema

Order imbalance data for auctions and closes.

Field Type Description
ts_recv uint64 Receipt timestamp
ts_event uint64 Event timestamp
ref_price int64 Reference price
auction_time uint64 Auction timestamp
cont_book_clr_price int64 Continuous book clearing price
auct_interest_clr_price int64 Auction interest clearing price
paired_qty uint64 Paired quantity
total_imbalance_qty uint64 Total imbalance
side char Imbalance side ('B' or 'A')
significant_imbalance char Significance flag

When to Use:

  • Opening/closing auction analysis
  • Imbalance trading strategies
  • End-of-day positioning

Schema Selection Decision Matrix

Analysis Type Recommended Schema Alternative
Daily backtesting ohlcv-1d ohlcv-1h
Intraday backtesting ohlcv-1h, ohlcv-1m trades
Spread analysis mbp-1 trades
Order flow trades mbp-1
Market depth mbp-10 mbo
Tick-by-tick trades mbo
Liquidity analysis mbp-1, mbp-10 mbo
Contract specifications definition -
Settlement prices statistics definition
Trading halts status -
Auction analysis imbalance trades

Data Type Reference

Fixed-Point Prices

All price fields are stored as int64 in fixed-point notation with 9 decimal places of precision.

Conversion:

decimal_price = int64_price / 1_000_000_000

Example:

  • ES at 4500.25 → stored as 4500250000000
  • NQ at 15000.50 → stored as 15000500000000

Timestamps

All timestamps are uint64 nanoseconds since Unix epoch (1970-01-01 00:00:00 UTC).

Conversion to datetime:

import datetime
dt = datetime.datetime.fromtimestamp(ts_event / 1_000_000_000, tz=datetime.timezone.utc)

Character Fields

Single-character fields (char) represent enums:

  • Action: 'A' (add), 'C' (cancel), 'M' (modify), 'T' (trade), 'F' (fill)
  • Side: 'B' (bid), 'A' (ask), 'N' (none/unknown)

Performance Considerations

Schema Size (Approximate bytes per record)

Schema Size Records/GB
ohlcv-1d ~100 ~10M
ohlcv-1h ~100 ~10M
trades ~50 ~20M
mbp-1 ~150 ~6.7M
mbp-10 ~500 ~2M
mbo ~80 ~12.5M

Planning requests:

  • 1 day of ES trades ≈ 100K-500K records ≈ 5-25 MB
  • 1 day of ES mbp-1 ≈ 1M-5M records ≈ 150-750 MB
  • 1 year of ES ohlcv-1h ≈ 6K records ≈ 600 KB

Use these estimates to decide between timeseries (< 5GB) and batch downloads (> 5GB).