13 KiB
Databento Schema Reference
Comprehensive documentation of Databento schemas with field-level details, data types, and usage guidance.
Schema Overview
Databento provides 12+ schema types representing different granularity levels of market data. All schemas share common timestamp fields for consistency.
Common Fields (All Schemas)
Every schema includes these timestamp fields:
| Field | Type | Description | Unit |
|---|---|---|---|
ts_event |
uint64 | Event timestamp from venue | Nanoseconds (Unix epoch) |
ts_recv |
uint64 | Databento gateway receipt time | Nanoseconds (Unix epoch) |
Important: Databento provides up to 4 timestamps per event for sub-microsecond accuracy.
OHLCV Schemas
Candlestick/bar data at various time intervals.
ohlcv-1s (1 Second Bars)
ohlcv-1m (1 Minute Bars)
ohlcv-1h (1 Hour Bars)
ohlcv-1d (Daily Bars)
ohlcv-eod (End of Day)
Common OHLCV Fields:
| Field | Type | Description | Unit |
|---|---|---|---|
open |
int64 | Opening price | Fixed-point (divide by 1e9 for decimal) |
high |
int64 | Highest price | Fixed-point (divide by 1e9 for decimal) |
low |
int64 | Lowest price | Fixed-point (divide by 1e9 for decimal) |
close |
int64 | Closing price | Fixed-point (divide by 1e9 for decimal) |
volume |
uint64 | Total volume | Contracts/shares |
When to Use:
- 1h/1d: Historical backtesting, multi-day analysis
- 1m: Intraday strategy development
- 1s: High-frequency analysis (use batch for large ranges)
- eod: Long-term investment analysis
Pricing Format: Prices are in fixed-point notation. To convert to decimal:
decimal_price = int64_price / 1_000_000_000
For ES futures at 4500.00, the value would be stored as 4500000000000.
Trades Schema
Individual trade executions with price, size, and side information.
| Field | Type | Description | Values |
|---|---|---|---|
price |
int64 | Trade execution price | Fixed-point (÷ 1e9) |
size |
uint32 | Trade size | Contracts/shares |
action |
char | Trade action | 'T' = trade, 'C' = cancel |
side |
char | Aggressor side | 'B' = buy, 'S' = sell, 'N' = none |
flags |
uint8 | Trade flags | Bitmask |
depth |
uint8 | Depth level | Usually 0 |
ts_in_delta |
int32 | Time delta | Nanoseconds |
sequence |
uint32 | Sequence number | Venue-specific |
When to Use:
- Intraday order flow analysis
- Tick-by-tick backtesting
- Market microstructure research
- Volume profile analysis
Aggressor Side:
B= Buy-side aggressor (market buy hit the ask)S= Sell-side aggressor (market sell hit the bid)N= Cannot be determined or not applicable
Important: For multi-day tick data, use batch downloads. Trades can generate millions of records per day.
MBP-1 Schema (Market By Price - Top of Book)
Level 1 order book data showing best bid and ask.
| Field | Type | Description | Values |
|---|---|---|---|
price |
int64 | Reference price (usually last trade) | Fixed-point (÷ 1e9) |
size |
uint32 | Reference size | Contracts/shares |
action |
char | Book action | 'A' = add, 'C' = cancel, 'M' = modify, 'T' = trade |
side |
char | Order side | 'B' = bid, 'A' = ask, 'N' = none |
flags |
uint8 | Flags | Bitmask |
depth |
uint8 | Depth level | Always 0 for MBP-1 |
ts_in_delta |
int32 | Time delta | Nanoseconds |
sequence |
uint32 | Sequence number | Venue-specific |
bid_px_00 |
int64 | Best bid price | Fixed-point (÷ 1e9) |
ask_px_00 |
int64 | Best ask price | Fixed-point (÷ 1e9) |
bid_sz_00 |
uint32 | Best bid size | Contracts/shares |
ask_sz_00 |
uint32 | Best ask size | Contracts/shares |
bid_ct_00 |
uint32 | Bid order count | Number of orders |
ask_ct_00 |
uint32 | Ask order count | Number of orders |
When to Use:
- Bid/ask spread analysis
- Liquidity analysis
- Market microstructure studies
- Quote-based strategies
Key Metrics:
spread = ask_px_00 - bid_px_00
mid_price = (bid_px_00 + ask_px_00) / 2
bid_ask_imbalance = (bid_sz_00 - ask_sz_00) / (bid_sz_00 + ask_sz_00)
MBP-10 Schema (Market By Price - 10 Levels)
Level 2 order book data showing 10 levels of depth.
Fields: Same as MBP-1, plus 9 additional levels:
bid_px_01throughbid_px_09(10 bid levels)ask_px_01throughask_px_09(10 ask levels)bid_sz_01throughbid_sz_09ask_sz_01throughask_sz_09bid_ct_01throughbid_ct_09ask_ct_01throughask_ct_09
When to Use:
- Order book depth analysis
- Liquidity beyond top of book
- Order flow imbalance at multiple levels
- Market impact modeling
Important: MBP-10 generates significantly more data than MBP-1. Use batch downloads for multi-day requests.
MBO Schema (Market By Order)
Level 3 order-level data with individual order IDs - most granular.
| Field | Type | Description | Values |
|---|---|---|---|
order_id |
uint64 | Unique order ID | Venue-specific |
price |
int64 | Order price | Fixed-point (÷ 1e9) |
size |
uint32 | Order size | Contracts/shares |
flags |
uint8 | Flags | Bitmask |
channel_id |
uint8 | Channel ID | Venue-specific |
action |
char | Order action | 'A' = add, 'C' = cancel, 'M' = modify, 'F' = fill, 'T' = trade |
side |
char | Order side | 'B' = bid, 'A' = ask, 'N' = none |
ts_in_delta |
int32 | Time delta | Nanoseconds |
sequence |
uint32 | Sequence number | Venue-specific |
When to Use:
- Highest granularity order flow analysis
- Order-level reconstructions
- Advanced market microstructure research
- Queue position analysis
Important: MBO data is extremely granular and generates massive datasets. Always use batch downloads and carefully check costs.
Definition Schema
Instrument metadata and definitions.
| Field | Type | Description |
|---|---|---|
ts_recv |
uint64 | Receipt timestamp |
min_price_increment |
int64 | Minimum tick size |
display_factor |
int64 | Display factor for prices |
expiration |
uint64 | Contract expiration timestamp |
activation |
uint64 | Contract activation timestamp |
high_limit_price |
int64 | Upper price limit |
low_limit_price |
int64 | Lower price limit |
max_price_variation |
int64 | Maximum price move |
trading_reference_price |
int64 | Reference price |
unit_of_measure_qty |
int64 | Contract size |
min_price_increment_amount |
int64 | Tick value |
price_ratio |
int64 | Price ratio |
inst_attrib_value |
int32 | Instrument attributes |
underlying_id |
uint32 | Underlying instrument ID |
raw_instrument_id |
uint32 | Raw instrument ID |
market_depth_implied |
int32 | Implied depth |
market_depth |
int32 | Market depth |
market_segment_id |
uint32 | Market segment |
max_trade_vol |
uint32 | Maximum trade volume |
min_lot_size |
int32 | Minimum lot size |
min_lot_size_block |
int32 | Block trade minimum |
min_lot_size_round_lot |
int32 | Round lot minimum |
min_trade_vol |
uint32 | Minimum trade volume |
contract_multiplier |
int32 | Contract multiplier |
decay_quantity |
int32 | Decay quantity |
original_contract_size |
int32 | Original size |
trading_reference_date |
uint16 | Reference date |
appl_id |
int16 | Application ID |
maturity_year |
uint16 | Year |
decay_start_date |
uint16 | Decay start |
channel_id |
uint16 | Channel |
currency |
string | Currency code |
settl_currency |
string | Settlement currency |
secsubtype |
string | Security subtype |
raw_symbol |
string | Raw symbol |
group |
string | Instrument group |
exchange |
string | Exchange code |
asset |
string | Asset class |
cfi |
string | CFI code |
security_type |
string | Security type |
unit_of_measure |
string | Unit of measure |
underlying |
string | Underlying symbol |
strike_price_currency |
string | Strike currency |
instrument_class |
char | Class |
strike_price |
int64 | Strike price (options) |
match_algorithm |
char | Matching algorithm |
md_security_trading_status |
uint8 | Trading status |
main_fraction |
uint8 | Main fraction |
price_display_format |
uint8 | Display format |
settl_price_type |
uint8 | Settlement type |
sub_fraction |
uint8 | Sub fraction |
underlying_product |
uint8 | Underlying product |
security_update_action |
char | Update action |
maturity_month |
uint8 | Month |
maturity_day |
uint8 | Day |
maturity_week |
uint8 | Week |
user_defined_instrument |
char | User-defined |
contract_multiplier_unit |
int8 | Multiplier unit |
flow_schedule_type |
int8 | Flow schedule |
tick_rule |
uint8 | Tick rule |
When to Use:
- Understanding instrument specifications
- Calculating tick values
- Contract expiration management
- Symbol resolution and mapping
Key Fields for ES/NQ:
min_price_increment: Tick size (0.25 for ES, 0.25 for NQ)expiration: Contract expiration timestampraw_symbol: Exchange symbolcontract_multiplier: Usually 50 for ES, 20 for NQ
Statistics Schema
Market statistics and calculated metrics.
| Field | Type | Description |
|---|---|---|
ts_recv |
uint64 | Receipt timestamp |
ts_ref |
uint64 | Reference timestamp |
price |
int64 | Reference price |
quantity |
int64 | Reference quantity |
sequence |
uint32 | Sequence number |
ts_in_delta |
int32 | Time delta |
stat_type |
uint16 | Statistic type |
channel_id |
uint16 | Channel ID |
update_action |
uint8 | Update action |
stat_flags |
uint8 | Statistic flags |
Common Statistic Types:
- Opening price
- Settlement price
- High/low prices
- Trading volume
- Open interest
When to Use:
- Official settlement prices
- Open interest analysis
- Exchange-calculated statistics
Status Schema
Instrument trading status and state changes.
| Field | Type | Description |
|---|---|---|
ts_recv |
uint64 | Receipt timestamp |
ts_event |
uint64 | Event timestamp |
action |
uint16 | Status action |
reason |
uint16 | Status reason |
trading_event |
uint16 | Trading event |
is_trading |
int8 | Trading flag (1 = trading, 0 = not trading) |
is_quoting |
int8 | Quoting flag |
is_short_sell_restricted |
int8 | Short sell flag |
When to Use:
- Detecting trading halts
- Understanding market status changes
- Filtering data by trading status
Imbalance Schema
Order imbalance data for auctions and closes.
| Field | Type | Description |
|---|---|---|
ts_recv |
uint64 | Receipt timestamp |
ts_event |
uint64 | Event timestamp |
ref_price |
int64 | Reference price |
auction_time |
uint64 | Auction timestamp |
cont_book_clr_price |
int64 | Continuous book clearing price |
auct_interest_clr_price |
int64 | Auction interest clearing price |
paired_qty |
uint64 | Paired quantity |
total_imbalance_qty |
uint64 | Total imbalance |
side |
char | Imbalance side ('B' or 'A') |
significant_imbalance |
char | Significance flag |
When to Use:
- Opening/closing auction analysis
- Imbalance trading strategies
- End-of-day positioning
Schema Selection Decision Matrix
| Analysis Type | Recommended Schema | Alternative |
|---|---|---|
| Daily backtesting | ohlcv-1d | ohlcv-1h |
| Intraday backtesting | ohlcv-1h, ohlcv-1m | trades |
| Spread analysis | mbp-1 | trades |
| Order flow | trades | mbp-1 |
| Market depth | mbp-10 | mbo |
| Tick-by-tick | trades | mbo |
| Liquidity analysis | mbp-1, mbp-10 | mbo |
| Contract specifications | definition | - |
| Settlement prices | statistics | definition |
| Trading halts | status | - |
| Auction analysis | imbalance | trades |
Data Type Reference
Fixed-Point Prices
All price fields are stored as int64 in fixed-point notation with 9 decimal places of precision.
Conversion:
decimal_price = int64_price / 1_000_000_000
Example:
- ES at 4500.25 → stored as 4500250000000
- NQ at 15000.50 → stored as 15000500000000
Timestamps
All timestamps are uint64 nanoseconds since Unix epoch (1970-01-01 00:00:00 UTC).
Conversion to datetime:
import datetime
dt = datetime.datetime.fromtimestamp(ts_event / 1_000_000_000, tz=datetime.timezone.utc)
Character Fields
Single-character fields (char) represent enums:
- Action: 'A' (add), 'C' (cancel), 'M' (modify), 'T' (trade), 'F' (fill)
- Side: 'B' (bid), 'A' (ask), 'N' (none/unknown)
Performance Considerations
Schema Size (Approximate bytes per record)
| Schema | Size | Records/GB |
|---|---|---|
| ohlcv-1d | ~100 | ~10M |
| ohlcv-1h | ~100 | ~10M |
| trades | ~50 | ~20M |
| mbp-1 | ~150 | ~6.7M |
| mbp-10 | ~500 | ~2M |
| mbo | ~80 | ~12.5M |
Planning requests:
- 1 day of ES trades ≈ 100K-500K records ≈ 5-25 MB
- 1 day of ES mbp-1 ≈ 1M-5M records ≈ 150-750 MB
- 1 year of ES ohlcv-1h ≈ 6K records ≈ 600 KB
Use these estimates to decide between timeseries (< 5GB) and batch downloads (> 5GB).