# Databento Schema Reference

Comprehensive documentation of Databento schemas with field-level details, data types, and usage guidance.

## Schema Overview

Databento provides 12+ schema types representing different granularity levels of market data. All schemas share common timestamp fields for consistency.

## Common Fields (All Schemas)

Every schema includes these timestamp fields:

| Field | Type | Description | Unit |
|-------|------|-------------|------|
| `ts_event` | uint64 | Event timestamp from venue | Nanoseconds (Unix epoch) |
| `ts_recv` | uint64 | Databento gateway receipt time | Nanoseconds (Unix epoch) |

**Important:** Databento provides up to 4 timestamps per event for sub-microsecond accuracy.

## OHLCV Schemas

Candlestick/bar data at various time intervals.

### ohlcv-1s (1 Second Bars)
### ohlcv-1m (1 Minute Bars)
### ohlcv-1h (1 Hour Bars)
### ohlcv-1d (Daily Bars)
### ohlcv-eod (End of Day)

**Common OHLCV Fields:**

| Field | Type | Description | Unit |
|-------|------|-------------|------|
| `open` | int64 | Opening price | Fixed-point (divide by 1e9 for decimal) |
| `high` | int64 | Highest price | Fixed-point (divide by 1e9 for decimal) |
| `low` | int64 | Lowest price | Fixed-point (divide by 1e9 for decimal) |
| `close` | int64 | Closing price | Fixed-point (divide by 1e9 for decimal) |
| `volume` | uint64 | Total volume | Contracts/shares |

**When to Use:**
- **1h/1d**: Historical backtesting, multi-day analysis
- **1m**: Intraday strategy development
- **1s**: High-frequency analysis (use batch for large ranges)
- **eod**: Long-term investment analysis

**Pricing Format:**
Prices are in fixed-point notation. To convert to decimal:
```
decimal_price = int64_price / 1_000_000_000
```

For ES futures at 4500.00, the value would be stored as `4500000000000`.

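As a minimal sketch of the conversion above, the helper below scales the four price fields of a raw bar and leaves `volume` untouched. The `raw_bar` dictionary and the `to_decimal_bar` name are illustrative assumptions, not part of any Databento client API.

```python
PRICE_SCALE = 1_000_000_000  # fixed-point scale: one unit is 1e-9

PRICE_FIELDS = ("open", "high", "low", "close")


def to_decimal_bar(bar: dict) -> dict:
    """Return a copy of a raw OHLCV bar with price fields converted to floats."""
    converted = dict(bar)  # copy so the raw record is not modified
    for field in PRICE_FIELDS:
        converted[field] = bar[field] / PRICE_SCALE
    return converted


# Hypothetical raw ES bar; the open matches the 4500.00 example above.
raw_bar = {
    "open": 4_500_000_000_000,
    "high": 4_502_250_000_000,
    "low": 4_499_500_000_000,
    "close": 4_501_750_000_000,
    "volume": 1234,
}
bar = to_decimal_bar(raw_bar)
print(bar["open"], bar["high"], bar["low"], bar["close"], bar["volume"])
```

Converting only at the edge of an analysis keeps the stored integers intact for exact comparisons.
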
## Trades Schema

Individual trade executions with price, size, and side information.

| Field | Type | Description | Values |
|-------|------|-------------|--------|
| `price` | int64 | Trade execution price | Fixed-point (÷ 1e9) |
| `size` | uint32 | Trade size | Contracts/shares |
| `action` | char | Trade action | 'T' = trade, 'C' = cancel |
| `side` | char | Aggressor side | 'B' = buy aggressor (bid), 'A' = sell aggressor (ask), 'N' = none |
| `flags` | uint8 | Trade flags | Bitmask |
| `depth` | uint8 | Depth level | Usually 0 |
| `ts_in_delta` | int32 | Time delta | Nanoseconds |
| `sequence` | uint32 | Sequence number | Venue-specific |

**When to Use:**
- Intraday order flow analysis
- Tick-by-tick backtesting
- Market microstructure research
- Volume profile analysis

**Aggressor Side:**
- `B` = Buy-side aggressor (market buy lifted the ask)
- `A` = Sell-side aggressor (market sell hit the bid)
- `N` = Cannot be determined or not applicable

**Important:** For multi-day tick data, use batch downloads. Trades can generate millions of records per day.

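A common use of the aggressor side is a signed volume total (sometimes called volume delta). The sketch below assumes trades arrive as plain dictionaries and that sell-aggressor trades carry side `'A'`, as in the Side enum listed under Character Fields. The `aggressor_volume` name and the sample records are hypothetical.

```python
from collections import Counter


def aggressor_volume(trades):
    """Sum traded size per aggressor side ('B', 'A', or 'N')."""
    totals = Counter()
    for trade in trades:
        if trade["action"] != "T":
            continue  # only count actual trade prints
        totals[trade["side"]] += trade["size"]
    return totals


# Hypothetical trade records with only the fields this sketch needs.
trades = [
    {"action": "T", "side": "B", "size": 3},
    {"action": "T", "side": "A", "size": 5},
    {"action": "T", "side": "B", "size": 4},
    {"action": "T", "side": "N", "size": 1},
]
totals = aggressor_volume(trades)
volume_delta = totals["B"] - totals["A"]  # positive means net buying pressure
print(dict(totals), volume_delta)
```

Trades with side `'N'` are kept in their own bucket so they do not bias the delta in either direction.
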
## MBP-1 Schema (Market By Price - Top of Book)

Level 1 order book data showing best bid and ask.

| Field | Type | Description | Values |
|-------|------|-------------|--------|
| `price` | int64 | Reference price (usually last trade) | Fixed-point (÷ 1e9) |
| `size` | uint32 | Reference size | Contracts/shares |
| `action` | char | Book action | 'A' = add, 'C' = cancel, 'M' = modify, 'T' = trade |
| `side` | char | Order side | 'B' = bid, 'A' = ask, 'N' = none |
| `flags` | uint8 | Flags | Bitmask |
| `depth` | uint8 | Depth level | Always 0 for MBP-1 |
| `ts_in_delta` | int32 | Time delta | Nanoseconds |
| `sequence` | uint32 | Sequence number | Venue-specific |
| `bid_px_00` | int64 | Best bid price | Fixed-point (÷ 1e9) |
| `ask_px_00` | int64 | Best ask price | Fixed-point (÷ 1e9) |
| `bid_sz_00` | uint32 | Best bid size | Contracts/shares |
| `ask_sz_00` | uint32 | Best ask size | Contracts/shares |
| `bid_ct_00` | uint32 | Bid order count | Number of orders |
| `ask_ct_00` | uint32 | Ask order count | Number of orders |

**When to Use:**
- Bid/ask spread analysis
- Liquidity analysis
- Market microstructure studies
- Quote-based strategies

**Key Metrics:**
```
spread = ask_px_00 - bid_px_00
mid_price = (bid_px_00 + ask_px_00) / 2
bid_ask_imbalance = (bid_sz_00 - ask_sz_00) / (bid_sz_00 + ask_sz_00)
```

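The key metrics can be sketched in Python as follows, assuming the inputs are the raw fixed-point prices and integer sizes from a single MBP-1 record. The `top_of_book_metrics` name is illustrative, and the guard for an empty book is an added assumption rather than documented behavior.

```python
PRICE_SCALE = 1_000_000_000  # fixed-point scale used by price fields


def top_of_book_metrics(bid_px, ask_px, bid_sz, ask_sz):
    """Return (spread, mid_price, bid_ask_imbalance) in decimal units."""
    spread = (ask_px - bid_px) / PRICE_SCALE
    mid_price = (bid_px + ask_px) / 2 / PRICE_SCALE
    total_size = bid_sz + ask_sz
    # Avoid division by zero when both sides of the book are empty.
    imbalance = (bid_sz - ask_sz) / total_size if total_size else 0.0
    return spread, mid_price, imbalance


# Hypothetical ES quote: 4500.00 bid for 30, 4500.25 offered for 10.
spread, mid_price, imbalance = top_of_book_metrics(
    bid_px=4_500_000_000_000,
    ask_px=4_500_250_000_000,
    bid_sz=30,
    ask_sz=10,
)
print(spread, mid_price, imbalance)
```

The imbalance ranges from -1 (all size on the ask) to +1 (all size on the bid).
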
## MBP-10 Schema (Market By Price - 10 Levels)

Level 2 order book data showing 10 levels of depth.

**Fields:** Same as MBP-1, plus 9 additional levels per side (10 in total, counting level `00`):
- `bid_px_01` through `bid_px_09`
- `ask_px_01` through `ask_px_09`
- `bid_sz_01` through `bid_sz_09`
- `ask_sz_01` through `ask_sz_09`
- `bid_ct_01` through `bid_ct_09`
- `ask_ct_01` through `ask_ct_09`

**When to Use:**
- Order book depth analysis
- Liquidity beyond top of book
- Order flow imbalance at multiple levels
- Market impact modeling

**Important:** MBP-10 generates significantly more data than MBP-1. Use batch downloads for multi-day requests.

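Because the level fields follow a regular `<name>_<two-digit level>` pattern, they can be generated rather than typed out. The sketch below builds the column names and computes a size imbalance across all levels. It assumes a record is available as a dictionary keyed by field name; both helper names are hypothetical.

```python
def level_columns(prefix, levels=10):
    """Return field names such as bid_sz_00 ... bid_sz_09 for a prefix."""
    return [f"{prefix}_{level:02d}" for level in range(levels)]


def depth_imbalance(record, levels=10):
    """Size imbalance summed over the first `levels` levels of each side."""
    bid_total = sum(record[name] for name in level_columns("bid_sz", levels))
    ask_total = sum(record[name] for name in level_columns("ask_sz", levels))
    total = bid_total + ask_total
    return (bid_total - ask_total) / total if total else 0.0


# Hypothetical record: 10 contracts on every bid level, 5 on every ask level.
record = {name: 10 for name in level_columns("bid_sz")}
record.update({name: 5 for name in level_columns("ask_sz")})

print(level_columns("bid_px")[:3])
print(depth_imbalance(record))
```

Passing a smaller `levels` value restricts the calculation to the levels nearest the top of book.
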
## MBO Schema (Market By Order)

Level 3 order-level data with individual order IDs. This is the most granular schema.

| Field | Type | Description | Values |
|-------|------|-------------|--------|
| `order_id` | uint64 | Unique order ID | Venue-specific |
| `price` | int64 | Order price | Fixed-point (÷ 1e9) |
| `size` | uint32 | Order size | Contracts/shares |
| `flags` | uint8 | Flags | Bitmask |
| `channel_id` | uint8 | Channel ID | Venue-specific |
| `action` | char | Order action | 'A' = add, 'C' = cancel, 'M' = modify, 'F' = fill, 'T' = trade |
| `side` | char | Order side | 'B' = bid, 'A' = ask, 'N' = none |
| `ts_in_delta` | int32 | Time delta | Nanoseconds |
| `sequence` | uint32 | Sequence number | Venue-specific |

**When to Use:**
- Highest granularity order flow analysis
- Order-level reconstructions
- Advanced market microstructure research
- Queue position analysis

**Important:** MBO data is extremely granular and generates massive datasets. Always use batch downloads and carefully check costs.

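To illustrate what an order-level reconstruction involves, here is a deliberately simplified sketch that tracks resting orders by `order_id` and aggregates them into price levels. It rests on several assumptions that should be checked against the venue's behavior: a cancel removes the whole order, a modify replaces price and size, and trade/fill events do not change resting orders by themselves. All function names and the sample events are hypothetical.

```python
from collections import defaultdict


def apply_event(orders, event):
    """Update the dict of resting orders in place for one MBO event."""
    action = event["action"]
    order_id = event["order_id"]
    if action == "A":
        orders[order_id] = {
            "side": event["side"],
            "price": event["price"],
            "size": event["size"],
        }
    elif action == "M" and order_id in orders:
        orders[order_id]["price"] = event["price"]
        orders[order_id]["size"] = event["size"]
    elif action == "C":
        orders.pop(order_id, None)  # assumption: full cancel only
    # 'T' and 'F' are ignored in this sketch.


def aggregate_levels(orders, side):
    """Sum resting size per price for one side ('B' or 'A')."""
    levels = defaultdict(int)
    for order in orders.values():
        if order["side"] == side:
            levels[order["price"]] += order["size"]
    return dict(levels)


# Hypothetical event stream.
events = [
    {"action": "A", "order_id": 1, "side": "B", "price": 4_500_000_000_000, "size": 5},
    {"action": "A", "order_id": 2, "side": "B", "price": 4_500_000_000_000, "size": 3},
    {"action": "A", "order_id": 3, "side": "A", "price": 4_500_250_000_000, "size": 2},
    {"action": "M", "order_id": 2, "side": "B", "price": 4_500_000_000_000, "size": 1},
    {"action": "C", "order_id": 1, "side": "B", "price": 4_500_000_000_000, "size": 5},
]
orders = {}
for event in events:
    apply_event(orders, event)

bids = aggregate_levels(orders, "B")
asks = aggregate_levels(orders, "A")
print(bids, asks)
```

A production reconstruction would also need to handle partial cancels, book resets, and queue priority, none of which are modeled here.
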
## Definition Schema

Instrument metadata and definitions.

| Field | Type | Description |
|-------|------|-------------|
| `ts_recv` | uint64 | Receipt timestamp |
| `min_price_increment` | int64 | Minimum tick size |
| `display_factor` | int64 | Display factor for prices |
| `expiration` | uint64 | Contract expiration timestamp |
| `activation` | uint64 | Contract activation timestamp |
| `high_limit_price` | int64 | Upper price limit |
| `low_limit_price` | int64 | Lower price limit |
| `max_price_variation` | int64 | Maximum price move |
| `trading_reference_price` | int64 | Reference price |
| `unit_of_measure_qty` | int64 | Contract size |
| `min_price_increment_amount` | int64 | Tick value |
| `price_ratio` | int64 | Price ratio |
| `inst_attrib_value` | int32 | Instrument attributes |
| `underlying_id` | uint32 | Underlying instrument ID |
| `raw_instrument_id` | uint32 | Raw instrument ID |
| `market_depth_implied` | int32 | Implied depth |
| `market_depth` | int32 | Market depth |
| `market_segment_id` | uint32 | Market segment |
| `max_trade_vol` | uint32 | Maximum trade volume |
| `min_lot_size` | int32 | Minimum lot size |
| `min_lot_size_block` | int32 | Block trade minimum |
| `min_lot_size_round_lot` | int32 | Round lot minimum |
| `min_trade_vol` | uint32 | Minimum trade volume |
| `contract_multiplier` | int32 | Contract multiplier |
| `decay_quantity` | int32 | Decay quantity |
| `original_contract_size` | int32 | Original size |
| `trading_reference_date` | uint16 | Reference date |
| `appl_id` | int16 | Application ID |
| `maturity_year` | uint16 | Year |
| `decay_start_date` | uint16 | Decay start |
| `channel_id` | uint16 | Channel |
| `currency` | string | Currency code |
| `settl_currency` | string | Settlement currency |
| `secsubtype` | string | Security subtype |
| `raw_symbol` | string | Raw symbol |
| `group` | string | Instrument group |
| `exchange` | string | Exchange code |
| `asset` | string | Asset class |
| `cfi` | string | CFI code |
| `security_type` | string | Security type |
| `unit_of_measure` | string | Unit of measure |
| `underlying` | string | Underlying symbol |
| `strike_price_currency` | string | Strike currency |
| `instrument_class` | char | Class |
| `strike_price` | int64 | Strike price (options) |
| `match_algorithm` | char | Matching algorithm |
| `md_security_trading_status` | uint8 | Trading status |
| `main_fraction` | uint8 | Main fraction |
| `price_display_format` | uint8 | Display format |
| `settl_price_type` | uint8 | Settlement type |
| `sub_fraction` | uint8 | Sub fraction |
| `underlying_product` | uint8 | Underlying product |
| `security_update_action` | char | Update action |
| `maturity_month` | uint8 | Month |
| `maturity_day` | uint8 | Day |
| `maturity_week` | uint8 | Week |
| `user_defined_instrument` | char | User-defined |
| `contract_multiplier_unit` | int8 | Multiplier unit |
| `flow_schedule_type` | int8 | Flow schedule |
| `tick_rule` | uint8 | Tick rule |

**When to Use:**
- Understanding instrument specifications
- Calculating tick values
- Contract expiration management
- Symbol resolution and mapping

**Key Fields for ES/NQ:**
- `min_price_increment`: Tick size (0.25 for both ES and NQ)
- `expiration`: Contract expiration timestamp
- `raw_symbol`: Exchange symbol
- `contract_multiplier`: Usually 50 for ES, 20 for NQ

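As a rough illustration of calculating tick values from these fields, the sketch below multiplies the tick size by the contract multiplier. It assumes `min_price_increment` uses the same 1e9 fixed-point scale as other int64 price fields and that `contract_multiplier` is populated with the values listed above; the definition dictionaries are hypothetical stand-ins for real records.

```python
PRICE_SCALE = 1_000_000_000  # assumed fixed-point scale for min_price_increment


def tick_value(definition):
    """Currency value of a one-tick move for one contract."""
    tick_size = definition["min_price_increment"] / PRICE_SCALE
    return tick_size * definition["contract_multiplier"]


# Hypothetical definition records using the ES/NQ figures from the list above.
es_definition = {"min_price_increment": 250_000_000, "contract_multiplier": 50}
nq_definition = {"min_price_increment": 250_000_000, "contract_multiplier": 20}

print(tick_value(es_definition))  # 0.25 * 50
print(tick_value(nq_definition))  # 0.25 * 20
```

Where `min_price_increment_amount` is populated, it may give the tick value directly, so the calculation above is best treated as a cross-check.
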
## Statistics Schema

Market statistics and calculated metrics.

| Field | Type | Description |
|-------|------|-------------|
| `ts_recv` | uint64 | Receipt timestamp |
| `ts_ref` | uint64 | Reference timestamp |
| `price` | int64 | Reference price |
| `quantity` | int64 | Reference quantity |
| `sequence` | uint32 | Sequence number |
| `ts_in_delta` | int32 | Time delta |
| `stat_type` | uint16 | Statistic type |
| `channel_id` | uint16 | Channel ID |
| `update_action` | uint8 | Update action |
| `stat_flags` | uint8 | Statistic flags |

**Common Statistic Types:**
- Opening price
- Settlement price
- High/low prices
- Trading volume
- Open interest

**When to Use:**
- Official settlement prices
- Open interest analysis
- Exchange-calculated statistics

## Status Schema

Instrument trading status and state changes.

| Field | Type | Description |
|-------|------|-------------|
| `ts_recv` | uint64 | Receipt timestamp |
| `ts_event` | uint64 | Event timestamp |
| `action` | uint16 | Status action |
| `reason` | uint16 | Status reason |
| `trading_event` | uint16 | Trading event |
| `is_trading` | int8 | Trading flag (1 = trading, 0 = not trading) |
| `is_quoting` | int8 | Quoting flag |
| `is_short_sell_restricted` | int8 | Short sell flag |

**When to Use:**
- Detecting trading halts
- Understanding market status changes
- Filtering data by trading status

## Imbalance Schema

Order imbalance data for auctions and closes.

| Field | Type | Description |
|-------|------|-------------|
| `ts_recv` | uint64 | Receipt timestamp |
| `ts_event` | uint64 | Event timestamp |
| `ref_price` | int64 | Reference price |
| `auction_time` | uint64 | Auction timestamp |
| `cont_book_clr_price` | int64 | Continuous book clearing price |
| `auct_interest_clr_price` | int64 | Auction interest clearing price |
| `paired_qty` | uint64 | Paired quantity |
| `total_imbalance_qty` | uint64 | Total imbalance |
| `side` | char | Imbalance side ('B' or 'A') |
| `significant_imbalance` | char | Significance flag |

**When to Use:**
- Opening/closing auction analysis
- Imbalance trading strategies
- End-of-day positioning

## Schema Selection Decision Matrix

| Analysis Type | Recommended Schema | Alternative |
|---------------|-------------------|-------------|
| Daily backtesting | ohlcv-1d | ohlcv-1h |
| Intraday backtesting | ohlcv-1h, ohlcv-1m | trades |
| Spread analysis | mbp-1 | trades |
| Order flow | trades | mbp-1 |
| Market depth | mbp-10 | mbo |
| Tick-by-tick | trades | mbo |
| Liquidity analysis | mbp-1, mbp-10 | mbo |
| Contract specifications | definition | - |
| Settlement prices | statistics | definition |
| Trading halts | status | - |
| Auction analysis | imbalance | trades |

## Data Type Reference

### Fixed-Point Prices
All price fields are stored as int64 in fixed-point notation with 9 decimal places of precision.

**Conversion:**
```python
decimal_price = int64_price / 1_000_000_000
```

**Example:**
- ES at 4500.25 → stored as 4500250000000
- NQ at 15000.50 → stored as 15000500000000

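Plain division yields a binary float, which is fine for most analysis but cannot represent every decimal price exactly. Where exactness matters, one option is Python's standard `decimal` module, sketched below. The `from_fixed` and `to_fixed` names are illustrative, not part of any Databento API.

```python
from decimal import Decimal

PRICE_SCALE = 1_000_000_000  # 9 decimal places of precision


def from_fixed(raw: int) -> Decimal:
    """Convert a fixed-point int64 price to an exact Decimal."""
    return Decimal(raw) / PRICE_SCALE


def to_fixed(price: str) -> int:
    """Convert a decimal price given as a string to fixed-point."""
    # Parsing from a string avoids binary float rounding on the way in.
    return int(Decimal(price) * PRICE_SCALE)


es_price = from_fixed(4_500_250_000_000)
nq_raw = to_fixed("15000.50")
print(es_price, nq_raw)
```

The round trip `to_fixed(str(from_fixed(raw)))` returns the original integer for the examples above, which makes the pair convenient for tests.
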
### Timestamps
All timestamps are uint64 nanoseconds since Unix epoch (1970-01-01 00:00:00 UTC).

**Conversion to datetime:**
```python
import datetime
dt = datetime.datetime.fromtimestamp(ts_event / 1_000_000_000, tz=datetime.timezone.utc)
```

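Note that `datetime` objects only resolve to microseconds, and dividing by a float can round the last digits of a nanosecond timestamp. When the full precision matters, one approach is to split the value with integer arithmetic, as in this sketch. The helper names and the sample timestamp are hypothetical.

```python
import datetime

NANOS_PER_SECOND = 1_000_000_000


def split_timestamp(ts_ns: int):
    """Split a nanosecond timestamp into (UTC datetime, leftover nanoseconds)."""
    seconds, nanos = divmod(ts_ns, NANOS_PER_SECOND)  # integer math, no rounding
    dt = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    return dt, nanos


def format_ns(ts_ns: int) -> str:
    """Render a nanosecond timestamp as an ISO 8601 string with 9 fractional digits."""
    dt, nanos = split_timestamp(ts_ns)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{nanos:09d}Z"


ts_event = 1_700_000_000_123_456_789  # hypothetical event timestamp
ts_recv = ts_event + 250_000          # hypothetical receipt 250 microseconds later

print(format_ns(ts_event))
print(ts_recv - ts_event)  # latency in nanoseconds, computed on the raw integers
```

Keeping timestamps as integers until display time also makes differences such as `ts_recv - ts_event` exact.
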
### Character Fields
Single-character fields (char) represent enums:
- Action: 'A' (add), 'C' (cancel), 'M' (modify), 'T' (trade), 'F' (fill)
- Side: 'B' (bid), 'A' (ask), 'N' (none/unknown)

## Performance Considerations

### Schema Size (Approximate bytes per record)

| Schema | Size (bytes) | Records/GB |
|--------|--------------|------------|
| ohlcv-1d | ~100 | ~10M |
| ohlcv-1h | ~100 | ~10M |
| trades | ~50 | ~20M |
| mbp-1 | ~150 | ~6.7M |
| mbp-10 | ~500 | ~2M |
| mbo | ~80 | ~12.5M |

**Planning requests:**
- 1 day of ES trades ≈ 100K-500K records ≈ 5-25 MB
- 1 day of ES mbp-1 ≈ 1M-5M records ≈ 150-750 MB
- 1 year of ES ohlcv-1h ≈ 6K records ≈ 600 KB

Use these estimates to decide between timeseries (< 5 GB) and batch downloads (> 5 GB).

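The planning arithmetic can be wrapped in a small helper. The sketch below reuses the approximate per-record sizes from the table and the 5 GB threshold stated above, with 1 GB taken as 1e9 bytes to match the Records/GB column. The function names are hypothetical, and the result is only as good as the record-count estimate passed in.

```python
# Approximate bytes per record, copied from the schema size table.
BYTES_PER_RECORD = {
    "ohlcv-1d": 100,
    "ohlcv-1h": 100,
    "trades": 50,
    "mbp-1": 150,
    "mbp-10": 500,
    "mbo": 80,
}
BATCH_THRESHOLD_BYTES = 5 * 1_000_000_000  # 5 GB


def estimate_bytes(schema: str, records: int) -> int:
    """Rough request size in bytes for a number of records."""
    return BYTES_PER_RECORD[schema] * records


def suggest_method(schema: str, records: int) -> str:
    """Return 'batch' above the 5 GB threshold, otherwise 'timeseries'."""
    if estimate_bytes(schema, records) > BATCH_THRESHOLD_BYTES:
        return "batch"
    return "timeseries"


# One busy day of ES trades versus roughly a year of ES mbp-1 at 5M records/day.
one_day_trades = suggest_method("trades", 500_000)
one_year_mbp1 = suggest_method("mbp-1", 250 * 5_000_000)
print(one_day_trades, one_year_mbp1)
```
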