Observation Endpoint - Statistical Data Queries
Purpose
The Observation API retrieves statistical observations—data points linking entities, variables, and specific dates. Examples include:
- "USA population in 2020"
- "California GDP over time"
- "Unemployment rate for all counties in a state"
Core Methods
1. fetch()
Primary method for retrieving observations with flexible entity specification.
Key Parameters:
- variable_dcids (required): List of statistical variable identifiers
- entity_dcids or entity_expression (required): Specify entities by ID or relation expression
- date (optional): Defaults to "latest". Accepts:
  - ISO-8601 format (e.g., "2020", "2020-01", "2020-01-15")
  - "all" for complete time series
  - "latest" for most recent data
- select (optional): Controls returned fields
  - Default: ["date", "entity", "variable", "value"]
  - Alternative: ["entity", "variable", "facet"] to check availability without data
- filter_facet_domains: Filter by data source domain
- filter_facet_ids: Filter by specific facet IDs
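The accepted date values can be checked client-side before a request is sent. The following is a minimal sketch: the helper name is_valid_date_arg is hypothetical and not part of datacommons_client, and the pattern only checks the three ISO-8601 shapes listed above (it does not verify that the month or day is in range).

```python
import re

# Hypothetical helper, not part of datacommons_client.
# Matches YYYY, YYYY-MM, or YYYY-MM-DD (shape only, no calendar validation).
_ISO_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def is_valid_date_arg(value: str) -> bool:
    """Return True for "latest", "all", or an ISO-8601-shaped date string."""
    return value in ("latest", "all") or _ISO_DATE.fullmatch(value) is not None
```

A caller could run this check on user input and fall back to "latest" when it fails.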
Response Structure: Data organized hierarchically by variable → entity, with metadata about "facets" (data sources) including:
- Provenance URLs
- Measurement methods
- Observation periods
- Import names
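To make the variable → entity nesting concrete, the sketch below walks a hand-written dictionary shaped like the JSON returned by the Data Commons V2 observation endpoint. The key names (byVariable, byEntity, orderedFacets, facets, and the facet fields) are assumptions based on that response format, and the facet ID, import name, URL, and value are placeholders, not real data.

```python
# Hand-written sample: key names are assumed, all values are placeholders.
sample = {
    "byVariable": {
        "Count_Person": {
            "byEntity": {
                "geoId/06": {
                    "orderedFacets": [
                        {
                            "facetId": "facet-1",
                            "observations": [{"date": "2020", "value": 100}],
                        }
                    ]
                }
            }
        }
    },
    "facets": {
        "facet-1": {
            "importName": "ExampleImport",
            "provenanceUrl": "https://example.org/source",
        }
    },
}


def iter_observations(payload):
    """Yield (variable, entity, date, value, import_name) tuples."""
    facets = payload.get("facets", {})
    for variable, var_data in payload.get("byVariable", {}).items():
        for entity, ent_data in var_data.get("byEntity", {}).items():
            for facet in ent_data.get("orderedFacets", []):
                # Look up the facet's metadata in the top-level "facets" map.
                meta = facets.get(facet.get("facetId"), {})
                for obs in facet.get("observations", []):
                    yield (variable, entity, obs["date"], obs["value"],
                           meta.get("importName"))


rows = list(iter_observations(sample))
```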
Example Usage:
from datacommons_client import DataCommonsClient
client = DataCommonsClient()
# Get latest population for multiple entities
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],  # California and Texas
    date="latest"
)
# Get complete time series
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["country/USA"],
    date="all"
)
# Use relation expressions to query hierarchies
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
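The relation expression in the last call is an ordinary string, so it can be assembled from a parent DCID and a child type. Below is a minimal sketch; the helper name children_of_type is hypothetical and not part of datacommons_client. As best understood here, `<-` follows incoming containedInPlace links, `+` makes the traversal transitive, and the braces filter by type.

```python
def children_of_type(parent_dcid: str, child_type: str) -> str:
    """Build a containedInPlace relation expression (hypothetical helper)."""
    # Doubled braces emit literal "{" and "}" inside an f-string.
    return f"{parent_dcid}<-containedInPlace+{{typeOf:{child_type}}}"


expression = children_of_type("geoId/06", "County")
```

The resulting string can be passed as the entity_expression argument of fetch().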
2. fetch_available_statistical_variables()
Discovers which statistical variables contain data for given entities.
Input: Entity DCIDs only
Output: Dictionary of available variables organized by entity
Example Usage:
# Check what variables are available for California
available = client.observation.fetch_available_statistical_variables(
    entity_dcids=["geoId/06"]
)
3. fetch_observations_by_entity_dcid()
Explicit method targeting specific entities by DCID (functionally equivalent to fetch() with entity_dcids).
4. fetch_observations_by_entity_type()
Retrieves observations for multiple entities grouped by parent and type—useful for querying all countries in a region or all counties within a state.
Parameters:
- parent_entity: Parent entity DCID
- entity_type: Type of child entities
- variable_dcids: Statistical variables to query
- date: Time specification
- select and filter options
Example Usage:
# Get population for all counties in California
response = client.observation.fetch_observations_by_entity_type(
    parent_entity="geoId/06",
    entity_type="County",
    variable_dcids=["Count_Person"],
    date="2020"
)
Response Object Methods
All response objects support:
- to_json(): Format as JSON string
- to_dict(): Return as dictionary
- get_data_by_entity(): Reorganize by entity instead of variable
- to_observations_as_records(): Flatten into individual records
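The idea behind reorganizing by entity instead of variable can be illustrated on a plain dictionary. The sketch below is not the library's implementation: it inverts a simplified variable → entity mapping into entity → variable, and the numbers are placeholders.

```python
def regroup_by_entity(by_variable):
    """Invert {variable: {entity: data}} into {entity: {variable: data}}."""
    by_entity = {}
    for variable, entities in by_variable.items():
        for entity, data in entities.items():
            by_entity.setdefault(entity, {})[variable] = data
    return by_entity


# Simplified stand-in for a response; values are placeholders.
by_variable = {
    "Count_Person": {"geoId/06": 100, "geoId/48": 200},
    "UnemploymentRate_Person": {"geoId/06": 5.0},
}
by_entity = regroup_by_entity(by_variable)
```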
Common Use Cases
Use Case 1: Check Data Availability Before Querying
Use select=["entity", "variable"] to confirm entities have observations without retrieving actual data:
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06"],
    select=["entity", "variable"]
)
Use Case 2: Access Complete Time Series
Request date="all" to obtain complete historical observations for trend analysis:
response = client.observation.fetch(
    variable_dcids=["Count_Person", "UnemploymentRate_Person"],
    entity_dcids=["country/USA"],
    date="all"
)
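Once a full time series is flattened into records, a simple trend measure such as period-over-period percent change can be computed directly. The sketch below assumes each record is a dictionary with the default fields date, entity, variable, and value; the helper percent_changes is hypothetical and the sample values are placeholders, not real population figures.

```python
def percent_changes(records):
    """Return (date, percent change vs. previous record) pairs, sorted by date."""
    # ISO-8601 strings of the same granularity sort chronologically as text.
    ordered = sorted(records, key=lambda r: r["date"])
    return [
        (curr["date"],
         round((curr["value"] - prev["value"]) / prev["value"] * 100, 2))
        for prev, curr in zip(ordered, ordered[1:])
    ]


# Placeholder records for a single entity and variable.
records = [
    {"date": "2021", "entity": "country/USA", "variable": "Count_Person", "value": 121},
    {"date": "2019", "entity": "country/USA", "variable": "Count_Person", "value": 100},
    {"date": "2020", "entity": "country/USA", "variable": "Count_Person", "value": 110},
]
changes = percent_changes(records)
```

When a response mixes several variables or entities, the records would need to be grouped by those fields before applying the helper.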
Use Case 3: Filter by Data Source
Specify filter_facet_domains to retrieve data from specific sources for consistency:
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["country/USA"],
    filter_facet_domains=["census.gov"]
)
Use Case 4: Query Hierarchical Relationships
Use relation expressions to fetch observations for related entities:
# Get data for all counties within California
response = client.observation.fetch(
    variable_dcids=["MedianIncome_Household"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
Working with Pandas
The API integrates seamlessly with Pandas. Install with Pandas support:
pip install "datacommons-client[Pandas]"
Response objects can be converted to DataFrames for analysis:
import pandas as pd
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],
    date="all"
)
# to_observations_as_records() flattens the response into records;
# wrap them in a DataFrame for analysis
df = pd.DataFrame(response.to_observations_as_records())
# Columns include date, entity, variable, value
Important Notes
- facets represent data sources and include provenance metadata
- orderedFacets are sorted by reliability/recency
- Use relation expressions for complex graph queries
- The fetch() method is the most flexible; use it for most queries