# Observation Endpoint - Statistical Data Queries

## Purpose

The Observation API retrieves statistical observations: data points that link an entity, a variable, and a date. Examples include:

- "USA population in 2020"
- "California GDP over time"
- "Unemployment rate for all counties in a state"

## Core Methods

### 1. fetch()

Primary method for retrieving observations with flexible entity specification.

**Key Parameters:**

- `variable_dcids` (required): List of statistical variable identifiers
- `entity_dcids` or `entity_expression` (required): Specify entities by ID or by relation expression
- `date` (optional): Defaults to "latest". Accepts:
  - ISO-8601 format (e.g., "2020", "2020-01", "2020-01-15")
  - "all" for the complete time series
  - "latest" for the most recent data
- `select` (optional): Controls returned fields
  - Default: `["date", "entity", "variable", "value"]`
  - Alternative: `["entity", "variable", "facet"]` to check availability without returning data
- `filter_facet_domains`: Filter by data source domain
- `filter_facet_ids`: Filter by specific facet IDs

**Response Structure:**

Data is organized hierarchically by variable → entity, with metadata about "facets" (data sources) including:

- Provenance URLs
- Measurement methods
- Observation periods
- Import names

**Example Usage:**

```python
from datacommons_client import DataCommonsClient

# Pass api_key="..." here if your Data Commons instance requires a key
client = DataCommonsClient()

# Get latest population for multiple entities
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],  # California and Texas
    date="latest"
)

# Get complete time series
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["country/USA"],
    date="all"
)

# Use relation expressions to query hierarchies
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
```

### 2. fetch_available_statistical_variables()

Discovers which statistical variables have data for the given entities.

**Input:** Entity DCIDs only

**Output:** Dictionary of available variables organized by entity

**Example Usage:**

```python
# Check what variables are available for California
available = client.observation.fetch_available_statistical_variables(
    entity_dcids=["geoId/06"]
)
```

### 3. fetch_observations_by_entity_dcid()

Explicit method targeting specific entities by DCID (functionally equivalent to `fetch()` with `entity_dcids`).

### 4. fetch_observations_by_entity_type()

Retrieves observations for multiple entities grouped by parent and type, which is useful for querying all countries in a region or all counties within a state.

**Parameters:**

- `parent_entity`: Parent entity DCID
- `entity_type`: Type of child entities
- `variable_dcids`: Statistical variables to query
- `date`: Time specification
- `select` and filter options

**Example Usage:**

```python
# Get population for all counties in California
response = client.observation.fetch_observations_by_entity_type(
    parent_entity="geoId/06",
    entity_type="County",
    variable_dcids=["Count_Person"],
    date="2020"
)
```

## Response Object Methods

All response objects support:

- `to_json()`: Format as JSON string
- `to_dict()`: Return as dictionary
- `get_data_by_entity()`: Reorganize by entity instead of variable
- `to_observations_as_records()`: Flatten into individual records

## Common Use Cases

### Use Case 1: Check Data Availability Before Querying

Use `select=["entity", "variable"]` to confirm that entities have observations without retrieving the actual data:

```python
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06"],
    select=["entity", "variable"]
)
```

### Use Case 2: Access Complete Time Series

Request `date="all"` to obtain the complete historical observations for trend analysis:

```python
response = client.observation.fetch(
    variable_dcids=["Count_Person", "UnemploymentRate_Person"],
    entity_dcids=["country/USA"],
    date="all"
)
```

### Use Case 3: Filter by Data Source

Specify `filter_facet_domains` to restrict results to specific sources, which keeps a series consistent:

```python
response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["country/USA"],
    filter_facet_domains=["census.gov"]
)
```

### Use Case 4: Query Hierarchical Relationships

Use relation expressions to fetch observations for related entities:

```python
# Get data for all counties within California
response = client.observation.fetch(
    variable_dcids=["MedianIncome_Household"],
    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
    date="2020"
)
```

## Working with Pandas

Observation results can be loaded into Pandas for analysis. Install the client with Pandas support:

```bash
pip install "datacommons-client[Pandas]"
```

`to_observations_as_records()` flattens a response into individual records, which can then be passed to a DataFrame:

```python
import pandas as pd

response = client.observation.fetch(
    variable_dcids=["Count_Person"],
    entity_dcids=["geoId/06", "geoId/48"],
    date="all"
)

# Flatten the response into records, then build a DataFrame from them
df = pd.DataFrame(response.to_observations_as_records())
# Columns include date, entity, variable, and value
```

## Important Notes

- **facets** represent data sources and include provenance metadata
- **orderedFacets** are sorted by reliability/recency
- Use relation expressions for complex graph queries
- The `fetch()` method is the most flexible; use it for most queries
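
The variable → entity → facet hierarchy described under Response Structure can also be walked by hand. The following is a minimal offline sketch, not the client's own implementation. It assumes the dictionary form of a response uses the key names `byVariable`, `byEntity`, `orderedFacets`, `facetId`, `observations`, and a top-level `facets` map; those key names, the `flatten_observations` helper, and every value in `SAMPLE_PAYLOAD` are assumptions made for illustration and should be checked against a real response.

```python
from typing import Any, Dict, List


def flatten_observations(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a variable -> entity -> facet -> observation mapping into records.

    Hypothetical helper: the key names below are assumed, not confirmed.
    """
    facets = payload.get("facets", {})
    records: List[Dict[str, Any]] = []
    for variable, variable_data in payload.get("byVariable", {}).items():
        for entity, entity_data in variable_data.get("byEntity", {}).items():
            for facet in entity_data.get("orderedFacets", []):
                facet_id = facet.get("facetId")
                # Look up the source metadata for this facet, if any is present.
                import_name = facets.get(facet_id, {}).get("importName")
                for obs in facet.get("observations", []):
                    records.append({
                        "date": obs.get("date"),
                        "entity": entity,
                        "variable": variable,
                        "value": obs.get("value"),
                        "facetId": facet_id,
                        "importName": import_name,
                    })
    return records


# Placeholder payload: facet ID, import name, and values are made up.
SAMPLE_PAYLOAD = {
    "byVariable": {
        "Count_Person": {
            "byEntity": {
                "geoId/06": {
                    "orderedFacets": [
                        {
                            "facetId": "facet-a",
                            "observations": [
                                {"date": "2019", "value": 100},
                                {"date": "2020", "value": 110},
                            ],
                        }
                    ]
                },
                "geoId/48": {
                    "orderedFacets": [
                        {
                            "facetId": "facet-a",
                            "observations": [{"date": "2020", "value": 200}],
                        }
                    ]
                },
            }
        }
    },
    "facets": {"facet-a": {"importName": "ExampleImport"}},
}

records = flatten_observations(SAMPLE_PAYLOAD)
for record in records:
    print(record)
```

Each record carries its facet ID next to the value, so rows that come from different sources stay distinguishable after flattening. For real responses, the built-in `to_observations_as_records()` is the more direct route.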