Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/datacommons-client/references/observation.md
+++ b/skills/datacommons-client/references/observation.md
@@ -0,0 +1,185 @@
+# Observation Endpoint - Statistical Data Queries
+
+## Purpose
+
+The Observation API retrieves statistical observations—data points linking entities, variables, and specific dates. Examples include:
+- "USA population in 2020"
+- "California GDP over time"
+- "Unemployment rate for all counties in a state"
+
+## Core Methods
+
+### 1. fetch()
+
+Primary method for retrieving observations with flexible entity specification.
+
+**Key Parameters:**
+- `variable_dcids` (required): List of statistical variable identifiers
+- `entity_dcids` or `entity_expression` (required): Specify entities by ID or relation expression
+- `date` (optional): Defaults to "latest". Accepts:
+  - ISO-8601 format (e.g., "2020", "2020-01", "2020-01-15")
+  - "all" for complete time series
+  - "latest" for most recent data
+- `select` (optional): Controls returned fields
+  - Default: `["date", "entity", "variable", "value"]`
+  - Alternative: `["entity", "variable", "facet"]` to check availability without data
+- `filter_facet_domains`: Filter by data source domain
+- `filter_facet_ids`: Filter by specific facet IDs
+
+**Response Structure:**
+Data organized hierarchically by variable → entity, with metadata about "facets" (data sources) including:
+- Provenance URLs
+- Measurement methods
+- Observation periods
+- Import names
+
+**Example Usage:**
+```python
+from datacommons_client import DataCommonsClient
+
+client = DataCommonsClient()
+
+# Get latest population for multiple entities
+response = client.observation.fetch(
+    variable_dcids=["Count_Person"],
+    entity_dcids=["geoId/06", "geoId/48"],  # California and Texas
+    date="latest"
+)
+
+# Get complete time series
+response = client.observation.fetch(
+    variable_dcids=["Count_Person"],
+    entity_dcids=["country/USA"],
+    date="all"
+)
+
+# Use relation expressions to query hierarchies
+response = client.observation.fetch(
+    variable_dcids=["Count_Person"],
+    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
+    date="2020"
+)
+```
+
+### 2. fetch_available_statistical_variables()
+
+Discovers which statistical variables contain data for given entities.
+
+**Input:** Entity DCIDs only
+**Output:** Dictionary of available variables organized by entity
+
+**Example Usage:**
+```python
+# Check what variables are available for California
+available = client.observation.fetch_available_statistical_variables(
+    entity_dcids=["geoId/06"]
+)
+```
+
+### 3. fetch_observations_by_entity_dcid()
+
+Explicit method targeting specific entities by DCID (functionally equivalent to `fetch()` with entity_dcids).
+
+### 4. fetch_observations_by_entity_type()
+
+Retrieves observations for multiple entities grouped by parent and type—useful for querying all countries in a region or all counties within a state.
+
+**Parameters:**
+- `parent_entity`: Parent entity DCID
+- `entity_type`: Type of child entities
+- `variable_dcids`: Statistical variables to query
+- `date`: Time specification
+- `select` and filter options
+
+**Example Usage:**
+```python
+# Get population for all counties in California
+response = client.observation.fetch_observations_by_entity_type(
+    parent_entity="geoId/06",
+    entity_type="County",
+    variable_dcids=["Count_Person"],
+    date="2020"
+)
+```
+
+## Response Object Methods
+
+All response objects support:
+- `to_json()`: Format as JSON string
+- `to_dict()`: Return as dictionary
+- `get_data_by_entity()`: Reorganize by entity instead of variable
+- `to_observations_as_records()`: Flatten into individual records
+
+## Common Use Cases
+
+### Use Case 1: Check Data Availability Before Querying
+
+Use `select=["entity", "variable"]` to confirm entities have observations without retrieving actual data:
+```python
+response = client.observation.fetch(
+    variable_dcids=["Count_Person"],
+    entity_dcids=["geoId/06"],
+    select=["entity", "variable"]
+)
+```
+
+### Use Case 2: Access Complete Time Series
+
+Request `date="all"` to obtain complete historical observations for trend analysis:
+```python
+response = client.observation.fetch(
+    variable_dcids=["Count_Person", "UnemploymentRate_Person"],
+    entity_dcids=["country/USA"],
+    date="all"
+)
+```
+
+### Use Case 3: Filter by Data Source
+
+Specify `filter_facet_domains` to retrieve data from specific sources for consistency:
+```python
+response = client.observation.fetch(
+    variable_dcids=["Count_Person"],
+    entity_dcids=["country/USA"],
+    filter_facet_domains=["census.gov"]
+)
+```
+
+### Use Case 4: Query Hierarchical Relationships
+
+Use relation expressions to fetch observations for related entities:
+```python
+# Get data for all counties within California
+response = client.observation.fetch(
+    variable_dcids=["MedianIncome_Household"],
+    entity_expression="geoId/06<-containedInPlace+{typeOf:County}",
+    date="2020"
+)
+```
+
+## Working with Pandas
+
+The API integrates seamlessly with Pandas. Install with Pandas support:
+```bash
+pip install "datacommons-client[Pandas]"
+```
+
+Response objects can be converted to DataFrames for analysis:
+```python
+response = client.observation.fetch(
+    variable_dcids=["Count_Person"],
+    entity_dcids=["geoId/06", "geoId/48"],
+    date="all"
+)
+
+# Convert to DataFrame
+df = response.to_observations_as_records()
+# Returns DataFrame with columns: date, entity, variable, value
+```
+
+## Important Notes
+
+- **facets** represent data sources and include provenance metadata
+- **orderedFacets** are sorted by reliability/recency
+- Use relation expressions for complex graph queries
+- The `fetch()` method is the most flexible—use it for most queries