Initial commit

Zhongwei Li
2025-11-29 18:32:40 +08:00
commit 0ea8352871
72 changed files with 30043 additions and 0 deletions


@@ -0,0 +1,412 @@
---
name: ga4-bigquery
description: Complete guide to GA4 BigQuery export including setup, schema documentation, SQL query patterns, and data analysis. Use when exporting GA4 data to BigQuery, writing SQL queries for GA4 data, analyzing event-level data, working with nested/repeated fields (UNNEST), or building custom reports from raw data. Covers BigQuery linking, events_* tables, SQL patterns, and performance optimization.
---
# GA4 BigQuery Export and Analysis
## Overview
GA4 BigQuery export provides raw, event-level data access for advanced analysis, custom reporting, machine learning, and long-term data warehousing.
## When to Use This Skill
Invoke this skill when:
- Exporting GA4 raw data to BigQuery
- Writing SQL queries for GA4 event data
- Analyzing unsampled event-level data
- Working with nested/repeated fields using UNNEST
- Building custom reports beyond GA4 UI limits
- Creating attribution models with raw data
- Performing user journey analysis across all events
- Integrating GA4 data with other data sources
- Building machine learning models on GA4 data
- Analyzing historical data beyond GA4 retention limits
- Optimizing BigQuery query performance
- Working with events_* table schema
- Extracting event parameters from nested structures
## Core Capabilities
### BigQuery Export Setup
**Requirements:**
- GA4 property (standard or 360)
- Google Cloud project
- BigQuery API enabled
- Editor (or higher) role on the GA4 property and Owner role on the Google Cloud project
**Setup Steps:**
1. **Create Google Cloud Project:**
- Go to console.cloud.google.com
- Create new project or select existing
- Enable BigQuery API
2. **Link GA4 to BigQuery:**
- GA4 Admin → Product Links → BigQuery Links
- Click "Link"
- Choose Google Cloud project
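- Select dataset location (US, EU, etc.); it cannot be changed after the link is created
- Configure export:
- **Daily:** One complete export per day (arrival time varies; there is no fixed hour)
- **Streaming:** Continuous export into intraday tables (available to standard and 360 properties; BigQuery streaming charges apply)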
- Click "Next"
- Confirm setup
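**Export Options:**
- **Daily Export:** Once per day, available to all properties with no GA4 fee (BigQuery storage and query charges still apply); standard properties are capped at 1 million events per day
- **Fresh Daily Export:** GA4 360 only
- **Streaming Export:** Near real-time; requires a billing-enabled Cloud project
- **Include advertising identifiers for mobile app streams:** Optional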
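**Data Availability:**
- Daily tables: Created once the day's data has been processed (usually within a day); a day's table can still receive late-arriving events afterwards
- Intraday tables: Exist only with streaming export; updated continuously and normally removed once the matching daily table is complete
- Streaming: Events typically appear within minutes of collection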
### BigQuery Table Structure
**Table Naming:**
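- `project.dataset.events_YYYYMMDD` - Daily export
- `project.dataset.events_intraday_YYYYMMDD` - Intraday (partial day, streaming export only)
- `project.dataset.events_*` - Wildcard over every shard, including `events_intraday_*` tables unless `_TABLE_SUFFIX` filters them out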
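Because `_TABLE_SUFFIX` is compared as a string, a digits-only range never matches intraday shards: `'intraday_...'` sorts after any date string. The sketch below shows this on a few made-up suffix values, so it needs no dataset. The commented filter is one way to add an intraday shard explicitly, assuming streaming export is on; if a daily and an intraday shard briefly exist for the same date, including both double-counts that day.

```sql
-- Made-up shard suffixes: two daily shards and one intraday shard whose date is inside the range
SELECT
suffix,
suffix BETWEEN '20250101' AND '20250131' AS matched_by_between,
STARTS_WITH(suffix, 'intraday_') AS is_intraday
FROM
UNNEST(['20250115', '20250131', 'intraday_20250120']) AS suffix
ORDER BY
suffix
-- To include a specific intraday shard as well (keep the parentheses when
-- other conditions follow):
-- WHERE (_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
-- OR _TABLE_SUFFIX = 'intraday_20250201')
```

Only the two daily suffixes report `matched_by_between = true`; the intraday one is excluded even though its date falls inside the range.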
**Key Schema Fields:**
**Event Fields:**
- `event_date`: YYYYMMDD format in the property's reporting time zone (STRING)
- `event_timestamp`: Microseconds since the Unix epoch, UTC (INTEGER)
- `event_name`: Event name (STRING)
- `event_params`: Event parameters (RECORD, REPEATED)
- `event_value_in_usd`: Event value in USD (FLOAT)
**User Fields:**
- `user_id`: User ID if set (STRING)
- `user_pseudo_id`: Anonymous user ID (STRING)
- `user_properties`: User properties (RECORD, REPEATED)
- `user_first_touch_timestamp`: First visit timestamp (INTEGER)
**Device Fields:**
- `device.category`: desktop, mobile, tablet
- `device.operating_system`: Windows, iOS, Android
- `device.browser`: Chrome, Safari, etc.
**Geo Fields:**
- `geo.country`: Country name
- `geo.region`: State/region
- `geo.city`: City name
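**Traffic Source Fields (first user acquisition):**
These describe the source that first acquired the user, not the source of the current session.
- `traffic_source.source`: Source (google, direct)
- `traffic_source.medium`: Medium (organic, cpc)
- `traffic_source.name`: Campaign name
- Event- and session-level source data lives in separate records (`collected_traffic_source`, and in newer exports `session_traffic_source_last_click`)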
**E-commerce Fields:**
- `ecommerce.transaction_id`: Transaction ID (STRING)
- `ecommerce.purchase_revenue_in_usd`: Purchase revenue (FLOAT)
- `items`: Items array (RECORD, REPEATED)
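`event_params` is an array of key/value structs (`user_properties` is similar), and each value is stored in exactly one typed field: `string_value`, `int_value`, `float_value` or `double_value`. The sketch below builds one made-up event with that shape inline, so it runs without an export table, and shows that reading the wrong typed field returns NULL rather than an error. The sample values are invented for illustration.

```sql
-- One made-up event shaped like the export's event_params column
WITH sample_events AS (
SELECT
'page_view' AS event_name,
ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64>>>[
('page_location', ('https://example.com/', NULL, NULL, NULL)),
('ga_session_id', (NULL, 1736900000, NULL, NULL)),
('engagement_time_msec', (NULL, 1500, NULL, NULL))
] AS event_params
)
SELECT
event_name,
-- Read the typed field that matches how the parameter was sent
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
-- Wrong typed field: returns NULL, not an error
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id_as_string
FROM
sample_events
```

Here `ga_session_id_as_string` comes back NULL. When a parameter's type is unknown, coalescing across the typed fields avoids silent NULLs.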
### Basic SQL Query Patterns
#### Query 1: Event Count by Name
```sql
SELECT
event_name,
COUNT(*) as event_count
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
event_name
ORDER BY
event_count DESC
```
#### Query 2: Extract Event Parameters
```sql
SELECT
event_date,
event_name,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') as page_title
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'page_view'
LIMIT 1000
```
#### Query 3: Purchase Analysis
```sql
SELECT
event_date,
COUNT(DISTINCT user_pseudo_id) as purchasers,
COUNT(DISTINCT ecommerce.transaction_id) as transactions,
SUM(ecommerce.purchase_revenue_in_usd) as total_revenue,
AVG(ecommerce.purchase_revenue_in_usd) as avg_order_value
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
AND ecommerce.transaction_id IS NOT NULL
GROUP BY
event_date
ORDER BY
event_date
```
#### Query 4: UNNEST Items Array
```sql
SELECT
event_date,
item.item_name,
item.item_category,
SUM(item.quantity) as total_quantity,
SUM(item.item_revenue_in_usd) as total_revenue
FROM
`project.dataset.events_*`,
UNNEST(items) as item
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
GROUP BY
event_date,
item.item_name,
item.item_category
ORDER BY
total_revenue DESC
```
### Advanced Query Patterns
#### User Journey Analysis
```sql
WITH user_events AS (
SELECT
user_pseudo_id,
event_timestamp,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX = '20250115'
)
SELECT
user_pseudo_id,
ARRAY_AGG(
STRUCT(event_name, page_location, event_timestamp)
ORDER BY event_timestamp
) as event_sequence
FROM
user_events
GROUP BY
user_pseudo_id
LIMIT 100
```
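#### Users and Sessions by First-User Source
`traffic_source` holds the source that first acquired each user, so this query groups every session by the user's original acquisition channel rather than by the session's own source.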
```sql
SELECT
event_date,
traffic_source.source,
traffic_source.medium,
traffic_source.name as campaign,
COUNT(DISTINCT user_pseudo_id) as users,
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) as sessions
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
event_date,
traffic_source.source,
traffic_source.medium,
traffic_source.name
ORDER BY
sessions DESC
```
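`ga_session_id` is the session start time in seconds, not a globally unique ID, so two users can share the same value; that is why the session count above pairs it with `user_pseudo_id`. The sketch below uses a hypothetical `SessionKey` helper (not part of the export) and three made-up rows to compare the two counts. The explicit `CAST` hands `CONCAT` a `STRING`, and the `'-'` separator keeps keys such as `'u1'` + `'11'` and `'u11'` + `'1'` apart.

```sql
-- Hypothetical helper: per-user session key; NULL when ga_session_id is missing
CREATE TEMP FUNCTION SessionKey(user_pseudo_id STRING, ga_session_id INT64)
RETURNS STRING
AS (
CONCAT(user_pseudo_id, '-', CAST(ga_session_id AS STRING))
);
-- Made-up rows: user_a and user_b happen to share a ga_session_id value
WITH sample_sessions AS (
SELECT * FROM UNNEST(ARRAY<STRUCT<user_pseudo_id STRING, ga_session_id INT64>>[
('user_a', 1736900000),
('user_b', 1736900000),
('user_a', 1736990000)
])
)
SELECT
COUNT(DISTINCT ga_session_id) AS naive_sessions,
COUNT(DISTINCT SessionKey(user_pseudo_id, ga_session_id)) AS sessions
FROM
sample_sessions
```

On these rows the naive count is 2 while the keyed count is 3, which matches the three distinct user/session pairs.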
#### Helper Functions
```sql
-- Create reusable functions for parameter extraction
CREATE TEMP FUNCTION GetParamString(params ANY TYPE, target_key STRING)
RETURNS STRING
AS (
(SELECT value.string_value FROM UNNEST(params) WHERE key = target_key)
);
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, target_key STRING)
RETURNS INT64
AS (
(SELECT value.int_value FROM UNNEST(params) WHERE key = target_key)
);
-- Use in query
SELECT
event_date,
GetParamString(event_params, 'page_location') as page_location,
GetParamInt(event_params, 'engagement_time_msec') as engagement_time
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
```
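The helpers above assume you know each parameter's type. Numeric parameters such as `value` may land in `int_value` or `double_value` depending on whether a whole number or a decimal was sent. A minimal sketch of a hypothetical `GetParamNumber` helper (the name and behavior are this example's, not GA4's) that coalesces the numeric fields, run on two made-up events covering both cases:

```sql
-- Hypothetical helper: numeric parameter from whichever typed field holds it
CREATE TEMP FUNCTION GetParamNumber(params ANY TYPE, target_key STRING)
RETURNS FLOAT64
AS (
(SELECT COALESCE(CAST(value.int_value AS FLOAT64), value.float_value, value.double_value)
FROM UNNEST(params) WHERE key = target_key)
);
-- Two made-up events: one sent `value` as an integer, the other as a decimal
WITH sample_events AS (
SELECT
'integer_value' AS label,
ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, float_value FLOAT64, double_value FLOAT64>>>[
('value', (20, NULL, NULL))
] AS event_params
UNION ALL
SELECT
'decimal_value',
ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, float_value FLOAT64, double_value FLOAT64>>>[
('value', (NULL, NULL, 19.99))
]
)
SELECT
label,
GetParamNumber(event_params, 'value') AS value
FROM
sample_events
```

It returns 20.0 for the integer event and 19.99 for the decimal one, so sums over `value` no longer drop whichever type a plain `GetParamInt` would miss.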
### Query Optimization
**Best Practices:**
1. **Use _TABLE_SUFFIX Filtering** (limits which daily shards are read; a filter on `event_date` alone still scans every shard):
```sql
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
```
NOT:
```sql
WHERE event_date BETWEEN '20250101' AND '20250131'
```
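2. **Filter Early on `event_name`:**
Narrowing events early keeps intermediate results small. It lowers bytes billed only if the table is clustered on that column, so check the table's details rather than assuming it: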
```sql
WHERE event_name IN ('page_view', 'purchase')
```
3. **Select Specific Columns:**
```sql
SELECT event_name, user_pseudo_id, event_timestamp
```
NOT:
```sql
SELECT *
```
4. **Extract Parameters with Scalar Subqueries:**
```sql
-- Good: scalar subquery keeps one row per event
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')
-- Avoid: UNNEST in FROM multiplies rows (one per parameter) before the filter
FROM table, UNNEST(event_params) as param
WHERE param.key = 'page_location'
```
5. **Use LIMIT to Keep Development Output Small:**
```sql
LIMIT 1000 -- Caps rows returned; does not reduce bytes scanned or billed
```
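Tip 4 is about row counts rather than bytes: joining `UNNEST(event_params)` in `FROM` turns each event into one row per parameter before `WHERE` runs, so counts over the join are inflated unless you filter down to a single key. The sketch shows the difference on one made-up event with three parameters.

```sql
-- One made-up event with three parameters
WITH sample_events AS (
SELECT
'page_view' AS event_name,
ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64>>>[
('page_location', ('https://example.com/', NULL)),
('page_title', ('Home', NULL)),
('ga_session_id', (NULL, 1736900000))
] AS event_params
)
SELECT
-- Comma join with UNNEST: one row per parameter
(SELECT COUNT(*) FROM sample_events, UNNEST(event_params) AS param) AS rows_after_from_unnest,
-- Scalar subqueries leave the event grain unchanged
(SELECT COUNT(*) FROM sample_events) AS rows_with_scalar_subqueries
```

It returns 3 and 1: the join triples the single event before any filtering.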
### Cost Management
**BigQuery Pricing:**
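- **Storage:** ~$0.02/GB/month for active storage (roughly half that for tables unchanged for 90 days)
- **Queries (on-demand):** ~$6.25/TiB scanned in the US multi-region (since July 2023; check current regional pricing)
- **Streaming export:** ~$0.05/GB ingested (any property using streaming export)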
**Reducing Costs:**
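- Restrict dates with _TABLE_SUFFIX (the export is sharded by date, not partitioned)
- Select only needed columns
- Use the free table preview instead of `SELECT *` with LIMIT for a first look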
- Create materialized views for frequent queries
- Set up cost alerts in Google Cloud
**Free Tier:**
- 10 GB storage free/month
- 1 TB queries free/month
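List prices change, so it helps to estimate cost from the bytes BigQuery actually billed. A minimal sketch, assuming on-demand pricing and jobs in the US multi-region (`region-us`): the hypothetical `EstimateOnDemandUsd` helper converts bytes to TiB and multiplies by a rate you pass in. The 6.25 USD/TiB used here is the US on-demand list price at the time of writing, not a constant to rely on, and the free tier is ignored, so treat the figures as an upper bound.

```sql
-- Hypothetical helper: bytes billed -> approximate on-demand cost (1 TiB = 2^40 bytes)
CREATE TEMP FUNCTION EstimateOnDemandUsd(bytes_billed INT64, usd_per_tib FLOAT64)
RETURNS FLOAT64
AS (
bytes_billed / POW(2, 40) * usd_per_tib
);
-- Your own query jobs from the last 7 days, most expensive first
SELECT
job_id,
creation_time,
total_bytes_billed,
ROUND(EstimateOnDemandUsd(total_bytes_billed, 6.25), 4) AS approx_usd
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND job_type = 'QUERY'
ORDER BY
total_bytes_billed DESC
LIMIT 20
```

Change `region-us` to the region your datasets live in; cached queries bill zero bytes and show up as 0.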
### Data Retention
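**Retention:**
- GA4 data retention (2 or 14 months for standard properties, set in Admin) applies to Explorations in the GA4 UI, not to exported tables
- BigQuery: Tables are kept until deleted or expired
- BigQuery sandbox (no billing account): Tables expire after 60 days
- Set table expiration if needed (optional)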
**Setting Expiration:**
```sql
ALTER TABLE `project.dataset.events_20250101`
SET OPTIONS (
expiration_timestamp=TIMESTAMP "2026-01-01 00:00:00 UTC"
)
```
### Common Use Cases
**1. Unsampled Reporting:**
- GA4 UI may sample large datasets
- BigQuery = full, unsampled data
- Use for accurate reporting
**2. Custom Attribution:**
- Access full user journey
- Build custom attribution models
- Credit touchpoints as needed
**3. Data Integration:**
- Join GA4 with CRM data
- Combine with product catalog
- Enrich with external sources
**4. Machine Learning:**
- Export to ML tools
- Predict churn, LTV, conversions
- Train custom models
**5. Long-term Analysis:**
- Historical analysis beyond GA4 limits
- Year-over-year comparisons
- Trend analysis
## Integration with Other Skills
- **ga4-setup** - Initial property setup before BigQuery export
- **ga4-recommended-events** - Event structure in BigQuery tables
- **ga4-custom-events** - Custom event parameters in BigQuery
- **ga4-custom-dimensions** - Custom dimensions in event_params
- **ga4-reporting** - Comparing BigQuery vs GA4 UI reports
- **ga4-measurement-protocol** - Server-side events in BigQuery
## References
- **references/bigquery-setup-complete.md** - Step-by-step BigQuery linking
- **references/schema-reference.md** - Complete table schema documentation
- **references/sql-patterns.md** - Common SQL query patterns and examples
- **references/optimization-guide.md** - Performance and cost optimization
## Quick Reference
**Table Names:**
- Daily: `events_YYYYMMDD`
- Intraday: `events_intraday_YYYYMMDD`
- Wildcard: `events_*`
**Filter by Date:**
```sql
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
```
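For scheduled or recurring reports, the suffix bounds can be computed at run time instead of hard-coded. The sketch previews the bounds for the last 7 complete days; `'America/New_York'` is a placeholder for your property's reporting time zone (`CURRENT_DATE()` without an argument uses UTC, which can shift the window by a day). Filters built from constant expressions like these generally still limit the shards scanned, but confirm with the query's bytes estimate.

```sql
-- Suffix bounds for the last 7 complete days.
-- 'America/New_York' is a placeholder: use the property's reporting time zone.
SELECT
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE('America/New_York'), INTERVAL 7 DAY)) AS start_suffix,
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE('America/New_York'), INTERVAL 1 DAY)) AS end_suffix;
-- The same expressions used directly as the date filter:
-- WHERE _TABLE_SUFFIX BETWEEN
-- FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE('America/New_York'), INTERVAL 7 DAY))
-- AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE('America/New_York'), INTERVAL 1 DAY))
```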
**Extract Parameter:**
```sql
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'param_name')
```
**UNNEST Items:**
```sql
FROM table, UNNEST(items) as item
```
**Costs:**
- Storage: ~$0.02/GB/month (active)
- Queries: ~$6.25/TiB scanned (on-demand, US multi-region)


@@ -0,0 +1,525 @@
# GA4 BigQuery SQL Query Cookbook
## Helper Functions (Use at Start of Queries)
```sql
-- String parameter extraction
-- (argument named target_key: calling it `key` turns the filter into `key = key`,
-- which matches every entry instead of the requested one)
CREATE TEMP FUNCTION GetParam(params ANY TYPE, target_key STRING)
RETURNS STRING AS (
(SELECT value.string_value FROM UNNEST(params) WHERE key = target_key)
);
-- Integer parameter extraction
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, target_key STRING)
RETURNS INT64 AS (
(SELECT value.int_value FROM UNNEST(params) WHERE key = target_key)
);
-- Float parameter extraction (decimals are normally exported in double_value)
CREATE TEMP FUNCTION GetParamFloat(params ANY TYPE, target_key STRING)
RETURNS FLOAT64 AS (
(SELECT COALESCE(value.double_value, value.float_value) FROM UNNEST(params) WHERE key = target_key)
);
-- Get any parameter type (returns as string)
CREATE TEMP FUNCTION GetParamAny(params ANY TYPE, target_key STRING)
RETURNS STRING AS (
(SELECT COALESCE(
value.string_value,
CAST(value.int_value AS STRING),
CAST(value.float_value AS STRING),
CAST(value.double_value AS STRING)
) FROM UNNEST(params) WHERE key = target_key)
);
```
## Basic Queries
### 1. Daily Active Users
```sql
SELECT
event_date,
COUNT(DISTINCT user_pseudo_id) as active_users
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
event_date
ORDER BY
event_date
```
### 2. Top Pages by Views
```sql
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
COUNT(*) as page_views,
COUNT(DISTINCT user_pseudo_id) as unique_users
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'page_view'
GROUP BY
page_location
ORDER BY
page_views DESC
LIMIT 20
```
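### 3. Session Count by First-User Source/Medium
`traffic_source` is the user's first acquisition source, so sessions are grouped by how each user was originally acquired.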
```sql
SELECT
traffic_source.source,
traffic_source.medium,
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
)) as sessions,
COUNT(DISTINCT user_pseudo_id) as users
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
traffic_source.source,
traffic_source.medium
ORDER BY
sessions DESC
```
## E-commerce Queries
### 4. Revenue by Date
```sql
SELECT
event_date,
COUNT(DISTINCT ecommerce.transaction_id) as transactions,
COUNT(DISTINCT user_pseudo_id) as purchasers,
SUM(ecommerce.purchase_revenue_in_usd) as revenue,
AVG(ecommerce.purchase_revenue_in_usd) as avg_order_value
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
AND ecommerce.transaction_id IS NOT NULL
GROUP BY
event_date
ORDER BY
event_date
```
### 5. Top Selling Products
```sql
SELECT
item.item_name,
item.item_category,
SUM(item.quantity) as units_sold,
SUM(item.item_revenue_in_usd) as total_revenue,
COUNT(DISTINCT ecommerce.transaction_id) as transactions
FROM
`project.dataset.events_*`,
UNNEST(items) as item
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
GROUP BY
item.item_name,
item.item_category
ORDER BY
total_revenue DESC
LIMIT 20
```
### 6. Conversion Funnel Analysis
```sql
WITH funnel AS (
SELECT
user_pseudo_id,
MAX(IF(event_name = 'view_item_list', 1, 0)) as viewed_list,
MAX(IF(event_name = 'view_item', 1, 0)) as viewed_item,
MAX(IF(event_name = 'add_to_cart', 1, 0)) as added_cart,
MAX(IF(event_name = 'begin_checkout', 1, 0)) as began_checkout,
MAX(IF(event_name = 'purchase', 1, 0)) as purchased
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
user_pseudo_id
)
SELECT
SUM(viewed_list) as step1_viewed_list,
SUM(viewed_item) as step2_viewed_item,
SUM(added_cart) as step3_added_cart,
SUM(began_checkout) as step4_began_checkout,
SUM(purchased) as step5_purchased,
-- Conversion rates (steps are counted per user independently of event order;
-- SAFE_DIVIDE returns NULL instead of failing when a step has zero users)
ROUND(SAFE_DIVIDE(SUM(viewed_item), SUM(viewed_list)) * 100, 2) as pct_list_to_item,
ROUND(SAFE_DIVIDE(SUM(added_cart), SUM(viewed_item)) * 100, 2) as pct_item_to_cart,
ROUND(SAFE_DIVIDE(SUM(began_checkout), SUM(added_cart)) * 100, 2) as pct_cart_to_checkout,
ROUND(SAFE_DIVIDE(SUM(purchased), SUM(began_checkout)) * 100, 2) as pct_checkout_to_purchase,
ROUND(SAFE_DIVIDE(SUM(purchased), SUM(viewed_list)) * 100, 2) as overall_conversion_rate
FROM
funnel
```
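The funnel above credits a step to any user who fired the event in the period, even if it happened before the previous step. A minimal order-aware sketch on made-up rows requires each step to occur after the previous one; it is one way to do this, not the logic GA4's funnel exploration uses.

```sql
-- Made-up events: user_b adds to cart before viewing an item
WITH sample_events AS (
SELECT * FROM UNNEST(ARRAY<STRUCT<user_pseudo_id STRING, event_timestamp INT64, event_name STRING>>[
('user_a', 1, 'view_item'),
('user_a', 2, 'add_to_cart'),
('user_a', 3, 'purchase'),
('user_b', 5, 'add_to_cart'),
('user_b', 6, 'view_item'),
('user_c', 1, 'view_item')
])
),
step1 AS (
-- First view_item per user
SELECT user_pseudo_id, MIN(event_timestamp) AS t1
FROM sample_events
WHERE event_name = 'view_item'
GROUP BY user_pseudo_id
),
step2 AS (
-- First add_to_cart that happens after that view_item
SELECT s.user_pseudo_id, MIN(e.event_timestamp) AS t2
FROM step1 s
JOIN sample_events e
ON e.user_pseudo_id = s.user_pseudo_id
AND e.event_name = 'add_to_cart'
AND e.event_timestamp > s.t1
GROUP BY s.user_pseudo_id
),
step3 AS (
-- Any purchase after that add_to_cart
SELECT s.user_pseudo_id
FROM step2 s
JOIN sample_events e
ON e.user_pseudo_id = s.user_pseudo_id
AND e.event_name = 'purchase'
AND e.event_timestamp > s.t2
GROUP BY s.user_pseudo_id
)
SELECT
(SELECT COUNT(*) FROM step1) AS viewed_item,
(SELECT COUNT(*) FROM step2) AS added_cart_after_view,
(SELECT COUNT(*) FROM step3) AS purchased_after_cart
```

On this sample it returns 3, 1 and 1; the unordered version would also credit `user_b` at the cart step. On real data, replace `sample_events` with a query over `events_*` and a `_TABLE_SUFFIX` range.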
### 7. Cart Abandonment Rate
```sql
WITH daily_users AS (
-- One row per user per day, flagging which of the two events they fired
SELECT
event_date,
user_pseudo_id,
LOGICAL_OR(event_name = 'add_to_cart') as added_cart,
LOGICAL_OR(event_name = 'purchase') as purchased
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name IN ('add_to_cart', 'purchase')
GROUP BY
event_date,
user_pseudo_id
)
SELECT
event_date,
COUNTIF(added_cart) as users_added_cart,
COUNTIF(added_cart AND purchased) as users_added_and_purchased,
-- Share of cart adders with no purchase the same day (NULL on days with no add_to_cart)
ROUND((1 - SAFE_DIVIDE(COUNTIF(added_cart AND purchased), COUNTIF(added_cart))) * 100, 2) as abandonment_rate_pct
FROM
daily_users
GROUP BY
event_date
ORDER BY
event_date
```
## User Behavior Queries
### 8. User Journey (Event Sequence)
```sql
SELECT
user_pseudo_id,
ARRAY_AGG(
STRUCT(
event_timestamp,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page
)
ORDER BY event_timestamp
) as journey
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX = '20250115'
AND user_pseudo_id = 'USER_ID_HERE'
GROUP BY
user_pseudo_id
```
### 9. Session Duration Distribution
```sql
WITH sessions AS (
SELECT
CONCAT(user_pseudo_id, '-',
CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
) as session_id,
MAX(event_timestamp) - MIN(event_timestamp) as session_duration_micros
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
session_id
-- Drop events without ga_session_id, which would otherwise merge into one NULL "session"
HAVING
session_id IS NOT NULL
)
SELECT
CASE
WHEN session_duration_micros < 10000000 THEN '0-10 sec'
WHEN session_duration_micros < 30000000 THEN '10-30 sec'
WHEN session_duration_micros < 60000000 THEN '30-60 sec'
WHEN session_duration_micros < 180000000 THEN '1-3 min'
WHEN session_duration_micros < 600000000 THEN '3-10 min'
ELSE '10+ min'
END as duration_bucket,
COUNT(*) as session_count
FROM
sessions
GROUP BY
duration_bucket
ORDER BY
MIN(session_duration_micros)
```
### 10. New vs Returning Users
```sql
WITH first_visits AS (
-- Earliest date each user was seen in the lookback window
-- (users whose earlier visits predate the window are counted as new)
SELECT
user_pseudo_id,
MIN(event_date) as first_visit_date
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20250131'
GROUP BY
user_pseudo_id
)
SELECT
e.event_date,
-- New: active on the date they were first seen
COUNT(DISTINCT IF(e.event_date = fv.first_visit_date, e.user_pseudo_id, NULL)) as new_users,
-- Returning: active on a later date than their first visit
COUNT(DISTINCT IF(e.event_date > fv.first_visit_date, e.user_pseudo_id, NULL)) as returning_users
FROM
`project.dataset.events_*` e
JOIN
first_visits fv
ON
e.user_pseudo_id = fv.user_pseudo_id
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
e.event_date
ORDER BY
e.event_date
```
## Attribution Queries
### 11. First Touch Attribution
```sql
WITH first_touch AS (
SELECT
user_pseudo_id,
ARRAY_AGG(
STRUCT(
traffic_source.source,
traffic_source.medium,
traffic_source.name as campaign
)
ORDER BY event_timestamp LIMIT 1
)[OFFSET(0)] as first_source
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND traffic_source.source IS NOT NULL
GROUP BY
user_pseudo_id
),
purchases AS (
SELECT
user_pseudo_id,
COUNT(DISTINCT ecommerce.transaction_id) as purchases,
SUM(ecommerce.purchase_revenue_in_usd) as revenue
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
GROUP BY
user_pseudo_id
)
SELECT
ft.first_source.source,
ft.first_source.medium,
ft.first_source.campaign,
COUNT(DISTINCT ft.user_pseudo_id) as users,
SUM(p.purchases) as total_purchases,
SUM(p.revenue) as total_revenue
FROM
first_touch ft
LEFT JOIN
purchases p
ON
ft.user_pseudo_id = p.user_pseudo_id
GROUP BY
ft.first_source.source,
ft.first_source.medium,
ft.first_source.campaign
ORDER BY
total_revenue DESC
```
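### 12. Last Touch Attribution
`traffic_source` on a purchase event still holds the user's first acquisition source, so it cannot give last-click credit. This version reads the session-level `session_traffic_source_last_click` record instead; it was added to the export in 2024, so older shards do not carry it.
```sql
WITH last_touch AS (
SELECT
ecommerce.transaction_id,
ARRAY_AGG(
STRUCT(
session_traffic_source_last_click.cross_channel_campaign.source as source,
session_traffic_source_last_click.cross_channel_campaign.medium as medium
)
ORDER BY event_timestamp DESC LIMIT 1
)[OFFSET(0)] as last_source,
SUM(ecommerce.purchase_revenue_in_usd) as revenue
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'purchase'
AND ecommerce.transaction_id IS NOT NULL
GROUP BY
ecommerce.transaction_id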
)
SELECT
last_source.source,
last_source.medium,
COUNT(DISTINCT transaction_id) as conversions,
SUM(revenue) as total_revenue
FROM
last_touch
GROUP BY
last_source.source,
last_source.medium
ORDER BY
total_revenue DESC
```
## Device and Technology
### 13. Device Category Performance
```sql
SELECT
device.category as device_category,
COUNT(DISTINCT user_pseudo_id) as users,
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
)) as sessions,
COUNTIF(event_name = 'purchase') as purchases,
SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue_in_usd, 0)) as revenue,
-- Purchases per 100 sessions
ROUND(SAFE_DIVIDE(COUNTIF(event_name = 'purchase'), COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
))) * 100, 2) as conversion_rate_pct
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
device.category
ORDER BY
users DESC
```
### 14. Browser and OS Analysis
```sql
SELECT
device.browser,
device.operating_system,
COUNT(DISTINCT user_pseudo_id) as users,
AVG((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) / 1000 as avg_engagement_sec
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'user_engagement'
GROUP BY
device.browser,
device.operating_system
HAVING
users > 100
ORDER BY
users DESC
```
## Cohort and Retention
### 15. Weekly Cohort Retention
```sql
WITH cohorts AS (
-- Cohort = ISO week in which each user first appears in the queried range
-- (%G is the ISO year that pairs with the ISO week number %V)
SELECT
user_pseudo_id,
FORMAT_DATE('%G-W%V', PARSE_DATE('%Y%m%d', MIN(event_date))) as cohort_week
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
user_pseudo_id
),
cohort_sizes AS (
-- Size each cohort once, independently of later activity
SELECT
cohort_week,
COUNT(*) as cohort_size
FROM
cohorts
GROUP BY
cohort_week
),
activity AS (
SELECT DISTINCT
user_pseudo_id,
FORMAT_DATE('%G-W%V', PARSE_DATE('%Y%m%d', event_date)) as activity_week
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
)
SELECT
c.cohort_week,
a.activity_week,
cs.cohort_size,
COUNT(DISTINCT a.user_pseudo_id) as active_users,
ROUND(COUNT(DISTINCT a.user_pseudo_id) / cs.cohort_size * 100, 2) as retention_pct
FROM
cohorts c
JOIN
activity a
ON
c.user_pseudo_id = a.user_pseudo_id
JOIN
cohort_sizes cs
ON
c.cohort_week = cs.cohort_week
GROUP BY
c.cohort_week,
a.activity_week,
cs.cohort_size
ORDER BY
c.cohort_week,
a.activity_week
```
## Custom Dimensions and Parameters
### 16. Query Custom Event Parameters
```sql
SELECT
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'custom_parameter') as custom_value,
COUNT(*) as event_count
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name = 'custom_event'
GROUP BY
event_name,
custom_value
ORDER BY
event_count DESC
```
### 17. User Properties Analysis
```sql
SELECT
(SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'user_tier') as user_tier,
COUNT(DISTINCT user_pseudo_id) as users,
SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue_in_usd, 0)) as total_revenue,
SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue_in_usd, 0)) /
COUNT(DISTINCT user_pseudo_id) as revenue_per_user
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY
user_tier
ORDER BY
total_revenue DESC
```
## Performance Tips
1. **Always use _TABLE_SUFFIX filtering** (not event_date)
2. **Filter early on event_name** (lowers bytes billed only if the table is clustered on it)
3. **Select only needed columns**
4. **Use LIMIT during development** to keep output small (it does not reduce bytes scanned)
5. **Create helper functions** for repeated UNNEST operations
6. **Avoid `SELECT *`** unless necessary
7. **Use materialized views** for frequently run queries
8. **Monitor query costs** in BigQuery console