Initial commit
412
skills/ga4-bigquery/SKILL.md
Normal file
@@ -0,0 +1,412 @@
|
||||
---
|
||||
name: ga4-bigquery
|
||||
description: Complete guide to GA4 BigQuery export including setup, schema documentation, SQL query patterns, and data analysis. Use when exporting GA4 data to BigQuery, writing SQL queries for GA4 data, analyzing event-level data, working with nested/repeated fields (UNNEST), or building custom reports from raw data. Covers BigQuery linking, events_* tables, SQL patterns, and performance optimization.
|
||||
---
|
||||
|
||||
# GA4 BigQuery Export and Analysis
|
||||
|
||||
## Overview
|
||||
|
||||
GA4 BigQuery export provides raw, event-level data access for advanced analysis, custom reporting, machine learning, and long-term data warehousing.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Invoke this skill when:
|
||||
|
||||
- Exporting GA4 raw data to BigQuery
|
||||
- Writing SQL queries for GA4 event data
|
||||
- Analyzing unsampled event-level data
|
||||
- Working with nested/repeated fields using UNNEST
|
||||
- Building custom reports beyond GA4 UI limits
|
||||
- Creating attribution models with raw data
|
||||
- Performing user journey analysis across all events
|
||||
- Integrating GA4 data with other data sources
|
||||
- Building machine learning models on GA4 data
|
||||
- Analyzing historical data beyond GA4 retention limits
|
||||
- Optimizing BigQuery query performance
|
||||
- Working with events_* table schema
|
||||
- Extracting event parameters from nested structures
|
||||
|
||||
## Core Capabilities
|
||||
|
||||
### BigQuery Export Setup
|
||||
|
||||
**Requirements:**
|
||||
- GA4 property (standard or 360)
|
||||
- Google Cloud project
|
||||
- BigQuery API enabled
|
||||
- Appropriate permissions
|
||||
|
||||
**Setup Steps:**
|
||||
|
||||
1. **Create Google Cloud Project:**
|
||||
- Go to console.cloud.google.com
|
||||
- Create new project or select existing
|
||||
- Enable BigQuery API
|
||||
|
||||
2. **Link GA4 to BigQuery:**
|
||||
- GA4 Admin → Product Links → BigQuery Links
|
||||
- Click "Link"
|
||||
- Choose Google Cloud project
|
||||
- Select dataset location (US, EU, etc.)
|
||||
- Configure export:
|
||||
- **Daily:** Complete export once per day (~9AM property timezone)
|
||||
- **Streaming:** Continuous, near-real-time export (available to standard and 360 properties; BigQuery streaming charges apply)
|
||||
- Click "Next"
|
||||
- Confirm setup
|
||||
|
||||
**Export Options:**
|
||||
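- **Daily Export:** Once per day; no GA4 charge for standard properties, which are limited to 1 million events per day
- **Streaming Export:** Near real-time; available to standard and 360 properties, with BigQuery streaming charges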
|
||||
- **Include Advertising IDs:** Optional, for Ads integration
|
||||
|
||||
**Data Availability:**
|
||||
- Daily tables: ~24 hours after day ends
|
||||
- Intraday tables: populated continuously while streaming export is enabled, then removed once the daily table for that date is complete
- Streaming: minutes after event collection
|
||||
|
||||
### BigQuery Table Structure
|
||||
|
||||
**Table Naming:**
|
||||
- `project.dataset.events_YYYYMMDD` - Daily export
|
||||
- `project.dataset.events_intraday_YYYYMMDD` - Intraday (partial day)
|
||||
- `project.dataset.events_*` - Wildcard for all dates
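
One consequence of this naming is worth a sketch: the `events_*` wildcard also matches intraday tables, whose suffix is then `intraday_YYYYMMDD`. The query below is a minimal illustration, assuming the placeholder `project.dataset` and that streaming export has created an intraday table for the chosen date.

```sql
-- Daily tables: a BETWEEN on digit-only suffixes excludes 'intraday_...' suffixes,
-- because they sort after any string that starts with a digit.
SELECT
  'daily' AS source,
  COUNT(*) AS event_count
FROM
  `project.dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'

UNION ALL

-- Intraday tables: with this wildcard the suffix is just the date.
SELECT
  'intraday' AS source,
  COUNT(*) AS event_count
FROM
  `project.dataset.events_intraday_*`
WHERE
  _TABLE_SUFFIX = '20250131';
```

Keeping the two wildcards separate makes it harder to count the same day twice while both tables exist.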
|
||||
|
||||
**Key Schema Fields:**
|
||||
|
||||
**Event Fields:**
|
||||
- `event_date`: YYYYMMDD format (STRING)
|
||||
- `event_timestamp`: Microseconds since epoch (INTEGER)
|
||||
- `event_name`: Event name (STRING)
|
||||
- `event_params`: Event parameters (RECORD, REPEATED)
|
||||
- `event_value_in_usd`: Event value in USD (FLOAT)
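
Because `event_date` is a string and `event_timestamp` is an integer, most reports convert them first. The sketch below uses one inline sample row so it runs without a dataset; the sample values and the `America/New_York` time zone are assumptions for illustration only.

```sql
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_day,                 -- STRING -> DATE
  TIMESTAMP_MICROS(event_timestamp) AS event_time_utc,           -- INT64 microseconds -> TIMESTAMP
  DATETIME(TIMESTAMP_MICROS(event_timestamp), 'America/New_York') AS event_time_local
FROM (
  -- Hypothetical sample row standing in for an export table
  SELECT '20250115' AS event_date, 1736942400000000 AS event_timestamp
);
```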
|
||||
|
||||
**User Fields:**
|
||||
- `user_id`: User ID if set (STRING)
|
||||
- `user_pseudo_id`: Anonymous user ID (STRING)
|
||||
- `user_properties`: User properties (RECORD, REPEATED)
|
||||
- `user_first_touch_timestamp`: First visit timestamp (INTEGER)
|
||||
|
||||
**Device Fields:**
|
||||
- `device.category`: desktop, mobile, tablet
|
||||
- `device.operating_system`: Windows, iOS, Android
|
||||
- `device.browser`: Chrome, Safari, etc.
|
||||
|
||||
**Geo Fields:**
|
||||
- `geo.country`: Country name
|
||||
- `geo.region`: State/region
|
||||
- `geo.city`: City name
|
||||
|
||||
**Traffic Source Fields** (user-level: the source that first acquired the user, not the source of each session):
|
||||
- `traffic_source.source`: Source (google, direct)
|
||||
- `traffic_source.medium`: Medium (organic, cpc)
|
||||
- `traffic_source.name`: Campaign name
|
||||
|
||||
**E-commerce Fields:**
|
||||
- `ecommerce.transaction_id`: Transaction ID (STRING)
|
||||
- `ecommerce.purchase_revenue_in_usd`: Purchase revenue (FLOAT)
|
||||
- `items`: Items array (RECORD, REPEATED)
|
||||
|
||||
### Basic SQL Query Patterns
|
||||
|
||||
#### Query 1: Event Count by Name
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_name,
|
||||
COUNT(*) as event_count
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
event_name
|
||||
ORDER BY
|
||||
event_count DESC
|
||||
```
|
||||
|
||||
#### Query 2: Extract Event Parameters
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
event_name,
|
||||
user_pseudo_id,
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title') as page_title
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'page_view'
|
||||
LIMIT 1000
|
||||
```
|
||||
|
||||
#### Query 3: Purchase Analysis
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
COUNT(DISTINCT user_pseudo_id) as purchasers,
|
||||
COUNT(DISTINCT ecommerce.transaction_id) as transactions,
|
||||
SUM(ecommerce.purchase_revenue_in_usd) as total_revenue,
|
||||
AVG(ecommerce.purchase_revenue_in_usd) as avg_order_value
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'purchase'
|
||||
AND ecommerce.transaction_id IS NOT NULL
|
||||
GROUP BY
|
||||
event_date
|
||||
ORDER BY
|
||||
event_date
|
||||
```
|
||||
|
||||
#### Query 4: UNNEST Items Array
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
item.item_name,
|
||||
item.item_category,
|
||||
SUM(item.quantity) as total_quantity,
|
||||
SUM(item.item_revenue_in_usd) as total_revenue
|
||||
FROM
|
||||
`project.dataset.events_*`,
|
||||
UNNEST(items) as item
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'purchase'
|
||||
GROUP BY
|
||||
event_date,
|
||||
item.item_name,
|
||||
item.item_category
|
||||
ORDER BY
|
||||
total_revenue DESC
|
||||
```
|
||||
|
||||
### Advanced Query Patterns
|
||||
|
||||
#### User Journey Analysis
|
||||
|
||||
```sql
|
||||
WITH user_events AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
event_timestamp,
|
||||
event_name,
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX = '20250115'
|
||||
)
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
ARRAY_AGG(
|
||||
STRUCT(event_name, page_location, event_timestamp)
|
||||
ORDER BY event_timestamp
|
||||
) as event_sequence
|
||||
FROM
|
||||
user_events
|
||||
GROUP BY
|
||||
user_pseudo_id
|
||||
LIMIT 100
|
||||
```
|
||||
|
||||
#### Session Attribution
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
traffic_source.source,
|
||||
traffic_source.medium,
|
||||
traffic_source.name as campaign,
|
||||
COUNT(DISTINCT user_pseudo_id) as users,
|
||||
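-- ga_session_id is INT64, so cast it before CONCAT; the '-' keeps the two IDs from running together
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) as sessions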
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
event_date,
|
||||
traffic_source.source,
|
||||
traffic_source.medium,
|
||||
traffic_source.name
|
||||
ORDER BY
|
||||
sessions DESC
|
||||
```
|
||||
|
||||
#### Helper Functions
|
||||
|
||||
```sql
|
||||
-- Create reusable functions for parameter extraction
|
||||
CREATE TEMP FUNCTION GetParamString(params ANY TYPE, target_key STRING)
|
||||
RETURNS STRING
|
||||
AS (
|
||||
(SELECT value.string_value FROM UNNEST(params) WHERE key = target_key)
|
||||
);
|
||||
|
||||
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, target_key STRING)
|
||||
RETURNS INT64
|
||||
AS (
|
||||
(SELECT value.int_value FROM UNNEST(params) WHERE key = target_key)
|
||||
);
|
||||
|
||||
-- Use in query
|
||||
SELECT
|
||||
event_date,
|
||||
GetParamString(event_params, 'page_location') as page_location,
|
||||
GetParamInt(event_params, 'engagement_time_msec') as engagement_time
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
```
|
||||
|
||||
### Query Optimization
|
||||
|
||||
**Best Practices:**
|
||||
|
||||
1. **Use _TABLE_SUFFIX Filtering:**
|
||||
```sql
|
||||
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
```
|
||||
NOT:
|
||||
```sql
|
||||
WHERE event_date BETWEEN '20250101' AND '20250131'
|
||||
```
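
For scheduled reports, a fixed date range is usually replaced by a rolling window. The sketch below, with `project.dataset` as a placeholder, builds the suffix bounds from `CURRENT_DATE()`; it assumes the last seven complete days are wanted.

```sql
SELECT
  event_name,
  COUNT(*) AS event_count
FROM
  `project.dataset.events_*`
WHERE
  -- Suffix bounds are computed as YYYYMMDD strings to match the table names
  _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
  event_name
ORDER BY
  event_count DESC;
```

Checking the estimated bytes in the query validator before running is a reasonable way to confirm that only the intended tables are scanned.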
|
||||
|
||||
2. **Filter on Clustered Columns:**
|
||||
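GA4 export tables are described here as clustered by `event_name` and `event_timestamp`; confirm this in the table details before relying on it, since a filter on an unclustered column does not reduce bytes scanned: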
|
||||
```sql
|
||||
WHERE event_name IN ('page_view', 'purchase')
|
||||
```
|
||||
|
||||
3. **Select Specific Columns:**
|
||||
```sql
|
||||
SELECT event_name, user_pseudo_id, event_timestamp
|
||||
```
|
||||
NOT:
|
||||
```sql
|
||||
SELECT *
|
||||
```
|
||||
|
||||
4. **Limit UNNEST Operations:**
|
||||
```sql
|
||||
-- Good: inline UNNEST
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')
|
||||
|
||||
-- Avoid: full UNNEST in FROM
|
||||
FROM table, UNNEST(event_params) as param
|
||||
WHERE param.key = 'page_location'
|
||||
```
|
||||
|
||||
5. **Use LIMIT During Development:**
|
||||
```sql
|
||||
LIMIT 1000 -- Caps rows returned for a quick look; it does not reduce bytes scanned on these tables
|
||||
```
|
||||
|
||||
### Cost Management
|
||||
|
||||
**BigQuery Pricing:**
|
||||
- **Storage:** ~$0.02/GB/month (active storage)
- **Queries:** ~$6.25/TiB scanned on demand (list price; varies by region and over time)
- **Streaming inserts:** ~$0.05/GB (applies when streaming export is enabled)
|
||||
|
||||
**Reducing Costs:**
|
||||
- Restrict the date range with _TABLE_SUFFIX (the export tables are date-sharded)
- Select only needed columns
- Test against a single day's table; LIMIT alone does not reduce bytes scanned
|
||||
- Create materialized views for frequent queries
|
||||
- Set up cost alerts in Google Cloud
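
To see where query spend actually goes, the job metadata views can be queried directly. This is a sketch under stated assumptions: the dataset lives in the `US` multi-region (hence `region-us`), and the rate of 6.25 USD per TiB is an assumed on-demand price that should be replaced with the rate on your bill.

```sql
SELECT
  user_email,
  COUNT(*) AS query_count,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4), 3) AS tib_billed,
  -- 6.25 is an assumed USD price per TiB, not a value taken from the export
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS approx_cost_usd
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY
  user_email
ORDER BY
  tib_billed DESC;
```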
|
||||
|
||||
**Free Tier:**
|
||||
- 10 GB storage free/month
|
||||
- 1 TB queries free/month
|
||||
|
||||
### Data Retention
|
||||
|
||||
**Retention:**
- GA4 UI (standard property): event data kept for 2 or 14 months (Admin setting)
|
||||
- BigQuery: Unlimited (until manually deleted)
|
||||
- Set table expiration if needed (optional)
|
||||
|
||||
**Setting Expiration:**
|
||||
```sql
|
||||
ALTER TABLE `project.dataset.events_20250101`
|
||||
SET OPTIONS (
|
||||
expiration_timestamp=TIMESTAMP "2026-01-01 00:00:00 UTC"
|
||||
)
|
||||
```
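
Setting expiration table by table does not scale to daily shards. A dataset-level default is one alternative; the sketch below assumes the placeholder dataset `project.dataset` and an illustrative 425-day window. The default applies to tables created after the change, so existing tables keep their current settings.

```sql
ALTER SCHEMA `project.dataset`
SET OPTIONS (
  -- Assumed retention window; new daily tables expire this many days after creation
  default_table_expiration_days = 425
);
```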
|
||||
|
||||
### Common Use Cases
|
||||
|
||||
**1. Unsampled Reporting:**
|
||||
- GA4 UI may sample large datasets
|
||||
- BigQuery = full, unsampled data
|
||||
- Use for accurate reporting
|
||||
|
||||
**2. Custom Attribution:**
|
||||
- Access full user journey
|
||||
- Build custom attribution models
|
||||
- Credit touchpoints as needed
|
||||
|
||||
**3. Data Integration:**
|
||||
- Join GA4 with CRM data
|
||||
- Combine with product catalog
|
||||
- Enrich with external sources
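
As a rough sketch of such a join: the query below assumes a hypothetical CRM table `project.crm.customers` with `customer_id` and `plan_tier` columns, and assumes the site sets the GA4 `user_id` to that same customer ID. None of these names come from the export schema.

```sql
SELECT
  c.plan_tier,
  COUNT(DISTINCT e.user_id) AS known_users,
  COUNTIF(e.event_name = 'purchase') AS purchases
FROM
  `project.dataset.events_*` AS e
JOIN
  `project.crm.customers` AS c  -- hypothetical CRM table
ON
  e.user_id = c.customer_id
WHERE
  _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
  AND e.user_id IS NOT NULL     -- only signed-in traffic can be matched
GROUP BY
  c.plan_tier
ORDER BY
  purchases DESC;
```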
|
||||
|
||||
**4. Machine Learning:**
|
||||
- Export to ML tools
|
||||
- Predict churn, LTV, conversions
|
||||
- Train custom models
|
||||
|
||||
**5. Long-term Analysis:**
|
||||
- Historical analysis beyond GA4 limits
|
||||
- Year-over-year comparisons
|
||||
- Trend analysis
|
||||
|
||||
## Integration with Other Skills
|
||||
|
||||
- **ga4-setup** - Initial property setup before BigQuery export
|
||||
- **ga4-recommended-events** - Event structure in BigQuery tables
|
||||
- **ga4-custom-events** - Custom event parameters in BigQuery
|
||||
- **ga4-custom-dimensions** - Custom dimensions in event_params
|
||||
- **ga4-reporting** - Comparing BigQuery vs GA4 UI reports
|
||||
- **ga4-measurement-protocol** - Server-side events in BigQuery
|
||||
|
||||
## References
|
||||
|
||||
- **references/bigquery-setup-complete.md** - Step-by-step BigQuery linking
|
||||
- **references/schema-reference.md** - Complete table schema documentation
|
||||
- **references/sql-patterns.md** - Common SQL query patterns and examples
|
||||
- **references/optimization-guide.md** - Performance and cost optimization
|
||||
|
||||
## Quick Reference
|
||||
|
||||
**Table Names:**
|
||||
- Daily: `events_YYYYMMDD`
|
||||
- Intraday: `events_intraday_YYYYMMDD`
|
||||
- Wildcard: `events_*`
|
||||
|
||||
**Filter by Date:**
|
||||
```sql
|
||||
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
```
|
||||
|
||||
**Extract Parameter:**
|
||||
```sql
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'param_name')
|
||||
```
|
||||
|
||||
**UNNEST Items:**
|
||||
```sql
|
||||
FROM table, UNNEST(items) as item
|
||||
```
|
||||
|
||||
**Costs:**
|
||||
- Storage: ~$0.02/GB/month
- Queries: ~$6.25/TiB scanned (on-demand list price; check current pricing)
|
||||
525
skills/ga4-bigquery/references/sql-query-cookbook.md
Normal file
@@ -0,0 +1,525 @@
|
||||
# GA4 BigQuery SQL Query Cookbook
|
||||
|
||||
## Helper Functions (Use at Start of Queries)
|
||||
|
||||
```sql
|
||||
-- The argument is named target_key rather than key: inside the subquery, "key"
-- resolves to the event_params field, so "key = key" would match every row.

-- String parameter extraction
CREATE TEMP FUNCTION GetParam(params ANY TYPE, target_key STRING)
RETURNS STRING AS (
  (SELECT value.string_value FROM UNNEST(params) WHERE key = target_key)
);

-- Integer parameter extraction
CREATE TEMP FUNCTION GetParamInt(params ANY TYPE, target_key STRING)
RETURNS INT64 AS (
  (SELECT value.int_value FROM UNNEST(params) WHERE key = target_key)
);

-- Float parameter extraction (GA4 normally populates double_value; float_value is a fallback)
CREATE TEMP FUNCTION GetParamFloat(params ANY TYPE, target_key STRING)
RETURNS FLOAT64 AS (
  (SELECT COALESCE(value.double_value, value.float_value) FROM UNNEST(params) WHERE key = target_key)
);

-- Get any parameter type (returns as string)
CREATE TEMP FUNCTION GetParamAny(params ANY TYPE, target_key STRING)
RETURNS STRING AS (
  (SELECT COALESCE(
    value.string_value,
    CAST(value.int_value AS STRING),
    CAST(value.float_value AS STRING),
    CAST(value.double_value AS STRING)
  ) FROM UNNEST(params) WHERE key = target_key)
);
|
||||
```
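
A helper of this kind can be tried without touching an export table. The sketch below defines its own copy of a string helper (here called `GetParamString`, with the argument named `target_key` so it cannot be confused with the `key` field) and runs it over one inline row. The sample row and its values are made up, and its `value` struct carries only two of the real value fields.

```sql
CREATE TEMP FUNCTION GetParamString(params ANY TYPE, target_key STRING)
RETURNS STRING AS (
  (SELECT value.string_value FROM UNNEST(params) WHERE key = target_key)
);

WITH sample_events AS (
  -- Hypothetical row shaped like a simplified event_params array
  SELECT
    'page_view' AS event_name,
    [
      STRUCT('page_location' AS key,
             STRUCT('https://example.com/pricing' AS string_value,
                    CAST(NULL AS INT64) AS int_value) AS value),
      STRUCT('ga_session_id' AS key,
             STRUCT(CAST(NULL AS STRING) AS string_value,
                    1736942400 AS int_value) AS value)
    ] AS event_params
)
SELECT
  event_name,
  GetParamString(event_params, 'page_location') AS page_location
FROM
  sample_events;
```

Temporary functions live only for the duration of the script, so the `CREATE TEMP FUNCTION` statements have to be submitted together with the query that uses them.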
|
||||
|
||||
## Basic Queries
|
||||
|
||||
### 1. Daily Active Users
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
COUNT(DISTINCT user_pseudo_id) as active_users
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
event_date
|
||||
ORDER BY
|
||||
event_date
|
||||
```
|
||||
|
||||
### 2. Top Pages by Views
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
|
||||
COUNT(*) as page_views,
|
||||
COUNT(DISTINCT user_pseudo_id) as unique_users
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'page_view'
|
||||
GROUP BY
|
||||
page_location
|
||||
ORDER BY
|
||||
page_views DESC
|
||||
LIMIT 20
|
||||
```
|
||||
|
||||
### 3. Session Count by Source/Medium
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
traffic_source.source,
|
||||
traffic_source.medium,
|
||||
-- ga_session_id is INT64 and must be cast before CONCAT
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
)) as sessions,
|
||||
COUNT(DISTINCT user_pseudo_id) as users
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
traffic_source.source,
|
||||
traffic_source.medium
|
||||
ORDER BY
|
||||
sessions DESC
|
||||
```
|
||||
|
||||
## E-commerce Queries
|
||||
|
||||
### 4. Revenue by Date
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
COUNT(DISTINCT ecommerce.transaction_id) as transactions,
|
||||
COUNT(DISTINCT user_pseudo_id) as purchasers,
|
||||
SUM(ecommerce.purchase_revenue_in_usd) as revenue,
|
||||
AVG(ecommerce.purchase_revenue_in_usd) as avg_order_value
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'purchase'
|
||||
AND ecommerce.transaction_id IS NOT NULL
|
||||
GROUP BY
|
||||
event_date
|
||||
ORDER BY
|
||||
event_date
|
||||
```
|
||||
|
||||
### 5. Top Selling Products
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
item.item_name,
|
||||
item.item_category,
|
||||
SUM(item.quantity) as units_sold,
|
||||
SUM(item.item_revenue_in_usd) as total_revenue,
|
||||
COUNT(DISTINCT ecommerce.transaction_id) as transactions
|
||||
FROM
|
||||
`project.dataset.events_*`,
|
||||
UNNEST(items) as item
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'purchase'
|
||||
GROUP BY
|
||||
item.item_name,
|
||||
item.item_category
|
||||
ORDER BY
|
||||
total_revenue DESC
|
||||
LIMIT 20
|
||||
```
|
||||
|
||||
### 6. Conversion Funnel Analysis
|
||||
|
||||
```sql
|
||||
WITH funnel AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
MAX(IF(event_name = 'view_item_list', 1, 0)) as viewed_list,
|
||||
MAX(IF(event_name = 'view_item', 1, 0)) as viewed_item,
|
||||
MAX(IF(event_name = 'add_to_cart', 1, 0)) as added_cart,
|
||||
MAX(IF(event_name = 'begin_checkout', 1, 0)) as began_checkout,
|
||||
MAX(IF(event_name = 'purchase', 1, 0)) as purchased
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
user_pseudo_id
|
||||
)
|
||||
SELECT
|
||||
SUM(viewed_list) as step1_viewed_list,
|
||||
SUM(viewed_item) as step2_viewed_item,
|
||||
SUM(added_cart) as step3_added_cart,
|
||||
SUM(began_checkout) as step4_began_checkout,
|
||||
SUM(purchased) as step5_purchased,
|
||||
-- Conversion rates
|
||||
-- SAFE_DIVIDE returns NULL instead of failing when an earlier step has no users
ROUND(SAFE_DIVIDE(SUM(viewed_item), SUM(viewed_list)) * 100, 2) as pct_list_to_item,
ROUND(SAFE_DIVIDE(SUM(added_cart), SUM(viewed_item)) * 100, 2) as pct_item_to_cart,
ROUND(SAFE_DIVIDE(SUM(began_checkout), SUM(added_cart)) * 100, 2) as pct_cart_to_checkout,
ROUND(SAFE_DIVIDE(SUM(purchased), SUM(began_checkout)) * 100, 2) as pct_checkout_to_purchase,
ROUND(SAFE_DIVIDE(SUM(purchased), SUM(viewed_list)) * 100, 2) as overall_conversion_rate
|
||||
FROM
|
||||
funnel
|
||||
```
|
||||
|
||||
### 7. Cart Abandonment Rate
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_date,
|
||||
COUNT(DISTINCT IF(event_name = 'add_to_cart', user_pseudo_id, NULL)) as users_added_cart,
|
||||
COUNT(DISTINCT IF(event_name = 'purchase', user_pseudo_id, NULL)) as users_purchased,
|
||||
-- SAFE_DIVIDE avoids a division-by-zero error on days with no add_to_cart events
ROUND((1 - SAFE_DIVIDE(
  COUNT(DISTINCT IF(event_name = 'purchase', user_pseudo_id, NULL)),
  COUNT(DISTINCT IF(event_name = 'add_to_cart', user_pseudo_id, NULL)))) * 100, 2) as abandonment_rate_pct
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name IN ('add_to_cart', 'purchase')
|
||||
GROUP BY
|
||||
event_date
|
||||
ORDER BY
|
||||
event_date
|
||||
```
|
||||
|
||||
## User Behavior Queries
|
||||
|
||||
### 8. User Journey (Event Sequence)
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
ARRAY_AGG(
|
||||
STRUCT(
|
||||
event_timestamp,
|
||||
event_name,
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page
|
||||
)
|
||||
ORDER BY event_timestamp
|
||||
) as journey
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX = '20250115'
|
||||
AND user_pseudo_id = 'USER_ID_HERE'
|
||||
GROUP BY
|
||||
user_pseudo_id
|
||||
```
|
||||
|
||||
### 9. Session Duration Distribution
|
||||
|
||||
```sql
|
||||
WITH sessions AS (
|
||||
SELECT
|
||||
CONCAT(user_pseudo_id, '-',
  -- ga_session_id is INT64 and must be cast before CONCAT
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
) as session_id,
|
||||
MAX(event_timestamp) - MIN(event_timestamp) as session_duration_micros
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
session_id
|
||||
)
|
||||
SELECT
|
||||
CASE
|
||||
WHEN session_duration_micros < 10000000 THEN '0-10 sec'
|
||||
WHEN session_duration_micros < 30000000 THEN '10-30 sec'
|
||||
WHEN session_duration_micros < 60000000 THEN '30-60 sec'
|
||||
WHEN session_duration_micros < 180000000 THEN '1-3 min'
|
||||
WHEN session_duration_micros < 600000000 THEN '3-10 min'
|
||||
ELSE '10+ min'
|
||||
END as duration_bucket,
|
||||
COUNT(*) as session_count
|
||||
FROM
|
||||
sessions
|
||||
GROUP BY
|
||||
duration_bucket
|
||||
ORDER BY
|
||||
MIN(session_duration_micros)
|
||||
```
|
||||
|
||||
### 10. New vs Returning Users
|
||||
|
||||
```sql
|
||||
WITH first_visits AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
MIN(event_date) as first_visit_date
FROM
`project.dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20250131'
GROUP BY
user_pseudo_id
)
SELECT
event_date,
-- New: this is the first day the user appears in the lookback window
COUNT(DISTINCT IF(event_date = fv.first_visit_date, e.user_pseudo_id, NULL)) as new_users,
-- Returning: the user was first seen on an earlier day
-- (comparing days, not timestamps, keeps a first-day user from counting as both)
COUNT(DISTINCT IF(event_date > fv.first_visit_date, e.user_pseudo_id, NULL)) as returning_users
|
||||
FROM
|
||||
`project.dataset.events_*` e
|
||||
LEFT JOIN
|
||||
first_visits fv
|
||||
ON
|
||||
e.user_pseudo_id = fv.user_pseudo_id
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
event_date
|
||||
ORDER BY
|
||||
event_date
|
||||
```
|
||||
|
||||
## Attribution Queries
|
||||
|
||||
### 11. First Touch Attribution
|
||||
|
||||
```sql
|
||||
WITH first_touch AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
ARRAY_AGG(
|
||||
STRUCT(
|
||||
traffic_source.source,
|
||||
traffic_source.medium,
|
||||
traffic_source.name as campaign
|
||||
)
|
||||
ORDER BY event_timestamp LIMIT 1
|
||||
)[OFFSET(0)] as first_source
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND traffic_source.source IS NOT NULL
|
||||
GROUP BY
|
||||
user_pseudo_id
|
||||
),
|
||||
purchases AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
COUNT(DISTINCT ecommerce.transaction_id) as purchases,
|
||||
SUM(ecommerce.purchase_revenue_in_usd) as revenue
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'purchase'
|
||||
GROUP BY
|
||||
user_pseudo_id
|
||||
)
|
||||
SELECT
|
||||
ft.first_source.source,
|
||||
ft.first_source.medium,
|
||||
ft.first_source.campaign,
|
||||
COUNT(DISTINCT ft.user_pseudo_id) as users,
|
||||
SUM(p.purchases) as total_purchases,
|
||||
SUM(p.revenue) as total_revenue
|
||||
FROM
|
||||
first_touch ft
|
||||
LEFT JOIN
|
||||
purchases p
|
||||
ON
|
||||
ft.user_pseudo_id = p.user_pseudo_id
|
||||
GROUP BY
|
||||
ft.first_source.source,
|
||||
ft.first_source.medium,
|
||||
ft.first_source.campaign
|
||||
ORDER BY
|
||||
total_revenue DESC
|
||||
```
|
||||
|
||||
### 12. Last Touch Attribution
|
||||
|
||||
```sql
|
||||
WITH last_touch AS (
|
||||
SELECT
|
||||
ecommerce.transaction_id,
|
||||
ARRAY_AGG(
|
||||
STRUCT(
|
||||
traffic_source.source,
|
||||
traffic_source.medium
|
||||
)
|
||||
ORDER BY event_timestamp DESC LIMIT 1
|
||||
)[OFFSET(0)] as last_source,
|
||||
SUM(ecommerce.purchase_revenue_in_usd) as revenue
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'purchase'
|
||||
GROUP BY
|
||||
ecommerce.transaction_id
|
||||
)
|
||||
SELECT
|
||||
last_source.source,
|
||||
last_source.medium,
|
||||
COUNT(DISTINCT transaction_id) as conversions,
|
||||
SUM(revenue) as total_revenue
|
||||
FROM
|
||||
last_touch
|
||||
GROUP BY
|
||||
last_source.source,
|
||||
last_source.medium
|
||||
ORDER BY
|
||||
total_revenue DESC
|
||||
```
|
||||
|
||||
## Device and Technology
|
||||
|
||||
### 13. Device Category Performance
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
device.category as device_category,
|
||||
COUNT(DISTINCT user_pseudo_id) as users,
|
||||
-- ga_session_id is INT64 and must be cast before CONCAT
COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
)) as sessions,
COUNTIF(event_name = 'purchase') as purchases,
SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue_in_usd, 0)) as revenue,
-- SAFE_DIVIDE returns NULL instead of failing when a category has no sessions
ROUND(SAFE_DIVIDE(COUNTIF(event_name = 'purchase'), COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
))) * 100, 2) as conversion_rate_pct
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
device.category
|
||||
ORDER BY
|
||||
users DESC
|
||||
```
|
||||
|
||||
### 14. Browser and OS Analysis
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
device.browser,
|
||||
device.operating_system,
|
||||
COUNT(DISTINCT user_pseudo_id) as users,
|
||||
AVG((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) / 1000 as avg_engagement_sec
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'user_engagement'
|
||||
GROUP BY
|
||||
device.browser,
|
||||
device.operating_system
|
||||
HAVING
|
||||
users > 100
|
||||
ORDER BY
|
||||
users DESC
|
||||
```
|
||||
|
||||
## Cohort and Retention
|
||||
|
||||
### 15. Weekly Cohort Retention
|
||||
|
||||
```sql
|
||||
WITH cohorts AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
FORMAT_DATE('%G-W%V', PARSE_DATE('%Y%m%d', MIN(event_date))) as cohort_week -- %G is the ISO year that pairs with ISO week %V
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
user_pseudo_id
|
||||
),
|
||||
activity AS (
|
||||
SELECT
|
||||
user_pseudo_id,
|
||||
FORMAT_DATE('%G-W%V', PARSE_DATE('%Y%m%d', event_date)) as activity_week -- %G is the ISO year that pairs with ISO week %V
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
user_pseudo_id,
|
||||
activity_week
|
||||
),
-- Cohort size is computed once per cohort; counting it inside the joined
-- result would only count users active in each week and always give 100%.
cohort_sizes AS (
  SELECT
    cohort_week,
    COUNT(*) as cohort_size
  FROM
    cohorts
  GROUP BY
    cohort_week
)
SELECT
  c.cohort_week,
  a.activity_week,
  s.cohort_size,
  COUNT(DISTINCT a.user_pseudo_id) as active_users,
  ROUND(SAFE_DIVIDE(COUNT(DISTINCT a.user_pseudo_id), s.cohort_size) * 100, 2) as retention_pct
FROM
  cohorts c
JOIN
  cohort_sizes s
ON
  c.cohort_week = s.cohort_week
LEFT JOIN
  activity a
ON
  c.user_pseudo_id = a.user_pseudo_id
GROUP BY
  c.cohort_week,
  a.activity_week,
  s.cohort_size
ORDER BY
  c.cohort_week,
  a.activity_week
|
||||
```
|
||||
|
||||
## Custom Dimensions and Parameters
|
||||
|
||||
### 16. Query Custom Event Parameters
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
event_name,
|
||||
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'custom_parameter') as custom_value,
|
||||
COUNT(*) as event_count
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
AND event_name = 'custom_event'
|
||||
GROUP BY
|
||||
event_name,
|
||||
custom_value
|
||||
ORDER BY
|
||||
event_count DESC
|
||||
```
|
||||
|
||||
### 17. User Properties Analysis
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
(SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'user_tier') as user_tier,
|
||||
COUNT(DISTINCT user_pseudo_id) as users,
|
||||
SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue_in_usd, 0)) as total_revenue,
|
||||
SUM(IF(event_name = 'purchase', ecommerce.purchase_revenue_in_usd, 0)) /
|
||||
COUNT(DISTINCT user_pseudo_id) as revenue_per_user
|
||||
FROM
|
||||
`project.dataset.events_*`
|
||||
WHERE
|
||||
_TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
|
||||
GROUP BY
|
||||
user_tier
|
||||
ORDER BY
|
||||
total_revenue DESC
|
||||
```
|
||||
|
||||
## Performance Tips
|
||||
|
||||
1. **Always use _TABLE_SUFFIX filtering** (not event_date)
|
||||
2. **Filter on clustered columns** (event_name, event_timestamp)
|
||||
3. **Select only needed columns**
|
||||
4. **Test on a single day's table during development** (LIMIT alone does not reduce bytes scanned)
|
||||
5. **Create helper functions** for repeated UNNEST operations
|
||||
6. **Avoid `SELECT *`** unless necessary
|
||||
7. **Use materialized views** for frequently run queries
|
||||
8. **Monitor query costs** in BigQuery console
|
||||