Unlock GA4 User Acquisition Insights: A BigQuery Guide for Marketers
Understanding where your new users are coming from is crucial for optimizing your marketing efforts and growing your business. Google Analytics 4 (GA4) provides valuable user acquisition data, but sometimes you need a more granular and customizable view. That’s where BigQuery comes in. By exporting your GA4 data to BigQuery, you can leverage the power of SQL to create custom reports tailored to your specific needs. This post will guide you through creating a GA4 user acquisition report in BigQuery, providing you with actionable insights into your marketing performance.
Why Use BigQuery for GA4 User Acquisition Reporting?
While GA4’s interface offers pre-built reports, BigQuery unlocks a new level of flexibility and depth. Here’s why you should consider using BigQuery for your user acquisition analysis:
- Customization: Create reports that precisely match your business requirements, filtering and segmenting data in ways that aren’t possible in the standard GA4 interface.
- Scalability: Handle large datasets with ease, even as your website traffic grows. BigQuery is designed to process massive amounts of data quickly and efficiently.
- Integration: Combine GA4 data with other data sources, such as CRM data or advertising platform data, for a holistic view of your customer journey.
- Data Retention: Overcome GA4’s data retention limits by storing your data indefinitely in BigQuery.
- Advanced Analysis: Perform complex calculations and statistical analysis to uncover hidden patterns and trends in your user acquisition data.
Prerequisites
Before you begin, ensure you have the following:
- A Google Analytics 4 Property: You need a GA4 property set up and collecting data.
- BigQuery Export Enabled: You must have enabled the BigQuery Export feature in your GA4 property. This will automatically export your GA4 data to a BigQuery dataset.
- Google Cloud Project: You’ll need a Google Cloud project with BigQuery enabled.
- Basic SQL Knowledge: Familiarity with SQL is helpful for writing and understanding the queries in this guide.
Understanding the GA4 BigQuery Schema
GA4 data in BigQuery is stored in a nested and repeated format. Each day’s data is stored in a separate table named `events_YYYYMMDD`. Understanding the key fields is crucial for building your user acquisition report:
- `event_name`: The name of the event, such as `first_visit` (for new users) or `session_start`.
- `event_params`: An array of key-value pairs containing additional information about the event. You’ll find parameters like `medium`, `source`, and `campaign` here.
- `user_pseudo_id`: A pseudonymous identifier for each user.
- `event_timestamp`: The timestamp of the event in microseconds.
- `user_id`: A user identifier (if you have user-id enabled).
The SQL Query for User Acquisition Report
Here’s the SQL query to generate a user acquisition report in BigQuery. This query calculates key metrics like new users, engaged sessions, engagement time, and engagement rate, segmented by medium, source, and campaign.
WITH
UserAcquisition AS (
SELECT
FORMAT_DATE('%Y%m%d', PARSE_DATE('%Y%m%d', event_date)) AS event_date,
user_pseudo_id,
(FIRST_VALUE(event_timestamp) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ASC)) AS first_touch_timestamp
FROM
`your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20230101' AND '20230131' -- Date range
AND event_name = 'session_start' -- Consider only session_start events
),
FirstTouchAttribution AS (
SELECT
ua.event_date,
ua.user_pseudo_id,
COALESCE(ep_source.value.string_value, '(direct)') AS source,
COALESCE(ep_medium.value.string_value, '(none)') AS medium,
COALESCE(ep_campaign.value.string_value, '(none)') AS campaign
FROM
UserAcquisition ua
LEFT JOIN
`your_project.your_dataset.events_*` AS ga ON ua.user_pseudo_id = ga.user_pseudo_id
AND ua.first_touch_timestamp = ga.event_timestamp
LEFT JOIN
UNNEST(ga.event_params) AS ep_source ON ep_source.key = 'source'
LEFT JOIN
UNNEST(ga.event_params) AS ep_medium ON ep_medium.key = 'medium'
LEFT JOIN
UNNEST(ga.event_params) AS ep_campaign ON ep_campaign.key = 'campaign'
WHERE
ga._TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
AND ga.event_name = 'session_start'
),
EngagementData AS (
SELECT
FORMAT_DATE('%Y%m%d', PARSE_DATE('%Y%m%d', event_date)) AS event_date,
user_pseudo_id,
SUM(CASE WHEN event_name = 'user_engagement' THEN 1 ELSE 0 END) AS engaged_sessions,
SUM(CASE WHEN event_name = 'user_engagement' THEN (SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'engagement_time_msec') ELSE 0 END) AS total_engagement_time_msec
FROM
`your_project.your_dataset.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20230101' AND '20230131'
GROUP BY
1,2
)
SELECT
fta.event_date,
fta.medium,
fta.source,
fta.campaign,
COUNT(DISTINCT fta.user_pseudo_id) AS new_users,
SUM(ed.engaged_sessions) AS engaged_sessions,
SUM(ed.total_engagement_time_msec) / 60000 AS total_engagement_time_minutes,
SAFE_DIVIDE(SUM(ed.engaged_sessions), COUNT(DISTINCT fta.user_pseudo_id)) AS engagement_rate
FROM
FirstTouchAttribution fta
LEFT JOIN
EngagementData ed ON fta.user_pseudo_id = ed.user_pseudo_id AND fta.event_date = ed.event_date
GROUP BY
1,2,3,4
ORDER BY
1,2,3,4;
Important: Replace `your_project.your_dataset` with your actual Google Cloud project ID and BigQuery dataset name. Also, adjust the date range in the `WHERE` clause to match the period you want to analyze.
Explanation of the Query
Let’s break down the query step by step:
- UserAcquisition CTE: This Common Table Expression (CTE) identifies new users based on their first `session_start` event. It extracts the date and the timestamp of their first visit.
- FirstTouchAttribution CTE: This CTE determines the source, medium, and campaign for each new user based on their first touchpoint. It joins the `UserAcquisition` CTE with the `events_*` table and uses `UNNEST` to extract the values from the `event_params` array. It also uses `COALESCE` to handle cases where the source, medium, or campaign are not available, defaulting to `(direct)` or `(none)`.
- EngagementData CTE: This CTE calculates the number of engaged sessions and the total engagement time for each user. It aggregates data from the `events_*` table, filtering for `user_engagement` events and extracting the engagement time from the `event_params` array.
- Final SELECT Statement: This statement joins the `FirstTouchAttribution` and `EngagementData` CTEs to produce the final user acquisition report. It calculates the number of new users, engaged sessions, total engagement time (in minutes), and engagement rate, grouped by date, medium, source, and campaign.
Customizing the Query
This query provides a solid foundation for your user acquisition report. You can customize it further to meet your specific needs:
- Add More Metrics: Include other relevant metrics, such as conversion rate, revenue, or lifetime value.
- Filter by Event Parameters: Filter the data based on specific event parameters, such as device category or country.
- Segment by User Properties: Segment the data based on user properties, such as age or gender (if you are collecting this data).
- Analyze Specific Campaigns: Focus on specific marketing campaigns by filtering the data based on campaign name.
Visualizing the Data
Once you have generated the user acquisition report in BigQuery, you can visualize the data using tools like:
- Google Data Studio (Looker Studio): Connect directly to your BigQuery dataset and create interactive dashboards and reports.
- Tableau: Another popular data visualization tool that can connect to BigQuery.
- Python (with libraries like Matplotlib and Seaborn): Use Python to create custom visualizations and perform more advanced analysis.
Best Practices
Here are some best practices to keep in mind when working with GA4 data in BigQuery:
- Use Parameterized Queries: To prevent SQL injection vulnerabilities, use parameterized queries when passing user input to your queries.
- Optimize Your Queries: Use appropriate indexes and partitions to improve query performance.
- Monitor Your Costs: BigQuery charges for query processing and storage. Monitor your usage and optimize your queries to minimize costs.
- Use a Consistent Naming Convention: Use a consistent naming convention for your tables, columns, and views to improve readability and maintainability.
Conclusion
By leveraging the power of BigQuery, you can unlock deeper insights into your GA4 user acquisition data and gain a competitive edge. This guide provides a starting point for creating custom user acquisition reports that are tailored to your specific business needs. Remember to experiment with different metrics, filters, and segments to uncover hidden patterns and trends in your data. With a little SQL knowledge and a willingness to explore, you can transform your GA4 data into actionable intelligence that drives growth.
For further reading and more advanced techniques, consider exploring these resources: