Data Retention in Google Analytics
As analysts, we rely on properly collected and stored data to inform business decisions and digital marketing budget allocation. Our day-to-day work usually revolves around turning the data we have available into reporting and meaningful, actionable insights.
That being said, let’s ask ourselves this question: how’s your data? Pretty simple question, right?
This is where the concept of Data Governance comes into play. But what is Data Governance? According to IBM, “Data governance is a set of policies, processes and an organizational structure to support enterprise data management”. We will expand on the definition and the components of data governance in a series of posts in the near future.
For now though, let’s focus on data retention: more specifically, what it means and what it looks like in the Google Analytics world.
While Universal Analytics offers the option to retain data for an unlimited time period, that is not the case with Google Analytics 4, the new Google kid on the block.
With Google Analytics 4, the default data retention period is two months, and the maximum allowed retention period for your data is now 14 months for the free version. The enterprise version has an extended range for retaining the data.
One thing to notice here is that the data retention setting does not affect standard aggregated reports in your Google Analytics 4 property, even if you create comparisons in the reports. The data retention setting only affects Explorations reports.
It goes without saying that access to historical data is essential. Losing that access makes granular and comparative analysis over time impossible. Therefore, your GA4 setup checklist should include changing the data retention setting from its two-month default to the maximum available. Here are the steps to do so:
- Log into your GA4 account
- Select the property you want to update the settings for
- Click “Admin” on the bottom left
- Under “Property,” click “Data settings.”
- Select “Data retention.”
- Select the 14-month option from the drop-down menu and click “Save.”
You are now set to keep the collected data for 14 months… Is that enough though? The eventual deprecation of Universal Analytics will leave unprepared digital marketers in a pickle as the wealth of data collected so far will be lost.
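To make the 14-month window concrete, here is a minimal Python sketch that works out the oldest calendar date a given retention setting would still cover. It is plain date arithmetic only: the function names `subtract_months` and `retention_cutoff` are ours, and the sketch assumes data expires exactly N calendar months after collection, which may not match GA4's actual deletion schedule.

```python
import calendar
from datetime import date


def subtract_months(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`."""
    # Express the date as a running month count, step back, then split again.
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day for shorter months (e.g. 31 March -> end of February).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def retention_cutoff(today: date, retention_months: int = 14) -> date:
    """Oldest date still covered, ASSUMING expiry exactly N months after collection."""
    return subtract_months(today, retention_months)


if __name__ == "__main__":
    today = date.today()
    print("2-month default keeps data back to:", retention_cutoff(today, 2))
    print("14-month setting keeps data back to:", retention_cutoff(today, 14))
```

Under that assumption, on 15 March 2024 a 14-month setting reaches back to 15 January 2023, so a year-over-year comparison in Explorations is only possible for roughly the two most recent months.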
In today’s online landscape, consumers’ expectations are evolving rapidly, and adapting digital marketing efforts to a user-centric approach is key to organizational growth and success. Building a holistic understanding of who the customer is and what they are looking for, driven by all available and significant information about them and captured at every touchpoint in the journey, is the long and complex road to the 360-degree customer view. According to Gartner, “fewer than 10% of companies have a 360-degree customer view, and only 5% use this view to systemically grow their businesses.”
Web data is only one of the multiple data sources that feed into advanced analytics and predictive marketing. CRM, transactional, market research and other third-party data sets come together when creating that holistic, 360-degree view of the customer’s journey.
The 14-month data retention period offered by GA4 limits an organization’s ability to augment its data analytics practice and fully embrace a culture of data-driven decision making.
There is a way to address that limitation, though: the GA4 – Google BigQuery integration. GA4 offers the option to set up BigQuery data exports, and this is available even in the free version of GA4. Previously, the 360 version of Universal Analytics was a prerequisite for this integration.
GA4 will export the data for each property into a single dataset, analytics_<property_id> (where <property_id> is the GA4 property ID in question). Within each dataset, a table is imported for each day of export. These tables have the format “events_YYYYMMDD”. Additionally, a table is imported for app events received throughout the current day. This table is named “events_intraday_YYYYMMDD” and it is populated in real time as app events are collected. The data is exported in a specific format and follows a set schema. Details on the data schema are here.
Please note that the BigQuery sandbox (a free tool for exploration and learning) has limits on storage capacity and on the volume of data processed. The sandbox also has limited features when compared to the paid version.
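The naming convention described above is easy to reproduce in code. The short Python sketch below builds the dataset and table names for a given day. The helper names are ours, and the property ID used in the example is a made-up placeholder.

```python
from datetime import date


def dataset_name(property_id: str) -> str:
    """Dataset that GA4 creates for a property: analytics_<property_id>."""
    return f"analytics_{property_id}"


def daily_table(day: date) -> str:
    """Daily export table for one day: events_YYYYMMDD."""
    return f"events_{day:%Y%m%d}"


def intraday_table(day: date) -> str:
    """Table for the current day's events: events_intraday_YYYYMMDD."""
    return f"events_intraday_{day:%Y%m%d}"


if __name__ == "__main__":
    # "123456789" is a placeholder, not a real property ID.
    day = date(2024, 1, 5)
    print(f"{dataset_name('123456789')}.{daily_table(day)}")
    print(f"{dataset_name('123456789')}.{intraday_table(day)}")
```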
There are a few steps required to enable the GA4 – BigQuery connection:
- Create a Google-APIs-Console project and enable BigQuery
- Prepare your project for BigQuery Export
- Link BigQuery to a Google Analytics 4 property
In the linking step, there are two export options you can choose from: Daily (once a day) or Streaming (continuous).
For each day, streaming export creates one new table:
- events_intraday_YYYYMMDD: an internal staging table that includes records of session activity that took place during the day. Streaming export is a best-effort operation and may not include all data for reasons such as the processing of late events and/or failed uploads. Data is exported continuously throughout the day. This table can include records of a session when that session spans multiple export operations. This table is deleted when events_YYYYMMDD is complete.
If you select the daily option when you set up BigQuery Export, then the following table is also created each day.
- events_YYYYMMDD: The full daily export of events.
Google recommends querying events_YYYYMMDD rather than events_intraday_YYYYMMDD, since the daily table is the more stable of the two.
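As an illustration of querying the daily tables, the Python sketch below assembles a BigQuery Standard SQL query that counts events by name over a date range. It uses a wildcard table with a `_TABLE_SUFFIX` filter, and the purely numeric bounds keep the events_intraday_ tables out of the result. The function name and the project and property IDs are placeholders of ours; the sketch only builds the query text and does not run it.

```python
from datetime import date


def build_event_count_query(
    project_id: str, property_id: str, start: date, end: date
) -> str:
    """Build a query counting events per event_name across daily export tables."""
    # Wildcard over the daily tables: <project>.analytics_<property_id>.events_*
    table = f"{project_id}.analytics_{property_id}.events_*"
    return (
        "SELECT event_name, COUNT(*) AS event_count\n"
        f"FROM `{table}`\n"
        # Numeric suffix bounds exclude suffixes such as 'intraday_20240101'.
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'\n"
        "GROUP BY event_name\n"
        "ORDER BY event_count DESC"
    )


if __name__ == "__main__":
    # "my-project" and "123456789" are placeholders, not real identifiers.
    print(
        build_event_count_query(
            "my-project", "123456789", date(2024, 1, 1), date(2024, 1, 31)
        )
    )
```

The resulting text can be pasted into the BigQuery console or passed to a BigQuery client. Because the sketch interpolates the IDs directly into the query, they should come from trusted configuration rather than user input.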
That being said, the tools are readily available to address the evolving limitations and regulations around data collection and storage.
Drop us a line and let’s have a conversation about keeping up with those changes and maintaining a rich and healthy data stream for analysts to use.
If you’re interested in learning more about data retention, read our previous blog post on data retention best practices for digital marketing here.