How to Remove Personally Identifiable Information (PII) from Google Analytics
In the face of increasing privacy regulations such as the GDPR, protecting the identity of your website’s users has never been more important.
If you have Google Analytics set up on your site and there is a chance that PII in the URL could be sent to Google, that information must be redacted. Beyond best practices and the law, Google policy prohibits marketers from passing any PII to Google. The most common types of PII found in URLs are email addresses, phone numbers, full names, usernames, passwords and zip codes.
This how-to post breaks down how marketers and data professionals can find and remove personally identifiable information from Google Analytics data.
How do you find PII data in your Google Analytics reports?
The simplest way to check for an email address is to navigate to “Behavior” > “Site Content” > “All Pages.” Then add a filter using the at sign (@). This will show any pageviews whose page path contains an @, which usually indicates an email address. It is worth repeating the search with %40, the URL-encoded form of @.
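If you export the page paths from that report, the same check can be scripted. The following is a minimal sketch, not part of the GTM setup: findEmailPaths, EMAIL_PATTERN and the sample paths are hypothetical names and made-up data, and the pattern is deliberately loose.

```javascript
// Hypothetical pattern: something that looks like an email address,
// written with either "@" or its URL-encoded form "%40".
var EMAIL_PATTERN = /[a-z0-9._-]+(@|%40)[a-z0-9.-]+\.[a-z]{2,}/i;

// Hypothetical helper: returns only the page paths that appear to contain an email.
function findEmailPaths(pagePaths) {
  return pagePaths.filter(function (path) {
    return EMAIL_PATTERN.test(path);
  });
}

// Made-up page paths standing in for an exported "All Pages" report.
var samplePaths = [
  '/checkout/thanks?email=jane.doe@example.com',
  '/newsletter/confirm?e=jane.doe%40example.com',
  '/products/shoes?color=red'
];

var flaggedPaths = findEmailPaths(samplePaths);
console.log(flaggedPaths);
```

With the sample data, the first two paths are flagged and the third is not. A loose pattern like this can produce false positives, so treat the result as a list of pages to inspect by hand.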
How to redact PII using Google Tag Manager
Follow the instructions below to learn how to use Google Tag Manager to redact PII from the URL before the data is sent to Google.
1. Create a custom JavaScript variable in GTM with the following code:
function() {
  return function(model) {
    try {
      // PII patterns to redact. "group" is prepended to the replacement text,
      // so '$1' keeps the parameter name and only the value is redacted.
      var piiRegex = [{
        name: 'EMAIL',
        regex: /[^\/][a-zA-Z0-9._-]+(@|%40)(?!yoursite\.com)[^\/]+[a-zA-Z0-9._-]/gi,
        group: ''
      }, {
        name: 'SELF-EMAIL',
        regex: /[^\/][a-zA-Z0-9._-]+(@|%40)(?=yoursite\.com)[^\/]+[a-zA-Z0-9._-]/gi,
        group: ''
      }, {
        name: 'TEL',
        regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi,
        group: '$1'
      }, {
        name: 'NAME',
        regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi,
        group: '$1'
      }, {
        name: 'PASSWORD',
        regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi,
        group: '$1'
      }, {
        name: 'ZIP',
        regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi,
        group: '$1'
      }];
      // Keep a reference to the original sendHitTask
      var originalSendTask = model.get('sendHitTask');
      model.set('sendHitTask', function(sendModel) {
        // Read the payload from the model passed to this task
        var hitPayload = sendModel.get('hitPayload');
        // Convert the payload querystring into a key/value object
        var data = {};
        hitPayload.replace(/^\?/, '').split('&').forEach(function(pair) {
          var parts = pair.split('=');
          data[parts[0]] = parts[1];
        });
        // Check every value against every pattern
        Object.keys(data).forEach(function(key) {
          piiRegex.forEach(function(pii) {
            // Decode the value before matching it against the pattern
            var val = decodeURIComponent(data[key]);
            if (val.match(pii.regex)) {
              // Redact the match, then re-encode the value
              data[key] = encodeURIComponent(val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']'));
            }
          });
        });
        // Rebuild the querystring and store it as the new hit payload
        sendModel.set('hitPayload', Object.keys(data).map(function(key) {
          return key + '=' + data[key];
        }).join('&'), true);
        // Send the redacted hit
        originalSendTask(sendModel);
      });
    } catch (e) {}
  };
}
You can also get a copy of this code from Brian Clifton’s blog article “Remove PII from Google Analytics.” I made a few changes to the email and self-email regular expressions so that the entire email address is redacted instead of only the four characters before and after the @ symbol. Replace yoursite.com in both expressions with your own domain. You can also update the other regular expressions to match the parameters you pass in your URLs. For example, if your site uses a name parameter, add it to the NAME regex:
/((firstname=)|(lastname=)|(surname=)|(name=))[^&\/\?]+/gi,
2. Update your Google Analytics Settings variable in GTM to add a custom task that calls the variable you created in the previous step. Under “More Settings” > “Fields to Set,” add a field named customTask and set its value to that variable.
Once the GTM settings and variable are in place, open a “Site Content” report in Google Analytics and add a page filter that matches the word “REDACTED.” The report will then list every URL in which PII was redacted, so you can see where your site is putting PII into URLs and remove it at the source.
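Before publishing the container, it can help to try your patterns on a sample payload outside of GTM. The sketch below is an approximation of the decode, redact and re-encode steps from step 1, not the variable itself: redactPayload and the sample payload are hypothetical, only two of the patterns are included, and the NAME pattern is assumed to include the extra name= parameter.

```javascript
// Two patterns in the same shape as step 1; NAME is assumed to include "name=".
var piiPatterns = [
  { name: 'NAME', regex: /((firstname=)|(lastname=)|(surname=)|(name=))[^&\/\?]+/gi, group: '$1' },
  { name: 'ZIP', regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi, group: '$1' }
];

// Hypothetical helper: redacts a raw hit payload string without needing a tracker.
// Unlike the GTM variable, it re-encodes every value, not only the ones that matched.
function redactPayload(hitPayload, patterns) {
  return hitPayload.split('&').map(function (pair) {
    var idx = pair.indexOf('=');
    if (idx === -1) { return pair; }
    var key = pair.slice(0, idx);
    var val = decodeURIComponent(pair.slice(idx + 1));
    patterns.forEach(function (pii) {
      val = val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
    });
    return key + '=' + encodeURIComponent(val);
  }).join('&');
}

// Made-up payload: the page path parameter carries a first name and a zip code.
var samplePayload = 'v=1&t=pageview&dp=' + encodeURIComponent('/signup?firstname=Jane&zip=10001');
var redactedPayload = redactPayload(samplePayload, piiPatterns);

console.log(decodeURIComponent(redactedPayload));
// v=1&t=pageview&dp=/signup?firstname=[REDACTED NAME]&zip=[REDACTED ZIP]
```

The parameter names stay in place because group is '$1', which is what makes the “REDACTED” page filter useful for tracing each leak back to the form or link that caused it.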