How to Remove Personally Identifiable Information (PII) from Google Analytics

November 14, 2019

In the face of increasing privacy regulations such as the GDPR, protecting the identity of your website’s users has never been more important.

If you have Google Analytics set up on your site and there is a chance that PII in your URLs could be sent to Google, that information must be redacted. Beyond best practices and the law, Google’s policy requires marketers to keep all PII from being passed to Google. The most common types of PII include email addresses, phone numbers, full names, usernames, passwords and zip codes.

This how-to post breaks down how marketers and data professionals can find and remove personally identifiable information from Google Analytics data.

How do you find PII data in your Google Analytics reports?

The simplest way to check for email addresses is to navigate to “Behavior” > “Site Content” > “All Pages,” then filter the report on the at sign (@). This will show any pageviews whose URLs contain an email address. Email addresses can also appear URL-encoded, so it is worth filtering on %40 as well.
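If you export the page paths from that report, the same check can be run in bulk. The sketch below is illustrative only: findEmailPaths and the sample paths are hypothetical, and the pattern is a deliberately loose email match (a literal @ or its URL-encoded form %40), not an exhaustive validator.

```javascript
// Hypothetical helper: flag page paths that appear to contain an email address.
// Matches a literal "@" or its URL-encoded form "%40" between address-like characters.
var EMAIL_PATTERN = /[a-zA-Z0-9._-]+(@|%40)[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/i;

function findEmailPaths(paths) {
  return paths.filter(function(path) {
    return EMAIL_PATTERN.test(path);
  });
}

// Made-up page paths, as they might appear in an export of the All Pages report
var samplePaths = [
  '/thank-you?email=jane.doe@example.com',
  '/newsletter/confirm?e=john%40example.org',
  '/products/widgets'
];

// Logs the first two paths; the third contains no email-like string
console.log(findEmailPaths(samplePaths));
```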

How to redact PII using Google Tag Manager

Follow the instructions below to learn how to use Google Tag Manager to redact PII from the URL before the data is sent to Google.

1. Create a custom JavaScript variable in GTM with the following code:

function() {
  return function(model) {
    try {
      // Add the PII patterns into this array as objects.
      // Replace yoursite\.com with your own domain.
      var piiRegex = [{
        name: 'EMAIL',
        regex: /[^\/][a-zA-Z0-9._-]+(@|%40)(?!yoursite\.com)[^\/]+[a-zA-Z0-9._-]/gi,
        group: ''
      }, {
        name: 'SELF-EMAIL',
        regex: /[^\/][a-zA-Z0-9._-]+(@|%40)(?=yoursite\.com)[^\/]+[a-zA-Z0-9._-]/gi,
        group: ''
      }, {
        name: 'TEL',
        regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi,
        group: '$1'
      }, {
        name: 'NAME',
        regex: /((firstname=)|(lastname=)|(surname=))[^&\/\?]+/gi,
        group: '$1'
      }, {
        name: 'PASSWORD',
        regex: /((password=)|(passwd=)|(pass=))[^&\/\?]+/gi,
        group: '$1'
      }, {
        name: 'ZIP',
        regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi,
        group: '$1'
      }];

      // Fetch reference to the original sendHitTask
      var originalSendTask = model.get('sendHitTask');

      model.set('sendHitTask', function(sendModel) {
        // Read the payload from the model passed to this task
        var hitPayload = sendModel.get('hitPayload');

        // Convert the payload querystring into a key/value object
        var data = {};
        hitPayload.replace(/^\?/, '').split('&').forEach(function(pair) {
          var parts = pair.split('=');
          data[parts[0]] = parts.slice(1).join('=');
        });

        // Loop through all keys and values
        Object.keys(data).forEach(function(key) {
          // Decode the value before matching it against the array of regexes
          var val = decodeURIComponent(data[key]);
          piiRegex.forEach(function(pii) {
            if (val.match(pii.regex)) {
              // Redact the match, then re-encode the value
              val = val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
              data[key] = encodeURIComponent(val);
            }
          });
        });

        // Convert the data object back into a querystring
        sendModel.set('hitPayload', Object.keys(data).map(function(key) {
          return key + '=' + data[key];
        }).join('&'), true);

        // Send the hit with the redacted payload
        originalSendTask(sendModel);
      });
    } catch (e) {}
  };
}
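To see what the redaction does to a hit before deploying it, the core loop can be exercised outside GTM. The following is a minimal sketch, not part of the original variable: redactPayload is a hypothetical helper name, it uses only the TEL and ZIP patterns from the array above, and the sample payload is made up.

```javascript
// Hypothetical standalone helper mirroring the redaction loop,
// trimmed to two patterns so it can run in Node or a browser console.
var patterns = [{
  name: 'TEL',
  regex: /((tel=)|(telephone=)|(phone=)|(mobile=)|(mob=))[\d\+\s][^&\/\?]+/gi,
  group: '$1'
}, {
  name: 'ZIP',
  regex: /((postcode=)|(zipcode=)|(zip=))[^&\/\?]+/gi,
  group: '$1'
}];

function redactPayload(payload) {
  return payload.replace(/^\?/, '').split('&').map(function(pair) {
    var parts = pair.split('=');
    var key = parts[0];
    // Decode the value so the patterns can see the original characters
    var val = decodeURIComponent(parts.slice(1).join('='));
    patterns.forEach(function(pii) {
      val = val.replace(pii.regex, pii.group + '[REDACTED ' + pii.name + ']');
    });
    // Re-encode so the result is still a valid querystring
    return key + '=' + encodeURIComponent(val);
  }).join('&');
}

// Made-up hit payload; "dl" carries the page URL
var sample = 'v=1&t=pageview&dl=' +
  encodeURIComponent('https://yoursite.com/signup?phone=5551234567&zip=90210');

// Decoded for readability:
// v=1&t=pageview&dl=https://yoursite.com/signup?phone=[REDACTED TEL]&zip=[REDACTED ZIP]
console.log(decodeURIComponent(redactPayload(sample)));
```

The parameter name is kept and only its value is replaced because each pattern captures the key in its first group and the replacement string starts with $1.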

You can also get a copy of this code from Brian Clifton’s blog article “Remove PII from Google Analytics.” I made a few changes to the email and self-email regular expressions so that the entire email address is redacted instead of only the four characters before and after the @ symbol. You can also update the other regular expressions to match the parameters your site passes in its URLs. For example, if your site uses a name parameter, you may want to add it to the NAME regex:

/((firstname=)|(lastname=)|(surname=)|(name=))[^&\/\?]+/gi,
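A pattern like this is easy to sanity-check in a browser console before it goes into GTM. The snippet below is a rough sketch with a hypothetical redactNames helper and made-up URLs. It uses name= with the trailing equals sign, in line with the other alternatives. One side effect to be aware of: a bare name= alternative also matches the tail of longer keys such as username=.

```javascript
// NAME pattern extended with a bare "name=" alternative
var nameRegex = /((firstname=)|(lastname=)|(surname=)|(name=))[^&\/\?]+/gi;

// Hypothetical helper: keep the parameter name ($1) and replace only its value
function redactNames(url) {
  return url.replace(nameRegex, '$1[REDACTED NAME]');
}

console.log(redactNames('/signup?name=Jane&plan=pro'));
// /signup?name=[REDACTED NAME]&plan=pro

console.log(redactNames('/signup?firstname=Jane&lastname=Doe'));
// /signup?firstname=[REDACTED NAME]&lastname=[REDACTED NAME]

// "name=" also matches inside "username=", so this value is redacted too
console.log(redactNames('/login?username=jdoe'));
// /login?username=[REDACTED NAME]
```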

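2. In GTM, update your Google Analytics Settings variable to add a custom task that calls the variable you created in the previous step: under “More Settings” > “Fields to Set,” add a field named customTask and set its value to that variable.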

Once the GTM setting and variable are in place, you can set up a “Site Content” report in Google Analytics with a page filter that matches the word “REDACTED.” The report shows which URLs contained PII, so you can take action on your site to remove that data at the source.
