Better Marketing Attribution with Clickstream Analytics


Get more out of your web analytics without paying $150,000

Is your digital advertising working? For the vast majority of ecommerce companies out there, the answer seems simple enough: Facebook and Google are happy to tell you which ads lead to conversions, and the free version of Google Analytics has pretty good attribution modeling tools built in. These tools will get better, and a bit more complicated, with the forced migration to Google Analytics 4 in 2023.

But if you dig a little deeper, problems begin to emerge: ad platform conversion data doesn’t line up; Google Analytics’ default attribution only credits the last click; Facebook credits itself on any click; referrer data is obscured; multi-touch attribution works in a black box you don’t understand. And recently, Apple has created new obstacles to tracking on its mobile devices.

There are solutions out there, like Google Analytics 360, but they come with a six-figure price tag. Unlike the free version of Google Analytics, they let you export click-level data for deep analysis and modeling in SAS, Python, or R.

But this raises the question: If you need to export the data to get what you’re looking for, why bother paying?

Maybe you shouldn’t. Perhaps the money would be better spent on analysis that gets you closer to the dynamics of your marketing funnel. The weaknesses, limitations and errors of the standard Facebook and Google toolsets are the same for everyone: those tools are a standard-issue quiver of slightly bent arrows.



What is clickstream data?

Clickstream data reflects all the interactions users have on your site, aggregated by User ID and Session. The free version of Google Analytics doesn’t allow you to see this and built-in reports are based on a sampling of the data. 

Clickstream is particularly vital for understanding user behavior, which is only possible when you can see the order of interactions a single user has on your site. More on this below.



How do I get access to clickstream?

Google Analytics 360 gives access to clickstream data, but the cost may be prohibitive, or by our assessment, simply bad value.

There are open source systems such as Matomo; free when self-hosted or at a small recurring cost if bought as a service. Self-hosting Matomo speaks to a larger philosophical point of taking complete ownership of analytics data. The potential importance of this point must however be balanced against the extra setup and maintenance required for e-commerce and goal information to be pulled into a second system, and the additional page load a second analytics tag will cause.

Our preferred solution comes from Denmark-based Scitylana.

For a relatively low subscription fee, their system will collect click data from your site and help you push it to a BigQuery database that you own. This is all done through a standard implementation of Google Analytics with some small adjustments easily achieved through Google Tag Manager.



What you can do with clickstream

Clickstream gives you a view deep inside the analytics game


Three crucial analyses you can perform with clickstream data include:

  1. Marketing attribution - The contribution every channel makes to a conversion goal.

  2. Click path analysis - The routes taken by site visitors each time they arrive at your site and the clicks and actions they take within the site.

  3. Clustering - An effective way to group or segment users by a relevant complement of variables, such as number of visits, time on site, and device.

Of course these analyses can be performed from within Google Analytics, but nowhere near the level that can be achieved by using clickstream data and external analysis tools.

For this article we’ll cover using clickstream for marketing attribution.



Why marketing attribution with clickstream is superior

The path your customer takes from vague interest to purchase is rarely linear. They may see an ad, read a review, perform a search, or get a recommendation from a friend. All of these factors are important, but only a few do you, the advertiser, pay for. And, like any smart marketer, you want to know what you’re getting for your money. 

Unfortunately, Google and Facebook want to prove their value too, and they are incentivized to push their version of the truth. Their tools work well enough, but in the end you are always going to be seeing a filtered view. 

The only way to get your own truth is to own your data. 

Clickstream allows you to look deep into every user’s journey, where they come from, and what they do. You can customize your analysis, and if you want to examine every single click, you can. 



Choosing the right attribution model

Russian math whiz and proto-hipster Andrey Markov


Attribution models come in many forms, and none is a panacea. In our experience, the best path is comparative and probabilistic. By comparative, we mean that you should compare the results of several models concurrently. Seeing the differences between models like Last Touch, First Touch, Linear, and Time Decay is almost always instructive in itself. 
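To make the comparison concrete, here is a minimal Python sketch of the four heuristic models named above. The authors work in R; Python is used here only for illustration. The path format, the function name heuristic_attribution, and the 7-day half-life for Time Decay are assumptions made for this example, not settings taken from any particular tool.

```python
from collections import defaultdict


def heuristic_attribution(paths, half_life=7.0):
    """Credit conversion value to channels under four heuristic models.

    paths: list of (touches, value) for converting journeys, where touches
    is an ordered list of (channel, days_before_conversion).
    half_life: assumed number of days after which a touch counts half as much.
    """
    names = ("first_touch", "last_touch", "linear", "time_decay")
    models = {name: defaultdict(float) for name in names}
    for touches, value in paths:
        if not touches:
            continue
        channels = [channel for channel, _ in touches]
        models["first_touch"][channels[0]] += value
        models["last_touch"][channels[-1]] += value
        for channel in channels:
            models["linear"][channel] += value / len(channels)
        # Time decay: touches closer to the conversion get more weight.
        weights = [0.5 ** (days / half_life) for _, days in touches]
        total = sum(weights)
        for (channel, _), weight in zip(touches, weights):
            models["time_decay"][channel] += value * weight / total
    return {name: dict(credit) for name, credit in models.items()}


# Made-up journeys for illustration only.
example_paths = [
    ([("Paid Social", 14), ("Organic Search", 7), ("Direct", 0)], 100.0),
    ([("Paid Search", 0)], 50.0),
]
result = heuristic_attribution(example_paths)
for model, credit in result.items():
    print(model, credit)
```

Every model distributes the same total value; only the split between channels changes, which is what makes the side-by-side view informative.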

But if you need to show your attribution model to someone who isn’t a data scientist, you need to choose just one. And it should be driven by data more than heuristic interpretation.

Our preferred data-driven model is known as Markov probability analysis. Put simply, Markov attribution gives value to each channel based on the probable impact of its removal from the user journey. Another popular method, Shapley value analysis, is used in Google Analytics 360 attribution. 
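The removal idea can be sketched in a few lines of Python. This is a simplified first-order illustration under our own assumptions (the state names, the path format, and the iterative solver are all choices made for the example); it is not the implementation used by any specific package. Each channel's "removal effect" is the share of conversions lost when every journey that would pass through that channel is treated as a non-conversion.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(paths):
    """paths: list of (ordered channel list, converted flag)."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = [START, *channels, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(probs, removed=None, tol=1e-12, max_sweeps=10_000):
    """Probability of reaching CONV from START; a removed channel scores zero."""
    p = defaultdict(float)
    p[CONV] = 1.0
    for _ in range(max_sweeps):
        change = 0.0
        for state, nxt in probs.items():
            if state == removed:
                continue
            new = sum(w * p[t] for t, w in nxt.items() if t != removed)
            change = max(change, abs(new - p[state]))
            p[state] = new
        if change < tol:
            break
    return p[START]


def markov_attribution(paths):
    """Normalised removal effects; assumes at least one converting path."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Made-up journeys for illustration only.
paths = [
    (["Paid Social", "Organic Search"], True),
    (["Paid Social"], False),
    (["Organic Search"], True),
    (["Paid Search", "Organic Search"], False),
]
shares = markov_attribution(paths)
print(shares)
```

Multiplying each share by total conversion value gives a per-channel figure that can sit alongside the heuristic models in a comparison table.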


How to implement clickstream attribution

Scitylana is essentially an extraction tool that provides a daily tab-separated text file where each row is a click or event and each column is a Google Analytics field. This data can be auto-downloaded to a server or posted to a Google BigQuery table.

We typically download the data for easy analysis in R, but there are good scaling reasons to host in BigQuery if you can afford it.

There are four stages to our data prep and analysis:

1. Data assembly

Clickstream data needs to be assembled to cover a given time period (such as a day, week, or month of activity) plus a “look back window” — meaning if we’re reporting on February attribution, we need to know what February’s visitors might have done on the site for some portion of January so we can build up accurate visitor paths. We typically use a 30 day look back, but this varies based on customer journey.

In addition to preparing the time series, we need to set all the columns as the correct classes (numeric, date, etc) and omit extraneous data.
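As a rough illustration of this step, the Python sketch below reads tab-separated click rows, keeps only the reporting period plus a look-back window, converts columns to proper types, and drops a column that is not needed. The column names (user_id, date, source, revenue, page_title) and the function name load_window are hypothetical; a real export will have its own field names.

```python
import csv
import io
from datetime import date, timedelta


def load_window(tsv_text, period_start, period_end, lookback_days=30):
    """Return typed rows inside [period_start - lookback, period_end]."""
    window_start = period_start - timedelta(days=lookback_days)
    rows = []
    for raw in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        day = date.fromisoformat(raw["date"])
        if not (window_start <= day <= period_end):
            continue
        rows.append({
            "user_id": raw["user_id"],
            "date": day,                              # text -> date
            "source": raw["source"],
            "revenue": float(raw["revenue"] or 0),    # empty -> 0.0
            "in_report_period": day >= period_start,  # False for look-back rows
        })  # page_title is deliberately omitted as extraneous
    return rows


# Made-up rows for illustration only.
sample = (
    "user_id\tdate\tsource\trevenue\tpage_title\n"
    "u1\t2021-01-20\tfacebook\t\tHome\n"
    "u1\t2021-02-03\tgoogle\t120.50\tCheckout\n"
    "u2\t2020-12-15\tgoogle\t\tHome\n"
)
rows = load_window(sample, date(2021, 2, 1), date(2021, 2, 28))
print(rows)
```

Flagging look-back rows rather than discarding them keeps earlier touches available for path building while letting reports count only conversions in the reporting period.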

2. External domain removal

Part of the purchase journey may involve a separate subdomain or domain. For e-commerce sites, this often includes third party financing or the checkout system itself. These other domains will look like a separate session and so the subsequent converting session, such as with financing, will show the referring source as the finance provider rather than the true originating source.

This is why we like R: these sorts of thorny problems can be solved once, with the same code run repeatedly on new data. In this case, the source from what is considered the previous visit can be carried forward to the converting visit.
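The carry-forward logic might look something like the following, shown in Python for illustration rather than the authors' R. The domain names, the session fields (user_id, start, source), and the function name are all hypothetical placeholders.

```python
# Hypothetical third-party domains that interrupt the purchase journey.
EXTERNAL_DOMAINS = {"financing.example.com", "checkout.example.com"}


def carry_forward_sources(sessions):
    """Replace external-domain referrers with the user's previous true source.

    sessions: list of dicts with "user_id", "start" (sortable) and "source".
    Returns new dicts sorted by user and start time; the input is not modified.
    """
    last_true_source = {}
    cleaned = []
    for session in sorted(sessions, key=lambda s: (s["user_id"], s["start"])):
        source = session["source"]
        if source in EXTERNAL_DOMAINS:
            # Fall back to the original value if there is no earlier session.
            source = last_true_source.get(session["user_id"], source)
        else:
            last_true_source[session["user_id"]] = source
        cleaned.append({**session, "source": source})
    return cleaned


# Made-up sessions for illustration only.
sessions = [
    {"user_id": "u1", "start": 2, "source": "financing.example.com"},
    {"user_id": "u1", "start": 1, "source": "facebook"},
    {"user_id": "u2", "start": 1, "source": "google"},
]
cleaned = carry_forward_sources(sessions)
print(cleaned)
```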

3. Source cleanup and channel grouping 

An obvious shortcoming of the Google Analytics interface is the large amount of work necessary to harmonize sources. Take a cursory look at the source/medium items and you’ll see a default channel grouping that bears no resemblance to anyone’s default wish. Facebook represents one third of the digital media market, but there is no intelligent consolidation of all the different Facebook referral sources. Instead you’ll see at least 3, possibly 5 or 6 different Facebook referral sources all lumped into an arbitrary channel called ‘social’. Some of this can be resolved by intelligent use of UTMs and the channel grouping feature, but arguably the latter takes longer to set up than some basic “if else” logic in R. 

By contrast, simple rules in R can be set up to group sources, mediums and channels to represent the optimal way to report media value. The grouping logic can be adjusted as exceptions emerge. Our preference is to separate out paid from organic social, remarketing and prospecting campaigns, and branded versus product/service keywords for paid search.
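A sketch of what such rules could look like, written in Python for illustration. The channel names, the list of paid mediums, and the campaign naming conventions (a "remarketing" substring, a "brand" prefix) are assumptions for the example; real rules depend on how your UTMs are set up.

```python
PAID_MEDIUMS = {"cpc", "ppc", "paid", "paidsocial"}  # assumed medium values


def group_channel(source, medium, campaign=""):
    """Map a source/medium/campaign triple to a reporting channel."""
    source, medium, campaign = source.lower(), medium.lower(), campaign.lower()
    paid = medium in PAID_MEDIUMS

    # Consolidate every Facebook referrer variant (m., l., lm., ...) first.
    if "facebook" in source:
        if paid and "remarketing" in campaign:
            return "Paid Social - Remarketing"
        if paid:
            return "Paid Social - Prospecting"
        return "Organic Social"

    if source == "google" and paid:
        # Assumes branded campaigns are named with a "brand" prefix.
        if campaign.startswith("brand"):
            return "Paid Search - Brand"
        return "Paid Search - Non-brand"

    if medium == "organic":
        return "Organic Search"
    if source == "(direct)":
        return "Direct"
    return "Other"  # review these periodically and add rules as exceptions emerge


print(group_channel("m.facebook.com", "referral"))
print(group_channel("facebook", "cpc", "spring_remarketing"))
print(group_channel("google", "cpc", "brand_exact"))
```

Because the rules are ordinary code, a new exception is a one-line change that applies to all historical data the next time the script runs.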


4. Attribution analysis
There’s a handy package in R called ‘ChannelAttribution’ that has a number of models baked in. We’ve gone one stage further and developed clickMarkov, an R package that works directly with Scitylana output files and feeds into ChannelAttribution.

The output from either package is a simple table with each column representing a model and each row a channel.

Sample channel attribution comparison output using our R package, clickMarkov


The standard models such as first-touch, linear and last-touch are known as ‘heuristic’ models. This is nothing more than a clever way of saying ‘good judgement’ or ‘best guess’. While, as we’ve mentioned above, judgement (or heuristics) can be beneficial to compare which channels give more value on first versus last touch, there still needs to be a model that represents ‘the truth’ in terms of the mathematically estimated real value of each channel. In this case that is the ‘Markov Conversion Value’.

Actions you can take with more confidence

1. Calculate ROAS by channel and make smart investments
With a higher quality estimate of revenue by channel, you can calculate Return on Ad Spend (ROAS) by channel with more confidence. For paid digital channels, this is an easy calculation, but for channels such as PR there may need to be some allocation of fixed and agency costs.
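The calculation itself is simply attributed revenue divided by spend. Here is a small Python sketch with made-up figures; the revenue numbers stand in for a model's attributed revenue, and the PR spend is assumed to be an allocation of fixed and agency costs.

```python
def roas_by_channel(attributed_revenue, spend):
    """Return attributed revenue per unit of spend for channels with spend > 0."""
    return {
        channel: attributed_revenue.get(channel, 0.0) / cost
        for channel, cost in spend.items()
        if cost > 0
    }


# Illustrative numbers only, not results from any real account.
attributed_revenue = {"Paid Search": 12000.0, "Paid Social": 4500.0, "PR": 3000.0}
spend = {"Paid Search": 3000.0, "Paid Social": 1500.0, "PR": 2000.0}

roas = roas_by_channel(attributed_revenue, spend)
for channel, value in sorted(roas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{channel}: {value:.2f}")
```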

A lower ROAS doesn’t mean a channel should be abandoned. Rather, it’s a sign that some effort and strategic thinking should be applied to lower-performing channels to raise their ROAS. Some channels will naturally have lower ROAS than others; for example, not every brand interaction through PR can be measured digitally.

Over time, marketing activities can be matched to increases in revenue by channel, giving a deeper understanding of how to make marketing investment decisions. This is particularly important for brand-building programs such as PR, where there may be a time shift between an activity and its effect on sales.

2. Adjust creative, messaging & UX against purchase path position
The differences between attribution models offer insight into the role each channel plays in the conversion funnel.

Example 1: Organic Search vs Direct.
In the above table, we can see Organic Search plays far more of a valuable role towards the last touch, whereas the Direct channel plays more of a role for the first touch. A natural question from this discovery would be “how do we adapt the home page experience to accommodate, or provide a different experience for new and returning visitors?”

Example 2: Paid Search
Likewise we see a massive 25 percent difference between first and last touch values for paid search. In this case we’d want to investigate firing up RLSA to test different creative for new and returning searchers.

Example 3: Paid Social
Facebook has a first-touch value considerably higher than last-touch indicating the channel’s prospecting role, but the overall Markov value is similar to last touch. This indicates those first touches require extra help from other channels to get the purchaser over the line. Could better creative or a more relevant landing page increase the overall channel value?

3. Calculate the true return from affiliate marketing
Affiliate marketing platforms are very sophisticated in their ability to track user visits and monitor attribution. However, in the interest of partner incentivization, commissions are typically paid out on an “any-touch” basis. This means that if a user visits your site three times before they purchase, any of those visits via affiliate will be liable for commission, regardless of whether the other two visits contributed more to the overall purchase decision. Hence the commissions paid out may cost more than the actual value delivered. Depending on the role of your affiliate channel and commission contracts, you may want to renegotiate to match value to cost.
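One way to quantify the gap is to compare the commission paid under any-touch rules with the revenue an attribution model actually credits to the affiliate channel. The Python sketch below uses a simple linear split as a stand-in for whichever model you prefer; the 10 percent commission rate, the channel label, and the order data are assumptions for illustration.

```python
def affiliate_commission_vs_value(orders, commission_rate=0.10, affiliate="Affiliate"):
    """Compare any-touch commission with linearly attributed affiliate revenue.

    orders: list of (ordered channel list, order value) for converting journeys.
    Returns (commission_paid, attributed_revenue, effective_rate), where
    effective_rate is commission per unit of attributed revenue (None if zero).
    """
    commission_paid = 0.0
    attributed_revenue = 0.0
    for channels, order_value in orders:
        if affiliate in channels:
            # Any-touch: full commission if the affiliate appears anywhere.
            commission_paid += commission_rate * order_value
            # Linear stand-in: credit in proportion to the affiliate's touches.
            attributed_revenue += order_value * channels.count(affiliate) / len(channels)
    rate = commission_paid / attributed_revenue if attributed_revenue else None
    return commission_paid, attributed_revenue, rate


# Made-up orders for illustration only.
orders = [
    (["Affiliate", "Organic Search", "Direct"], 300.0),
    (["Paid Search"], 100.0),
]
commission_paid, attributed_revenue, effective_rate = affiliate_commission_vs_value(orders)
print(commission_paid, attributed_revenue, effective_rate)
```

In this toy case a nominal 10 percent commission works out to roughly 30 percent of the revenue credited to the affiliate, which is the kind of figure that can inform a contract renegotiation.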

Is all this work worth it?

If you care about where your money is going, you should care about clickstream. Google Analytics’ free built-in channel attribution tools can provide a lot of insight, but they give you little control to look inside the numbers and make decisions with confidence.

Note: This article was originally published in 2019 and updated in February 2021.