Posted by Tom.Capper
Digital marketing is a proudly data-driven field. Yet, as SEOs especially, we often have such incomplete or questionable data to work with, that we end up jumping to the wrong conclusions in our attempts to substantiate our arguments or quantify our issues and opportunities.
In this post, I’m going to outline 4 data analysis pitfalls that are endemic in our industry, and how to avoid them.
1. Jumping to conclusions
Earlier this year, I conducted a ranking factor study around brand awareness, and I posted this caveat:
"...the fact that Domain Authority (or branded search volume, or anything else) is positively correlated with rankings could indicate that any or all of the following is likely:
However, I want to go into this in a bit more depth and give you a framework for analyzing these yourself, because it still comes up a lot. Take, for example, this recent study by Stone Temple, which you may have seen in the Moz Top 10 or Rand’s tweets, or this excellent article discussing SEMRush’s recent direct traffic findings. To be absolutely clear, I’m not criticizing either of the studies, but I do want to draw attention to how we might interpret them.
Firstly, we do tend to suffer a little confirmation bias — we’re all too eager to call out the cliché “confirmation vs. causation” distinction when we see successful sites that are keyword-stuffed, but all too approving when we see studies doing the same with something we think is or was effective, like links.
Secondly, we fail to critically analyze the potential mechanisms. The options aren’t just causation or coincidence.
Before you jump to a conclusion based on a correlation, you’re obliged to consider various possibilities:
If those don’t make any sense, then that’s fair enough — they’re jargon. Let’s go through an example:
Before I warn you not to eat cheese because you may die in your bedsheets, I’m obliged to check that it isn’t any of the following:
So we have 4 “Yes” answers and one “No” answer from those 5 checks.
If your example doesn’t get 5 “No” answers from those 5 checks, it’s a fail, and you don’t get to say that the study has established either a ranking factor or a fatal side effect of cheese consumption.
A similar process should apply to case studies, which are another form of correlation — the correlation between you making a change, and something good (or bad!) happening. For example, ask:
This is particularly challenging for SEOs, because we rarely have data of this quality, but I’d suggest an additional pair of questions to help you navigate this minefield:
Direct traffic as a ranking factor passes the “could” test, but only barely — Google could use data from Chrome, Android, or ISPs, but it’d be sketchy. It doesn’t really pass the “would” test, though — it’d be far easier for Google to use branded search traffic, which would answer the same questions you might try to answer by comparing direct traffic levels (e.g. how popular is this website?).
2. Missing the context
If I told you that my traffic was up 20% week on week today, what would you say? Congratulations?
What if it was up 20% this time last year?
What if I told you it had been up 20% year on year, up until recently?
It’s funny how a little context can completely change this. This is another problem with case studies and their evil inverted twin, traffic drop analyses.
If we really want to understand whether to be surprised at something, positively or negatively, we need to compare it to our expectations, and then figure out what deviation from our expectations is “normal.” If this is starting to sound like statistics, that’s because it is statistics — indeed, I wrote about a statistical approach to measuring change way back in 2015.
If you want to be lazy, though, a good rule of thumb is to zoom out, and add in those previous years. And if someone shows you data that is suspiciously zoomed in, you might want to take it with a pinch of salt.
3. Trusting our tools
Would you make a multi-million dollar business decision based on a number that your competitor could manipulate at will? Well, chances are you do, and the number can be found in Google Analytics. I’ve covered this extensively in other places, but there are some major problems with most analytics platforms around:
For example, did you know that the Google Analytics API v3 can heavily sample data whilst telling you that the data is unsampled, above a certain amount of traffic (~500,000 within date range)? Neither did I, until we ran into it whilst building Distilled ODN.
Similar problems exist with many “Search Analytics” tools. My colleague Sam Nemzer has written a bunch about this — did you know that most rank tracking platforms report completely different rankings? Or how about the fact that the keywords grouped by Google (and thus tools like SEMRush and STAT, too) are not equivalent, and don’t necessarily have the volumes quoted?
It’s important to understand the strengths and weaknesses of tools that we use, so that we can at least know when they’re directionally accurate (as in, their insights guide you in the right direction), even if not perfectly accurate. All I can really recommend here is that skilling up in SEO (or any other digital channel) necessarily means understanding the mechanics behind your measurement platforms — which is why all new starts at Distilled end up learning how to do analytics audits.
One of the most common solutions to the root problem is combining multiple data sources, but…
4. Combining data sources
There are numerous platforms out there that will “defeat (not provided)” by bringing together data from two or more of:
The problems here are that, firstly, these platforms do not have equivalent definitions, and secondly, ironically, (not provided) tends to break them.
Let’s deal with definitions first, with an example — let’s look at a landing page with a channel:
Fine, though — it may not be precise, but you can at least get to some directionally useful data given these limitations. However, about that “(not provided)”...
Most of your landing pages get traffic from more than one keyword. It’s very likely that some of these keywords convert better than others, particularly if they are branded, meaning that even the most thorough click-through rate model isn’t going to help you. So how do you know which keywords are valuable?
The best answer is to generalize from AdWords data for those keywords, but it’s very unlikely that you have analytics data for all those combinations of keyword and landing page. Essentially, the tools that report on this make the very bold assumption that a given page converts identically for all keywords. Some are more transparent about this than others.
Again, this isn’t to say that those tools aren’t valuable — they just need to be understood carefully. The only way you could reliably fill in these blanks created by “not provided” would be to spend a ton on paid search to get decent volume, conversion rate, and bounce rate estimates for all your keywords, and even then, you’ve not fixed the inconsistent definitions issues.
Bonus peeve: Average rank
I still see this way too often. Three questions:
Hopefully, you’ve found this useful. To summarize the main takeaways:
Let me know what data analysis fallacies bug you, in the comments below.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!Post source = http://tracking.feedpress.it/link/9375/7644262
via Blogger Don't Be Fooled by Data: 4 Data Analysis Pitfalls & How to Avoid Them