Posted by Emily Potter

A/B testing your SEO changes can give you a competitive edge and help you dodge the bullet of negative changes that could lower your traffic. In this episode of Whiteboard Friday, Emily Potter shares not only why A/B testing your changes is important, but how to develop a hypothesis, what goes into collecting and analyzing the data, and how to draw your conclusions.

Click on the whiteboard image above to open a high-resolution version in a new tab!

Video Transcription

Howdy, Moz fans. I'm Emily Potter, and I work at Distilled, over in our London office. Today I'm going to talk to you about hypothesis testing in SEO and statistical significance. At Distilled, we use a platform called ODN, the Distilled Optimization Delivery Network, to do SEO A/B testing, and within that we use hypothesis testing. You may not be able to deploy ODN, but I still think you can learn something valuable from what I'm talking about today.

Hypothesis testing

The four main steps of hypothesis testing

When we're doing hypothesis testing, we follow four main steps:

1. Form your hypothesis
2. Collect the data
3. Analyze the data to test the hypothesis
4. Draw your conclusions
The most important part of A/B testing is having a strong hypothesis, so up here I've talked about how to formulate a strong SEO hypothesis.

1. Forming your hypothesis

Three mechanisms to help formulate a hypothesis

Now, we need to remember that with SEO we are looking to impact one of three things in order to increase organic traffic.
You could also be impacting a mixture of all three of these things, but you want to make sure that one of them is clearly being targeted, or else it's not really an SEO test.

2. Collecting the data

Next, we collect our data. Again, at Distilled we use the ODN platform to do this. With ODN, we do A/B testing by splitting pages up into statistically similar buckets.

A/B test with your control and your variant

Once we've done that, we take our variant group and use a mathematical analysis to predict what we think the variant group would have done had we not made the change. So up here we have the black line, and that's what it's showing: what our model thought the variant group would do if we had not made any change. This dotted line here is when the test began, and you can see that after the test started there was a separation. The blue line is what actually happened.

Because there's a difference between these two lines, we can see a change. If we move down here, we've plotted the difference between those two lines. Because the blue line is above the black line, we call this a positive test. This green area here is our confidence interval, and this one, as is standard, is a 95% confidence interval. We use that because we use statistical testing. When the green lines are all above the zero line, or all below it for a negative test, we can call the test statistically significant. For this one, our best estimate is that the change increased sessions by 12%, which works out to roughly 7,000 monthly organic sessions.

Now, on either side here, you can see I've written 2.5%. That's to make it all add up to 100%, and the reason for that is that you never get a 100% confident result: there's always the chance of a random false negative or false positive. That's why we say we're 97.5% confident this was positive: the 95% inside the interval plus the 2.5% above it.

Tests without statistical significance

At Distilled, we've found a lot of circumstances where tests are not statistically significant but there's still pretty strong evidence of an uplift. If we move down here, I have an example of that: a test that wasn't statistically significant, but where we saw a strong uplift. You can see that our green area still dips below zero, which says that, at a 95% confidence interval, there's still a chance this was a negative test. If we drop down again below, I've drawn it in pink with 5% on each side, which makes it a 90% confidence interval. Here we can say we're 95% confident there was a positive result, because the 90% inside the interval plus the 5% above it adds up to 95%.

3. Analyze the data to test the hypothesis

The reason we do this is so we can implement changes that we have a strong hypothesis for and capture those wins, instead of rejecting them completely. Part of the reason is also that we're doing business, not science. Here I've created a chart of when we might deploy a test that was not statistically significant, based on how strong or weak the hypothesis is and how cheap or expensive the change is.
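Distilled hasn't published the ODN's forecasting model, so as a rough illustration of the confidence arithmetic above, here's a minimal Python sketch using a normal approximation. Everything in it is an assumption for illustration: the function name `uplift_analysis`, its inputs, and the simplification that each day's forecast error is independent. It is not ODN's implementation.

```python
import numpy as np
from scipy import stats

def uplift_analysis(actual, forecast, forecast_se, conf=0.95):
    """Compare what the variant pages actually did against the modelled
    counterfactual (what they would have done had no change been made).

    actual      -- observed daily organic sessions for the variant group
    forecast    -- modelled daily sessions assuming no change was made
    forecast_se -- the model's standard error for each day's forecast
    """
    daily_diff = np.asarray(actual, float) - np.asarray(forecast, float)
    extra_sessions = daily_diff.sum()  # best estimate of the uplift

    # Simplifying assumption: daily forecast errors are independent,
    # so their variances add over the test period.
    se = np.sqrt(np.sum(np.asarray(forecast_se, float) ** 2))

    # Two-sided interval: a 95% interval leaves 2.5% in each tail.
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    interval = (extra_sessions - z * se, extra_sessions + z * se)

    # Probability the true effect is positive -- the "95 plus 2.5 = 97.5"
    # reasoning from the transcript, computed directly from the z-score.
    p_positive = stats.norm.cdf(extra_sessions / se)
    return extra_sessions, interval, p_positive

# Toy numbers echoing the transcript: a ~12% uplift on ~1,900 sessions a day
# comes to roughly 7,000 extra organic sessions a month.
rng = np.random.default_rng(42)
days = 30
forecast = np.full(days, 1900.0)
actual = forecast * 1.12 + rng.normal(0, 80.0, days)
extra, (low, high), p_pos = uplift_analysis(actual, forecast, np.full(days, 80.0))
print(f"Extra sessions: {extra:,.0f} (95% CI {low:,.0f} to {high:,.0f}), "
      f"P(positive) = {p_pos:.1%}")
```

Under this framing, a 95% interval that sits entirely above zero means at least 97.5% confidence in a positive effect, and loosening to a 90% interval (5% in each tail) gives the 95%-confident case from the second example.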