The Ultimate Guide To Split Testing
Posted by Pete - 14/07/08 at 01:07:54 pm
Well, the last post I did on split testing went over a lot of people’s heads, so I thought I’d take some time to revisit the topic, covering what it is, the various versions, and how you should be using it.
If you’d rather just get the takeaway points and the files for this post, scroll down to the bottom.
Contents
- What Is It?
- How To Conduct a Split Test
- Choosing Your Test Sets
- Using Orthogonal Arrays
- Gathering Your Data and Validating the Results
- Analysing the Data
- Takeaway Points and Downloads
What Is It?
A split test is where you have two or more versions of the same thing, and you test them all to see which one produces the better outcome. There are three basic kinds of split test, known as AB, ABA and multivariate. In an AB test, you have just two versions: one called A, which is your original, and one called B, which is the refined version. In an ABA test, you actually test your original twice, hence the two A’s. The difference between the two A results shows you how much variance to expect from identical versions, which in turn tells you how much variance you’d expect to see on your B version.
In a multivariate test, you pick a number of parameters, which could be things like PPC advert headline, copy and URL, or possible layouts for a form, or anything else with multiple parameters. You then test a number of levels for each of these things, so you might have three variations of each headline, copy and URL in the PPC example, or in the case of the form, you might have three variations of a form layout.
Whatever kind of split test you end up performing, the important thing is to make sure you know your interrogative statement (what do you want to find out), your data set reliability (are the people testing an indicative sample of the traffic you’re going to get in the future) and your result accuracy (how much variance you’d expect to see in your result if you kept on adding data).
How To Conduct a Split Test
Whether you’re conducting an AB or multivariate test, the methodology is pretty similar. As such, you should be able to take the following framework and apply it to pretty much any test you’ll ever run.
The first thing to do is define the question. What is it you want to know? What are you trying to find? It might be the optimum layout of a page, the best PPC ad for a campaign, or how to make the ultimate scrambled egg (I’m not joking either - you can apply this stuff to anything).
Essentially, whatever the question is, you’ll be looking to answer a “which” question. For instance,
- Which headline will convert best?
- Which form is best for converting signups?
- Which offer will incentivise the best?
- Which copy will read, reassure and reinforce best?
Choosing Your Test Sets
The process from this point is very simple. Once you’ve identified your “which”, you need to create your variations. Now, if you’re doing a simple AB test, that means simply taking your control, modifying it so you’ve got a refined element, and then throwing traffic at it. The process is pretty much the same if you’re doing an ABA test too. If you’re doing a multivariate test however, it’s slightly more complex.
The problem comes in how you choose the combinations of parameters and levels you’re going to test. With multivariate tests, you can very quickly end up with more options than it’s feasible to study, because the number of combinations is the number of levels raised to the power of the number of parameters. For instance, if you had three parameters, each with five levels (not an uncommon set), you’d have 125 potential variations to test. Even worse, if you had seven parameters, each with five levels (the biggest I’ve ever done), you could construct 78,125 variations. To get around this, we employ Taguchi orthogonal arrays.
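To see how quickly the numbers get out of hand, here is a small Python sketch. The function name `count_variations` is just an illustrative choice; the only thing it relies on is that a full test of every combination needs levels to the power of parameters variations.

```python
from itertools import product


def count_variations(parameters: int, levels: int) -> int:
    """Number of combinations in a full test: levels ** parameters."""
    return levels ** parameters


# Enumerate a small case explicitly to confirm the formula:
# 3 parameters, each taking one of 5 levels (numbered 1 to 5).
small = list(product(range(1, 6), repeat=3))

print(len(small))               # every combination, listed out
print(count_variations(3, 5))   # the same count from the formula
print(count_variations(7, 5))   # seven parameters, five levels each
```

Listing the combinations with `itertools.product` is fine for small cases, but the formula is the practical way to check whether a full test is even feasible before you start.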
Using Orthogonal Arrays
Bear with me, because honestly what we’re about to do isn’t as scary as it looks… The way this works is that we pick the array with the right number of levels, and then take as many columns as we have parameters. So if you’ve got five parameters, you need the first 5 columns of the array. For example, if we start with the following array:
1111111
1112222
1221122
1222211
2121212
2122121
2211221
2212112
We could test up to seven parameters, each with two levels (because it has seven columns, and every number is a 1 or 2). If we now pick the first four columns…
1111
1112
1221
1222
2121
2122
2211
2212
We would be able to test a representative sample of all the possible variations. So instead of running all 16 possible tests, we only run eight. Similarly, if we want to test three parameters, each with four levels, we would start with the array below:
11111
12222
13333
14444
21234
22143
23412
24321
31342
32431
33124
34213
41423
42314
43241
44132
And then pick the first three columns…
111
122
133
144
212
221
234
243
313
324
331
342
414
423
432
441
And then test these 16 combinations, instead of the 64 we could potentially construct. I won’t go into the math behind how you construct these arrays, as it’s frankly mind-bogglingly dull. But suffice to say, you can’t just pick random variations. So please stick to the arrays you’ll find in the zip file at the end of this.
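If you want to see why random picks won’t do, the sketch below takes the first four columns of the two-level array from earlier and checks one property these arrays are built to have: for any two columns, every pairing of levels turns up the same number of times. The helper names `first_columns` and `is_pairwise_balanced` are made up for this illustration, and the check covers only that one property; it is not the construction method.

```python
from collections import Counter
from itertools import combinations

# The seven-column, two-level array shown earlier in this post.
L8 = ["1111111", "1112222", "1221122", "1222211",
      "2121212", "2122121", "2211221", "2212112"]


def first_columns(array, n):
    """Keep only the first n columns of each row."""
    return [row[:n] for row in array]


def is_pairwise_balanced(rows):
    """True if, for every pair of columns, each level pairing appears equally often."""
    n_cols = len(rows[0])
    for a, b in combinations(range(n_cols), 2):
        counts = Counter((row[a], row[b]) for row in rows)
        levels_a = {row[a] for row in rows}
        levels_b = {row[b] for row in rows}
        # Every possible pairing must be present...
        if len(counts) != len(levels_a) * len(levels_b):
            return False
        # ...and each pairing must occur the same number of times.
        if len(set(counts.values())) != 1:
            return False
    return True


design = first_columns(L8, 4)
print(design)
print(is_pairwise_balanced(design))

# An arbitrary hand-picked set of rows fails the same check.
print(is_pairwise_balanced(["1111", "1112", "1221", "1222"]))
```

That balance is what lets you compare levels fairly later on: each level of one parameter is tested against an even mix of the levels of every other parameter.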
As a quick example, if you wanted to split test two PPC ads, you’d have three parameters (headline, copy and URL), each with two levels.
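Here is a rough sketch of how array rows turn into actual ads for that PPC case. Everything concrete in it is an assumption for illustration: the headlines, copy and URLs are invented, `build_variations` is a hypothetical helper, and the four-row array is the standard L4 two-level array, which isn’t shown in this post.

```python
# Hypothetical ad components: two levels for each of three parameters.
PARAMETERS = {
    "headline": ["Cheap Widgets", "Widgets Delivered Free"],
    "copy": ["Order today, delivered tomorrow.", "Over 500 widgets in stock."],
    "url": ["example.com/widgets", "example.com/offers"],
}

# Standard L4 array (assumed here): three two-level columns, four rows.
L4 = ["111", "122", "212", "221"]


def build_variations(array, parameters):
    """Turn each array row into a dict of parameter name -> chosen level."""
    names = list(parameters)
    variations = []
    for row in array:
        # Level "1" means the first option, "2" the second, and so on.
        variation = {
            name: parameters[name][int(level) - 1]
            for name, level in zip(names, row)
        }
        variations.append(variation)
    return variations


ads = build_variations(L4, PARAMETERS)
for number, ad in enumerate(ads, start=1):
    print(number, ad["headline"], "|", ad["copy"], "|", ad["url"])
```

The same mapping works for any of the arrays: each digit in a row simply picks which level of the matching parameter goes into that variation.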
Gathering Your Data and Validating the Results
As a general rule, a sample is statistically valid when increasing the sample size changes the result by no more than 5%. This is where ABA tests really come into their own, as you’ve got a running tally in the form of your second A test that shows you how accurate your data is. When the two A samples are consistently within 5% of each other, you know you’re done. If however you’re running a standard AB or multivariate test, simply graph your results, and when the line settles down to less than a 5% wobble when you compare 20% of the results against another 20% of them, you’re done.
Validation also tends to be fairly simple. You want to check for any extraneous or instrumentation-based effects on your data. Extraneous effects include things like news events that might skew your data to include the wrong kind of people, online and offline mentions that send odd traffic, or anything else that might get people outside of your intended sample into the mix. Instrumentation effects include any problems in the sandbox area that can alter results, such as a problem with analytics implementation, or changing analytics services halfway through the test.
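The ABA stopping rule can be sketched in a few lines of Python. This makes several assumptions that aren’t spelled out above: “within 5%” is read as a relative gap between the two A conversion rates, “consistently” is read as the last three checkpoints in a row, and the tallies are made-up numbers. The function names are hypothetical too.

```python
def relative_gap(rate_a1, rate_a2):
    """Gap between two rates, as a fraction of the larger one."""
    larger = max(rate_a1, rate_a2)
    if larger == 0:
        return 0.0
    return abs(rate_a1 - rate_a2) / larger


def aba_has_settled(checkpoints, tolerance=0.05, run_length=3):
    """True if the two A samples stayed within tolerance for the last run_length checkpoints.

    Each checkpoint is a cumulative tally:
    (conversions_a1, visitors_a1, conversions_a2, visitors_a2).
    """
    if len(checkpoints) < run_length:
        return False
    for conv1, vis1, conv2, vis2 in checkpoints[-run_length:]:
        if vis1 == 0 or vis2 == 0:
            return False
        if relative_gap(conv1 / vis1, conv2 / vis2) > tolerance:
            return False
    return True


# Made-up cumulative tallies for the two A versions.
tallies = [
    (8, 200, 12, 200),
    (21, 500, 24, 500),
    (41, 1000, 42, 1000),
    (62, 1500, 61, 1500),
    (80, 2000, 82, 2000),
]

print(aba_has_settled(tallies[:4]))  # one of the last three checkpoints is still too far apart
print(aba_has_settled(tallies))      # the last three checkpoints all agree closely
```

Changing `tolerance` or `run_length` makes the rule stricter or looser, so treat the defaults as a starting point rather than a statistical guarantee.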
Analysing the Data
When you’ve finished the test and collected the data, the only thing left to do is to work out which version performed best. Now, in the case of AB and ABA tests, that’s pretty simple; you just take whichever one worked best, and use that.
However, the multivariate tests make things a bit more complicated. Here’s what you do…
When you’ve got your data, take the lines from your array, and number them sequentially. So if we use the first array we had earlier:
1111
1112
1221
1222
2121
2122
2211
2212
We’d call the first row 1, the second 2 and so on. This gives us 8 numbers. Next to each one, write down the conversion rate. This will give you something like this:
1 3.9%
2 4.7%
3 2.1%
4 3.3%
5 5.5%
6 4.8%
7 2.5%
8 6.2%
Now we’re going to create a table. Write down your parameters along the top, and then the test numbers down the side. So in our example, we’d have a table with four columns and eight rows. The table is then used to calculate the average result for each level of each parameter: add up the results of every test in which that level appears, and divide by the number of times it appears. For instance:
Add all the results for Parameter 1, Level 1, and divide by 4. Parameter 1, Level 1 appears in tests 1, 2, 3 and 4. The total of these is 14%. Divide this by 4 and we get 3.5%.
Now add all the results for Parameter 2, Level 1, and divide by 4. Parameter 2, Level 1 appears in tests 1, 2, 5 and 6. The total of these is 18.9%. Divide this by 4 and we get 4.73%.
Keep doing this until you’ve gone through every level of every parameter, and you’ll be left with the best performing level for each parameter. Stick them together, and that’s your perfect advert.
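If you’d rather not do the averaging by hand, here is a minimal Python sketch of the same calculation, using the eight-row array and the example conversion rates from above. The function names `level_averages` and `best_levels` are invented for this illustration.

```python
# The four-column array and the example conversion rates (in %) from above.
ARRAY = ["1111", "1112", "1221", "1222", "2121", "2122", "2211", "2212"]
RATES = [3.9, 4.7, 2.1, 3.3, 5.5, 4.8, 2.5, 6.2]


def level_averages(array, rates):
    """For each parameter (column), average the rates of the tests using each level."""
    n_params = len(array[0])
    averages = []
    for p in range(n_params):
        totals, counts = {}, {}
        for row, rate in zip(array, rates):
            level = row[p]
            totals[level] = totals.get(level, 0.0) + rate
            counts[level] = counts.get(level, 0) + 1
        averages.append({lvl: totals[lvl] / counts[lvl] for lvl in totals})
    return averages


def best_levels(averages):
    """Pick the highest-averaging level for each parameter and join them up."""
    return "".join(max(avg, key=avg.get) for avg in averages)


averages = level_averages(ARRAY, RATES)
for number, avg in enumerate(averages, start=1):
    summary = ", ".join(f"level {lvl} = {val:.2f}%" for lvl, val in sorted(avg.items()))
    print(f"Parameter {number}: {summary}")
print("Best combination:", best_levels(averages))
```

With these example numbers the winning combination is one that was never run as a test in its own right. That is normal with this method, since you only test a sample of the combinations, but it is a good reason to run the winner as a follow-up test before relying on it.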
Takeaway Points and Downloads
- It doesn’t matter what method you use to test. It only matters that you do.
- A result is only as good as the data that went into creating it.
- Multivariate tests may be sexy, but they take much more time. Don’t assume they’re always the way forward.
- Don’t rush into anything. Make sure you do the legwork first, and get everything set up properly. A ruined test wastes time and money.
Download the Taguchi Orthogonal Arrays
Download the article as a pdf




