Split Testing Adwords: You're Doing It Wrong

I’m sure the title of this post is going to make a few folks upset. Don’t worry, the content is going to really make you mad. Not at me, but maybe at yourself, and especially at whoever taught you how to split test.

That’s because, for most Adwords advertisers, the title of this post is very accurate. Most advertisers are doing split testing all wrong.

If you’re doing A/B, you’re robbing yourself of profits at every step.

That’s because, in a typical A/B split test, what you’re doing is keeping your Control (the best performing ad) running, and creating a new test ad to run against it. Doing it that way is perfectly normal, but it’s also completely, utterly, and totally wrong.

Most of the time, these test ads fail to beat the control… and predictably so. Your Control is the Control because it’s doing well. After a few rounds of testing, it becomes less and less likely that your test ad will beat the control.

The “usual method” of ad testing delivers invalid results – here’s why:

  • The Control has a strong performance history, but your test ad has no performance history.
  • As a result, your test ad may be ranked lower on the page, leading to lower CTR and (quite possibly) conversion.

In other words, your test ad might have been better – but it didn’t get a fair chance! This means that you may have already rejected many great ads, because you didn’t test them correctly.

But that’s not the worst part! “Control vs. test ad” also loses money:

When you run two ads in equal rotation, your Control is only getting half of the available ad impressions.

When a test ad fails to deliver good click-through and conversion rates, you’ve just given up as much as 50% of the profits that you would have had, if you had just left your ad group alone.

Wouldn’t it be better if you could construct a valid test that didn’t bleed money? Here’s how it’s done…

Step 1: Set Up A Valid Test

Instead of running your test ad against the Control, you create multiple copies of your Control.

How many copies you make is determined by how many of the ad impressions you want to go to the test ad.

In the example above, we’ve created 3 copies of the Control, left the Control running, and set up one test ad. Therefore, the test ad only gets 20% of the ad impressions.

This method leaves far less of your profit at risk.

It also allows you to run a valid test, by comparing the performance of your test ad against the copies of the Control only.

Since none of the ads in the “testing pool” has performance history, you can have far more confidence in the test results.

This method has reversed the results of many “close misses” I had with the old method of testing, which I was using myself until 2006.

Step 2: End Test (Fail) or Ramp Up To Full Test

In one of my many “hallway conversations” at the last Stompernet Live event, Andy Edmonds (Chief Scientist @ Stompernet and a real expert in testing) pointed out the real advantage of this method… which is that failed tests usually “fail early.”

Scientific testing types have a much more formal language for this, of course, but I got tired head when he explained that part of it.

I’ll just summarize the main points for you:

  • You can usually kill your test ad pretty early because most test ads will be obvious failures.
  • Running 3 copies of the Control actually accelerates your testing most of the time.
  • If the test ad appears to perform well, you can progressively eliminate copies of the Control to ramp up the pace of the test.
  • The final test is the validation test, where you run your Test Ad vs. Control – but now, you’re doing it right!
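To see how the ramp-up changes the test ad's exposure, here is a small sketch that lists its impression share as copies of the Control are removed one at a time. It assumes even rotation among all ads in the group; `ramp_schedule` is a hypothetical helper, not part of any AdWords tool.

```python
def ramp_schedule(control_copies: int) -> list[tuple[int, float]]:
    """Return (copies still running, test ad's impression share) per stage.

    Assumes even rotation among the original Control, its copies, and one
    test ad. The last stage, with zero copies, is the Test Ad vs. Control
    validation test.
    """
    stages = []
    for copies in range(control_copies, -1, -1):
        # Ads in rotation: the copies, plus the Control, plus the test ad.
        stages.append((copies, 1 / (copies + 2)))
    return stages


if __name__ == "__main__":
    for copies, share in ramp_schedule(3):
        print(f"{copies} copies of the Control: test ad gets {share:.1%}")
```

Each copy you remove hands a larger slice of impressions to the test ad, so the test speeds up only after the ad has shown some promise.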

Since I’ve started doing this “Andy’s way,” my ad tests are running even faster than before. Way cool.

But what’s the real goal of all this testing? Higher click-through rates? Lower cost per click? These statistics are merely means to a greater end, my friend, and they can often lead you astray.

Indulge me for a minute, please. I didn’t get to be the “Keyword Guru” for nothing, you know…

There’s one other huge mistake that most folks make when they’re split testing ads - they don’t compare the business performance of their ads, just the click-through rate.

The problem should be obvious – you aren’t trying to buy traffic, you’re trying to buy customers. Actually, that’s only sort of true. What you’re really trying to buy is profit. Specifically, what you really want to achieve is the maximum possible profit per keyword.

At Stompernet, we call this statistic “Profit Per SERP”

Understanding how much profit you make every time one of your targeted keywords gets searched is a powerful idea. It’s one of the most important metrics that any enlightened website owner will learn to measure and improve. It’s the KEY to search marketing success.

You have a lot of control over every variable in the search marketing equation, except for one thing…

You can NOT change the number of times people carry out a given search.

During the month of May, some number of people are going to search for “mortgage rates,” or “baby gifts,” or “free seo book.” We are powerless to change this number.

The number that we’re trying to change is our average profit for each of those searches.

By doing Adwords right, you can change that number, and in doing so, change the rules of the game.
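As a rough illustration of the metric, dividing the profit you attribute to a keyword by the number of times it was searched gives one number per keyword. The exact formula Stompernet uses isn't spelled out in this post, so the definition below, and the sample figures, are assumptions.

```python
def profit_per_serp(profit: float, searches: int) -> float:
    """Average profit earned each time the keyword is searched.

    profit   -- profit attributed to the keyword over some period
    searches -- number of times the keyword was searched in that period
    """
    if searches <= 0:
        raise ValueError("searches must be positive")
    return profit / searches


if __name__ == "__main__":
    # Invented figures: $450 profit from a keyword searched 30,000 times.
    print(f"${profit_per_serp(450.0, 30_000):.4f} per search")
```

Since the number of searches is fixed, anything that raises this number (a better ad, a better landing page, a better offer) raises total profit.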

We’ll talk more later.

For now, I’d like to invite you to watch the 50 minute instructional video that I prepared for you, with the help of master video editor Andy Jenkins. Watch online or download the Quicktime – but do watch… you don’t want to miss this.

85 thoughts on “Split Testing Adwords: You're Doing It Wrong”

  1. Hi Dan,

    Just awful!

    I’m very excited at your techniques.

    You say, “How many copies you make is determined by how many of the ad impressions you want to go to the test ad.”

    I wonder how many copies do you usually create?
    Do you have a standard or suggestion?

    You also say, “eliminate copies of control one at a time,”

    Do you mean I must not eliminate them at the same time?
    When should I eliminate the next one?

    I have watched your video.
    I was strongly impressed at embedded match

    I’m looking forward to PPC Fast Start coming.

  2. How is this technique faster when the test ad gets 20% of impressions as opposed to 50%?

    Also, this way your “proven” ad only gets 20% impressions as well. In other words: the ad that gets “top” placement gets 20% impressions as opposed to 50% for FURTHER loss in CTR.

  3. I never thought about making copies of the control ad. That makes perfect sense. So much sense, I should have seen it. Time to change the way I have been doing things. I have basically been playing beat the best ad. But if the ad is always good, why give 50% of its views to something yet untested?

  4. @Japanese SEO, I normally run 3 copies of the control for my tests.

    If the test ad doesn’t fail quickly (they usually do) I delete one copy of the control, run some more testing, then another, then another, until I am confident enough to let the test run vs. the control.

    I recommend using Splittester.com as a simple tool to let you compare your results. You can combine the clickthrough and impression totals for the copies of the control and compare that to the single test ad, AND you can run each of the comparisons separately which might end your test faster.

    @Todd, in the example shown, your proven AD CREATIVE gets 80% of the impressions. The copy with performance history is shown 20% of the time.

    We have to eliminate performance history as a variable to do a valid test, that’s why we make copies.

    What you are comparing is the performance of two different ad creatives. Individual tests may take more or less time to conclude but overall we’ve been able to complete more tests than we did before. It would depend on the statistical model you use, and the level of probability/confidence you want to have in your decisions.

    We used to run 2 copies of the control and use A/B/C for comparison, but I got yelled at by a statistics geek for using A/B/C testing, and by other folks for throwing too many impressions into unproven ad creatives.

  5. @Stephan, almost nobody thinks about making copies of the control ad. Every time I’ve presented this to an audience, I immediately hear the sound of several dozen head slaps.

  6. @Japanese SEO, you can eliminate one or all of the copies when you have confidence that your test ad is performing well. Nobody can decide for you how quickly you want to ramp up.

    Throwing more impressions into a promising test will hasten the conclusion of the test, but it also puts more profit at risk – you need to strike a balance.

    It’s a lot easier to make that decision with good analytics behind you. The performance of your ad is not just about click-through rate.

  7. I’m inclined to agree with a couple of the other commenters: I understand how this mitigates your risk (not showing a potentially bad ad 50% of the time), but statistically speaking, only showing the new ad 20% of the time is absolutely going to increase your testing time. If it isn’t, then you changed your method and are making an unfair comparison. Of course, if you have a strong campaign and your primary goal is not to mess it up, then the 80%/20% approach is probably smart.

    More generally, though, if you have an ad with solid performance (CTR or conversion), your testing is probably better spent elsewhere, either on landing pages or general site conversion. Why mess with something that’s working, especially if you have such low confidence in the alternate versions you’re testing?

  8. Doesn’t sending 20% of the traffic just mean that it will take longer for the test to reach significance? You need to send X amount of traffic to an ad to get statistically significant numbers, and if you do that by sending 20% of your traffic for 10 days instead of 50% of your total traffic for 4 days, doesn’t that achieve the same results only slower?

    I’m asking because I’m a big fan of doing things wrong quickly. If you need to get a certain number of conversions to be able to judge an ad’s effectiveness, it seems to me it’s best to get those numbers as fast as possible and move on.

  9. Pete, I’ll have to take your word for that, or at least not put up an argument, because I am not a statistics geek, and depending on which one you talk to they all have different ideas.

    I’ve heard from one math whiz that the method Splittester.com uses isn’t valid for this, but I can’t argue with him either.

    What I do know is that ads that fail often lose in a 1:1 comparison with one of the 3 control ads, and based on that, I am often able to terminate a test early. I’m sure different math geeks will tell me that this either is or is not valid.

    When you get into the other business factors besides click-through rate, the analysis gets real complicated with error bars and stuff, which is where I personally have to let other people deal with that.

    I’m not sure what you mean by “changed your method.” Because I’ve never been able to get a straight story from the math geek community about how to evaluate the results of a test, I did not prescribe a specific method in this post.

    You keep testing ads because every improvement represents a substantial increase in the value of the business over time. Failure is just a test result, and if nothing you try ever fails, you aren’t doing enough testing.

    I certainly do have a lot of ad groups which are waiting for someone to come up with a better idea. Others where we only give 10% of the impressions to testing. Still others where we have 2 or even 3 ads running with a statistical dead heat after more than a year.

    But every time I think we’ll never beat the control again, someone comes up with a better idea.

    If you make major changes to your web site, branding, merchandising, etc. this can also affect the outcome of your advertising campaigns, to the point where a different ad creative may perform better than the existing control.

    The complexity of this stuff is bottomless. I’m just trying to help folks out here, by illustrating some stuff that nearly everyone is missing.

    Thanks for your thoughtful comments, and if you have some math/science to add to this discussion, it will enlighten me at least.

  10. @Kiowa, the length of a successful test (test ad beats the copies and gets to the championship round) is only slightly longer with this method. Play around with some hypotheticals @ SplitTester comparing “what if I ran this same test at 50/50” and you’ll see that.

    What cuts our testing time is the early failures.

    We often see the test ad lose a 1:1 comparison vs. 1 of the copies of the control, where the combined stats for all copies may say it’s still too early to reach a conclusion.

    Based on high confidence from a 1:1 comparison, I’ll kill a test ad that isn’t giving me any other specific reasons to hope for a better outcome overall.

    That saves time.
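For readers who want to put a number on a 1:1 comparison like the one described in this comment, a pooled two-proportion z-test is one common option. This is a generic textbook test, not necessarily what SplitTester computes, and the click and impression counts in the example are invented.

```python
import math


def ctr_confidence(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int) -> float:
    """Two-sided confidence (0..1) that two ads have different CTRs.

    Uses a pooled two-proportion z-test with a normal approximation,
    which is only reasonable when both ads have plenty of impressions.
    """
    p_a = clicks_a / imps_a
    p_b = clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if se == 0:
        return 0.0  # no clicks at all (or all clicks): nothing to compare
    z = abs(p_a - p_b) / se
    # For a standard normal Z, P(|Z| <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))


if __name__ == "__main__":
    # One copy of the control vs. the test ad (made-up counts).
    conf = ctr_confidence(clicks_a=60, imps_a=1000, clicks_b=30, imps_b=1000)
    print(f"confidence that the CTRs differ: {conf:.1%}")
```

Running several 1:1 comparisons and stopping on the first one that looks decisive raises the odds of a false alarm, so treat a single high reading with some caution.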

  11. Sorry, by “changed your method”, I just meant that you not only went from 50/50 to 80/20, but also started using this new statistical technique, so the comparison may be apples vs. oranges. I think I understand now; you essentially treat it as a 5-way test, but 4 of the options are really the same. That’s interesting, but I’d have to dig into it. I think there’s an assumption that the 5 versions are independent, and by making 4 of them copies, you may be violating that assumption. On the other hand, what matters in hard-core mathematical theory and what matters in practice are often totally different things, so the method could be slightly cheating and still basically work.

    I do completely agree about the what-to-test problem. Often, even those of us with experience and some marketing savvy have a really hard time figuring out what variations to try out, and the best and worst part about testing is that our best guesses often end up being wrong. I think if you look at this technique as a way to mitigate that risk of going in a bad direction, it potentially has a lot of value.

    Totally unrelated, but I’ve also noticed recently that part of the problem with using the “control” version is that people are sometimes coming back to an ad they’ve seen before or are familiar with, giving the control a bit of an advantage at the beginning of some tests. This is even trickier with landing-page and site testing. In some cases, I’ve ended up throwing out the first week of data altogether or seen a completely different result in the first half of testing versus second half.

  12. To be precise, Pete, the “5th Beatle” (the control) isn’t actually part of the band, or the test set.

    What you are testing (in my example) is:
    3 Identical Ads
    1 Unique Ad

    The control itself isn’t part of the test set because we have an uncontrollable variable (the performance history of that uniquely identified ad instance) that will favor the control.

    The reason I started doing this was simply to mitigate the risk of test ads that fail, and I only used it when the test ad was a radical departure from the control and not an evolutionary tweak.

    Shortly after I was awoken one night by the idea that it also eliminated this troublesome variable of performance history. That’s when I started looking for help from math geeks, and working on this post. Two years ago. :D

    Yes, in practice, most testing methods (even the A/B method I dismissed early in my post) are better than no testing at all. But I’d still like to improve my methods if I can.

    As a side note, making additional copies of the control so that they too gain historical performance benefits over time allows you to run tests with even lower risk, but this would have complicated the explanation in the post without adding much value IMO.

    Good point about the “repeat view” vs. “initial view” issue. With Adwords, we can’t control that, so the best ad is the one that performs best over time. With landing pages, this can be a big deal.

    Andy Edmonds ran a little “opt in box placement” test on Brad Fallon’s “Free Line Report” site, where placement on the right side of the page improved opt-ins substantially. When he sliced it a little deeper, it turned out that was only true for first-time visitors.

    You can also get some very interesting “day of week” effects with Adwords ads, landing page elements, calls to action, and I guess everything else.

    I have one situation where we’re running a different ad on Monday-Thursday than what we run Friday-Sunday, which requires two separate campaign containers in Adwords. But it’s worth an extra 12% or so, from a pretty high volume keyword.

    (For anyone who isn’t aware, Dr. Pete is a cognitive psychologist and usability expert and really smart and stuff. If I were locked in a room with him and Andy Edmonds, who is also all 3 of those things, I would attempt to change the subject to baseball or something, for fear of a brain meltdown.)

    Thank you again, Pete – it’s people like you who make it possible for marketers to advance the art.

  13. To clarify, sorry…

    “Good point about the “repeat view” vs. “initial view” issue. With Adwords, we can’t control that, so the best ad is the one that performs best over time.”

    One of the reasons for the final test of “control vs. test ad” is to try to reduce the impact of this.

    We should probably think about a process to re-test old control ads from time to time. Might actually pay off better than only trying new ads.

  14. Yeah, can’t I be smart AND like baseball? :) You’re too kind, Dan. As my time-distance from graduate school increases, I’ve forgotten more about stats than I care to admit. Applying statistics in the real world is tricky business, but it can be fun for us geeks.

    I did misunderstand about the historical performance. I was thinking strictly in the usual sense of a control, but you’re also talking about the AdWords historical performance and the built-in bias the system applies to ads. That’s a really interesting factor that I hadn’t fully considered (since a lot of my split-testing is more on the conversion/usability side).

  15. You can be smart and like baseball. I like baseball too… but if we talked baseball, then I wouldn’t hear stuff like “low conversion rates have a log-normal not normal distribution” and get tired head all over again.

    I survived the whole calculus-to-stats trip like 30 years ago and it gave me bad tired head.

    Anyway, sounds like Andy’s on his way over here to fix me now…

  16. As the StomperNet stats geek, I hope to offer a calculator service I can vouch for someday.

    The cool thing about Dan’s splitting method is that you don’t even really need to run stats.

    If you do, you have to compare a single one of the controls to the test ad… you shouldn’t add up the views and clicks for all the controls and compare that to the test ad as you’ll be violating assumptions of equal variance (Sin #6 in the 7 deadly sins of testing, http://alwaysbetesting.com/abtest/index.cfm/2008/3/16/Getting-Serious-About-Testing-Learn-from-the-Pros).

    The good news is that Dan’s method provides an empirical validation of your variance by what’s known as an A-A test.

    Compare the performance of the copies of your ad to one another. Take the difference between the lowest performing control copy and the best performing control copy. If your test ad doesn’t beat every one of the controls by that amount, you either don’t have enough data, or it’s not a winner.

    So, imagine you have 3 copies (C1, C2, and C3) with 2.1%, 2.5%, and 2.7% click-through and your test ad gets 2.9% click-through. The difference from the best control to the worst control is .6%. The difference between your test ad and your controls is .8%, .4%, and .2%. It’s not a winner – yet. You would need test performance of 2.7% + .6% = 3.3% to call a winner. With more data, the copies of the control should get closer together.

    Once all of the controls even out in performance, you know you’ve reached a point where each of the samples is big enough to generate consistent data and you can assuredly compare the control to the test ad and estimate the gain or loss.

    PS – a serious statistical analysis could generate cases where my heuristic above would be wrong. If you’ve got the tools to do real stats, great. If not, this heuristic is going to be reliable most of the time.
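The heuristic in this comment is simple enough to sketch in code. The version below treats the spread between the best and worst control copies as noise and requires the test ad to beat the best copy by at least that spread. The function name is hypothetical, and the numbers are the click-through figures from the example in the comment.

```python
def aa_verdict(control_ctrs: list[float], test_ctr: float) -> tuple[bool, float]:
    """Return (is_winner, CTR the test ad must reach).

    The gap between the best and worst control copies comes from identical
    ads, so it is treated as noise. The test ad has to beat the best copy
    by at least that gap to be called a winner.
    """
    best = max(control_ctrs)
    worst = min(control_ctrs)
    threshold = best + (best - worst)
    return test_ctr >= threshold, threshold


if __name__ == "__main__":
    # CTRs in percent for C1, C2, C3, then the test ad.
    winner, needed = aa_verdict([2.1, 2.5, 2.7], test_ctr=2.9)
    print(f"winner: {winner}, test ad needs at least {needed:.1f}%")
```

As the comment itself warns, this is a rule of thumb rather than a formal test; a "not yet" result may just mean the copies need more impressions before their numbers settle.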

  17. Dan,

    Great post. We’ve run into the Google “bias” issue in our own A/B testing and this seems like a really simple solution to the problem. ( head slap sound here) It’s a real shame that Google doesn’t offer better testing tools. The ideal solution would be a test/control lab within Adwords that would remove historical performance from the equation and allow for balanced test groups. Yeah, I know – keep dreaming.

  18. Everyone got that?

    It’s actually easier than what I was doing, so that will help. Thanks, Andy.

    BTW, for anyone who *still* doesn’t get what I’m doing with StomperNet… it’s because of the brain trust.

  19. @Bob,

    I remember how that head slap feels.

    I’m still dreaming of Google making it easy to tell the difference between ads with identical headlines in Google Analytics, without tossing extra variables into the URL to put them into user-defined visitor segments.

    But we can always dream.

  21. Brilliantly simple is all I can say.

    I second the opinion that this seems so obvious AFTER reading this post. I am surprised that so many of us really did not take into consideration the variable of “historical performance”.

    Can’t wait to spring this on some others and seem like the “geeky statistics guy with a great idea”. Keep up the great work.

  22. i’m not sure i follow your point about past performance of your champion (control) copy – or the “Google Bias” as it has been called. We currently A/B test the “utterly wrong” way and also look at CTR/CPC/Avg. Position to make sure our ads are treated equally by Google.

    If that is the case (by the way, it has ALWAYS been the case) then we proceed to evaluate the 2 ads based on Lead Conversion/ROI.

    Your contention that the control ad should be duplicated to mitigate risk is fair, but you fail to mention how you differentiate the destination URL of each copy. Unless something has changed that I do not know about, Google will not allow an exact copy of an ad to run.

  23. Does anyone know if I need to add negative match queries into a google ad that is just setup for one exact match?

    My ad group is searching for:
    [baby gifts]

    Do I need to take out words I don’t want to be found for:
    -“home made”

    If I’m creating an [exact match] ad do I need to include the -negative matches that I don’t want to be found for?



    PS Dan, AWESOME video.

  24. Jason, nothing has changed, so maybe you have a different version of Adwords from everyone else. It sure sounds like it.

    In every account I have ever used, it’s as easy as creating a duplicate ad. They have to allow this for A/B/C testing to work, and that’s a very common testing method used by many advertisers.

    You just create a new ad, exactly like the other one, hit the save button, and it’s a copy.

    If you’re telling me that *every* ad test you run shows your ads with identical CTR, CPC, and position, that seems highly improbable… impossible even. You should buy lottery tickets with that kind of luck.

    The only way to eliminate performance history as a variable in ad ranking is to eliminate the existing ad instance from your test.

    You seem very happy with your testing assumptions, so by all means keep doing exactly what you’ve been doing.

  25. @Dan. Did not mean to imply that all of those metrics were exactly the same in every test, only that we look at those collectively to assure the champion ad is not getting a preferential position on the page. any difference we see, at all, is 0.1 (ie, 3.2 vs. 3.1). CPC is almost always identical and CTR variance is usually minimal. Differences we see are aligned with business performance (site conversion, revenue, etc.).

    Again, if i try to upload a duplicate ad in AdWords Editor, it is ignored. I just tried it again to make sure I wasn’t going crazy. I probably am going crazy, but it doesn’t appear to be b/c of this :)

  26. Jason,

    I don’t use Adwords Editor because I don’t do Windows unless it’s absolutely necessary. Try logging into your account through the web interface @ adwords.google.com, where you can make all the copies you like.

    I’m looking at an ad performance report right now, with a brand new test running, control’s average position is 2.5, copy’s average position is 3.3. CPC on copy is >10% higher.

    Not an especially high volume ad group (few hundred impressions a day), but within a few days, maybe a week, those will probably even out to something more like 2.5 vs 2.6.

  27. Dan,

    that is interesting that the UI will allow duplicates, but AWE will not. I guess that explains why we are seeing different results! For the volume that we manage, creating ads manually in the UI is really not an option – but something we may keep in mind for priority ad groups/campaigns.


  28. Kirk,

    Thank you!

    You don’t need to use negative matches for an exact match. “Does not apply.” :D

    Be careful with using “free” as a negative word, if you offer free shipping. Most ecommerce sites aren’t going to see much search for “free (whatever the product is)” but they will see some action on “(whatever the product is) free shipping.”

  29. Jason, if it helps any the Adwords API doesn’t seem to have a problem – it’s just that Google’s AWE doesn’t have the greatest feature set. So there are probably 3rd party tools that could do this.

    The most important tests are in the highest volume ad groups IMO – with Brad’s campaigns it’s beyond 80/20, more like 90% of the sales coming from a very small set of keywords – even with a very deep long tail effort.

  30. Dan that’s great, thank you.
    My site is a service based site so having a -Free negative match would be fine as shipping is irrelevant due to the delivery being email.

    I must admit it is starting to make sense a little more now that I’m reviewing your video again. The embedded matching section at the end of the video is currently melting my brain. I’ll get through it though.

    Thanks again.


  31. This is an awesome post, with a great level of insight into opportunities solopreneurs and other online marketers miss. Great. Sharing with my audience, well done :-)

    Maria Reyes-McDavis

  32. Hi Dan
    Just watched going natural 3.0. It’s great but I’ve got a couple of questions. After doing the embedded keyword technique across a number of accounts
    Ad Group exact 1. [keyphrase keyphrase]
    Ad Group phrase 2. “keyphrase keyphrase” -[keyphrase keyphrase]
    Ad Group broad 3. keyphrase keyphrase -“keyphrase keyphrase”
    Google reports my ads not showing due to cancelling out by a negative term on ad groups 2 and 3. Can you advise please.

    By the way one of the slides in the video is slightly incorrect as in the personalized baby gifts broad match you show the embedded negative match to be -[personalized baby gifts] where I think it should be -“personalized baby gifts”

    Who says we’re not geeky (and thorough) here in the UK

  33. Dan would you say it’s best to use a Broad match (while taking out the negative matches), “Phrase Match” (again removing negative matched) or an [Exact] match ad system?

    After taking notes on your video, I implemented two separate ad groups: (1) “Phrase” and (2) [Exact]. However, the exact match has had one impression in the last 24+ hours and the “phrase” match has only had 27 impressions in the last 24+ hours.

    I may be doing something wrong but I think I have followed your instructions to the letter.

  34. Ian,

    With a negative on “baby gifts” in the broad match group, that would also cause the “personalized baby gifts” traffic to flow into the phrase match group anyway – the error (you’re correct that there is one) is that the -[personalized baby gifts] belongs in the “baby gifts” phrase match ad group only.

    Google sometimes throws errors but you should see that your ads are showing. I’ve never seen one on any of my accounts but I’ve heard of it a couple times.

  35. I’m running 1 campaign but have 2 ad groups.
    1 is an exact match group with 4 ad variations, the other is a phrase match group again with 4 ad variations.

    I have no other campaigns running or ad groups.
    I’m happy to provide the keyword phrase if you feel it will help.

  36. BTW, folks – if your search terms are running a very low volume of queries, it’s probably safe to just combine the phrase & exact matches into one ad group.

  38. Hi Dan,

    Thank you for answering my questions.

    I have another question about the embedded match explained in your video.

    Below is a sample of it.

    *Ad Group 1: Exact Match
    [baby gifts]

    *Ad Group 2: Phrase Match
    “baby gifts”
    -[baby gifts]

    *Ad Group 3: Broad Match
    baby gifts
    -“baby gifts”

    We don’t need -[baby gifts] in Ad Group 3, do we?

    How about in case of single word?

    Say, you bid “widget”.

    *Ad Group 1: Exact Match

    *Ad Group 2: Phrase Match

    *Ad Group 3: Broad Match

    I guess Ad Group 3 is meaningless.
    But what should I do about Ad Group 2?
    Is my example above correct?

  39. Ad Group exact 1. [baby gifts]
    Ad Group phrase 2. “baby gifts” -[baby gifts]
    Ad Group broad 3. baby gifts -“baby gifts”

    Ad Group exact 1. [personalized baby gifts]
    Ad Group phrase 2. “personalized baby gifts” -[personalized baby gifts]
    Ad Group broad 3. personalized baby gifts -“baby gifts”

  40. Sorry just messed up above

    So just to get this clear once and for all. Using embedded match groups should be
    Ad Group exact 1. [baby gifts]
    Ad Group phrase 2. “baby gifts” -[baby gifts]
    Ad Group broad 3. baby gifts -“baby gifts”

    Ad Group exact 1. [personalized baby gifts]
    Ad Group phrase 2. “personalized baby gifts” -[personalized baby gifts]
    Ad Group broad 3. personalized baby gifts -“baby gifts”
    As you said before that:
    “baby gifts” in the broad match group, that would also cause the “personalized baby gifts” traffic to flow into the phrase match group anyway

    Thanks in advance for your help

    Have I finally got this right?

    By the way google still reporting my ads not showing? But as you said sometimes their reporting is a bit wonky

  41. Hi Dan,

    I have 4 separate ads showing for my “Phrase Match” group.
    The CTR for one of my ads is really outperforming the others with a 17% CTR (and it’s not even in the top position for the keyword phrase).

    Anyway what I wanted to know is, how long should I test this for before I class it as my controlling ad and begin retesting other ad variations? Remember this is the first adwords campaign I have run so don’t actually have a standard controlling ad already set up.



  42. Ian, look at your actual stats – are the ads showing or not? Do you have these keywords in other ad groups or campaigns?

    The setup in your last comment looks right.
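The three-group layout discussed in these comments follows a simple pattern, sketched below for a single keyphrase. The intent, as far as the thread describes it, is that each wider match type carries the next-narrower match as a negative, so a query lands in the tightest group that matches it. The function name is hypothetical and the output is plain text for pasting into keyword lists, not a call to any AdWords API.

```python
def embedded_match_groups(keyphrase: str) -> dict[str, list[str]]:
    """Build exact, phrase, and broad ad group keyword lists for one keyphrase.

    Brackets mark exact match, double quotes mark phrase match, and a
    leading minus sign marks a negative keyword.
    """
    return {
        "exact": [f"[{keyphrase}]"],
        "phrase": [f'"{keyphrase}"', f"-[{keyphrase}]"],
        "broad": [keyphrase, f'-"{keyphrase}"'],
    }


if __name__ == "__main__":
    for group, keywords in embedded_match_groups("baby gifts").items():
        print(f"Ad Group {group}: {' '.join(keywords)}")
```

This covers only the single-keyphrase case; how negatives should be shared between related keyphrases, such as “baby gifts” and “personalized baby gifts,” is the part still being worked out in the comments above.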

  43. Kirk, check your best ad vs. your worst ad among the 4 – because the ads are all different SplitTester should be fine for that.

    If your best ad is clearly better than the worst ad, with >90% confidence, then pause the worst ad. Then compare to the next ad, etc.

    Once you have the best headline & offer combination of the 4, that becomes your control.

    The big payoff on CTR is going to come from headline testing at first, but you should hit diminishing returns on that pretty fast.

    When I can’t seem to beat my best headline any more, that’s when I start testing offers. Sometimes we come up with an offer that doesn’t make sense with the headline, so we have to test other headlines with it.
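
For readers who want to check the ">90% confidence" step by hand, here is a minimal sketch using a pooled two-proportion z-test. That choice of test is an assumption; it is not necessarily what SplitTester computes. The function name `ctr_confidence` and the click and impression counts are made up for illustration.

```python
from math import erf, sqrt


def ctr_confidence(clicks_a, impr_a, clicks_b, impr_b):
    """One-sided confidence that ad A's true CTR is higher than ad B's.

    Uses a pooled two-proportion z-test with a normal approximation,
    so it is only reasonable once both ads have a fair number of clicks.
    """
    p_a = clicks_a / impr_a
    p_b = clicks_b / impr_b
    pooled = (clicks_a + clicks_b) / (impr_a + impr_b)
    se = sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    if se == 0:
        return 0.5  # no clicks (or all clicks) on both ads: nothing to compare
    z = (p_a - p_b) / se
    # Standard normal CDF evaluated at z.
    return 0.5 * (1 + erf(z / sqrt(2)))


# Made-up numbers: best ad at 17% CTR vs. worst ad at 12% CTR.
confidence = ctr_confidence(170, 1000, 120, 1000)
if confidence > 0.90:
    print(f"Pause the worst ad (confidence {confidence:.1%})")
else:
    print(f"Keep testing (confidence {confidence:.1%})")
```

Repeating the same comparison between the best ad and each remaining ad gives the elimination order described above.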

  44. great post dan, truly.

    i wont try to figure out everything with you, because that would be lazy and unfair, but there are some basic questions to be asked.

    and here they are:

    -first, about the ad testing. how do you know what to test against what: a different benefit/offer against the control ad, or simply other details changed? should i perfect one offer, and only then try a completely different ad? that is a problem i am not sure there is a way around.

    -when you end up with only the original control ad against the test ad, do you delete them and copy both so they are fresh and equal in statistics?

    -regarding the exact, phrase and broad keyword method you presented, i wonder what it means for the long tail. if you get your exact high volume keywords with lower bids that is great, but the way you suggested with phrase and broad seems to lose the edge the long tail can give you. unless i am somewhat confused and you did not mean the entire campaign to be made of one-keyword ad groups all the way.
    in the lovely video you show the beginning of the method only to cut to some other interesting advice. i am really curious about this part.

    -why actually start only with the exact group and not with the other two (this could be asked in relation to my last question)?

    thank you for your time and words,


  45. only a small clarification for the first question-
    i meant how do you know if what you need to change is the benefit/offer/type of ad, or the small details within a certain ad?
    because you can test small changes forever while a better type of ad awaits, and you can change to a different type of ad without giving a chance to that one specific articulation of that benefit.

    maybe there is a good way to deal with it in bigger campaigns?


  46. Ilan, let’s see…

    1) How you decide what to test is a lengthy topic. What I generally do is get an effective (non-suicidal) headline working (e.g. Unique Baby Gifts) and test different offers.

    If I get a ‘great idea’ for a new headline (e.g. Baby Gifts That Rock!) I’ll put it into the queue, and test it when I get a chance – usually with a couple different offers besides whatever I have in the control.

    2) No, I don’t delete them and run the test over. Once we’re down to the final test, the test ad has to stand on its own. Historical performance shouldn’t be an issue by then, but mostly I am afraid of losing something by deleting a historically strong ad. :D

    3) The ATM method is about testing and grabbing profits from the high-volume “money” keywords.

    That doesn’t mean you don’t also target long tail, or use any other means at your disposal. It’s not intended to be exclusive.

    In the video, we suggest shutting down anything that isn’t profitable, but as I also say, you will want multiple campaigns and multiple ad groups in a successful account.

    All of those “other advices” are part of the plan – every piece is important because they work together.

    4) The point of starting with exact matching in ATM is that you can ONLY be certain of what the search query is, when you use exact match. This affects everything from your ad creative to the landing page.

    What we learn from testing exact match ads can almost always be applied rapidly to phrase & broad match ad groups, and with appropriate negative matches, you can grow a lot by using all of the match types.

  47. ok, that was quite helpful.

    so as i understand the process, the broad serves as a fishing rod for additional ad groups (or even campaigns) as well as for negatives.

    if that is indeed the case, what about the functionality of broad later in the campaign? i assume it stays in the high volume keywords for further opportunities.

    “You would need test performance of 2.7% + .6% = 3.4% to call a winner”

    does that have anything to do with statistical margins, to make sure, or is it a whim? because it seems to me that as time passes the error margin shrinks. i just don’t know how seriously to take it into account.

    thanks a bunch for the winner alert. it could save my life.

  48. This makes no sense to me. Performance history is based on ad text and keyword combinations. If you make a copy of an ad, the copy gets the history from the control ad because they have the same text and keyword combination. This is the same reason you can move an ad group from one campaign to another and, even though that deletes the old one and creates a new one, the history remains intact. Google also assigns a history to new ads that is in line with your old ads so that this problem does not occur. If they did not do this, you would never be able to write a winning ad because of the head start the control has going for it.

  49. Rob, when Google tells us explicitly that this isn’t a variable, I will stop trying to rule it out. Either way, I’m still not going to throw a profitable campaign up in the air with a 50/50 split test.

  50. Pingback: 18 Must Read Articles from my Twitter Account

  51. I totally disagree. The whole point of using a Chi-Squared calculation in split testing is to determine how statistically significant a CTR difference is when there are different amounts of impressions. The Chi-Squared calc exists because the control group will have many more impressions than the test group. If you use this test, you can determine which is the better ad when the impressions are mismatched.
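
As a rough illustration of the Chi-Squared calculation mentioned in this comment, the sketch below runs a Pearson chi-squared test on a 2x2 table of clicks and non-clicks with unequal impression counts. The function name `chi_squared_ctr` and the sample numbers are made up, and no continuity correction is applied. It says nothing about whether the two ads were served under comparable conditions, which is the question the post itself raises.

```python
from math import erfc, sqrt


def chi_squared_ctr(clicks_a, impr_a, clicks_b, impr_b):
    """Pearson chi-squared test on clicks vs. non-clicks for two ads.

    Returns (chi2, p_value) for 1 degree of freedom. Assumes both ads
    have at least one click and one non-click between them, so every
    expected count is positive.
    """
    observed = [
        [clicks_a, impr_a - clicks_a],
        [clicks_b, impr_b - clicks_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Made-up numbers: control with 10,000 impressions, test ad with 1,000.
chi2, p = chi_squared_ctr(270, 10_000, 40, 1_000)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

A small p-value only says the CTR gap is unlikely to be chance, given the impressions each ad actually received.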


  52. Maybe you could explain what you disagree with?

    Do you believe that comparing data from a test I run this week to data from a “control” that happened in the past is valid?

    There are plenty of ways to determine a winner. I don’t remember anyone saying that Chi-Squared isn’t a valid method.

    This post is about making sure that you make valid comparisons, and a simple way to avoid throwing 50% or more of ad impressions into a test variation.

    Other than giving yourself a link, was there a point to your post? How is your tool different from splittester.com or any of the others out there?

  53. This process of reviewing ad testing is very time consuming, especially if you have 2-3 keywords per ad group, which I do. Has anyone used the winneralert mentioned in this thread? Is it one of the better solutions? I have looked around and haven’t found anything except doing it manually. Thanks, Art

  54. Hi Dan, I know that you also advise splitting ad groups into broad, phrase and exact match, so does that mean that you run the Control, C1, C2, C3 & test for each of those match types? That would equate to a lot of testing and for campaigns with a low daily budget it could take days or even weeks to get decent results because the campaign budget would be maxed out with only a few clicks for each ad.

    What would you do in this situation, just work with exact match for example? Thanks.

  55. Dan,

    Great post, but I’m having some problems with split testing. I recently did a split test with 1 original ad, 2 copies of the original, and 1 new ad that I hoped would improve on it.

    Worked great! My new copy doubled the CTR, from 2.25% to 4.5%. So I paused the original and its 2 copies and let the new ad run on its own.

    That was a bad choice. Positions dropped a bit and CTR fell to 0 (after over 600 impressions).

    When I switched the other ads on again, the clicks started again too... also on the new ad.

    Is the history of the ads some kind of average of all active ads? What’s your experience?


  56. Awesome technique!

    I go through my PPC accounts every two weeks to monitor performance and setup new test ads. I’ve been at this for a few years now, so I’ve come up with some pretty good control ads.

    This weekend I was considering not even testing any more ads in the account. Like you said, most of the time the test ad will not beat the control, and I have found I am losing clicks/conversions to these tests.

    I was curious to see if there was a solution and have been searching for the last few hours on different testing techniques. I finally made it to your post and it makes more sense than anything I’ve seen. It’s a great idea.

    See ya. I’ve got work to do!

  57. Hey, am I the only one wondering about PPC Fast Start, or am I the only one who doesn’t know how to get hold of it :-)

    Dan, maybe the PPC Fast Start will answer this but I’ve been thinking long and hard about your advice for starting new AdWords campaigns, using exact match, and I just can’t seem to completely get my head round it.

    You say that you use exact match until you find profitable keywords but if you start so focused how do you know what people are searching for? I can only imagine that your keyword list is really, really long to account for every combination of keyword on the off-chance that some will prove profitable. I’m sure you use a few tools (AdWords keyword tool, etc) first to find some ideas, so is that how you go about it?

    I’m aware that if you start with phrase match you should really know which negative keywords to include, and that broad match can cost you in wasted clicks, but am I missing some fundamental point about exact match that I’ve not just covered? There are so many opinions out there on how to start new campaigns…

    Many thanks.

  58. Umm, I beg to differ regarding the point you make that doing a test with just 2 ads will eat up profits. Whether I am running it that way, or running 10 ad groups, I will still need X number of impressions/clicks to have valid results. And if that happens in 1 day or 1 month, I would rather have it done in a day.

  59. @Flaming Monkey – You’ve obviously missed the point. This strategy is simple yet works brilliantly.

    I had a few ads that were getting a great CTR and were converting very well. I stopped testing them because I didn’t want to waste 50% of my impressions on an ad that had a small chance of beating my control.

    This strategy lets me control what percentage of my impressions I use to test an ad that will probably not beat my control. Sure, I could split the impressions 50% control and 50% test ad and get quicker results, but why waste 50% of my impressions and potential profits on an ad with no history?
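
The arithmetic behind "control what percentage of my impressions" is simple enough to sketch. The assumption here is that the ad group is set to rotate its ads evenly, so each ad gets an equal share. The function name `share_for_test_ad` is made up for this example.

```python
def share_for_test_ad(control_copies):
    """Share of impressions the single test ad gets under even rotation.

    The ad group holds the original control, `control_copies` copies of it,
    and one test ad.
    """
    total_ads = 1 + control_copies + 1
    return 1 / total_ads


for copies in (0, 2, 3):
    print(f"{copies} copies of the control -> test ad gets {share_for_test_ad(copies):.0%}")
# 0 copies of the control -> test ad gets 50%
# 2 copies of the control -> test ad gets 25%
# 3 copies of the control -> test ad gets 20%
```

With no copies this is the plain 50/50 split. Each added copy of the control shrinks the share of traffic exposed to the unproven ad, at the cost of a slower test.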

  60. @brentCrouch

    It wasn’t until the last post of this gigantic page that I understood the value of this test. Your two paragraphs summed up the whole dang thing.

  61. Hi Dan,

    Quick question. Have you ever performed any tests to verify if there was any connection between Google Adwords and Google Organic? I’ve always heard there wasn’t, and it’s obvious you don’t have to pay for Adwords to get good organic listings on Google, but something I saw this month made me question if there is some connection.

    I have a site that has ranked #5 organically for a dvd related term for several years. In addition to organic, I’ve always used Adwords to drive traffic to the site as well.

    Last week, I increased my Adwords bid for this term to test a new landing page. Within a few days, my #5 listing shot to #1 and remained there for about a week. It eventually settled back down to #5 where it had previously been.

    Coincidence or is there some relation between the two?


    Brent Crouch

  62. Dan,
    Very interesting post…great “geeky” read. :)

    I do have one quick question…and please correct me if I have missed something obvious…

    Wouldn’t it just be easier to turn on Position Preference (1-1) or (5-5) for both A/B ads, then simply select the date range for the start of your test, which would effectively “throw out” the older data?

    You would then simply take the new values (the Control + New Ad) with “locked in position preference” and do a simple stats test to see if you have achieved 95% confidence…

    Would that not work?


  63. This is obviously an interesting and different way to look at testing. Most people just do A/B testing without looking into the true science of testing.

  64. Hi Dan, I have to say that your information is priceless, and thanks for that. If you don’t mind, could you take a look at the link for the 50 minute video on this site about split testing Adwords? The link seems to be broken.

    regards and thanks heaps


  65. Hey Dan,

    I just referenced this post when explaining my strategy to a client who wanted to test a new ad that they were confident would “kill it”. (I was probably one of the head slaps you heard when you presented this concept at a Stompernet event, and have used that strategy ever since. ;-).

    Another great use of the additional copies is that they are built-in validators, so you don’t even need outside tools like SplitTester. For example, if you’re running 3 copies of your control ad, you’ll know you don’t have statistical significance until those 3 ads’ performance stats are in alignment. Using that logic, you’re less likely to determine a winner or loser prematurely.

    Example of “Not enough testing”:
    Copy ad 1: CTR: 2.3%, Conversions: 25
    Copy ad 2: CTR: 1.5%, Conversions: 18
    Copy ad 3: CTR: 3.1%, Conversions: 41
    Test ad: CTR: 1.6%, Conversions: 17

    In the above example, even though you might be tempted to “pull the plug” on the test ad, you’ll know that you haven’t done enough testing solely because 3 other ads that should be performing identically – aren’t.

    Example of a statistically valid data set:
    Copy ad 1: CTR: 2.5%, Conversions: 86
    Copy ad 2: CTR: 2.4%, Conversions: 80
    Copy ad 3: CTR: 2.7%, Conversions: 87
    Test ad: CTR: 3.0%, Conversions: 95

    In the above example, if you were only running an A/B test, you might need to run the test a lot longer, but considering it’s an A/A/A/B test, and the fact that all 3 A’s are in alignment, you know you’ve been running the test for long enough to get reliable results.

    As you pointed out, you have to ignore the control ad because it may be getting preferential treatment due to performance history.
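
The "copies as validators" idea from this comment can be sketched as a quick check on the CTRs of the control copies. This is a rough heuristic, not a significance test. The 15% tolerance is an arbitrary assumption, and the function names `copies_in_alignment` and `verdict` are made up for illustration.

```python
def copies_in_alignment(copy_ctrs, tolerance=0.15):
    """True when every copy's CTR is within `tolerance` (relative) of the mean."""
    mean = sum(copy_ctrs) / len(copy_ctrs)
    return all(abs(ctr - mean) <= tolerance * mean for ctr in copy_ctrs)


def verdict(copy_ctrs, test_ctr, tolerance=0.15):
    """Read the test ad only once the identical copies agree with each other."""
    if not copies_in_alignment(copy_ctrs, tolerance):
        return "keep testing"
    if test_ctr > max(copy_ctrs):
        return "test ad ahead"
    if test_ctr < min(copy_ctrs):
        return "control ahead"
    return "no clear difference"


# CTRs (in percent) from the two examples in the comment above.
print(verdict([2.3, 1.5, 3.1], 1.6))  # keep testing
print(verdict([2.5, 2.4, 2.7], 3.0))  # test ad ahead
```

In the first data set the three identical copies disagree with each other by more than the tolerance, so the test ad's number cannot be read yet. In the second they agree closely, and the test ad sits above all of them.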

  66. I implemented a few of the things you talked about.. and wow! Great results!

    Would you be interested in checking out my campaign, and optimizing it for me? What is your consulting fee?

    Best Regards
