May 6, 2008

Split Testing Adwords: You're Doing It Wrong

I'm sure the title of this post is going to make a few folks upset. Don't worry, the content is going to really make you mad. Not at me, but maybe at yourself, and especially at whoever taught you how to split test.

That's because, for most Adwords advertisers, the title of this post is very accurate. Most advertisers are doing split testing all wrong.

If you're doing A/B, you're robbing yourself of profits at every step.

That's because, in a typical A/B split test, what you're doing is keeping your Control (the best performing ad) running, and creating a new test ad to run against it. Doing it that way is perfectly normal, but it's also completely, utterly, and totally wrong.

Most of the time, these test ads fail to beat the control… and predictably so. Your Control is the Control because it's doing well. After a few rounds of testing, it becomes less and less likely that your test ad will beat the control.

The "usual method" of ad testing delivers invalid results – here's why:

  • The Control has a strong performance history, but your test ad has no performance history.
  • As a result, your test ad may be ranked lower on the page, leading to a lower CTR and (quite possibly) a lower conversion rate.

In other words, your test ad might have been better – but it didn't get a fair chance! This means that you may have already rejected many great ads, because you didn't test them correctly.

But that's not the worst part! "Control vs. test ad" also loses money:

When you run two ads in equal rotation, your Control is only getting half of the available ad impressions.

When a test ad fails to deliver good click-through and conversion rates, you've just given up as much as 50% of the profits that you would have had, if you had just left your ad group alone.

Wouldn't it be better if you could construct a valid test that didn't bleed money? Here's how it's done…

Step 1: Set Up A Valid Test

Instead of running your test ad against the Control, you create multiple copies of your Control.

How many copies you make is determined by how many of the ad impressions you want to go to the test ad.

For example, say we've created 3 copies of the Control, left the Control running, and set up one test ad. With five ads in rotation, the test ad only gets 20% of the ad impressions.

This method leaves far less of your profit at risk.
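
Here's a rough sketch of that arithmetic in Python. It assumes AdWords rotates every ad in the group evenly, which real ad serving may not do exactly. The function names and the $1,000 daily profit figure are hypothetical, and the "profit at risk" is a worst case where the test ad earns nothing at all.

```python
def share_for_test_ad(control_copies: int) -> float:
    """Fraction of impressions the test ad gets under even rotation.

    The ad group holds the original Control, its copies, and one test ad.
    """
    total_ads = 1 + control_copies + 1
    return 1 / total_ads


def profit_at_risk(daily_profit: float, control_copies: int) -> float:
    """Worst case: the test ad earns nothing on its share of impressions."""
    return daily_profit * share_for_test_ad(control_copies)


if __name__ == "__main__":
    for copies in (0, 1, 2, 3):
        share = share_for_test_ad(copies)
        risk = profit_at_risk(1000.0, copies)  # hypothetical $1,000/day ad group
        print(f"{copies} copies: test ad gets {share:.0%}, "
              f"up to ${risk:,.0f}/day at risk")
```

With zero copies this is the usual 50/50 test; with 3 copies the test ad's share drops to 20%.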

It also allows you to run a valid test, by comparing the performance of your test ad against the copies of the Control only.

Since none of the ads in the "testing pool" has performance history, you can have far more confidence in the test results.

This method has reversed the results of many "close misses" I had with the old method of testing, which I was using myself until 2006.

Step 2: End Test (Fail) or Ramp Up To Full Test

In one of my many "hallway conversations" at the last Stompernet Live event, Andy Edmonds (Chief Scientist @ Stompernet and a real expert in testing) pointed out the real advantage of this method… which is that failed tests usually "fail early."

Scientific testing types have a much more formal language for this, of course, but I got tired head when he explained that part of it.

I'll just summarize the main points for you:

  • You can usually kill your test ad pretty early because most test ads will be obvious failures.
  • Running 3 copies of the Control actually accelerates your testing most of the time.
  • If the test ad appears to perform well, you can progressively eliminate copies of the Control to ramp up the pace of the test.
  • The final test is the validation test, where you run your Test Ad vs. Control – but now, you're doing it right!

Since I've started doing this "Andy's way," my ad tests are running even faster than before. Way cool.

But what's the real goal of all this testing? Higher click-through rates? Lower cost per click? These statistics are merely means to a greater end, my friend, and they can often lead you astray.

Indulge me for a minute, please. I didn't get to be the "Keyword Guru" for nothing, you know…

There's one other huge mistake that most folks make when they're split testing ads - they don't compare the business performance of their ads, just the click-through rate.

The problem should be obvious – you aren't trying to buy traffic, you're trying to buy customers. Actually, that's only sort of true. What you're really trying to buy is profit. Specifically, what you really want to achieve is the maximum possible profit per keyword.

At Stompernet, we call this statistic "Profit Per SERP"

Understanding how much profit you make every time one of your targeted keywords gets searched is a powerful idea. It's one of the most important metrics that any enlightened website owner will learn to measure and improve. It's the KEY to search marketing success.

You have a lot of control over every variable in the search marketing equation, except for one thing…

You can NOT change the number of times people carry out a given search.

During the month of May, some number of people are going to search for "mortgage rates," or "baby gifts," or "free seo book." We are powerless to change this number.

The number that we're trying to change is our average profit for each of those searches.

By doing Adwords right, you can change that number, and in doing so, change the rules of the game.
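
To see why click-through rate alone can lead you astray, here's a minimal sketch that estimates profit per search for two ads. The formula is a simplification and not necessarily how StomperNet computes "Profit Per SERP": it assumes your ad is shown on every search, and every number below is made up for illustration.

```python
def profit_per_search(ctr: float, conversion_rate: float,
                      profit_per_sale: float, cost_per_click: float) -> float:
    """Expected profit each time the keyword is searched.

    Each search yields a click with probability `ctr`; each click costs
    `cost_per_click` and converts with probability `conversion_rate`.
    """
    profit_per_click = conversion_rate * profit_per_sale - cost_per_click
    return ctr * profit_per_click


if __name__ == "__main__":
    # Hypothetical ads: A wins on click-through, B wins on conversion.
    ad_a = profit_per_search(ctr=0.05, conversion_rate=0.02,
                             profit_per_sale=40.0, cost_per_click=0.50)
    ad_b = profit_per_search(ctr=0.03, conversion_rate=0.05,
                             profit_per_sale=40.0, cost_per_click=0.50)
    print(f"Ad A: ${ad_a:.4f} per search")
    print(f"Ad B: ${ad_b:.4f} per search")
```

With these invented figures, Ad B earns about three times as much per search as Ad A, even though Ad A has the higher click-through rate.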

We'll talk more later.

For now, I'd like to invite you to watch the 50 minute instructional video that I prepared for you, with the help of master video editor Andy Jenkins. Watch online or download the Quicktime – but do watch… you don't want to miss this.


Comments on Split Testing Adwords: You're Doing It Wrong »

May 7, 2008

Japanese SEO @ 5:25 am

Hi Dan,

Just awful!

I'm very excited at your techniques.

You say,"How many copies you make is determined by how many of the ad impressions you want to go to the test ad."

I wonder how many copies do you usually create? Do you have a standard or suggestion?

You also say, "eliminate copies of control one at a time,"

Do you mean I must not eliminate them at the same time? Why? When should I eliminate the next one?

P.S. I have watched your video. I was strongly impressed at embedded match

I'm looking forward to PPC Fast Start coming.

Chris @ 9:13 am

Thanks! I've been working with a PPC campaign for the past month and trying to run split tests, and this really helps.

Todd @ 9:23 am

How is this technique faster when the test ad gets 20% of impressions as opposed to 50%?

Also, this way your "proven" ad only gets 20% impressions as well. In other words: the ad that gets "top" placement gets 20% impressions as opposed to 50% for FURTHER loss in CTR.

Stephan Miller @ 9:33 am

I never thought about making copies of the control ad. That makes perfect sense. So much sense, I should have seen it. Time to change the way I have been doing things. I have basically been playing beat the best ad. But if the ad is always good, why give 50% of its views to something yet untested?

Dan Thies @ 9:52 am

@Japanese SEO, I normally run 3 copies of the control for my tests.

If the test ad doesn't fail quickly (they usually do) I delete one copy of the control, run some more testing, then another, then another, until I am confident enough to let the test run vs. the control.

I recommend using Splittester.com as a simple tool to let you compare your results. You can combine the clickthrough and impression totals for the copies of the control and compare that to the single test ad, AND you can run each of the comparisons separately which might end your test faster.

@Todd, in the example shown, your proven AD CREATIVE gets 80% of the impressions. The copy with performance history is shown 20% of the time.

We have to eliminate performance history as a variable to do a valid test, that's why we make copies.

What you are comparing is the performance of two different ad creatives. Individual tests may take more or less time to conclude but overall we've been able to complete more tests than we did before. It would depend on the statistical model you use, and the level of probability/confidence you want to have in your decisions.

We used to run 2 copies of the control and use A/B/C for comparison, but I got yelled at by a statistics geek for using A/B/C testing, and by other folks for throwing too many impressions into unproven ad creatives.

Dan Thies @ 9:57 am

@Stephan, almost nobody thinks about making copies of the control ad. Every time I've presented this to an audience, I immediately hear the sound of several dozen head slaps.

Dan Thies @ 10:09 am

@Japanese SEO, you can eliminate one or all of the copies when you have confidence that your test ad is performing well. Nobody can decide for you how quickly you want to ramp up.

Throwing more impressions into a promising test will hasten the conclusion of the test, but it also puts more profit at risk – you need to strike a balance.

It's a lot easier to make that decision with good analytics behind you. The performance of your ad is not just about click-through rate.

Dr. Pete @ 11:15 am

I'm inclined to agree with a couple of the other commenters: I understand how this mitigates your risk (not showing a potentially bad ad 50% of the time), but statistically speaking, only showing the new ad 20% of the time is absolutely going to increase your testing time. If it isn't, then you changed your method and are making an unfair comparison. Of course, if you have a strong campaign and your primary goal is not to mess it up, then the 80%/20% approach is probably smart.

More generally, though, if you have an ad with solid performance (CTR or conversion), your testing is probably better spent elsewhere, either on landing pages or general site conversion. Why mess with something that's working, especially if you have such low confidence in the alternate versions you're testing?

Kiowa Jackson @ 11:31 am

Doesn't sending 20% of the traffic just mean that it will take longer for the test to reach significance? You need to send X amount of traffic to an ad to get statistically significant numbers, and if you do that by sending 20% of your traffic for 10 days instead of 50% of your total traffic for 4 days, doesn't that achieve the same results only slower?

I'm asking because I'm a big fan of doing things wrong quickly. If you need to get a certain number of conversions to be able to judge an ad's effectiveness, it seems to me it's best to get those numbers as fast as possible and move on.

Dan Thies @ 11:38 am

Pete, I'll have to take your word for that, or at least not put up an argument, because I am not a statistics geek, and depending on which one you talk to they all have different ideas.

I've heard from one math whiz that the method Splittester.com uses isn't valid for this, but I can't argue with him either.

What I do know is that ads that fail often lose in a 1:1 comparison with one of the 3 control ads, and based on that, I am often able to terminate a test early. I'm sure different math geeks will tell me that this either is or is not valid.

When you get into the other business factors besides click-through rate, the analysis gets real complicated with error bars and stuff, which is where I personally have to let other people deal with that.

I'm not sure what you mean by "changed your method." Because I've never been able to get a straight story from the math geek community about how to evaluate the results of a test, I did not prescribe a specific method in this post.

You keep testing ads because every improvement represents a substantial increase in the value of the business over time. Failure is just a test result, and if nothing you try ever fails, you aren't doing enough testing.

I certainly do have a lot of ad groups which are waiting for someone to come up with a better idea. Others where we only give 10% of the impressions to testing. Still others where we have 2 or even 3 ads running with a statistical dead heat after more than a year.

But every time I think we'll never beat the control again, someone comes up with a better idea.

If you make major changes to your web site, branding, merchandising, etc. this can also affect the outcome of your advertising campaigns, to the point where a different ad creative may perform better than the existing control.

The complexity of this stuff is bottomless. I'm just trying to help folks out here, by illustrating some stuff that nearly everyone is missing.

Thanks for your thoughtful comments, and if you have some math/science to add to this discussion, it will enlighten me at least.

Dan Thies @ 11:49 am

@Kiowa, the length of a successful test (test ad beats the copies and gets to the championship round) is only slightly longer with this method. Play around with some hypotheticals @ SplitTester comparing "what if I ran this same test at 50/50" and you'll see that.

What cuts our testing time is the early failures.

We often see the test ad lose a 1:1 comparison vs. 1 of the copies of the control, where the combined stats for all copies may say it's still too early to reach a conclusion.

Based on high confidence from a 1:1 comparison, I'll kill a test ad that isn't giving me any other specific reasons to hope for a better outcome overall.

That saves time.
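
For anyone who wants to try a 1:1 comparison by hand, here's a minimal sketch using a standard two-proportion z-test. This is an assumption for illustration, not necessarily the method SplitTester uses, and the click and impression counts are invented.

```python
import math


def confidence_ctr_differs(clicks_a: int, impressions_a: int,
                           clicks_b: int, impressions_b: int) -> float:
    """Two-sided confidence (0..1) that two ads have different CTRs."""
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(pooled * (1 - pooled)
                        * (1 / impressions_a + 1 / impressions_b))
    if std_err == 0:
        return 0.0  # no clicks (or all clicks) on both ads: nothing to compare
    z = abs(p_a - p_b) / std_err
    # For a standard normal Z, P(|Z| <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical: one copy of the control vs. the test ad.
    conf = confidence_ctr_differs(60, 2000, 35, 2000)
    print(f"Confidence that the CTRs differ: {conf:.1%}")
```

The function only says whether the two click-through rates differ; you still have to look at which ad is ahead before killing anything.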

Dr. Pete @ 11:50 am

Sorry, by "changed your method", I just meant that you not only went from 50/50 to 80/20, but also started using this new statistical technique, so the comparison may be apples vs. oranges. I think I understand now; you essentially treat it as a 5-way test, but 4 of the options are really the same. That's interesting, but I'd have to dig into it. I think there's an assumption that the 5 versions are independent, and by making 4 of them copies, you may be violating that assumption. On the other hand, what matters in hard-core mathematical theory and what matters in practice are often totally different things, so the method could be slightly cheating and still basically work.

I do completely agree about the what-to-test problem. Often, even those of us with experience and some marketing savvy have a really hard time figuring out what variations to try out, and the best and worst part about testing is that our best guesses often end up being wrong. I think if you look at this technique as a way to mitigate that risk of going in a bad direction, it potentially has a lot of value.

Totally unrelated, but I've also noticed recently that part of the problem with using the "control" version is that people are sometimes coming back to an ad they've seen before or are familiar with, giving the control a bit of an advantage at the beginning of some tests. This is even trickier with landing-page and site testing. In some cases, I've ended up throwing out the first week of data altogether or seen a completely different result in the first half of testing versus second half.

Dan Thies @ 12:18 pm

To be precise, Pete, the "5th Beatle" (the control) isn't actually part of the band, or the test set.

What you are testing (in my example) is: 3 Identical Ads vs 1 Unique Ad

The control itself isn't part of the test set because we have an uncontrollable variable (the performance history of that uniquely identified ad instance) that will favor the control.

The reason I started doing this was simply to mitigate the risk of test ads that fail, and I only used it when the test ad was a radical departure from the control and not an evolutionary tweak.

Shortly after, I was awoken one night by the idea that it also eliminated this troublesome variable of performance history. That's when I started looking for help from math geeks, and working on this post. Two years ago. :D

Yes, in practice, most testing methods (even the A/B method I dismissed early in my post) are better than no testing at all. But I'd still like to improve my methods if I can.

As a side note, making additional copies of the control so that they too gain historical performance benefits over time allows you to run tests with even lower risk, but this would have complicated the explanation in the post without adding much value IMO.

Good point about the "repeat view" vs. "initial view" issue. With Adwords, we can't control that, so the best ad is the one that performs best over time. With landing pages, this can be a big deal.

Andy Edmonds ran a little "opt in box placement" test on Brad Fallon's "Free Line Report" site, where placement on the right side of the page improved opt-ins substantially. When he sliced it a little deeper, it turned out that was only true for first-time visitors.

You can also get some very interesting "day of week" effects with Adwords ads, landing page elements, calls to action, and I guess everything else.

I have one situation where we're running a different ad on Monday-Thursday than what we run Friday-Sunday, which requires two separate campaign containers in Adwords. But it's worth an extra 12% or so, from a pretty high volume keyword.

(For anyone who isn't aware, Dr. Pete is a cognitive psychologist and usability expert and really smart and stuff. If I were locked in a room with him and Andy Edmonds, who is also all 3 of those things, I would attempt to change the subject to baseball or something, for fear of a brain meltdown.)

Thank you again, Pete – it's people like you who make it possible for marketers to advance the art.

Dan Thies @ 12:20 pm

To clarify, sorry…

"Good point about the "repeat view" vs. "initial view" issue. With Adwords, we can't control that, so the best ad is the one that performs best over time."

One of the reasons for the final test of "control vs. test ad" is to try to reduce the impact of this.

We should probably think about a process to re-test old control ads from time to time. Might actually pay off better than only trying new ads.

Dr. Pete @ 12:40 pm

Yeah, can't I be smart AND like baseball? :) You're too kind, Dan. As my time-distance from graduate school increases, I've forgotten more about stats than I care to admit. Applying statistics in the real world is tricky business, but it can be fun for us geeks.

I did misunderstand about the historical performance. I was thinking strictly in the usual sense of a control, but you're also talking about the AdWords historical performance and the built-in bias the system applies to ads. That's a really interesting factor that I hadn't fully considered (since a lot of my split-testing is more on the conversion/usability side).

Dan Thies @ 12:54 pm

You can be smart and like baseball. I like baseball too… but if we talked baseball, then I wouldn't hear stuff like "low conversion rates have a log-normal not normal distribution" and get tired head all over again.

I survived the whole calculus-to-stats trip like 30 years ago and it gave me bad tired head.

Anyway, sounds like Andy's on his way over here to fix me now…

AndyEd @ 1:04 pm

As the StomperNet stats geek, I hope to offer a calculator service I can vouch for someday.

The cool thing about Dan's splitting method is that you don't even really need to run stats.

If you do, you have to compare a single one of the controls to the test ad… you shouldn't add up the views and clicks for all the controls and compare that to the test ad as you'll be violating assumptions of equal variance (Sin #6 in the 7 deadly sins of testing, http://alwaysbetesting.com/abtest/index.cfm/2008/3/16/Getting-Serious-About-Testing-Learn-from-the-Pros).

The good news is that Dan's method provides an empirical validation of your variance by what's known as an A-A test.

Compare the performance of the copies of your ad to one another. Take the difference between the lowest performing control copy and the best performing control copy. If your test ad doesn't beat every one of the controls by that amount, you either don't have enough data, or it's not a winner.

So, imagine you have 3 copies (C1, C2, and C3) with 2.1%, 2.5%, and 2.7% click-through and your test ad gets 2.9% click-through. The difference from the best control to the worst control is .6%. The difference between your test ad and your controls is .8%, .4%, and .2%. It's not a winner – yet. You would need test performance of 2.7% + .6% = 3.3% to call a winner. With more data, the control copies should get closer together.

Once all of the controls even out in performance, you know you've reached a point where each of the samples is big enough to generate consistent data and you can assuredly compare the control to the test ad and estimate the gain or loss.

PS – a serious statistical analysis could generate cases where my heuristic above would be wrong. If you've got the tools to do real stats, great. If not, this heuristic is going to be reliable most of the time.
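
The heuristic above can be sketched in a few lines of Python. The function name is hypothetical, click-through rates are passed in as percentages, and, as the PS says, this is a rule of thumb and not a real statistical test.

```python
def aa_check(control_ctrs, test_ctr):
    """Apply the A-A spread heuristic to click-through rates in percent.

    Returns (spread, threshold, is_winner), where `spread` is the gap
    between the best and worst copies of the control, and the test ad
    has to beat the best copy by at least that much.
    """
    best = max(control_ctrs)
    spread = best - min(control_ctrs)  # noise seen between identical ads
    threshold = best + spread          # what the test ad must reach
    return spread, threshold, test_ctr >= threshold


if __name__ == "__main__":
    spread, threshold, is_winner = aa_check([2.1, 2.5, 2.7], 2.9)
    print(f"spread between copies: {spread:.1f}%")
    print(f"test ad needs at least: {threshold:.1f}%")
    print("winner" if is_winner else "not a winner yet")
```

As the control copies converge, the spread shrinks and the bar for the test ad drops with it.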

Bob B. @ 1:12 pm

Dan,

Great post. We've run into the Google "bias" issue in our own A/B testing and this seems like a really simple solution to the problem. ( head slap sound here) It's a real shame that Google doesn't offer better testing tools. The ideal solution would be a test/control lab within Adwords that would remove historical performance from the equation and allow for balanced test groups. Yeah, I know – keep dreaming.

Dan Thies @ 1:22 pm

Everyone got that?

It's actually easier than what I was doing, so that will help. Thanks, Andy.

BTW, for anyone who still doesn't get what I'm doing with StomperNet… it's because of the brain trust.

Dan Thies @ 1:25 pm

@Bob,

I remember how that head slap feels.

I'm still dreaming of Google making it easy to tell the difference between ads with identical headlines in Google Analytics, without tossing extra variables into the URL to put them into user-defined visitor segments.

But we can always dream.

May 8, 2008
(Pingback)

grumpy links • Tim Nash UK SEO Blog @ 6:38 am

[...] do have PPC clients or campaigns running. Also worth checking out some of the commentary on Dans' Split Testing: You're doing it wrong (the guy talking on the vid) and Andys' Banned from Google which was a way cooler title then I [...]

Terry Van Horne @ 6:58 am

Dan, as usual you have me thinking "why didn't I think of that!". The video offered top right is awesome too!

David Mullings @ 7:50 am

Brilliantly simple is all I can say.

I second the opinion that this seems so obvious AFTER reading this post. I am surprised that so many of us really did not take into consideration the variable of "historical performance".

Can't wait to spring this on some others and seem like the "geeky statistics guy with a great idea". Keep up the great work.

Jason C. @ 8:21 am

i'm not sure i follow your point about past performance of your champion (control) copy – or the "Google Bias" as it has been called. We currently A/B test the "utterly wrong" way and also look at CTR/CPC/Avg. Position to make sure our ads are treated equally by Google.

If that is the case (by the way, it has ALWAYS been the case) then we proceed to evaluate the 2 ads based on Lead Conversion/ROI.

Your contention that the control ad should be duplicated to mitigate risk is fair, but you fail to mention how you differentiate the destination URL of each copy. Unless something has changed that I do not know about, Google will not allow an exact copy of an ad to run.

Does anyone know if I need to add negative match queries into a google ad that is just setup for one exact match?

E.g. My ad group is searching for: [baby gifts]

Do I need to take out words I don't want to be found for: -homemade -"home made" -free -cheap -etc

If I'm creating an [exact match] ad do I need to include the -negative matches that I don't want to be found for?

Thanks

Kirk

PS Dan, AWESOME video.

Dan Thies @ 8:56 am

Jason, nothing has changed, so maybe you have a different version of Adwords from everyone else. It sure sounds like it.

In every account I have ever used, it's as easy as creating a duplicate ad. They have to allow this for A/B/C testing to work, and that's a very common testing method used by many advertisers.

You just create a new ad, exactly like the other one, hit the save button, and it's a copy.

If you're telling me that every ad test you run shows your ads with identical CTR, CPC, and position, that seems highly improbable… impossible even. You should buy lottery tickets with that kind of luck.

The only way to eliminate performance history as a variable in ad ranking is to eliminate the existing ad instance from your test.

You seem very happy with your testing assumptions, so by all means keep doing exactly what you've been doing.

Dan Thies @ 8:58 am

@ Terry, nice to see you, and thanks!

@David, thanks!

@ ALL, watch that video too, if you haven't. There's more.

Jason C. @ 9:07 am

@Dan. Did not mean to imply that all of those metrics were exactly the same in every test, only that we look at those collectively to assure the champion ad is not getting a preferential position on the page. any difference we see, at all, is 0.1 (ie, 3.2 vs. 3.1). CPC is almost always identical and CTR variance is usually minimal. Differences we see are aligned with business performance (site conversion, revenue, etc.).

Again, if i try to upload a duplicate ad in AdWords Editor, it is ignored. I just tried it again to make sure I wasn't going crazy. I probably am going crazy, but it doesn't appear to be b/c of this :)

Dan Thies @ 9:20 am

Jason,

I don't use Adwords Editor because I don't do Windows unless it's absolutely necessary. Try logging into your account through the web interface @ adwords.google.com, where you can make all the copies you like.

I'm looking at an ad performance report right now, with a brand new test running, control's average position is 2.5, copy's average position is 3.3. CPC on copy is >10% higher.

Not an especially high volume ad group (few hundred impressions a day), but within a few days, maybe a week, those will probably even out to something more like 2.5 vs 2.6.

Jason C. @ 9:28 am

Dan,

that is interesting that the UI will allow duplicates, but AWE will not. I guess that explains why we are seeing different results! For the volume that we manage, creating ads manually in the UI is really not an option – but something we may keep in mind for priority ad groups/campaigns.

Thanks, Jason

Dan Thies @ 9:33 am

Kirk,

Thank you!

You don't need to use negative matches for an exact match. "Does not apply." :D

Be careful with using "free" as a negative word, if you offer free shipping. Most ecommerce sites aren't going to see much search for "free (whatever the product is)" but they will see some action on "(whatever the product is) free shipping."

Dan Thies @ 9:39 am

Jason, if it helps any the Adwords API doesn't seem to have a problem – it's just that Google's AWE doesn't have the greatest feature set. So there are probably 3rd party tools that could do this.

The most important tests are in the highest volume ad groups IMO – with Brad's campaigns it's beyond 80/20, more like 90% of the sales coming from a very small set of keywords – even with a very deep long tail effort.

Dan that's great, thank you. My site is a service based site so having a -Free negative match would be fine as shipping is irrelevant due to the delivery being email.

I must admit it is starting to make sense a little more now that I'm reviewing your video again. The embedded matching section at the end of the video is currently melting my brain. I'll get through it though.

Thanks again.

Kirk

Web Success Diva @ 3:06 pm

This is an awesome post, with a great level of insight into opportunities solopreneurs and other online marketers miss. Great. Sharing with my audience, well done :-)

Maria Reyes-McDavis

Dan Thies @ 3:16 pm

Wow, thanks, Maria!

If you want to link direct to the video (that's a tracking link) it's @ http://www.stompernet.net/goingnatural3/

Dan

May 9, 2008

Ian Bowland @ 8:44 am

Hi Dan,

Just watched going natural 3.0. It's great, but I've got a couple of questions. After doing the embedded keyword technique across a number of accounts:

Ad Group 1 (exact): [keyphrase keyphrase]
Ad Group 2 (phrase): "keyphrase keyphrase" -[keyphrase keyphrase]
Ad Group 3 (broad): keyphrase keyphrase -"keyphrase keyphrase"

Google reports my ads not showing due to cancelling out by a negative term on ad groups 2 and 3. Can you advise, please?

By the way one of the slides in the video is slightly incorrect as in the personalized baby gifts broad match you show the embedded negative match to be -[personalized baby gifts] where I think it should be -"personalized baby gifts"

Who says we're not geeky (and thorough) here in the UK?

Dan, would you say it's best to use a Broad match (while taking out the negative matches), "Phrase Match" (again removing negative matches) or an [Exact] match ad system?

After taking notes on your video, I implemented two separate ad groups: (1) "Phrase" and (2) [Exact]. However, the exact match has had one impression in the last 24+ hours and the "phrase" match has only had 27 impressions in the last 24+ hours.

I may be doing something wrong but I think I have followed your instructions to the letter.

Dan Thies @ 12:04 pm

Ian,

With a negative on "baby gifts" in the broad match group, that would also cause the "personalized baby gifts" traffic to flow into the phrase match group anyway – the error (you're correct that there is one) is that the -[personalized baby gifts] belongs in the "baby gifts" phrase match ad group only.

Google sometimes throws errors but you should see that your ads are showing. I've never seen one on any of my accounts but I've heard of it a couple times.

Dan Thies @ 12:05 pm

Kirk, are you running these terms in other ad groups or campaigns?

Did you start with just an exact match ad group?

I'm running 1 campaign but have 2 ad groups. 1 is an exact match group with 4 ad variations, the other is a phrase match group again with 4 ad variations.

I have no other campaigns running or ad groups.
I'm happy to provide the keyword phrase if you feel it will help.

Dan Thies @ 1:04 pm

BTW, folks – if your search terms are running a very low volume of queries, it's probably safe to just combine the phrase & exact matches into one ad group.

[...] as part of your online marketing campaign, you must be split-testing. But, like Dan Thies explains, you're likely doing AdWords split-testing wrong. Great [...]

May 11, 2008

Japanese SEO @ 9:47 pm

Hi Dan,

Thank you for answering my questions.

I have another question about the embedded match explained in your video.

Below is a sample of it.

*Ad Group 1: Exact Match [baby gifts]

*Ad Group 2: Phrase Match "baby gifts" -[baby gifts]

*Ad Group 3: Broad Match baby gifts -"baby gifts"

We don't need -[baby gifts] in Ad Group 3, do we?

How about the case of a single word?

Say you bid on "widget".

*Ad Group 1: Exact Match [widget]

*Ad Group 2: Phrase Match "widget" -[widget]

*Ad Group 3: Broad Match widget -"widget"

I guess Ad Group 3 is meaningless. But what should I do about Ad Group 2? Is my example above correct?

May 12, 2008

Ian Bowland @ 7:44 am

Ad Group 1 (exact): [baby gifts]
Ad Group 2 (phrase): "baby gifts" -[baby gifts]
Ad Group 3 (broad): baby gifts -"baby gifts"

Ad Group 1 (exact): [personalized baby gifts]
Ad Group 2 (phrase): "personalized baby gifts" -[personalized baby gifts]
Ad Group 3 (broad): personalized baby gifts -"baby gifts"

Ian Bowland @ 7:52 am

Sorry just messed up above

So just to get this clear once and for all, using embedded match the groups should be:

Ad Group 1 (exact): [baby gifts]
Ad Group 2 (phrase): "baby gifts" -[baby gifts]
Ad Group 3 (broad): baby gifts -"baby gifts"

Ad Group 1 (exact): [personalized baby gifts]
Ad Group 2 (phrase): "personalized baby gifts" -[personalized baby gifts]
Ad Group 3 (broad): personalized baby gifts -"baby gifts"

As you said before, a negative on "baby gifts" in the broad match group would also cause the "personalized baby gifts" traffic to flow into the phrase match group anyway.

Thanks in advance for your help

Have I finally got this right?

By the way, Google is still reporting my ads as not showing. But as you said, sometimes their reporting is a bit wonky.

Hi Dan,

I have 4 separate ads showing for my "Phrase Match" group. The CTR for one of my ads is really outperforming the others with a 17% CTR (and it's not even in the top position for the keyword phrase).

Anyway what I wanted to know is, how long should I test this for before I class it as my controlling ad and begin retesting other ad variations? Remember this is the first adwords campaign I have run so don't actually have a standard controlling ad already set up.

Thanks

Kirk

Dan Thies @ 10:13 am

Ian, look at your actual stats – are the ads showing or not? Do you have these keywords in other ad groups or campaigns?

The setup in your last comment looks right.
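The embedded-match layout discussed above can be sketched in a few lines of Python. This is only an illustration of the pattern from the comments (each wider match type carries the narrower one as a negative); the function name `embedded_match_groups` and its `broad_negative` parameter are made up for this sketch and are not part of AdWords or any tool mentioned here.

```python
def embedded_match_groups(phrase, broad_negative=None):
    """Build exact / phrase / broad ad groups for one keyword phrase.

    Each wider match type excludes the narrower one with a negative,
    so a query should only land in one group. `broad_negative` is a
    hypothetical option for using a shorter phrase as the broad group's
    negative, as in the "personalized baby gifts" example above.
    """
    exact = f"[{phrase}]"
    quoted = f'"{phrase}"'
    broad_neg = f'"{broad_negative}"' if broad_negative else quoted
    return {
        "exact": {"keyword": exact, "negatives": []},
        "phrase": {"keyword": quoted, "negatives": [exact]},
        "broad": {"keyword": phrase, "negatives": [broad_neg]},
    }


def format_group(name, group):
    """Render one ad group the way it is written in the comments."""
    negatives = " ".join("-" + n for n in group["negatives"])
    return f"Ad Group {name}: {group['keyword']} {negatives}".rstrip()


if __name__ == "__main__":
    for name, group in embedded_match_groups("baby gifts").items():
        print(format_group(name, group))
```

Calling `embedded_match_groups("personalized baby gifts", broad_negative="baby gifts")` reproduces the second set of groups from Ian's comment.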

Dan Thies @ 10:19 am

Kirk, check your best ad vs. your worst ad among the 4 – because the ads are all different SplitTester should be fine for that.

If your best ad is clearly better than the worst ad, with >90% confidence, then pause the worst ad. Then compare to the next ad, etc.

Once you have the best headline & offer combination of the 4, that becomes your control.
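For anyone who wants to script the elimination steps above, here is a rough sketch. It assumes a pooled two-proportion z-test as the confidence measure, which may not be what SplitTester actually computes, and the click and impression numbers are invented purely for illustration.

```python
import math


def confidence_better(clicks_a, imps_a, clicks_b, imps_b):
    """One-sided confidence that ad A's true CTR exceeds ad B's.

    Uses a pooled two-proportion z-test (an assumption of this sketch).
    """
    p_a = clicks_a / imps_a
    p_b = clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if se == 0:
        return 0.5  # no clicks (or all clicks) anywhere: nothing to tell apart
    z = (p_a - p_b) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def pick_control(ads, threshold=0.90):
    """Repeatedly compare the best ad to the worst and pause the worst.

    `ads` maps an ad name to (clicks, impressions). Stops as soon as the
    best ad no longer beats the worst with the required confidence.
    """
    active = dict(ads)
    paused = []
    while len(active) > 1:
        ranked = sorted(active, key=lambda n: active[n][0] / active[n][1])
        worst, best = ranked[0], ranked[-1]
        if confidence_better(*active[best], *active[worst]) < threshold:
            break
        paused.append(worst)
        del active[worst]
    return active, paused


if __name__ == "__main__":
    # Hypothetical stats for four different ads in one ad group.
    ads = {"A": (170, 1000), "B": (60, 1000), "C": (55, 1000), "D": (40, 1000)}
    remaining, paused = pick_control(ads)
    print("still running:", list(remaining))
    print("paused in order:", paused)
```

If the loop stops with more than one ad still active, the test simply needs more data before the next pause.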

The big payoff on CTR is going to come from headline testing at first, but you should hit diminishing returns on that pretty fast.

When I can't seem to beat my best headline any more, that's when I start testing offers. Sometimes we come up with an offer that doesn't make sense with the headline, so we have to test other headlines with it.

May 13, 2008

Kirk @ Mobile Unlocked @ 5:45 am

Dan do you think that this system would work with Yahoo search marketing?

May 14, 2008

Dan Thies @ 1:32 pm

Kirk, there are differences in YSM that make it less useful as a testing platform, but you can still split test, yes.

May 21, 2008

ilan @ 8:22 am

great post dan, truly.

i wont try to figure out everything with you, because that would be lazy and unfair, but there are some basic questions to be asked.

and here they are:

-first, about the ad testing. how do you know what to test against what: testing a different benefit/offer against the control ad, or simply changing other details? should i perfect one offer, and only then try a completely different ad? that is a problem i am not sure there is a way of getting around.

-when you end up with only the original control ad against the test ad, do you delete them and copy both so they are fresh and equal in statistics?

-regarding the exact, phrase and broad keyword method you presented, i wonder what it says about the long tail. if you get your exact high volume keywords with lower bids that is great, but the way you suggested with phrase and broad seems to lose the edge the long tail can give you. unless i am somewhat confused and you did not mean the entire campaign to be made with one keyword ad group all the way. in the lovely video you show the beginning of the method only to cut to some other interesting advices. i am really curious about this part.

-why actually start only with the exact group and not with the other two (this could be asked in relation to my last question)?

thank you for time and words,

ilan

ilan @ 8:41 am

only a small clarification for the first question- i meant how do you know if what you need to change is the benefit/offer/type of ad, or the small details within a certain ad? because you can test small changes forever while a better type of ad awaits, and you can change to a different type of ad without giving a chance to one specific articulation of that benefit.

maybe there is a good way to deal with it in bigger campaigns?

ilan

Dan Thies @ 10:09 am

Ilan, let's see…

1) How you decide what to test is a lengthy topic. What I generally do is get an effective (non-suicidal) headline working (e.g. Unique Baby Gifts) and test different offers.

If I get a 'great idea' for a new headline (e.g. Baby Gifts That Rock!) I'll put it into the queue, and test it when I get a chance – usually with a couple different offers besides whatever I have in the control.

2) No, I don't delete them and run the test over. Once we're down to the final test, the test ad has to stand on its own. Historical performance shouldn't be an issue by then, but mostly I am afraid of losing something by deleting a historically strong ad. :D

3) The ATM method is about testing and grabbing profits from the high-volume "money" keywords.

That doesn't mean you don't also target long tail, or use any other means at your disposal. It's not intended to be exclusive.

In the video, we suggest shutting down anything that isn't profitable, but as I also say, you will want multiple campaigns and multiple ad groups in a successful account.

All of those "other advices" are part of the plan – every piece is important because they work together.

4) The point of starting with exact matching in ATM is that you can ONLY be certain of what the search query is, when you use exact match. This affects everything from your ad creative to the landing page.

What we learn from testing exact match ads can almost always be applied rapidly to phrase & broad match ad groups, and with appropriate negative matches, you can grow a lot by using all of the match types.

Dan Thies @ 10:10 am

For those who have a lot of testing going on, Winner Alert can help: http://www.winneralert.com/

May 22, 2008

ilan @ 8:53 am

ok, that was quite helpful.

so as i understand the process, the broad serves as a fishing rod for additional ad groups (or even campaigns) as well as for negatives.

if that is indeed the case, what about the functionality of broad later in the campaign? i assume it stays in the high volume keywords for further opportunities.

"You would need test performance of 2.7% + .6% = 3.4% to call a winner"

does that have anything to do with statistical margins, in order to make sure, or is it a whim? because it seems to me that as time passes the error margin shrinks. i just don't know how seriously to take it into account.

thanks a bunch for the winner alert. it could save my life.

May 26, 2008

Rob @ 11:58 am

This makes no sense to me. Performance history is based on ad text and keyword combinations. If you make a copy of an ad, the copy gets the history from the control ad because they have the same text and keyword combination. This is the same reason you can move an ad group from one campaign to another and, even though it is deleting the old one and creating a new one, the history remains intact. Google also assigns a history to new ads that is in line with your old ads so that this problem does not occur. If they did not do this you would never be able to write a winning ad because of the head start the control ad has going for it.

Dan Thies @ 2:35 pm

Rob, when Google tells us explicitly that this isn't a variable, I will stop trying to rule it out. Either way, I'm still not going to throw a profitable campaign up in the air with a 50/50 split test.

October 5, 2008

Tenerife Car Rental @ 5:42 am

Hi Dan, thanks for the useful tips. I'm still testing what you are posting here! Keep up the good articles. Cheers,

October 31, 2008

adwords strategy @ 4:27 pm

I totally disagree. The whole point of using a Chi-Squared calculation in split testing is to determine how statistically significant a CTR difference is when there are different amounts of impressions. The Chi-Squared calc exists because the control group will have many more impressions than the test group. If you use this test, you can determine which is the better ad when the impressions are mismatched.

http://www.adwords-marketing-tool.com/toolbox/chisquared.aspx
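As a rough illustration of the calculation described in this comment, a chi-squared test on a 2x2 table of clicks and non-clicks copes with unequal impression counts. The sketch below uses only the Python standard library and is not the code behind the linked tool; the function name is made up, and it assumes both ads together have at least one click and at least one non-click so that no expected count is zero.

```python
import math


def chi_squared_ctr(clicks_a, imps_a, clicks_b, imps_b):
    """Chi-squared test (1 degree of freedom, no continuity correction)
    on clicks vs. non-clicks for two ads with any impression counts.

    Returns (chi2, p_value).
    """
    observed = [
        [clicks_a, imps_a - clicks_a],
        [clicks_b, imps_b - clicks_b],
    ]
    total = imps_a + imps_b
    row_totals = [imps_a, imps_b]
    col_totals = [clicks_a + clicks_b, total - clicks_a - clicks_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical numbers: control with 10,000 impressions, test with 1,000.
    chi2, p = chi_squared_ctr(270, 10000, 45, 1000)
    print(f"chi2={chi2:.2f}, p={p:.4f}")
```

Note that this only tells you whether the CTR difference is likely to be real for the data you feed it; it does not address whether the two sets of numbers were collected under comparable conditions.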

November 1, 2008

Dan Thies @ 4:38 pm

Maybe you could explain what you disagree with?

Do you believe that comparing data from a test I run this week to data from a "control" that happened in the past is valid?

There are plenty of ways to determine a winner. I don't remember anyone saying that Chi-Squared isn't a valid method.

This post is about making sure that you make valid comparisons, and a simple way to avoid throwing 50% or more of ad impressions into a test variation.

Other than giving yourself a link, was there a point to your post? How is your tool different from splittester.com or any of the others out there?

November 9, 2008

art @ 1:30 pm

This process of reviewing ad testing is very time consuming, especially if you have 2-3 keywords per ad group, which I do. Has anyone used the WinnerAlert mentioned in this thread? Is it one of the better solutions? I have looked around and haven't found anything except doing it manually. Thanks, Art

November 12, 2008

Dan Thies @ 8:26 pm

I've been using WinnerAlert – it's a very simple, low cost, "does one thing well" solution. You get an email alert when you have a winning ad in a test.

November 19, 2008

Gordon @ 4:01 am

Hi Dan, I know that you also advise splitting ad groups into broad, phrase and exact match, so does that mean that you run the Control, C1, C2, C3 & test for each of those match types? That would equate to a lot of testing and for campaigns with a low daily budget it could take days or even weeks to get decent results because the campaign budget would be maxed out with only a few clicks for each ad.

What would you do in this situation, just work with exact match for example? Thanks.

February 9, 2009

Peter (IMC) @ 1:38 pm

Dan,

Great post, but I'm having some problems with split testing. I recently did a split test with 1 original ad, 2 copies of the original, and 1 new ad I wanted to test as an improvement.

Worked great! My new copy doubled the CTR, from 2.25% to 4.5%. So I paused the original and its 2 copies and let the new ad run on its own.

That was a bad choice. Positions dropped a bit and CTR fell to 0 (after over 600 impressions).

When I switched the other ads on again, the clicks started again too, including on the new ad.

Is the history of the ads some kind of average of all active ads? What's your experience?

Thanks, Peter

February 15, 2009

Brent Crouch @ 10:09 pm

Awesome technique!

I go through my PPC accounts every two weeks to monitor performance and setup new test ads. I've been at this for a few years now, so I've come up with some pretty good control ads.

This weekend I was considering not even testing any more ads in the account. Like you said, most of the time the test ad will not beat the control, and I have found I am losing clicks/conversions to these tests.

I was curious to see if there was a solution and have been searching for the last few hours on different testing techniques. I finally made it to your post and it makes more sense than anything I've seen. It's a great idea.

See ya. I've got work to do!

February 16, 2009

Gordon @ 4:46 pm

Hey, am I the only one wondering about PPC Fast Start, or am I the only one who doesn't know how to get hold of it :-)

Dan, maybe the PPC Fast Start will answer this but I've been thinking long and hard about your advice for starting new AdWords campaigns, using exact match, and I just can't seem to completely get my head round it.

You say that you use exact match until you find profitable keywords but if you start so focused how do you know what people are searching for? I can only imagine that your keyword list is really, really long to account for every combination of keyword on the off-chance that some will prove profitable. I'm sure you use a few tools (AdWords keyword tool, etc) first to find some ideas, so is that how you go about it?

I'm aware that if you start with phrase match you should really know which negative keywords to include, and that broad match can cost you in wasted clicks, but am I missing some fundamental point about exact match that I've not just covered? There are so many opinions out there on how to start new campaigns…

Many thanks.

March 5, 2009

Flaming Monkey Nostrils @ 2:53 pm

Umm, I beg to differ regarding the point you make that doing a test with just 2 ads will eat up profits. Whether I am running it that way, or running 10 ad groups, I will still need X number of impressions/clicks to have valid results. And if that happens in 1 day or 1 month, I would rather have it done in a day.

March 23, 2009

Dan Thies @ 8:52 pm

Flaming Monkey,

You keep doing that then.

Brent Crouch @ 10:29 pm

@Flaming Monkey – You've obviously missed the point. This strategy is simple yet works brilliantly.

I had a few ads that were getting a great CTR and were converting very well. I stopped testing them because I didn't want to waste 50% of my impressions on an ad that had a small chance of beating my control.

This strategy lets me control what percentage of my impressions I use to test an ad that will probably not beat my control. Sure, I could split the impressions 50% control and 50% test ad and get quicker results, but why waste 50% of my impressions and potential profits on an ad with no history?
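The arithmetic behind that trade-off is simple enough to sketch. The snippet below assumes ads in an ad group are rotated evenly, so one test ad running alongside N control ads (the original plus its copies) receives 1/(N+1) of impressions; the function names and the relative-profit model are illustrative assumptions, not anything AdWords reports.

```python
def share_for_test_ad(control_ads):
    """Fraction of impressions the single test ad gets under even rotation
    with `control_ads` identical control ads (original plus copies)."""
    return 1 / (control_ads + 1)


def relative_profit(control_ads, test_vs_control):
    """Profit during the test relative to running the control alone.

    `test_vs_control` is how much the test ad earns per impression as a
    multiple of the control (0.0 = earns nothing, 1.0 = equally good).
    """
    share = share_for_test_ad(control_ads)
    return (1 - share) + share * test_vs_control


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        share = share_for_test_ad(n)
        worst = relative_profit(n, 0.0)
        print(f"{n} control ad(s): test gets {share:.0%}, "
              f"worst case keeps {worst:.0%} of profit")
```

With one control ad this is the familiar 50/50 split; with four control ads the test ad is limited to a fifth of impressions, at the cost of needing more time to collect the same amount of test data.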

April 8, 2009

Ron @ 5:31 pm

@brentCrouch

It wasn't until the last post of this gigantic page that I understood the value of this test. Your two paragraphs summed up the whole dang thing.

April 9, 2009

Peter (IMC) @ 12:33 pm

Dan,

You forgot to reply to my question (about 7 posts up). Can you give it a try?

Thanks, Peter

Dan Thies @ 2:55 pm

@Peter, Sorry about missing that. I've never seen that happen before, in any campaigns. Sounds like a question for Adwords support.

May 17, 2009

Brent Crouch @ 8:38 am

Hi Dan,

Quick question. Have you ever performed any tests to verify whether there is any connection between Google Adwords and Google Organic? I've always heard there wasn't, and it's obvious you don't have to pay for Adwords to get good organic listings on Google, but something I saw this month made me question whether there is some connection.

I have a site that has ranked #5 organically for a dvd related term for several years. In addition to organic, I've always used Adwords to drive traffic to the site as well.

Last week, I increased my Adwords bid for this term to test a new landing page. Within a few days, my #5 listing shot to #1 and remained there for about a week. It eventually settled back down to #5 where it had previously been.

Coincidence or is there some relation between the two?

Thanks,

Brent Crouch

November 8, 2009

Jack Duncan @ 2:18 pm

Dan, Very interesting post…great "geeky" read. :)

I do have one quick question…and please correct me if I have missed something obvious…

Wouldn't it just be easier to turn on Position Preference (1-1) or (5-5) for both A/B ads, then simply select the date range for the start of your test, which would effectively "throw out" the older data…

You would then simply take the new values (the Control + New Ad) with "locked in position preference" and do a simple stats test to see if you have achieved 95% accuracy…

Would that not work?

Thanks!

January 7, 2010

Rex Dixon @ 5:29 pm

With all the different services available out there, wouldn't using a service such as http://www.performable.com/ be a better alternative? Might simplify the process? Or are you against using services to do your testing?

March 14, 2010

J. Mark Bangerter @ 4:42 pm

Has the video been removed? The link is no longer working.

Rennell @ 6:13 pm

This is obviously an interesting and different way to look at testing. Most people just do A/B testing without looking into the true science of testing.

April 5, 2010

Dan Thies @ 6:12 pm

They've moved it – I'm working on a new version anyway, which will be hosted here.

June 17, 2010

Renford Nelson @ 3:40 pm

I have now set up my split test following your advice. One thing I would like to know: are there any pitfalls to this method?

June 23, 2010

Paul Wright @ 4:48 pm

Hi Dan, I have to say that your information is priceless, and thanks for that. I just wanted to ask, if you don't mind, whether you could take a look at the link for the 50 minute video on this site about split testing Adwords. The link seems to be broken.

regards and thanks heaps

Paul

Dan Thies @ 5:27 pm

Hi Paul – Stompernet has a version of that here: http://www.stomperblog.com/marketing/stompernet-going-natural-30/

I'll be releasing an update some time this summer.
