A/B Testing Failed: CTR Not a Valid Criterion for Ad Quality?

Ad optimisation is one of the most important measures you can take to improve your AdWords Quality Score and, thus, the competitiveness of your PPC campaigns. Generally, you can use the click-through rate (CTR) as your main criterion to make the quality of your different ads comparable. However, there are cases in which this approach breaks down: an ad can show the (apparently) higher CTR and still deliver the worse performance.

Today, I would like to take a closer look at one of these special cases. My theory is this: if you select “Optimise for clicks” for your ad rotation settings, the CTR is no longer a valid criterion for ad quality. 

Starting Point: Is Google’s Algorithm Really So Misguided?

When looking at campaigns which are set to “Optimise for clicks” and include two or more ads per ad group, you may be surprised to notice that in many cases, Google selects ad A – even though ad B has the better CTR. So, what’s happening here? Is Google’s algorithm really this unreliable when it comes to choosing the better ad?

I wanted answers, so I set up two campaigns via ACE (AdWords Campaign Experiments) that were served alternately, each including two ads per ad group. Both campaigns were identical except for the ad rotation setting: “Rotate evenly” for one, “Optimise for clicks” for the other.

Google processed the campaign set to “Optimise for clicks” as outlined above, choosing ad A even though ad B had the better CTR. In the control campaign, it was the other way around: set to “Rotate evenly”, ad B had the better CTR! This was getting more and more mysterious.

By Selecting the “Optimise for Clicks” Setting, Your Ads Will Be Displayed on a Segment Basis

Segmenting by placement – Top (ads above the organic search results) vs. Other – proved conclusive. If you apply this segmentation in your client centre, the report will also be divided into Google Search vs. Partner Network results. The results of the analysis are below:


From the table above, we learned the following (the better value is highlighted):

  • In total, Google displays ad A more frequently (78.9% vs. 21.1%), although ad B has the better CTR (2.29% vs. 1.38%).
  • Ad A has a significantly higher CTR in the Google Search top positions (24.49% vs. 10.94%) and is therefore placed there more frequently (3.4% vs. 2.9%). (The difference between the two CTR values is particularly striking here, as ad A also has the better average position.) Most clicks (2,351) and the highest cost (€349.07) fall into this segment – hence it is highly relevant for performance.
  • In the “Google Search: Other” area, the CTRs of ads A and B are almost identical (1.05% vs. 1.08%). Due to their different average positions, ad B performs better. Accordingly, Google displays ad B more frequently (9.0% vs. 3.2%).
  • Ad B seems to perform better in the partner network, especially in the top section (CTR: 8.33%). It remains unclear why Google nevertheless decides against displaying ad A here (top vs. other). (Is Google perhaps taking a further segmentation by the respective advertising partner into account?)

We can see quite clearly that Google’s “Optimise for clicks” algorithm decides on a segment basis which ad delivers better performance. Both segmentations – Top vs. Other and Google Search vs. Partner Network – seem to be taken into account.
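To make this aggregation effect tangible, here is a minimal Python sketch. The impression and click counts are invented purely for illustration – they are not the account’s actual figures – and only loosely mirror the CTR pattern from the table above (ad A far ahead in “Top”, a near-tie in “Other”, ad B ahead when everything is pooled).

```python
# Minimal sketch of the aggregation effect described above.
# All impression and click counts are invented for illustration only.

data = {
    "Google Search: Top":   {"Ad A": (1400, 343),   "Ad B": (1500, 164)},
    "Google Search: Other": {"Ad A": (98600, 1035), "Ad B": (8500, 92)},
}

# Segment-level CTRs: ad A clearly leads in "Top", "Other" is a near-tie
for segment, ads in data.items():
    for ad, (impressions, clicks) in ads.items():
        print(f"{segment:22s} {ad}: CTR = {clicks / impressions:.2%}")

# Pooled across segments, ad B suddenly looks far better, because ad A
# collects the bulk of its impressions in the low-CTR "Other" segment
for ad in ("Ad A", "Ad B"):
    impressions = sum(ads[ad][0] for ads in data.values())
    clicks = sum(ads[ad][1] for ads in data.values())
    print(f"{'Pooled':22s} {ad}: CTR = {clicks / impressions:.2%}")
```

The pooled comparison makes ad B look clearly superior, even though ad A dominates the high-value “Top” segment – which is exactly why an average CTR across segments can be misleading.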

For the Sceptics: Segmentation on a Daily Basis Shows Identical Results

Since these results came as a complete surprise, I initially remained sceptical and considered that other factors may have biased the outcome. One of the most obvious possibilities was that bid modifications (or bid modifications by the competition) could distort the results over time.

However, by segmenting on a daily basis (on days when bids remained constant), we could largely reproduce the results shown above.
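If you want to repeat this kind of day-level check on your own data, here is a minimal sketch. It assumes a CSV export with the columns Date, Ad, Impressions and Clicks – the file and column names are assumptions, not the exact AdWords report format:

```python
import pandas as pd

# Day-level check: CTR and impression share per ad and day.
# File and column names are assumptions, not the exact AdWords export.
df = pd.read_csv("ad_report_by_day.csv", parse_dates=["Date"])

daily = df.groupby(["Date", "Ad"], as_index=False)[["Impressions", "Clicks"]].sum()
daily["CTR"] = daily["Clicks"] / daily["Impressions"]
# Share of the ad group's daily impressions each ad received
daily["ImpressionShare"] = (
    daily["Impressions"] / daily.groupby("Date")["Impressions"].transform("sum")
)

print(daily.round(4))
```

If the pattern from the aggregated table (ad A shown more often despite ad B’s higher average CTR) repeats day after day, bid changes over time are unlikely to be the explanation.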


Segmented Ad Rotation: Desktop vs. Mobile

Considering what we’ve seen so far, we assumed that Google’s “Optimise for clicks” algorithm would also make segment-based decisions about ad rotation for different devices. A number of examples demonstrate this. For instance, one of the ad groups in the brand area provided the following distribution of placement and CTR:

Table 3: Desktop vs. mobile – distribution of placements and CTR

Surprisingly, for most ad groups in the analysed account, we could not identify a pattern of different placements for different devices. Instead, in most cases the distribution looked entirely unremarkable. Is it possible that Google only segments by device if the device-related difference in quality crosses a certain threshold?

Again, the example above shows that segmentation leads to ad A being placed more often across all segments, although ad B still has the better average CTR.

What Are Possible Alternatives for Ad Optimisation Using the “Optimise for Clicks” Setting?

It’s nearly impossible to optimise ads professionally without segment analysis. A blanket judgement of the type “ad A is better than ad B” will probably fall short in most cases. Instead, it’s sensible to conduct a differentiated segment analysis:

  • Top vs. Other
  • Google Search vs. Partner Network
  • Desktop vs. Mobile

Google makes it quite easy to analyse ad quality based on segmentation. However, when it comes to implementing these new insights, little can actually be done: at present, it’s only possible to apply manual ad rotation settings (deliver ad A in segment I, and ad B in segment II) for mobile, and only to a limited degree (see Mobile Preferred Ads).
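As an illustration of what such a segment-level evaluation could look like in practice, here is a minimal sketch using pandas. The file and column names (“Segment”, “Ad”, “Impressions”, “Clicks”) are assumptions rather than the exact AdWords report headers; the segment column is meant to combine Top vs. Other with Google Search vs. Partner Network:

```python
import pandas as pd

# Per-segment ad comparison from a segmented report export.
# File and column names are assumptions, not the exact AdWords headers.
df = pd.read_csv("ad_report_segmented.csv")

per_segment = (
    df.groupby(["Segment", "Ad"])[["Impressions", "Clicks"]]
      .sum()
      .assign(CTR=lambda d: d["Clicks"] / d["Impressions"])
)

# CTR of each ad side by side, one row per segment ...
ctr_by_segment = per_segment["CTR"].unstack("Ad")
print(ctr_by_segment.round(4))

# ... and which ad leads each segment on CTR
print(ctr_by_segment.idxmax(axis=1))
```

A comparison like this immediately shows whether the “winner” changes from segment to segment – which is precisely the information an account-wide average CTR hides.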

Using the “Optimise for clicks” setting, Google automatically delivers ads on a segment basis. In this context, you should remain critical for two reasons: (a) as an advertiser you will have limited control over your ad rotation, and (b) Google’s algorithm involves further influencing factors that cannot be observed by advertisers. Yet, the main shortcoming, in my opinion, is that you cannot use a combined ad comparison – e.g. optimising your ads by a combined factor of CTR and conversion rate – for your automated ad rotation.
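How such a combined factor would be calculated is left open here. As one possible reading, the following minimal sketch scores each ad by conversions per impression, i.e. CTR multiplied by conversion rate – the formula and the figures are assumptions for illustration only:

```python
# Hypothetical combined score: conversions per impression = CTR * conversion rate.
# The formula and all figures are assumptions for illustration only.

ads = {
    # ad: (impressions, clicks, conversions)
    "Ad A": (10000, 140, 7),
    "Ad B": (10000, 230, 5),
}

for ad, (impressions, clicks, conversions) in ads.items():
    ctr = clicks / impressions
    cvr = conversions / clicks
    print(f"{ad}: CTR = {ctr:.2%}, CVR = {cvr:.2%}, "
          f"conversions per impression = {ctr * cvr:.4%}")
```

With figures like these, ad B wins on CTR alone while ad A wins on the combined score – exactly the kind of trade-off the automated rotation cannot take into account.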

What we can say for sure is that once you’ve activated the “Optimise for clicks” setting, conventional ad comparisons across segments on the basis of average CTRs will provide misleading results. For those of you who still want to use this ad rotation setting, here are some options to optimise your ads:

  1. Temporarily change the ad rotation settings to “Rotate evenly” during the trial period.
  2. Evaluate ad quality at segment level, then rank the segments by their relevance for your performance.
  3. Instead of CTR, use a different key performance indicator (KPI) that is theoretically useful for ad quality: how often does Google choose to show each ad? Provided you account for certain other factors (e.g. select an evaluation time frame in which both ads were eligible throughout), you can use this KPI as a rough verdict of “good” or “bad” ad quality. You will, of course, miss out on a more fine-grained estimate (e.g. that ad A is only marginally better than ad B). A rough sketch of this indicator follows below.
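A rough sketch of that indicator, assuming you already have per-segment impression counts for both ads from a period in which both were eligible (the figures are invented):

```python
# Rough sketch of the "how often does Google place the ad" indicator:
# each ad's share of the ad group's impressions, per segment.
# Figures are invented; in practice they would come from a segmented report.

impressions = {
    "Google Search: Top":   {"Ad A": 1400,  "Ad B": 1500},
    "Google Search: Other": {"Ad A": 98600, "Ad B": 8500},
}

for segment, ads in impressions.items():
    total = sum(ads.values())
    for ad, imp in ads.items():
        print(f"{segment:22s} {ad}: impression share = {imp / total:.1%}")
```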


Andreas Meyer

Andreas Meyer has been a PPC specialist since 2009. In the years since, he has been in charge of accounts of all sizes across diverse industries. Since 2014, he has worked as a PPC analyst in the Business Intelligence division. His areas of expertise are PPC account optimisation and Google Shopping.

  • Lucas Ertola

    Hi Andreas!
    Great post! Lately I’ve been working a lot with A/B tests in ads, and I have a couple of questions I’d like to hear your opinion on.
    1. Do you isolate keywords when you run an A/B test? I mean one keyword per ad group. It has happened to me that I would have taken the wrong decision if I had simply chosen the ad with the higher CTR. Why? Because when I segment the ads by keyword, I see that the keywords’ impressions didn’t rotate evenly. Have you ever had this problem? How did you solve it? I now run ad tests in new one-keyword ad groups.
    2. Another strange thing that sometimes happens is that the ads in the A/B test had different average positions (even within a one-keyword ad group), making them impossible to compare. Same as before: if this has happened to you, how did you solve it?