Applying distributional measures in PPC

The tasks of a PPC manager are manifold. They start with setting up an AdWords account and continue with analyzing performance, monitoring, and optimizing, for example by using search query reports to convert potential growth into real growth by adding keywords to the account. Tuning the existing keywords is just as essential to lift performance. The question arises whether concepts from other disciplines might provide helpful tools to speed up the analysis or monitoring of accounts.

One tool we present in this post comes from welfare economics, where it is used to measure the disparity in the distribution of income or wealth in an economy. The concept goes back to the Italian statistician and sociologist Corrado Gini. The so-called Gini coefficient condenses into a single number how unequally a quantity such as income or wealth is spread across a population, i.e. what percentage of the population holds what percentage of the total.

To give you an example, imagine there are 5 employees in the PPC department of your company. You ask each of them how many accounts they manage and get the following table:

| Employee | Number of accounts managed |
|----------|----------------------------|
| Sarah    | 3                          |
| Michael  | 2                          |
| Jack     | 8                          |
| Jane     | 10                         |
| Mary     | 4                          |

Now let us sort the employees by the number of managed accounts in increasing order:

| Employee | Number of accounts managed |
|----------|----------------------------|
| Michael  | 2                          |
| Sarah    | 3                          |
| Mary     | 4                          |
| Jack     | 8                          |
| Jane     | 10                         |

In this case, Michael seems to be the rookie and Jane is a Hall of Famer. Now let's calculate the cumulated sum in a new column:

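| Number of employees | Employee | Number of accounts managed | Cumulated number of accounts managed |
|---------------------|----------|----------------------------|--------------------------------------|
| 1                   | Michael  | 2                          | 2                                    |
| 2                   | Sarah    | 3                          | 5                                    |
| 3                   | Mary     | 4                          | 9                                    |
| 4                   | Jack     | 8                          | 17                                   |
| 5                   | Jane     | 10                         | 27                                   |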

What we can read from the table is that 3 employees together manage 9 accounts and all 5 employees manage 27 accounts. Expressed as percentages: 60 percent of the employees manage 33 percent of the accounts, or, the other way around, just 40 percent of the employees manage 67 percent of the accounts.
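The sorting and cumulating steps from the tables can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used for the post (the original analysis was done in R); the variable names are made up for this example.

```python
from itertools import accumulate

# Accounts managed per employee, taken from the first table.
accounts = {"Sarah": 3, "Michael": 2, "Jack": 8, "Jane": 10, "Mary": 4}

# Sort ascending by number of accounts, then build the running total.
ranked = sorted(accounts.items(), key=lambda item: item[1])
cumulated = list(accumulate(count for _, count in ranked))
total = cumulated[-1]

# Print each row together with the two percentages the text talks about.
for position, ((name, count), running) in enumerate(zip(ranked, cumulated), start=1):
    employee_share = position / len(ranked)
    account_share = running / total
    print(f"{position} {name:<8} {count:>2} {running:>2}  "
          f"{employee_share:.0%} of employees -> {account_share:.0%} of accounts")
```

The row for Mary is the one quoted in the text: 3 of 5 employees and 9 of 27 accounts.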

This is the first step in understanding the Gini concept and the related Lorenz curve, which is basically just a graphical representation of the table above. The only difference is that the graphical representation is in terms of percentages.

[Figure: Distribution of Workload in SEA dept.]

The x-axis shows the percentage of employees and the y-axis the percentage of managed accounts. What happens if the curve is a straight line from zero to one? In that case the accounts are distributed evenly (uniform distribution), which means that every employee manages the same number of accounts. This is represented by the straight 45-degree line. The other extreme is that all accounts are managed by a single person; the curve then takes a mirrored L shape, running flat along the x-axis and jumping to 100 percent at the last employee.
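As a rough sketch, the points of a Lorenz curve can be computed as follows. The function name `lorenz_points` is hypothetical, and the two extreme cases use invented numbers (five employees with 5 accounts each, and one employee holding all 27 accounts).

```python
def lorenz_points(values):
    """Return (population share, cumulated value share) pairs, starting at (0, 0)."""
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    points = [(0.0, 0.0)]
    running = 0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / len(ordered), running / total))
    return points

example = lorenz_points([3, 2, 8, 10, 4])      # the five employees from the table
even = lorenz_points([5, 5, 5, 5, 5])          # assumed: everyone manages the same number
one_person = lorenz_points([0, 0, 0, 0, 27])   # assumed: one employee manages everything

for x, y in example:
    print(f"{x:.0%} of employees -> {y:.0%} of accounts")
```

In the even case every point has x equal to y, which is the 45-degree line. In the one-person case y stays at zero until the last point, which gives the mirrored L shape.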

These two examples demonstrate the purpose of the concept: it lets us draw inferences about how a quantity is distributed among subjects. The concept translates directly to the analysis of PPC accounts. One example is analyzing the contribution of keywords to overall sales: one can check what percentage of the keywords contributes what percentage of the overall account performance, such as revenue, costs or conversions. Another task in this area is to define benchmark values that describe what a well-shaped curve looks like, and to identify the dead keywords, adgroups or campaigns.

In order to do this, the Gini coefficient comes into play. Simply put, the Gini coefficient is the ratio of the area below the 45-degree line but above the Lorenz curve to the whole triangle below the 45-degree line. A Gini coefficient of 1 means that the variable of interest is distributed completely unevenly (e.g. 1 subject accounts for 100 % of the variable of interest). A value of 0, on the other hand, means that we have a uniform distribution (e.g. every 1 % of the subjects is associated with 1 % of the data). In our example, the Gini coefficient of the red line is 0.31 and that of the blue line is 0.8.

In general, distributional measures can give an overview of overall account performance at different levels of granularity. One possible use case is the analysis of the campaigns in an account to identify whether something is going wrong. Below you see the Lorenz curve of a sports equipment and fashion retailer in Europe. The variable of interest is costs. The data covers a representative period, e.g. one year.
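A minimal Python sketch of the area definition, assuming the Lorenz curve is drawn as straight segments between the data points, together with the common rank-weighted formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n for sorted data. The function names are hypothetical, and whether this matches the exact routine behind the figures is an assumption. Applied to the five employees it yields the 0.31 quoted above.

```python
def gini_from_area(values):
    """Gini as (area between 45-degree line and Lorenz curve) / (area of the triangle)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive sum")
    area_under_curve = 0.0
    previous_share = 0.0
    running = 0
    for value in ordered:
        running += value
        share = running / total
        # Trapezoid of width 1/n between two neighbouring Lorenz points.
        area_under_curve += (previous_share + share) / (2 * n)
        previous_share = share
    triangle = 0.5  # area below the 45-degree line
    return (triangle - area_under_curve) / triangle


def gini(values):
    """Same quantity via the rank-weighted formula for sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive sum")
    weighted = sum(i * x for i, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(round(gini([3, 2, 8, 10, 4]), 2))   # 0.31, the employee example
print(round(gini([0, 0, 0, 0, 27]), 2))   # 0.8, one of five subjects holds everything
print(round(gini([5, 5, 5, 5, 5]), 2))    # 0.0, perfectly even
```

With this formula the largest possible value for n subjects is (n - 1) / n, so five subjects cannot exceed 0.8. The value of 1 is only approached as the number of subjects grows.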

[Figure: Lorenz Curve of Costs on Adgroup and Campaign Level]

The interpretation of the data is straightforward: for example, about 75 percent of the adgroups produce about 8 percent of the costs, whereas the same percentage of campaigns accounts for about 25 percent. Are these good values or not? To answer this question, one should generally compare more than one KPI. Costs are usually only meaningful in comparison to conversions, revenue, profit or ROI.
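Statements of the form "x percent of the adgroups produce y percent of the costs" can be read off the data directly. The sketch below does this for two KPIs side by side. The helper `bottom_share` and all figures are invented for illustration and are not the retailer's data.

```python
def bottom_share(values, population_fraction):
    """Share of the total contributed by the smallest `population_fraction` of items.

    The number of items is rounded to the nearest whole item.
    """
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    count = round(population_fraction * len(ordered))
    return sum(ordered[:count]) / total


# Hypothetical adgroup figures, made up for this example.
costs = [5, 8, 10, 15, 40, 120, 900, 2300]
conversions = [0, 1, 1, 2, 3, 6, 20, 30]

for label, series in (("costs", costs), ("conversions", conversions)):
    print(f"bottom 75% of adgroups -> {bottom_share(series, 0.75):.0%} of {label}")
```

Note that each KPI is sorted on its own, as in a Lorenz curve, so the "bottom 75 percent" for costs need not be the same adgroups as for conversions.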

Another approach is to check the Lorenz curves on a regular basis, e.g. biweekly or monthly, to see whether the distribution of costs is shifting (as desired). To be able to do that, the first task should be to get a feeling for the shape of the Lorenz curve and the respective Gini coefficient. The Gini coefficient for the ‘campaign Level’ curve is 0.69 and that of the ‘Adgroup Level’ curve is 0.88. Note that these values are not percentages, as in "88 percent of the adgroups cause costs"! That reading is misleading and wrong. The coefficient rather expresses how far we are from a uniform distribution. Whether a uniform distribution is desired or not depends on the optimizing behaviour of the respective customer or PPC manager. It is obviously crucial to define benchmark values for the Gini coefficient and to deliver the names of the adgroups or campaigns to the PPC manager, so that they can quickly see where to check the account. Rules of thumb, like "20 % of the keywords should account for 80 % of the revenue", are a good starting point (http://www.phoenixrealm.com/80-20-rule-sem/), but applying more complex statistical and mathematical methods can improve on them.
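To hand names to the PPC manager, one could list the smallest group of keywords that reaches a target share of revenue, plus the keywords that contribute nothing. The following is a sketch under stated assumptions: the function `top_contributors`, the keywords and the revenue figures are all hypothetical.

```python
def top_contributors(metric_by_name, target_share=0.8):
    """Smallest set of the largest items whose combined share reaches target_share."""
    total = sum(metric_by_name.values())
    if total <= 0:
        raise ValueError("metric must have a positive sum")
    ranked = sorted(metric_by_name.items(), key=lambda item: item[1], reverse=True)
    selected = []
    running = 0
    for name, value in ranked:
        if running >= target_share * total:
            break
        selected.append(name)
        running += value
    return selected


# Hypothetical keyword revenue, invented for illustration only.
revenue = {
    "running shoes": 4200, "football boots": 3100, "yoga mat": 400,
    "ski goggles": 150, "tennis racket": 100, "swim cap": 50,
    "bike helmet": 0, "golf glove": 0, "hiking socks": 0, "rain jacket": 0,
}

top = top_contributors(revenue, target_share=0.8)
dead = [name for name, value in revenue.items() if value == 0]

print(f"{len(top) / len(revenue):.0%} of keywords reach 80% of revenue: {top}")
print(f"keywords without revenue: {dead}")
```

Running the same check on each reporting period, next to the Gini coefficient, gives a simple way to see whether the distribution is shifting. Which target share counts as a good benchmark is left open here.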

The analyses above were done in R with the ineq package.

ABOUT THE AUTHOR


Björn Büchler

I started working for crealytics as a data analyst in 2013, after working in the field of macroeconomic research. In my everyday work I use R in conjunction with RStudio on Linux. My passions are econometrics (and yes, in my opinion it includes statistics!), "data diggin'", IT and baseball.
