Published February 7th, 2015 by Assaf Trafikant
Google Analytics Cohort Analysis
In 2015, Google Analytics launched a new feature: Cohort Analysis. The method itself has been around for quite some time, of course. Until then, it could be performed (primitively) in Google Analytics, or more professionally using other dedicated tools (and plenty of Excel tables). There’s a good chance you’ve done it before; you just didn’t know that’s what it’s called. So before we get down to the details, let’s start with the basics.
What Exactly Is Cohort Analysis?
“Cohort” (Roman legion): a group of people who share a common feature or aspect of behavior
The Oxford English Dictionary
By definition, a cohort is a group of people with a common ground or characteristics, often a group that’s been unified over an extended period. When thinking of “common characteristics,” you might think of a group of users with the same pricing plan (if you are a SaaS company). You could go deeper and divide this group into two: monthly subscribers and annual subscribers. But that’s pretty simple and doesn’t count as cohort analysis.
What makes it a cohort analysis is the added aspect of time – segmenting groups not just according to their affiliation with a specific action, but also according to the time of occurrence. For example, you could define a group as “anyone who visited my site and bought leather boots on Cyber Monday 2015”.
The number of groups you can build is endless, and those visitors can belong to several groups simultaneously. Your ability to create and define groups and analyze their behavior over time is there to replace individual user analysis and help you reach statistically significant conclusions that characterize an entire group.
You could go further and compare the behaviors of entire, seemingly identical groups that differ from one another only in their date of creation. For example: Are those who upgraded their plan to Premium in February 2014 different from those who upgraded in August 2014?
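To make the idea concrete, here is a minimal sketch of how such groups could be built from a plain event log. The user IDs, the dates, and the build_cohorts helper are all made up for illustration; this is not how Google Analytics does it internally.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, date the user upgraded to Premium).
upgrades = [
    ("u1", date(2014, 2, 3)),
    ("u2", date(2014, 2, 17)),
    ("u3", date(2014, 8, 9)),
    ("u4", date(2014, 8, 30)),
    ("u5", date(2014, 8, 31)),
]

def build_cohorts(events):
    """Group user ids by the (year, month) in which the action happened."""
    cohorts = defaultdict(set)
    for user_id, when in events:
        cohorts[(when.year, when.month)].add(user_id)
    return dict(cohorts)

cohorts = build_cohorts(upgrades)
for key in sorted(cohorts):
    print(key, sorted(cohorts[key]))
```

The action (upgrading) is the same for everyone; only the month differs, and that month is what turns a plain segment into a cohort.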
Churn Rate & Cohorts
Simply put, “churn” is the number of customers who drop your product or service and stop being customers at some point. The most common use of cohort charts is to compare the churn rate between groups of users who differ in their time of acquisition. For example, 3% of the customers acquired during March 2019 left the platform in the following month. The same calculation is then repeated for the users acquired in April 2019, and so on.
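The arithmetic behind a per-cohort churn rate is simple enough to sketch. The counts below are invented (only the 3% figure echoes the example above), and churn_rate is a hypothetical helper, not part of any analytics tool.

```python
def churn_rate(acquired, churned):
    """Share of a cohort that left, as a fraction between 0 and 1."""
    if acquired <= 0:
        raise ValueError("a cohort needs at least one acquired customer")
    if not 0 <= churned <= acquired:
        raise ValueError("churned must be between 0 and acquired")
    return churned / acquired

# Hypothetical cohorts: acquisition month -> (customers acquired, customers gone a month later).
cohort_counts = {
    "2019-03": (200, 6),
    "2019-04": (250, 10),
}

churn_by_cohort = {month: churn_rate(a, c) for month, (a, c) in cohort_counts.items()}
for month, rate in churn_by_cohort.items():
    print(f"{month}: {rate:.1%} churned in the following month")
```

Comparing the rates side by side, rather than the raw counts, is what lets you judge cohorts of different sizes against each other.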
Cohort Analysis (The Old Way)
It was pretty awkward, to be honest. I used custom variables (before Google had Custom Dimensions) to mark users who behaved a certain way. For example, on a forex website, whenever someone made a first-time deposit (a.k.a. FTD), we used a custom variable to tag them. We also recorded the registration date (usually just the month and year). That way, we could group all of the users who made FTDs in January 2015, create a segment from them, and compare it to other users.
In sites using advanced BI platforms, or even a simple database that collects data on system use, I would work with complicated Excel sheets and query after query to pull out the groups and compare them.
Google’s Cohort Analysis
Now, the Google Analytics Cohort module is far from perfect when compared to other cohort analysis systems out there, and it seems to have stagnated for too many years, but it’s a huge step forward for Google. So, let’s find the report: Audience >> Cohort Analysis.
Now, let’s take a look at the toolbar:
Cohort Type: At this point, it’s still limited and only shows the option of ‘acquisition date,’ which means that our sample is anyone who’s visited the site, grouped by the date of their first visit. Don’t be fooled. It doesn’t mean users who have completed a goal or made a purchase. The word acquisition refers to site visitors.
Cohort Size: The period by which you want to categorize the group. If you choose Month, you’ll get a group comprised of all of the users that visited your site during a certain month and maybe left your site the month after. The resolution here should match the nature of your business. If it’s an online grocery store, you want your customers to come back every week, so you choose the “week” cohort size. If you are the owner of the “Candy Crush” mobile game, you expect users to come back every day, so you choose “day.”
Metric: OK, so we have groups. Now the question is which metric we want to judge group performance by: retention, conversion, viewed pages, revenue, etc.
Date Range: This field changes according to the cohort size that you chose. If you picked months, you could see the behavior of your selected metric over the last two or three months for that same sample group.
Above the graph, there’s another drop-down menu with a list of months (if you chose a monthly report) or days (for daily reports).
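If it helps to picture what the Cohort Size setting does, here is a rough sketch that drops a first-visit date into a day, week, or month bucket. The cohort_key function is hypothetical, and the week bucket uses ISO weeks (which start on Monday) as an assumption; Google Analytics may draw its week boundaries differently.

```python
from datetime import date

def cohort_key(first_visit, size):
    """Return the bucket a first-visit date falls into for a given cohort size."""
    if size == "day":
        return (first_visit.year, first_visit.month, first_visit.day)
    if size == "week":
        iso = first_visit.isocalendar()  # (ISO year, ISO week number, weekday)
        return (iso[0], iso[1])
    if size == "month":
        return (first_visit.year, first_visit.month)
    raise ValueError(f"unknown cohort size: {size!r}")

sunday = date(2015, 1, 11)
monday = date(2015, 1, 12)
for size in ("day", "week", "month"):
    print(size, cohort_key(sunday, size), cohort_key(monday, size))
```

Two visitors who arrive a day apart can share a monthly cohort yet land in different weekly or daily ones, which is why the cohort size should follow how often you expect people to come back.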
A Quick Example
I want to compare the number of visits over time for those who first visited the site in November with those who visited it for the first time in the following December. In Cohort Size, choose Month; in Metric, choose “Sessions”; in Date Range, mark all relevant months (November and December, in this case). Here’s what I got:
The graph has two lines: the one that starts out higher represents visitors who reached the site in December, and that’s why it only has two sections on the X-axis. The darker line that starts slightly beneath it represents the users who reached the site in November and is therefore comprised of three parts, one for each month (November, December, and January). But why are they on top of each other if they start in different months?
The answer lies within the X-axis. You can see that it says Month 0, which means that the system drew the behavior graphs for both groups and placed them one over the other for the sake of comparison, and to analyze whether their behavior changes after one month (Month 1), etc.
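The re-indexing that puts both lines on a shared Month 0 can be sketched in a few lines. The session counts are invented, the years (2014 and 2015) are assumed because the example only names the months, and align_to_offsets is a hypothetical helper.

```python
def month_index(year, month):
    """Turn a calendar month into a running month number."""
    return year * 12 + (month - 1)

def align_to_offsets(cohort_start, monthly_values):
    """Re-key {(year, month): value} as {months since the cohort started: value}."""
    start = month_index(*cohort_start)
    return {month_index(y, m) - start: v for (y, m), v in monthly_values.items()}

# Hypothetical sessions per calendar month for each cohort.
november_cohort = {(2014, 11): 1200, (2014, 12): 300, (2015, 1): 150}
december_cohort = {(2014, 12): 1500, (2015, 1): 350}

aligned_november = align_to_offsets((2014, 11), november_cohort)
aligned_december = align_to_offsets((2014, 12), december_cohort)
print(aligned_november)
print(aligned_december)
```

After alignment, the November group has values for Months 0, 1, and 2, while the December group only has Months 0 and 1, which is why the second line is shorter.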
What Does That Table Below Mean?
The table below the graph tells an interesting story, but for that, we’ll have to change our resolution to a daily report, 30 days back, and this time choose the “Retention” metric:
Let’s look at the first line: On January 11, out of all of the users who first visited our site that day (which is why it says 100% under Day 0), only 4.36% came back the following day (Day 1). The next day, January 12, we had plenty of new visitors, but only 2.48% of them returned the following day.
What’s the difference between these days? Is it ‘noise’? Did all those January 11 visitors receive some kind of email campaign that evening and come back the next day? Are people busier on Mondays, and that’s why they don’t revisit websites? That’s the brass tacks of cohort analysis: You analyze different groups with different starting points (dates), but place them all on the same axis and same starting point.
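For readers who want to see where such a table comes from, here is a small sketch that builds one from a raw visit log. The visits are invented, retention_table is a hypothetical helper, and each user is assigned to a cohort by first-visit date, which is an assumption about how to read ‘acquisition date’.

```python
from collections import defaultdict
from datetime import date

def retention_table(visits, max_day):
    """Map each first-visit date to the % of its users seen on Day 0..max_day."""
    first_seen = {}
    for user, day in sorted(visits, key=lambda v: v[1]):
        first_seen.setdefault(user, day)  # earliest visit wins

    cohort_users = defaultdict(set)
    for user, day in first_seen.items():
        cohort_users[day].add(user)

    seen = defaultdict(set)  # (cohort date, day offset) -> users who visited
    for user, day in visits:
        offset = (day - first_seen[user]).days
        if offset <= max_day:
            seen[(first_seen[user], offset)].add(user)

    return {
        cohort: [100.0 * len(seen[(cohort, n)]) / len(users) for n in range(max_day + 1)]
        for cohort, users in cohort_users.items()
    }

jan11, jan12, jan13 = date(2015, 1, 11), date(2015, 1, 12), date(2015, 1, 13)
visits = [
    ("a", jan11), ("b", jan11), ("c", jan11), ("d", jan11),
    ("a", jan12), ("b", jan12),
    ("e", jan12), ("f", jan12), ("g", jan12), ("h", jan12),
    ("e", jan13),
]
table = retention_table(visits, max_day=1)
for cohort in sorted(table):
    print(cohort, table[cohort])
```

Day 0 is always 100% because every user in a cohort visited on the day that defines it; the interesting numbers start at Day 1.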
But That’s Not All
Cohort reports also let you apply segments, for example, the retention of mobile users vs. desktop users. Applying the segment engine to this report enables you to create highly complex groups based not only on their time of visit but on their behavior as well.
There are an infinite number of questions we could ask, and that’s part of the problem with cohort analysis. One way of using the report is the one demonstrated here: focusing on detecting unexplained gaps between groups that, on the surface, should be identical. Sometimes, you do your homework and find out that there are external or internal factors that influence these numbers. Call it A/B testing on steroids.
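As a rough illustration of layering a segment on top of a cohort, the sketch below splits one cohort’s Day 1 retention by device. The user records, field names, and the retention_by_segment helper are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical users from a single cohort, with a device label and a Day 1 flag.
cohort = [
    {"id": "a", "device": "mobile", "returned_day_1": True},
    {"id": "b", "device": "mobile", "returned_day_1": False},
    {"id": "c", "device": "mobile", "returned_day_1": False},
    {"id": "d", "device": "mobile", "returned_day_1": False},
    {"id": "e", "device": "desktop", "returned_day_1": True},
    {"id": "f", "device": "desktop", "returned_day_1": False},
]

def retention_by_segment(users, segment_key, flag_key):
    """Return {segment value: share of its users whose flag is true}."""
    totals = defaultdict(int)
    kept = defaultdict(int)
    for user in users:
        totals[user[segment_key]] += 1
        if user[flag_key]:
            kept[user[segment_key]] += 1
    return {segment: kept[segment] / totals[segment] for segment in totals}

by_device = retention_by_segment(cohort, "device", "returned_day_1")
for device, rate in sorted(by_device.items()):
    print(f"{device}: {rate:.0%} came back on Day 1")
```

The cohort fixes the starting point in time; the segment then asks whether a behavior or attribute changes what happens afterwards.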