The churn or attrition rate measures the percentage of clients who abandon a company’s services in a given period of time.

Understanding this behavior is essential if you want your company’s business strategy to be effective. There are several ways to quantify it, but most of them are a posteriori measures that only describe the past. This approach is fine if we simply want to take stock of our customers. However, the most effective strategy is to learn from the past in order to prevent future customer abandonment; in other words, to find out what we have to change in our sales strategies or products to keep customers from leaving.
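As a quick illustration of the definition above, the churn rate for a period can be computed from two counts. This is only a minimal sketch: the function name and the figures are made up for the example and do not come from the case study.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the churn rate for a period as a percentage."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical figures: 1,000 customers at the start of the month, 50 of them left.
monthly_rate = churn_rate(1000, 50)
print(f"Monthly churn rate: {monthly_rate:.1f}%")
```

A number like this tells us how many customers left, but not which ones will leave next, which is the question the rest of the article addresses.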

In this article we show how Human Trends can help your business predict the customer profiles that are most likely to abandon your services. With this, we will be able to target those weaknesses that make our customers want to abandon or terminate our services.

Why should we know the customer abandonment rate?

No matter the size of your company, knowing your clients is fundamental. Studying abandonment helps you avoid losing them but, above all, it shows which of them will be loyal to you. Policies aimed at customer loyalty will help them stay with you in the long term.

How can we detect customers who are likely to leave the company?

There are several techniques to detect them. Here are some of the techniques we use at Human Trends, which are mainly based on machine learning algorithms.

Within machine learning there are different techniques, but we want to know the reason behind abandonment in order to avoid it. That is why explanatory models are our best asset. They may not reach the accuracy of a neural network, but they allow us to locate the problem and focus on solving it.

Case Study

The following is a case study based on the client portfolio of a telephone company. The data are anonymized, since we only want to focus on the customer profile without falling into biases.

The data set used to train the learning model is a historical record of 7,043 clients, described by 20 variables that characterize the company’s customer profile. It includes profiles of clients who have left the business and of others who have not. From these examples, the model learns patterns that it can generalize to new customer profiles that have not yet left the company. The answer that the algorithm gives is “True” or “False”, where True means that the customer will leave the company. However, due to the nature of the algorithm, this answer comes with probabilities: the answer is True with a certain percentage and False with another, and both percentages add up to one hundred.

First we explore the data to see what kind of variables we have:

Image 1. First variables of the data set.

In image 1 we see some variables of the data set. As we can see, they describe customer profiles. We have variables such as age, sex, tenure, and whether the customer has a phone line, internet, an online security service or information backup services, among others. Image 1 does not show all the variables. The entire data set consists of 20 variables, which are named below:

«gender», «SeniorCitizen», «Partner», «Dependents», «tenure», «PhoneService», «MultipleLines», «InternetService», «OnlineSecurity», «OnlineBackup», «DeviceProtection», «TechSupport», «StreamingTV», «StreamingMovies», «Contract», «PaperlessBilling», «PaymentMethod», «MonthlyCharges», «TotalCharges», «Churn»

Some of them provide more information than others. We seek to predict the last one, “Churn”. This is a binary variable that indicates whether the client has left the service. Since this outcome is known for every historical record, we are dealing with a supervised learning problem.

After adapting the data to our requirements, we look at which variables are better for prediction, that is, which of them are the most influential. For example, we could ask ourselves whether gender influences the churn rate. Plot 1 answers this question: the amount of abandonment is practically the same for both genders, so this variable provides little information for the prediction and we can remove it. Another question is whether tenure is important, and plot 2 indicates that it is. The more time (in months) clients spend with the company, the less they tend to leave, so tenure (permanence) will be an important variable for prediction. We repeat this study for all the variables of the data set to see how each one influences the final result.

Plot 1. Amount of churn by gender.

Plot 2. Churn rate by client permanence (tenure).
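The comparison behind both plots can be sketched as a churn rate computed per category of a variable. The records below are invented toy data, not the real data set, and «tenure_band» is a hypothetical grouping of «tenure» into two ranges used only for this illustration.

```python
from collections import defaultdict


def churn_rate_by(records, key):
    """Return {category: fraction of customers who churned} for one variable."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        if row["Churn"]:
            churned[row[key]] += 1
    return {category: churned[category] / totals[category] for category in totals}


# Toy records (made up): "tenure_band" is a hypothetical binning of tenure in months.
records = [
    {"gender": "Female", "tenure_band": "0-12", "Churn": True},
    {"gender": "Male", "tenure_band": "0-12", "Churn": True},
    {"gender": "Female", "tenure_band": "0-12", "Churn": True},
    {"gender": "Male", "tenure_band": "0-12", "Churn": False},
    {"gender": "Female", "tenure_band": "13+", "Churn": False},
    {"gender": "Male", "tenure_band": "13+", "Churn": True},
    {"gender": "Female", "tenure_band": "13+", "Churn": False},
    {"gender": "Male", "tenure_band": "13+", "Churn": False},
]

by_gender = churn_rate_by(records, "gender")
by_tenure = churn_rate_by(records, "tenure_band")
print("Churn by gender:", by_gender)
print("Churn by tenure band:", by_tenure)
```

In this toy sample the rate is the same for both genders but differs clearly between tenure bands, which is the kind of pattern that would lead us to drop the first variable and keep the second.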

The nature of the variables determines which model we choose. In our case, after a careful analysis of the data set, we see that only four variables are numerical (image 2) and the rest are categorical, that is, each takes one of a fixed set of categories. Therefore, a natural choice in these circumstances is a model based on conditions, such as a decision tree or association rules.

Image 2. Bar plots for the numerical variables of the data set.

In addition, this type of model, being explanatory, allows us to detect the points that make our clients more likely to leave.
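To give an idea of how a condition-based model picks its conditions, here is a minimal sketch of the first step of a decision tree: choosing the categorical variable whose split leaves the purest groups. It assumes the Gini impurity criterion and a multiway split on every category, which is one common choice and not necessarily what our model uses. The rows and category values are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(rows, feature, target="Churn"):
    """Weighted Gini impurity after grouping rows by each category of a feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_feature(rows, features, target="Churn"):
    """Return the feature whose split yields the lowest weighted impurity."""
    return min(features, key=lambda feature: split_impurity(rows, feature, target))


# Toy rows (made up), using two of the variable names from the data set.
rows = [
    {"gender": "Female", "Contract": "Month-to-month", "Churn": True},
    {"gender": "Male", "Contract": "Month-to-month", "Churn": True},
    {"gender": "Female", "Contract": "Month-to-month", "Churn": True},
    {"gender": "Male", "Contract": "Month-to-month", "Churn": False},
    {"gender": "Female", "Contract": "Two year", "Churn": False},
    {"gender": "Male", "Contract": "Two year", "Churn": False},
    {"gender": "Female", "Contract": "Two year", "Churn": False},
    {"gender": "Male", "Contract": "Two year", "Churn": True},
]

for feature in ("gender", "Contract"):
    print(feature, round(split_impurity(rows, feature), 3))
print("Best first split:", best_feature(rows, ["gender", "Contract"]))
```

Because the chosen condition is readable ("what type of contract does the customer have?"), the same step that improves the prediction also points at something the business can act on.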

We train a decision tree model to predict which customer profiles are most likely to leave. After applying the model, we predict with 80% accuracy which clients will or will not leave.
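For reference, accuracy here means the fraction of clients for whom the prediction matches what actually happened. The sketch below shows the calculation on ten invented outcomes; the lists are illustrative and are not the model's real predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual outcomes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical outcomes for ten clients (True means the client left).
y_true = [True, False, False, True, False, False, True, False, False, False]
y_pred = [True, False, False, False, False, False, True, True, False, False]

score = accuracy(y_true, y_pred)
print(f"Accuracy: {score:.0%}")
```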

To show an example case, let’s look at a customer with the following profile:

Image 3. Example of a customer profile.

This is a person who is not a senior citizen (not over 60), who is not a partner in the company, has been with the company for 10 months, has no phone service, pays month-to-month, has internet service and internet security, has no movie or TV service and pays less than $50 per month, among other things. The question we ask the algorithm is: what is the probability that a customer with this profile will leave?

The answer of the algorithm is the following:


So, with the threshold set to 0.5, the answer is that this client will not leave, which is correct.
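The step from a probability to a True/False answer can be sketched as follows. The probability used here is a hypothetical value chosen for illustration, not the model's actual output for this customer, and the function name is our own.

```python
def will_churn(p_churn: float, threshold: float = 0.5) -> bool:
    """Turn a churn probability into a True/False answer using a threshold."""
    if not 0.0 <= p_churn <= 1.0:
        raise ValueError("p_churn must be between 0 and 1")
    return p_churn >= threshold


# Hypothetical probability that the customer leaves.
p_leave = 0.30
p_stay = 1.0 - p_leave  # the two probabilities always add up to 1 (100%)

print(f"Leaves: {p_leave:.0%}, stays: {p_stay:.0%}")
print("Predicted to leave at threshold 0.5:", will_churn(p_leave))
print("Predicted to leave at threshold 0.25:", will_churn(p_leave, threshold=0.25))
```

Lowering the threshold flags more customers as likely to leave, which can be useful when contacting a customer unnecessarily costs less than losing one.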

A simplified view of the model's main decision variables can be seen in the following image:

Image 4. Main decision variables of the model.

As we can see, the variables that determine whether a person will stay are the duration of the contract, whether or not they have Internet, and the amount of money they have paid. This shows us what we should modify if we want to retain more customers.


In short, knowing your customer churn rate is essential for drawing up a sales strategy for your business. Knowing the past helps to anticipate the future. Knowing which customer profiles are most likely to leave will allow us to make personalized offers to the customers who fit those profiles, so that we can retain them and make our business thrive.

If you want to know more about this type of service do not hesitate to contact us.

Agustín Bignu

Data Scientist – Physicist
