Recommendation is something we face every time we go online. Sites like Amazon, Google or Netflix use it to recommend purchases, possible interests or movies you might like based on your past behavior. It is a defining feature of these sites, and one that always draws attention for being invasive: they “get inside” us to the point of recommending products and interests that you did not even know you wanted or had.
At Human Trends we do not share this approach, and we wondered whether it is possible to recommend something without explicitly using your personal data, that is, without making “personalized” recommendations. The answer is yes, and here is how we do it.
Before we start, it is interesting to ask what a recommender system is. This is a simple question to answer from the consumer’s point of view. In fact, the average answer would be: “what Amazon does when you visit their site”. But technically, what is a recommender system?
“A recommendation system is an intelligent system that provides users with a series of personalized suggestions (recommendations) on a certain type of items. Recommendation systems study the characteristics of each user and by processing the data, find a subset of items that may be of interest to the user.”
The previous paragraph is a standard technical definition of a recommendation system in machine learning. As you can see, it uses words like “personalized” and “users”. These are precisely the words that make people uneasy, because they imply the use of personal data and raise privacy concerns. Here we are not questioning the definition but the approach. Let’s see a modification:
“A recommendation system is an intelligent system that provides a series of suggestions (recommendations) on a certain type of elements (items). Recommendation systems study the characteristics of each user profile and by processing the data, find a subset of items that may be of interest to that profile”.
Now let’s look at this definition: the word “personalized” was removed and “users” was changed to “user profile”. This means that we move from individual profiles to group profiles, giving less weight to the person and more to the group. The person’s privacy is respected because we do not need their personal data: we focus on the products and on the data that characterize the profile. At this point the following question arises: does going from individual profiles to group profiles make our system less accurate and therefore make us lose money?
Surprisingly, the answer is no. Not only will you not lose money, but the result is similar to the recommendation you would have made by personalizing. You see, personalizing only gives you a false sense of control because the service is labeled “personalized”. The idea that this makes it more accurate could not be further from the truth.
We will then build a recommendation system for an anonymous user, based simply on the products in their basket. For this we used the Instacart Market Basket data. Instacart is an online platform for buying food, especially organic food.
Let’s start with an exploratory analysis of the data. We have several tables: the products the store carries, departments, aisles and purchase orders. The last one is the one we need to build our system.
Each order includes the following data: order number, user number, day of the week, hour of the day and days since the previous purchase. Let’s explore these data with a few charts.
Figure 1. Number of purchases per hour of the day.
As we can see, there is intense purchasing activity during a large part of the day: since purchases are made online, opening hours are not a constraint. We can also see that shopping peaks at 10 am. This information is relevant to the user profile. Now let’s look at the days of the week.
Figure 2. Number of purchases during the week.
Many calendars start the week on Sunday, so we can assume that days 0 and 1 are Sunday and Monday. Under that assumption, it makes sense that they are by far the busiest days, since people stock up for the whole week. The other hypothesis is that days 0 and 1 are the weekend. Either way this matters, because it is a factor that conditions a recommendation: recommendations are valued more when the buyer spends more time on the page and is not in a hurry because it is a working day. The day of purchase is therefore a factor that can be used to build the user profile.
Another question that should be asked is how often they buy. The following chart can answer this question:
Figure 3. Days without purchase.
As we can see, there are two very characteristic peaks: at 8 days and at 31 days. This means that one large group usually buys weekly and another large group buys monthly, so this is another factor to take into account when building the user profiles.
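The counts behind charts like Figures 1 to 3 can be sketched in a few lines. This is only an illustration: the five sample orders are made up, and the field names (`order_dow`, `order_hour_of_day`, `days_since_prior_order`) are assumed here for the day, hour and days-since-last-purchase variables described above.

```python
from collections import Counter

# Made-up sample orders; the field names are assumptions for illustration.
orders = [
    {"order_id": 1, "order_dow": 0, "order_hour_of_day": 10, "days_since_prior_order": 7},
    {"order_id": 2, "order_dow": 0, "order_hour_of_day": 10, "days_since_prior_order": 30},
    {"order_id": 3, "order_dow": 1, "order_hour_of_day": 15, "days_since_prior_order": 7},
    {"order_id": 4, "order_dow": 3, "order_hour_of_day": 10, "days_since_prior_order": None},
    {"order_id": 5, "order_dow": 1, "order_hour_of_day": 9, "days_since_prior_order": 7},
]


def distribution(rows, field):
    """Count how many orders take each value of `field`, skipping missing values."""
    return Counter(row[field] for row in rows if row[field] is not None)


by_hour = distribution(orders, "order_hour_of_day")      # basis for Figure 1
by_day = distribution(orders, "order_dow")               # basis for Figure 2
by_gap = distribution(orders, "days_since_prior_order")  # basis for Figure 3

# The most frequent hour in the sample.
peak_hour, peak_count = by_hour.most_common(1)[0]
print(peak_hour, peak_count)
```

Note that none of these counts needs the user number: they describe the orders, not the people.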
In this way, we study each of the variables and build a user profile that puts the recommendation in context, focusing not on the individual user but on the user profile that their data represent.
It is also interesting to know which products are purchased most and which are reordered most, since the associations between them are what we need to build the recommendation system. To do this, we look at the best-selling products and the most reordered ones.
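As a rough sketch of what a group profile could look like, the function below maps the context of an order to a coarse profile key. The function name, the bands and the thresholds are all hypothetical choices made for this example, not the ones used in our system.

```python
def profile_key(order_dow, order_hour, days_since_prior):
    """Map an order's context to a coarse group profile (hypothetical bands)."""
    # Days 0 and 1 are the busiest days in the data, so they get their own band.
    day_type = "busy_day" if order_dow in (0, 1) else "quiet_day"

    if order_hour < 12:
        time_band = "morning"
    elif order_hour < 18:
        time_band = "afternoon"
    else:
        time_band = "evening"

    # Threshold chosen only to separate weekly buyers from monthly buyers.
    if days_since_prior is None:
        cadence = "first_order"
    elif days_since_prior <= 10:
        cadence = "weekly"
    else:
        cadence = "monthly"

    return (day_type, time_band, cadence)


print(profile_key(0, 10, 7))
print(profile_key(3, 20, 30))
```

The key contains no user identifier, so many different shoppers end up sharing the same profile.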
Figure 4. Best selling products.
Figure 5. Products that are reordered the most.
The best-seller data is important for steering recommendations toward products that are not as widely sold. The reordered products, in turn, help characterize the user profile: products that a user profile reorders are of interest to us because we can recommend with rules that contain those products.
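A minimal sketch of how rankings like those in Figures 4 and 5 could be computed is shown below. The order lines are made up, and the layout (product name plus a 0/1 reordered flag) is an assumption for the example.

```python
from collections import Counter

# Made-up order lines: (product, reordered flag). Layout assumed for illustration.
order_lines = [
    ("bananas", 1), ("bananas", 1), ("bananas", 0),
    ("milk", 1), ("milk", 1),
    ("strawberries", 0),
    ("yogurt", 0),
]

# How many times each product was sold, and how many of those were reorders.
sold = Counter(product for product, _ in order_lines)
reordered = Counter(product for product, flag in order_lines if flag)

best_sellers = [product for product, _ in sold.most_common(2)]
reorder_rate = {product: reordered[product] / sold[product] for product in sold}

print(best_sellers)
print(reorder_rate)
```

In this toy sample, milk sells less than bananas but is always a reorder, which is the kind of signal that helps characterize a profile.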
Another interesting point to see is which departments sell more products:
Figure 6. Products of the departments that are sold most frequently.
With the data set studied, we generate the rules with a machine learning technique called association rules. This technique was introduced in (link) and focuses on generating rules of the form “X => Y”, where X is a product or a set of products and Y is the recommended product.
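To make the idea of a rule “X => Y” concrete, here is a minimal brute-force sketch restricted to single-product rules. It is not the algorithm we ran on the full data: the baskets are made up, and the thresholds and the function name `pair_rules` are hypothetical. It uses the usual measures for association rules: support (share of baskets containing both X and Y), confidence (share of baskets with X that also contain Y) and lift (confidence divided by the overall share of baskets containing Y).

```python
from collections import Counter
from itertools import combinations

# Made-up baskets for illustration.
baskets = [
    {"bananas", "strawberries", "milk"},
    {"bananas", "strawberries"},
    {"bananas", "milk"},
    {"strawberries", "yogurt"},
    {"bananas", "strawberries", "yogurt"},
]


def pair_rules(baskets, min_support=0.3, min_confidence=0.5):
    """Generate rules X => Y between single products, scored by lift."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        frozenset(pair)
        for basket in baskets
        for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield two rules: x => y and y => x.
        for x in pair:
            (y,) = pair - {x}
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)
                rules.append({
                    "antecedent": x,
                    "consequent": y,
                    "support": support,
                    "confidence": confidence,
                    "lift": lift,
                })
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


rules = pair_rules(baskets)
for rule in rules:
    print(rule["antecedent"], "=>", rule["consequent"], round(rule["lift"], 2))
```

A lift above 1 means the two products appear together more often than their individual popularity would suggest, which is why less popular products can still produce strong rules.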
The algorithm generates a total of 56k rules, i.e. possible recommendations. Let’s see some of them:
As we can see, these recommendations involve products that are not sold frequently, but they are useful to us for making recommendations about products whose sales we want to increase.
With the rules obtained, it remains to relate them properly to the user profile, which is done as follows. For each purchase we have the list of products purchased and the characteristics of the order, which we match against the user profiles we have built. With the data and recommendations associated with each user profile, we create a database and track the products that customers are buying to find the recommendation that best fits the products they are choosing at that moment.
As we have seen, by following the methodology of user profile + association rules we manage to create personalized recommendations for user profiles rather than for individual users, achieving the goal we set at the beginning without interfering with the user’s privacy.
If you want to know how you can implement this in your company or apply it as a separate service, contact us. We will be happy to help you.
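The matching step can be sketched as a simple lookup. Everything here is hypothetical: the profile key, the rules stored for it, the confidence values and the function name `recommend` are invented for the example. The idea is to keep only the rules of the current profile whose left-hand side is already in the basket, and to rank the products they suggest.

```python
# Hypothetical rules per profile: (products required, product suggested, confidence).
rules_by_profile = {
    ("busy_day", "morning", "weekly"): [
        (frozenset({"bananas"}), "milk", 0.5),
        (frozenset({"bananas", "strawberries"}), "yogurt", 0.7),
        (frozenset({"yogurt"}), "granola", 0.9),
    ],
}


def recommend(basket, profile, rules_by_profile, top_n=3):
    """Suggest products for the current basket using only the profile's rules."""
    candidates = {}
    for antecedent, consequent, confidence in rules_by_profile.get(profile, []):
        # A rule applies when all of its products are already in the basket
        # and the suggested product is not.
        if antecedent <= basket and consequent not in basket:
            candidates[consequent] = max(confidence, candidates.get(consequent, 0.0))
    ranked = sorted(candidates.items(), key=lambda pair: pair[1], reverse=True)
    return [product for product, _ in ranked[:top_n]]


basket = {"bananas", "strawberries"}
profile = ("busy_day", "morning", "weekly")
print(recommend(basket, profile, rules_by_profile))
```

The function receives a basket and a profile, never a user identifier, which is the point of the approach.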
Data Scientist – Physicist