Intro to Recommendation Systems

5 min read · Sep 28, 2020

More often than not, consumers do not fully know what they want. A customer may head to the grocery store intending only to restock milk and eggs, but that same customer will likely leave with a lot more. Stores like Trader Joe's or Whole Foods get customers to buy additional items through strategic placement, such as shelving wine next to the cheese section. This is a simple example of a recommendation system, built on the observation that people tend to enjoy wine with cheese. Below, we will explore recommendation systems in the digital world.

Why are recommendation systems so valuable?

As mentioned above, many customers and app users do not know exactly what they want. Navigating a complex website can be overwhelming, especially ones with many nooks and crannies like Amazon or Netflix. If an algorithm could predict with some confidence what a user might enjoy, both the company selling the product and the user discovering it would be happier.

Recommendation systems improve the overall user experience of a website, and also result in more satisfied customers and more sales. A win-win-win. Below, I will cover some of the most important concepts for understanding the inner workings of these techniques.

How do recommender systems work?

Understanding Relationships

User-Product: This is based on an understanding of a given user and the types of products they typically buy. A user who interacts primarily with laptops, hard drives, and monitors will likely be recommended similar tech gadgets.
Product-Product: This is based on items that are similar or complementary in nature. This could be books of the same genre or guitars from the same manufacturer.
User-User: The final relationship seeks to understand customers who have similar tastes to one another. An example would be Facebook recommending something based on mutual friends or age.

What type of data do we need to analyze these relationships?

User Behavior Data: This can be thought of as “hard” data acquired from users directly interacting with a product. For instance, a rating given by a user or the purchase history of a user.
User Demographic Data: Many websites these days have some form of user profile, which holds information such as age, location, education, income, etc.
Product Attribute Data: This is data that directly details the product itself, such as genre in the case of music or storage in the case of USBs.

Providing data for recommender systems

Explicit Ratings: Provided directly by a user. Examples include ratings, likes, follows, reviews, etc. However, users don’t always leave reviews or ratings, so depending on the context, this kind of data can be hard to collect.
Implicit Ratings: These can be acquired automatically from users interacting with webpages. For instance, most e-commerce companies likely keep track of the pages users are viewing, what items they are clicking, and what items they are purchasing.
Item-Item Filtering: Otherwise known as product similarity, this is useful when we do not know much about the user yet. Instead, we make recommendations based solely on how similar one item is to another.
User-User Filtering: Otherwise known as user similarity, this looks at how similar User A is to User B and uses that knowledge to make a recommendation. It's similar to knowing a lot about your sibling and making a recommendation based on that understanding. One thing to note here is that companies that are just starting up face what is known as the Cold Start Problem: this approach needs a large user base to be effective, so it may take some time for this aspect of recommendations to become powerful.
Similarity Measures: An understanding of mathematics comes into play here. Depending on the context of the problem and the data, a variety of similarity measures can be applicable. Oftentimes we are interested in the “distance” between data points, where items that are closer together are more similar. For numerical data we can look at the Minkowski Distance (where r=1 gives the Manhattan Distance and r=2 gives the Euclidean Distance). For categorical items, we can use the Hamming Distance, which counts the attributes on which two items differ. Other times, metrics such as Cosine Similarity or Pearson’s Coefficient will be more appropriate. Again, it depends entirely on the context. At the bottom of this blog post I will link some resources detailing these various concepts.
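To make the measures named above a little more concrete, here is a minimal sketch of each in plain Python. The function names and the sample vectors are my own choices for illustration, not part of any particular library, and a real system would more likely lean on a numerical package.

```python
import math

def minkowski(a, b, r):
    """Minkowski distance of order r between two equal-length numeric vectors."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1 / r)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson's coefficient: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Illustrative vectors (made up for this example)
p, q = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
manhattan = minkowski(p, q, 1)  # |1-4| + |2-6| + |3-3| = 7
euclidean = minkowski(p, q, 2)  # sqrt(9 + 16 + 0) = 5
```

Note that the distances get smaller as items become more alike, while Cosine Similarity and Pearson’s Coefficient get larger, so the two families need to be ranked in opposite directions.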

Types of recommender systems

Content Based

Content-based systems use knowledge of each product's attributes to recommend new products that resemble ones the user already likes. These work well when descriptive data is available ahead of time, much like supervised learning classification systems.
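As a rough sketch of the idea, the snippet below describes each item by a small attribute vector, builds a user profile from the items the user liked, and ranks the remaining items by cosine similarity to that profile. The catalog, the genre attributes, and the function names are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical catalog: each book is described by genre flags (1 = has the genre).
# Attribute order: [fantasy, romance, mystery, science]
ITEMS = {
    "Book A": [1, 0, 0, 0],
    "Book B": [1, 0, 1, 0],
    "Book C": [0, 1, 0, 0],
    "Book D": [0, 0, 1, 1],
}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_profile(liked):
    """Sum the attribute vectors of the items the user liked."""
    vectors = [ITEMS[name] for name in liked]
    return [sum(column) for column in zip(*vectors)]

def recommend(liked, top_n=2):
    """Rank items the user has not seen by similarity to their profile."""
    profile = build_profile(liked)
    candidates = [name for name in ITEMS if name not in liked]
    ranked = sorted(candidates, key=lambda n: cosine(profile, ITEMS[n]), reverse=True)
    return ranked[:top_n]

recommendations = recommend(["Book A"])
```

Here a user who liked only "Book A" gets "Book B" ranked first, since it is the only other item sharing the fantasy attribute. No other user's data was needed, which is why this approach can work before a user base exists.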

Collaborative Filtering

These systems do not look at the products themselves, but instead at the data from the users. For example, imagine two people, Eve and Dan, and four books, Book 1 through Book 4. Eve has read all four books and rated them 5/5, 3/5, 1/5, and 5/5. Dan has read the first three books and rated them 4.6/5, 3/5, and 1.5/5. We can see a clear similarity in the preferences of Dan and Eve. Because Eve rated the last book 5/5, a collaborative filtering algorithm would recommend this last book to Dan.
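The Eve and Dan example can be worked through in a few lines. This is one possible sketch, assuming Pearson's Coefficient over the books both users rated as the similarity measure and a mean-centered, similarity-weighted prediction; the function names are my own, and other similarity measures or prediction rules would be equally valid choices.

```python
import math

# Ratings from the example above; Dan has not rated Book 4 yet.
RATINGS = {
    "Eve": {"Book 1": 5.0, "Book 2": 3.0, "Book 3": 1.0, "Book 4": 5.0},
    "Dan": {"Book 1": 4.6, "Book 2": 3.0, "Book 3": 1.5},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson_on_shared(a, b):
    """Pearson correlation computed only over items both users rated."""
    shared = [item for item in a if item in b]
    if len(shared) < 2:
        return 0.0
    mean_a = mean(a[i] for i in shared)
    mean_b = mean(b[i] for i in shared)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def predict(target, item, ratings):
    """Target's own mean plus the similarity-weighted deviation of each
    positively correlated neighbor's rating from that neighbor's mean."""
    target_mean = mean(ratings[target].values())
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        sim = pearson_on_shared(ratings[target], other_ratings)
        if sim <= 0:
            continue  # ignore users with no positive correlation
        weighted_sum += sim * (other_ratings[item] - mean(other_ratings.values()))
        weight_total += sim
    if weight_total == 0:
        return None  # nobody similar has rated this item
    return target_mean + weighted_sum / weight_total

similarity = pearson_on_shared(RATINGS["Dan"], RATINGS["Eve"])
predicted = predict("Dan", "Book 4", RATINGS)
```

With these numbers the similarity between Dan and Eve comes out very close to 1, and the predicted rating for Book 4 is roughly 4.5 out of 5, which is well above Dan's average and so a natural recommendation.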

Hybrid

As the name suggests, this uses both of the above recommendation approaches. Generally, a hybrid model will rely heavily on content-based recommendation while a large user base is being built up (this is because collaborative filtering suffers from the Cold Start Problem mentioned earlier in the article). Once enough data from users is gathered, collaborative filtering will start having an effect.
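One simple way to picture this shift is a weighted blend whose weight moves from the content-based score toward the collaborative score as ratings accumulate. The formula below, including the constant k, is purely an assumption for illustration; real hybrid systems combine the two signals in many different ways.

```python
def collaborative_weight(num_ratings, k=50):
    """Weight given to the collaborative score: 0 with no ratings,
    approaching 1 as ratings accumulate. k is a hypothetical tuning
    constant: the number of ratings at which both scores count equally."""
    return num_ratings / (num_ratings + k)

def hybrid_score(content_score, collab_score, num_ratings, k=50):
    """Blend the two scores, leaning on content-based while data is scarce."""
    w = collaborative_weight(num_ratings, k)
    return (1 - w) * content_score + w * collab_score

# Same item scores, different amounts of user data (illustrative values)
cold_start = hybrid_score(0.8, 0.2, num_ratings=0)       # content-based only
balanced = hybrid_score(0.8, 0.2, num_ratings=50)        # equal weighting
data_rich = hybrid_score(0.8, 0.2, num_ratings=5000)     # mostly collaborative
```

With no ratings the blended score equals the content-based score, and as the rating count grows it drifts toward the collaborative score.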

Further reading…

Minkowski Distance

This is the generalized metric distance. When r = 1 it becomes city block distance and when r = 2, it becomes Euclidean distance…

Glossary

The Euclidean distance between two points in either the plane or 3-dimensional space measures the length of a segment…

What is Manhattan Distance?

Answer (1 of 3): The Manhattan distance between two vectors (or points) a and b is defined as \sum_i |a_i - b_i| over…

Cosine Similarity - Understanding the math and how it works? (with python)

Cosine similarity is a metric used to measure how similar the documents are irrespective of their size. Mathematically…

Correlation Coefficient: Simple Definition, Formula, Easy Calculation Steps

Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There…

Written by Brianmccabe