How do I create recommender systems using LensKit?


Recommender systems are one of the main tools for attracting customers across many kinds of markets. A good recommendation increases customer engagement and therefore has a positive impact on the business. Developing a recommender system from scratch, however, can be complex. LensKit is a toolkit that makes building a good recommender system much easier. In this article, we will discuss the LensKit toolkit for building recommender systems. The main points covered in this article are listed below.

Contents

  1. What is LensKit?
  2. Build a recommender system
    1. Loading the dataset
    2. Importing components
    3. Instantiating the algorithms
    4. Defining the evaluation function
    5. Generating recommendations
    6. Evaluating the recommendations

Let’s start by understanding what LensKit is.

What is LensKit?

LensKit is a library that includes a variety of tools for creating and experimenting with recommender systems. It is the Python successor to the Java LensKit toolkit. Using this Python library, we can train, run and evaluate recommendation algorithms. One of the main goals behind the library is to provide a flexible platform for research in the area of recommender systems.

LensKit has a variety of components and interfaces that can be used in the design and implementation of new algorithms. Its item-scoring tools form the basic building block of any recommender system built with it: using them, we can score items or select the best recommendations.

It also has facilities for predicting ratings. Rating predictions are scores expressed on whatever rating scale is in use, representing the ratings a user is expected to give. Using the toolkit's ItemRecommender interface, we can produce our top recommendations. The image below can be seen as the workflow diagram of the various components of this toolkit.

In the workflow diagram, we can see that the RatingPredictor and the ItemRecommender generate their respective outputs using the ItemScorer.
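To make these roles concrete, here is a minimal pure-pandas sketch of the scorer/recommender split (an illustrative toy, not LensKit's actual implementation): a "scorer" that rates every item by its mean rating, and a "recommender" that ranks the highest-scored items the user has not yet rated.

```python
import pandas as pd

# Toy ratings frame in the column layout LensKit expects
ratings = pd.DataFrame({
    'user':   [1, 1, 2, 2, 3],
    'item':   [10, 20, 10, 30, 20],
    'rating': [4.0, 3.0, 5.0, 2.0, 4.0],
})

# "Item scorer": score every item by its mean rating
scores = ratings.groupby('item')['rating'].mean()

# "Item recommender": rank unseen items for a user by score
def recommend(user, n=2):
    seen = set(ratings.loc[ratings.user == user, 'item'])
    candidates = scores.drop(labels=seen, errors='ignore')
    return candidates.sort_values(ascending=False).head(n)

print(recommend(3))  # user 3 has only rated item 20, so items 10 and 30 remain
```

Real LensKit scorers are of course far more sophisticated, but the division of labor is the same: scoring produces per-item values, and recommending turns those values into a ranked list.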

We can install this library in our environment using pip and the line of code below.

%pip install lenskit

Or we can install the development version directly from GitHub:

pip install git+https://github.com/LensKit/lkpy

After installing it, we are ready to use it. Let’s see how we can do this.

Build a recommender system

In this article, we will use the LensKit toolkit and evaluate the results with nDCG. nDCG stands for Normalized Discounted Cumulative Gain, a measure of ranking quality; with it, we can measure the effectiveness of a recommendation algorithm. The toolkit works directly with pandas data frames and also provides modules for fetching some standard datasets for practicing recommender systems. One condition we need to meet is that the data must use the column names LensKit expects. For example, rating data is expected to contain `user`, `item`, `rating` and optionally `timestamp` columns.

The data can contain additional columns as well.
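As a refresher on the metric itself: DCG discounts each item's relevance by the logarithm of its rank, and nDCG divides by the ideal (best-possible) DCG so the score lands in [0, 1]. A short NumPy sketch of the idea (illustrative only; LensKit's `topn.ndcg` handles this bookkeeping for you):

```python
import numpy as np

def dcg(relevance):
    """Discounted cumulative gain for a ranked list of relevance values."""
    relevance = np.asarray(relevance, dtype=float)
    ranks = np.arange(1, len(relevance) + 1)
    return float(np.sum(relevance / np.log2(ranks + 1)))

def ndcg(relevance):
    """Normalize DCG by the best achievable ordering (ideal DCG)."""
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0

# A perfectly ordered list scores 1.0; misordering lowers the score
print(ndcg([3, 2, 1]))  # 1.0
print(ndcg([1, 2, 3]))  # less than 1.0
```

Because of the normalization, nDCG values are comparable across users with different numbers of relevant items, which is why it is a convenient metric for top-N recommendation.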

In one of our previous articles, we saw how the Surprise library works. To check LensKit's compatibility in this article, we will load the data using the Surprise toolkit, and the further work will be done using the LensKit toolkit.

Loading the dataset

Let’s load a dataset

import surprise
import pandas as pd

# Download (if needed) and locate the MovieLens 100K dataset
data = surprise.Dataset.load_builtin('ml-100k')
ddir = surprise.get_dataset_dir()

# Read the raw ratings file into the column layout LensKit expects
r_cols = ['user', 'item', 'rating', 'timestamp']
ratings = pd.read_csv(f'{ddir}/ml-100k/ml-100k/u.data', sep='\t', names=r_cols,
                      encoding='latin-1')

Output:

Here we can see the format of our data, which matches the expected rating dataset format with user, item, rating and timestamp columns. Let’s move on to the next steps.
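Since LensKit keys on these column names, it can be worth asserting the layout up front before handing a frame to any algorithm. A simple defensive check (our own convenience helper, not part of LensKit):

```python
import pandas as pd

def check_ratings_frame(df):
    """Verify the frame has the columns LensKit's algorithms expect."""
    required = {'user', 'item', 'rating'}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"ratings frame is missing columns: {sorted(missing)}")
    return df

# Passes silently on a well-formed frame
ratings = pd.DataFrame({'user': [1], 'item': [10], 'rating': [4.0]})
check_ratings_frame(ratings)
```

A check like this turns a confusing downstream failure inside an algorithm into an immediate, readable error.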

Importing components

from lenskit import batch, topn, util
from lenskit import crossfold as xf
from lenskit.algorithms import Recommender, als, item_knn as knn
%matplotlib inline

Instantiating the algorithms

# Item-item k-NN collaborative filtering with a neighborhood size of 20
algo_ii = knn.ItemItem(20)
# Biased matrix factorization trained with ALS, using 50 latent features
algo_als = als.BiasedMF(50)

Defining the evaluation function

After defining the algorithms, we are ready to generate recommendations and measure them. With this toolkit, we can also evaluate recommendations as they are generated, to save memory. Here we will first generate the recommendations and then evaluate them.

The function below generates recommendations in batch: it takes an algorithm together with one train/test split and returns the top-100 recommendations for every test user.

def eval(aname, algo, train, test):
    # Clone so repeated fits do not share state across folds
    fittable = util.clone(algo)
    # Ensure the algorithm exposes the Recommender interface
    fittable = Recommender.adapt(fittable)
    fittable.fit(train)
    users = test.user.unique()
    # Top-100 recommendations for every test user
    recs = batch.recommend(fittable, users, 100)
    recs['Algorithm'] = aname
    return recs

Generating recommendations

With this function defined, we can generate the recommendations by looping over the cross-validation folds and the algorithms.

all_recs = []
test_data = []
# 5-fold cross-validation: in each fold, 20% of every user's ratings are held out
for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], 5, xf.SampleFrac(0.2)):
    test_data.append(test)
    all_recs.append(eval('ItemItem', algo_ii, train, test))
    all_recs.append(eval('ALS', algo_als, train, test))

Output:

This output looks like a typical recommender-training run, including warnings about runtime caused by the large rating matrices.

Evaluating the recommendations

We are now ready to see the results. Before displaying them, we concatenate all the recommendations into a single data frame.

all_recs = pd.concat(all_recs, ignore_index=True)
all_recs.head()

Output:

In the output, we can see the scores of our items along with their ranks and the algorithm used to generate them.
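To spot-check such output, we can slice the concatenated frame with ordinary pandas. The snippet below uses a tiny hand-made stand-in for `all_recs` (the column names mirror the output described above: item, score, user, rank, plus the Algorithm tag we added); the real frame comes from batch.recommend.

```python
import pandas as pd

# Tiny stand-in for all_recs; real values come from batch.recommend
all_recs = pd.DataFrame({
    'item':      [10, 30, 10, 20],
    'score':     [4.5, 2.0, 4.1, 3.9],
    'user':      [3, 3, 3, 3],
    'rank':      [1, 2, 1, 2],
    'Algorithm': ['ItemItem', 'ItemItem', 'ALS', 'ALS'],
})

# Top-ranked recommendation per algorithm for user 3
top = (all_recs[all_recs.user == 3]
       .sort_values('rank')
       .groupby('Algorithm', as_index=False)
       .first())
print(top[['Algorithm', 'item', 'score']])
```

This kind of per-user slicing is a quick sanity check that the two algorithms are producing sensibly ranked, distinct recommendation lists before any formal evaluation.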

For better analysis, we can also concatenate all test data into a single dataframe.

test_data = pd.concat(test_data, ignore_index=True)
test_data.head()

Output:

This toolkit provides an analysis module for generated recommendations named RecListAnalysis. It takes care of properly aligning the recommendations with the test data. Let’s see how we can use it to compute the nDCG.

rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg)
results = rla.compute(all_recs, test_data)
results.head()

Output:

Here in the output, we can see the nDCG values in data frame format, which can be aggregated in different ways. Let’s see which algorithm has the higher average nDCG.

results.groupby('Algorithm').ndcg.mean()

Output:

Let’s visualize the evaluation:

results.groupby('Algorithm').ndcg.mean().plot.bar()

Output:

Here we have our results. We can see that the ALS (alternating least squares) algorithm achieves the higher nDCG values.

Final words

In this article, we discussed some of the important details of the LensKit toolkit, which is designed for building and exploring recommender systems. Along the way, we implemented a process in which we used two algorithms and compared their nDCG values on the MovieLens 100K rating dataset.
