Sugestio: recommendations as a service for Drupal

Making your website more personal has long been a wish for many developers. If you can predict what your visitors will like, you could give this content a more prominent position and potentially increase participation (clicks, sales, uploads, comments, ...) on your website.

The power behind the generation of personalized suggestions is the Recommender System, something seemingly magical that comes in many flavors and has many uses. The "magic" is that it's very difficult to accurately anticipate the wishes of your visitors. However, it is not impossible.

While we all know the success of the Amazon product recommendations, few of us have real experience with integrating a recommendation engine into our own websites. There are several obvious reasons for this:

  • In order to generate trustworthy recommendations, you need enough content and enough users who have consumed that content. Small-scale sites are better off recommending the most popular items and creating lists manually.
  • Recommendation engines use specialized algorithms to calculate recommendations. These algorithms can put a very heavy load on your servers. The complexity involved and the server cost often outweigh the benefits of a recommendation engine.
  • The way recommendations are calculated is often a mystery. Do they really work, and how do they really work? Nobody likes implementing something that contains too much mystery. The algorithms must be used wisely and require some expertise.

Recommendation Algorithms
There are several families of recommendation algorithms. They all have their specific application domain, pros and cons.
Among the best-known algorithms are the "Collaborative Filters". These collaborative filtering algorithms try to predict your taste by looking at people who consumed or rated similar content.
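
To make the idea concrete, here is a minimal sketch of a user-based collaborative filter. It assumes explicit ratings on a 1 to 5 scale and uses plain cosine similarity over the items two users have both rated, which is only one of many possible similarity measures. The data, the node paths and the function names are made up for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"node/1": 5, "node/2": 3, "node/3": 4},
    "bob": {"node/1": 5, "node/2": 3, "node/4": 5},
    "carol": {"node/1": 1, "node/2": 5, "node/4": 1},
}


def cosine(a, b):
    """Cosine similarity between two users over the items both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of the other users' ratings for item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine(ratings[user], their)
        num += weight * their[item]
        den += weight
    # None means nobody comparable has rated this item yet.
    return num / den if den else None


if __name__ == "__main__":
    print(predict("alice", "node/4", ratings))
```

Alice rated the shared items exactly like Bob and quite differently from Carol, so her predicted rating for node/4 leans towards Bob's rating. Note that every prediction loops over all users, which hints at why these calculations get expensive on larger sites.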

Other "Content Based" algorithms look at the content itself and try to match content with other content or with users by looking at similar text patterns and/or keywords. Solr does a great job of relating your nodes based on their content!
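
As a rough illustration of content-based matching, the sketch below compares documents by the cosine similarity of their raw word counts. This is a simplification and not how Solr works internally; the tokenizer, the sample texts and the function names are assumptions for the example.

```python
import re
from collections import Counter
from math import sqrt


def term_vector(text):
    """Lowercased word counts; a crude stand-in for a real text analyzer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def most_similar(target_id, docs):
    """Rank all other documents by similarity to the target, best first."""
    target = term_vector(docs[target_id])
    scored = [
        (cosine(target, term_vector(text)), doc_id)
        for doc_id, text in docs.items()
        if doc_id != target_id
    ]
    return sorted(scored, reverse=True)


# Hypothetical node bodies.
docs = {
    "node/1": "Drupal module for content recommendations",
    "node/2": "Recommendations module for Drupal sites",
    "node/3": "Baking sourdough bread at home",
}

if __name__ == "__main__":
    print(most_similar("node/1", docs))
```

A real setup would at least remove stop words and down-weight terms that occur in almost every document.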

There are many more systems, such as those based on statistics (for example Bayesian classifiers) and those based on neural networks.

However, there are a number of tools around that abstract away this complexity and allow you to create recommendations on your website.

Recommendations in Drupal
Several modules have been created that try to give some sense of "personalization", "discovery" or "suggestion" on your site.

An attempt to make an overview has been made in the past (see this overview of Content Recommendation Modules).

The mentioned modules use several techniques that help during the "discovery", "suggestion" and in some cases "prediction" process. The article itself somewhat blurs the distinction between "real" recommendation systems and modules that "hardcode" relations between content.

By "real" recommendation engines, I mean self-learning algorithms that try to predict the user's wishes based on past behavior.

A quick tour of the mentioned modules gives us several categories:

A: Similarities by comparing terms and content

  • The Similar by Terms module tries to match content by looking at other content that has similar taxonomy terms.
  • Relevant Content takes a similar approach and relates content that has similar terms.
  • Similarity Objects looks like the most advanced in this family, and allows advanced calculations of similarities based on taxonomy terms and the search index. However, the maintainer is aware of scalability problems:
    "This module is in the middle of a rewrite to address the architecture and scalability problems of the first attempt."
  • Apache Solr also allows you to create a "More like this" block. Solr mainly counts term frequencies in your documents to compare documents (see the JavaDoc for the details). Update: Solr provides a very scalable solution and is probably the best option in this category.

For a more detailed overview of these modules, look here.
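
The general idea behind this category can be sketched in a few lines: score two nodes by how much their sets of taxonomy terms overlap. The sketch uses the Jaccard index as the overlap measure, which is an assumption for illustration and not necessarily what any of the modules above implement. The node data is hypothetical.

```python
def jaccard(a, b):
    """Size of the shared term set divided by the size of the combined set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical nodes and their taxonomy terms.
node_terms = {
    "node/1": {"drupal", "modules", "recommendations"},
    "node/2": {"drupal", "recommendations", "solr"},
    "node/3": {"cooking", "bread"},
}


def similar_by_terms(node_id, node_terms, limit=5):
    """Return (score, node) pairs for nodes sharing at least one term."""
    base = node_terms[node_id]
    scored = [
        (jaccard(base, terms), other)
        for other, terms in node_terms.items()
        if other != node_id
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    # Highest score first; ties are broken by node id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:limit]


if __name__ == "__main__":
    print(similar_by_terms("node/1", node_terms))
```

No user behavior is involved here: the suggestions are the same for every visitor and only as good as the tagging.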

B: Similarities based on manually defined relationships
An example is Node Hierarchy.

The CCK nodereference field is also mentioned as a way to suggest content. And indeed, sometimes this is the best way of creating suggestions.

C: Similarities based on user behavior
In this category, two main modules are available:

  • Content Recommendation Engine has been around for several years and has gone through different stability states. It is based on the well-known Slope One algorithm. Slope One is a simple, rating-based form of collaborative filtering that reduces calculation times.
  • Recommender API provides a range of features based on several algorithms (read http://mrzhou.cms.si.umich.edu/recommender). The currently implemented algorithms are also collaborative filtering algorithms. However, a new roadmap has been created to offload the calculations to external frameworks like Apache Mahout. The effort is planned to take 2+ years!

These two modules are the most interesting ones when it comes to analyzing user behavior and predicting future behavior. However, both suffer strongly from scalability issues and do not provide a sufficient solution for them.
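
For the curious, here is a minimal sketch of the Weighted Slope One scheme. It is written from the published description of the algorithm, not taken from the Content Recommendation Engine module, and the toy ratings and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 2},
    "bob": {"a": 3, "b": 4},
    "carol": {"b": 2, "c": 5},
}


def build_deviations(ratings):
    """Average rating difference dev[i][j] = mean(r_i - r_j), plus pair counts."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    sums[i][j] += r_i - r_j
                    counts[i][j] += 1
    dev = {i: {j: sums[i][j] / counts[i][j] for j in sums[i]} for i in sums}
    return dev, counts


def predict_slope_one(user_ratings, item, dev, counts):
    """Weighted Slope One: each rated item votes (its rating + deviation),
    weighted by how many users rated both items."""
    num = den = 0.0
    for j, r_j in user_ratings.items():
        if j == item or j not in dev.get(item, {}):
            continue
        n = counts[item][j]
        num += (dev[item][j] + r_j) * n
        den += n
    return num / den if den else None


if __name__ == "__main__":
    dev, counts = build_deviations(ratings)
    print(predict_slope_one(ratings["carol"], "a", dev, counts))
```

The deviation table can be precomputed, which is what keeps prediction cheap. The table itself still grows with the square of the number of items, so it does not make the scalability problem disappear.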

Sugestio.com: Recommendations as a service

As already mentioned, one of the main problems in generating recommendations is the computational power needed to calculate similarities between users and items in order to generate predictions. (That said, getting enough data is the first problem to overcome.)

One solution is to offload your calculations to a recommendation service. (A parallel can be drawn with the anti-spam measures delivered by, for example, Mollom.)

For the past few years I have collaborated on Sugestio, a recommendation service developed at Ghent University. While its original conception was driven by research, as a way to quickly calculate recommendations and test new algorithms, the service is now in "beta" and ready for the public (just fill in the contact form for an invitation).

The main goal of Sugestio is to simplify the integration of a recommendation service into your website by providing a very simple API. The API basically lets you send user behavior and receive suggestions for these users. Sugestio comes with several developer libraries (PHP, Java and Python) on GitHub. A reference implementation for Drupal is also provided. This reference implementation tracks ratings given by the users (using VotingAPI) and uses these ratings to calculate similarities between content.
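
The "send behavior, receive suggestions" pattern can be sketched as follows. To be clear, this is not the Sugestio API: the interface, the method names and the in-memory stand-in are all hypothetical and only illustrate how a site could keep its own code independent of where the calculations happen. For the real calls, use the official libraries.

```python
from collections import Counter
from typing import Protocol


class RecommendationService(Protocol):
    """Hypothetical interface; NOT the actual Sugestio API."""

    def record_rating(self, user_id: str, item_id: str, rating: float) -> None: ...

    def recommend(self, user_id: str, limit: int = 5) -> list: ...


class InMemoryService:
    """Local stand-in for development: suggests the most-rated unseen items."""

    def __init__(self):
        self._seen = {}
        self._popularity = Counter()

    def record_rating(self, user_id, item_id, rating):
        # This stand-in only counts ratings and ignores the rating value.
        self._seen.setdefault(user_id, set()).add(item_id)
        self._popularity[item_id] += 1

    def recommend(self, user_id, limit=5):
        seen = self._seen.get(user_id, set())
        ranked = sorted(self._popularity.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked if item not in seen][:limit]


def on_vote(service: RecommendationService, user_id, item_id, rating):
    """What a site would do when a visitor rates something: forward the event."""
    service.record_rating(user_id, item_id, rating)


if __name__ == "__main__":
    service = InMemoryService()
    on_vote(service, "u1", "node/1", 5)
    on_vote(service, "u1", "node/2", 4)
    on_vote(service, "u2", "node/1", 3)
    print(service.recommend("u2"))
```

With this split, the website only forwards events and renders whatever comes back, and the heavy lifting can move to an external service without touching the rest of the site.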

For further reading on using Sugestio in your projects, take a look at the Libraries & Tutorials.