
2009-09-23

YouTube: Users only rate videos they like very much or hate very much

YouTube published a chart on its official blog showing the distribution of user ratings from 1 to 5 stars. The overwhelming majority of ratings are 5 stars, the second most common rating is 1 star, and 2, 3, and 4 stars are rare. This suggests that users only bother to rate a video when they either love it or hate it.

This example got me thinking about recommender systems. The phenomenon YouTube describes does not seem to show up on IMDb or Netflix. I think the reason is that on IMDb users cannot watch the videos on the site; they log in after watching a movie elsewhere, so for a user who logs in to IMDb, rating is likely the very task he came to do.

YouTube is different. A user's primary task on YouTube is not rating but watching videos, so as long as the videos meet his needs he just keeps watching and never thinks about rating. Only when some video provokes him strongly enough that he feels the need to express himself does he rate it.


YouTube published a post on its blog saying that most YouTube users tend to give 5 stars or 1 star to the videos they watch. Their explanation is that most users only rate videos they like very much or hate very much.

This result is different from the ratings on IMDb and Netflix. I think the first task users want to perform on IMDb is to rate a movie, because they cannot watch movies on IMDb. On YouTube, however, the most important task is watching videos. So as long as the videos are OK, users will keep watching without stopping to rate. They will only rate a video if it is so special that they feel they must express their views.

2009-03-04

Upcoming conferences on recommender systems and collaborative filtering

ACM Recommender Systems 2009
Important Dates:
Long and short papers due: May 8
Author notification: June 19 (6 weeks)
2nd round short papers: June 26
Notification 2nd round: July 17
Camera-ready copy: August 17

Web Intelligence 2009

IMPORTANT DATES
Workshop proposal submission: January 15, 2009
Electronic submission of full papers: April 10, 2009
Tutorial proposal submission: April 10, 2009
Workshop paper submission: April 30, 2009
Notification of paper acceptance: June 3, 2009
Camera-ready copies of accepted papers: June 30, 2009
Workshops: September 15, 2009
Conference: September 15-18, 2009

2009-02-24

My progress on the Netflix Prize



The Netflix Prize is a collaborative filtering competition whose goal is to produce better recommender systems. Last week I tested my algorithm on its data set; since I have only been at it for a short while, the results are not yet very good. Below I describe my method and some known results.

I am currently using an SVD approach, chosen because it is fast and its memory footprint is modest (around 3 GB). As for kNN, my algorithm for computing the similarity matrix is still quite slow (some say this step can be made very fast), so I tried SVD first.

My current model is the simplest SVD model:

r(u,i) = mean + b(u) + b(i) + <p(u), q(i)>

optimized by gradient descent.

During training, note that the probe set is contained in the training set. To estimate generalization error we must train on train minus probe; but when predicting the quiz set, we should train on the full training set again, otherwise accuracy suffers noticeably.
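For concreteness, here is a minimal sketch of this biased SVD model trained with stochastic gradient descent, using the d, learning-rate, and regularization values from the entry below. The data loading and the train/probe split are assumed; this is an illustration, not the exact code behind the numbers below.

    import numpy as np

    def train_svd(ratings, n_users, n_items, d=250, lr=0.0055, reg=0.002, epochs=30):
        """ratings: list of (user, item, rating) triples with 0-based ids."""
        mean = sum(r for _, _, r in ratings) / len(ratings)
        bu = np.zeros(n_users)                    # user bias b(u)
        bi = np.zeros(n_items)                    # item bias b(i)
        p = 0.01 * np.random.randn(n_users, d)    # user factors p(u)
        q = 0.01 * np.random.randn(n_items, d)    # item factors q(i)
        for _ in range(epochs):
            for u, i, r in ratings:
                err = r - (mean + bu[u] + bi[i] + p[u].dot(q[i]))
                bu[u] += lr * (err - reg * bu[u])
                bi[i] += lr * (err - reg * bi[i])
                pu = p[u].copy()
                p[u] += lr * (err * q[i] - reg * p[u])
                q[i] += lr * (err * pu - reg * q[i])
        return mean, bu, bi, p, q

    def rmse(model, triples):
        mean, bu, bi, p, q = model
        err = [r - (mean + bu[u] + bi[i] + p[u].dot(q[i])) for u, i, r in triples]
        return float(np.sqrt(np.mean(np.square(err))))

    # Generalization error: fit on train-minus-probe, score rmse(model, probe).
    # For quiz predictions, refit on the full training set as noted above.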

2009/02/24: d = 250, learning rate 0.0055, regularization 0.002, RMSE = 0.904

2009/02/25
Today I used a new method to compute the correlation coefficients for the item-based algorithm; it takes only 3 hours (including the time to read the files).
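The entry above does not spell out the method, so I won't guess at it here; but for reference, one standard way to make item-item similarity fast is to express it as a single sparse matrix product over the user-item matrix. A sketch, assuming SciPy and cosine similarity (both my choices for illustration):

    import numpy as np
    from scipy.sparse import csr_matrix

    def item_cosine(users, items, vals, n_users, n_items):
        """users/items/vals: parallel arrays of user id, item id, rating."""
        R = csr_matrix((vals, (users, items)), shape=(n_users, n_items))
        sim = (R.T @ R).toarray()               # co-rating dot products
        norms = np.sqrt(sim.diagonal()).copy()  # per-item L2 norms
        norms[norms == 0] = 1.0                 # guard items with no ratings
        return sim / np.outer(norms, norms)     # cosine similarity matrix

At Netflix scale (17,770 items), the dense similarity matrix occupies roughly 2.5 GB in float64.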

Detailed progress reports are at http://xlnetflixprize.blogspot.com/

2009-02-16

Main challenges facing recommender systems and collaborative filtering

Data sparsity
The accuracy of collaborative filtering depends mainly on the amount of user data. A system with rich user histories can predict user preferences better, which is why the best recommender systems today belong to companies with huge amounts of user data: Google, Yahoo, Netflix, Amazon, and so on. Yet even for them the data is never enough, because recommender systems have not been around long enough to accumulate it. Among current algorithms for handling sparse data, soft SVD is one of the best.

The new-user problem
This problem is related to data sparsity: how do we make recommendations to a new user? When a new user arrives, we know nothing about his interests, and deciding what to recommend at that moment is an important problem. The usual answer is to recommend items that are generally well received, which means the recommendation is based entirely on the items, as the sketch below illustrates.
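As an illustration of that item-based fallback, here is a minimal sketch that ranks items by a damped average rating, so an item with three 5-star ratings does not outrank a solid item with thousands. The damping constant m is an assumption of mine, not something from standard practice in any particular system.

    from collections import defaultdict

    def popular_items(ratings, m=25, k=10):
        """ratings: iterable of (user, item, rating); returns top-k item ids."""
        sums, counts = defaultdict(float), defaultdict(int)
        total, n = 0.0, 0
        for _, item, r in ratings:
            sums[item] += r
            counts[item] += 1
            total += r
            n += 1
        mean = total / n                # global mean rating
        # Shrink each item's average toward the global mean by m pseudo-ratings.
        score = {i: (sums[i] + m * mean) / (counts[i] + m) for i in sums}
        return sorted(score, key=score.get, reverse=True)[:k]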

A variant of the new-user problem is the long-tail problem. On Amazon, not every user rates many books; many rate only a handful, and these users sit in a long tail. How to handle users who reveal little about their interests is also a major problem for recommender systems.

Discovering implicit preferences
In today's recommender systems, user preferences are learned from explicit ratings of items, a very direct kind of signal. On the real web, however, users reveal their preferences in many implicit ways: written reviews, from which interests can be extracted with natural language processing, or browsing behavior, such as viewing an item for a long time, viewing it repeatedly, or purchasing it. All of these behaviors can serve as features for a pattern-recognition system.

Discovering implicit preferences, the counterpart of feature extraction in pattern recognition, is therefore also a hot research area.

Drifting user interests
User interests are not fixed forever; behavior changes with age and experience. In other words, collaborative filtering really ought to include a time factor. Research on drifting interests is still in its infancy, mainly because existing systems do not have long histories and most users' interests are fairly stable, but as the web matures, interest drift will affect recommender systems more and more visibly, and this research will only grow in importance.
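One simple way to introduce such a time factor, shown only as an illustration (the research referred to above goes much further), is to down-weight old ratings exponentially when computing similarities or averages:

    def time_weight(rating_ts, now_ts, half_life_days=180.0):
        """Weight in (0, 1]; a rating half_life_days old counts half as much.
        Timestamps are in seconds; half_life_days is an assumed tuning knob."""
        age_days = (now_ts - rating_ts) / 86400.0
        return 0.5 ** (age_days / half_life_days)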

Contrarian users and entirely novel items
Some users are contrarians: their opinions run against those of the majority. Existing recommender systems usually predict very poorly for such users, and handling them is an important problem.

The counterpart of the contrarian user is the entirely novel item. Take a disruptive new movie that resembles nothing before it: a user's reaction to it has little to do with his past interests, because he has never seen anything like it. This problem is another major reason the accuracy of current recommender systems remains limited.

The Matthew effect, and how recommender systems shape the web
Items that get recommended become ever more popular, so many excellent items can end up buried by the recommender: the web holds far too many items, and a recommender can surface only a few. The main remedy is to increase the diversity of recommendations. If a system learns that a user loves Dove chocolate, its 10 recommendations need not all be Dove chocolate; it can include other chocolates, or desserts similar to chocolate. A recommender should not only serve what the user already likes but also lead him to like new things; sometimes a user does not know what he likes, and through recommendations he may discover something new that suits him.
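The chocolate example can be made concrete with a greedy re-ranking pass in the style of maximal marginal relevance: pick the next item by trading predicted preference against similarity to items already picked. The weight lam and the sim() function are placeholders for illustration.

    def diversify(candidates, scores, sim, k=10, lam=0.7):
        """candidates: item ids; scores: dict id -> predicted preference;
        sim(a, b): similarity in [0, 1]. Returns k diverse recommendations."""
        picked, pool = [], list(candidates)
        while pool and len(picked) < k:
            best = max(pool, key=lambda i: lam * scores[i] -
                       (1 - lam) * max((sim(i, j) for j in picked), default=0.0))
            picked.append(best)
            pool.remove(best)
        return picked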

Cheating in recommender systems
Wherever money is at stake, someone will cheat. Search engine spam has been studied for a long time, because a higher ranking brings more revenue. The same holds for recommender systems: on Taobao, for example, a seller whose items are frequently recommended can profit handsomely. As a result, many e-commerce recommender systems are plagued by manipulation, such as sellers using technical tricks to give their own items very high ratings.

Such cheating is becoming more serious on e-commerce sites, especially in countries with a mature Internet like the US, and has begun to attract researchers' attention. Cheating amounts to injecting noise into the system, and current countermeasures are mostly based on trust and reputation; many e-commerce sites, Taobao among them, have introduced reputation systems. How to integrate reputation systems with recommender systems well is an important research problem.

2009-02-09

Recommendation Systems: An Interview with Satnam Alag

In a recent post, we looked at recommendation systems, briefly reviewing how Amazon and Google have implemented their own systems for recommending products and content to their users.

We had the opportunity to speak with Satnam Alag, author of the recently published Collective Intelligence in Action, about what makes for a good recommendation system, where the technology is heading, and why Netflix is finding it so hard to improve its own system.

Disclosure: I wrote the foreword to 'Collective Intelligence in Action', however I have absolutely no financial interest in the book.

ReadWriteWeb: In our recent post about Netflix, we identified four main approaches to recommendations: Personalized recommendation: based on prior behavior of the user; Social recommendation: based on prior behavior of similar users; Item recommendation: based on the item itself; And a combination of all three. Do you agree with the four approaches we laid out in our article?

Satnam: Those four categories are pretty comprehensive. I present an alternate classification of recommendation systems in my book, laying out two fundamental approaches. The first, item-based analysis, determines items that are related to a particular item; when a user likes a particular item, related ones are recommended. The second, user-based analysis, first determines users who are similar to a given user and then recommends items those users have liked.

Further, there are two main approaches to finding similar items. In the first, content-based analysis, content associated with the item, especially text, is used to compute similarity. In the second, the collaborative approach, actions such as ratings, bookmarking, and so forth are used to find similar items. For user-based analysis, a number of approaches have been taken, including ones based on profile information, user actions, and lists of the user's friends or contacts. Of course, you can combine any of these item/user and content/collaborative approaches to build a recommendation system.
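As a small illustration of the content-based analysis just described (not code from the book), item similarity can be computed from item text with TF-IDF vectors and cosine similarity; the toy descriptions and the use of scikit-learn are placeholders for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    descriptions = [
        "romantic comedy set in new york",
        "space opera with galactic battles",
        "documentary about life in the deep sea",
    ]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    sim = cosine_similarity(tfidf)   # sim[i, j]: content similarity of items i, j
    print(sim.round(2))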

The dimensions of the particular item and user space are helpful in deciding whether to use an item-based or user-based approach. Typically, an item-based approach is used to bootstrap one's application when the number of users is small. As the user base grows, the item-based approach is augmented by a user-based approach.

ReadWriteWeb: Other than Amazon and Netflix, which Internet companies have most impressed you in their implementation of recommendation systems?

Satnam: Other than Amazon and Netflix, Google News' personalization is my personal favorite. Google News is a good example of building a scalable recommendation system for a large number of users (several million unique visitors per month) and a large number of items (several million new stories every two months), with constant item churn. This is different from Amazon's, whose rate of item churn is much lower. Google decided to use collaborative filtering for its recommendation system mainly because of its access to the data of its large user base and because this same approach could be applied to other applications, countries, and languages. A content-based recommendation system perhaps could have worked just as well, but may have required language- or location-specific tweaking. Google also wanted to leverage the same collaborative filtering technology to be able to recommend images, videos, and music, for which it's more difficult to analyze the underlying content.

Among start-ups, my personal favorite is the one we are developing at my current company, NextBio. It's not available yet but should be next month. The key point about this particular recommendation engine is its strong use of an ontology, similar in concept to tags, to develop a common vocabulary for items and users. The system then makes use of profile information and user interactions, both short- and long-term, to provide recommendations. The system leverages both item- and user-based approaches.

ReadWriteWeb: What commercial opportunities do you foresee with recommendation systems over the next few years?

Satnam: A good personalized recommendation system can mean the difference between a successful and a failed website. Given that most applications now invite users to interact and to leverage user-generated content, new content is being generated at a phenomenal rate. Showing the right content to the right user at the right time is key to creating a sticky application. I would be surprised if most successful websites did not leverage recommendation systems to provide personalized experiences to their users.

ReadWriteWeb: Your book includes a discussion of collaborative filtering. Can you tell us a bit about how this fits into the overall picture of recommendation systems?

Satnam: In recent years, an increasing amount of user interaction has provided applications with a large amount of information that can be converted into intelligence. This interaction may be in the form of ratings, blog entries, item tagging, user connections, or shared items of interest. This has led to the problem of information overload. What we need is a system that can recommend items based on the user's interests and interactions. This is where personalization and recommendation engines come in.

In my book, I take a holistic view of adding intelligence to one's application, a recommendation engine being one way to do it. The book focuses on both content-based and collaborative approaches to building recommendation systems. It focuses on capturing relevant information about the user, information from both within and outside one's application, and converting it into recommendations. One of the things you mentioned in your write-up on recommendation systems is that you would like to apply such a system to your website to recommend things to users. Someone reading my book should be able to create such a system using the techniques I demonstrate.


ReadWriteWeb: Netflix is offering $1 million to the team that can improve its recommendation algorithm by 10%. It's been over 2 years now, with the leading company at 9.63%. There is some skepticism, though, that 10% will be reached anytime soon, because now the contestants are making only incremental progress. Do you expect the 10% mark to be reached soon?

Satnam: Netflix's recommendation engine, Cinematch, uses an item-to-item algorithm (similar to Amazon's) with a number of heuristics. Given that Netflix's recommendation system has been very successful in the real world, it is pretty impressive that teams have been able to improve on it by as much as 9.63%. Of course, the Netflix competition doesn't take into account speed of implementation or the scalability of the approach. It simply focuses on the quality of recommendations in terms of closing the gap between user rating and predicted rating. So, it isn't clear whether Netflix will be able to leverage all of the innovation coming out of this competition. Also, the Netflix data doesn't contain much information to allow for a content-based approach; it's for this reason that teams are focusing on collaborative techniques.

The challenges to reaching the 10% mark are:

Skewed data: The data set for the competition consists of more than 100 million anonymous movie ratings, using a scale of one to five stars, made by 480,000 users for 17,770 movies. Note that the user-item data set for this problem is sparsely populated, with nearly 99% of user-item entries being zero. The distribution of movies per user is skewed. The median number of ratings per user is 93. About 10% of users rated 16 or fewer movies, while 25% of users rated 36 or fewer. Two users rated as many as 17,000 movies. Similarly, the ratings per movie are also skewed: almost half the user base rated one popular movie (Miss Congeniality); about 25% of movies had 190 or fewer ratings; and a handful of movies were rated fewer than 10 times.

The approach: The winning team, BellKor, spent more than 2,000 combined hours poring over data to find the winning solution. The winning solution was a linear combination of 107 sets of predictions. Many of the algorithms involved either the nearest-neighbor method (k-NN) or latent factor models, such as SVD/factorization and Restricted Boltzmann Machines (RBMs).

The winning solution uses k-NN to predict the rating for a user, using both the Pearson-r correlation and cosine methods to compute the similarities, with corrections to remove item-specific and user-specific biases. Latent semantic models are also widely used in the winning solution.
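A minimal sketch of that k-NN prediction step (a paraphrase of the general technique, not BellKor's code): predict a user's rating for a target item from the k most similar items the user has rated, after removing per-item mean biases. Precomputed Pearson similarities are assumed, and k is in the 20-50 range per the interview.

    def knn_predict(user_ratings, item_means, sims, target, k=30):
        """user_ratings: dict item -> this user's rating;
        item_means: dict item -> mean rating (item bias to remove);
        sims: dict (a, b) -> similarity of items a and b."""
        neighbors = sorted(
            ((sims.get((target, i), 0.0), i, r) for i, r in user_ratings.items()),
            reverse=True)[:k]
        num = sum(s * (r - item_means[i]) for s, i, r in neighbors if s > 0)
        den = sum(s for s, i, r in neighbors if s > 0)
        return item_means[target] + (num / den if den > 0 else 0.0)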

The BellKor team found it important to use a variety of models that compensated for each other's shortcomings. No one model alone could have gotten the BellKor team to the top of the competition. The combined set of models achieved an improvement of 8.43% over Cinematch, while the best model -- a hybrid of k-NN applied to output from RBMs -- improved the result by 6.43%. The biggest improvement by LSI methods was 5.1%, with the best pure k-NN model scoring below that. (K for the k-NN methods was in the range of 20 to 50.) The BellKor team also applied a number of heuristics to further improve the results.

The BellKor team demonstrates a number of guidelines for building a winning solution to this kind of competition:

  • Combining complementary models helps improve the overall solution. Note that a linear combination of three models, one each for k-NN, LSI, and RBM, would have yielded fairly good results, an improvement of 7.58%.
  • A principled approach is needed to optimize the solution.
  • The key to winning is building models that can accurately predict when there is sufficient data, without over-applying in the absence of adequate data.

The final solution will be along the same lines, combining multiple models with heuristics. Contestants will probably reach the magic 10% mark in the next year or two.
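For a sense of how such a linear combination works, here is a toy sketch that fits blend weights by least squares on held-out (probe) predictions; the real 107-predictor blend was of course far more elaborate.

    import numpy as np

    def fit_blend(preds, truth):
        """preds: (n_ratings, n_models) matrix of each model's probe predictions;
        truth: the true probe ratings. Returns per-model blend weights."""
        w, *_ = np.linalg.lstsq(preds, truth, rcond=None)
        return w

    # Final prediction for new data: stack the models' outputs and apply weights,
    #   final = new_preds @ w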

ReadWriteWeb: Some people think the 10% mark can't be reached with algorithms alone, but that the "human" element is required. For example, ClerkDogs is a service that hires actual former video-store clerks to "create a database that is much richer and deeper than the collaborative filtering engines." It's a similar approach to that of Pandora, which has 50 employees who listen to and tag songs. How far do you think algorithms can go in making recommendations?

Satnam: Recommendation systems are not perfect. A number of elements go into making successful ones, including approach, the speed of computing results, heuristics, the exploration and exploitation of coefficients, and so on. But it has been shown in the real world that the more personalized you can make recommendations, the higher the click-through rate, the stickier the application, and the lower the bounce rate.

Using humans to form a rich database for recommendations may work for small applications, but it would probably be too expensive to scale. I don't see them competing against each other, human versus machine. Even with human/expert recommendations, one first needs to find a human/expert with tastes similar to those of the user, especially if you want to go after the long tail.