2009-08-05

Periodic Effects in User Interest: An Experiment with World War II Movies

Periodic effects matter in recommender systems: every year, on certain days, certain items become popular. I ran an experiment on the Netflix dataset to find out on which days of the year World War II movies become popular.

I define the popularity of a movie on a given day as the number of ratings that movie received on that day divided by the total number of ratings on that day. I selected WWII movies such as Schindler's List, Pearl Harbor, and Saving Private Ryan; the full movie list is at

http://en.wikipedia.org/wiki/List_of_World_War_II_films
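The popularity definition above can be sketched as follows. This is a minimal sketch with made-up data: `daily_popularity` is a hypothetical helper, and the rating events are toy tuples, not the actual Netflix log format.

```python
from collections import Counter

def daily_popularity(ratings, movie_ids):
    """Popularity of a movie set on each calendar day (month-day):
    ratings of those movies that day / all ratings that day."""
    total = Counter()   # day -> total number of ratings
    hits = Counter()    # day -> ratings of the target movies
    targets = set(movie_ids)
    for day, movie in ratings:
        total[day] += 1
        if movie in targets:
            hits[day] += 1
    return {day: hits[day] / total[day] for day in total}

# Toy example: suppose the WWII movies are ids {1, 2}.
events = [("09-02", 1), ("09-02", 3), ("09-02", 2), ("09-02", 1),
          ("06-05", 2), ("06-05", 3)]
pop = daily_popularity(events, [1, 2])
# pop["09-02"] is 3/4, pop["06-05"] is 1/2
```

Normalizing by the day's total rating count is what makes days comparable: raw rating counts would mostly reflect how active the whole site was on that day.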

I computed the popularity of these WWII movies on each day of the year and found that the two most popular days are September 2 and June 5.
This is a striking result: September 2 is the day Japan surrendered, the day WWII victory is commemorated, and June 5 is the eve of the Allied Normandy landings (D-Day itself was June 6). It shows that user interest really is periodic: every year on these days, users are more inclined to watch WWII movies.

The result is also statistically significant, especially for September 2, which is markedly more popular than any other day. Here are the top days by WWII-movie popularity (month-day, popularity):

9-02 0.0096554
6-05 0.0070638
6-18 0.0068133
6-12 0.0067752
6-03 0.0067715
6-19 0.006768
8-20 0.0067643
6-04 0.0067132
6-11 0.0067011
6-09 0.0066593
7-30 0.0066514
6-02 0.0065948
6-17 0.0065529
8-14 0.0065498

2009-08-04

Thoughts on Top-K vs. Prediction RMSE

Many people are confused about the difference between Top-K evaluation and RMSE evaluation. I think the two metrics actually address two different problems.

When designing a real recommender system, we cannot compute a user's predicted rating for every movie, sort them all, and take the top K. In BellKor's papers, when they evaluate the prediction problem with Top-K, they randomly select 1000 movies, rank them by predicted rating, and take the top K.

In a real system, we first use binary data to find a candidate set; this step is the Top-K process (it needs no ratings, only the 0-1 relation matrix). We then predict the user's ratings for the movies in the candidate set and sort the candidates by predicted rating. So Top-K and the Netflix task are not the same problem; they are two different problems within a recommender system, and it is natural to evaluate them with different metrics.

In the Netflix Prize, I do not need to do Top-K because the candidate set is already given: it is the quiz set. In a real system, we first do Top-K and then rank the K candidate items by predicted rating.

A simple example: Top-K finds the movies a user is most likely to watch, ranked by the probability of watching; rating prediction finds, among the movies the user may watch, the ones the user will actually like, because users sometimes also rate movies they dislike. Combining the two, we find the movies a user is most likely to watch and to like after watching.
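The two-stage process described above can be sketched as follows. This is a toy sketch under my own assumptions, not the author's actual system: stage 1 scores unseen items by simple co-occurrence in the binary watch matrix, and `predict_rating` stands in for any rating model.

```python
from collections import Counter

def candidate_set(user_items, all_user_items, k):
    """Stage 1 (Top-K): score unseen items by how often they co-occur
    with the user's watched items in the binary 0-1 data; keep top K."""
    scores = Counter()
    for other in all_user_items:
        if other & user_items:               # shares at least one item
            for item in other - user_items:  # items the user hasn't seen
                scores[item] += 1
    return [item for item, _ in scores.most_common(k)]

def recommend(user_items, all_user_items, predict_rating, k):
    """Stage 2: rank the K candidates by predicted rating."""
    cands = candidate_set(user_items, all_user_items, k)
    return sorted(cands, key=predict_rating, reverse=True)

# Toy data: each user is a set of watched item ids.
histories = [{1, 2, 3}, {2, 3, 4}, {1, 4, 5}]
recs = recommend({1, 2}, histories,
                 predict_rating=lambda i: {3: 4.5, 4: 3.0, 5: 2.0}.get(i, 0.0),
                 k=2)
# recs == [3, 4]: item 3 and 4 survive stage 1; item 3 ranks first by rating
```

Note that stage 1 never looks at rating values, only the 0-1 relation, exactly as the paragraph above argues.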

Online Discussion Group for Recommender Systems (Google Groups, China)

I have set up a discussion group for recommender systems on Google Groups; everyone is welcome to join:

https://groups.google.com/group/resys

2009-08-03

Crowd Wisdom in Recommender Systems

I think the most important thing in designing a recommender system is not to find a single algorithm that produces the most accurate predictions and recommendations. No such algorithm exists. Users' preferences differ widely, and different types of users show different patterns, so no single model can fit everyone's habits.

If such a model existed, governments would have a very easy time managing people. The main task of a recommender system is therefore to design different recommendation algorithms and combine them in some way: classify users, then find a good combination of algorithms for each class of users. Only this approach can produce a high-accuracy recommender system.

The Netflix Prize and the GitHub Contest solve different problems: the former is a prediction problem, the latter a recommendation problem. Essentially none of the Netflix algorithms carry over to the GitHub Contest (except KNN), but the idea of model combination applies everywhere. In Netflix we combine models with regression; in GitHub we can combine models with bagging plus stochastic optimization (simulated annealing and genetic algorithms are well-known examples).
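The regression-style blend mentioned above can be sketched like this. The numbers are made up, and the plain least-squares solve is only a stand-in for whatever regression a team actually used on its held-out set.

```python
import numpy as np

# Hypothetical held-out predictions from two models, one per column,
# and the true ratings for the same (user, item) pairs.
P = np.array([[3.2, 3.0],
              [4.1, 4.4],
              [2.0, 2.5],
              [4.8, 4.6]])
y = np.array([3.0, 4.5, 2.2, 4.7])

# Linear blend: weights minimizing squared error on the held-out set.
w, *_ = np.linalg.lstsq(P, y, rcond=None)
blend = P @ w

rmse_blend = np.sqrt(np.mean((blend - y) ** 2))
rmse_best = min(np.sqrt(np.mean((P[:, j] - y) ** 2)) for j in range(2))
# The blend is never worse than the best single model on the fitting set,
# since using one model alone is itself a valid weight vector.
```

In practice the weights are fit on a probe/validation set and then applied to the test set, so the guarantee above holds only on the data the regression was fit on.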

2009-08-02

My Story in the Netflix Prize

I am a student from China. I graduated from the University of Science and Technology of China (USTC), and I now study at the Institute of Automation, Chinese Academy of Sciences (CASIA).

The way I chose recommender systems as my research field was quite accidental. I used to do research on search engines, and last year I wrote a paper on finding similar web pages on the Internet and submitted it to WWW09. It was rejected, and the reviewers told me the problem had already been studied by many researchers in recommender systems. That was when I learned of the existence of collaborative filtering and recommender systems.

After my paper was rejected, I talked with another member of our lab, and we found a good conference on recommender systems: the ACM Conference on Recommender Systems. After reading a few papers, I decided recommender systems were a good research area, so I downloaded all the papers from the ACM RecSys conference. This was in February 2009, during the Spring Festival in China; I took all the papers home and read them over the holiday. While reading, I found two datasets frequently used in recommender systems and collaborative filtering: MovieLens and Netflix.

I tried MovieLens first because it is small. After some experiments on MovieLens, I wanted to try the Netflix data, so I entered the Netflix Prize (2009-02-21).

I tried SVD first in the Netflix Prize because I thought my computer could not hold the item-item matrix in memory. I found I was wrong, and after SVD I used KNN. I read all the papers from BPC, GPT, and the other teams on the leaderboard. The last algorithm I implemented was RBM, because the RBM paper is hard to understand. My friend Wang Yuantao from Tsinghua University helped me implement it; he studies math at Tsinghua, so the formulas in the RBM paper were easy for him. After implementing RBM, I broke 0.87 in the Netflix Prize. Meanwhile, I wrote a paper about my time-dependent model, and it was accepted by WI09.

On June 26, when I woke up, I received a message from Wang telling me someone had broken the 10% line. I was very surprised: at the time PT was in first place (0.858x), and I had thought they would need at least half a year to break the 10% line. I turned on my computer and found that PT had merged with BellKor and BigChaos. At the same time, I received two emails, from Greg and Larry (Dace), who wanted to combine their results with mine, and I agreed. I then joined VI, and Larry joined VI at the same time. Everyone in VI knows the rest of the story, so I will not write it here.

2009-07-31

GitHub Contest


I am working on the GitHub contest with Daniel Haran. Unlike the Netflix Prize, the GitHub contest is a top-K recommendation task, which I think is the other important task in recommender systems.

Let's take a movie recommender system as an example. When we design one, we face two problems:
1) Given a user, find which movies he/she will watch; that is, find a candidate set of movies.
2) Within the candidate set, find which movies the user will like after watching.

I think the first task is the top-K recommendation task (GitHub) and the second is the prediction task (Netflix Prize).

Solving these two tasks is the foundation of designing a good recommender system.

2009-07-28

Netflix Competitors Learn the Power of Teamwork

Published: July 27, 2009

A contest set up by Netflix, which offered a $1 million prize to anyone who could significantly improve its movie recommendation system, ended on Sunday with two teams in a virtual dead heat, and no winner to be declared until September.

But the contest, which began in October 2006, has already produced an impressive legacy. It has shaped careers, spawned at least one start-up company and inspired research papers. It has also changed conventional wisdom about the best way to build the automated systems that increasingly help people make online choices about movies, books, clothing, restaurants, news and other goods and services.

These so-called recommendation engines are computing models that predict what a person might enjoy based on statistical scoring of that person’s stated preferences, past consumption patterns and similar choices made by many others — all made possible by the ease of data collection and tracking on the Web.

“The Netflix prize contest will be looked at for years by people studying how to do predictive modeling,” said Chris Volinsky, a scientist at AT&T Research and a leader of one of the two highest-ranked teams in the competition.

The biggest lesson learned, according to members of the two top teams, was the power of collaboration. It was not a single insight, algorithm or concept that allowed both teams to surpass the goal Netflix, the movie rental company, set nearly three years ago: to improve the movie recommendations made by its internal software by at least 10 percent, as measured by predicted versus actual one-through-five-star ratings by customers.

Instead, they say, the formula for success was to bring together people with complementary skills and combine different methods of problem-solving. This became increasingly apparent as the contest evolved. Mr. Volinsky’s team, BellKor’s Pragmatic Chaos, was the longtime front-runner and the first to surpass the 10 percent hurdle. It is actually a seven-person collection of other teams, and its members are statisticians, machine learning experts and computer engineers from the United States, Austria, Canada and Israel.

When BellKor’s announced last month that it had passed the 10 percent threshold, it set off a 30-day race, under contest rules, for other teams to try to best it. That led to another round of team-merging by BellKor’s leading rivals, who assembled a global consortium of about 30 members, appropriately called the Ensemble.

Submissions came fast and furious in the last few weeks from BellKor’s and the Ensemble. Just minutes before the contest deadline on Sunday, the Ensemble’s latest entry edged ahead of BellKor’s on the public Web leader board — by one-hundredth of a percentage point.

“The contest was almost a race to agglomerate as many teams as possible,” said David Weiss, a Ph.D. candidate in computer science at the University of Pennsylvania and a member of the Ensemble. “The surprise was that the collaborative approach works so well, that trying all the algorithms, coding them up and putting them together far exceeded our expectations.”

The contestants evolved, it seems, along with the contest. When the Netflix competition began, Mr. Weiss was one of three seniors at Princeton University, including David Lin and Lester Mackey, who made up a team called Dinosaur Planet. Mr. Lin, a math major, went on to become a derivatives trader on Wall Street.

But Mr. Mackey is a Ph.D. candidate at the Statistical Artificial Intelligence Lab at the University of California, Berkeley. “My interests now have been influenced by working on the Netflix prize contest,” he said.

Software recommendation systems, Mr. Mackey said, will increasingly become common tools to help people find useful information and products amid the explosion of information and offerings competing for their attention on the Web. “A lot of these techniques will propagate across the Internet,” he predicted.

That is certainly the hope of Domonkos Tikk, a Hungarian computer scientist and a member of the Ensemble. Mr. Tikk, 39, and three younger colleagues started working on the contest shortly after it began, and in 2007 they teamed up with the Princeton group. “When we entered the Netflix competition, we had no experience in collaborative filtering,” Mr. Tikk said.

Yet based on what they learned, Mr. Tikk and his colleagues founded a start-up, Gravity, which is developing recommendation systems for commercial clients, including e-commerce Web sites and a European cellphone company.

Though the Ensemble team nudged ahead of BellKor’s on the public leader board, it is not necessarily the winner. BellKor’s, according to Mr. Volinsky, remains in first place, and Netflix contacted it on Sunday to say so.

And in an online forum, another member of the BellKor’s team, Yehuda Koren, a researcher for Yahoo in Israel, said his team had “a better test score than the Ensemble,” despite what the rival team submitted for the leader board.

So is BellKor’s the winner? Certainly not yet, according to a Netflix spokesman, Steve Swasey. “There is no winner,” he said.

A winner, Mr. Swasey said, will probably not be announced until sometime in September at an event hosted by Reed Hastings, Netflix’s chief executive. The movie rental company is not holding off for maximum public relations effect, Mr. Swasey said, but because the winner has not yet been determined.

The Web leader board, he explained, is based on what the teams submit. Next, Netflix’s in-house researchers and outside experts have to validate the teams’ submissions, poring over the submitted code, design documents and other materials. “This is really complex stuff,” Mr. Swasey said.

In Hungary, Mr. Tikk did not sound optimistic. “We didn’t get any notification from Netflix,” he said in a phone interview. “So I think the chances that we won are very slight. It was a nice try.”

2009-07-27

Some Thoughts on the Next Generation of Recommender Systems

First, congratulations to our team, "The Ensemble", for taking first place on the leaderboard. The final winner has not yet been officially announced, but achieving this in the last 30 days is still a good result. Thanks to my teammates; without their help, I would never have learned how a 0.855x score is achieved.

Now that the competition is over, I have been reflecting on existing recommender systems. In China, the one I have always admired most is Douban's, because its design is professional rather than relying only on simple methods; reportedly they use some of the industry's advanced algorithms. I have used Douban for a long time, but its recommender still has problems. Of course, these are not Douban's problems specifically but hard problems in recommender systems generally.

1. We know that pure collaborative filtering is not enough in a real system; we need content information, but we usually use content only to compute simple similarities. For example, for books we have the author, publisher, title, and tags, and we typically use these to measure book similarity and recommend similar books. In my research, however, I found that users rely on different attributes to different degrees. Some users trust publishers: when I buy computer books, I only buy from a few well-known publishers and do not trust the quality of the others. Sometimes the author matters: for C++, I generally only buy books by the top experts. Douban's recommender has not learned these preferences of mine (at least not fully): it has learned that I like C++ books, but not my requirements on authors and publishers. This is partly because I have not provided much preference data, and partly because the system probably does not learn over these features at all.
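The per-attribute reliance idea above can be sketched crudely like this. This is only an illustration under my own assumptions: the book records and the concentration heuristic are hypothetical, not anything Douban actually does.

```python
from collections import Counter

def attribute_reliance(liked_books):
    """Crude per-user signal: for each attribute, how concentrated are
    the user's liked books on a single value? High concentration hints
    that the user relies on that attribute (e.g. one trusted publisher)."""
    reliance = {}
    for attr in liked_books[0]:
        values = Counter(book[attr] for book in liked_books)
        reliance[attr] = values.most_common(1)[0][1] / len(liked_books)
    return reliance

# Hypothetical purchase history: three C++ books, all from one publisher.
books = [
    {"author": "Stroustrup", "publisher": "AW", "tag": "c++"},
    {"author": "Meyers",     "publisher": "AW", "tag": "c++"},
    {"author": "Sutter",     "publisher": "AW", "tag": "c++"},
]
r = attribute_reliance(books)
# r["publisher"] == 1.0 (always AW), r["author"] == 1/3 (all different)
```

A real system would feed such attribute weights into the similarity function instead of weighting every attribute equally.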

2. User reviews and natural language. When shopping on Taobao, we often read previous buyers' reviews to decide what to do, so the next generation of recommender systems should exploit reviews and forum comments. This involves sentiment analysis in natural language understanding; perfect accuracy is hard, but existing techniques are already good enough to use reviews to improve recommendation accuracy. However large the improvement, there will be one.

3. Massive data. Consider web page recommendation, which is essentially personalized search. This differs from movie recommendation: in a movie recommender, the number of movies is always smaller than the number of users, but in personalized search the number of users is far smaller than the number of pages. In this setting, I think clustering is the most effective tool. We can hardly learn a user's preference for each specific page, as the computation is too large, but we can first cluster the pages and then learn the user's attitude toward each cluster (the clusters can be based on content, interface, domain name, or anything else; there are many ways to cluster). For systems that also have many users, such as Google, we can cluster the users as well and learn the preference of a user cluster for a page cluster. This can serve as a baseline when designing large systems.
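The page-clustering idea above reduces to learning preferences over clusters instead of over individual pages. A minimal sketch, assuming the clustering itself has already been done and `page_cluster` is a hypothetical page-to-cluster map:

```python
from collections import Counter

def cluster_preferences(clicks, page_cluster):
    """Learn a user's preference over page clusters rather than over
    individual pages: the fraction of clicks falling in each cluster."""
    counts = Counter(page_cluster[p] for p in clicks)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Toy data: four pages in two clusters, and one user's click history.
page_cluster = {"a": "news", "b": "news", "c": "tech", "d": "tech"}
prefs = cluster_preferences(["a", "c", "d", "c"], page_cluster)
# prefs == {"news": 0.25, "tech": 0.75}
```

The model size is now (users x clusters) instead of (users x pages), which is what makes the approach feasible at web scale.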

So I believe the three problems future recommender systems need to solve are:
1) how to incorporate content features;
2) how to understand users' natural language;
3) how to handle massive data.

2009-07-17

Ruby

Recently I have been designing a recommender system with a team, and they recommended Ruby to me. I had heard of the language before from QQP, but I felt I had already learned too many languages (C++, Java, Python, PHP), so I did not pick up Ruby at the time. Since this project needs it, I bought a book yesterday and read through it.

I learned the syntax quickly and used it to implement the SVD++ model. In speed it is of course no match for C++, but it is about the same as Python, and it has many libraries, which makes lots of other tasks easy. They also recommended JRuby to me, saying its speed is comparable to C++; I hope that is true.
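For reference, the core of an SVD-style model like the one mentioned above is plain matrix factorization trained with SGD. This is a minimal sketch in Python, not the author's Ruby code, and it omits the implicit-feedback terms that distinguish SVD++ from basic MF; the toy ratings are made up.

```python
import random

def train_mf(ratings, n_users, n_items, f=8, lr=0.05, reg=0.01, epochs=500):
    """Basic matrix factorization with SGD: r_ui is approximated by the
    dot product p_u . q_i (SVD++ additionally folds in implicit feedback,
    which is omitted here)."""
    random.seed(0)
    p = [[random.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[random.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(p[u][k] * q[i][k] for k in range(f))
            err = r - pred
            for k in range(f):  # gradient step with L2 regularization
                pu, qi = p[u][k], q[i][k]
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q

# Toy rating triples (user, item, rating).
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
p, q = train_mf(ratings, n_users=2, n_items=2)
```

With four ratings and rank 8 the model fits the training data almost exactly; the point of the sketch is only the shape of the SGD update.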

I think a programmer today should master one language, be familiar with two or three, and know the syntax of five or six. There have been so many new languages lately.

P.S. Only 10 days are left in the Netflix Prize, so I have to hurry; there is still hope! Lately I have been working on user clustering, and it looks promising.

The Tribulations of This Blog

This blog is accessible one moment and blocked the next. In between I have tried domestic blog services, but their interfaces never suited me: too much like pages for post-90s kids. I prefer something clean that I can control myself. So even though this blog is inaccessible in China, I am too lazy to move; I believe it will be accessible again someday.

Besides, most people who read my blog know how to get around the firewall, so there is no real need to move. Yesterday I learned how to get around it myself, so I can update again. Great!

I can understand some sites being blocked, but blogspot should not be blocked wholesale; at least blogs like mine that just talk about technology contain nothing pornographic or subversive. Sigh...

2009-07-16

Writing a Blog Is Really Not Easy

Fewer and fewer sites are accessible now. Sigh!!