2009-03-04
最近有关recommender system和collaborative filtering的会议
Important Dates:
Long and short papers due: May 8
Author notification: June 19 (6 weeks)
2nd round short papers: June 26
Notification 2nd round: July 17
Camera-ready copy: August 17
Web Intelligence 2009
IMPORTANT DATES
Workshop proposal submission: January 15, 2009
Electronic submission of full papers: April 10, 2009
Tutorial proposal submission: April 10, 2009
Workshop paper submission: April 30, 2009
Notification of paper acceptance: June 3, 2009
Camera-ready copies of accepted papers: June 30, 2009
Workshops: September 15, 2009
Conference: September 15-18, 2009
2009-02-24
我在NetflixPrize的进展

NetflixPrize是一个collaborative filtering的比赛,目的在于设计出更好的推荐系统。我上周用它的数据集测试了我的算法,因为参数时间还短,目前结果不是很理想。下面将我的方法和一些已知的结果公布一下。
目前我使用的是SVD的方法,用这个方法,是因为这个方法比较快,需要的内存不大(3G)左右。至于kNN的方法,我的计算相似度矩阵的算法还比较耗时(有人说这个步骤可以很快),所以我先尝试了SVD的方法。
我目前的模型用的是最简单的svd模型:
r(u,i) = mean + b(u) + b(i) + <p(u), q(i)>
用梯度下降法优化。
6fa8f2de
在训练时,可以probe数据集里面的数据是包含在train里的,我们计算推广误差的时候,需要在train-probe的数据集上训练。但是在计算quiz的时候,还是要在整个train上做训练,否则精度相差还是很大的。
2009/02/24 d = 250,学习速率0.0055,正则化参数0.002,RMSE = 0.904
2009/02/25
今天用一种新的方法计算item-based算法中的相关系数,只需要3个小时(包含读取文件的时间)。
详细进展和介绍在 http://xlnetflixprize.blogspot.com/
2009-02-16
推荐系统和协同过滤面临的主要问题




数据稀疏
协同过滤的精度主要取决于用户数据的多少。如果一个系统有很多用户的历史数据,他就能更好的对用户的喜欢做出预测。所以,目前推荐系统做的最好的都是那些有着很大量用户数据的公司,比如Google, Yahoo, Netflix, Amazon等等。但是,即使拥有很多数据,数据还是不够多,因为推荐系统的历史还不够长,还没有积累足够的数据。在目前处理稀疏数据的算法中,软性SVD是一种最好的方法。
新用户问题
这个问题和数据稀疏问题有一些相似性,他是指如何对新用户做出推荐。当一个新用户进入一个网络时,我们对他的兴趣爱好还一无所知,这时如何做出推荐是一个很重要的问题。一般在这个时候,我们只是向用户推荐那写普遍反映比较好的物品,也就是说,推荐完全是基于物品的。
新用户问题还有一个变种就是长尾(long tail)问题,在Amazon中,不是所有的用户都对很多书给出了评分,很多用户只给少数的书给出了评分,这些用户就处在一个长尾中,如何处理那些不太表露自己兴趣的用户,也是推荐系统的一个主要问题。
隐性喜好发现
在现在的推荐系统中,用户的喜欢是通过用户对某些物品进行评分获得的。这种获得用户兴趣的方法是一种很直接的方法。但在实际的互联网中,用户有很多隐性的方法表露他们的喜欢。比如用户的文字评论,我们可以通过自然语言处理从用户的评论中获得用户的兴趣;或者是用户的浏览行为,比如用户长时间的浏览一个物品,或者用户经常浏览一个物品,或者用户
购买了一个物品,这些行为都可以作为模式识别系统中的特征。
所以,发现用户的隐性喜好,相对于模式识别的特征提取,这方面的研究也很热门。
用户兴趣的变化
我们知道,用户的兴趣不是永远不变的,随着年龄和阅历的变化,用户的行为会发生变化。也就是说,协同过滤其实还应该加入一个时间因子。目前对于变化的用户兴趣的研究还处于起步阶段,主要是因为现有的系统历史都不是很久,大多数用户的兴趣还是比较稳定的,但是随着互联网的发展,用户兴趣的变化对推荐系统的影响将会越来越明显,所以这方面的研究也将越来越重要。
偏激的用户和全新的物品
我们知道,这个世界上有一些用户是很偏激的。他们和大多数人的观点是相反的。对于这种用户,现有的推荐系统做出的预测往往是很差的。如何处理偏激的用户,是推荐系统中的一个重要问题。
和偏激用户相对应的,是全新的物品。比如有一部新电影,他是颠覆性的,和以前的电影都不太相似。用户对于这个电影的爱好和用户以前的兴趣是没有太大关系的,因为用户从来没见过这种电影,这个问题也是导致现有的推荐系统精度不高的主要原因。
马太效应以及推荐系统对互联网的影响
我们知道,被推荐系统所推荐的物品将会越来越热门,这就导致了大量很好的物品可能会被推荐系统所淹没。在互联网中,物品实在是太多了,而推荐系统只能推荐有限的物品。解决这个问题的主要方法是增加推荐系统的多样性,比如一个推荐系统发现一个用户非常喜欢吃德芙巧克力,那么他给这个用户推荐10个产品,不需要都是德芙巧克力,也可以推荐别的一些巧克力,或者一些和巧克力相似的甜品。在推荐时,不仅要推荐用户喜欢的东西,而且要通过推荐让用户喜欢一些东西,有的时候,用户自己也不知道他喜欢什么,通过推荐系统,他可能会发现一些新东西他比较喜欢。
推荐系统中的作弊
只要涉及到经济利益,就有人作弊。搜索引擎作弊是一个被研究了很久的问题,因为在搜索引擎中,自己的网站排名越高,就能获得越多的经济利益。在推荐系统中也是如此,比如在淘宝中,如果一个卖家的物品经常被推荐,他就可能获得很多经济利益。这样,很多电子商务的推荐系统都遭受到了作弊的干扰,一些人通过一些技术手段,对自己卖的物品给出非常高的评分,这就是一种作弊行为。
推荐系统中的作弊在电子商务网站中越来越严重,特别是在美国这种互联网比较发达的国家,已经受到一些研究者的重视。作弊行为相当于人为的向系统中注入了噪声。目前解决作弊的算法主要是基于信任度和信用的。现在很多电子商务网站都引入了信用系统,比如淘宝等等。如何设计信用系统和推荐系统更好的融合,是一个重要的研究问题。
2009-02-09
If You Liked This, You’re Sure to Love That
THE “NAPOLEON DYNAMITE” problem is driving Len Bertoni crazy. Bertoni is a 51-year-old “semiretired” computer scientist who lives an hour outside Pittsburgh. In the spring of 2007, his sister-in-law e-mailed him an intriguing bit of news: Netflix, the Web-based DVD-rental company, was holding a contest to try to improve Cinematch, its “recommendation engine.” The prize: $1 million.
(Netflix举行了一个推荐系统的比赛,如果谁能将Cinemath的预测结果提高10%,将会获得100万美金,目前最好的提高是9.63%,由Len Bertoni做出)
Cinematch is the bit of software embedded in the Netflix Web site that analyzes each customer’s movie-viewing habits and recommends other movies that the customer might enjoy. (Did you like the legal thriller “The Firm”? Well, maybe you’d like “Michael Clayton.” Or perhaps “A Few Good Men.”) The Netflix Prize goes to anyone who can make Cinematch’s predictions 10 percent more accurate. One million dollars might sound like an awfully big prize for such a small improvement. But in fact, Netflix’s founders tried for years to improve Cinematch, with only incremental results, and they knew that a 10 percent bump would be a challenge for even the most deft programmer. They also knew that, as Reed Hastings, the chief executive of Netflix, told me recently, “getting to 10 percent would certainly be worth well in excess of $1 million” to the company. The competition was announced in October 2006, and no one has won yet, though 30,000 hackers worldwide are hard at work on the problem. Each day, teams submit their updated solutions to the Netflix Prize Web page, and Netflix instantly calculates how much better than Cinematch they are. (There’s even a live “leader board” ranking the top contestants.)
In March 2007, Bertoni decided he wanted to give it a crack. So he downloaded a huge set of data that Netflix put online: an enormous list showing how 480,189 of the company’s customers rated 17,770 Netflix movies. When Netflix customers log into their accounts, they can rate any movie from one to five stars, to help “teach” the Netflix system what their preferences are; the average customer has rated around 200 movies, so Netflix has a lot of information about what its customers like and don’t like. (The data set doesn’t include any personal information — names, ages, location and gender have been stripped out.) So Bertoni began looking for patterns that would predict customer behavior — specifically, an algorithm that would guess correctly the number of stars a given user would apply to a given movie. A year and a half later, Bertoni is still going, often spending 20 hours a week working on it in his home office. His two children — 12 and 13 years old — sometimes sit and brainstorm with him. “They’re very good with mathematics and algebra,” he told me, chuckling. “And they think of interesting questions about your movie-watching behavior.” For example, one day the kids wondered about sequels: would a Netflix user who liked the first two “Matrix” movies be just as likely to enjoy the third one, even though it was widely considered to be pretty dreadful?
Each time he or his kids think of a new approach, Bertoni writes a computer program to test it. Each new algorithm takes on average three or four hours to churn through the data on the family’s “quad core” Gateway computer. Bertoni’s results have gradually improved. When I last spoke to him, he was at No. 8 on the leader board; his program was 8.8 percent better than Cinematch. The top team was at 9.44 percent. Bertoni said he thought he was within striking distance of victory.
But his progress had slowed to a crawl. The more Bertoni improved upon Netflix, the harder it became to move his number forward. This wasn’t just his problem, though; the other competitors say that their progress is stalling, too, as they edge toward 10 percent. Why?
Bertoni says it’s partly because of “Napoleon Dynamite,” an indie comedy from 2004 that achieved cult status and went on to become extremely popular on Netflix. It is, Bertoni and others have discovered, maddeningly hard to determine how much people will like it. When Bertoni runs his algorithms on regular hits like “Lethal Weapon” or “Miss Congeniality” and tries to predict how any given Netflix user will rate them, he’s usually within eight-tenths of a star. But with films like “Napoleon Dynamite,” he’s off by an average of 1.2 stars.The reason, Bertoni says, is that “Napoleon Dynamite” is very weird and very polarizing. It contains a lot of arch, ironic humor, including a famously kooky dance performed by the titular teenage character to help his hapless friend win a student-council election. It’s the type of quirky entertainment that tends to be either loved or despised. The movie has been rated more than two million times in the Netflix database, and the ratings are disproportionately one or five stars.
(Napoleon Dynamite 是一部电影,人们对这部电影的看法非常两极分化,在Netflix中,这部电影被200万用户打分,但是分数在1-5个等级中杂乱的分布,没有什么规律)
Worse, close friends who normally share similar film aesthetics often heatedly disagree about whether “Napoleon Dynamite” is a masterpiece or an annoying bit of hipster self-indulgence. When Bertoni saw the movie himself with a group of friends, they argued for hours over it. “Half of them loved it, and half of them hated it,” he told me. “And they couldn’t really say why. It’s just a difficult movie.”
Mathematically speaking, “Napoleon Dynamite” is a very significant problem for the Netflix Prize. Amazingly, Bertoni has deduced that this single movie is causing 15 percent of his remaining error rate; or to put it another way, if Bertoni could anticipate whether you’d like “Napoleon Dynamite” as accurately as he can for other movies, this feat alone would bring him 15 percent of the way to winning the $1 million prize. And while “Napoleon Dynamite” is the worst culprit, it isn’t the only troublemaker. A small subset of other titles have caused almost as much bedevilment among the Netflix Prize competitors. When Bertoni showed me a list of his 25 most-difficult-to-predict movies, I noticed they were all similar in some way to “Napoleon Dynamite” — culturally or politically polarizing and hard to classify, including “I Heart Huckabees,” “Lost in Translation,” “Fahrenheit 9/11,” “The Life Aquatic With Steve Zissou,” “Kill Bill: Volume 1” and “Sideways.”
So this is the question that gently haunts the Netflix competition, as well as the recommendation engines used by other online stores like Amazon and iTunes. Just how predictable is human taste, anyway? And if we can’t understand our own preferences, can computers really be any better at it?
IT USED TO BE THAT if you wanted to buy a book, rent a movie or shop for some music, you had to rely on flesh-and-blood judgment — yours, or that of someone you trusted. You’d go to your local store and look for new stuff, or you might just wander the aisles in what librarians call a stack search, to see if anything jumped out at you. You might check out newspaper reviews or consult your friends; if you were lucky, your local video store employed one of those young cinéastes who could size you up in a glance and suggest something suitable.
The advent of online retailing completely upended this cultural and economic ecosystem. First of all, shopping over the Web is not a social experience; there are no clever clerks to ask for advice. What’s more, because they have no real space constraints, online stores like Amazon or iTunes can stock millions of titles, making a stack search essentially impossible. This creates the classic problem of choice: how do you decide among an effectively infinite number of options?
But Web sites have this significant advantage over brick-and-mortar stores: They can track everything their customers do. Every page you visit, every purchase you make, every item you rate — it is all recorded. In the early ’90s, scientists working in the field of “machine learning” realized that this enormous trove of data could be used to analyze patterns in people’s taste. In 1994, Pattie Maes, an M.I.T. professor, created one of the first recommendation engines by setting up a Web site where people listed songs and bands they liked. Her computer algorithm performed what’s known as collaborative filtering. It would take a song you rated highly, find other people who had also rated it highly and then suggest you try a song that those people also said they liked.
“We had this realization that if we gathered together a really large group of people, like thousands or millions, they could help one another find things, because you can find patterns in what they like,” Maes told me recently. “It’s not necessarily the one, single smart critic that is going to find something for you, like, ‘Go see this movie, go listen to this band!’ ”
In one sense, collaborative filtering is less personalized than a store clerk. The clerk, in theory anyway, knows a lot about you, like your age and profession and what sort of things you enjoy; she can even read your current mood. (Are you feeling lousy? Maybe it’s not the day for “Apocalypse Now.”) A collaborative-filtering program, in contrast, knows very little about you — only what you’ve bought at a Web site and whether you rated it highly or not. But the computer has numbers on its side. It may know only a little bit about you, but it also knows a little bit about a huge number of other people. This lets it detect patterns we often cannot see on our own. For example, Maes’s music-recommendation system discovered that people who like classical music also like the Beatles. It is an epiphany that perhaps make sense when you think about it for a second, but it isn’t immediately obvious.
Soon after Maes’s work made its debut, online stores quickly understood the value of having a recommendation system, and today most Web sites selling entertainment products have one. Most of them use some variant of collaborative filtering — like Amazon’s “Customers Who Bought This Item Also Bought” function. Some setups ask you to actively rate products, as Netflix does. But others also rely on passive information. They keep track of your everyday behavior, looking for clues to your preferences. (For example, many music-recommendation engines — like the Genius feature on Apple’s iTunes, Microsoft’s Mixview music recommender or the Audioscrobbler program at Last.fm — can register every time you listen to a song on your computer or MP3 player.) And a few rare services actually pay people to evaluate products; the Pandora music-streaming service has 50 employees who listen to songs and tag them with descriptors — “upbeat,” “minor key,” “prominent vocal harmonies.”
Netflix came late to the party. The company opened for business in 1997, but for the first three years it offered no recommendations. This wasn’t such a big problem when Netflix stocked only 1,000 titles or so, because customers could sift through those pretty quickly. But Netflix grew, and today, it stocks more than 100,000 movies. “I think that once you get beyond 1,000 choices, a recommendation system becomes critical,” Hastings, the Netflix C.E.O., told me. “People have limited cognitive time they want to spend on picking a movie.”
Cinematch was introduced in 2000, but the first version worked poorly — “a mix of insightful and boneheaded recommendations,” according to Hastings. His programmers slowly began improving the algorithms. They could tell how much better they were getting by trying to replicate how a customer rated movies in the past. They took the customer’s ratings from, say, 2001, and used them to predict their ratings for 2002. Because Netflix actually had those later ratings, it could discern what a “perfect” prediction would look like. Soon, Cinematch reached the point where it could tease out some fairly nuanced — and surprising — connections. For example, it found that people who enjoy “The Patriot” also tend to like “Pearl Harbor,” which you’d expect, since they’re both history-war-action movies; but it also discovered that they like the heartstring-tugging drama “Pay It Forward” and the sci-fi movie “I, Robot.”
Cinematch has, in fact, become a video-store roboclerk: its suggestions now drive a surprising 60 percent of Netflix’s rentals. It also often steers a customer’s attention away from big-grossing hits toward smaller, independent movies. Traditional video stores depend on hits; just-out-of-the-theaters blockbusters account for 80 percent of what they rent. At Netflix, by contrast, 70 percent of what it sends out is from the backlist — older movies or small, independent ones. A good recommendation system, in other words, does not merely help people find new stuff. As Netflix has discovered, it also spurs them to consume more stuff.
For Netflix, this is doubly important. Customers pay a flat monthly rate, generally $16.99 (although cheaper plans are available), to check out as many movies as they want. The problem with this business model is that new members often have a couple of dozen movies in mind that they want to see, but after that they’re not sure what to check out next, and their requests slow. And a customer paying $17 a month for only one movie every month or two is at risk of canceling his subscription; the plan makes financial sense, from a user’s point of view, only if you rent a lot of movies. (My wife and I once quit Netflix for precisely this reason.) Every time Hastings increases the quality of Cinematch even slightly, it keeps his customers active.
But by 2006, Cinematch’s improving performance had plateaued. Netflix’s programmers couldn’t go any further on their own. They suspected that there was a big breakthrough out there; the science of recommendation systems was booming, and computer scientists were publishing hundreds of papers each year on the subject. At a staff meeting in the summer of 2006, Hastings suggested a radical idea: Why not have a public contest? Netflix’s recommendation system was powered by the wisdom of crowds; now it would tap the wisdom of crowds to get better too.
AS HASTINGS HOPED, the contest has galvanized nerds around the world. The Top 10 list for the Netflix Prize currently includes a group of programmers in Austria (who are at No. 2), a trained psychologist and Web consultant in Britain who uses his teenage daughter to perform his calculus (No. 9), a lone Ph.D. candidate in Boston who calls himself My Brain and His Chain (a reference to a Ben Folds song; he’s at No. 6) and Pragmatic Theory — two French-Canadian guys in Montreal (No. 3). Nearly every team is working on the prize in its spare time. In October, when I dropped by the house of Martin Chabbert, a 32-year-old member of the Pragmatic Theory duo, it was only 8:30 at night, but we had to whisper: his four children, including a 2-month-old baby, had just gone to bed upstairs. In his small dining room, a laptop sat open next to children’s books like “Les Robots: Au Service de L’homme” and a “Star Wars” picture book in French.
“This is where I do everything,” Chabbert said. “After the kids are asleep and I’ve packed the lunches for school, I come down at 9 in the evening and work until 11 or 12. It was very exciting in the beginning!” He laughed. “It still is, but with the baby now, going to bed at midnight is not a good idea.”
Pragmatic Theory formed last spring, when Chabbert’s longtime friend Martin Piotte — a 43-year-old electrical and computer engineer — heard about the Netflix Prize. Like many of the amateurs trying to win the $1 million, they had no relevant expertise. (“Absolutely no background in statistics that was useful,” Piotte told me ruefully. “Two guys, absolutely no clue.”) But they soon discovered that the Netflix competition is a fairly collegial affair. The company hosts a discussion board devoted to the prize, and competitors frequently help one another out — discussing algorithms they’ve tried and publicly brainstorming new ways to improve their work, sometimes even posting reams of computer code for anyone to use. When someone makes a breakthrough, pretty soon every other team is aware of it and starts using it, too. Piotte and Chabbert soon learned the major mathematical tricks that had propelled the leading teams into the Top 10.
The first major breakthrough came less than a month into the competition. A team named Simon Funk vaulted from nowhere into the No. 4 position, improving upon Cinematch by 3.88 percent in one fell swoop. Its secret was a mathematical technique called singular value decomposition. It isn’t new; mathematicians have used it for years to make sense of prodigious chunks of information. But Netflix never thought to try it on movies.
Singular value decomposition works by uncovering “factors” that Netflix customers like or don’t like. Say, for example, that “Sleepless in Seattle” has been rated by 200,000 Netflix users. In one sense, this is just a huge list of numbers — user No. 452 gave it two stars; No. 985 gave it five stars; and so on. But you could also think of those ratings as individual reactions to various aspects of the movie. “Sleepless in Seattle” is a “chick flick,” a comedy, a star vehicle for Tom Hanks; each customer is reacting to how much — or how little — he or she likes “chick flicks,” comedies and Tom Hanks. Singular value decomposition takes the mass of Netflix data — 17,770 movies, ratings by 480,189 users — and automatically sorts the films. The programmers do not actively tell the computer what to look for; they just run the algorithm until it groups together movies that share qualities with predictive value.
Sometimes when you look at the clusters of movies, you can deduce the connections. Chabbert showed me one list: at the top were “Sleepless in Seattle,” “Steel Magnolias” and “Pretty Woman,” while at the bottom were “Star Trek” movies. Clearly, the computer recognized some factor that suggests that someone who likes the romantic aspect of “Pretty Woman” will probably like “Sleepless in Seattle” and dislike “Star Trek.” Chabbert showed me another cluster: this time DVD collections of the TV show “Friends” all clustered at the top of the list, while action movies like “Reindeer Games” and thrillers like “Hannibal” clustered at the bottom. Most likely, the computer had selected for “comic” content here. Other lists appear to group movies based on whether they lean strongly to the ideological right or left.
As programmers extract more and more values, it becomes possible to draw exceedingly sophisticated correlations among movies and hence to offer incredibly nuanced recommendations. “We’re teasing out very subtle human behaviors,” said Chris Volinsky, a scientist with AT&T in New Jersey who is one of the most successful Netflix contestants; his three-person team held the No. 1 position for more than a year. His team relies, in part, on singular value decomposition. “You can find things like ‘People who like action movies, but only if there’s a lot of explosions, and not if there’s a lot of blood. And maybe they don’t like profanity,’ ” Volinsky told me when we spoke recently. “Or it’s like ‘I like action movies, but not if they have Keanu Reeves and not if there’s a bus involved.’ ”
MOST OF THE LEADING TEAMS competing for the Netflix Prize now use singular value decomposition. Indeed, given how quickly word of new breakthroughs spreads among the competitors, virtually every team in the Top 10 makes use of similar mathematical ploys. The only thing that separates their scores is how skillfully they tweak their algorithms. The Netflix Prize has come to resemble a drag race in which everyone drives the same car, with only tiny modifications to the fuel injection. Yet those tweaks are crucial. Since the top teams are so close — there is less than a tenth of a percent between each contender — even tiny improvements can boost a team to the top of the charts.
These days, the competitors spend much of their time thinking deeply about the math and psychology behind recommendations. For example, the teams are grappling with the problem that over time, people can change how sternly or leniently they rate movies. Psychological studies show that if you ask someone to rate a movie and then, a month later, ask him to do so again, the rating varies by an average of 0.4 stars. “The question is why,” Len Bertoni said to me. “Did you just remember it differently? Did you see something in between? Did something change in your life that made you rethink it?” Some teams deal with this by programming their computers to gradually discount older ratings.
Another common problem is identifying overly punitive raters. If you’re a really harsh critic and I’m a much more easygoing one, your two-star rating may be equal to my four-star rating. To compensate, an algorithm might try to detect when a Netflix customer tends to hand out only one- or two-star ratings — a sign of a strict, pursed-lip customer — and artificially boost his or her ratings by a half-star or so. Then there’s the problem of movie raters who simply aren’t consistent. They might be evenhanded most of the time, but if they log into Netflix when they’re in a particularly bad mood, they might impulsively decide to rate a couple of dozen movies harshly.
TV shows, which are hot commodities on Netflix, present yet another perplexing issue. Customers respond to TV series much differently than they do to movies. People who loved the first two seasons of “The Wire” might start getting bored during the third but keep on watching for a while, then stop abruptly. So when should Cinematch stop recommending “The Wire”? When do you tell someone to give up on a TV show?
Interestingly, the Netflix Prize competitors do not know anything about the demographics of the customers whose taste they’re trying to predict. The teams sometimes argue on the discussion board about whether their predictions would be better if they knew that customer No. 465 is, for example, a 23-year-old woman in Arizona. Yet most of the leading teams say that personal information is not very useful, because it’s too crude. As one team pointed out to me, the fact that I’m a 40-year-old West Village resident is not very predictive. There’s little reason to think the other 40-year-old men on my block enjoy the same movies as I do. In contrast, the Netflix data are much more rich in meaning. When I tell Netflix that I think Woody Allen’s black comedy “Match Point” deserves three stars but the Joss Whedon sci-fi film “Serenity” is a five-star masterpiece, this reveals quite a lot about my taste. Indeed, Reed Hastings told me that even though Netflix has a good deal of demographic information about its users, the company does not currently use it much to generate movie recommendations; merely knowing who people are, paradoxically, isn’t very predictive of their movie tastes.
As the teams have grown better at predicting human preferences, the more incomprehensible their computer programs have become, even to their creators. Each team has lined up a gantlet of scores of algorithms, each one analyzing a slightly different correlation between movies and users. The upshot is that while the teams are producing ever-more-accurate recommendations, they cannot precisely explain how they’re doing this. Chris Volinsky admits that his team’s program has become a black box, its internal logic unknowable.
There’s a sort of unsettling, alien quality to their computers’ results. When the teams examine the ways that singular value decomposition is slotting movies into categories, sometimes it makes sense to them — as when the computer highlights what appears to be some essence of nerdiness in a bunch of sci-fi movies. But many categorizations are now so obscure that they cannot see the reasoning behind them. Possibly the algorithms are finding connections so deep and subconscious that customers themselves wouldn’t even recognize them. At one point, Chabbert showed me a list of movies that his algorithm had discovered share some ineffable similarity; it includes a historical movie, “Joan of Arc,” a wrestling video, “W.W.E.: SummerSlam 2004,” the comedy “It Had to Be You” and a version of Charles Dickens’s “Bleak House.” For the life of me, I can’t figure out what possible connection they have, but Chabbert assures me that this singular value decomposition scored 4 percent higher than Cinematch — so it must be doing something right. As Volinsky surmised, “They’re able to tease out all of these things that we would never, ever think of ourselves.” The machine may be understanding something about us that we do not understand ourselves.
Yet it’s clear that something is still missing. Volinsky’s momentum has slowed down significantly, as everyone else’s has. There’s some X factor in human judgment that the current bunch of algorithms isn’t capturing when it comes to movies like “Napoleon Dynamite.” And the problem looms large. Bertoni is currently at 8.8 percent; he says that a small group of mainly independent movies represents more than half of the remaining errors in the way of winning the prize. Most teams suspect that continuing to tweak existing algorithms won’t be enough to get to 10 percent. They need another breakthrough — some way to digitally replicate the love/hate dynamic that governs hard-to-pigeonhole indie films.
“This last half-percent really is the Mount Everest,” Volinsky said. “It’s going to take one of these ‘aha’ moments.”
SOME COMPUTER SCIENTISTS think the “Napoleon Dynamite” problem exposes a serious weakness of computers. They cannot anticipate the eccentric ways that real people actually decide to take a chance on a movie.
The Cinematch system, like any recommendation engine, assumes that your taste is static and unchanging. The computer looks at all the movies you’ve rated in the past, finds the trend and uses that to guide you. But the reality is that our cultural tastes evolve, and they change in part because we interact with others. You hear your friends gushing about “Mad Men,” so eventually — even though you have never had any particular interest in early-’60s America — you give it a try. Or you go into the video store and run into a particularly charismatic clerk who persuades you that you really, really have to give “The Life Aquatic With Steve Zissou” a chance.
As Gavin Potter, a Netflix Prize competitor who lives in Britain and is currently in ninth place, pointed out to me, a computerized recommendation system seeks to find the common threads in millions of people’s recommendations, so it inherently avoids extremes. Video-store clerks, on the other hand, are influenced by their own idiosyncrasies. Even if they’re considering your taste to make a suitable recommendation, they can’t help relying on their own sense of what’s good and bad. They’ll make more mistakes than the Netflix computers — but they’re also more likely to have flashes of inspiration, like pointing you to “Napoleon Dynamite” at just the right moment.
“If you use a computerized system based on ratings, you will tend to get very relevant but safe answers,” Potter says. “If you go with the movie-store clerk, you will get more unpredictable but potentially more exciting recommendations.”
Another critic of computer recommendations is, oddly enough, Pattie Maes, the M.I.T. professor. She notes that there’s something slightly antisocial — “narrow-minded” — about hyperpersonalized recommendation systems. Sure, it’s good to have a computer find more of what you already like. But culture isn’t experienced in solitude. We also consume shows and movies and music as a way of participating in society. That social need can override the question of whether or not we’ll like the movie.
“You don’t want to see a movie just because you think it’s going to be good,” Maes says. “It’s also because everyone at school or work is going to be talking about it, and you want to be able to talk about it, too.” Maes told me that a while ago she rented a “Sex and the City” DVD from Netflix. She suspected she probably wouldn’t really like the show. “But everybody else was constantly talking about it, and I had to know what they were talking about,” she says. “So even though I would have been embarrassed if Netflix suggested ‘Sex and the City’ to me, I’m glad I saw it, because now I get it. I know all the in-jokes.”
Maes suspects that in the future, computer-based reasoning will become less important for online retailers than social-networking tools that tap into the social zeitgeist, that let customers see, in Facebook fashion, for example, what their close friends are watching and buying. (Potter has an even more intriguing idea. He says he thinks that a recommendation system could predict cultural microtrends by monitoring news events. His research has found, for example, that people rent more movies about Wall Street when the stock market drops.) In the world of music, there are already several innovative recommendation services that try to analyze buzz — by monitoring blogs for repeated mentions of up-and-coming bands, or by sifting through millions of people’s playlists to see if a new band is suddenly getting a lot of attention.
Of course, for a company like Netflix, there’s a downside to pushing exciting-but-risky movie recommendations on viewers. If Netflix tries to stretch your taste by recommending more daring movies, it also risks annoying customers. A bad movie recommendation can waste an evening.
Is there any way to find a golden mean? When I put the question to Reed Hastings, the Netflix C.E.O., he told me he suspects that there won’t be any simple answer. The company needs better algorithms; it needs breakthrough techniques like singular value decomposition, with the brilliant but inscrutable insights it enables. But Hastings also says he thinks Maes is right, too, and that social-networking tools will become more useful. (Netflix already has one, in fact — an application that lets users see what their family and peers are renting. But Hastings admits it hasn’t been as valuable as computerized intelligence; only a very small percentage of rentals are driven by what friends have chosen.) Hastings is even considering hiring cinephiles to watch all 100,000 movies in the Netflix library and write up, by hand, pages of adjectives describing each movie, a cloud of tags that would offer a subjective view of what makes films similar or dissimilar. It might imbue Cinematch with more unpredictable, humanlike intelligence.
“Human beings are very quirky and individualistic, and wonderfully idiosyncratic,” Hastings says. “And while I love that about human beings, it makes it hard to figure out what they like.”
Recommendation Systems: An Interview with Satnam Alag
We had the opportunity to speak with Satnam Alag, author of the recently published Collective Intelligence in Action, about what makes for a good recommendation system, where the technology is heading, and why Netflix is finding it so hard to improve its own system.
Disclosure: I wrote the forward to 'Collective Intelligence in Action', however I have absolutely no financial interest in the book.
ReadWriteWeb: In our recent post about Netflix, we identified four main approaches to recommendations: Personalized recommendation: based on prior behavior of the user; Social recommendation: based on prior behavior of similar users; Item recommendation: based on the item itself; And a combination of all three. Do you agree with the four approaches we laid out in our article?
Satnam: Those four categories are pretty comprehensive. I present an alternate classification of recommendation systems in my book. I lay out two fundamental approaches. The first approach, item-based analysis, determines items that are related to a particular item. When a user likes a particular item, related ones are recommended. The second approach, user-based analysis, first determines users who are similar to that user.
Further, there are two main approaches to finding similar items and similar users. For the first, content-based analysis, content associated with the item, especially text, is used to compute similarity. In the second, the collaborative approach, actions such as ratings, bookmarking, and so forth are used to find similar items. For the second, user-based analysis, a number of approaches have been taken, including ones based on profile information, user actions, and lists of the user's friends or contacts. Of course, you can combine any these item/user and content/collaborative approaches to build a recommendation system.
The dimensions of the particular item and user space are helpful in deciding whether to use an item-based or user-based approach. Typically, an item-based approach is used to bootstrap one's application when the number of users is small. As the user base grows, the item-based approach is augmented by a user-based approach.
ReadWriteWeb: Other than Amazon and Netflix, which Internet companies have most impressed you in their implementation of recommendation systems?
Satnam: Other than Amazon and Nextflix, Google News' personalization is my personal favorite. Google News is a good example of building a scalable recommendation system for a large number of users (several million unique visitors per month) and a large number of items (several million new stories every two months), with constant item churn. This is different from Amazon's, whose rate of item churn is much lower. Google decided to use collaborative filtering for its recommendation system mainly because of its access to the data of its large user base and because this same approach could be applied to other applications, countries, and languages. A content-based recommendation system perhaps could have worked just as well, but may have required language- or location-specific tweaking. Google also wanted to leverage the same collaborative filtering technology to be able to recommend images, videos, and music, for which it's more difficult to analyze the underlying content.
Among start-ups, my personal favorite is the one we are developing at my current company, NextBio. It's not available yet but should be next month. The key point about this particular recommendation engine is its strong use of an ontology, similar in concept to tags, to develop a common vocabulary for items and users. The system then makes use of profile information and user interactions, both short- and long-term, to provide recommendations. The system leverages both item- and user-based approaches.
ReadWriteWeb: What commercial opportunities do you forsee with recommendation systems over the next few years?
Satnam: A good personalized recommendation system can mean the difference between a successful and a failed website. Given that most applications now invite users to interact and to leverage user-generated content, new content is being generated at a phenomenal rate. Showing the right content to the right user at the right time is key to creating a sticky application. I would be surprised if most successful websites did not leverage recommendation systems to provide personalized experiences to their users.
ReadWriteWeb: Your book includes a discussion of collaborative filtering. Can you tell us a bit about how this fits into the overall picture of recommendation systems?
Satnam: In recent years, an increasing amount of user interaction has provided applications with a large amount of information that can be converted into intelligence. This interaction may be in the form of ratings, blog entries, item tagging, user connections, or shared items of interest. This has led to the problem of information overload. What we need is a system that can recommend items based on the user's interests and interactions. This is where personalization and recommendation engines come in.
In my book, I take a holistic view of adding intelligence to one's application, a recommendation engine being one way to do it. The book focuses on both content-based and collaborative approaches to building recommendation systems. It focuses on capturing relevant information about the user, information from both within and outside one's application, and converting it into recommendations. One of the things you mentioned in your write-up on recommendation systems is that you would like to apply such a system to your website to recommend things to users. Someone reading my book should be able to create such a system using the techniques I demonstrate.
Next Page: Satnam's thoughts on the Netflix Prize and whether the 10% mark will ever be reached.
ReadWriteWeb: Netflix is offering $1 million to the team that can improve its recommendation algorithm by 10%. It's been over 2 years now, with the leading company at 9.63%. There is some skepticism, though, that 10% will be reached anytime soon, because now the contestants are making only incremental progress. Do you expect the 10% mark to be reached soon?
Satnam: Netflix's recommendation engine, Cinematch, uses an item-to-item algorithm (similar to Amazon's) with a number of heuristics. Given that Netflix' recommendation system has been very successful in the real world, it is pretty impressive that teams have been able to improve on it by as much as 9.63%. Of course, the Netflix competition doesn't take into account speed of implementation or the scalability of the approach. It simply focuses on the quality of recommendations in terms of closing the gap between user rating and predicted rating. So, it isn't clear whether Netflix will be able to leverage all of the innovation coming out of this competition. Also, the Netflix data doesn't contain much information to allow for a content-based approach; it's for this reason that teams are focusing on collaborative-based techniques.
The challenges to reaching the 10% mark are:
Skewed data: The data set for the competition consists of more than 100 million anonymous movie ratings, using a scale of one to five stars, made by 480,000 users for 17,770 movies. Note that the user-item data set for this problem is sparsely populated, with nearly 99% of user-item entries being zero. The distribution of movies per user is skewed. The median number of ratings per user is 93. About 10% of users rated 16 or fewer movies, while 25% of users rated 36 or fewer. Two users rated as many as 17,000 movies. Similarly, the ratings per movie are also skewed: almost half the user base rated one popular movie (Miss Congeniality); about 25% of movies had 190 or fewer ratings; and a handful of movies were rated fewer than 10 times.
The approach: The winning team, BellKor, spent more than 2,000 combined hours poring over data to find the winning solution. The winning solution was a linear combination of 107 sets of predictions. Many of the algorithms involved either the nearest-neighbor method (k-NN) or latent factor models, such as SVD/factorization and Restricted Boltzmann Machines (RBMs).
The winning solution uses k-NN to predict the rating for a user, using both the Pearson-r correlation and cosine methods to compute the similarities, with corrections to remove item-specific and user-specific biases. Latent semantic models are also widely used in the winning solution.
The BellKor team found it important to use a variety of models that compensated for each other's shortcomings. No one model alone could have gotten the BellKor team to the top of the competition. The combined set of models achieved an improvement of 8.43% over Cinematch, while the best model -- a hybrid of k-NN applied to output from RBMs -- improved the result by 6.43%. The biggest improvement by LSI methods was 5.1%, with the best pure k-NN model scoring below that. (K for the k-NN methods was in the range of 20 to 50.) The BellKor team also applied a number of heuristics to further improve the results.
The BellKor team demonstrates a number of guidelines for building a winning solution to this kind of competition:
- Combining complementary models helps improve the overall solution. Note that a linear combination of three models, one each for k-NN, LSI, and RBM, would have yielded fairly good results, an improvement of 7.58%.
- A principled approach is needed to optimize the solution.
- The key to winning is building models that can accurately predict when there is sufficient data, without over-applying in the absence of adequate data.
The final solution will be along the same lines, combining multiple models with heuristics. Contestants will probably reach the magic 10% mark in the next year or two.
ReadWriteWeb: Some people think the 10% mark can't be reached with algorithms alone, but that the "human" element is required. For example, ClerkDogs is a service that hires actual former video-store clerks to "create a database that is much richer and deeper than the collaborative filtering engines." It's a similar approach to that of Pandora, which has 50 employees who listen to and tag songs. How far do you think algorithms can go in making recommendations?
Satnam: Recommendation systems are not perfect. A number of elements go into making successful ones, including approach, the speed of computing results, heuristics, the exploration and exploitation of coefficients, and so on. But it has been shown in the real world that the more personalized you can make recommendations, the higher the click-through rate, the stickier the application, and the lower the bounce rate.
Using humans to form a rich database for recommendations may work for small applications, but it would probably be too expensive to scale. I don't see them competing against each other, human versus machine. Even with human/expert recommendations, one first needs to find a human/expert with tastes similar to those of the user, especially if you want to go after the long tail.
2009-02-07
推荐系统中的两种不同类型的用户
在设计推荐系统时,要充分考虑这两种用户的特点
2009-01-30
推荐系统的5个问题
http://www.readwriteweb.com/archives/5_problems_of_recommender_systems.php
1. Lack of Data 数据缺失
Perhaps the biggest issue facing recommender systems is that they need a lot of data to effectively make recommendations. It's no coincidence that the companies most identified with having excellent recommendations are those with a lot of consumer user data: Google, Amazon, Netflix, Last.fm.
推荐系统的最大问题是它们需要很大的数据量来做出好的推荐。毫无疑问,那些在推荐系统上做出很好结果的公司都是拥有大量用户数据的公司:Google, Amazon, Netflix, Last.fm.
2. Changing Data 数据变化
This issue was pointed out in ReadWriteWeb's comments by Paul Edmunds, CEO of 'intelligent recommendations' company Clicktorch. Paul commented that systems are usually "biased towards the old and have difficulty showing new".
An example of this was blogged by David Reinke of StyleHop, a resource and community for fashion enthusiasts. David noted that "past behavior [of users] is not a good tool because the trends are always changing" (emphasis ours). Clearly an algorithmic approach will find it difficult if not impossible to keep up with fashion trends. Most fashion-challenged people - I fall into that category - rely on trusted fashion-conscious friends and family to recommend new clothes to them.
David Reinke went on to say that "item recommendations don't work because there are simply too many product attributes in fashion and each attribute (think fit, price, color, style, fabric, brand, etc) has a different level of importance at different times for the same consumer." He did point out though that social recommenders may be able to 'solve' this problem.
3. Changing User Preferences 用户兴趣的变化
Again suggested by Paul Edmunds, the issue here is that while today I have a particular intention when browsing e.g. Amazon - tomorrow I might have a different intention. A classic example is that one day I will be browsing Amazon for new books for myself, but the next day I'll be on Amazon searching for a birthday present for my sister (actually I got her a gift card, but that's beside the point).
On the topic of user preferences, recommender systems may also incorrectly label users - a la this classic Wall St Journal story from 2002, If TiVo Thinks You Are Gay, Here's How to Set It Straight.
4. Unpredictable Items 不可预测的项目
In our post on the Netflix Prize, about the $1 Million prize offered by Netflix for a third party to deliver a collaborative filtering algorithm that will improve Netflix's own recommendations algorithm by 10%, we noted that there was an issue with eccentric movies. The type of movie that people either love or hate, such as Napoleon Dynamite. These type of items are difficult to make recommendations on, because the user reaction to them tends to be diverse and unpredictable.
Music is full of these items. Would you have guessed that this author is a fan of both Metallica and The Carpenters? I doubt Last.fm would make that recommendation.
5. This Stuff is Complex! 系统很复杂
We're stating the obvious here, but the below slide from Strands' presentation at Recked illustrates that it takes a lot of variables to do even the simplest recommendations (and we imagine the below variables only scratch the surface):
2009-01-19
Web研究方面的资源
我目前在做collaborative filtering & recommend system, 就是推荐系统,比如amazon里面推荐书,或者其他的推荐网站。
我做了一个网页,收集了这方面的一些研究资源:http://xlvector.googlepages.com/webcrawling
今年上半年要好好努力,争取做出点成绩。
越来越感到牛顿话的正确:成功的人都是站在前人的肩膀上的。
昨天给一个18岁的孩子写了一个成人赠言,我觉得也适合用来勉励自己:
有信心!有决心!有恒心!
2009-01-15
今年的web方面的会议
Workshop proposal submission: January 15, 2009
Electronic submission of full papers: April 10, 2009
Tutorial proposal submission: April 10, 2009
Workshop paper submission: April 30, 2009
Notification of paper acceptance: June 3, 2009
Camera-ready copies of accepted papers: June 30, 2009
Workshops: September 15, 2009
Conference: September 15-18, 2009
ECIR2010
10 Sep 2009: Workshop/tutorial submission deadline
01 Oct 2009: Paper submission deadline
22 Oct 2009: Poster and demo submission deadline
31 Oct 2009: Notification of acceptance (workshops/tutorials)
23 Nov 2009: Notification of acceptance (papers/posters/demos)
20 Dec 2009: Camera-ready copy
15 Jan 2010: Registration open
01 Feb 2010: Author registration
WISE2009
Workshop proposals February 28, 2009
Full paper submission May 1, 2009
Tutorial/Panel proposals May 31, 2009
Author notification June 15, 2009
Camera-ready submission June 28, 2009
Author registration June 28, 2009
Conference and Workshops October 5-7, 2009
ACM Recommender Systems 2009
Tutorial Proposals: April 10, 2009
Paper Submission: May 8, 2009
Workshop Proposals: May 15, 2009
Doctoral Symposium Applications: June 8, 2009
Paper Acceptance Notifications: June 19, 2009
Conference: October 22-25, 2009
Doctoral Symposium: October 22
Tutorials: October 22
Technical Program: October 23-24
Workshops: October 25
2009-01-14
常见爬虫分析
很多人说,爬虫有什么了不起的,不就是个广度优先搜索嘛!现在也有各种各样的开源爬虫,拿过来直接用挺方便,不过实际的网络爬虫确实不那么简单。
一般来说,好的爬虫要满足两个条件:(1)首先爬比较著名的网站,比如sina,qq,啥的(2)在更新的时候,首先更新比较著名的网站,比如sina,qq啥的。这两个条件很容易理解,因为这些网站受到很多用户的注意,所以先爬先更新是应该的。
那么我要问,互联网上这么多网站,你怎么知道哪些网站是著名的?有人要说,不是有著名的pagerank算法吗,他可以基于链接关系给出网页的排名。那么我又要问这帮人,现在网页还没有爬下来,你怎么运行你的pagerank算法。这个问题就不那么好解决了。
于是,一个牛人说,我们可以这样,我们先用广度优先搜索爬一段时间,然后得出一个网页集合,然后用pagerank算法对这个集合中的那些未爬网页(就是这个集合中的网页会指向很多网页,这些网页很多没爬)排名,然后选择排名最高的网页先爬。这个就是启发式爬虫的思想,简单的说,他是用一个优先级队列,而不是广度优先爬虫的FIFO队列来存储待爬的网页,然后用爬下来的网页直接的链接结构启发式的估计待爬网页的排名。
这个爬虫的想法不错,不过还是有效率问题,我们知道,pagerank是一个迭代算法,如果网页多了,算起来还是很费事的。
一个牛人看到这个问题,他重新研究了广度优先爬虫,他忽然发现,广度优先爬虫能够自然的倾向于爬著名的网站,这是因为著名的网站有大量的反向链接,所以很容易从别的网页发现著名的网站,从而著名的网站能够比较早的爬下来。广度优先爬虫的这个属性实在是太好了,不过尽管他有这个倾向,但是他在爬著名网站这个问题上的效率还是不如上面的那种爬虫,也就是说,他首先爬下来的网页中,还是有大量的不重要的网页。
所以说,如果我们有一个方法,能够像广度优先爬虫那么自然的对著名网页有一种倾向,而且这个倾向非常大。那么这中爬虫无疑是很高效的,不用每次都算PageRank就能自动的先爬著名的网站,这种爬虫当然很好很强大。
我的这篇文章,主要就是谈如何设计这种爬虫。
2009-01-13
popular网页的两个基于链接的特征
1. popular的网页和大多数网页的距离都很近
2. popular的网页和其他的网页连接很紧密
PageRank算法在上面两个特征上都有所反应,但反应的不够。
2009-01-12
PageRank算法的几个主要问题
不过这个算法还是有很多问题。
我们举一个例子,比如下图。黄色的网页被很多蓝色的网页指向,但是这些蓝色的网页只被少数的橙色的网页指向。也就是说,如果没有这些橙色的网页,黄色的和蓝色的网页就和整个互联网不联通了。在这个例子中,黄色的网页会获得比较高的PageRank,但实际上,黄色的网页不应该有这么高的排名,因为他和整个互联网的联系是松散的。他的排名其实是被蓝色的网页提高上去的。
这是一个典型的link farm,也就是链接工厂的结构。黄色的网页叫target page,就是我们要提高rank的网页,而蓝色的网页是boost page,也就是用来提高rank的网页。
PageRank的最大问题,就是对链接工厂无能为力。但这是为什么呢?我们可以用pagerank的另一个解释来说明这个问题。很多研究表明,一个页面的pagerank是这样获得的:
我们知道,对一种网页,有很多从其他网页到他的路径。比如对网页x,有一条从y到x的路径,那么y就通过这条路径把自己的pagerank注入到x,路径越长,注入的rank就越多。那么,一个网页如果有很多比较短的到他的路径,这个网页就会有比较高的rank。但是,这里面有一个问题,那就是这些路径可能都是交叉的。比如上面的图中,所有到黄色顶点的路径都会在橙色顶点相交。而同时,link farm是一个稠密图,他里面的短路径是很多的。这就解释了pagerank为什么解决不了链接工厂的问题。
那么对于越多的SEO网站,我们有什么手段来发现他们呢?下面这个方法比较著名:
我们忽略那些特别短的路径,因为spam会从link farm中获得很多短路径,如果我们忽略掉特别短的路径对pagerank的影响,可以解决这个问题。
对于web graph中的一个网页v,我们把到他的距离为h的顶点数记为S(v,h)。
那么对于任何一个顶点v,S(v,h)关于h的分布是用来对v排名的一个重要手段,现在有很多这方面的研究。