Papers on Smoothing and Using Click-through Data

Random Walks on the Click Graph, SIGIR 07 by MSR Cambridge.

This paper generally describes that random walk is effective in smoothing the noisy and sparse click log data. One thing makes me confusing is the “self-transition probability” (in Section 3.1), which,  I think, could be renamed as “exiting probability” as in PageRank for intuition.

Optimizing Web Search Using Web Click-through Data, CIKM ’04 by MSR Asia.

This paper does not follow random walk naively; instead, it considers co-citation relationship between queries (as well between urls), and proposed an iterative algorithm working with such co-citation relationship.  However, there isn’t a consistent math model for this algorithm.  I am wondering maybe we can derive the method by random walking on the co-citation graph (instead of the query-url bipartite graph).

Click-Through Rate Estimation for Rare Events in Online Advertising, by Yahoo! Labs.

In this paper, two methods were visited: (1) learning an optimal beta prior (or beta smoothing factor) by considering clicking as a Bernoulli experiment, and (2) the well-known exponential smoothing method.

Smoothing Click-through Data for Web Search Ranking, SIGIR’09 by MSR Redmond

This paper is about smoothing click-through data for feature construction in learning-to-rank. Two methods were proposed: (1) random walk, and (2) a discounting method inspired by the Good-Turing method used in statistical language model.

Using the Wisdom of the Crowds for Keyword Generation, WWW’08 by MSR Mountain View

This paper describes a modified random walk algorithm to categorize queries (via urls) to pre-defined “concepts”.  The algorithm is bootstrapped by seeding each concept some urls.  And if concepts (or categories) are mutual exclusive, the algorithm can be accelerated.