Quarterly (March, June, September, December)
160 pp. per issue
6 3/4 x 10
2014 Impact factor:

Computational Linguistics

Hwee Tou Ng, Editor
September 2013, Vol. 39, No. 3, Pages 709-754
(doi: 10.1162/COLI_a_00148)
© 2013 Association for Computational Linguistics
Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
Article PDF (313.61 KB)

Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets, however. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query.

In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.