Hypergeometric Language Model and Zipf-Like Scoring Function for Web Document Similarity Retrieval

Download wi.dii.uchile.cl

Retrieving Web documents that are similar to a given document differs in many respects from information retrieval driven by the queries that regular search engine users write. Document similarity retrieval is therefore not addressed by traditional search engines, whose learning mechanisms are optimized for short queries made of specific words chosen by end users. In this work, a new method is proposed for Web document similarity retrieval based on generative language models and meta search engines. Probabilistic language models are used as a random query generator for the given document, and the generated queries aim to represent its relevant information. The queries are submitted to a customizable set of Web search engines. Once all results have been gathered, they are evaluated with a proposed scoring function based on Zipf's law. The results showed that the proposed query generation methodology and scoring procedure solve the problem with acceptable l...
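The abstract gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that "hypergeometric" means drawing query terms without replacement from the document's multiset of words, and that "Zipf-like" means a result at rank r contributes 1/r^s to its score, summed over all generated queries. The names `tokenize`, `generate_queries` and `zipf_score`, the tokenizer, and the `exponent` parameter are hypothetical, and no real search engine is queried; per-query result lists are stubbed.

```python
from __future__ import annotations

import random
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_queries(text: str, num_queries: int, query_len: int,
                     seed: int | None = None) -> list[str]:
    """Draw random keyword queries from the document's bag of words.

    Assumption: each query samples `query_len` token occurrences without
    replacement (a multivariate hypergeometric draw over term counts),
    so frequent terms are more likely to be picked.
    """
    tokens = tokenize(text)
    rng = random.Random(seed)
    queries = []
    for _ in range(num_queries):
        draw = rng.sample(tokens, min(query_len, len(tokens)))
        # Duplicate terms add nothing to a keyword query; keep first occurrence.
        queries.append(" ".join(dict.fromkeys(draw)))
    return queries


def zipf_score(result_lists: list[list[str]], exponent: float = 1.0) -> dict[str, float]:
    """Fuse ranked result lists: a hit at 1-based rank r adds 1 / r**exponent."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank ** exponent
    # Highest score first; ties keep first-seen order (sorted is stable).
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    doc = "Zipf law ranking: a language model samples terms; the model ranks results."
    for q in generate_queries(doc, num_queries=3, query_len=4, seed=7):
        print(q)
    # Stand-in for per-query engine results; no real search engine is called.
    fake_results = [["a.com", "b.com", "c.com"], ["b.com", "a.com"], ["c.com"]]
    print(zipf_score(fake_results))  # a.com and b.com score 1.5; c.com scores 1 + 1/3
```

With exponent 1 the weighting is a plain reciprocal-rank sum; larger exponents concentrate the score on the top few hits of each engine, which is one way to read "Zipf-like" but not necessarily the paper's choice.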

Felipe Bravo-Marquez, Gaston L'Huillier, Sebastián A. Ríos, Juan D. Velásquez

Document | Information Technology | Language Models | Search Engines | SPIRE 2010

Post Info

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	SPIRE
Authors	Felipe Bravo-Marquez, Gaston L'Huillier, Sebastián A. Ríos, Juan D. Velásquez

Sciweavers
