This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as eviden...
Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for ...
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
Given the retrieval problem posed by the overwhelming number of documents online, the adaptation of text categorization to web units has recently been pushed forward. The aim is to utiliz...
Hidden Markov models (HMMs) are powerful statistical models that have found successful applications in Information Extraction (IE). In current approaches to applying HMMs to IE, a...