User-generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
SGML, standardized in ISO 8879 [International Organization for Standardization (1986)], has proliferated because it can provide various styles and transform documents on differen...
The availability of a document’s logical structure in XML retrieval allows retrieval systems to return document portions (elements) instead of whole documents. This helps searche...
Existing HTML markup is used only to indicate the structure and layout of documents, not their semantics. As a result, web documents are difficult to be semantically p...
Search engines that support structured documents typically support structure created by the author (e.g., title, section), and may also support structure added by an annotation pr...