More and more documents on the World Wide Web are based on templates. On a technical level, this gives those documents very similar source code and DOM tree structure. G...
We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use lan...
We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the...
Krysta Marie Svore, Lucy Vanderwende, Christopher ...
Relevance feedback, which uses the terms in relevant documents to enrich the user’s initial query, is an effective method for improving retrieval performance. An assoc...
By exploiting the object-oriented dynamic composability of modern document applications and formats, malcode hidden in otherwise inconspicuous documents can reach third-party appli...
Wei-Jen Li, Salvatore J. Stolfo, Angelos Stavrou, ...
Software tools are used to compare multiple versions of a textual document to help a reader understand the evolution of that document over time. These tools generally support the ...
We present SHIRI-Annot, an automatic ontology-driven and unsupervised approach for the semantic annotation of documents which contain well structured parts and not well structured o...
In semantic web applications where query initiators and information providers do not necessarily share the same ontology, semantic interoperability generally relies on on...
Anthony Ventresque, Sylvie Cazalens, Philippe Lama...
Knowledge work in many fields requires examining several aspects of a collection of documents to attain meaningful understanding that is not explicitly available. Despite recent ad...
Different dialects of XML have emerged as ubiquitous document exchange formats. For effective collaboration based on such documents, the capability to propagate edit operations pe...