Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We pr...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...
Abstract. We study the problem of authenticating the content and creation time of documents generated by an organization and retained in archival storage. Recent regulations (e.g.,...
Document classification presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of the natural language. The tradi...
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special p...
Opinion finding is a challenging retrieval task, where it has been shown that it is especially difficult to improve over a strongly performing topic-relevance baseline. In this pa...
Rodrygo L. T. Santos, Ben He, Craig Macdonald, Iad...