In the BlueView project, digital library services are developed and partially implemented based on the architecture of virtual document servers. Using standard tools like fulltext...
Andreas Heuer, Holger Meyer, Beate Porst, Patrick ...
As the rapid growth of PDF document in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the tex...
Peter Burcsi, Ferdinando Cicalese, Gabriele Fici, ...