With the advent of XML as the de facto language for data publishing and exchange, scalable distribution of XML data to large, dynamic populations of consumers remains an important...
Background: As the size of the known human interactome grows, biologists increasingly rely on computational tools to identify patterns that represent protein complexes and pathway...
In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language's data...
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple...
Exact substring matching queries on large data collections can be answered using q-gram indices, that store for each occurring q-byte pattern an (ordered) posting list with the po...