: Business Process Re-engineering (BPR) is an area that requires a lot of technical documents and an important feature of a well-written document is a coherent narrative. Even thou...
Numerous approaches, including textual, structural and featural, to detecting duplicate documents have been investigated. Considering document images are usually stored and transm...
We present a mixture model based approach for learning individualized behavior models for the Web users. We investigate the use of maximum entropy and Markov mixture models for ge...
Abstract We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has ...
Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task. In this context, the use of document representations based on latent thematic ge...