It is observed that a better approach to Web information understanding is to base on its document framework, which is mainly consisted of (i) the title and the URL name of the pag...
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
To maintain interoperability in the Web environment it is necessary to comply with Web standards. Current specifications of HTML and XHTML languages define conformance conditions ...
applications parts. Programmers can easily implement new parts by using abstract classes of the framework. HotDoc is implemented in VisualWorks Smalltalk.