We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classific...
In this paper we present a novel model based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the ...
Because a hypermedia document is more complex than conventional text, it requires preparation with respect to two key aspects. First, the author begins to develop a "vision&q...
Takeshi Shimizu, Stephen W. Smoliar, John S. Borec...
This paper compares several indexing methods for person names extracted from text, developed for an information retrieval system with requirements for fast approximate matching of...
We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ...