Modeling human behavior requires vast quantities of accurately labeled training data, but for ubiquitous people-aware applications such data is rarely attainable. Even researchers...
Daniel Peebles, Hong Lu, Nicholas D. Lane, Tanzeem...
Existing HTML markup indicates only the structure and layout of documents, not their semantics. As a result, web documents are difficult to be semantically p...
In this paper, we describe a system to perform Document Image Retrieval in Digital Libraries. The system allows users to retrieve digitized pages on the basis of layout similaritie...
Web-based information systems give their users the ability to interleave querying and browsing during information discovery. The MIX system provides an API cal...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...