An approach to postal address detection from webpages is proposed. The webpages are first segmented into text blocks based on their visual similarity. The text content in each bl...
Fitting enough information from webpages to make browsing on small screens compelling is a challenging task. One approach is to present the user with a thumbnail image of the full...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is magnitudes faster than typical web page classific...
We investigate the novel problem of event recognition from news webpages. "Events" are basic text units containing news elements. We observe that a news article is always...
We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is the automatic generation of training data for a ...