This paper studies the problem of identifying and extracting flat and nested data records from a given web page. With the explosive growth of information sources ...
Internet content today is about 80% text-based. Whether static or dynamic, the information is encoded and presented as multilingual, unstructured natural language text pages. As ...
Pavlin Dobrev, Albena Strupchanska, Galia Angelova
A typical problem for web designers is building pages that may be accessed from a range of display devices with different screen sizes and resolutions. Liquid layouts...
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Except for a handful of "mobile" Web sites, the Web is designed for browsing with personal computers with large screens capable of fitting the content of most Web pages....