Text classification categorizes Web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consumi...
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than the HTML code. O...
There have been recent improvements in document technologies, such as the standardization of object interfaces to access and manipulate the properties of Web documents. There has also...
To improve the scalability of the Web it is common practice to apply caching and replication techniques. Numerous strategies for placing and maintaining multiple copies of Web doc...
Guillaume Pierre, Maarten van Steen, Andrew S. Tan...