Sciweavers

708 search results - page 40 / 142
» Identifying Content Blocks from Web Documents
SIGMOD
2003
ACM
14 years 8 months ago
Winnowing: Local Algorithms for Document Fingerprinting
Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, incl...
Saul Schleimer, Daniel Shawcross Wilkerson, Alexan...
WWW
2007
ACM
14 years 9 months ago
Efficient Update of Indexes for Dynamically Changing Web Documents
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
WWW
2011
ACM
13 years 3 months ago
Web scale NLP: a case study on url word breaking
This paper uses the URL word breaking task as an example to elaborate what we identify as crucial in designing statistical natural language processing (NLP) algorithms for Web scale ...
Kuansan Wang, Christopher Thrasher, Bo-June Paul H...
ICWE
2003
Springer
14 years 1 month ago
The Cooperative Web: A Step towards Web Intelligence
The Web is mainly processed by humans. The role of the machines is just to transmit and display the contents of the documents, barely being able to do something else. Nowadays ther...
Daniel Gayo-Avello, Darío Álvarez Gu...
ADAPTIVE
2007
Springer
14 years 2 months ago
Web Document Modeling
A very common issue of adaptive Web-Based systems is the modeling of documents. Such documents represent domain-specific information for a number of purposes. Application areas su...
Alessandro Micarelli, Filippo Sciarrone, Mauro Mar...