Sciweavers

» Reverse Engineering of Data
WWW
2007
ACM
16 years 5 months ago
U-REST: an unsupervised record extraction system
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Yuan Kui Shen, David R. Karger
WWW
2006
ACM
16 years 5 months ago
XML screamer: an integrated approach to high performance XML parsing, validation and deserialization
This paper describes an experimental system in which customized high performance XML parsers are prepared using parser generation and compilation techniques. Parsing is integrated...
Margaret Gaitatzes Kostoulas, Morris Matsa, Noah M...
WWW
2006
ACM
16 years 5 months ago
Detecting spam web pages through content analysis
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
WWW
2006
ACM
16 years 5 months ago
Beyond PageRank: machine learning for static ranking
Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We s...
Matthew Richardson, Amit Prakash, Eric Brill
WWW
2005
ACM
16 years 5 months ago
A publish and subscribe collaboration architecture for web-based information
Markup languages, representations, schemas, and tools have significantly increased the ability for organizations to share their information. Languages, such as the Extensible Mark...
M. Brian Blake, David H. Fado, Gregory A. Mack