Sciweavers

2677 search results - page 74 / 536
» Extracting Structured Data from Web Pages
CEAS
2011
Springer
12 years 8 months ago
Spam detection using web page content: a new battleground
Traditional content-based e-mail spam filtering takes into account the content of e-mail messages and applies machine learning techniques to infer patterns that discriminate spam from...
Marco Túlio Ribeiro, Pedro Henrique Calais ...
TREC
2003
13 years 10 months ago
Combining Structural Information and the Use of Priors in Mixed Named-Page and Homepage Finding
This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...
Paul Ogilvie, Jamie Callan
STACS
2009
Springer
14 years 3 months ago
A Comparison of Techniques for Sampling Web Pages
As the World Wide Web is growing rapidly, it is getting increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively one has to...
Eda Baykan, Monika Rauch Henzinger, Stefan F. Kell...
ER
2009
Springer
14 years 3 months ago
FOCIH: Form-Based Ontology Creation and Information Harvesting
Creating an ontology and populating it with data are both labor-intensive tasks requiring a high degree of expertise. Thus, scaling ontology creation and population to the size of ...
Cui Tao, David W. Embley, Stephen W. Liddle
ICML
2007
IEEE
14 years 9 months ago
Dynamic hierarchical Markov random fields and their application to web data extraction
Hierarchical models have been extensively studied in various domains. However, existing models assume fixed model structures or incorporate structural uncertainty generatively. In...
Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen