Sciweavers

1052 search results for "Improved CHAID algorithm for document structure modelling" (page 22 of 211)
ISAAC
2005
Springer
14 years 2 months ago
Improved Algorithms for Largest Cardinality 2-Interval Pattern Problem
The 2-Interval Pattern problem is to find the largest constrained pattern in a set of 2-intervals. The constrained pattern is a subset of the given 2-intervals such that ...
Hao Yuan, Linji Yang, Erdong Chen
SIGMOD
2009
ACM
14 years 4 months ago
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha
CIKM
2009
Springer
14 years 3 months ago
Effective and efficient structured retrieval
Search engines that support structured documents typically support structure created by the author (e.g., title, section), and may also support structure added by an annotation pr...
Le Zhao, Jamie Callan
CIKM
2009
Springer
14 years 3 months ago
Improving web page classification by label-propagation over click graphs
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
TREC
2004
13 years 10 months ago
Language Models for Searching in Web Corpora
We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
Jaap Kamps, Gilad Mishne, Maarten de Rijke