Sciweavers

» Testbed for information extraction from deep web
CIKM
2005
Springer
14 years 1 month ago
Retrieving answers from frequently asked questions pages on the web
We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps...
Valentin Jijkoun, Maarten de Rijke
ITCC
2002
IEEE
14 years 1 month ago
Web-Based Information Access: Multilingual Automatic Authoring
The need to manage similar documents in different languages increases with the growing amounts of electronic information available in documents of the same type (e.g. news str...
Roberto Basili, Maria Teresa Pazienza, Fabio Massi...
SIGMOD
2009
ACM
14 years 2 months ago
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha
CIKM
2008
Springer
13 years 10 months ago
Characterizing and predicting community members from evolutionary and heterogeneous networks
Mining different types of communities from web data has attracted a lot of research effort in recent years. However, none of the existing community mining techniques has taken i...
Qiankun Zhao, Sourav S. Bhowmick, Xin Zheng, Kai Y...
WWW
2010
ACM
14 years 3 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han