Sciweavers

945 search results - page 8 / 189
» Information Extraction from HTML: Application of a General M...
Sort
View
AAAI
2007
13 years 10 months ago
Template-Independent News Extraction Based on Visual Consistency
Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen
CICLING
2009
Springer
14 years 8 months ago
Business Specific Online Information Extraction from German Websites
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
Yeong Su Lee, Michaela Geierhos
EVOW
2008
Springer
13 years 9 months ago
DEEPER: A Full Parsing Based Approach to Protein Relation Extraction
Abstract. Lexical variance in biomedical texts poses a challenge to automatic protein relation mining. We therefore propose a new approach that relies only on more general language...
Timur Fayruzov, Martine De Cock, Chris Cornelis, V...
SIGMOD
2009
ACM
140views Database» more  SIGMOD 2009»
14 years 2 months ago
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha
ERCIMDL
2010
Springer
180views Education» more  ERCIMDL 2010»
13 years 4 months ago
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)
Extracting titles from a PDFs full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating...
Jöran Beel, Bela Gipp, Ammar Shaker, Nick Fri...