Sciweavers

263 search results - page 30 / 53
» Re-engineering structures from Web documents
JCDL
2006
ACM
14 years 1 month ago
Combining DOM tree and geometric layout analysis for online medical journal article segmentation
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
Jie Zou, Daniel X. Le, George R. Thoma
JCDL
2010
ACM
14 years 19 days ago
Exposing the hidden web for chemical digital libraries
In recent years, the vast amount of digitally available content has led to the creation of many topic-centered digital libraries. Also in the domain of chemistry more and more di...
Sascha Tönnies, Benjamin Köhncke, Oliver...
WEBI
2005
Springer
14 years 1 month ago
A Semi-Supervised Document Clustering Algorithm Based on EM
Document clustering is a very hard task in Automatic Text Processing since it requires extracting regular patterns from a document collection without a priori knowledge on the cat...
Leonardo Rigutini, Marco Maggini
ADC
2006
Springer
14 years 1 month ago
A two-phase rule generation and optimization approach for wrapper generation
Web information extraction is a fundamental issue for web information management and integration. A common approach is to use wrappers to extract data from web pages or documents...
Yanan Hao, Yanchun Zhang
DAGSTUHL
2006
13 years 9 months ago
Are we Ready to Embrace the Semantic Web?
action from low-level features to high-level semantics. Owing to the proliferation of multimedia content on the internet, there is widespread interest in the semantic web community...
Shankar Vembu, Stephan Baumann