Sciweavers

309 search results - page 28 / 62
» Discovering informative content blocks from Web documents
Sort
View
WWW
2001
ACM
14 years 8 months ago
Towards second and third generation web-based multimedia
First generation Web-content encodes information in handwritten (HTML) Web pages. Second generation Web content generates HTML pages on demand, e.g. by filling in templates with c...
Jacco van Ossenbruggen, Joost Geurts, Frank Cornel...
CLEF
2005
Springer
14 years 1 months ago
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Börkur Sigurbjörnsson, Jaap Kamps, Maart...
JCDL
2006
ACM
167views Education» more  JCDL 2006»
14 years 1 months ago
Combining DOM tree and geometric layout analysis for online medical journal article segmentation
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
Jie Zou, Daniel X. Le, George R. Thoma
WWW
2008
ACM
14 years 8 months ago
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from larges...
Xuan Hieu Phan, Minh Le Nguyen, Susumu Horiguchi
WWW
2005
ACM
14 years 8 months ago
WEESA: Web engineering for semantic Web applications
The success of the Semantic Web crucially depends on the existence of Web pages that provide machine-understandable meta-data. This meta-data is typically added in the semantic an...
Gerald Reif, Harald Gall, Mehdi Jazayeri