Search Sciweavers | Sciweavers

» Crawling the web for structured documents

TREC 2000

Information Space Based on HTML Structure

Download trec.nist.gov

The main goal for the Information Space system for TREC9 was early precision. To facilitate this, an emphasis was placed on seeking matches from only the TITLE, H1, H2 and H3 tags...

Gregory B. Newby

ITCC 2005, IEEE

Elimination of Redundant Information for Web Data Mining

Download eprints.utas.edu.au

These days, billions of Web pages are created with HTML or other markup languages. They only have a few uniform structures and contain various authoring styles compared to traditi...

Shakirah Mohd Taib, Soon-ja Yeom, Byeong Ho Kang

SIGMOD 2009, ACM

Hermes: a travel through semantics on the data web

Download www.aifb.uni-karlsruhe.de

The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is n...

Haofen Wang, Thomas Penin, Kaifeng Xu, Junquan Che...

OHS 2001, Springer

Revisiting and Versioning in Virtual Special Reports

Download wwwis.win.tue.nl

Adaptation/personalization is one of the main issues for web applications and requires large repositories. Creating adaptive web applications from these repositories requires to hav...

Sébastien Iksal, Serge Garlatti

DOCENG 2007, ACM

Structure and content analysis for HTML medical articles: a hidden Markov model approach

Download archive.nlm.nih.gov

We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...

Jie Zou, Daniel X. Le, George R. Thoma
