Sciweavers

309 search results for "Discovering informative content blocks from Web documents" (page 21 / 62)
IDEAS
2000
IEEE
13 years 12 months ago
Keeping Web Pages Up-to-Date with SQL: 1999
From the beginnings of the World Wide Web (WWW or Web) and the definition of the Common Gateway Interface (CGI), Web site administrators have used dynamically generated HTML page...
Henrik Loeser
ELPUB
1998
ACM
13 years 11 months ago
Addressing Publishing Issues with Hypermedia Distributed on the Web
The content and structure of an electronically published document can be authored and processed in ways that allow for flexibility in presentation on different environments for di...
Lloyd Rutledge, Lynda Hardman, Jacco van Ossenbrug...
CIKM
2011
Springer
12 years 7 months ago
Simultaneous joint and conditional modeling of documents tagged from two perspectives
This paper explores correspondence and mixture topic modeling of documents tagged from two different perspectives. There has been ongoing work in topic modeling of documents with...
Pradipto Das, Rohini K. Srihari, Yun Fu
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
VLDB
2003
ACM
14 years 7 months ago
THESUS: Organizing Web document collections based on link semantics
The requirements for effective search and management of the WWW are stronger than ever. Currently Web documents are classified based on their content not taking into acco...
Maria Halkidi, Benjamin Nguyen, Iraklis Varlamis, ...