Sciweavers

2677 search results - page 34 / 536
» Extracting Structured Data from Web Pages
ECIR
2009
Springer
13 years 6 months ago
PathRank: Web Page Retrieval with Navigation Path
This paper describes a path-based method to use the multi-step navigation information discovered from website structures for web page ranking. Use of hyperlinks to enhanc...
Jianqiang Li, Yu Zhao 0002
DATESO
2009
13 years 6 months ago
From Web Pages to Web Communities
In this paper we are looking for a relationship between the intent of Web pages, their architecture and the communities who take part in their usage and creation. From our point of...
Milos Kudelka, Václav Snásel, Zdenek...
DEXAW
2008
IEEE
14 years 3 months ago
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
ICDM
2002
IEEE
14 years 1 months ago
Automatic Web Page Classification in a Dynamic and Hierarchical Way
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving information from the Internet. Although there are many automatic classification...
Xiaogang Peng, Ben Choi
LREC
2008
13 years 10 months ago
A Lightweight and Efficient Tool for Cleaning Web Pages
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Stefan Evert