Sciweavers

391 search results - page 20 / 79
» Finding and Extracting Data Records from Web Pages
IPM
2006
13 years 8 months ago
Dictionary-based text categorization of chemical web pages
A new dictionary-based text categorization approach is proposed to classify chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-re...
Chunyan Liang, Li Guo, Zhaojie Xia, Feng-Guang Nie...
APWEB
2006
Springer
14 years 5 days ago
Image Description Mining and Hierarchical Clustering on Data Records Using HR-Tree
Since we can hardly get semantics from the low-level features of the image, it is much more difficult to analyze the image than textual information on the Web. Traditionally, textu...
Congle Zhang, Sheng Huang, Gui-Rong Xue, Yong Yu
WWW
2008
ACM
14 years 9 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
AIIA
2003
Springer
14 years 1 months ago
Preprocessing and Mining Web Log Data for Web Personalization
We describe the web usage mining activities of an ongoing project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The model...
Miriam Baglioni, U. Ferrara, Andrea Romei, Salvato...
WWW
2011
ACM
13 years 3 months ago
HyLiEn: a hybrid approach to general list extraction on the web
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...