Sciweavers

WSDM 2010, ACM
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

SIGMOD 2006, ACM
Documentum ECI self-repairing wrappers: performance analysis
Documentum Enterprise Content Integration (ECI) services is a content integration middleware that provides one-query access to the Intranet and Internet content resources. The ECI...
Boris Chidlovskii, Bruno Roustant, Marc Brette

SSPR 2004, Springer
Optimizing Classification Ensembles via a Genetic Algorithm for a Web-Based Educational System
Classification fusion combines multiple classifications of data into a single classification solution of greater accuracy. Feature extraction aims to reduce the computational cost ...
Behrouz Minaei-Bidgoli, Gerd Kortemeyer, William F...

KDD 2010, ACM
Growing a tree in the forest: constructing folksonomies by integrating structured metadata
Many social Web sites allow users to annotate the content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured ...
Anon Plangprasopchok, Kristina Lerman, Lise Getoor

WWW 2010, ACM
Large-scale bot detection for search engines
In this paper, we propose a semi-supervised learning approach for classifying program (bot) generated web search traffic from that of genuine human users. The work is motivated by...
Hongwen Kang, Kuansan Wang, David Soukal, Fritz Be...