Sciweavers

511 search results - page 77 / 103
» Discovering data dependencies in Web content mining
WSDM
2010
ACM
Data Mining
14 years 5 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
DMKD
1997
ACM
Data Mining
14 years 3 months ago
Clustering Based On Association Rule Hypergraphs
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. These d...
Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad...
SIGIR
2009
ACM
14 years 5 months ago
Web derived pronunciations for spoken term detection
Indexing and retrieval of speech content in various forms such as broadcast news, customer care data and on-line media has gained a lot of interest for a wide range of application...
Dogan Can, Erica Cooper, Arnab Ghoshal, Martin Jan...
SSDBM
2008
IEEE
Database
14 years 5 months ago
Query Planning for Searching Inter-dependent Deep-Web Databases
Increasingly, many data sources appear as online databases, hidden behind query forms, thus forming what is referred to as the deep web. It is desirable to have systems that can pr...
Fan Wang, Gagan Agrawal, Ruoming Jin
SAC
2005
ACM
14 years 4 months ago
Automatic extraction of informative blocks from webpages
Search engines crawl and index webpages depending upon their informative content. However, webpages — especially dynamically generated ones — contain items that cannot be clas...
Sandip Debnath, Prasenjit Mitra, C. Lee Giles