Search Sciweavers | Sciweavers

72 search results - page 10 / 15

» Term-Based Clustering and Summarization of Web Page Collecti...

PAKDD
2009
ACM

Data Mining

Scalable Web Mining with Newistic

16 years 17 days ago

Download www.horatiumocian.com

Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...

Ovidiu Dan, Horatiu Mocian

CIKM
2011
Springer

Information Technology

Probabilistic near-duplicate detection using simhash

14 years 5 months ago

Download irl.cs.tamu.edu

This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...

Sadhan Sood, Dmitri Loguinov

KES
2008
Springer

Information Technology

Data Mining for Navigation Generating System with Unorganized Web Resources

15 years 5 months ago

Download www.its.ac.id

Users prefer to navigate subjects from organized topics in an abundance of resources rather than to list pages retrieved from search engines. We propose a framework to cluster frequent items...

Diana Purwitasari, Yasuhisa Okazaki, Kenzi Watanab...

WEBI
2007
Springer

Internet Technology

K-SVMeans: A Hybrid Clustering Algorithm for Multi-Type Interrelated Datasets

15 years 12 months ago

Download www.cse.psu.edu

Identification of distinct clusters of documents in text collections has traditionally been addressed by making the assumption that the data instances can only be represented by ...

Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee G...

WWW
2007
ACM

Internet Technology

U-REST: an unsupervised record extraction system

16 years 6 months ago

Download people.csail.mit.edu

In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...

Yuan Kui Shen, David R. Karger
