Search Sciweavers | Sciweavers

6 search results - page 1 / 2

WWW
2010
ACM

Internet Technology

A pattern tree-based approach to learning URL normalization rules

14 years 5 months ago

Download research.microsoft.com

Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...

Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...

WSDM
2010
ACM

Data Mining

Learning URL patterns for webpage de-duplication

14 years 5 months ago

Download www.wsdm-conference.org

Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...

Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

KDD
2008
ACM

Data Mining

De-duping URLs via rewrite rules

14 years 11 months ago

Download research.yahoo.com

A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...

Anirban Dasgupta, Ravi Kumar, Amit Sasturkar

KDD
1997
ACM

Data Mining

Autonomous Discovery of Reliable Exception Rules

14 years 2 months ago

Download www.aaai.org

This paper presents an autonomous algorithm for discovering exception rules from data sets. An exception rule, which is defined as a deviational pattern to a well-known fact, exhi...

Einoshin Suzuki

ICTAI
2008
IEEE

Artificial Intelligence

Information Extraction as an Ontology Population Task and Its Application to Genic Interactions

14 years 5 months ago

Download www-lipn.univ-paris13.fr

Ontologies are a well-motivated formal representation to model knowledge needed to extract and encode data from text. Yet, their tight integration with Information Extraction (IE)...

Alain-Pierre Manine, Érick Alphonse, Philip...
