Search Sciweavers | Sciweavers

WSDM
2010
ACM

204 views, Data Mining

Learning URL patterns for webpage de-duplication

14 years 3 months ago

Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...

Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...

PVLDB
2010

161 views

Annotating and Searching Web Tables Using Entities, Types and Relationships

13 years 6 months ago

Download www.comp.nus.edu.sg

Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational...

Girija Limaye, Sunita Sarawagi, Soumen Chakrabarti

KDD
1997
ACM

169 views, Data Mining

Learning to Extract Text-Based Information from the World Wide Web

14 years 18 days ago

Download www.aaai.org

There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammat...

Stephen Soderland

KDD
2004
ACM

145 views, Data Mining

A graph-theoretic approach to extract storylines from search results

14 years 1 months ago

Download www.cs.uiuc.edu

We present a graph-theoretic approach to discover storylines from search results. Storylines are windows that offer glimpses into interesting themes latent among the top search re...

Ravi Kumar, Uma Mahadevan, D. Sivakumar

WSDM
2009
ACM

125 views, Data Mining

Less is more: sampling the neighborhood graph makes SALSA better and faster

14 years 3 months ago

Download wsdm2009.org

In this paper, we attempt to improve the effectiveness and the efficiency of query-dependent link-based ranking algorithms such as HITS, MAX and SALSA. All these ranking algorith...

Marc Najork, Sreenivas Gollapudi, Rina Panigrahy
