Search Sciweavers | Sciweavers

609 search results - page 47 / 122

» Adaptive record extraction from web pages


WWW 2001 (ACM), Internet Technology, 113 views

Crawling the Hidden Web

16 years 6 months ago

Download www.dia.uniroma3.it

Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...

Sriram Raghavan, Hector Garcia-Molina



NAACL 2003, Computational Linguistics, 208 views

Automatic Extraction of Semantic Networks from Text using Leximancer

15 years 7 months ago

Download acl.ldc.upenn.edu

Leximancer is a software system for performing conceptual analysis of text data in a largely language independent manner. The system is modelled on Content Analysis and provides u...

Andrew E. Smith



NAACL 2010, Computational Linguistics, 182 views

Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment

15 years 3 months ago

Download research.microsoft.com

The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...

Jason R. Smith, Chris Quirk, Kristina Toutanova



KDD 2004 (ACM), Data Mining, 145 views

A graph-theoretic approach to extract storylines from search results

15 years 11 months ago

Download www.cs.uiuc.edu

We present a graph-theoretic approach to discover storylines from search results. Storylines are windows that offer glimpses into interesting themes latent among the top search re...

Ravi Kumar, Uma Mahadevan, D. Sivakumar



PAKDD 2009 (ACM), Data Mining, 116 views

Scalable Web Mining with Newistic

16 years 24 days ago

Download www.horatiumocian.com

Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...

Ovidiu Dan, Horatiu Mocian

