Sciweavers

609 search results - page 47 / 122
» Adaptive record extraction from web pages
Sort
View
WWW
2001
ACM
16 years 6 months ago
Crawling the Hidden Web
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Sriram Raghavan, Hector Garcia-Molina
NAACL
2003
15 years 7 months ago
Automatic Extraction of Semantic Networks from Text using Leximancer
Leximancer is a software system for performing conceptual analysis of text data in a largely language independent manner. The system is modelled on Content Analysis and provides u...
Andrew E. Smith
NAACL
2010
15 years 3 months ago
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...
Jason R. Smith, Chris Quirk, Kristina Toutanova
KDD
2004
ACM
145views Data Mining» more  KDD 2004»
15 years 11 months ago
A graph-theoretic approach to extract storylines from search results
We present a graph-theoretic approach to discover storylines from search results. Storylines are windows that offer glimpses into interesting themes latent among the top search re...
Ravi Kumar, Uma Mahadevan, D. Sivakumar
PAKDD
2009
ACM
116views Data Mining» more  PAKDD 2009»
16 years 24 days ago
Scalable Web Mining with Newistic
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
Ovidiu Dan, Horatiu Mocian