Search Sciweavers | Sciweavers

368 search results - page 25 / 74

» Template-Based Information Mining from HTML Documents


SIGMOD
2009
ACM


Robust web extraction: an approach based on a probabilistic tree-edit model

15 years 10 months ago

Download www-rcf.usc.edu

On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...

Nilesh N. Dalvi, Philip Bohannon, Fei Sha



LREC
2010


Cultural Heritage: Knowledge Extraction from Web Documents

15 years 4 months ago

Download www.lrec-conf.org

This article presents the use of NLP techniques (text mining, text analysis) to develop specific tools for creating linguistic resources related to the cultural heritage d...

Eva Sassolini, Alessandra Cinini



KDD
2002
ACM


Enhanced word clustering for hierarchical text classification

16 years 3 months ago

Download www.cs.utexas.edu

In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering"...

Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...



APWEB
2008
Springer


A Study on Multi-word Extraction from Chinese Documents

15 years 4 months ago

Download meta-synthesis.iss.ac.cn

As a sequence of two or more consecutive individual words inherent with contextual semantics of individual words, multi-word attracts much attention from statistical linguistics an...

Wen Zhang, Taketoshi Yoshida, Xijin Tang



WSDM
2009
ACM


Mining common topics from multiple asynchronous text streams

15 years 10 months ago

Download wsdm2009.org

Text streams are becoming more and more ubiquitous, in the forms of news feeds, weblog archives and so on, which result in a large volume of data. An effective way to explore the...

Xiang Wang 0002, Kai Zhang, Xiaoming Jin, Dou Shen


