Sciweavers

Search results for "Template-Based Information Mining from HTML Documents" (368 results, page 25 of 74)

SIGMOD 2009 (ACM), Database
Robust web extraction: an approach based on a probabilistic tree-edit model
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Nilesh N. Dalvi, Philip Bohannon, Fei Sha

LREC 2010, Education
Cultural Heritage: Knowledge Extraction from Web Documents
This article presents the use of NLP techniques (text mining, text analysis) to develop specific tools that allow the creation of linguistic resources related to the cultural heritage d...
Eva Sassolini, Alessandra Cinini

KDD 2002 (ACM), Data Mining
Enhanced word clustering for hierarchical text classification
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering"...
Inderjit S. Dhillon, Subramanyam Mallela, Rahul Ku...

APWEB 2008 (Springer)
A Study on Multi-word Extraction from Chinese Documents
As a sequence of two or more consecutive words that carries the contextual semantics of its individual words, the multi-word attracts much attention from statistical linguistics an...
Wen Zhang, Taketoshi Yoshida, Xijin Tang

WSDM 2009 (ACM), Data Mining
Mining common topics from multiple asynchronous text streams
Text streams are becoming increasingly ubiquitous, in the form of news feeds, weblog archives and so on, resulting in a large volume of data. An effective way to explore the...
Xiang Wang 0002, Kai Zhang, Xiaoming Jin, Dou Shen