Sciweavers

» Document clustering using word clusters via the information ...
IJCAI
2007
13 years 9 months ago
Semantic Smoothing of Document Models for Agglomerative Clustering
In this paper, we argue that the agglomerative clustering with vector cosine similarity measure performs poorly due to two reasons. First, the nearest neighbors of a document belo...
Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
ICDE
2007
IEEE
14 years 1 months ago
Document Representation and Dimension Reduction for Text Clustering
Increasingly large text datasets and the high dimensionality associated with natural language create a great challenge in text mining. In this research, a systematic study is cond...
M. Mahdi Shafiei, Singer Wang, Roger Zhang, Evange...
CLEF
2011
Springer
12 years 7 months ago
A Language-Independent Approach to Identify the Named Entities in Under-Resourced Languages and Clustering Multilingual Documents
This paper presents a language-independent Multilingual Document Clustering (MDC) approach on comparable corpora. Named entities (NEs) such as persons, locations, organiza...
N. Kiran Kumar, G. S. K. Santosh, Vasudeva Varma
ISI
2007
Springer
14 years 1 months ago
DOTS: Detection of Off-Topic Search via Result Clustering
Often document dissemination is limited to a “need to know” basis so as to better maintain organizational trade secrets. Retrieving documents that are off-topic to a user...
Nazli Goharian, Alana Platt
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han