Sciweavers

1178 search results - page 229 / 236
» Modeling and analysis of content identification
WWW
2002
ACM
14 years 8 months ago
Topic-sensitive PageRank
In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the rel...
Taher H. Haveliwala
KDD
2006
ACM
14 years 8 months ago
Extracting key-substring-group features for text classification
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Dell Zhang, Wee Sun Lee
KDD
2005
ACM
14 years 8 months ago
Email data cleaning
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
KDD
2004
ACM
14 years 8 months ago
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
William W. Cohen, Sunita Sarawagi
SIGMOD
2009
ACM
14 years 8 months ago
Dictionary-based order-preserving string compression for main memory column stores
Column-oriented database systems [19, 23] perform better than traditional row-oriented database systems on analytical workloads such as those found in decision support and busines...
Carsten Binnig, Stefan Hildenbrand, Franz Fär...