Sciweavers

Search results for: Document frequency and term specificity
WWW
2009
ACM
14 years 11 months ago
User-centric content freshness metrics for search engines
In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
Ali Dasdan, Xinh Huynh
KDD
2005
ACM
Data Mining, 163 views
14 years 4 months ago
Web mining from competitors' websites
This paper presents a framework for user-oriented text mining. It is then illustrated with an example of discovering knowledge from competitors’ websites. The knowledge to be di...
Xin Chen, Yi-fang Brook Wu
ECML
2006
Springer
14 years 28 days ago
Distributional Features for Text Categorization
Text categorization is the task of assigning predefined categories to natural language text. With the widely used 'bag of words' representation, previous research...
Xiao-Bing Xue, Zhi-Hua Zhou
ICDM
2006
IEEE
Data Mining, 132 views
14 years 5 months ago
High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets
High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to...
Hassan H. Malik, John R. Kender
SIGIR
2005
ACM
14 years 4 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...