Sciweavers

536 search results - page 92 / 108
» Learning to cluster web search results
WWW
2008
ACM
14 years 8 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
AIPRF
2007
13 years 9 months ago
Evaluation of Different Approaches to Training a Genre Classifier
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected ba...
Vedrana Vidulin, Mitja Lustrek, Matjaz Gams
WWW
2008
ACM
14 years 8 months ago
Finding the right facts in the crowd: factoid question answering over social media
Community Question Answering has emerged as a popular and effective paradigm for a wide range of information needs. For example, to find out an obscure piece of trivia, it is now ...
Jiang Bian, Yandong Liu, Eugene Agichtein, Hongyua...
BMCBI
2007
13 years 7 months ago
ESTuber db: an online database for Tuber borchii EST sequences
Background: The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-hou...
Barbara Lazzari, Andrea Caprera, Cristian Cosentin...
CORR
2004
Springer
13 years 7 months ago
The Google Similarity Distance
Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is...
Rudi Cilibrasi, Paul M. B. Vitányi