Information distillation techniques are used to analyze and interpret large volumes of speech and text archives in multiple languages and produce structured information of interes...
One common predictive modeling challenge occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributi...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
Test collections are the primary drivers of progress in information retrieval. They provide a yardstick for assessing the effectiveness of ranking functions in an automatic, rapi...
Nima Asadi, Donald Metzler, Tamer Elsayed, Jimmy L...
mes, abstracts and year of publication of all 853 papers published.1 We then applied Porter stemming and stopword removal to this text, represented terms from the elds with twice t...
Alan F. Smeaton, Gary Keogh, Cathal Gurrin, Kieran...