This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facilit...
Sebastien Bratieres, Jurgen Van Gael, Andreas Vlac...
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...
Data-driven function tag assignment has been studied for English using Penn Treebank data. In this paper, we address the question of whether such a method can be applied to other la...
Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) ...
Houssam Nassif, Ryan Woods, Elizabeth S. Burnside,...
Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for sci...
Isaac G. Councill, Huajing Li, Ziming Zhuang, Sand...