In this paper, we propose a hierarchical architecture for grouping peers into clusters in a large-scale BitTorrent-like underlying overlay network in such a way that clusters are e...
It has been observed that precision increases with collection size. One explanation could be that the redundancy of information increases, making it easier to find multiple docum...
Many real-world domains present the problem of imbalanced data sets, where examples of one classes significantly outnumber examples of other classes. This makes learning difficu...
Abstract. One of the more challenging problems faced by the data mining community is that of imbalanced datasets. In imbalanced datasets one class (sometimes severely) outnumbers t...
Statistical machine learning methods are employed to train a Named Entity Recognizer from annotated data. Methods like Maximum Entropy and Conditional Random Fields make use of fe...