Previous work on Natural Language Processing for Information Retrieval has shown the inadequateness of semantic and syntactic structures for both document retrieval and categoriza...
Multi-view algorithms reduce the amount of required training data by partitioning the domain features into separate subsets or views that are sufficient to learn the target concep...
Image classification is a well-studied and hard problem in computer vision. We extend a proven solution for classifying web spam to handle images. We exploit the link structure of...
As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is the automatic classification of web documents into ...
The innate verbosity of the Extensible Markup Language remains one of its main weaknesses, especially when large XML documents are concerned. This problem can be solved with the a...
Przemyslaw Skibinski, Szymon Grabowski, Jakub Swac...