There has been a lot of research targeting text classification. Many of them focus on a particular characteristic of text data - multi-labelity. This arises due to the fact that a ...
Mohammad Salim Ahmed, Latifur Khan, Nikunj C. Oza,...
Multiversion support for XML documents is needed in many critical applications, such as software configuration control, cooperative authoring, web information warehouses, and "...
In this paper we propose a new approach to improve electronic editions of human science corpus, providing an efficient estimation of manuscripts pages structure. In any handwriti...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...