Support Vector Machines (SVMs) are currently the state-of-the-art models for many classication problems but they suer from the complexity of their training algorithm which is at l...
In this paper, a machine learning approach to support the user during the correction of the layout analysis is proposed. Layout analysis is the process of extracting a hierarchica...
The set of references that typically appear toward the end of journal articles is sometimes, though not always, a field in bibliographic (citation) databases. But even if referenc...
Random Forests were introduced by Breiman for feature (variable) selection and improved predictions for decision tree models. The resulting model is often superior to AdaBoost and ...
Long Han, Mark J. Embrechts, Boleslaw K. Szymanski...
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table– a data structure that better lends itself to high-level...