In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
XML repositories are now a widespread means for storing and exchanging information on the Web. As these repositories become increasingly used in dynamic applications such as e-com...
James Bailey, Alexandra Poulovassilis, Peter T. Wo...
Among the various proposals answering the shortcomings of Document Type Definitions (DTDs), XML Schema is the most widely used. Although DTDs and XML Schema Definitions (XSDs) di...
The innate verbosity of the Extensible Markup Language remains one of its main weaknesses, especially when large XML documents are concerned. This problem can be solved with the a...
Przemyslaw Skibinski, Szymon Grabowski, Jakub Swac...
Kernel Canonical Correlation Analysis (KCCA) is a method of correlating linear relationships between two variables in a kernel-defined feature space. A machine learning algorithm b...