A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because eac...
Engineering diagnosis often involves analyzing complex records of system states printed to large, textual log files. Typically the logs are designed to accommodate the widest debug...
A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tab...
Named entity recognition studies the problem of locating and classifying parts of free text into a set of predefined categories. Although extensive research has focused on the de...
Semistructured data is becoming increasingly important for web applications with the development of XML and related technologies. Designing a “good” semistructured database is ...
Xiaoying Wu, Tok Wang Ling, Sin Yeung Lee, Mong-Li...