In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
Abstract. With information proliferation on the Web, how to obtain highquality information from the Web has been one of hot research topics in many fields like Database, IR as well...
Automatic generation of taxonomies can be useful for a wide area of applications. In our application scenario a topical hierarchy should be constructed reasonably fast from a large...
We describe efficient techniques for construction of large term co-occurrence graphs, and investigate an application to the discovery of numerous fine-grained (specific) topics. A...