Currently, in the field of technology monitoring, it is very important to be able to get relevant information from heterogeneous sources, especially on the World Wide Web. The com...
– We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
A novel method for simultaneous keyphrase extraction and generic text summarization is proposed by modeling text documents as weighted undirected and weighted bipartite graphs. Sp...
Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using do...
Jian Huang 0002, Pucktada Treeratpituk, Sarah M. T...
This research is about automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in ma...