Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the exploitation of metadata in inform...
Walid Magdy, Jinming Min, Johannes Leveling, Garet...
Most up-to-date well-behaved topic-based summarization systems are built upon the extractive framework. They score the sentences based on the associated features by manually assig...
Abstract. When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the pla...