Automated text categorization is an important technique for many web applications, such as document indexing, document filtering, and cataloging web resources. Many different appr...
Hierarchical topic taxonomies have proliferated on the World Wide Web [5, 18], and exploiting the output space decompositions they induce in automated classification systems is an...
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (...
Kanazawa [1] has studied the learnability of several parameterized families of classes of categorial grammars. These classes were shown to be learnable from text, in th...
Information extraction can be defined as the task of automatically extracting instances of specified classes or relations from text. We consider the case of using machine learni...