ICMLA
2008

Text Classification Using Tree Kernels and Linguistic Information

Standard machine learning approaches to text classification use the bag-of-words representation of documents to derive the classification target function. Typical linguistic structures such as morphology, syntax and semantics are completely ignored in the learning process. This paper examines the role of these structures in classifier construction, applying the study to the Portuguese language. Classifiers are built using the SVM algorithm on a dataset of newspaper articles. The results show that syntactic structure is not useful for text classification (as initially expected), but a novel structured representation that uses a document's semantic information has the same discriminative power over classes as the traditional bag-of-words one.
Teresa Gonçalves, Paulo Quaresma
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ICMLA
Authors Teresa Gonçalves, Paulo Quaresma
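
To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of similarity an SVM could be given: a linear kernel over bag-of-words counts, and a kernel over parse trees. The abstract does not say which tree kernel or which structured representation the authors used, so the subset-tree kernel in the style of Collins and Duffy is only a stand-in here. The function names, the nested-tuple tree encoding, the `decay` parameter and the toy English sentences are all assumptions made for illustration.

```python
from collections import Counter


def bow_kernel(doc_a, doc_b):
    """Linear kernel on bag-of-words count vectors; word order is ignored."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    return sum(a[w] * b[w] for w in a.keys() & b.keys())


# Assumed tree encoding: an internal node is a tuple (label, child, ...),
# and a leaf (a word) is a plain string.
def production(node):
    """Grammar production at a node: its label plus its children's labels."""
    kids = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
    return (node[0], kids)


def internal_nodes(tree):
    """All non-leaf nodes of the tree, in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def common_fragments(n1, n2, decay):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Leaf children are already matched by the production check.
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Subset-tree kernel: sum of shared fragments over all node pairs."""
    return sum(
        common_fragments(n1, n2, decay)
        for n1 in internal_nodes(t1)
        for n2 in internal_nodes(t2)
    )


# Hypothetical toy parses, not taken from the paper's dataset.
tree_dog = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
tree_cat = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))
```

Calling `bow_kernel("the dog barks", "the cat barks")` compares only shared words, while `tree_kernel(tree_dog, tree_cat)` also rewards shared syntactic structure. Either function could in principle fill a precomputed kernel matrix for an SVM, though whether that matches the authors' setup is not stated in the abstract.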