Sciweavers

PRIS
2010

The Impact of Pre-processing on the Classification of MEDLINE Documents

The amount of information available in the MEDLINE database makes it very hard for a researcher to retrieve a reasonable number of relevant documents using a simple query language interface. Automatic classification of documents may be a valuable technology for reducing the number of documents retrieved for each query. For this process to work well, it is essential to apply appropriate pre-processing techniques to the data. The main goal of this study is to analyse the impact of pre-processing techniques on the text classification of MEDLINE documents. We assessed the effect of combining different pre-processing techniques with several classification algorithms available in the WEKA tool. Our experiments show that applying pruning, stemming and WordNet significantly reduces the number of attributes and improves the accuracy of the results.
Key words: Text Classification, Machine Learning, Pre-Processing, Information Retrieval, MEDLINE
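To illustrate how stemming and pruning can shrink the attribute set, here is a minimal Python sketch. It is not the authors' WEKA pipeline: the function names (`tokenize`, `naive_stem`, `build_vocabulary`), the toy suffix-stripping rule, the document-frequency threshold and the three sample sentences are all assumptions made for illustration, and the WordNet step is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vocabulary(docs, stem=False, min_doc_freq=1):
    """Return the sorted attribute set after optional stemming and pruning.

    Pruning here means dropping terms that occur in fewer than
    `min_doc_freq` documents (an assumed, hypothetical criterion).
    """
    doc_freq = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        if stem:
            tokens = [naive_stem(t) for t in tokens]
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)


# Made-up sample sentences, not MEDLINE data.
docs = [
    "Protein binding sites were mapped",
    "Mapping proteins by binding site",
    "Gene expression in binding proteins",
]

raw = build_vocabulary(docs)
stemmed = build_vocabulary(docs, stem=True)
pruned = build_vocabulary(docs, stem=True, min_doc_freq=2)

print(len(raw), len(stemmed), len(pruned))
```

On this toy input the attribute count falls from 12 raw terms to 9 after stemming and to 4 after pruning. Whether such a reduction also improves accuracy depends on the data and the classifier, which is the question the study examines.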
Added 14 Feb 2011
Updated 14 Feb 2011
Type Journal
Year 2010
Where PRIS
Authors Carlos Adriano Gonçalves, Célia Talma Gonçalves, Rui Camacho, Eugénio C. Oliveira