The Impact of Pre-processing on the Classification of MEDLINE Documents

The amount of information available in the MEDLINE database makes it very hard for a researcher to retrieve a reasonable number of relevant documents using a simple query language interface. Automatic classification of documents may be a valuable technology for reducing the number of documents retrieved for each query. For this process to work, it is essential to apply appropriate pre-processing techniques to the data. The main goal of this study is to analyse the impact of pre-processing techniques on the text classification of MEDLINE documents. We have assessed the effect of combining different pre-processing techniques with several classification algorithms available in the WEKA tool. Our experiments show that applying pruning, stemming and WordNet significantly reduces the number of attributes and improves the accuracy of the results.

Keywords: Text Classification, Machine Learning, Pre-Processing, Information Retrieval, MEDLINE
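As a rough illustration of why stemming and pruning shrink the attribute set, the sketch below builds a vocabulary from three made-up document titles under different settings. It is not the authors' pipeline and does not use WEKA or WordNet: the sample titles, the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's) and the `min_doc_freq` pruning threshold are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "to", "is", "for", "with", "on"}


def crude_stem(token):
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in ("ations", "ation", "ing", "ies", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, stem=False, min_doc_freq=1):
    """Return the set of terms (attributes) kept after pre-processing.

    Pruning here means dropping terms that occur in fewer than
    min_doc_freq documents.
    """
    doc_freq = Counter()
    for doc in docs:
        terms = {t for t in tokenize(doc) if t not in STOP_WORDS}
        if stem:
            terms = {crude_stem(t) for t in terms}
        doc_freq.update(terms)  # each term counted once per document
    return {term for term, df in doc_freq.items() if df >= min_doc_freq}


# Hypothetical titles, invented for this example.
docs = [
    "Classification of proteins in infected cells",
    "Classifying protein expression in cell cultures",
    "Infection rates and protein markers",
]

raw = build_vocabulary(docs)
stemmed = build_vocabulary(docs, stem=True)
pruned = build_vocabulary(docs, stem=True, min_doc_freq=2)

print(len(raw), len(stemmed), len(pruned))
```

Each surviving term would become one attribute in a bag-of-words representation, so a smaller vocabulary means fewer attributes for the classifier. Whether accuracy improves as well depends on the data and the algorithm, which is the question the paper evaluates.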

Pattern Recognition | Pre-processing | Pre-processing Techniques | PRIS 2010 | Text Classification

Added	14 Feb 2011
Updated	14 Feb 2011
Type	Journal
Year	2010
Where	PRIS
Authors	Carlos Adriano Gonçalves, Célia Talma Gonçalves, Rui Camacho, Eugénio C. Oliveira
