Active Learning Genetic programming for record deduplication

The great majority of genetic programming (GP) algorithms that deal with the classification problem follow a supervised approach, i.e., they assume that all fitness cases available to evaluate their models are labeled. However, in certain application domains, labeling training data requires a lot of human effort, and methods following a semi-supervised approach might be more appropriate, because they significantly reduce the time required for data labeling while maintaining acceptable accuracy rates. This paper presents the Active Learning GP (AGP), a semi-supervised GP, and instantiates it for the data deduplication problem. AGP uses an active learning approach in which a committee of multi-attribute functions votes to classify record pairs as duplicates or not. When the committee's majority vote is not enough to predict the class of a record pair, a user is asked to resolve the conflict. The method was applied to three datasets and compared to two other deduplication...
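
The committee-voting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: in AGP the committee members are multi-attribute functions, whereas the members below are hand-written field comparisons used only as stand-ins. The names committee_label, field_equal, oracle and the min_agreement threshold of 0.75 are all assumptions made for illustration; the abstract does not say how "not enough" agreement is measured.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]
# A committee member maps a record pair to True (duplicate) or False.
Member = Callable[[Record, Record], bool]


def committee_label(
    pair: Pair,
    committee: List[Member],
    ask_user: Callable[[Pair], bool],
    min_agreement: float = 0.75,  # assumed threshold, not taken from the paper
) -> Tuple[bool, bool]:
    """Return (is_duplicate, asked_user) for one record pair."""
    votes = [member(pair[0], pair[1]) for member in committee]
    yes = sum(votes)
    no = len(votes) - yes
    # Accept the majority vote only if enough members agree with it.
    if max(yes, no) / len(votes) >= min_agreement:
        return yes > no, False
    # Otherwise the committee is in conflict: fall back to the user.
    return ask_user(pair), True


def field_equal(field: str) -> Member:
    """Hypothetical stand-in member: case-insensitive match on one field."""
    return lambda a, b: a[field].strip().lower() == b[field].strip().lower()


committee = [field_equal(f) for f in ("name", "city", "zip", "phone")]

a = {"name": "Ana Souza", "city": "Belo Horizonte", "zip": "30123", "phone": "555-0101"}
b = {"name": "ana souza ", "city": "belo horizonte", "zip": "30123", "phone": "555-0101"}
c = {"name": "Ana Souza", "city": "Belo Horizonte", "zip": "30999", "phone": "555-0199"}

queries: List[Pair] = []


def oracle(pair: Pair) -> bool:
    """Simulated user: records each query and answers 'duplicate'."""
    queries.append(pair)
    return True
```

With this data, committee_label((a, b), committee, oracle) returns (True, False), since all four members agree and no user query is made. For (a, c) the vote splits two against two, so the pair is passed to the simulated user and the call returns (True, True). Only such conflicting pairs cost labeling effort, which is the saving the abstract attributes to the active learning approach.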

Junio de Freitas, Gisele L. Pappa, Altigran Soares

Active Learning | Active Learning GP | Artificial Intelligence | CEC 2010 | Data Deduplication Problem

Post Info

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Journal
Year	2010
Where	CEC
Authors	Junio de Freitas, Gisele L. Pappa, Altigran Soares da Silva, Marcos André Gonçalves, Edleno Silva de Moura, Adriano Veloso, Alberto H. F. Laender, Moisés G. de Carvalho
