Relational data pre-processing techniques for improved securities fraud detection



Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innova...

Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry G. Goldberg, John Komoroske


Data Mining | KDD 2007 | Relational Probability Trees | Statistical Relational Learning | Statistical Relational Models



Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2007
Where	KDD
Authors	Andrew Fast, Lisa Friedland, Marc Maier, Brian Taylor, David Jensen, Henry G. Goldberg, John Komoroske
