Duplicate detection determines different representations of realworld objects in a database. Recent research has considered the use of relationships among object representations t...
The directed graph representation of the World Wide Web has been extensively used to analyze the Web structure, behavior and evolution. However, those graphs are huge and do not ...
— Measuring network flow sizes is important for tasks like accounting/billing, network forensics and security. Per-flow accounting is considered hard because it requires that m...
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to partof-speech (POS) tagging. The algorithm uses a simil...
New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We present an efficient algorithm for minin...