People’s email communications can be modeled as graphs with vertices representing email accounts and edges representing email communications. Email communication data usually co...
Xiaomeng Wan, Evangelos E. Milios, Nauzer Kalyaniw...
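The graph model in the first abstract (vertices are email accounts, edges are email communications) can be sketched as a directed, weighted adjacency map. This is only an illustration under assumptions: each message is taken to be a hypothetical (sender, recipients) pair, edge weights are simple message counts, and the names build_email_graph and accounts are made up for this sketch rather than taken from the paper.

```python
from collections import defaultdict


def build_email_graph(messages):
    """Build a directed, weighted email graph (illustrative sketch).

    messages: iterable of (sender, [recipients]) pairs; this input
    format is an assumption, not the paper's data model.
    Returns a dict mapping sender -> {recipient: number_of_messages}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sender, recipients in messages:
        for rcpt in recipients:
            # One directed edge per (sender, recipient); repeat messages add weight.
            graph[sender][rcpt] += 1
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {s: dict(nbrs) for s, nbrs in graph.items()}


def accounts(graph):
    """Return all vertices, including accounts that only receive mail."""
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    return nodes
```

For example, two messages from alice@x.org to bob@x.org yield an edge alice@x.org -> bob@x.org with weight 2, while an account that only ever receives mail still appears as a vertex through accounts().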
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...
Segmenting MRI images into homogeneous texture regions representing disparate tissue types is often a useful preprocessing step in the computer-assisted detection of breast canc...
This paper presents a tree-pattern-based method of automatically and accurately finding code clones in program files. Duplicate tree-patterns are first collected by anti-unificati...
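Anti-unification, which the code-clone abstract uses to collect duplicate tree-patterns, computes the most specific pattern that generalizes two trees: matching structure is kept and each differing pair of subtrees is replaced by a variable. The sketch below shows plain first-order anti-unification on a hypothetical tree encoding ((label, child, ...) tuples with string leaves, variables written ?X0, ?X1, ...); it is not the paper's clone-detection pipeline, and pattern_size is an illustrative measure of how much concrete structure a pattern retains.

```python
def anti_unify(t1, t2, subst=None):
    """Return the least general generalization of two trees.

    Trees are (label, child, ...) tuples; leaves are strings.
    Differing subtree pairs become variables "?X<n>"; the same pair
    always maps to the same variable (shared via `subst`).
    """
    if subst is None:
        subst = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        # Same root label and arity: keep the node, generalize children.
        return (t1[0],) + tuple(
            anti_unify(a, b, subst) for a, b in zip(t1[1:], t2[1:]))
    # Mismatch: reuse or create a variable for this pair of subtrees.
    key = (t1, t2)
    if key not in subst:
        subst[key] = f"?X{len(subst)}"
    return subst[key]


def pattern_size(p):
    """Count concrete (non-variable) nodes in a pattern."""
    if isinstance(p, tuple):
        return 1 + sum(pattern_size(c) for c in p[1:])
    return 0 if p.startswith("?X") else 1
```

For the expression trees of a*b + c and x*y + c, the generalized pattern is ("+", ("*", "?X0", "?X1"), "c"), which keeps three concrete nodes; a larger retained pattern suggests a stronger clone candidate.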
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
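For readers unfamiliar with simhash, the sketch below shows the general fingerprinting idea (in the style usually attributed to Charikar): every token's hash votes +w or -w on each bit according to its weight w, and the sign of each bit's total becomes the fingerprint, so similar documents tend to get fingerprints with a small Hamming distance. This is not the specific algorithm evaluated in the paper; whitespace tokenization, term-frequency weights, and 64-bit token hashes taken from MD5 are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width (assumed)


def _hash64(token):
    """Deterministic 64-bit token hash (first 8 bytes of MD5)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Compute a 64-bit simhash fingerprint of `text`.

    Tokens are lowercase whitespace-separated words weighted by
    their term frequency (illustrative choices).
    """
    weights = Counter(text.lower().split())
    v = [0] * BITS
    for token, w in weights.items():
        h = _hash64(token)
        for i in range(BITS):
            # Each token votes +w if its bit i is set, otherwise -w.
            v[i] += w if (h >> i) & 1 else -w
    fp = 0
    for i in range(BITS):
        if v[i] > 0:
            fp |= 1 << i
    return fp


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

A pair of documents would then be reported as similar when hamming(simhash(d1), simhash(d2)) falls below some threshold; the threshold value is a tuning choice and is not taken from the paper.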