We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...
This paper introduced the four tracks that WIM-Lab Fudan University had taken part in at TREC 2007. For spam track, a multi-centre model was proposed considering the characteristi...
Jun Xu, Jing Yao, Jiaqian Zheng, Qi Sun, Junyu Niu
In this paper we address the problem of extracting important (and unimportant) discourse patterns from call center conversations. Call centers provide dialog based calling-in supp...
Anup Chalamalla, Sumit Negi, L. Venkata Subramania...
We investigate the problem of automatically labelling
faces of characters in TV or movie material with their
names, using only weak supervision from automaticallyaligned
subtitl...
Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...