Finding valuable content in the Web 2.0 environment is becoming more difficult because many inexperienced users contribute large amounts of unorganized content. In previous research, peop...
Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using conten...
Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...
Terrorists and extremists are increasingly utilizing Internet technology to enhance their ability to influence the outside world. Due to the lack of multi-lingual and multimedia ...
The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedi...