In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data – which is typically generated by a (semi) automated information extraction (IE) ...
Many text documents naturally have two kinds of labels. For example, we may label web pages from universities according to their categories, such as "student" or "fa...
The nature of the internet as a non-peer-reviewed (and more generally largely unregulated) publication medium has allowed wide-spread promotion of inaccurate and unproven medical ...
As opposed to representing a document as a "bag of words" in most information retrieval applications, we propose a model of representing a web page as sets of named enti...
Nan Di, Conglei Yao, Mengcheng Duan, Jonathan J. H...
The World Wide Web (WWW) has experienced a dramatic increase in popularity since 1993. Many reports indicate that its growth will continue at an exponential rate. This growth has ...