Although Wikipedia has emerged as a powerful collaborative Encyclopedia on the Web, it is only partially multilingual as most of the content is in English and a small number of ot...
This paper presents a nativeness classifier for English. The detector was developed and tested with TED Talks collected from the web, where the major non-native cues are in terms...
The proliferation of video content on the web makes similarity detection an indispensable tool in web data management, searching, and navigation. In this paper, we propose a numbe...
Sponsored search is one of the enabling technologies for today's Web search engines. It corresponds to matching and showing ads related to the user query on the search engine...
Recently, the opportunity of extracting structured data from the Web has been identified by a number of research projects. One such example is that millions of relational-style H...
Daisy Zhe Wang, Xin Luna Dong, Anish Das Sarma, Mi...