We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
In scholarly digital libraries, author disambiguation is an important task that attributes a scholarly work with specific authors. This is critical when individuals share the sam...
Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. Bu...
Today web search engines provide the easiest way to reach information on the web. In this scenario, more than 95% of Indian language content on the web is not searchable due to mu...
Web image search has been explored and developed in academic as well as commercial areas for over a decade. To measure the similarity between Web images and user queries, most of ...
Ying Liu, Tao Qin, Tie-Yan Liu, Lei Zhang, Wei-Yin...