The PDF format is commonly used for the exchange of documents on the Web, and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
The Web is increasingly becoming an important channel for conducting business, disseminating information, and communicating with people on a global scale. More and more companie...
How can a search engine automatically provide the best and most appropriate title for a result URL (link-title) so that users will be persuaded to click on the URL? We consider th...
We address the problem of academic conference homepage understanding for the Semantic Web. This problem consists of three labeling tasks - labeling conference function pages, func...
The two most important tasks in information extraction from the Web are webpage structure understanding and natural language sentence processing. However, little work has been don...
Chunyu Yang, Yong Cao, Zaiqing Nie, Jie Zhou, Ji-R...