We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps...
The needs for managing similar documents in different languages increases with the growing amounts of electronic information available in documents of the same type (e.g. news str...
Roberto Basili, Maria Teresa Pazienza, Fabio Massi...
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Mining different types of communities from web data have attracted a lot of research efforts in recent years. However, none of the existing community mining techniques has taken i...
Qiankun Zhao, Sourav S. Bhowmick, Xin Zheng, Kai Y...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...