Parallel web pages are important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
For web applications, determining how requests from a web page are routed through server components can be time-consuming and error-prone due to the complex set of rules and mecha...
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have ...
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...