Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic informatio...
Many tasks require users to extract information from diverse sources, to edit or process this information locally, and to explore how the end results are affected by changes in th...
Wrapper is a traditional method to extract useful information from Web pages. Most previous works rely on the similarity between HTML tag trees and induced template-dependent wrap...
Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting ...
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zha...