ACL
2006

A DOM Tree Alignment Model for Mining Parallel Data from the Web

This paper presents a new web mining scheme for parallel data acquisition. Each web page is represented as a tree according to the Document Object Model (DOM). A DOM tree alignment model is then proposed to identify translationally equivalent texts and hyperlinks between two parallel DOM trees. By following the identified parallel hyperlinks, further parallel web documents are mined recursively. Benchmarks show that, compared with previous mining schemes, the new scheme improves mining coverage, reduces mining bandwidth, and enhances the quality of the mined parallel sentences.
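The pipeline described in the abstract (parse each page into a DOM tree, align two trees, follow the aligned hyperlinks) can be illustrated with a minimal sketch. This is not the paper's alignment model: it assumes a naive alignment that pairs child nodes by position when their tags match, uses a breadth-first queue instead of recursion, reads pages from an in-memory dictionary rather than the web, and expects HTML in which every opened tag is explicitly closed. The names `Node`, `DomBuilder`, `align`, and `mine` are hypothetical and chosen only for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class Node:
    """One DOM node: an element, or a text leaf when tag is '#text'."""

    def __init__(self, tag, text="", href=None):
        self.tag = tag
        self.text = text
        self.href = href
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree; assumes every opened tag is closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, href=dict(attrs).get("href"))
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def parse(html):
    builder = DomBuilder()
    builder.feed(html)
    return builder.root


def align(a, b):
    """Naive stand-in for tree alignment: pair children by position
    whenever the two nodes carry the same tag.
    Returns (text_pairs, link_pairs)."""
    texts, links = [], []
    if a.tag != b.tag:
        return texts, links
    if a.tag == "#text":
        texts.append((a.text, b.text))
        return texts, links
    if a.tag == "a" and a.href and b.href:
        links.append((a.href, b.href))
    for child_a, child_b in zip(a.children, b.children):
        t, l = align(child_a, child_b)
        texts += t
        links += l
    return texts, links


def mine(pages, seed):
    """Starting from a seed pair of URLs, collect aligned text pairs and
    keep following aligned hyperlink pairs until none are left.
    `pages` maps URL -> HTML (an assumption replacing real fetching)."""
    seen, queue, corpus = {seed}, deque([seed]), []
    while queue:
        src, tgt = queue.popleft()
        if src not in pages or tgt not in pages:
            continue
        texts, links = align(parse(pages[src]), parse(pages[tgt]))
        corpus += texts
        for pair in links:
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return corpus
```

Calling `mine(pages, (source_url, target_url))` on a dictionary that holds a pair of structurally parallel pages returns a list of `(source_text, target_text)` tuples, including those from any page pairs reached through aligned links. Only aligned link pairs are followed, which is one plausible reading of how the scheme keeps bandwidth low; a realistic model would also have to tolerate inserted, deleted, and reordered nodes, which positional pairing does not.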
Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACL
Authors Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao