A Novel Method for Bilingual Web Page Acquisition from Search Engine Web Records



A new approach is presented for acquiring bilingual web pages from the result pages of search engines. The approach is composed of two challenging tasks. The first task is to automatically detect the web records embedded in the result pages by applying a clustering method to a sample page. The records identified in this way yield highly effective features for the second task, high-quality bilingual web page acquisition, which is treated as a classification problem. One advantage of our approach is that it is independent of both the search engine and the domain. The evaluation is based on 2516 records that were extracted automatically from six search engines and annotated manually, and the approach achieves high precision.
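The abstract does not spell out how the records are clustered. As a rough illustration only, the sketch below builds a small tag tree from a result page, groups sibling subtrees by a truncated tag-structure signature, and treats the largest group of at least `min_repeat` structurally identical siblings as the record region. Exact-signature grouping is a stand-in chosen for simplicity, not the authors' clustering method. All names here (`detect_records`, `signature`, `min_repeat`, `depth`, `script_mix`) are hypothetical. `script_mix` is only an example of a per-record feature that could be passed to a classifier; it is not a feature taken from the paper.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed as open parents.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []  # this node's own text fragments, not its children's


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree; tolerates unclosed and stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.text.append(data.strip())


def signature(node, depth):
    """Tag-structure signature of a subtree, truncated after `depth` levels."""
    if depth <= 1 or not node.children:
        return node.tag
    inner = ",".join(signature(c, depth - 1) for c in node.children)
    return f"{node.tag}({inner})"


def text_of(node):
    """Concatenated text of a subtree (own text first, so order is approximate)."""
    parts = list(node.text) + [text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def detect_records(html, min_repeat=3, depth=3):
    """Hypothetical record detector: the largest group of same-signature siblings."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        groups = defaultdict(list)
        for child in node.children:
            groups[signature(child, depth)].append(child)
        for members in groups.values():
            if len(members) >= min_repeat and len(members) > len(best):
                best = members
        stack.extend(node.children)
    return [text_of(n) for n in best]


def script_mix(text):
    """Hypothetical feature: fractions of CJK ideographs and Latin letters."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = cjk + latin
    if total == 0:
        return (0.0, 0.0)
    return (cjk / total, latin / total)
```

On a results page whose entries share the same markup, for example repeated `<li>` items that each hold a title heading and a snippet paragraph, those items form the largest repeated sibling group and are returned as records, while a short navigation bar with fewer than `min_repeat` links is ignored. A real system would need tolerant (approximate) matching rather than exact signatures, since search engines often vary the markup of individual results.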

Yanhui Feng, Yu Hong, Zhenxiang Yan, Jian-Min Yao, Qiaoming Zhu


COLING 2010 | Computational Linguistics | Page | Search Engines | Web Page

Added	13 May 2011
Updated	13 May 2011
Type	Conference paper
Year	2010
Where	COLING
Authors	Yanhui Feng, Yu Hong, Zhenxiang Yan, Jian-Min Yao, Qiaoming Zhu

Sciweavers