Most page segmentation algorithms have userspecifiable free parameters. However, algorithm designers typically do not provide a quantitative/rigorous method for choosing values fo...
The demand of browsing information from general Web pages using a mobile phone is increasing. However, since the majority of Web pages on the Internet are optimized for browsing f...
Gen Hattori, Keiichiro Hoashi, Kazunori Matsumoto,...
Abstract. This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, an...
Some previous works show that a web page can be partitioned to multiple segments or blocks, and usually the importance of those blocks in a page is not equivalent. Also, it is pro...
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
This paper describes an intelligent agent to facilitate bitext mining from the Web via automatic discovery of URL pairing patterns (or keys) for retrieving parallel web pages. The...