The basis of much of the intelligence on the Web is the hyperlink structure which represents an organising principle based on the human facility to be able to discriminate between...
The study of a national Web graph is challenging and can provide insight into social phenomena specific to a country. However, because there is no country border in the Web, decidi...
Mining bilingual data (including bilingual sentences and terms1 ) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
In this paper, we proposed a new approach, called FiVaTech for the problem of Web data extraction. FiVaTech is a page-level data extraction system which deduces the data schema an...
Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...
While the Web has been increasingly recognized as a culturally valuable social artifact, many nations endeavor to create national Web archives for long term preservation. However, ...