The World Wide Web has become one of the most important information repositories. However, information in web pages follows no presentation standard and is not organized in a well-defined format, so extracting appropriate and useful information from web pages is challenging. Many web extraction systems, called web wrappers, have been developed, operating either semi-automatically or fully automatically. In this paper, we survey some existing techniques and then present our current work on web information extraction. In our design, we classify information patterns into static and non-static structures and use a different technique to extract the relevant information from each. In our implementation, patterns are represented as XSL files, and all extracted information is packaged into XML, a machine-readable format.
Man I. Lam, Zhiguo Gong, Maybin K. Muyeba