This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users' queries. The majority of these documents are genera...
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...
The Web is the richest source of information and knowledge. Unfortunately the current structure of Web pages makes it difficult for users to retrieve the information or knowledge ...
Enterprises provide professionally authored content about their products/services in different languages for use in web sites and customer care. For customer care, personalization...
Existing web usage mining techniques focus only on discovering knowledge based on the statistical measures obtained from the static characteristics of web usage data. They do not ...