We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
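As a rough illustration of the tag-ratio idea, the sketch below computes one ratio per line of an HTML document. It assumes the ratio is the count of non-tag text characters on a line divided by the count of tags on that line, with a denominator of 1 when the line has no tags. The abstract above is truncated before it gives the computation, so this definition, the function name `tag_ratios`, and the regex-based tag matching are illustrative assumptions rather than the authors' implementation.

```python
import re

# Assumption: a "tag" is anything between '<' and '>' on a single line.
# Tags spanning several lines, comments, and scripts are not handled here.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Return one text-to-tag ratio per line of `html` (hypothetical helper)."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        text_length = len(TAG_RE.sub("", line).strip())
        # Assumed convention: lines without tags use a denominator of 1.
        ratios.append(text_length / max(tag_count, 1))
    return ratios


sample = (
    "<div><a href='x'>Home</a></div>\n"
    "<p>This is a long paragraph of article text.</p>\n"
    "\n"
)
ratios = tag_ratios(sample)
```

Under these assumptions, the navigation-style first line scores much lower than the paragraph line, which is the kind of signal a content extractor could then threshold or cluster.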
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, disc...
Currently, in the field of technology monitoring, it is very important to be able to get relevant information from heterogeneous sources, especially on the World Wide Web. The com...
Sentiment classification in text documents is an active data mining research topic in opinion retrieval and analysis. Unlike previous studies concentrating on the developm...
Dong (Haoyuan) Li, Anne Laurent, Pascal Poncelet, ...
Today the major web search engines answer queries by showing ten result snippets, which need to be inspected by users for identifying relevant results. In this paper we investigat...