We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), we are building tools to build informatio...
We explore techniques for detecting news articles containing invalid information, using the help of text categorization technology. The information that exists on the World Wide W...
The World Wide Web no longer consists just of HTML pages. Our work sheds light on a number of trends on the Internet that go beyond simple Web pages. The hidden Web provides a wea...
Fabian M. Suchanek, Aparna S. Varde, Richi Nayak, ...
As the World Wide Web in China grows rapidly, mining knowledge in Chinese Web pages becomes more and more important. Mining Web information usually relies on the machine learning ...