Effective retrieval of court decisions is important. Automatically identifying legal concepts in the decision texts would be very helpful. In this paper we investigate how a stat...
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...
We describe a whiteboard content capture system from Presentations Automatically Organized from Lectures (PAOL) that captures content within a classroom environment. ...
Paul E. Dickson, W. Richards Adrion, Allen R. Hans...
Existing HTML markup is used only to indicate the structure and layout of documents, not their semantics. As a result, web documents are difficult to be semantically p...