This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...
Thereis a wealthof informationto be minedfromnarrative text on the WorldWideWeb.Unfortunately, standard natural language processing (NLP)extraction techniques expect full, grammat...
Existing HTML mark-up is used only to indicate the structure and lay-out of documents, but not the document semantics. As a result web documents are difficult to be semantically p...