We present a large-scale analysis of the content of weblogs dating back to the release of the Blogger program in 1999. Over one million blogs were analyzed from their conception t...
Readability is a crucial presentation attribute that web summarization algorithms consider while generating a querybaised web summary. Readability quality also forms an important ...
Sets of named entities are used heavily at commercial search engines such as Google, Yahoo and Bing. Acquiring sets of entities typically consists of combining semi-supervised exp...
A large number of web pages contain data structured in the form of “lists”. Many such lists can be further split into multi-column tables, which can then be used in more seman...
Background: Molecular Biology accumulated substantial amounts of data concerning functions of genes and proteins. Information relating to functional descriptions is generally extr...