We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database hold...
Robbie P. Joosten, Tim A. H. te Beek, Elmar Kriege...
We present EXTENT, an image annotation system that combines context and content information to annotate images with metadata that cannot be reliably inferred from either the c...
Chang-Ming Tsai, Arun Qamra, Edward Y. Chang, Yuan...
In this paper, we study the use of semantic information to improve the performance of transparent query caching for dynamic content web sites. We observe that in dynamic content web a...
Semantic similarity between words or phrases is frequently used to find matching correlations between search queries and documents when straightforward matching of terms fails. Th...