This paper presents a method for automatically generating an association thesaurus from a text corpus, and demonstrates its application to information retrieval. The thesaurus gen...
It is crucial for cross-language information retrieval (CLIR) systems to deal with the translation of unknown queries1 due to that real queries might be short. The purpose of this...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
Information extraction approaches are heavily used to gather product information on the Web, especially focusing on technical product specifications. If requesting different sour...