As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Multitenant data infrastructures for large cloud platforms hosting hundreds of thousands of applications face the challenge of serving applications characterized by small data foo...
Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, A...
Sixearch.org is a peer application for social, distributed, adaptive Web search, which integrates the Sixearch.org protocol, a topical crawler, a document indexing system, a retri...
Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing i...
We present a new model for detection of noun phrases in unrestricted text, whose most outstanding feature is its flexibility: the system is able to recognize noun phrases similar ...