Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
XML has grown into a widely used and highly developed technology, due in part to the subcomponents built around the technology (advanced parsers, frameworks, libraries, etc). The ...
Naphtali Rishe, Ouri Wolfson, Ben Wongsaroj, Damia...
We propose a new overlay network, called Generic Identifier Network (GIN), for collaborative nodes to share objects with transactions across affiliated organizations by merging th...
We describe a novel simple and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it the task of information extraction (IE) by utili...
Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kav...
Abstract— Today data sources are pervasive and their number is growing tremendously. Current tools are not prepared to exploit this unprecedented amount of information and to cop...