This article is motivated by the importance of building web data mashups. Building on the remarkable success of Web 2.0 mashups, and especially Yahoo Pipes, we generalize the idea ...
Similarity-based grouping of data entries in one or more data sources is a task underlying many different data management tasks, such as structuring search results, removal of red...
This paper presents an extensible architecture that can be used to support the integration of biological data sets. Biological research frequently requires this kind of synthesis....
Michael Maibaum, Galia Rimon, Christine A. Orengo,...
Semantically heterogeneous and distributed data sources are quite common in several application domains such as bioinformatics and security informatics. In such a setting, each dat...
Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query ...
Laura M. Haas, Donald Kossmann, Edward L. Wimmers,...
A data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data so...
We present MOCHA, a new self-extensible database middleware system designed to interconnect distributed data sources. MOCHA is designed to scale to large environments and is based...
Three join algorithms are evaluated in an environment with distributed main-memory based mediators and data sources. A streamed ship-out join ships bulks of tuples to a mediator ne...
Information about individuals on publicly available web sites stands as a valuable, yet unorganized, data source. Turning such an enormous data source into a “database” is h...
Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experiment...
Partha Pratim Talukdar, Zachary G. Ives, Fernando ...