Sciweavers
471 search results (page 11 of 95) for "MapReduce: Simplified Data Processing on Large Clusters"
SC 2009, ACM
Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems
MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open-source implementation, Hadoop...
Jianwu Wang, Daniel Crawl, Ilkay Altintas
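The programming model named in the abstract above splits a computation into a map phase that emits key-value pairs and a reduce phase that aggregates them by key. A minimal single-process sketch of that idea (an illustration only, not the Hadoop or Kepler API):

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce step: group intermediate pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the map step emits pairs", "the reduce step sums pairs"]
print(reduce_phase(map_phase(docs)))
```

In a real MapReduce system the map and reduce calls run in parallel across a cluster, with the framework shuffling intermediate pairs between them; the word-count example above only illustrates the two-phase structure.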
CLOUDCOM 2010, Springer
Efficient Metadata Generation to Enable Interactive Data Discovery over Large-Scale Scientific Data Collections
Discovering the correct dataset efficiently is critical for computations and effective simulations in scientific experiments. In contrast to searching web documents over the Intern...
Sangmi Lee Pallickara, Shrideep Pallickara, Milija...
DASFAA 2009, IEEE
TRUSTER: TRajectory Data Processing on ClUSTERs
With continued advances in the infrastructures behind location-based services, large amounts of time-based location data are accumulating quickly. Distributed processing techni...
Bin Yang 0002, Qiang Ma, Weining Qian, Aoying Zhou
BMCBI 2010
BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets
Background: The development of DNA microarrays has facilitated the generation of hundreds of thousands of transcriptomic datasets. The use of a common reference microarray design ...
Mark J. Alston, John Seers, Jay C. D. Hinton, Sach...
KDD 2002, ACM
Learning to match and cluster large high-dimensional data sets for data integration
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...
William W. Cohen, Jacob Richman