Sciweavers

Search results for "High Performance Matrix Multiplication on Many Cores"
IJON
2008
Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint
Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can ...
Paul D. O'Grady, Barak A. Pearlmutter
SOSP
2009
ACM
Distributed aggregation for data-parallel computing: interfaces and implementations
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and i...
Yuan Yu, Pradeep Kumar Gunda, Michael Isard
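Grouped aggregation produces one result per key. A common way to distribute it is to compute a partial aggregate on each partition locally and then ship and merge only those partial results. The sketch below illustrates this general pattern with a per-key average, which becomes decomposable when each partial result is carried as a (sum, count) pair. It assumes partitions are plain Python lists of (key, value) records, and the function names are hypothetical. It is not the interface described in the paper.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: each partition is a list of (key, value) records,
# standing in for data spread across the machines of a cluster.
Partition = List[Tuple[str, int]]
Partial = Dict[str, Tuple[int, int]]  # key -> (sum, count)

def partial_aggregate(partition: Partition) -> Partial:
    """Local step: compute a per-key (sum, count) on one partition."""
    acc: Partial = {}
    for key, value in partition:
        s, c = acc.get(key, (0, 0))
        acc[key] = (s + value, c + 1)
    return acc

def merge(partials: Iterable[Partial]) -> Partial:
    """Combine step: add partial (sum, count) pairs key by key."""
    total: Partial = {}
    for part in partials:
        for key, (s, c) in part.items():
            ts, tc = total.get(key, (0, 0))
            total[key] = (ts + s, tc + c)
    return total

def grouped_average(partitions: List[Partition]) -> Dict[str, float]:
    """Final step: turn each merged (sum, count) into a per-key average."""
    merged = merge(partial_aggregate(p) for p in partitions)
    return {key: s / c for key, (s, c) in merged.items()}
```

For example, `grouped_average([[("a", 1), ("b", 4)], [("a", 3), ("b", 2), ("c", 5)]])` yields `{"a": 2.0, "b": 3.0, "c": 5.0}`. Carrying (sum, count) instead of a running average is what makes the merge order-independent, so the partial results can be combined in any grouping.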
CHES
2007
Springer
On the Power of Bitslice Implementation on Intel Core2 Processor
This paper discusses the state-of-the-art fast software implementation of block ciphers on Intel’s new microprocessor Core2, particularly concentrating on “bitslice i...
Mitsuru Matsui, Junko Nakajima
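Bitslicing, in general, transposes data so that each machine word holds the same bit position taken from many independent inputs. A Boolean circuit is then evaluated with word-wide AND, OR and XOR, which processes one instance per bit lane in parallel. The sketch below shows the idea with a 4-bit ripple-carry adder rather than a cipher S-box, for brevity. Python integers stand in for CPU registers, and all function names are hypothetical. It is not the implementation from the paper.

```python
from typing import List

def to_slices(values: List[int], nbits: int) -> List[int]:
    """Transpose: slices[i] holds bit i of every value; value j sits in bit lane j."""
    slices = [0] * nbits
    for j, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << j
    return slices

def from_slices(slices: List[int], count: int) -> List[int]:
    """Inverse transpose: rebuild each of `count` values from the bit planes."""
    values = []
    for j in range(count):
        v = 0
        for i, s in enumerate(slices):
            v |= ((s >> j) & 1) << i
        values.append(v)
    return values

def bitsliced_add(a: List[int], b: List[int]) -> List[int]:
    """Add every instance at once, modulo 2**len(a).

    Each loop iteration handles one bit position for all instances,
    using only XOR, AND and OR on whole words (a full-adder circuit).
    """
    carry = 0
    out = []
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out
```

For example, adding `[3, 7, 12, 15]` and `[5, 2, 1, 0]` with `from_slices(bitsliced_add(to_slices(xs, 4), to_slices(ys, 4)), 4)` yields `[8, 9, 13, 15]`. The transposition cost is paid once, after which the circuit's gate count, not the number of instances, determines the work per batch.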
HRI
2007
ACM
Developing performance metrics for the supervisory control of multiple robots
Efforts are underway to make it possible for a single operator to effectively control multiple robots. In these high workload situations, many questions arise including how many r...
Jacob W. Crandall, M. L. Cummings
ARVLSI
1997
IEEE
The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a...
Tadaaki Yamauchi, Lance Hammond, Kunle Olukotun