Sciweavers

371 search results - page 22 / 75
» Collective Error Detection for MPI Collective Operations
ICDCS
2012
IEEE
11 years 11 months ago
Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data
Operational network data, management data such as customer care call logs and equipment system logs, is a very important source of information for network operators to detect prob...
Chi-Yao Hong, Matthew Caesar, Nick G. Duffield, Ji...
SIGCOMM
2010
ACM
13 years 9 months ago
Detecting the performance impact of upgrades in large operational networks
Networks continue to change to support new applications, improve reliability and performance and reduce the operational cost. The changes are made to the network in the form of up...
Ajay Anil Mahimkar, Han Hee Song, Zihui Ge, Aman S...
SOSP
2009
ACM
14 years 5 months ago
Debugging in the (very) large: ten years of implementation and experience
Windows Error Reporting (WER) is a distributed system that automates the processing of error reports coming from an installed base of a billion machines. WER has collected billion...
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg...
IJHPCA
2010
13 years 7 months ago
A Pipelined Algorithm for Large, Irregular All-Gather Problems
We describe and evaluate a new, pipelined algorithm for large, irregular all-gather problems. In the irregular all-gather problem each process in a set of processes contributes in...
Jesper Larsson Träff, Andreas Ripke, Christia...
IOLTS
2008
IEEE
14 years 3 months ago
Integrating Scan Design and Soft Error Correction in Low-Power Applications
Error correcting coding is the dominant technique to achieve acceptable soft-error rates in memory arrays. In many modern circuits, the number of memory elements in the random ...
Michael E. Imhof, Hans-Joachim Wunderlich, Christi...