16th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems

Anchorage, Alaska, USA

Held in conjunction with the

25th IEEE International Parallel and Distributed Processing Symposium

May 16-20, 2011

The workshop is sponsored by the IEEE Computer Society Technical Committee on Parallel Processing and IRIANC (International Research Institute on Autonomic Network Computing), Boston, USA/Munich, Germany.


Tentative Program (PDF file)

FRIDAY, May 20, 2011

9:45am - 10:00am Opening Remarks

10:00am - 11:30am Network Algorithms

10:00am "Solving k-Set Agreement with Stable Skeleton Graphs", Martin Biely (EPFL and Technische Universität Wien), Peter Robinson (Nanyang Technological University) and Ulrich Schmid (Technische Universität Wien)

10:30am "Compact Route Computation: Improving Parallel BGP Route Processing for Scalable Routers", Xuezhi Jiang, Mingwei Xu and Qi Li (Tsinghua University)

11:00am "Towards Persistent Connections using Failure Detectors", Naohiro Hayashibara (Kyoto Sangyo University)

11:30am - 12:40pm Lunch Break

12:40pm - 1:40pm Keynote Speech "Fault tolerance for High Performance Computing Applications in Hostile Environments: Exascale and Cloud", Franck Cappello (INRIA and University of Illinois at Urbana-Champaign)

1:40pm - 3:10pm Cloud Computing

1:40pm "A Monitoring and Audit Logging Architecture for Data Location Compliance in Federated Cloud Infrastructures", Philippe Massonet, Syed Naqvi, Christophe Ponsard (CETIC), Joseph Latanicki (Thales Theresis), Benny Rochwerger (IBM Haifa) and Massimo Villari (University of Messina)

2:10pm "Dependable Autonomic Cloud Computing with Information Proxies", Deger Cenk Erdil (Istanbul Bilgi University)

2:40pm "A Fault-tolerant High Performance Cloud Strategy for Scientific Computing", Ekpe Okorafor (African University of Science & Technology)

3:10pm - 3:30pm Coffee Break

3:30pm - 5:00pm High Performance Computing

3:30pm "Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems", Esteban Meneses (University of Illinois at Urbana-Champaign), Greg Bronevetsky (Lawrence Livermore National Laboratory) and Laxmikant V. Kalé (University of Illinois at Urbana-Champaign)

4:00pm "Building a Fault Tolerant MPI Application: A Ring Communication Example", Joshua Hursey and Richard Graham (Oak Ridge National Laboratory)

4:30pm "Algorithm-Based Recovery for Newton's Method without Checkpointing", Hui Liu, Teresa Davies, Chong Ding, Christer Karlsson and Zizhong Chen (Colorado School of Mines)

5:00pm - 5:20pm Coffee Break

5:20pm - 6:20pm Failure Analysis

5:20pm "Predicting Node Failure in High Performance Computing Systems from Failure and Usage Logs", Nithin Nakka (University of Illinois at Urbana-Champaign), Ankit Agrawal and Alok Choudhary (Northwestern University)

5:50pm "Achieving Target MTTF by Duplicating Reliability-Critical Components in High Performance Computing Systems", Nithin Nakka (University of Illinois at Urbana-Champaign), Alok Choudhary (Northwestern University), Gary Grider, John Bent, James Nunez and Satsangat Khalsa (Los Alamos National Laboratory)

6:20pm - 6:30pm Concluding Remarks


Increasingly large and complex parallel, distributed and network-centric computing systems pose unique challenges to researchers in dependable computing, especially because of the high failure rates intrinsic to these systems. The goal of this workshop, a continuation of the FTPDS (Fault-Tolerant Parallel and Distributed Systems) workshop series, is to provide a forum for researchers and practitioners to discuss all aspects of dependability, including reliability, availability, safety and security, for parallel, distributed and network-centric systems. All aspects of design, theory and realization are of interest.

Topics of interest include, but are not limited to:

Steering Committee:

D. Avresky, International Research Institute on Autonomic Network Computing (IRIANC), Boston USA/Munich Germany (Chair)
E. Maehle, University of Luebeck, Germany (Co-Chair)

Program Co-Chairs:

T. Kikuno, Osaka University, Japan
T. Tsuchiya, Osaka University, Japan

Program Committee:

J. Alonso, Technical University of Catalonia and Barcelona Supercomputing Center, Spain
B. Ciciani, University of Roma, Italy
M. Colajanni, University of Modena, Italy
G. Deconinck, University of Leuven, Belgium
A. Doering, IBM Research Zurich, Switzerland
C. Elks, University of Virginia, USA
I. Gashi, City University London, UK
K. E. Grosspietsch, Fraunhofer IAIS, Germany
S. Ivanov, RT-solutions.de GmbH, Germany
B. Johnson, University of Virginia, USA
R. Khazan, MIT Lincoln Lab, USA
A. Puliafito, University of Messina, Italy
F. Quaglia, University of Roma, Italy
F. Salfner, Humboldt University, Germany
L. Silva, University of Coimbra, Portugal
P. Sobe, University of Applied Sciences Dresden, Germany
W. Steiner, TTTech Computertechnik AG, Vienna, Austria
C. Trinitis, TU Munich, Germany

Contact the Program Co-Chairs via email:

Tohru Kikuno, Osaka University, Japan
Tatsuhiro Tsuchiya, Osaka University, Japan

To submit a paper, send a file describing original unpublished research (at most 8 pages, including figures and references, in IEEE format; see www.ieee.org) by

January 20, 2011

in either PostScript or PDF format, electronically via the submission page:


All papers will be reviewed.

Notification of authors: February 6, 2011.
Camera-ready papers: February 21, 2011.

The workshop proceedings will be published (on CD-ROM) along with the regular proceedings of the IEEE International Parallel & Distributed Processing Symposium.