Paper Details

Communicating Process Architectures (CPA)
 Title: The Role of Concurrency in the Modern HPC Center
 Conference: Communicating Process Architectures 2015
 Author: Brian Vinter
Niels Bohr Institute, University of Copenhagen
The modern HPC center is a complex entity. Even the smallest centers easily host tens of thousands of processors, and the largest facilities several million. Before 2020 this number is expected to exceed 100 million processors in a single machine. Massively parallel processing is thus everyday business today; the major challenges of running such facilities are well established, and solutions exist for most of them.

In the following we will make a strong distinction between parallelism, i.e. Single Program Multiple Data (SPMD) or Single Instruction Multiple Data (SIMD), and concurrency, i.e. multiple processes, identical or different, that may run in parallel or interleaved, depending on timing and available hardware.
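The distinction can be made concrete with a small sketch (illustrative only; the function names and the producer/consumer scenario are our own, not taken from the paper). The first function is SPMD-style parallelism: every worker runs the same code on its own partition of the data, and the result is independent of scheduling. The second is concurrency: two different processes that communicate over a channel and interleave in time.

```go
package main

import (
	"fmt"
	"sync"
)

// Parallelism (SPMD): every worker runs the same code on its own
// partition of the data; the result does not depend on scheduling.
func sumParallel(data []int, workers int) int {
	partial := make([]int, workers)
	chunk := (len(data) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk, (w+1)*chunk
			if lo > len(data) {
				lo = len(data)
			}
			if hi > len(data) {
				hi = len(data)
			}
			for _, v := range data[lo:hi] {
				partial[w] += v
			}
		}(w)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

// Concurrency: two different processes (a producer and a consumer)
// communicating over a channel; their execution interleaves in time.
func produceAndConsume() []string {
	events := make(chan string)
	go func() { // producer process
		events <- "stage data"
		events <- "run job"
		close(events)
	}()
	var seen []string
	for e := range events { // consumer process
		seen = append(seen, e)
	}
	return seen
}

func main() {
	fmt.Println(sumParallel([]int{1, 2, 3, 4, 5, 6, 7, 8}, 4)) // 36
	fmt.Println(produceAndConsume())
}
```

Note that `sumParallel` would give the same answer under any interleaving, whereas `produceAndConsume` is defined by the communication between its processes, which is the essence of the distinction drawn above.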

While concurrency mechanisms were commonly used to express applications for parallel computers two or three decades ago, this use of concurrency has completely died out. There are several explanations for this, but the most important is that the cost of an HPC installation today is so high that users must document that they use the facility efficiently. All applications must stay above a specified CPU-utilization threshold, typically 70%, and even with SPMD-type programming, asynchrony amongst processors is a common challenge when trying to stay above that threshold.
This does not mean that concurrency is without use in the modern HPC center. While the actual compute nodes do not use concurrency, the underlying fabric that allows the compute nodes to operate has a large potential for concurrency. In many cases these elements could benefit from a formal approach to concurrency control.

In this workshop we will present the elements of the HPC center that do work concurrently: storage systems, schedulers, backup systems, archiving, and network bulk transfers, to name a few. The interesting challenge is that while all these elements require concurrency control to operate correctly and efficiently, they are also highly interdependent, i.e. the concurrency aspects must cover the full set of infrastructure components for optimal efficiency. We will seek to describe scenarios from real HPC centers and sketch solutions built on a structured concurrency approach.
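As a toy illustration of such interdependence (the component names and ordering constraints here are hypothetical, not taken from the paper), consider three infrastructure processes coordinated in CSP style: a scheduler must not record a job as runnable before the storage system has staged its data, and the backup system must wait for the storage system before starting a bulk transfer. Channels make these ordering constraints explicit.

```go
package main

import "fmt"

// coordinate runs three toy infrastructure processes (storage, backup,
// scheduler) whose ordering constraints are expressed as channel
// communications, and returns the event log the scheduler observes.
func coordinate() []string {
	staged := make(chan string)  // storage -> scheduler: data is staged
	mayBackup := make(chan bool) // storage -> backup: storage is idle
	backedUp := make(chan string) // backup -> scheduler: transfer done

	go func() { // storage process: stages data, then permits backup
		staged <- "data staged"
		mayBackup <- true
	}()
	go func() { // backup process: must wait for storage before transferring
		<-mayBackup
		backedUp <- "bulk transfer done"
	}()

	// scheduler process: observes events in the only order the
	// channel protocol permits, regardless of actual interleaving.
	var log []string
	log = append(log, <-staged)
	log = append(log, <-backedUp)
	return log
}

func main() {
	fmt.Println(coordinate())
}
```

Because every dependency is a communication, the sketch is deterministic by construction: no interleaving of the three processes can produce the backup event before the staging event, which is the kind of guarantee a structured concurrency approach aims to give across a full set of infrastructure components.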
