Communicating Process Architectures 2015 (CPA 2015)


  Editors: Kevin Chalmersa, Jan Bækgaard Pedersenb, Frederick R. M. Barnesc, Jan F. Broeninkd, Ruth Ivimey-Cooke, Adam T. Sampsonf, Peter H. Welchc
      (a) School of Computing, Edinburgh Napier University, UK
      (b) Department of Computer Science, University of Nevada Las Vegas, USA
      (c) School of Computing, University of Kent, UK
      (d) Robotics and Mechatronics, CTIT Institute, University of Twente, The Netherlands
      (e) eLifesciences Ltd., UK
      (f) School of Arts, Media and Computer Games, Abertay University, UK
 
  Publisher: IOS Press, Amsterdam, The Netherlands
  ISBN: 978-XXX

  Held at University of Kent in Canterbury, UK, August 23 - 26, 2015 (Local chair: Peter H. Welch)
 
  Conference Program
 
  Award Winners
  CoCoL: Concurrent Communications Library (Best paper)
  Process-Based Aho-Corasick Failure Function Construction (Best student paper)
  OpenTransputer: Reinventing a Parallel Machine from the Past (Michael D. Poole prize for widening access to CPA ideas)
  OLL Compiler Project (Best Fringe)
 
 Keynote Presentations
 
 Title: Communicating Processes and Processors (1975-2025)
 Authors: David May
Department of Computer Science, University of Bristol
 Abstract: The ideas that gave rise to CSP, occam and transputers originated in the UK around 1975; occam and the Inmos transputer were launched around 1985. Thousands of computer scientists and engineers learned about concurrency and parallel computing using the occam-transputer platform and the influence is evident in many current architectures and programming tools. I will reflect on the relevance of these ideas today — after 30 years — and talk about communicating processes and processors in the age of clouds, things and robots.

Brief Background: David May is Professor of Computer Science at Bristol University, UK. He graduated in CS from Cambridge University in 1972 and then spent several years working on architectures and languages for distributed processing. In 1979 he joined Inmos and then spent 16 years in the semiconductor industry. David was the architect of the Inmos Transputer — the first microprocessor designed to support multiprocessing — and the designer of the occam concurrent programming language. David joined Bristol University as Head of Computer Science in 1995 and continued an active involvement with Bristol's growing microelectronics cluster and its investors. His most recent venture is XMOS, which he co-founded in 2005. David has 40 granted patents, with many more pending, centred on microprocessor technology. David was elected a Fellow of the Royal Society of London in 1990 for his contributions to computer architecture and parallel computing, and a Fellow of the UK Royal Academy of Engineering in 2010.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: occam's Rule Applied: Separation of Concerns as a Key to Trustworthy Systems Engineering for Complex Systems
 Authors: Eric Verhulst
Altreonic NV
 Abstract: "Keep it simple but not too simple" means that a complex solution is really a problem that's not very well understood. In formal methods, this is reflected not only in the size of the state space, but also in the dependencies between these states. This is the main reason why Formal Modelling is not delivering as expected: the state space explosion would require an infinite amount of resources. If an automated tool cannot handle the state space, how can we expect engineers to do so? This is where CSP comes in: it divides the state space into small, manageable chunks, making it easier to reason about the behaviour. There are, however, a few pre-conditions for this to work: one must take a step back, dividing the complex state space before conquering it, hence thinking about functionalities and how they are related before thinking about the punctual states in space and time.
Extrapolating the CSP abstract process algebra leads to a generic concept of describing systems as a set of Interacting Entities, whereby the Interactions are seen as first class citizens, at the same level as the Entities, decoupling the Entities' states by explicit information exchanges. We enter hereby the domain of modelling. One major issue with modelling approaches is that, while we need different and complementary models to develop a real system, these often have different semantics (if the semantics are properly defined at all). By being able to hide the internal semantics, one can focus on the interactions and use these as standardised interfaces.
It is clear that for this to work in the software domain, the natural programming model should be concurrent and execute on hardware that is compatible with it — a design feature of the transputer that has not been matched since. This opens the door to multi-domain modelling where, for example, parts of the system are continuous and other parts are discrete (as in executing a clocked logic). This gives us an interesting new domain of hybrid logic, a topic we want to explore further in a workshop at the conference.
This lecture will be guided by my own personal journey, starting with a spreadsheet to program a parallel machine, covering Peter Welch's courses in occam and the formal development of our distributed RTOS.

Brief Background: Eric led the development of the Virtuoso multi-board RTOS, used in the ESA's Rosetta space mission to comet 67P/Churyumov-Gerasimenko. Virtuoso was the first distributed RTOS on the transputer and its successor developments — such as OpenComRTOS, a formal redevelopment from scratch, and VirtuosoNext, featuring fine-grain space partitioning — all apply the valuable principles and lessons learned from CSP, the transputer and occam.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 
 Papers
 
 Title: OpenTransputer: Reinventing a Parallel Machine from the Past (Michael D. Poole prize for widening access to CPA ideas)
 Authors: Andres Amaya-Garciaa, David Kellerb, David Mayb
(a) Department of Computer Science, University of Bristol
(b) Department of Computer Science, University of Bristol
 Abstract: The OpenTransputer is a new implementation of the transputer first launched by Inmos in 1985. It supports the same instruction set, but uses a different micro-architecture that takes advantage of today's manufacturing technology; this results in a reduced cycle count for many of the instructions. Our new transputer includes support for channels that connect to input-output ports, enabling direct connection to external devices such as sensors, actuators and standard communication interfaces. We have also generalised the channel communications with the support of virtual channels and the design of a message routing component based on a Beneš switch to enable the construction of networks with scalable throughput and low-latency. We aim to make the transputer and switch components available as open-source designs.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Discrete Event-based Neural Simulation using the SpiNNaker System
 Authors: Andrew Browna, Jeff Reeveb, Steve Furberc
(a) University of Southampton, UK
(b) Department of Electronics & Computer Science, University of Southampton
(c) School of Computer Science, University of Manchester
 Abstract: SpiNNaker is a computing system composed of over a million ARM cores, embedded in a bespoke asynchronous communication fabric. The physical realization of the system consists of 57600 nodes (a node is a silicon die), each node containing 18 ARM cores and a routing engine. The communication infrastructure allows the cores to communicate via short, fixed-length (40- or 72-bit), hardware-brokered packets. The packets find their way through the network in a sequence of hops, and the specifics of each route are held (distributed) in the routing engines, not unlike internet routing. On arrival at a target core, a hardware-triggered interrupt invokes code to handle the incoming packet. Within this computing model, the state of the system-under-simulation is distributed, held in memory local to the cores, and the topology is also distributed, held in the routing engine internal tables. The message passing is non-deterministic and non-transitive, there is no memory coherence between the core local memories, and there is no global synchronization. This paper shows how such a system can be used to simulate large systems of neurons using discrete event-based techniques. More notably, the solution time remains approximately constant with neural system size as long as sufficient hardware cores are available.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [Power Point] 
 
 Title: Adding CSPm Functions and Data Types to CSP++
 Authors: Daniel Garnera, Markus Roggenbachb, William B. Gardnerc
(a) BT, Adastral Park
(b) Department of Computer Science, Swansea University
(c) School of Computer Science, University of Guelph
 Abstract: This work extends the subset of CSPm that can be translated to C++ using the tool CSP++. Specifications can now contain user defined functions, make limited use of set and sequence data, and utilise built-in CSPm functions that operate on such data. An extended development paradigm is suggested, based on applying transformational methods from UML to C++. All techniques are demonstrated over three case-study examples.

 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Towards Lightweight Formal Development of MPI Applications
 Authors: Nelson Souto Rosaa, Humaira Kamalb, Alan Wagnerb
(a) Centro de Informatica, Universidade Federal de Pernambuco
(b) Department of Computer Science, University of British Columbia
 Abstract: A significant number of parallel applications are implemented using MPI (Message Passing Interface) and several existing approaches focus on their verification. However, these approaches typically work with complete applications and fixing any undesired behaviour at this late stage of application development is difficult and time consuming. To address this problem, we present a lightweight formal approach that helps developers build safety into the MPI applications from the early stages of the program development. Our approach consists of a methodology that includes verification during the program development process. We provide tools that hide the more difficult formal aspects from developers making it possible to verify properties such as freedom from deadlock as well as automatically generating partial skeletons of the code. We evaluate our approach with respect to its ability and efficiency in detecting deadlocks.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: A Model-driven Methodology for Generating and Verifying CSP-based Java Code
 Authors: Julio Marinoa, Raul N. N. Alborodob
(a) Babel Group, Universidad Politecnica de Madrid
(b) IMDEA Software Institute
 Abstract: Model-driven methodologies can help developers create more reliable software. In previous work, we have advocated a model-driven approach for the analysis and design of concurrent, safety-critical systems. However, to take full advantage of those techniques, they must be supported by code generation schemes for concrete programming languages. Ideally, this translation should be traceable, automated and should support verification of the generated code. In this paper we consider the problem of generating concurrent Java code from high-level interaction models. Our proposal includes an extension of JML that allows shared resources to be specified as Java interfaces, and several translation patterns for (partial) generation of CSP-based Java code. The template code thus obtained is JML-annotated Java using the JCSP library, with proof obligations that help with both traceability and verification. Finally, we present an experiment in code verification using the KeY tool.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: A Super-Simple Run-Time for CSP-Based Concurrent Systems
 Authors: Michael E. Goldsby
Sandia National Laboratories
 Abstract: MicroCSP is a run-time system written in C supporting process-oriented concurrency. It features CSP-style synchronous communication over point-to-point channels and alternation (including timeouts) and does preemptive priority process scheduling. The MicroCSP programmer writes the logic of a process as an ordinary function which, barring preemption, runs to completion every time the process is scheduled. To make such an approach feasible requires that the programmer cast each process’s logic in normal form: a single choice of guarded events, with event-free computation following each event. A context switch requires a mere function call, with the exception of those that occur as the result of an interrupt. The system is memory-efficient, fast and has a narrow hardware interface. The intended target of MicroCSP is bare microcontroller hardware, and its efficient use of the stack makes it particularly suitable for microcontrollers with restricted memory. The current version runs on a single processor over a Linux implementation of the hardware interface and serves as a prototype for the microcontroller implementations. A multicore implementation appears to be possible.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [Power Point] 
 
 Title: A Design for Interchangeable Simulation and Implementation
 Authors: Klaus Birkelund Jensen, Brian Vinter
Niels Bohr Institute, University of Copenhagen
 Abstract: Research on concurrent systems requires modeling, simulation and evaluation in many settings. We describe a design for Interchangeable Simulation and Implementation (ISI): an approach that enables the simultaneous modeling, simulation and evaluation of highly concurrent process-based systems. We describe how ISI can be used to design and evaluate techniques used in storage systems. Our key insight is that, while the simulation of systems could be implemented as regular discrete event simulation, designing it as the highly concurrent process-based system that it is – with the full data and control flow in mind – enables seamless interchangeability between discrete and real-time simulation.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Lambda Calculus in Core Aldwych
 Authors: Matthew Huntbach
School of Electronic Engineering and Computer Science, Queen Mary University of London
 Abstract: Core Aldwych is a simple model for concurrent computation, involving the concept of agents who communicate through shared variables. Each variable will have exactly one agent which can write to it, and its value can never be changed once written, but a value can contain further variables which are written to later. A key aspect is that the reader of a value may become the writer of variables in it. In this paper we show how this model can be used to encode lambda calculus. Individual function applications can be explicitly encoded as lazy or not, as required. We then show how this encoding can be extended to cover functions which manipulate mutable variables, but with the underlying Core Aldwych implementation still using only immutable variables. The ordering of function applications then becomes an issue, with Core Aldwych able to model either the enforcement of an ordering or the decision to retain indeterminate ordering which would allow for parallel execution.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: CoCoL: Concurrent Communications Library (Best paper)
 Authors: Kenneth Skovhede, Brian Vinter
Niels Bohr Institute, University of Copenhagen
 Abstract: In this paper we examine a new CSP inspired library for the Common Intermediate Language (CIL), dubbed CoCoL: Concurrent Communications Library. The use of CIL makes the library accessible from a number of languages, including C#, F#, Visual Basic and IronPython. The processes are based on tasks and continuation callbacks, rather than threads, which enables networks with millions of running processes on a single machine. The channels are based on request queues with two-phase commit tickets, which enables external choice without coordination among channels. We evaluate the performance of the library on different operating systems, and compare the performance with JCSP and C++CSP.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Process-Based Aho-Corasick Failure Function Construction (Best student paper)
 Authors: Tinus Straussa, Derrick G. Kourieb, Bruce W. Watsonb, Loek Cleophasc
(a) FASTAR Research Group, University of Pretoria
(b) FASTAR Research Group, Stellenbosch University
(c) Department of Computer Science, Umeå Universitet
 Abstract: This case study is embedded in a wider project aimed at investigating process-based software development to better utilise the multiple cores on contemporary hardware platforms. Three alternative process-based architectures for the classical Aho-Corasick failure function construction algorithm are proposed, described in CSP and implemented in Go. Empirical results show that these process-based implementations attain significant speedups over the conventional sequential implementation of the algorithm for significantly-sized data sets. Evidence is also presented to demonstrate that the process-based performances are comparable to the performance of a more conventional concurrent implementation in which the input data is simply partitioned over several concurrent processes.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Guppy: Process-Oriented Programming on Embedded Devices
 Authors: Frederick R. M. Barnes
School of Computing, University of Kent
 Abstract: Guppy is a new and experimental process-oriented programming language, taking much inspiration (and some code-base) from the existing occam-π language. This paper reports on a variety of aspects related to this, specifically language, compiler and run-time system development, enabling Guppy programs to run on desktop and embedded systems. A native code-generation approach is taken, using C as the intermediate language, and with stack-space requirements determined at compile-time.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Communicating Process Architectures in the Light of Parallel Design Patterns and Skeletons
 Authors: Kevin Chalmers
School of Computing, Edinburgh Napier University
 Abstract: This work presents thoughts on linking Communicating Process Architectures (CPA) and parallel design patterns. The work considers where CPA aligns in the parallel application design world. Expansion of CPA to support current trends in parallel application development is also presented.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Bus Centric Synchronous Message Exchange for Hardware Designs
 Authors: Brian Vinter, Kenneth Skovhede
Niels Bohr Institute, University of Copenhagen
 Abstract: A new design and implementation of the Synchronous Message Exchange (SME) model is presented. The new version uses explicit busses, which may include multiple fields, and where components may use a bus for both reading and writing. The original version allowed only reading from or writing to a bus, which triggered a need for some busses to exist in two versions for different directions. In addition to the new and improved bus-model, the new version of SME produces traces that may be used for validating a later VHDL implementation of the designed component. Further, it can produce a graphical representation of a design to help with debugging.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Model-Driven Design of Simulation Support for the TERRA Robot Software Tool Suite
 Authors: Zhou Lu, Maarten M. Bezemer, Jan F. Broenink
Robotics and Mechatronics, CTIT Institute, University of Twente
 Abstract: Model-Driven Development (MDD), based on the concepts of model, meta-model, and model transformation, is an approach to developing predictable and reliable software for Cyber-Physical Systems (CPS). The work presented here is on a methodology to design simulation software based on MDD techniques, supporting the TERRA tool suite in describing and simulating process communication flows. TERRA is implemented using MDD techniques and the Communicating Sequential Processes (CSP) algebra. Simulation support for TERRA helps the designer to understand the semantics of the designed model, and hence to increase the probability of first-time-right software implementations. A new simulation meta-model is proposed, abstracting the simulation process of a TERRA model. With this new meta-model and our previously designed CSP meta-model, a simulation model can be transformed from its TERRA source. The Eclipse Modelling Framework (EMF) is used to implement the meta-model. The Epsilon Transformation Language (ETL) and the Epsilon Generation Language (EGL), both part of the Eclipse Epsilon framework, are used for model-to-model and model-to-text transformations. The simulation support is shown using an example, in which the generated trace text is shown as well. Further work is to implement an animation facility to show the trace text in the TERRA graphical model using colours.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 Title: Code Specialisation of Auto-Generated GPU Kernels
 Authors: Troels Blum, Brian Vinter
Niels Bohr Institute, University of Copenhagen
 Abstract: This paper explores and evaluates the effect of automatic code specialization on auto-generated GPU kernels. When combining the high-productivity coding environment of computational science with the Just-In-Time compilation nature of many GPU runtime systems, there is a clear-cut opportunity for code optimization and specialization. We have developed a hybrid kernel generation method which is shown to be useful and competitive across very different use cases, and requires minimal knowledge of the overall structure of the program. Stencil codes, which are commonly found at the core of computer simulations, are ideal candidates for this type of code specialization. For exactly this type of application we are able to achieve speedups of up to 2.5 times with the implemented strategy.
 Bibliography:   [BibTeX]
 Paper: [PDF] 
 Presentation: [PDF] 
 
 
 Fringe Presentations
 
 Title: Not that Blocking!
 Authors: Øyvind Teig
Autronica Fire and Security AS
 Abstract: Communicating to fellow programmers that "blocking" in process-oriented design is perfectly acceptable is difficult, because the word carries basically negative connotations. The very need to explain it shows how harmful misunderstanding this concept can be. The first contender on a "blocking channel" has also, correctly, been said (by several people) to "wait" on the channel. The negative meaning corresponds better to the semantics when "blocking" describes serious side effects on the temporal properties of other concurrent components: this is the correctly feared blocking. This fringe presentation will explore this further and invite discussion.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: Not that Concurrent!
 Authors: Øyvind Teig
Autronica Fire and Security AS
 Abstract: Concurrency is, in the literature, often used as a noun with a range of strengths: there is more or less concurrency; it is more or less limited; it may even be described as complete. In trying to discuss multi-threaded programming with programmers who state that they program single-threaded, it is important to communicate that they may program less concurrently, but probably not as non-concurrently as they believe. What are the factors that increase concurrency, and which factors are orthogonal to the degree of concurrency? Does a Go language goroutine increase it, and is a C++ object orthogonal? Will the CSP paradigm generally enable increased concurrency? Is the CSP paradigm of communication-with-synchronisation itself orthogonal to the degree of concurrency? It is also important to understand the term parallel slackness: does it introduce or enable more concurrency? And what about atomicity? This fringe presentation aims to raise more questions than it is able to answer. However, some lines of reasoning are suggested. Finally: is it at all meaningful to raise the awareness of concurrent as an adjective?
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: T42 Transputer Design in FPGA (Year One: Design Status Report)
 Authors: Uwe Mielke
Infineon Technologies
 Abstract: This fringe session will present the current status of our ongoing IMS T425-compatible Transputer design in FPGA. The data path and control path are in a stable working state. The fetch unit and a basic system control unit are almost functional. Small instruction sequences can already be executed from 8 Kbytes of memory. Some design details of the scheduler micro-code will be discussed.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: OLL Compiler Project (Best Fringe)
 Authors: Barry M. Cook

 Abstract: OLL is the latest iteration of a series of compiler projects I've written over the past 20+ years. During that time I've targeted C, VHDL (hardware), ARM and Java's JVM. Each has concentrated on a different aspect of the challenge.
I've also designed, by hand, a lot of high-performance test equipment hardware/firmware and control/interface software; also a little bit of a flight system for a satellite. All have required care to (try to) ensure reliable operation, eliminating run-time failures. CSP has been a critical framework that has worked very well.
I am more convinced than ever that I, and maybe others, could use a design language that equally well describes algorithms regardless of whether they are to be run as hardware or software — not Hardware/Software Co-Design but Hardware/Software Same-Design.
Support for eliminating run-time errors is essential but it is theoretically impossible to adequately analyse arbitrary programs - and relatively easy to create pathological cases that would break the compiler. The interesting question, however, is whether it is possible to handle a large-enough set of real (not contrived) programs to be useful. This project aims to explore the bounds of practicality.
I will give a brief overview of the aims of this ongoing project and report its current status.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: C++11 CSP
 Authors: Kevin Chalmers
School of Computing, Edinburgh Napier University
 Abstract: This fringe presentation focuses on development of a library supporting CSP ideas within the new C++11 standard. The library has been developed to support an MPI layer for CPA Networking, C and C++ being simpler languages in which to build up such a framework. This fringe presentation will look at how we can write applications using the library.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: Protected Mode RTOS: What does it Mean?
 Authors: Bernhard H. C. Sputh
Altreonic NV
 Abstract: Now that we can formally verify software models, why do we still need protection? Protection from programming errors and protection from hardware errors. The key reason is that formal models are abstractions, and programmers are humans with illogical brains using illogical and error-prone dynamic programming languages. In addition, software runs on a shared resource, called a processor, and that processor exists in the real physical world, where external influences like cosmic rays can change its state. Hence, protection has to be seen in the context of increasing the trustworthiness (as defined by the ARRL criterion) of the system. The key is to do it in such a way that we don’t jeopardise the properties we expect from a system in the absence of the errors mentioned above. This was the rationale for developing VirtuosoNext, offering fine-grain space partitioning with some help from the hardware.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: Managing Hard Real Times (28 Years Later)
 Authors: Peter H. Welch
School of Computing, University of Kent
 Abstract: In some quarters, received wisdom about hard real-time systems (where lateness in response means system failure) is that a currently running process must be pre-empted by a higher priority process that becomes runnable (e.g. by the assertion of a processor event pin or timeout) — otherwise worst-case response times cannot be guaranteed. Further, if a higher priority process needs to synchronise with one of lower priority, the latter must automatically inherit the priority of the former. If it does not, the opposite occurs: the former effectively inherits the lower priority of the latter as it waits for it to be scheduled (priority inversion) — again, worst-case response times fail.
The CCSP multicore scheduler for occam-pi (part of the KRoC package) is, possibly, the fastest and most scalable (with respect to processor cores) such beast on the planet. However, its scheduling is cooperative (not pre-emptive) and it does not implement priority inheritance (and cannot do so, given the nature of CSP synchronisation, where processes do not know the identities of the other processes involved). Therefore, despite its performance, received wisdom would seem to rule it out for hard real-time applications.
This talk reviews a paper, [1], from the OUG-7 proceedings (1987) that discusses these ideas with respect to Transputers. One minor improvement to the reported techniques will be described. Otherwise, no change is needed for modern multicore architectures. Conclusions are that (a) pre-emptive scheduling is not required, (b) priority inheritance is a design error (dealt with by correct design, not the run-time system) and (c) the occam-pi/CCSP scheduler can be made to work even more efficiently for hard real-time systems than it presently does for soft real-time (e.g. complex system modelling).

[1]   Peter H. Welch. Managing Hard Real-Time Demands on Transputers. In: Traian Muntean, ed., Proceedings of OUG-7 Conference and International Workshop on Parallel Programming of Transputer Based Machines. LGI-IMAG, Grenoble, France: IOS Press, Netherlands, 1987.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [Power Point]  [PDF] 
 
 
 Workshops
 
 Title: Dealing with (Real)-Time in Real-World Hybrid Systems
 Authors: Pieter Van Schaik, Eric Verhulst
Altreonic NV
 Abstract:
Background
One of the issues that has been bothering embedded systems engineers is how to deal with time. Some approaches have attempted to make time part of the modelling language, other approaches turn it into a partial order of events, while most programmers ignore it completely, equating QoS with real-time (most of the time, but not guaranteed). While in the discrete domain time is considered to progress as a number of clock cycles, in the continuous domain time has an infinitesimal granularity. If we now want to prove the correctness of a hybrid system, people traditionally use time-dependent partial or ordinary differential equations in the continuous domain and model checkers or provers for the discrete domain.

Action
How can we combine the two? Following the Keynote theme, we remember to separate the concerns. Hence we need time-independent models that, when executed on a given platform or in a given system context, result in specific time properties (like meeting a deadline or stability). In the discrete domain, this can be achieved by using priority as a scheduling parameter. In the continuous domain, engineers routinely transform the model into the frequency domain to prove desired properties using Nyquist or Laplace. The workshop will look for answers on how such hybrid models can be formally verified (as opposed to simulation and testing only).
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: The Role of Concurrency in the Modern HPC Center
 Authors: Brian Vinter
Niels Bohr Institute, University of Copenhagen
 Abstract:
Background
The modern HPC center is a complex entity. A center easily hosts tens of thousands of processors in the smallest cases, and several million processors in the largest facilities. Before 2020 this number is expected to pass 100 million processors in a single computer alone. Thus massively parallel processing is indeed everyday business today; the major challenges of running such facilities are well established, and solutions exist for most of them.

In the following we will make a strong distinction between parallelism, i.e. Single Program Multiple Data or Single Instruction Multiple Data, and concurrency, i.e. multiple processes, both identical and different, that may run in parallel or interleaved, depending on timing and available hardware.

While concurrency mechanisms were commonly used to express applications for parallel computers two or three decades ago, this use of concurrency has completely died out. There are several explanations for this, but the most important is that the cost of an HPC installation today is so high that users must document that they use the facility efficiently. All applications must stay above a specified CPU-utilization threshold, typically 70%, and even with SPMD-style programming, asynchrony amongst processors is a common challenge when trying to stay above that threshold.
This does not mean that concurrency is without use in the modern HPC center. While the actual compute nodes do not use concurrency, the underlying fabric that allows the compute nodes to operate has large potential for concurrency. In many cases these elements could benefit from a formal approach to concurrency control.

Action
In this workshop we will present the challenges in the HPC center that do indeed involve concurrency: storage systems, schedulers, backup systems, archiving, and bulk network transfers, to name a few. The interesting challenge is that while all these elements require concurrency control to operate correctly and efficiently, they are also highly interdependent, i.e. the concurrency aspects must cover the full set of infrastructure components for optimal efficiency. We will seek to describe scenarios for real HPC centers and sketch solutions that are built on a structured concurrency approach.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF] 
 
 Title: Message-Passing Concurrency Shootout
 Authors: Kevin Chalmers
School of Computing, Edinburgh Napier University
 Abstract:
Background
In the last few years there have been a number of new programming languages that incorporate message-passing concurrency. Examples such as Google's Go and Mozilla's Rust show increased industrial and academic interest in message-passing concurrency as a first-order concern. These languages join existing ones such as occam-pi, Erlang, and Ada, which have strong communication-based concurrency. The concurrent systems programmer therefore has a number of options for exploiting message-based concurrency.

The Communicating Process Architectures (CPA) community and others have for a number of years developed libraries to support message passing concurrency within existing programming languages and runtimes. This support is normally built upon the thread support libraries of the host language. JCSP and PyCSP are commonly discussed, but support for CPA ideas has also been implemented in Haskell, C++, Lua, and other languages.

The languages and libraries supporting message-passing concurrency are normally inspired by one or more process algebras. Hoare's Communicating Sequential Processes (CSP) and Milner's pi-calculus are the two main inspirations for message-passing work. It is questionable, however, how well these process algebras are supported in the languages and libraries they inspire.

Action
The aim of this workshop is the development of a message-passing language and library shootout in a similar manner to that seen in The Computer Language Benchmarks Game, although looking at more than just performance. The metrics produced will be augmented by a discussion of support for the principles of CSP, CCS, and the pi-calculus. The work undertaken, and further development, will lead to a journal publication focusing on CPA languages and libraries, providing a state-of-the-art review. The paper produced will summarise the results and provide further dissemination of CPA ideas to a wider audience. The test applications built will be made available in a repository and the results published via arXiv for accessibility outside the paper. The workshop is looking for people interested in producing the test applications in the various languages and libraries.

There are two questions that the work undertaken will attempt to answer:
  - How well supported are the primitives and ideas of CSP, CCS, and the pi-calculus in the range of languages and libraries supporting message passing concurrency?
  - What are the metrics of the languages and libraries supporting message passing concurrency?

For the first question, a set of criteria is required to define what we mean by process algebra support. This requires defining what a process algebra provides. The key ideas are message passing and event selection, with the ability to spawn processes (execute in parallel). As a starting point, the current list of properties of interest is:
  - Message passing support (this is the minimum criterion)
  - Type of message passing support - synchronous and/or asynchronous
  - First Order Channels (not all languages provide a channel construct)
  - Higher Order Channels (channels that can send channels)
  - First order processes (also a minimum criterion)
  - Higher order processes (channels can send processes)
  - Parallel execution statement
  - Process ownership (e.g. a parallel cannot complete until all its child processes have)
  - Selection on incoming messages
  - Other selection types? (e.g. skip, timeout)
  - Selection on outgoing messages
  - Multiway synchronisation

For the second question, we need to look at the metrics that help define the performance and usefulness of the languages. CPA typically uses the CommsTime and StressedAlt benchmarks. These two benchmarks allow analysis of the two key properties of a communication-based language or library: communication time and selection time. The memory usage of a process, or an equivalent measure, is also required; the aim is to understand the number of processes the language or library can support. Knowing whether the language or library exploits parallel hardware (e.g. multicore) is also important. Typical software metrics such as Lines of Code (LoC), time, speedup, efficiency, memory usage, and CPU usage for the following applications are a possible starting point:
  - CommsTime - channel communication time
  - StressedAlt - selection time and maximum process support
  - Dining Philosophers - LoC
  - Vector Addition - simple data parallel problem
  - Matrix Multiplication - complex data parallel problem
  - Mandelbrot - data parallel with large data output
  - Monte Carlo Pi - data parallel with low data output and reduction
  - N-body simulation - shared data

The current set of languages I have come across that reportedly support message passing concurrency are:

   Ada, Ateji PX, Clojure, D, Elixir, Erlang, Go, Guppy, Hume, Kilim, Limbo, Nim, occam-pi, Oz, ProcessJ, Perl, Rust, Unicon

As for the set of libraries:

   JCSP, CTJ, C++CSP, PyCSP, C Haskell P, C Scala P, CCSP, LuaCSP

A discussion on whether to include all actor-based languages and libraries is also required; if they are included, the number of languages and libraries will grow considerably.
 Bibliography:   [BibTeX]
 Abstract: [PDF] 
 Presentation: [PDF]