In this paper we report on a successful extension to the CSP refinement checker FDR3 that permits it to run on clusters of machines. We demonstrate that it is able to scale linearly up to clusters of 64 16-core machines (i.e. 1024 cores), achieving an overall speed-up of over 1000 relative to the sequential case. Further, this speed-up was observed both on dedicated supercomputer facilities, but more notably, also on a commodity cloud computing provider. This enhancement has enabled us to verify a system of 10^12 states, which we believe to be the largest refinement check ever attempted by several orders of magnitude.