Published: Dec 19, 2023 by Liam Pattinson
Project info
The FAIR Data Pipeline (FDP) is a collection of software utilities for tracking the provenance of FAIR (Findable, Accessible, Interoperable, Reusable) data. It was originally developed to aid in the creation of reproducible epidemiological modelling workflows, a need spurred by the outbreak of COVID-19. FDP allows users to automate a data processing pipeline that traces all inputs, outputs and associated metadata in a FAIR manner. APIs are available in Python, R, Julia, C++, and Java.
Following a talk at RSECon 2022, we became interested in the potential application of FDP to computational plasma physics pipelines, and offered to expand its capabilities to include support for Fortran – a language commonly used in plasma physics simulations and other high-performance computing applications. Rather than reimplementing the whole API in our target language, as was the strategy employed for the existing supported languages, we instead opted to reuse the C++ API and instead create Fortran bindings.
The first step was to upgrade the CMake build system used by the C++ API so that it would export targets to external projects. This proved to be more challenging than expected, as multiple dependencies of FDP lacked adequate CMake support or exported targets in a way that defied best practices.
Following this, a simple C API was implemented that deferred all computation to the C++ objects via opaque pointer types. The C API was developed to act as a bridge for the Fortran API, and also to serve as a jumping-off point for anybody wanting to expand FDP’s language support even further – perhaps to something like Rust or Haskell.
Finally, Fortran bindings to the C API were created using bind(c)
interface
routines and iso_c_binding
kinds. A first draft of the Fortran API was
developed at a satellite hackathon for NERC’s Digital Gathering 2023 in
Cambridge, and further refinements were made afterwards. One of the biggest
challenges when writing this was to ensure compatibility between strings in
Fortran and C, as Fortran strings have a known size while C strings consist
simply of a pointer to the first character and a terminating null character
'\0'
at some unknown offset in memory. While converting between the two
forms is trivial, it was found to be difficult to ensure correct lifetimes and
avoid unnecessary copies. Some unexpected behaviours of bind(c)
routines were
also encountered, such as the requirement to attach the value
attribute to
some dummy arguments in order to avoid passing junk data into the C functions.