A Service Oriented Architecture

 

The Adaptable IO System (ADIOS) provides a simple, flexible way for scientists to describe the data in their code that may need to be written, read, or processed outside of the running simulation. An XML file, external to the code, describes the data elements, their types, and how they should be processed for a given run, so the routines in the host code (either Fortran or C) can transparently change how they process the data. The goal is a level of adaptability such that a scientist can change how the IO in their code works simply by changing a single entry in the XML file and restarting the code. A user can likewise change which transport method is used for a particular kind of output, such as a restart, analysis, or diagnostic write. ADIOS has also pioneered the concept of data staging to drive the adoption of in-transit techniques for data processing, significantly reducing the IO bottleneck, especially as we transition to exascale.
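The idea of switching the transport method without touching the host code can be sketched in C with a table of function pointers. This is only an illustration under stated assumptions, not the ADIOS API: the names `write_fn`, `select_transport`, `write_restart`, and the variable name "temperature" are hypothetical, and the method name string stands in for the entry that would be read from the XML file at startup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface: each method supplies one write routine
 * that returns the number of bytes it handed to storage. */
typedef size_t (*write_fn)(const char *var, const void *data, size_t nbytes);

/* Stand-in for a POSIX method; a real one would call write(2) here. */
size_t write_posix(const char *var, const void *data, size_t nbytes) {
    (void)var; (void)data;
    return nbytes;
}

/* Stand-in for the NULL method: accept the data and produce no output. */
size_t write_null(const char *var, const void *data, size_t nbytes) {
    (void)var; (void)data; (void)nbytes;
    return 0;
}

struct transport {
    const char *name;   /* method name as it would appear in the config */
    write_fn    write;
};

static const struct transport transports[] = {
    { "POSIX", write_posix },
    { "NULL",  write_null  },
};

/* Look up a method by name once at startup; returns NULL if unknown. */
const struct transport *select_transport(const char *method_name) {
    for (size_t i = 0; i < sizeof transports / sizeof transports[0]; i++) {
        if (strcmp(transports[i].name, method_name) == 0)
            return &transports[i];
    }
    return NULL;
}

/* Host-code write routine: identical no matter which method was selected. */
size_t write_restart(const struct transport *t, const double *field, size_t n) {
    assert(t != NULL);
    return t->write("temperature", field, n * sizeof *field);
}
```

In this sketch, changing the string passed to `select_transport` from "POSIX" to "NULL" changes what happens to the data, while `write_restart` and its callers stay exactly the same.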

 

Staging allows application data to be processed using in-transit plugins (shown as P) before going to storage. This technique enables extreme-scale data management through a service-oriented approach.
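A staging pipeline of this kind can be pictured as a chain of plugins applied in order before the data reaches storage. The following C sketch is a simplification that runs in a single process; the names `plugin_fn`, `downsample_by_two`, `scale_by_ten`, and `stage` are hypothetical and are not part of ADIOS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plugin signature: transform n values in place and return
 * how many values remain afterwards. */
typedef size_t (*plugin_fn)(double *data, size_t n);

/* Example plugin: keep every second value (a simple data reduction). */
size_t downsample_by_two(double *data, size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i += 2)
        data[kept++] = data[i];
    return kept;
}

/* Example plugin: rescale every value, leaving the count unchanged. */
size_t scale_by_ten(double *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[i] *= 10.0;
    return n;
}

/* Apply the plugins in order and return the number of values that would
 * be passed on to storage. */
size_t stage(double *data, size_t n, const plugin_fn *plugins, size_t nplugins) {
    assert(data != NULL && plugins != NULL);
    for (size_t p = 0; p < nplugins; p++)
        n = plugins[p](data, n);
    return n;
}
```

Calling `stage` with the chain `{downsample_by_two, scale_by_ten}` reduces the data first and then rescales what is left, so less data has to travel to storage.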

The in-code IO routines were modeled after standard Fortran POSIX IO routines for simplicity and clarity. The additional complexity, including organization into hierarchies, data type specifications, process grouping, and how to process the data, is stored in an XML file that is read once at code startup. Based on the settings in this XML file, the data will be processed differently. For example, you could select MPI individual IO, MPI collective IO, POSIX IO, an asynchronous IO technique, a visualization engine, or even NULL for no output, and so cause the code to process the data differently without changing the source code or even recompiling. For the transport method implementer, the system provides a series of standard function calls to encode and decode data in the standardized .bp file format. It also supports “interactive” processing of the data. On output, it makes direct downcalls into the implementation for each data item written. When processing a data stream, it issues a first callback once a data item has been identified along with its dimensions, and a second callback once the data has been read. This gives the implementation the option to allocate memory and process the data as close to the data source as is reasonable.
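The two-callback pattern for stream processing can be sketched in C as follows. This is an assumption-laden illustration rather than the actual transport method interface: `on_identified_fn`, `on_read_fn`, `process_item`, `alloc_for_item`, `sum_and_free`, and `struct sum_state` are all hypothetical names, and the "stream" is simulated by an in-memory array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical first callback: the item's name and dimensions are known.
 * The implementation returns a buffer to read into, or NULL to skip. */
typedef void *(*on_identified_fn)(const char *name, size_t ndims,
                                  const size_t *dims, void *user);

/* Hypothetical second callback: the data has been read into the buffer. */
typedef void (*on_read_fn)(const char *name, void *buffer,
                           size_t nbytes, void *user);

/* Simulated reader delivering one 1-D array of doubles from a stream. */
void process_item(const char *name, const double *src, size_t count,
                  on_identified_fn identified, on_read_fn done, void *user) {
    assert(identified != NULL && done != NULL);
    size_t dims[1] = { count };
    void *buf = identified(name, 1, dims, user);   /* phase 1: announce */
    if (buf == NULL)
        return;                                    /* item was skipped  */
    memcpy(buf, src, count * sizeof *src);         /* "read" the data   */
    done(name, buf, count * sizeof *src, user);    /* phase 2: deliver  */
}

/* Example implementation state: a running sum of everything received. */
struct sum_state { double total; };

/* Allocate exactly as much memory as the announced dimensions require. */
void *alloc_for_item(const char *name, size_t ndims,
                     const size_t *dims, void *user) {
    (void)name; (void)user;
    size_t count = 1;
    for (size_t i = 0; i < ndims; i++)
        count *= dims[i];
    return malloc(count * sizeof(double));
}

/* Process the data as soon as it arrives, then release the buffer. */
void sum_and_free(const char *name, void *buffer, size_t nbytes, void *user) {
    (void)name;
    struct sum_state *s = user;
    const double *v = buffer;
    for (size_t i = 0; i < nbytes / sizeof(double); i++)
        s->total += v[i];
    free(buffer);
}
```

Splitting the work into two callbacks lets the implementation size its buffer from the announced dimensions before any data moves, and then process the data immediately after it is read.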

 

For more information, please contact Scott Klasky of Oak Ridge National Laboratory at [email protected].

ExaCT is funded by the DOE Office of Advanced Scientific Computing Research (ASCR). Dr. Karen Pao is the program manager and Dr. William Harrod is the director of the ASCR Research Division.

U.S. Department of Energy: Office of Science; Stanford University; The University of Utah; Georgia Institute of Technology; Lawrence Berkeley National Laboratory; Lawrence Livermore National Laboratory; Oak Ridge National Laboratory; The University of Texas at Austin; Rutgers: The State University of New Jersey; National Renewable Energy Laboratory (NREL); Los Alamos National Laboratory; Sandia National Laboratories