Architectural Simulation to Accelerate CoDesign
Architectural simulation of computing systems is essential for predicting performance and power because it enables rapid, quantitative exploration of HPC system design tradeoffs. Three tools work together to provide a complete codesign simulation tool set: the ROSE compiler, the ACE emulation platform, and the SST simulator.
The codesign process uses a hierarchy of simplified surrogate code representations to give hardware designers detailed, actionable information while ensuring that the context for any insight remains faithful to the full application’s requirements.
The SST simulator consists of two tools: SST/macro and SST/micro. SST/macro is a coarse-grained simulator that lets designers study large-scale systems in a way that captures the complex interactions among hardware components. It can run a skeleton application, either written by domain experts or generated by ROSE-based analysis tools, that scales and behaves exactly like the original application code. This makes it possible to investigate communication characteristics and bottlenecks that may arise only at the scales predicted for next-generation machines. Designers can also replay traces of a previously run MPI application through the simulator, either to estimate its execution time on new hardware or to validate the simulator against existing hardware. SST/micro is a general simulation framework for composing complete simulations of HPC compute nodes. It combines cycle-accurate models of processors, memory, disks, and network routers to explore detailed node-level architectural design tradeoffs in both hardware and software.
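The trace-replay idea described above can be illustrated with a minimal sketch in Python. This is not SST/macro code and does not use its API: the Machine parameters, the event format, and the replay function are all hypothetical, and the model assumes non-blocking sends, blocking receives, and a simple latency-plus-bandwidth network cost. It only shows how a recorded sequence of compute and message events can be re-timed for a different machine.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical coarse-grained machine model (illustration only)."""
    latency_s: float              # per-message network latency, seconds
    bandwidth_bytes_per_s: float  # link bandwidth, bytes per second
    compute_speedup: float        # compute speed relative to the traced machine


def replay(trace, machine):
    """Re-time a per-rank event trace and return each rank's finish time.

    trace[rank] is a list of events (an assumed format, not an SST one):
      ("compute", seconds), ("send", dst_rank, nbytes), ("recv", src_rank)
    """
    clock = [0.0] * len(trace)   # simulated time on each rank
    cursor = [0] * len(trace)    # index of the next event on each rank
    in_flight = {}               # (src, dst) -> deque of message arrival times

    progressed = True
    while progressed:
        progressed = False
        for rank, events in enumerate(trace):
            while cursor[rank] < len(events):
                event = events[cursor[rank]]
                kind = event[0]
                if kind == "compute":
                    clock[rank] += event[1] / machine.compute_speedup
                elif kind == "send":
                    _, dst, nbytes = event
                    arrival = (clock[rank] + machine.latency_s
                               + nbytes / machine.bandwidth_bytes_per_s)
                    in_flight.setdefault((rank, dst), deque()).append(arrival)
                elif kind == "recv":
                    queue = in_flight.get((event[1], rank))
                    if not queue:
                        break  # matching send not replayed yet; try other ranks
                    clock[rank] = max(clock[rank], queue.popleft())
                else:
                    raise ValueError(f"unknown event kind: {kind!r}")
                cursor[rank] += 1
                progressed = True

    if any(cursor[r] < len(trace[r]) for r in range(len(trace))):
        raise RuntimeError("trace contains a receive with no matching send")
    return clock


# Two ranks exchanging one message each (made-up numbers).
example_trace = [
    [("compute", 1.0), ("send", 1, 1000), ("recv", 1)],
    [("recv", 0), ("compute", 2.0), ("send", 0, 1000)],
]
faster_node = Machine(latency_s=0.5, bandwidth_bytes_per_s=1000.0,
                      compute_speedup=2.0)
print(replay(example_trace, faster_node))
```

Changing the Machine values and replaying the same trace is the kind of what-if question this paragraph describes. A real simulator additionally models contention, topology, and collective operations, which this sketch ignores.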
The ACE emulation platform extends tools like the Tensilica Xtensa Processor Generator (XPG) tool chain to serve as our rapid prototyping platform for node-level HPC emulation. The XPG’s customizable instruction set, communication interfaces, and memory hierarchy make it ideal for exploring novel chip multiprocessor designs. Its ability to extend the instruction set with application-specific functionality produces a streamlined processor with scratchpad memories and custom operation codes that facilitate advanced communication and synchronization. In addition, the XPG’s ability to automatically generate C/C++ compilers, debuggers, and functional models enables fast software porting and rapid testing on a new architecture.
For more information, please contact John Shalf of Lawrence Berkeley National Laboratory at [email protected].
