Scalable Decode Caching in Multi-Core Instruction Set Simulators (RAPIDO)
-
Martin Kristien, Nigel Topham, Björn Franke, Igor Böhm, Harry Wagstaff and Tom Spink
-
17th Workshop on Rapid Simulation and Performance Evaluation for Design Optimization: Methods and Tools (RAPIDO'26), January 2026.
Abstract
Instruction set simulators (ISSs) play an important role in embedded software development. Integrated in virtual platforms, they enable coding, testing, and performance evaluation without the need for physical platforms. However, simulation incurs a performance penalty over native execution, resulting in slow simulation speeds for complex applications. We observe that in interpreter-based ISSs – developers’ first choice when detailed processor pipeline and cache simulation are required – the simulator’s own instruction fetch and decode stages substantially contribute to overall runtime. We propose a novel simulator instruction fetch and decode cache architecture: (a) we use instruction encodings for cache indexing instead of the program counter, (b) we introduce separate instruction fetch and decode caches instead of a single, unified cache, and (c) we introduce a tiered cache architecture, comprising private and global caches, for multi-core guest architectures. We have implemented our novel caching schemes in the commercial Synopsys ARC® nSIM ISS, which provides an instruction-accurate processor model for the Synopsys ARC processor families. We evaluated our new simulator cache architecture using complex real-world workloads and guest configurations with up to 128 simulated guest cores, where we demonstrate average speed-ups of 1.31× over a state-of-the-art baseline scheme, while requiring only 27% of the original cache memory.