Abstract
Ravi Iyer, Jack Perdue, Nancy M. Amato, Lawrence Rauchwerger, Laxmi Bhuyan, "An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications," International Journal of Parallel Programming, 33(4):307-350, 2005.
As processor technology continues to advance at a rapid pace, the principal
performance bottleneck of shared-memory systems has become memory access
latency. To understand the effects of the cache and memory hierarchy on
system latencies, performance analysts run benchmarks on existing
multiprocessors. In this study, we present a detailed
comparison of two architectures, the HP V-Class and the SGI Origin
2000. Our goal is to compare and contrast design techniques used in these
multiprocessors. We present the impact of processor design, cache/memory
hierarchies and coherence protocol optimizations on the memory system
performance of these multiprocessors. We also study the effect of parallelism
overheads such as process creation and synchronization on the user-level
performance of these multiprocessors. Our experimental methodology uses
microbenchmarks as well as scientific applications to characterize the
user-level performance. Our microbenchmark results show the impact of L1/L2
cache size and TLB size on uniprocessor load/store latencies, the effect of
coherence protocol design/optimizations and data sharing patterns on
multiprocessor memory access latencies, and finally the overhead of
parallelism. Our application-based evaluation shows the impact of problem size,
dominant sharing patterns, and the number of processors used on speedup and
raw execution time. Finally, we use hardware counter measurements to study
the correlation between system-level performance metrics and the application's
execution time.