Case Study: Performance Evaluation of GPUs
Performance Evaluation of GPUs Using the RapidMind Multi-core Development Platform
Michael McCool, RapidMind Inc.; Kevin Wadleigh, Brent Henderson, and Hsin-Ying Lin, Hewlett-Packard Company
Abstract
GPUs, the high-performance processors in video accelerators, can be used as numerical co-processors in a variety of applications. The RapidMind Multi-core Development Platform is a software development system that lets developers use standard C++ to easily create high-performance, massively parallel applications that run on the GPU. Using the RapidMind platform, we compare the performance of BLAS dense linear algebra operations, the FFT, and European option pricing on the GPU against highly tuned CPU implementations running on the fastest available CPUs.
Summary
Using the RapidMind Multi-core Development Platform, GPU implementations of three benchmarks (the FFT, the BLAS routine SGEMM, and Black-Scholes option pricing) were compared with the best available implementations on the fastest available CPUs. The accompanying chart summarizes the GPU results as speedups over the corresponding CPU implementations. For the BLAS SGEMM benchmark, the GPU implementation was up to 2.4 times faster than the best available CPU implementation. The FFT benchmark on the GPU was up to 2.7 times faster than the best available CPU implementation. Finally, the Black-Scholes benchmark on the GPU was 32.2 times faster than a highly tuned and vectorized CPU implementation. In all cases, the GPU implementation was created using the RapidMind platform.
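For readers unfamiliar with the first benchmark, SGEMM is the single-precision BLAS general matrix multiply, which computes C = alpha*A*B + beta*C. The following is a minimal, unoptimized C++ sketch of what the routine computes. It assumes row-major storage and no transposes, unlike the real BLAS interface, which takes column-major arrays, transpose flags, and leading dimensions. The function name sgemm_reference is hypothetical, and this is neither the tuned CPU code nor the RapidMind GPU code used in the study. It does show why the operation suits a GPU: every element of C can be computed independently of the others.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive reference SGEMM (illustration only):
//   C = alpha * A * B + beta * C
// A is m x k, B is k x n, C is m x n, all row-major, single precision.
// Tuned BLAS libraries block for cache and vectorize; this does neither.
void sgemm_reference(std::size_t m, std::size_t n, std::size_t k,
                     float alpha, const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Each C[i][j] depends only on row i of A and column j of B,
            // so all m*n outputs could be computed in parallel.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

For example, multiplying the 2 x 2 matrices [[1, 2], [3, 4]] and [[5, 6], [7, 8]] with alpha = 1 and beta = 0 yields [[19, 22], [43, 50]].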
Note that the RapidMind implementations of these benchmarks performed on par with the best available GPU implementations written directly against OpenGL. More direct access to the hardware could enable even higher performance. For example, ATI has announced support for a lower-level interface specific to its hardware, which could allow the RapidMind interface to deliver additional performance improvements in the future.
Background
Programming general-purpose applications for the GPU can be difficult for many reasons. Because graphics APIs were not designed for general-purpose programming, they can be a significant barrier to using the GPU for such applications. Developers generally must learn many graphics concepts that are irrelevant to their problem, as well as how those concepts map onto the hardware architecture. In addition, the cache, memory, and execution architectures of the GPU differ radically from those of CPUs, so different optimizations are needed to achieve good performance. Parallel programming itself is also not intuitive for most programmers: it requires techniques and algorithms that serial programming does not, and it introduces numerous new development and debugging challenges.
RapidMind Inc. has created a software development platform that allows developers to use standard C++ to easily create high-performance applications that run on the GPU. The platform's interface uses a simple, portable, data-parallel model of computation that is easy to learn and use, yet maps efficiently onto GPUs. The RapidMind platform is embedded in the application and transparently manages massively parallel computations.
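The essence of a data-parallel model is that one kernel is applied independently to every element of its input arrays, so the runtime is free to evaluate elements in any order or all at once. The sketch below illustrates that shape in plain standard C++. It is not the RapidMind API, which is not described here; the helper name data_parallel_map is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper illustrating the data-parallel model in plain C++.
// The same kernel is applied to each pair (x[i], y[i]); out[i] depends on
// nothing else, so elements can be computed in any order or in parallel.
// This mirrors the shape of the computation only; it is NOT RapidMind code.
template <typename Kernel>
std::vector<float> data_parallel_map(const std::vector<float>& x,
                                     const std::vector<float>& y,
                                     Kernel kernel) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    // std::transform does not guarantee in-order application of the
    // operation, which matches the order-independence of the model.
    std::transform(x.begin(), x.end(), y.begin(), out.begin(), kernel);
    return out;
}
```

With the kernel [](float xi, float yi) { return 2.0f * xi + yi; }, inputs {1, 2, 3} and {10, 20, 30} produce {12, 24, 36}. In C++17 the underlying std::transform call can also take an execution policy such as std::execution::par to spread the work across CPU threads; the GPU platforms discussed here exploit the same independence at a much larger scale.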
Benchmark Results
To evaluate the performance and applicability of the RapidMind programming model on the GPU, a number of benchmark applications were implemented, including SGEMM, the FFT, and Black-Scholes option pricing. Full benchmark and hardware details can be found in the complete paper.
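The Black-Scholes benchmark prices European options with the closed-form Black-Scholes formula, where d1 = (ln(S/K) + (r + sigma^2/2)T) / (sigma*sqrt(T)), d2 = d1 - sigma*sqrt(T), call = S*N(d1) - K*e^(-rT)*N(d2), and put = K*e^(-rT)*N(-d2) - S*N(-d1), with N the standard normal CDF. A minimal single-precision C++ sketch follows. The struct and function names are hypothetical, and the benchmark's actual kernels, inputs, and batch sizes are described in the complete paper rather than reproduced here. Because every option is priced independently, the batch is a pure data-parallel map, which helps explain why this benchmark shows the largest GPU speedup.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical input record for one European option.
struct OptionInput {
    float spot;        // S: current price of the underlying
    float strike;      // K: strike price
    float rate;        // r: continuously compounded risk-free rate
    float volatility;  // sigma: annualized volatility
    float expiry;      // T: time to expiry in years
};

struct OptionPrice {
    float call;
    float put;
};

// Standard normal CDF via the complementary error function.
inline float normal_cdf(float x) {
    return 0.5f * std::erfc(-x / std::sqrt(2.0f));
}

// Closed-form Black-Scholes call and put prices for a single option.
OptionPrice black_scholes(const OptionInput& o) {
    assert(o.spot > 0.0f && o.strike > 0.0f && o.volatility > 0.0f && o.expiry > 0.0f);
    const float sqrt_t = std::sqrt(o.expiry);
    const float d1 = (std::log(o.spot / o.strike) +
                      (o.rate + 0.5f * o.volatility * o.volatility) * o.expiry) /
                     (o.volatility * sqrt_t);
    const float d2 = d1 - o.volatility * sqrt_t;
    const float discounted_strike = o.strike * std::exp(-o.rate * o.expiry);
    return {o.spot * normal_cdf(d1) - discounted_strike * normal_cdf(d2),
            discounted_strike * normal_cdf(-d2) - o.spot * normal_cdf(-d1)};
}

// Price a batch: the same kernel mapped independently over every option.
std::vector<OptionPrice> price_batch(const std::vector<OptionInput>& inputs) {
    std::vector<OptionPrice> out(inputs.size());
    std::transform(inputs.begin(), inputs.end(), out.begin(), black_scholes);
    return out;
}
```

As a sanity check, an at-the-money option with S = K = 100, r = 0.05, sigma = 0.2, and T = 1 year prices at roughly 10.45 for the call and 5.57 for the put, consistent with put-call parity.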
