To characterize and compare the OpenCL performance of existing and future devices, uCLbench provides the following types of micro-benchmarks:
- Arithmetic Throughput: Parallel and sequential throughput for all basic mathematical operations and many built-in functions defined by the OpenCL standard. When available, native implementations (with reduced accuracy) are also measured.
- Memory Subsystem: Host-to-device, device-to-device, and device-to-host copy bandwidth. Streaming bandwidth for on-device address spaces. Latency of memory accesses to the global, local, and constant address spaces. Also determines the existence and size of caches.
- Branching Penalty: Impact of divergent dynamic branching on device performance, particularly pronounced on GPUs.
- Runtime Overheads: Kernel compilation time and queuing delays incurred when invoking kernels of varying code size.

Project page: http://www.dps.uibk.ac.at/insieme/uclbench.html
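
To illustrate how a memory-latency benchmark can expose the existence and size of caches, the sketch below shows one common technique: pointer chasing over buffers of growing size, followed by a search for jumps in the measured latency. This is not taken from uCLbench's source. The function names (`build_chase_ring`, `cycle_length`, `find_latency_jumps`), the `ratio` threshold, and the sample numbers are all hypothetical, and the code only prepares and analyzes data on the host rather than running an OpenCL kernel.

```python
import random


def build_chase_ring(n, seed=0):
    """Return a list `nxt` where following i -> nxt[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm, which produces a random single-cycle permutation,
    so a dependent load chain touches every slot before repeating.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i, never i itself: this keeps a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def cycle_length(nxt, start=0):
    """Count the hops needed to return to `start` by following the ring."""
    hops, i = 1, nxt[start]
    while i != start:
        i = nxt[i]
        hops += 1
    return hops


def find_latency_jumps(sizes, latencies, ratio=1.5):
    """Return the buffer sizes after which latency grows by more than `ratio`.

    Each returned size is the largest tested footprint that still showed the
    lower latency, i.e. a rough estimate of a cache capacity.
    """
    return [
        sizes[k]
        for k in range(len(sizes) - 1)
        if latencies[k + 1] > ratio * latencies[k]
    ]


if __name__ == "__main__":
    ring = build_chase_ring(8, seed=1)
    print("ring:", ring, "cycle length:", cycle_length(ring))

    # Made-up latencies (arbitrary units) for buffer footprints in KiB.
    sizes_kib = [16, 32, 64, 128]
    latencies = [4.0, 4.1, 12.0, 12.5]
    print("estimated cache capacity (KiB):", find_latency_jumps(sizes_kib, latencies))
```

In a device-side version of this idea, a single work-item would repeatedly execute a dependent load such as `idx = buf[idx]` over the ring, so that each access has to wait for the previous one and the time per hop approximates the access latency. The random single-cycle layout is chosen to keep hardware prefetching from hiding that latency.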