To characterize and compare the OpenCL performance of existing and future devices, uCLbench provides the following types of micro-benchmarks:

  • Arithmetic Throughput: Parallel and sequential throughput for all basic mathematical operations and for many of the built-in functions defined by the OpenCL standard. Where available, native implementations (with reduced accuracy) are also measured.
  • Memory Subsystem: Host-to-device, device-to-device and device-to-host copy bandwidth. Streaming bandwidth for the on-device address spaces. Latency of memory accesses to the global, local and constant address spaces. These benchmarks also determine whether caches exist and how large they are.
  • Branching Penalty: Impact of divergent dynamic branching on device performance, which is particularly pronounced on GPUs.
  • Runtime Overheads: Kernel compilation time, and the queuing delays incurred when invoking kernels of varying code size.
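Why divergent branching costs so much on GPUs can be illustrated with a deliberately simplified SIMT cost model. This is a sketch under stated assumptions, not uCLbench code. It assumes that lanes execute in lockstep groups (warps or wavefronts; the group size of 32 is an assumed default) and that each group runs every branch path taken by at least one of its lanes, masking off the inactive lanes. It also assumes that groups run back to back on a single SIMD unit. `simt_cycles` and its parameters are hypothetical names.

```python
from typing import Sequence


def simt_cycles(lane_takes_then: Sequence[bool], then_cost: int,
                else_cost: int, warp_size: int = 32) -> int:
    """Estimate the cost of an if/else under a simplified SIMT model.

    Assumptions (not taken from uCLbench): each group of `warp_size`
    consecutive lanes executes a branch path if ANY of its lanes takes it,
    and the per-group costs are summed as if the groups ran sequentially.
    """
    if warp_size <= 0:
        raise ValueError("warp_size must be positive")
    total = 0
    for start in range(0, len(lane_takes_then), warp_size):
        group = lane_takes_then[start:start + warp_size]
        if any(group):       # at least one lane needs the 'then' path
            total += then_cost
        if not all(group):   # at least one lane needs the 'else' path
            total += else_cost
    return total


if __name__ == "__main__":
    n = 64
    # Condition is uniform within each 32-lane group: one path per group.
    uniform = [i < 32 for i in range(n)]
    # Condition alternates between neighboring lanes: both paths in every group.
    divergent = [i % 2 == 0 for i in range(n)]
    print(simt_cycles(uniform, 10, 10))    # 20
    print(simt_cycles(divergent, 10, 10))  # 40
```

Both inputs send exactly half of the lanes down each path. Under this model, however, the divergent pattern costs twice as much, because every group pays for both branch bodies. A branching-penalty benchmark aims to expose this kind of effect on real hardware. On a CPU, where lanes are not in lockstep, the same two patterns would cost roughly the same.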


