TI provides an OpenCL 1.1 implementation for its C66x DSPs; on several SoCs it is Khronos conformant.

The table below, taken from http://downloads.ti.com/mctools/esd/docs/opencl/intro.html, shows which TI SoCs support OpenCL on their DSPs.

SoC | System | Khronos Conformance | Installation Instructions |
---|---|---|---|
AM572 | AM572 EVM | OpenCL v1.1 Conformant | Processor SDK for AM57x |
DRA75x | DRA75x EVM | OpenCL v1.1 Conformant | Processor SDK for DRA7x (Enabling OpenCL on DRA75x) |
AM571 | AM572 EVM | OpenCL v1.1 Conformant | Processor SDK for AM57x |
66AK2H | 66AK2H EVM | OpenCL v1.1 Conformant | Processor SDK for K2H |
66AK2L | 66AK2L EVM | Not submitted for conformance | Processor SDK for K2L |
66AK2E | 66AK2E EVM | Not submitted for conformance | Processor SDK for K2E |
66AK2G | 66AK2G EVM | Not submitted for conformance | Processor SDK for K2G |

# Theoretical Performance of the C66x

- Fixed-point 16×16 MACs per cycle: 32
- Fixed-point 32×32 MACs per cycle: 8
- Floating-point single-precision (SP) MACs per cycle: 8
- Arithmetic floating-point operations per cycle: 16 (2-way SIMD on each .L and .S unit, i.e. 8 SP operations across the A and B sides, plus 4 SP multiplies on each .M unit, i.e. 8 SP operations across the A and B sides)
- Load/store width: 2 × 64-bit
- Vector size (SIMD capability): 128-bit (4 × 32-bit, 4 × 16-bit, 4 × 8-bit)
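The MAC and FLOP figures above agree if each single-precision MAC is counted as two FLOPs (one multiply plus one add). A worked check, assuming four .L/.S units doing 2-way SIMD and two .M units doing 4-way SIMD per cycle:

```latex
\begin{aligned}
\text{FLOPs/cycle}
  &= \underbrace{4 \times 2}_{\text{.L1, .L2, .S1, .S2 (2-way SIMD)}}
   + \underbrace{2 \times 4}_{\text{.M1, .M2 (4-way SIMD)}} \\
  &= 8 + 8 = 16 \\
  &= 2 \times 8\ \text{SP MACs/cycle}
\end{aligned}
```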

## GFLOPs

- 2 FLOPs: 2-way SIMD on .L1 (A side), e.g. DADDSP or DSUBSP
- 2 FLOPs: 2-way SIMD on .L2 (B side), e.g. DADDSP or DSUBSP
- 2 FLOPs: 2-way SIMD on .S1 (A side), e.g. DADDSP or DSUBSP
- 2 FLOPs: 2-way SIMD on .S2 (B side), e.g. DADDSP or DSUBSP
- 4 FLOPs: 4-way SIMD on .M1 (A side), e.g. QMPYSP (or possibly CMPYSP, which may not be 4-way SIMD)
- 4 FLOPs: 4-way SIMD on .M2 (B side), e.g. QMPYSP (or possibly CMPYSP, which may not be 4-way SIMD)

Total: 16 FLOPs per cycle per C66x CorePac (source).

# Boards

A good starter board is the BeagleBoard X-15, which has OpenCL drivers. It has two C66x DSPs and two 1.5 GHz ARM Cortex-A15 cores.