This document is a set of guidelines for developers who know OpenCL C and plan to port their kernels to OpenCL C++, and therefore they need to know the main differences between those two kernel languages. The focus is not on highlighting all the differences, but rather on exposing and explaining those that are the most important, and those that may cause hard-to-detect bugs when porting from OpenCL C to OpenCL C++.
This text was initially published at https://github.com/OpenCL/OpenCLCXXPortingGuidelines/blob/master/OpenCLCToOpenCLCppPortingGuidelines.md
Comments, suggestions for improvements, and contributions are most welcome.
Differences
OpenCL C++ Programming Language
OpenCL C Vector Literals
Vector literals, expression used for creating vectors from a list of scalars, vectors or a mixture thereof, known from OpenCL C are not part of the OpenCL C++ kernel language.
In OpenCL C++ vector types can be initialized like any other class – using constructors. For example, the following are available for float4
:
float4(float, float, float, float)
float4(float2, float, float)
float4(float, float2, float)
float4(float, float, float2)
float4(float2, float2)
float4(float3, float)
float4(float, float3)
float4(float)
Note
In OpenCL C++ vector literals are NOT evaluated as user might expect, unfortunately, they never cause compilation errors.
Vector literals in OpenCL C++ are not evaluated as user might expect. In OpenCL C++ expression (int4)(1, 2, 3, 4)
is evaluated to (int4)4
. This happens because of how comma operator works: every value enclosed in parentheses except for the last is discarded, and then scalar-to-vector conversion is used for 4
.
In certain situations vector literals in OpenCL C++ code can cause warnings during compilation, but they do not cause compilation errors.
Solution
Do not use vector literals. Replace them with vector constructors.
Examples, bad
int4 i = (int4)(1, 2, 3, 4);
// This expression will be evaluated to (int4)4,
// and i will be (4, 4, 4, 4).
// In OpenCL C++ compiler (clang) provided by Khronos
// it causes 'expression result unused' warnings.
int4 i = (int4)(cl::max(0, 1), cl::max(0, 2), cl::max(0, 3), cl::max(0, 4))
// This expression will be evaluated to (int4)4,
// and i will be (4, 4, 4, 4).
// In OpenCL C++ compiler (clang) provided by Khronos
// it DOES NOT cause any warnings.
Examples, correct
uint4 u = uint4(1); // u will be (1, 1, 1, 1)
int4 i = int4{-1, -2, 3, 4} // i will be (-1, -2, 3, 4)
// in each case f will be (1.0f, 2.0f, 3.0f, 4.0f)
float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
float4 f = float4(float2(1.0f, 2.0f), float2(3.0f, 4.0f));
float4 f = float4(1.0f, float2(2.0f, 3.0f), 4.0f);
boolN
Type
OpenCL C++ introduces new built-in vector type: boolN
(where N
is 2, 3, 4, 8, or 16). This addition change resolves problem with using the relational (<
, >
, <=
, >=
, ==
, !=
) and the logical operators (!
, &&
, ||
) with built-in vector types.
In OpenCL C for built-in vector types the relational and the logical operators return a vector signed integer type of the same size as the source operands. In OpenCL C++ it was simpliefied and those operators return boolN
for vector types and bool
for scalars.
The OpenCL C 2.0 Specification on the results of the relational operators:
The result is a scalar signed integer of type
int
if the source operands are scalar and a vector signed integer type of the same size as the source operands if the source operands are vector types. Vector source operands of typecharn
anducharn
return acharn
result; vector source operands of typeshortn
andushortn
return ashortn
result; vector source operands of typeintn
,uintn
andfloatn
return anintn
result; vector source operands of typelongn
,ulongn
anddoublen
return alongn
result.
For scalar types, the relational operators shall return
0
if the specified relation isfalse
and1
if the specified relation istrue
. For vector types, the relational operators shall return0
if the specified relation isfalse
and–1
(i.e. all bits set) if the specified relation istrue
. The relational operators always return0
if either argument is not a number (NaN
).
Including boolN
vector types in OpenCL C++ also caused changes in signatures and/or behavior of built-in relational functions like: all()
, any()
and select()
. See Relational Functions section for more details.
Examples
bool2 b = bool2(1 == 0); // { false, false }
// In OpenCL C: int b = 2 > 1, and b is 1
bool b = 2 > 1 // true
// In OpenCL C: int b = 2 > 1, and b is 0
bool b = 2 == 1 // false
// OpenCL C-related note:
// -1 for signed integer type means that all bits are set
// In OpenCL C: int2 b = (uint2)(0, 1) > (uint2)(0, 0),
// and b is { 0, -1 }
bool2 b = uint2(0, 1) > uint2(0, 0); // { false, true }
// In OpenCL C: long2 b = (ulong2)(0, 0) > (ulong2)(0, 0),
// and b is { 0, 0 }
bool2 b = ulong2(0, 0) > ulong2(0, 0); // { false, false }
// In OpenCL C: long2 b = (long2)(1, 1) > (long2)(0, 0),
// and b is { -1, -1 }
bool2 b = long2(1, 1) > long2(0, 0); // { true, true }
#include <opencl_relational>
// In OpenCL C: int2 b = isnan((float2)(0.0f)),
// and b is { 0, 0 }
bool2 b = isnan(float2(0.0f)) // { false, false }
// In OpenCL C: long2 b = isfinite((double2)(0.0))
// and b is { -1, -1 }
bool2 b = isfinite(double2(0.0)) // { true, true }
OpenCL C++ Specification References
End Of Explicit Named Address Spaces
OpenCL C++ 1.0 Specification in Address Spaces section says:
The OpenCL C++ kernel language doesn’t introduce any explicit named address spaces, but they are implemented as part of the standard library described in Address Spaces Library section. There are 4 types of memory supported by all OpenCL devices: global, local, private and constant. The developers should be aware of them and know their limitations.
That means that instead of using keywords global
, constant
, local
, and private
, in order to explicitly specify address space for variable or pointer you have to use address space pointers and address space storage classes.
Note
Go to Address Spaces Library section of The Porting Guidelines to read more about address space pointers and address space storage classes.
It is still possible for OpenCL C++ compiler to deduce an address space based on the scope where an object is declared:
- If a variable is declared in program scope, with
static
orextern
specifier and the standard library storage class (see Explicit address space storage classes section) is not used, the variable is allocated in the global memory of a device. - If a variable is declared in function scope, without static specifier and the standard library storage class (see Explicit address space storage classes section) is not used, the variable is allocated in the private memory of a device.
OpenCL C++ Specification References
Examples, bad (OpenCL C-style)
// Compilation error, "global" address space is not defined
// in OpenCL C++ kernel language
kernel void example_kernel(global int * input)
{
// Compilation error, "local" address space is not defined
// in OpenCL C++ kernel language
local int array[256];
// ...
}
// Compilation error, "constant" address space is not defined
// in OpenCL C++ kernel language
kernel void example_kernel(constant int * input)
{
// Compilation error, "private" address space is not defined
// in OpenCL C++ kernel language
private int x;
// ...
}
Examples, correct (OpenCL C++)
#include <opencl_memory>
#include <opencl_work_item>
kernel void example_kernel(cl::global_ptr<int[]> input)
{
cl::local<int[256]> array;
uint gid = cl::get_global_id(0);
array[gid] = input[gid];
// ...
}
kernel void example_kernel(cl::constant_ptr<int[]> input)
{
int x = 0;
// ...
}
int y; // Allocated in global memory
static int z; // Allocated in global memory
kernel void example_kernel(cl::constant_ptr<int[]> input)
{
int x = 0; // Allocated in private memory
static cl::global<int> w; // Allocated in global memory
// ...
}
Note
More examples on address spaces can be found in subsections 3.4.5. Restrictions and 3.4.6. Examples of section Address Spaces Library in OpenCL C++ specification.
Kernel Function Restrictions
Since OpenCL C++ kernel language is based on C++14 several restrictions were defined for kernel function to make it resemble kernel function known from OpenCL C:
- A kernel functions are by implicitly declared as extern “C”.
- A kernel function cannot be overloaded.
- A kernel function cannot be template function.
- A kernel function cannot be called by another kernel function.
- A kernel function cannot have parameters specified with default values.
- A kernel function must have the return type void.
- A kernel function cannot be called main.
Note
Compared to OpenCL C in OpenCL C++ you cannot call a kernel function from another kernel function.
OpenCL C++ Specification References
Examples, bad
// A kernel function cannot be template function.
template<class T>
kernel void example_kernel(cl::global_ptr<T[]> input, uint size)
{ /* ... */ }
// A kernel function cannot have parameters specified with default values.
kernel void foo(cl::global_ptr<uint[]> input, uint size = 10)
{ /* ... */ }
kernel void bar(cl::global_ptr<uint[]> input, uint size)
{
// A kernel function cannot be called by another kernel function.
foo(input, size);
}
// A kernel function cannot be overloaded.
kernel void bar(cl::global_ptr<float[]> input, uint size)
{ /* ... */ }
Examples, correct
template<class T>
void function_template(cl::global_ptr<T[]> input, uint size)
{ /* ... */ }
// Specialization for T = float
template<>
void function_template(cl::global_ptr<float[]> input, uint size)
{ /* ... */ }
kernel void kernel_uint(cl::global_ptr<uint[]> input, uint size)
{
function_template<uint>(input, size);
}
kernel void kernel_float(cl::global_ptr<float[]> input, uint size)
{
function_template<float>(input, size);
}
Kernel Parameter Restrictions
The OpenCL host compiler and the OpenCL C++ kernel language device compiler can have different requirements for i.e. type sizes, data packing and alignment, etc., therefore the kernel parameters must meet the following requirements:
- Types passed by pointer or reference must be standard layout types.
- Types passed by value must be POD types.
- Types cannot be declared with the built-in bool scalar type, vector type or a class that contain bool scalar or vector type fields.
- Types cannot be structures and classes with bit field members.
- Marker types must be passed by value (Marker Types section).
global
,constant
,local
storage classes can be passed only by reference or pointer. More details in Explicit address space storage classes section.- Pointers and references must point to one of the following address spaces: global, local or constant.
OpenCL C++ Specification References
General Restrictions
The following C++14 features are not supported by OpenCL C++:
- the
dynamic_cast
operator (ISO C++ Section 5.2.7), - type identification (ISO C++ Section 5.2.8),
- recursive function calls (ISO C++ Section 5.2.2, item 9) unless they are a compile-time constant expression,
- non-placement
new
anddelete
operators (ISO C++ Sections 5.3.4 and 5.3.5), goto
statement (ISO C++ Section 6.6),register
andthread_local
storage qualifiers (ISO C++ Section 7.1.1),virtual
function qualifier (ISO C++ Section 7.1.2),- function pointers (ISO C++ Sections 8.3.5 and 8.5.3) unless they are a compile-time constant expression,
- virtual functions and abstract classes (ISO C++ Sections 10.3 and 10.4),
- exception handling (ISO C++ Section 15),
- the C++ standard library (ISO C++ Sections 17 . . . 30),
asm
declaration (ISO C++ Section 7.4),- no implicit lambda to function pointer conversion (ISO C++ Section 5.1.2, item 6),
- variadic functions (ISO C99 Section 7.15, Variable arguments <stdarg.h>),
- and, like C++, OpenCL C++ does not support variable length arrays (ISO C99, Section 6.7.5).
To avoid potential confusion with the above, please note the following features are supported in OpenCL C++:
- All variadic templates (ISO C++ Section 14.5.3) including variadic function templates are supported.
OpenCL C++ Specification References
OpenCL C++ Standard Library
OpenCL C++ does not support the C++14 standard library, but instead implements its own standard library. It is a replacement for built-in functions provided in OpenCL C.
Note
OpenCL C++ classes and functions are NOT auto-included.
Namespace cl::
All class and functions provided in OpenCL C++ Standard Library are located in namespace cl::
.
OpenCL C++ Specification References
Solution
Adding a using-directive using namespace cl;
right after including all required headers can reduce work needed to port OpenCL C programs to OpenCL C++.
Examples
#include <opencl_memory>
#include <opencl_integer> // cl::abs(gentype x)
kernel void foo(cl::global_ptr<int[]> input /* note cl:: prefix */, uint size)
{
uint global_id = cl::get_global_id(0); // note cl:: prefix
if(global_id < size)
{
using namespace cl; // no need for cl:: prefix in this scope
input[global_id] = abs(input[global_id]);
}
}
#include <opencl_memory>
#include <opencl_integer> // cl::abs(gentype x)
using namespace cl; // No need for cl:: prefix after this using-directive
kernel void foo(global_ptr<int[]> input, uint size)
{
uint global_id = get_global_id(0);
if(global_id < size)
{
input[global_id] = abs(input[global_id]);
}
}
Conversions Library
OpenCL C convert_type<_sat><_roundingMode>()
and convert_typeN<_sat><_roundingMode>()
built-in functions were replaced in OpenCL C++ with convert_cast<>
function template. The behavior of the conversion may be modified by one or two optional modifiers that specify saturation for out-of-range inputs and rounding behavior.
Rounding Modes
namespace cl
{
enum class rounding_mode
{
rte, // Round to nearest even
rtz, // Round toward zero
rtp, // Round toward positive infinity
rtn // Round toward negative infinity
};
}
Note
If a rounding mode is not specified, conversions to integer type use the
rtz
(round toward zero) rounding mode and conversions to floating-point type uses therte
rounding mode.
OpenCL C++ Specification References
Examples
#include <opencl_convert>
using namespace cl; // No need for cl:: prefix after this using-directive
kernel void covert_foo_bar()
{
int4 i { -1, 0, 1, 2 };
float4 f { -1.5f, -0.5f, 0.5f, 1.5f};
// Convert ints to floats using the default rounding mode (rte).
// In OpenCL C: convert_float4_rtp(i)
float4 f1 = convert_cast<float4>(i);
// In OpenCL C: convert_float4_rtp(i)
float4 f2 = convert_cast<float4, rounding_mode::rtp>(i);
// In OpenCL C: convert_int4_sat(f)
int4 i1 = convert_cast<int4, saturate::on>(f);
// In OpenCL C: convert_int4_sat_rte(f)
int4 i1 = convert_cast<int4, rounding_mode::rte, saturate::on>(f);
}
Reinterpreting Data Library
OpenCL C as_type()
and as_typeN()
operators used for reinterpreting bits in a data type as another data type in OpenCL were replaced in OpenCL C++ with TargetType as_type(InputType const&)
function template.
Note
All data types described in Device built-in scalar data types and Device built-in vector data types tables (except
bool
andvoid
) may be also reinterpreted as another data type of the same size using theas_type()
function template for scalar and vector data types.
OpenCL C++ Specification References
Examples
#include <opencl_reinterpret>
using namespace cl; // No need for cl:: prefix after this using-directive
kernel void reinterpret_bar_foo()
{
float f = 1.0f;
uint u = as_type<uint>(f); // Legal. Contains: 0x3f800000
float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
// Legal. Contains:
// int4(0x3f800000, 0x40000000, 0x40400000, 0x40800000)
int4 i = as_type<int4>(f);
int i;
// Legal. Result is implementation-defined.
short2 j = as_type<short2>(i);
float4 f;
// Error: result and operand have different sizes
double4 g = as_type<double4>(f);
float4 f;
// Legal.
// g.xyz will have same values as f.xyz.
// g.w is undefined
float3 g = as_type<float3>(f);
}
Address Spaces Library
As mentioned in End of explicit named address spaces, in OpenCL C++ explicit named address spaces known from OpenCL C were replaced by explicit address space storage and pointer classes.
Explicit address space storage classes:
cl::global<T> x
– allocated in global memory.- The global storage class can only be used to declare variables at program, function and class scope.
- The variables at function and class scope must be declared with
static
specifier.
cl::local<T> x
– allocated in local memory.- The local storage class can only be used to declare variables at program, kernel and class scope.
- The variables at class scope must be declared with
static
specifier.
cl::priv<T> x
– allocated in private memory.- The priv storage class cannot be used to declare variables in the program scope, with static specifier or extern specifier.
cl::constant<T> x
– allocated in global memory, read-only.- The constant storage class can only be used to declare variables at program, kernel and class scope.
- The variables at class scope must be declared with static specifier.
Explicit address space storage pointers classes:
cl::global_ptr<T>
cl::local_ptr<T>
cl::private_ptr<T>
cl::constant_ptr<T>
The explicit address space pointer classes are just like pointers: they can be converted to and from pointers with compatible address spaces, qualifiers and types. Assignment or casting between explicit pointer types of incompatible address spaces is illegal.
All named address spaces are incompatible with all other address spaces, but local, global and private pointers can be converted to standard C++ pointers.
Restrictions
The OpenCL C++ specification specification in subsections 3.4.5. Restrictions of section Address Spaces Library contains detailed list of restrictions with examples regarding explicit address space storage and pointer classes. It is very important to read and understand those restrictions.
OpenCL C++ Specification References
Examples
#include <opencl_array>
#include <opencl_memory>
#include <opencl_work_item>
int x; // Allocated in global address space
cl::global<int> y; // Allocated in global address space
cl::constant<int> z {0}; // Allocated in global address space, read-only,
// must be initialized
// Program scope array of 5 ints allocated in local address space
cl::local<cl::array<int, 5>> w = { 10 };
// Explicit address space class object passed by value
kernel void example_kernel(cl::global_ptr<int[]> input)
{
cl::local<int[256]> array;
static cl::global<int> a;
static cl::constant<int> b {0};
}
// Explicit address space storage object passed by reference
kernel void example_kernel(cl::global<cl::array<int, 5>>& input)
{ /* ... */ }
// Explicit address space storage object passed by pointer
kernel void example_kernel(cl::global<int> * input)
{ /* ... */ }
Note
More examples on address spaces can be found in subsections 3.4.5. Restrictions and 3.4.6. Examples of section Address Spaces Library in OpenCL C++ specification.
Marker Types
Like OpenCL C, OpenCL C++ includes special types – images, pipes. All those types are considered marker types. Being a marker type comes with the following set of restrictions:
- Marker types have the default constructor deleted.
- Marker types have all default copy and move assignment operators deleted.
- Marker types have address-of operator deleted.
- Marker types cannot be used in divergent control flow. It can result in undefined behavior.
- Size of marker types is undefined.
All marker types can be passed to functions only by a reference.
OpenCL C++ Specification References
Examples
#include <opencl_image>
#include <opencl_work_item>
using namespace cl;
float4 bar_val(image2d<float4> img) {
return img.read({get_global_id(0), get_global_id(1)});
}
float4 bar_ref(image2d<float4>& img) {
return img.read({get_global_id(0), get_global_id(1)});
}
kernel void foo(image2d<float4> img)
{
// Error: marker type cannot be passed by value
float4 val = bar_val(img);
// Correct, passing marker type by reference
float4 val = bar_ref(img);
}
#include <opencl_image>
#include <opencl_work_item>
using namespace cl;
float4 bar(image2d<float4> img) {
return img.read({get_global_id(0), get_global_id(1)});
}
kernel void foo(image2d<float4> img1, image2d<float4> img2)
{
// Error: marker type cannot be declared in the kernel
image2d<float4> img3;
// Error: marker type cannot be assigned
img1 = img2;
// Error: taking address of marker type
image2d<float4> *imgPtr = &img1;
// Undefined behavior: size of marker type is not defined
size_t s = sizeof(img1);
// Undefined behavior: divergent control flow
float4 val = bar(get_global_id(0) ? img1: img2);
}
Images and Samplers Library
Images are another part of the OpenCL that changed a lot compared to OpenCL C. Instead of image types and built-in image read/write functions in OpenCL C++ there are image class templates with corresponding methods. Image and sampler class templates are marker types.
Image types
OpenCL C | OpenCL C++ |
---|---|
image1d_t | cl::image1d |
image1d_buffer_t | cl::image1d_buffer |
image1d_array_t | cl::image1d_array |
image2d_t | cl::image2d |
image2d_array_t | cl::image2d_array |
image2d_depth_t | cl::image2d_depth |
image2d_array_depth_t | cl::image2d_array_depth |
image3d_t | cl::image3d |
sampler_t | cl::sampler |
To instantiate image template class user has to specify image element type (which is type returned when reading from an image, and required when writing pixel to an image), and access mode (cl::image_access::read
is the default access mode).
Image dimension
Based on the dimension of an image different methods are available. All image types have int width()
method, images of dimension 2 or 3 have int height()
, 3D images have int depth()
, and arrayed images have one additional method – int array_size()
. See subsection Image dimension of OpenCL C++ Specification for more details.
Image element type
Depending on the type of an image different types are allowed to be specified as image element type template parameter. Image type with invalid pixel type is ill formed. See subsection Image element types of OpenCL C++ Specification for more details.
Image processing kernels written in OpenCL C++ can be made more readable using .rgba
vector component access (compared to .xyzw
in OpenCL C). Like xyzw
selector, rgba
selector works only for vector types with 4 or less elements. See also Vector Component Access part of subsection Built-in Vector Data Types and section Vector Utilities Library of OpenCL C++ Specification.
// OpenCL C++
kernel void openclcxx(image2d<uint4, // image element type
image_access::read // access mode
> img)
{
uint4 color;
// rgba selector
color.r = 255;
color.gb = uint2(0);
color.a = 255;
//...
}
// OpenCL C
kernel void openclc(read_only image2d_t img) // read_only keyword sets access mode
// image element type not defined
{
uint4 color;
// xyzw selector
color.x = 255;
color.yz = (uint2)(0);
color.w = 255;
//...
}
Image access mode
Based on the image access mode different read and write methods are present in the instantiated image class. See subsection Image access of OpenCL C++ Specification for more details.
namespace cl
{
enum class image_access
{
sample,
read,
write,
read_write
};
}
Sampler
Like in OpenCL C, in OpenCL C++ there only two ways of acquiring a sampler inside of a kernel. One is to pass it as a kernel parameter from host using clSetKernelArg
function, the other is to create cl::sampler
using make_sampler
function in the kernel code. The sampler objects at non-program scope must be declared with static specifier.
template <addressing_mode A, normalized_coordinates C, filtering_mode F>
constexpr sampler make_sampler();
Sampler parameters and their behavior are described in subsection Sampler Modes of OpenCL C++ Specification.
OpenCL C++ Specification References
Examples
// OpenCL C++
#include <opencl_image>
#include <opencl_work_item>
using namespace cl;
using my_image1d_type = image1d<float4, // image element type
image_access::write>; // access mode
using my_image2d_type = image2d<float4>; // access mode is image_access::read
kernel void openclcxx(my_image1d_type img1d, my_image2d_type img2d)
{
const int coords1d(get_global_id(0));
const int2 coords2d(get_global_id(0), get_global_id(1));
float4 val1d(0.0f);
// 1) write() is enabled because the access mode of my_image1d_type
// is image_access::write
// 2) write() takes int value as pixel coordinates because my_image1d_type
// is a 1d image type
// 3) write() takes float4 value as pixel value because float4 is the image
// element type of my_image1d_type
img1d.write(coords1d, val1d);
// 1) read() is enabled because the access mode of my_image2d_type
// is image_access::read
// 2) read() takes int2 as an input argument because my_image2d_type
// is a 2d image type
// 3) read() returns float4 because float4 is the image element type
// of my_image2d_type
float4 val2d = img2d.read(coords2d);
}
// OpenCL C
kernel void openclc(write_only image1d_t img1d, // write_only keyword sets access mode
read_only image2d_t img2d) // read_only keyword sets access mode
{
const int coords1d = get_global_id(0);
const int2 coords2d = (int2)(get_global_id(0), get_global_id(1));
float4 val1d = (float4)(0.0f);
write_imagef(img1d, coords1d, val1d);
// float4 read_imagef(image2d_t, int2) function is used to
// read from img 2d image.
float4 val2d = read_imagef(img2d, coords2d);
}
Pipes Library
In OpenCL C++ pipe
keyword was replaced with cl::pipe
class template. Reserve operations return cl::pipe::reservation
object, instead of returning reservation id of type reserve_id_t
.
All pipe
s-related function were moved to cl::pipe
or reservation
as their methods.
Pipe storage
OpenCL C++ introduces new pipe-related type – cl::pipe_storage
class template. It enables programmers to create cl::pipe
objects in an OpenCL program without need to create cl_pipe
on host using API. cl::pipe_storage
class template has two template parameters: T
– element type, and N
– the maximum number of packets which can be held by an object.
Note
One kernel can have only one pipe accessor (cl::pipe
object) associated with one cl::pipe_storage
object.
Requirements and Restictions
cl::pipe::reservation
, cl::pipe_storage
and cl::pipe
are marker types. However, they also have additional sets of requirements and restictions beyond those specified in Market Types section. The most important are:
- The element type
T
ofpipe
andpipe_storage
class templates must be a POD type i.e. satisfyis_pod<T>::value == true
. - A kernel cannot read from and write to the same pipe object.
- Variables of type
pipe_storage
can only be declared at program scope or with thestatic
specifier. - Variables of type
pipe
created frompipe_storage
can only be declared inside a kernel function at kernel scope. - The
reservation
,pipe_storage
, andpipe
types cannot be used as a class or union field, a pointer type, an array or the return type of a function. - The
reservation
,pipe_storage
, andpipe
types cannot be used with theglobal
,local
,priv
andconstant
address space storage classes.
The full lists of requirements and restictions can be found in subsections Requirements and Restrictions of Pipe Library section in OpenCL C++ Specification.
OpenCL C++ Specification References
Examples
Reading from and writing to a pipe:
// OpenCL C++
#include <opencl_pipe>
kernel void foobar(cl::pipe<int /* type */, cl::pipe_access::write /* access mode */> wp,
cl::pipe<int /* access mode defaults to read */> rp)
{
int val;
// ...
// write() method is enabled only for pipes with
// pipe_access::write access mode
if(wp.write(val)) { // val passed by const reference
// ...
}
// read() method is enabled only for pipes with
// pipe_access::read access mode
if(rp.read(val)) { // val passed by reference
// ...
}
}
// OpenCL C
kernel void foobar(write_only /* access mode */ pipe /* keyword */ int /* type */ wp,
read_only /* access mode */ pipe /* keyword */ int /* type */ rp)
{
int val;
// ...
// In OpenCL write_pipe(...) and read_pipe(...) operations
// returns 0 when write/read is successful, and a negative
// value otherwise
if(write_pipe(p, &val) == 0) {
// ...
}
if(read_pipe(p, &val) == 0) {
// ...
}
}
// OpenCL C++
#include <opencl_pipe>
kernel void foobar(cl::pipe<int> p)
{
int val;
// cl::pipe<int, cl::pipe_access::read>::reservation<memory_scope_work_item>
auto r = p.reserve(3);
// ...
// read() method is available because pipe p is in
// pipe_access::read access mode
if(r.read(2, val)) {
// ...
}
r.commit();
}
Making and using a reservation:
// OpenCL C
kernel void foobar(read_only pipe int p)
{
int val;
reserve_id_t rid = reserve_read_pipe(p, 3);
// ...
if(read_pipe(p, rid, 2, &val)) {
// ...
}
commit_read_pipe(p, rid);
}
// OpenCL C++
#include <opencl_pipe>
kernel void foobar(cl::pipe<int> p)
{
int val;
// cl::pipe<int, cl::pipe_access::read>::reservation<memory_scope_work_item>
auto r = p.reserve(3);
// ...
// read() method is available because pipe p is in
// pipe_access::read access mode
if(r.read(2, val)) {
// ...
}
r.commit();
}
Using pipe_storage
:
// OpenCL C++
#include <opencl_pipe>
cl::pipe_storage <int, 1337> my_pipe;
kernel void reader()
{
auto p = my_pipe.get<cl::pipe_access::read>();
// ...
p.read(...);
// ...
}
kernel void writer()
{
auto p = my_pipe.get<cl::pipe_access::write>();
// ...
p.write(...);
// ...
}
kernel void error_kernel()
{
auto p1 = my_pipe.get<cl::pipe_access::write>();
// Error, one kernel can have only one pipe accessor
// (cl::pipe object) associated with one cl::pipe_storage object.
auto p2 = my_pipe.get<cl::pipe_access::read>();
// ...
}
Device Enqueue Library
When it comes to enqueuing a kernel without host interaction, the biggest difference between OpenCL C and OpenCL C++ is that in OpenCL C++ enqueued kernel can be a lambda expression or a function, whereas in OpenCL C it is defined using block syntax.
All functions except function which returns default device queue and kernel query functions were moved to appropriate classes as their methods. See Header <opencl_device_queue> Synopsis subsections of OpenCL C++ specification.
Device Queue
In OpenCL C++ cl::device_queue
class represents device queue (queue_t
in OpenCL C). cl::device_queue
is a marker type (see Marker Types).
OpenCL C | OpenCL C++ |
---|---|
queue_t | cl::device_queue |
namespace cl
{
struct device_queue: marker_type
{
// ...
template <class Fun, class... Args>
enqueue_status enqueue_kernel(enqueue_policy flag,
const ndrange &ndrange,
Fun fun,
Args... args) noexcept;
// In OpenCL C:
// int enqueue_kernel(queue_t queue,
// kernel_enqueue_flags_t flags,
// const ndrange_t ndrange,
// void (^block)(local void *, ...),
// uint size0, ...);
// ...
};
}
Note
args
are the arguments that will be passed tofun
when kernel will be enqueued with the exception forlocal_ptr
parameters. For local pointers user must supply the size of local memory that will be allocated usinglocal_ptr::size_type{num_elements}
. In OpenCL C user has to passuint
value for a corresponding local pointer, which specifies the size of a local memory accessible using that local pointer.
Event
In OpenCL C++ cl::event
class represents device-side event (clk_event_t
in OpenCL C).
OpenCL C | OpenCL C++ |
---|---|
clk_event_t | cl::event |
cl::event
has the same possible states as clk_event_t
, however in OpenCL C++ error is not represented by any negative value, but rather by cl::event_status::error
enum.
OpenCL C | OpenCL C++ | Description |
---|---|---|
CL_SUBMITTED | cl::event_status::submitted | Initial status of a user event |
CL_COMPLETE | cl::event_status::complete | |
Any negative integer value | cl::event_status::error | Status indicating an error |
See Event Class Methods and Event Status subsections of OpenCL C++ specification.
Enqueue Policy
Available enqueue policies did not changed compared to OpenCL C. In OpenCL C enqueue policy type was kernel_enqueue_flags_t
enum, in OpenCL C++ it is cl::enqueue_policy
enum class.
OpenCL C | OpenCL C++ |
---|---|
CLK_ENQUEUE_FLAGS_NO_WAIT | cl::enqueue_polic::no_wait |
CLK_ENQUEUE_FLAGS_WAIT_KERNEL | cl::enqueue_polic::wait_kernel |
CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP | cl::enqueue_polic::wait_work_group |
See Enqueue Policy subsection of OpenCL C++ specification.
Requirements
Functor and lambda objects passed to enqueue_kernel()
method of device queue has to follow specific restrictions:
- It has to be trivially copyable.
- It has to be trivially copy constructible.
- It has to be trivially destructible.
Code enqueuing function objects that do not meet this criteria is ill-formed.
OpenCL C++ Specification References
Examples
Block syntax vs. lambda expression:
// OpenCL C++
#include <opencl_device_queue>
#include <opencl_memory>
kernel void my_func(cl::global_ptr<int> a, cl::global_ptr<int> b, cl::global_ptr<int> c)
{
// ...
auto dq = cl::get_default_device_queue();
dq.enqueue_kernel(
cl::enqueue_polic::no_wait,
cl::ndrange({10, 10}),
[=](){ //
*a = *b + *c; // Lambda expression
} //
);
// ...
}
// OpenCL C
kernel void my_func(global int *a, global int *b, global int *c)
{
// ...
enqueue_kernel(
get_default_queue(),
CLK_ENQUEUE_FLAGS_NO_WAIT,
ndrange_2D(1, 1),
^{ //
*a = *b + *c; // Block syntax
} //
);
// ...
}
Enqueuing a functor:
// OpenCL C++
#include <opencl_device_queue>
#include <opencl_memory>
struct my_functor {
void operator ()(cl::local_ptr<ushort16[]> p, int x) const
{ /* ... */ }
};
kernel void my_func(cl::device_queue q)
{
// ...
my_functor f;
dq.enqueue_kernel(
cl::enqueue_polic::no_wait,
cl::ndrange(1),
f, // functor
cl::local_ptr<ushort16[]>::size_type{10}, // define size of p
2 // x
);
// ...
}
Relational Functions
In OpenCL C++ there were significant changes in signatures and/or behaviour of built-in relational functions. This is because OpenCL C++ introduces boolN
type which can replace intN
as a type returned by relational functions.
all()
and any()
In OpenCL C:
// igentype can be char, charN, short, shortN, int, intN, long, and longN
int any (igentype x);
int all (igentype x);
any()
returns 1 if the most significant bit in any component ofx
is set; otherwise returns 0.
all()
returns 1 if the most significant bit in all components ofx
is set; otherwise returns 0.
In OpenCL C++:
bool any(booln t);
bool all(booln t);
any()
returnstrue
if any component oft
istrue
; otherwise returnsfalse
.
all()
returnstrue
if all components oft
aretrue
; otherwise returnsfalse
.
select()
In OpenCL C:
// igentype can be char, charN, short, shortN, int, intN, long, and longN
// ugentype can be uchar, ucharN, ushort, ushortN, uint, uintN, ulong, and ulongN
gentype select (gentype a, gentype b, igentype c);
gentype select (gentype a, gentype b, ugentype c);
For each component of a vector type,
result[i] = if MSB of c[i] is set ? b[i] : a[i]
.
For scalar type,
result = c ? b : a
.
igentype
andugentype
must have the same number of elements and bits asgentype
.
NOTE: The above definition means that the behavior of select and the ternary operator for vector and scalar types is dependent on different interpretations of the bit pattern of
c
.
In OpenCL C++ select()
is less confusing:
gentype select(gentype a, gentype b, booln c);
For each component of a vector type,
result[i] = c[i] ? b[i] : a[i]
.
For a scalar type,
result = c ? b : a
.
boolN
must have the same number of elements as gentype.
OpenCL C++ Specification References
Examples
// OpenCL C++
#include <opencl_relational>
kernel void foobar()
{
bool b1 = isequal(1.0f, 1.0f); // true
bool b2 = isequal(1.0, 2.0); // false
bool2 b3 = isequal(float2(1.0f), float2(1.0f)); // { true, true }
bool2 b4 = isequal(double2(1.0), double2(2.0)); // { false, false }
bool2 b5 = { true, false };
auto b6 = all(b3); // false
auto b7 = any(b3); // true
bool2 c { true, false };
float2 a { 1.0f, 1.0f };
float2 b { -1.0f, -1.0f };
auto r1 = select(a, b, c); // { -1.0f, 1.0f }
auto r2 = select(1.0f, 2.0f, false); // 1.0f
}
// OpenCL C
kernel void foobar()
{
// Note: in integer value -1 MSB is set to 1
int b1 = isequal(1.0f, 1.0f); // 1 (true)
long b2 = isequal(1.0, 2.0); // 0 (false)
int2 b3 = isequal((float2)(1.0f), (float2)(1.0f)); // { -1, -1 } ({ true, true })
long2 b4 = isequal((double2)(1.0), (double2)(2.0)); // { 0, 0 } ({ false, false })
int b5 = all( (int2)(-1, 10) ); // 0
int b6 = all( (int2)(-1, -1) ); // 1
int b7 = any( (int2)(-1, 0) ); // 1
int b8 = any( (int2)(1, 1) ); // 0
int2 c = (int2)(-1, 1);
float2 a = (float2)(1.0f, 1.0f);
float2 b = (float2)(-1.0f, -1.0f);
float2 r1 = select(a, b, c); // { -1.0f, 1.0f }
float r2 = select(1.0f, 2.0f, -1); // 2.0f
float r3 = select(1.0f, 2.0f, 1); // 1.0f
float r4 = select(1.0f, 2.0f, 0); // 1.0f
}
Vector Data Load and Store Functions
In OpenCL C++ vector data load and store functions were greatly simplified compared to OpenCL: instead of 39 different functions, now there are just 9 function templates. The requirements and the behaviours of functions have not be changed. Also arguments and their order was not changed.
OpenCL C | OpenCL C++ |
---|---|
gentypeN vloadN |
template <size_t N, class T> make_vector_t<T, N> vload |
void vstoreN(...) |
template <class T> void vstore(…, vector_element_t<T>* p) |
floatN vload_half[N] |
template <size_t N> make_vector_t<float, N> vload_half |
void vstore_half[N][_rounding_mode] |
template <rounding_mode R, class Type> void vstore_half(…, half* p) |
floatN vloada_halfN |
template <size_t N> make_vector_t<float, N> vloada_half |
void vstore_halfN[_rounding_mode] |
template <rounding_mode R, class T> void vstorea_half(…, half* p) |
Read Header <opencl_vector_load_store> Synopsis subsection of Vector Data Load and Store Functions section to see vector data load and store function templates declarations.
OpenCL C++ Specification References
Examples
vload
and vstore
:
// OpenCL C++
#include <opencl_vector_load_store>
using namespace cl;
kernel void foobar(float * fptr, const constant_ptr<half> hptr)
{
auto f4 = vload<4>(0, fptr); // reads from (fptr + (0 * 4)), float4 returned
auto f2 = vload<2>(2, fptr); // reads from (fptr + (2 * 2)), float2 returned
#ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported
auto h8 = vload<8>(0, hptr); // reads from (hptr + (0 * 8)), half8 returned
#endif
vstore(float4{ 1, 2, 3, 4}, 0, fptr); // float4 stored at (fptr + (0 * 4))
vstore(f2, 2, fptr); // float2 stored at (fptr + (2 * 2))
}
// OpenCL C
kernel void foobar(float * fptr, const constant half * hptr)
{
float4 f4 = vload4(0, fptr); // reads from (fptr + (0 * 4)), float4 returned
float2 f2 = vload2(2, fptr); // reads from (fptr + (2 * 2)), float2 returned
#ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported
half8 h8 = vload8(0, hptr); // reads from (hptr + (0 * 8)), half8 returned
#endif
vstore4(f4, 0, fptr); // float4 stored at (fptr + (0 * 4))
vstore2(f2, 2, fptr); // float2 stored at (fptr + (2 * 2))
}
vload_half
, vstore_half
, vloada_half
, and vstorea_half
:
// OpenCL C++
#include <opencl_vector_load_store>
using namespace cl;
kernel void foobar_half(half * hptr)
{
// half vload
auto f4 = vload_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
auto f3 = vload_half<3>(0, hptr); // reads from (hptr + (0 * 3)), float3 returned
// half array vload
auto f4a = vloada_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
auto f3a = vloada_half<3>(0, hptr); // reads from (hptr + (0 * 4)), float3 returned
// half vstore
vstore_half(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)),
// rounded to nearest even (rounding_mode::rte)
vstore_half<rounding_mode::rtz>(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)),
// rounded toward zero
// half array vstore
vstorea_half(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4))
// rounded to nearest even (rounding_mode::rte)
vstorea_half<rounding_mode::rtz>(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4))
// rounded toward zero
}
// OpenCL C
kernel void foobar_half(half * hptr)
{
// half vload
float4 f4 = vload_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
float3 f3 = vload_half3(0, hptr); // reads from (hptr + (0 * 3)), float3 returned
// half array vload
float4 f4a = vloada_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned
float3 f3a = vloada_half3(0, hptr); // reads from (hptr + (0 * 4)), float3 returned
// half vstore
vstore_half3(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)),
// rounded to nearest even
vstore_half4_rtz(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)),
// rounded toward zero
// half array vstore
vstorea_half3(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4))
// rounded to nearest even
vstorea_half4_rtz(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4))
// rounded toward zero
}
Atomic Operations Library
OpenCL C atomic operation are based on C11 atomics. In OpenCL C++ atomics are based on C++14 atomics and synchronization operations. Section Atomic Operations Library of OpenCL C++ presents synopsis of the atomics library and differences from C++14 specification.
Because atomic functions in OpenCL C and OpenCL C++ have virtually the same argument lists adding using namespace cl;
can Significantly speed up porting kernels to OpenCL C++.
Atomic types
In OpenCL C++ different OpenCL C atomic types like atomic_int
, atomic_float
were replaced with one class template atomic<T>
, however, for supported types proper type alias are declared (for example: using atomic_int = atomic<int>;
).
- There are explicit specializations for integral types. Each of these specializations provides set of extra operators suitable for integral types.
- There is an explicit specialization of the atomic template for pointer types.
- All atomic classes have deleted copy constructor and deleted copy assignment operators.
- 64-bit atomic types require
cl_khr_int64_base_atomics
andcl_khr_int64_extended_atomics
extensions andatomic<double>
in addition requirescl_khr_fp64
.
Restrictions
- The generic
atomic<T>
class template is only available ifT
isint
,uint
,long
,ulong
,float
,double
,intptr_t
,uintptr_t
,size_t
,ptrdiff_t
. - The atomic data types cannot be declared inside a kernel or non-kernel function unless they are declared as
static
keyword or inlocal<T>
andglobal<T>
containers. See examples. - The atomic operations on the private memory can result in undefined behavior.
memory_order_consume
from C++14 is not supported by OpenCL C++.
Full list of restrictions can be found in subsection Restrictions of section Atomic Operations Library in OpenCL C++ specification.
OpenCL C++ Specification References
Examples
// OpenCL C++
#include <opencl_memory>
#include <opencl_atomic>
using namespace cl;
atomic_int a; // OK: program scope atomic in the global memory
// atomic_int is alias for atomic<int>
local<atomic<int>> b(1); // OK: program scope atomic in the local memory
// Initialized to 1. The initialization is not atomic.
global<atomic<int>> c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory
// Initialized to 2. The initialization is not atomic.
kernel void foo()
{
static global<atomic<int>> d; // OK: atomic in the global memory
static atomic<int> e; // OK: atomic in the global memory
local<atomic<int>> f; // OK: atomic in the local memory
atomic<global<int>> g; // Error: class members cannot be
// in address space
atomic<int> h; // undefined behavior
atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic.
}
// OpenCL C+
global atomic_int a; // OK: program scope atomic in the global memory
local atomic_int b; // Error: program scope local variables not suppoerted in OpenCL C
global atomic_int c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory
// Initialized to 2. The initialization is not atomic.
kernel void foo()
{
static global atomic_int d; // OK: atomic in the global memory
static atomic_int e; // OK: atomic in the global memory
local atomic_int f; // OK: atomic in the local memory
atomic_int h; // undefined behavior
atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic.
}
OpenCL C++ Compilation Process
OpenCL C++ kernel language can not be consumed by clCreateProgramWithSource()
API function, which is used to create a program from OpenCL C source. OpenCL C++ source first have to be compiled to SPIR-V 1.2 binary, which can later be passed to clCreateProgramWithIL()
to create an OpenCL program. After that program can be build with clBuildProgram()
.
OpenCL C++ Compilation to SPIR-V
To compile OpenCL C++ kernel language to SPIR-V user have to use compiler that is not a part of OpenCL framework. The Khronos Group provides reference offline compiler based on Clang 3.6 and an implementation of OpenCL C++ Standard Library called libclcxx.
Preprocessor options
Every preprocessor option that would normally be specified in clBuildProgram()
, for OpenCL C++ must be passed when it is being compiled to SPIR-V.
-D name
Predefine name as a macro, with definition 1.
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define
directive. In particular, the definition will be truncated by embedded newline characters.
Other compilation options
Some feature-related options must be specified during compilation to SPIR-V:
-cl-fp16-enable
– enables full half data type support and definescl_khr_fp16
macro. Disabled by default.-cl-fp64-enable
– enables full double data type support and definescl_khr_fp64
macro. Disabled by default.-cl-zero-init-local-mem-vars
– enables software zero-initialization of variables allocated in local memory.
Building program created from SPIR-V
When an OpenCL program created using clCreateProgramWithIL()
is compiled (clBuildProgram()
) not all build options are allowed. They have to be passed when compiling to SPIR-V. Otherwise, there is no difference between building program created from SPIR-V and program created from OpenCL C source. Which options are ignored and which not is described in OpenCL 2.2 API Specification.
OpenCL C++ Specification and OpenCL 2.2 API References
Bibliography
OpenCL Specifications
- The OpenCL C++ 1.0 Specification (HTML)
- The OpenCL 2.2 API Specification (HTML)
- The OpenCL 2.2 Extension Specification (HTML)
- OpenCL 2.2 SPIR-V Environment Specification (HTML)
- The OpenCL C 2.0 Language Specification
- The OpenCL 2.1 API Specification
OpenCL Reference Pages
- The OpenCL 2.2 Reference Page (not published yet)
- The OpenCL 2.1 Reference Page
OpenCL Headers
Other
- Khronos OpenCL Registry (GitHub)
- OpenCL 2.2 Release Note
- Michael Wong, Adam Stanski, Maria Rovatsou, Ruyman Reyes, Ben Gaster, and Bartok Sochaski. 2016. C++ for OpenCL Workshop, IWOCL 2016. In Proceedings of the 4th International Workshop on OpenCL (IWOCL ’16).