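CUDA texture memory and surface memory are special forms of global memory: the data resides in device memory and is accessed through a dedicated cache that is optimized for two-dimensional spatial locality. Inside a kernel, texture memory is read-only, whereas surface memory can be both read and written. Both support one-, two-, and three-dimensional data.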
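Texture attributes:
The read mode specifies whether a fetch returns a normalized value or the raw element. With cudaReadModeNormalizedFloat, 8-bit and 16-bit integer texels are returned as floating-point values in [0.0, 1.0] for unsigned types and in [-1.0, 1.0] for signed types. With cudaReadModeElementType, the stored value is returned unchanged.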
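As a rough host-side illustration of what cudaReadModeNormalizedFloat does for unsigned integer texels, the sketch below maps 8-bit and 16-bit values onto [0.0, 1.0]. The helper name normalizeUnsigned is hypothetical and not part of the CUDA API; on the GPU this conversion happens inside the texture fetch itself.

```cpp
#include <cassert>

// Hypothetical helper (not CUDA API): emulate on the host how an unsigned
// 8-bit or 16-bit texel is presented under cudaReadModeNormalizedFloat.
float normalizeUnsigned(unsigned int texel, int bits)
{
    assert(bits == 8 || bits == 16);                  // only these widths are normalized
    const unsigned int maxValue = (1u << bits) - 1u;  // 255 or 65535
    assert(texel <= maxValue);
    // The largest storable value maps to 1.0f, zero maps to 0.0f.
    return static_cast<float>(texel) / static_cast<float>(maxValue);
}
```

For example, normalizeUnsigned(0xff, 8) yields 1.0f, matching the rule that an unsigned 8-bit texel with value 0xff reads as 1.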
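The addressing mode specifies which value is returned when a coordinate falls outside the valid range. There are four modes:
1. cudaAddressModeBorder -> out-of-range fetches return zero, e.g. 00|ABCDE|00;
2. cudaAddressModeClamp -> out-of-range fetches return the nearest edge value, e.g. AA|ABCDE|EE;
3. cudaAddressModeWrap -> the texture repeats periodically, e.g. DE|ABCDE|AB;
4. cudaAddressModeMirror -> the texture is mirrored at each edge, e.g. BA|ABCDE|ED.
cudaAddressModeWrap and cudaAddressModeMirror are only supported with normalized coordinates.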
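The four addressing modes can be emulated on the host to see how an out-of-range index is resolved. This is only a sketch on integer indices of a 1D texture; the names AddressMode, fetch, and diagram are hypothetical and not part of the CUDA API.

```cpp
#include <cassert>
#include <string>

enum class AddressMode { Border, Clamp, Wrap, Mirror };

// Hypothetical host-side helper: the texel seen at integer position i of a
// 1D texture under the given addressing mode ('0' stands for a zero value).
char fetch(const std::string &texels, int i, AddressMode mode)
{
    const int n = static_cast<int>(texels.size());
    assert(n > 0);
    switch (mode) {
    case AddressMode::Border:   // zero outside the valid range
        return (i < 0 || i >= n) ? '0' : texels[i];
    case AddressMode::Clamp:    // nearest edge texel
        return texels[i < 0 ? 0 : (i >= n ? n - 1 : i)];
    case AddressMode::Wrap:     // periodic repetition with period n
        return texels[((i % n) + n) % n];
    case AddressMode::Mirror: { // reflection with period 2n
        const int m = ((i % (2 * n)) + 2 * n) % (2 * n);
        return texels[m < n ? m : 2 * n - 1 - m];
    }
    }
    return '0'; // not reached
}

// Build an "xx|ABCDE|xx" diagram: two texels beyond each edge.
std::string diagram(const std::string &texels, AddressMode mode)
{
    const int n = static_cast<int>(texels.size());
    std::string out;
    for (int i = -2; i < n + 2; ++i) {
        if (i == 0 || i == n) out += '|';
        out += fetch(texels, i, mode);
    }
    return out;
}
```

Calling diagram("ABCDE", AddressMode::Clamp) produces AA|ABCDE|EE, and diagram("ABCDE", AddressMode::Border) produces 00|ABCDE|00.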
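The filtering mode specifies how the returned value is computed from the texels: cudaFilterModePoint returns the nearest texel (nearest-neighbor sampling), while cudaFilterModeLinear interpolates linearly between neighboring texels. Linear filtering is only available for textures that return floating-point values.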
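The difference between the two filtering modes can be sketched on the host for a 1D texture. The sketch assumes unnormalized coordinates and clamp addressing, and it ignores the reduced fixed-point precision that the hardware uses for the interpolation weight; the names texelClamped, fetchPoint, and fetchLinear are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Clamp addressing: out-of-range indices use the nearest edge texel.
static float texelClamped(const std::vector<float> &t, int i)
{
    const int n = static_cast<int>(t.size());
    return t[i < 0 ? 0 : (i >= n ? n - 1 : i)];
}

// Point filtering: pick the texel whose cell contains x.
float fetchPoint(const std::vector<float> &t, float x)
{
    assert(!t.empty());
    return texelClamped(t, static_cast<int>(std::floor(x)));
}

// Linear filtering: texel centers sit at i + 0.5, so shift by 0.5 and blend
// the two neighbors with weight alpha = frac(x - 0.5).
float fetchLinear(const std::vector<float> &t, float x)
{
    assert(!t.empty());
    const float xb = x - 0.5f;
    const float base = std::floor(xb);
    const int i = static_cast<int>(base);
    const float alpha = xb - base;
    return (1.0f - alpha) * texelClamped(t, i) + alpha * texelClamped(t, i + 1);
}
```

With texels {10, 20, 30, 40}, fetchLinear at x = 1.5 hits the center of texel 1 and returns 20, while x = 2.0 lies halfway between texels 1 and 2 and returns 25; fetchPoint at x = 2.0 returns 30.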
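Data types used:
cudaArray: a CUDA array. Texture and surface memory is usually backed by a CUDA array, which can also be a layered array or a cubemap array.
cudaResourceDesc: the resource descriptor, which describes the type of the underlying resource.
cudaTextureDesc: the texture descriptor, which describes the texture attributes.
cudaTextureObject_t: a texture object.
cudaSurfaceObject_t: a surface object.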
The following example shows how to use a texture:
#include <cstdlib>          // malloc, free
#include <cuda_runtime.h>
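// Rotate the texture by theta around its center and write the result to output.
__global__ void transformKernel(
    float *output,
    cudaTextureObject_t texObj,
    int width, int height, float theta)
{
    // Pixel handled by this thread
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    // Skip threads that fall outside the image
    if (x >= (unsigned int)width || y >= (unsigned int)height)
        return;
    // Normalized coordinates, shifted so that the image center is the origin
    float u = x / (float)width;
    float v = y / (float)height;
    u -= 0.5f;
    v -= 0.5f;
    // 2D rotation by theta, then shift back
    float tu = u * cosf(theta) - v * sinf(theta) + 0.5f;
    float tv = v * cosf(theta) + u * sinf(theta) + 0.5f;
    // Fetch from the texture and write to global memory
    output[y * width + x] = tex2D<float>(texObj, tu, tv);
}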
int main()
{
    const int width = 1024;
    const int height = 1024;
    float angle = 0.5f; // rotation angle in radians
    // Allocate and initialize host memory
    float *h_data = (float*)malloc(sizeof(float) * width * height);
    for (int i = 0; i < width * height; i++)
        h_data[i] = (float)i;
    // Allocate a CUDA array with a single 32-bit float channel
    cudaChannelFormatDesc channelDesc =
        cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
    cudaArray_t cuArray;
    cudaMallocArray(&cuArray, &channelDesc, width, height);
    // Copy the host data into the CUDA array
    const size_t pitch = width * sizeof(float); // source row pitch in bytes
    cudaMemcpy2DToArray(cuArray, 0, 0, h_data, pitch,
                        width * sizeof(float), height, cudaMemcpyHostToDevice);
    // Describe the resource
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cuArray;
    // Set the texture attributes
    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeWrap;
    texDesc.addressMode[1] = cudaAddressModeWrap;
    texDesc.filterMode = cudaFilterModeLinear;
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 1; // wrap addressing requires normalized coordinates
    // Create the texture object
    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);
    // Allocate device memory for the output
    float *output;
    cudaMalloc(&output, width * height * sizeof(float));
    // Launch the kernel
    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    transformKernel<<<blocks, threads>>>(output, texObj, width, height, angle);
    // Copy the result back to the host (this overwrites the input in h_data)
    cudaMemcpy(h_data, output, width * height * sizeof(float), cudaMemcpyDeviceToHost);
    // Release objects and memory
    cudaDestroyTextureObject(texObj);
    cudaFreeArray(cuArray);
    cudaFree(output);
    free(h_data);
    return 0;
}
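To sanity-check the coordinates that such a kernel samples, a standard 2D rotation about the image center can be reproduced on the host. This is a sketch only; TexCoord and rotatedCoord are hypothetical names introduced for illustration and are not part of the example above or of the CUDA API.

```cpp
#include <cassert>
#include <cmath>

struct TexCoord { float u, v; };

// Hypothetical host-side helper: normalized texture coordinate obtained by
// rotating pixel (x, y) by theta radians around the image center.
TexCoord rotatedCoord(int x, int y, int width, int height, float theta)
{
    assert(width > 0 && height > 0);
    // Normalize to [0, 1) and move the center to the origin
    const float u = x / static_cast<float>(width) - 0.5f;
    const float v = y / static_cast<float>(height) - 0.5f;
    // Rotate, then move the origin back to the corner
    TexCoord c;
    c.u = u * std::cos(theta) - v * std::sin(theta) + 0.5f;
    c.v = v * std::cos(theta) + u * std::sin(theta) + 0.5f;
    return c;
}
```

With theta = 0 the coordinate is unchanged, and the center pixel maps to (0.5, 0.5) for any angle. Rotated coordinates that leave [0, 1) are then resolved by the texture's addressing mode, which is cudaAddressModeWrap in the example.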