Using CUDA Textures and Surfaces

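       CUDA texture memory and surface memory are device (global) memory accessed through the texture cache, which is optimized for accesses with 2D spatial locality. Texture memory is read-only inside a kernel, while surface memory can be both read and written. Both support 1D, 2D, and 3D layouts.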

       Texture attributes:

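read mode specifies whether a fetch returns a normalized value or the raw stored value. With cudaReadModeNormalizedFloat, 8-bit and 16-bit integer texels are converted to floating point: unsigned types map to [0.0, 1.0] and signed types map to [-1.0, 1.0]. With cudaReadModeElementType, the raw value is returned unchanged.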
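
As a small illustration of the read mode, the sketch below stores four unsigned 8-bit values in a 1D CUDA array and fetches them through a texture object created with cudaReadModeNormalizedFloat. The names fetchKernel and fetchNormalized are made up for this example, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Fetch n texels; the +0.5f addresses the center of texel i (unnormalized coordinates).
__global__ void fetchKernel(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1D<float>(tex, i + 0.5f);
}

// Hypothetical helper: upload n bytes, read them back as normalized floats.
void fetchNormalized(const unsigned char *h_in, float *h_out, int n)
{
    // 1D CUDA array of unsigned 8-bit texels (height defaults to 0, i.e. 1D)
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, n);
    cudaMemcpy2DToArray(arr, 0, 0, h_in, n, n, 1, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeNormalizedFloat;  // integer texel -> float
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    fetchKernel<<<(n + 63) / 64, 64>>>(tex, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
}

int main()
{
    unsigned char in[4] = {0, 64, 128, 255};
    float out[4];
    fetchNormalized(in, out, 4);
    for (int i = 0; i < 4; i++)
        printf("%3d -> %f\n", in[i], out[i]);
    return 0;
}
```

Each unsigned 8-bit value v should come back as roughly v / 255, so 255 reads as 1.0. Switching readMode to cudaReadModeElementType would require fetching with tex1D<unsigned char> instead.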

addressing mode specifies how coordinates outside the valid range are handled. There are four modes:

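1. cudaAddressModeBorder -> out-of-range fetches return zero, for example: 00|ABCDE|00;

2. cudaAddressModeClamp -> out-of-range fetches return the nearest edge value, for example: AA|ABCDE|EE;

3. cudaAddressModeWrap -> out-of-range coordinates wrap around periodically, for example: DE|ABCDE|AB;

4. cudaAddressModeMirror -> out-of-range coordinates are mirrored at the edges, for example: BA|ABCDE|ED.

Wrap and mirror are only supported when the texture uses normalized coordinates.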
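
The four patterns can be checked with a small experiment. The sketch below is only an illustration: it stores the values 1 to 5 (standing in for A to E) in a 1D texture and samples nine texel centers, two to the left of the valid range and two to the right, once per addressing mode. The names sampleRow and sampleWithMode are invented for this example, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sample `count` texel centers starting at texel index `first` (may be negative).
__global__ void sampleRow(cudaTextureObject_t tex, float *out,
                          int first, int count, int texels)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < count) {
        float x = (first + k + 0.5f) / texels;  // normalized coordinate of a texel center
        out[k] = tex1D<float>(tex, x);
    }
}

// Hypothetical helper: build a 1D texture with the given addressing mode and sample it.
void sampleWithMode(cudaTextureAddressMode mode, const float *h_texels, int texels,
                    int first, int count, float *h_out)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, texels);
    size_t bytes = texels * sizeof(float);
    cudaMemcpy2DToArray(arr, 0, 0, h_texels, bytes, bytes, 1, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = mode;
    texDesc.filterMode = cudaFilterModePoint;    // no interpolation, one texel per fetch
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 1;                // needed for wrap and mirror

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL);

    float *d_out;
    cudaMalloc(&d_out, count * sizeof(float));
    sampleRow<<<(count + 31) / 32, 32>>>(tex, d_out, first, count, texels);
    cudaMemcpy(h_out, d_out, count * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
}

int main()
{
    const float texels[5] = {1, 2, 3, 4, 5};     // stand-ins for A, B, C, D, E
    const cudaTextureAddressMode modes[4] = {
        cudaAddressModeBorder, cudaAddressModeClamp,
        cudaAddressModeWrap, cudaAddressModeMirror};
    const char *names[4] = {"border", "clamp", "wrap", "mirror"};

    for (int m = 0; m < 4; m++) {
        float out[9];
        sampleWithMode(modes[m], texels, 5, -2, 9, out);
        printf("%-6s:", names[m]);
        for (int k = 0; k < 9; k++)
            printf(" %g", out[k]);
        printf("\n");
    }
    return 0;
}
```

The first two and last two numbers in each printed row correspond to the out-of-range positions, so the rows should line up with the letter patterns in the list above.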

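filtering mode specifies how a fetch is computed from the texels: cudaFilterModePoint returns the nearest texel, while cudaFilterModeLinear linearly interpolates between neighboring texels (bilinear in 2D, trilinear in 3D). Linear filtering is only available when the texture returns floating-point values.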
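
To make the linear mode concrete, the formulas below follow the texture-fetching description in the CUDA C++ Programming Guide for unnormalized coordinates. T is the texel array; the symbols x_B, y_B, alpha, and beta are the guide's notation, not something defined in this article.

```latex
\begin{gather*}
x_B = x - 0.5, \quad i = \lfloor x_B \rfloor, \quad \alpha = x_B - i \\
y_B = y - 0.5, \quad j = \lfloor y_B \rfloor, \quad \beta = y_B - j \\
\text{1D:}\quad tex(x) = (1-\alpha)\,T[i] + \alpha\,T[i+1] \\
\text{2D:}\quad tex(x,y) = (1-\alpha)(1-\beta)\,T[i,j] + \alpha(1-\beta)\,T[i+1,j]
  + (1-\alpha)\beta\,T[i,j+1] + \alpha\beta\,T[i+1,j+1]
\end{gather*}
```

For example, with T[0] = 0 and T[1] = 10, fetching at x = 1.0 gives x_B = 0.5, i = 0, alpha = 0.5, and a result of 5, while fetching at the texel center x = 0.5 gives alpha = 0 and returns T[0] exactly. The hardware stores alpha and beta in 9-bit fixed point with 8 fractional bits, so results only approximate a full-precision interpolation.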

 

Data types used:

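cudaArray: a CUDA array. Texture and surface memory is usually backed by a CUDA array, which can also be a layered array or a cubemap array.

cudaResourceDesc: the CUDA resource descriptor, which describes the type of the underlying resource.

cudaTextureDesc: the CUDA texture descriptor, which describes the texture attributes.

cudaTextureObject_t: a texture object.

cudaSurfaceObject_t: a surface object.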

Below is an example that uses a texture:

#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuda.h>

__global__ void transformKernel(
    float *output, 
    cudaTextureObject_t texObj,
    int width, int height, float theta)
{
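    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Skip threads that fall outside the image
    if (x >= (unsigned int)width || y >= (unsigned int)height)
        return;

    // Normalized coordinates, shifted so the rotation is about the image center
    float u = x / (float)width - 0.5f;
    float v = y / (float)height - 0.5f;

    // Rotate by theta, then shift back
    float tu = u * cosf(theta) - v * sinf(theta) + 0.5f;
    float tv = v * cosf(theta) + u * sinf(theta) + 0.5f;

    // Fetch from the texture and write the result to global memory
    output[y * width + x] = tex2D<float>(texObj, tu, tv);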
}

int main()
{
    const int width = 1024;
    const int height = 1024;
    float angle = 0.5f;

    // Allocate and initialize host memory
    float *h_data = (float*)malloc(sizeof(float) * width * height);
    for(int i=0; i<width*height; i++)
        h_data[i] = (float)i;
    
    // Allocate the CUDA array
    cudaChannelFormatDesc channelDesc = 
        cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
    cudaArray_t cuArray;
    cudaMallocArray(&cuArray, &channelDesc, width, height);

    // Copy the host array into the CUDA array
    const size_t pitch = width * sizeof(float);
    cudaMemcpy2DToArray(cuArray, 0, 0, h_data, pitch, 
        width * sizeof(float), height, cudaMemcpyHostToDevice);

    // Describe the resource that backs the texture
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cuArray;

    // Set the texture attributes (wrap addressing requires normalized coordinates)
    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeWrap;
    texDesc.addressMode[1] = cudaAddressModeWrap;
    texDesc.filterMode = cudaFilterModeLinear;
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 1;

    // Create the texture object
    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);

    // Allocate device memory for the output
    float *output;
    cudaMalloc(&output, width * height * sizeof(float));

    // Launch the kernel
    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    transformKernel<<<blocks, threads>>>(output, texObj, width, height, angle);

    // Copy the result back to the host
    cudaMemcpy(h_data, output, width * height * sizeof(float), cudaMemcpyDeviceToHost);

    // Destroy the texture object and free memory
    cudaDestroyTextureObject(texObj);
    cudaFreeArray(cuArray);
    cudaFree(output);

    free(h_data);

    return 0;
}
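
The listing above covers textures only. For the readable and writable surface memory mentioned at the start, a minimal sketch might look like the following. It assumes a float 2D CUDA array allocated with the cudaArraySurfaceLoadStore flag; the names scaleKernel, makeSurface, and doubleThroughSurface are invented for this illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read each element from the input surface, double it, write it to the output surface.
__global__ void scaleKernel(cudaSurfaceObject_t in, cudaSurfaceObject_t out,
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float v;
        // Unlike texture coordinates, the surface x coordinate is a byte offset.
        surf2Dread(&v, in, x * (int)sizeof(float), y);
        surf2Dwrite(v * 2.0f, out, x * (int)sizeof(float), y);
    }
}

// Hypothetical helper: wrap a CUDA array in a surface object.
static cudaSurfaceObject_t makeSurface(cudaArray_t arr)
{
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;
    cudaSurfaceObject_t surf = 0;
    cudaCreateSurfaceObject(&surf, &resDesc);
    return surf;
}

// Hypothetical helper: h_out[i] = 2 * h_in[i], computed through surfaces.
void doubleThroughSurface(const float *h_in, float *h_out, int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t inArr, outArr;
    // Arrays bound to surfaces must be created with cudaArraySurfaceLoadStore.
    cudaMallocArray(&inArr, &desc, width, height, cudaArraySurfaceLoadStore);
    cudaMallocArray(&outArr, &desc, width, height, cudaArraySurfaceLoadStore);

    size_t rowBytes = width * sizeof(float);
    cudaMemcpy2DToArray(inArr, 0, 0, h_in, rowBytes, rowBytes, height,
                        cudaMemcpyHostToDevice);

    cudaSurfaceObject_t inSurf = makeSurface(inArr);
    cudaSurfaceObject_t outSurf = makeSurface(outArr);

    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    scaleKernel<<<blocks, threads>>>(inSurf, outSurf, width, height);

    cudaMemcpy2DFromArray(h_out, rowBytes, outArr, 0, 0, rowBytes, height,
                          cudaMemcpyDeviceToHost);

    cudaDestroySurfaceObject(inSurf);
    cudaDestroySurfaceObject(outSurf);
    cudaFreeArray(inArr);
    cudaFreeArray(outArr);
}

int main()
{
    const int width = 8, height = 4;
    float in[width * height], out[width * height];
    for (int i = 0; i < width * height; i++)
        in[i] = (float)i;

    doubleThroughSurface(in, out, width, height);

    int ok = 1;
    for (int i = 0; i < width * height; i++)
        if (out[i] != 2.0f * in[i]) ok = 0;
    printf("%s\n", ok ? "surface copy matches" : "mismatch");
    return 0;
}
```

A surface object needs no texture descriptor: there is no filtering, no read mode, and no normalized coordinates, only a boundary mode passed as an optional last argument to the surface functions.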