兰博随笔 | Using CUDA Textures and Surfaces

December 21, 2022, 5:22 PM, by rambo dong

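CUDA texture memory and surface memory are special forms of global memory: the data lives in device memory but is accessed through the texture cache, which is optimized for access patterns with two-dimensional spatial locality. Texture memory is read-only, while surface memory can be both read and written. Both support one-, two-, and three-dimensional data.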

Texture attributes:

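The read mode specifies whether a fetch returns a normalized value or the raw element. With cudaReadModeNormalizedFloat, which applies to 8-bit and 16-bit integer texels, the value is returned as a float in [0.0, 1.0] for unsigned types and in [-1.0, 1.0] for signed types. With cudaReadModeElementType, the raw element value is returned unchanged.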
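To make the read mode concrete, here is a minimal sketch that uploads a few bytes as an n x 1 texture and reads them back with cudaReadModeNormalizedFloat. The names `readNormalized` and `fetchNormalized` are hypothetical helpers chosen for this illustration, and error handling is reduced to the bare minimum.

```cuda
#include <cuda_runtime.h>

// Each thread fetches one texel. With cudaReadModeNormalizedFloat the
// 8-bit unsigned value arrives as a float in [0.0, 1.0].
__global__ void readNormalized(cudaTextureObject_t tex, float *out, int n)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x < n)
        out[x] = tex2D<float>(tex, x + 0.5f, 0.5f);  // +0.5 addresses the texel center
}

// Hypothetical helper: uploads n bytes as an n x 1 texture and reads them
// back as normalized floats.
cudaError_t fetchNormalized(const unsigned char *h_src, float *h_out, int n)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr = nullptr;
    cudaError_t err = cudaMallocArray(&arr, &desc, n, 1);
    if (err != cudaSuccess) return err;
    cudaMemcpy2DToArray(arr, 0, 0, h_src, n, n, 1, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;        // no interpolation
    texDesc.readMode = cudaReadModeNormalizedFloat;  // integer -> [0.0, 1.0]
    texDesc.normalizedCoords = 0;                    // coordinates in texels

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));
    readNormalized<<<(n + 127) / 128, 128>>>(tex, d_out, n);
    err = cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    return err;
}
```

Calling `fetchNormalized` on the bytes 0, 51, and 255 should yield values close to 0.0, 0.2, and 1.0, since each byte is divided by 255. Note that the kernel asks for `float` even though the array stores `unsigned char`.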

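The addressing mode specifies what a fetch returns when the coordinate falls outside the valid range. There are four modes (wrap and mirror are only supported with normalized coordinates):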

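1. cudaAddressModeBorder -> out-of-range fetches return zero, e.g. 00|ABCDE|00;

2. cudaAddressModeClamp -> out-of-range fetches return the nearest edge value, e.g. AA|ABCDE|EE;

3. cudaAddressModeWrap -> out-of-range coordinates wrap around periodically, e.g. DE|ABCDE|AB;

4. cudaAddressModeMirror -> out-of-range coordinates are reflected at the edges, e.g. BA|ABCDE|ED.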
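The four modes can be sketched as plain index arithmetic. The helpers below, `resolveIndex` and `fetchChar`, are hypothetical and not part of CUDA; they emulate at integer-texel granularity what the hardware does with floating-point coordinates, under the descriptions given above: border yields zero, clamp repeats the edge, wrap repeats the row, and mirror reflects it.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper, not a CUDA API: maps an integer texel index i onto a
// row of n texels. Returns -1 in border mode to signal "the fetch yields zero".
__host__ __device__ inline int resolveIndex(int i, int n, cudaTextureAddressMode mode)
{
    switch (mode) {
    case cudaAddressModeBorder:
        return (i < 0 || i >= n) ? -1 : i;
    case cudaAddressModeClamp:
        return i < 0 ? 0 : (i >= n ? n - 1 : i);
    case cudaAddressModeWrap:
        return ((i % n) + n) % n;                   // period n
    case cudaAddressModeMirror: {
        int m = ((i % (2 * n)) + 2 * n) % (2 * n);  // period 2n
        return m < n ? m : 2 * n - 1 - m;           // second half runs backwards
    }
    }
    return -1;
}

// Fetches one character from a row, printing '0' for border-mode zeros.
__host__ __device__ inline char fetchChar(const char *row, int n, int i,
                                          cudaTextureAddressMode mode)
{
    int j = resolveIndex(i, n, mode);
    return j < 0 ? '0' : row[j];
}
```

For the row ABCDE, evaluating `fetchChar` at indices -2 through 6 reproduces patterns of the form xx|ABCDE|xx for each mode, for instance DE|ABCDE|AB with cudaAddressModeWrap.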

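The filtering mode specifies how the returned value is computed from the texels: cudaFilterModePoint returns the nearest texel, and cudaFilterModeLinear interpolates linearly between neighboring texels. Linear filtering is only available when the fetch returns floating-point data.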
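The difference between the two filtering modes can be sketched for a 1D fetch with unnormalized coordinates. `sample1D` below is a hypothetical emulation, not a CUDA API. It assumes clamp addressing and follows the formula in the CUDA programming guide, where texel centers sit at half-integer coordinates. It uses full float precision for the interpolation weight, whereas the hardware stores that weight in a low-precision fixed-point format, so real fetches can differ slightly.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical helper: clamps an index into [0, n - 1].
__host__ __device__ inline int clampIndex(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Hypothetical emulation of a 1D texture fetch at unnormalized coordinate x.
__host__ __device__ inline float sample1D(const float *texels, int n, float x,
                                          cudaTextureFilterMode mode)
{
    if (mode == cudaFilterModePoint) {
        // Point filtering: pick the texel whose interval [i, i + 1) contains x.
        return texels[clampIndex((int)floorf(x), n)];
    }
    // Linear filtering: shift by half a texel so that centers land on integers,
    // then blend the two neighbors with weight alpha.
    float xb = x - 0.5f;
    float base = floorf(xb);
    float alpha = xb - base;
    int i = (int)base;
    return (1.0f - alpha) * texels[clampIndex(i, n)]
         + alpha * texels[clampIndex(i + 1, n)];
}
```

With texels {0, 10, 20}, sampling at x = 1.0 lies halfway between the centers of texels 0 and 1, so linear filtering gives 5, while point filtering gives 10 because x = 1.0 falls inside texel 1.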

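Data types used:

cudaArray: a CUDA array. Texture and surface memory is usually backed by a CUDA array, which may also be a layered array or a cubemap array.

cudaResourceDesc: the CUDA resource descriptor, which describes the type of the underlying resource (for example, a CUDA array or linear memory).

cudaTextureDesc: the CUDA texture descriptor, which describes the texture attributes listed above.

cudaTextureObject_t: a texture object.

cudaSurfaceObject_t: a surface object.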

The following example uses a texture to sample an image through coordinates rotated about the image center:

#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuda.h>

__global__ void transformKernel(
    float *output, 
    cudaTextureObject_t texObj,
    int width, int height, float theta)
{
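    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Skip threads that fall outside the image
    if (x >= (unsigned int)width || y >= (unsigned int)height)
        return;

    // Normalized texture coordinates in [0, 1)
    float u = x / (float)width;
    float v = y / (float)height;

    // Move the origin to the image center
    u -= 0.5f;
    v -= 0.5f;

    // Rotate by theta, then move the origin back
    float tu = u * cosf(theta) - v * sinf(theta) + 0.5f;
    float tv = v * cosf(theta) + u * sinf(theta) + 0.5f;

    // Fetch from the texture and write to global memory
    output[y * width + x] = tex2D<float>(texObj, tu, tv);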
}

int main()
{
    const int width = 1024;
    const int height = 1024;
    float angle = 0.5f;  // rotation angle in radians

    // Allocate and initialize host memory
    float *h_data = (float*)malloc(sizeof(float) * width * height);
    for(int i=0; i<width*height; i++)
        h_data[i] = (float)i;

    // Allocate a CUDA array with a single 32-bit float channel
    cudaChannelFormatDesc channelDesc = 
        cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
    cudaArray_t cuArray;
    cudaMallocArray(&cuArray, &channelDesc, width, height);

    // Copy the host array to the CUDA array
    const size_t pitch = width * sizeof(float);
    cudaMemcpy2DToArray(cuArray, 0, 0, h_data, pitch, 
        width * sizeof(float), height, cudaMemcpyHostToDevice);

    // Describe the resource: a CUDA array
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cuArray;

    // Set the texture attributes
    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeWrap;
    texDesc.addressMode[1] = cudaAddressModeWrap;
    texDesc.filterMode = cudaFilterModeLinear;
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 1;  // wrap mode requires normalized coordinates

    // Create the texture object
    cudaTextureObject_t texObj = 0;
    cudaCreateTextureObject(&texObj, &resDesc, &texDesc, NULL);

    // Allocate device memory for the output
    float *output;
    cudaMalloc(&output, width * height * sizeof(float));

    // Launch the kernel
    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    transformKernel<<<blocks, threads>>>(output, texObj, width, height, angle);

    // Copy the result back to the host
    cudaMemcpy(h_data, output, width * height * sizeof(float), cudaMemcpyDeviceToHost);

    // Release the texture object and memory
    cudaDestroyTextureObject(texObj);
    cudaFreeArray(cuArray);
    cudaFree(output);

    free(h_data);

    return 0;
}
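The example above only reads through a texture. For the read-write case, a surface object can be sketched along the same lines. In the sketch below, `scaleKernel` and `doubleViaSurface` are hypothetical names chosen for this illustration: the kernel reads each float through one surface, doubles it, and writes it through a second surface. It assumes the CUDA arrays are allocated with the cudaArraySurfaceLoadStore flag, and error handling is again kept minimal.

```cuda
#include <cuda_runtime.h>

// Reads each element through one surface, doubles it, and writes it through another.
__global__ void scaleKernel(cudaSurfaceObject_t inSurf, cudaSurfaceObject_t outSurf,
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float v;
        // Unlike texture coordinates, the surface x coordinate is a byte offset.
        surf2Dread(&v, inSurf, x * (int)sizeof(float), y);
        surf2Dwrite(2.0f * v, outSurf, x * (int)sizeof(float), y);
    }
}

// Hypothetical helper: doubles a width x height float image via surfaces.
cudaError_t doubleViaSurface(const float *h_in, float *h_out, int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t inArr = nullptr, outArr = nullptr;
    // Arrays bound to surfaces need the surface load/store flag.
    cudaError_t err = cudaMallocArray(&inArr, &desc, width, height, cudaArraySurfaceLoadStore);
    if (err != cudaSuccess) return err;
    err = cudaMallocArray(&outArr, &desc, width, height, cudaArraySurfaceLoadStore);
    if (err != cudaSuccess) { cudaFreeArray(inArr); return err; }

    const size_t rowBytes = width * sizeof(float);
    cudaMemcpy2DToArray(inArr, 0, 0, h_in, rowBytes, rowBytes, height,
                        cudaMemcpyHostToDevice);

    // A surface needs only a resource descriptor; there is no surface
    // counterpart to cudaTextureDesc.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    cudaSurfaceObject_t inSurf = 0, outSurf = 0;
    resDesc.res.array.array = inArr;
    cudaCreateSurfaceObject(&inSurf, &resDesc);
    resDesc.res.array.array = outArr;
    cudaCreateSurfaceObject(&outSurf, &resDesc);

    dim3 threads(16, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);
    scaleKernel<<<blocks, threads>>>(inSurf, outSurf, width, height);

    err = cudaMemcpy2DFromArray(h_out, rowBytes, outArr, 0, 0, rowBytes, height,
                                cudaMemcpyDeviceToHost);

    cudaDestroySurfaceObject(inSurf);
    cudaDestroySurfaceObject(outSurf);
    cudaFreeArray(inArr);
    cudaFreeArray(outArr);
    return err;
}
```

Two things differ from the texture path: surface fetches involve no filtering, addressing modes, or normalized coordinates, and the x coordinate is given in bytes, which is why it is multiplied by sizeof(float).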
