__CUDA_ARCH__
This is a macro defined by NVCC.
Section 5.7.4, Virtual Architecture Macros, of the NVCC documentation describes it:
The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.
This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
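The rule in the paragraph above can be sketched as follows. This is only a minimal illustration, not code from the NVCC documentation: the helper names compiled_arch, has_fast_path and host_usage_example are hypothetical, the threshold 800 is an arbitrary example, and the empty definitions of __host__ and __device__ are an assumption made so that the file also builds with a plain C++ compiler.

```cpp
#include <cassert>

// Assumption for this sketch: when the file is not compiled by nvcc, the CUDA
// qualifiers are defined away so a plain C++ compiler accepts it too.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Returns the virtual architecture of the current compilation pass,
// or 0 in the host pass, where __CUDA_ARCH__ is not defined.
__host__ __device__ inline int compiled_arch()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // e.g. 800 in the pass that compiles for compute_80
#else
    return 0;               // host pass: the macro does not exist here
#endif
}

// Hypothetical helper: select a code path by virtual architecture.
__host__ __device__ inline bool has_fast_path()
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    return true;            // only in device passes for compute_80 or newer
#else
    return false;           // host pass and older device passes
#endif
}

// Host-side usage: the host pass always sees the fallback values.
inline void host_usage_example()
{
    assert(compiled_arch() == 0);
    assert(!has_fast_path());
}
```

Only the function bodies depend on the macro here; the function signatures stay the same in every pass, so host code never needs to know the macro's value.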
The architecture list macro __CUDA_ARCH_LIST__ is a list of comma-separated __CUDA_ARCH__ values for each of the virtual architectures specified in the compiler invocation. The list is sorted in numerically ascending order.
The macro __CUDA_ARCH_LIST__ is defined when compiling C, C++ and CUDA source files.
The macro is defined only at compile time, so a code editor cannot show its value. Do not try to define it yourself either.
For example, the following nvcc compilation command line will define __CUDA_ARCH_LIST__ as 500,530,800:
nvcc x.cu \
--generate-code arch=compute_80,code=sm_80 \
--generate-code arch=compute_50,code=sm_52 \
--generate-code arch=compute_50,code=sm_50 \
--generate-code arch=compute_53,code=sm_53
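Because __CUDA_ARCH_LIST__ expands to a plain comma-separated list, it can be used to initialize a container in host code. The following is a minimal sketch under stated assumptions: ARCH_LIST_VALUES, target_archs, is_targeted and arch_list_usage_example are hypothetical names, and the fallback value 500,530,800 is only a stand-in copied from the example above so that the file also builds without nvcc.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Use the real list when nvcc provides it. The fallback is an assumption for
// this sketch only; it goes through a separate macro name so that
// __CUDA_ARCH_LIST__ itself is never defined by hand.
#ifdef __CUDA_ARCH_LIST__
#define ARCH_LIST_VALUES __CUDA_ARCH_LIST__
#else
#define ARCH_LIST_VALUES 500,530,800
#endif

// Returns the virtual architectures this build targets, in ascending order.
inline std::vector<int> target_archs()
{
    return std::vector<int>{ARCH_LIST_VALUES};
}

// Returns true if the given xy0 value is one of the targeted architectures.
inline bool is_targeted(int arch_value)
{
    const std::vector<int> archs = target_archs();
    return std::find(archs.begin(), archs.end(), arch_value) != archs.end();
}

// Usage: the list is never empty and is sorted, as the documentation states.
inline void arch_list_usage_example()
{
    const std::vector<int> archs = target_archs();
    assert(!archs.empty());
    assert(std::is_sorted(archs.begin(), archs.end()));
}
```

With the command line above, target_archs() would hold 500, 530 and 800, and is_targeted(520) would be false because compute_50 appears only once in the list even though it is built for both sm_50 and sm_52.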
The architecture is set with the nvcc option -arch.
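In Visual Studio, if several Code Generation entries are set (that is, several -gencode=arch=compute_xx,code=sm_xx options), the highest architecture version appears to take effect. More precisely, nvcc compiles the device code once per virtual architecture, each pass with its own __CUDA_ARCH__ value, and the version that matches the GPU is picked at run time.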
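For more technical details, see the NVIDIA CUDA Compiler Driver NVCC documentation.
For usage of the macro, see the CUDA 12.0 documentation, section 14.5.2.1, __CUDA_ARCH__. Older versions number the section differently: in the CUDA 8.0 documentation it is section E.3.2.1, __CUDA_ARCH__.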
If you want to print the __CUDA_ARCH__ macro to see its value, you can do it like this:
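#include <stdio.h>

// Each thread prints the virtual architecture this device code was compiled for.
__global__ void Mykernel()
{
    printf("%d\n", __CUDA_ARCH__);
}

int main()
{
    Mykernel<<<1, 5>>>();      // 1 block of 5 threads, so the value is printed 5 times
    cudaDeviceSynchronize();   // wait for the kernel so its printf output is flushed
    return 0;
}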
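The printed number follows the xy0 pattern described earlier, so it can be split back into the compute_xy digits with integer arithmetic. A minimal sketch, assuming a value such as 530 or 800; ComputeCapability, decode_arch and decode_usage_example are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper type for this sketch: the x and y of compute_xy.
struct ComputeCapability {
    int major_version;
    int minor_version;
};

// Splits a __CUDA_ARCH__-style value xy0 into x and y.
constexpr ComputeCapability decode_arch(int arch_value)
{
    // Integer division drops the lower digits; % 10 keeps the tens digit.
    return ComputeCapability{arch_value / 100, (arch_value / 10) % 10};
}

// Usage: 530 corresponds to compute_53, 800 to compute_80.
inline void decode_usage_example()
{
    assert(decode_arch(530).major_version == 5);
    assert(decode_arch(530).minor_version == 3);
    assert(decode_arch(800).major_version == 8);
    assert(decode_arch(800).minor_version == 0);
}
```

The function is plain host code and takes the value as an argument, so it does not make host code depend on the macro itself.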