ROCm is AMD's open-source development platform, analogous to NVIDIA CUDA.
Architecture changes (todo)
Comparison with CUDA
CUDA | ROCm | Description |
---|---|---|
SM | Compute Unit, CU | One of many parallel vector processors in a GPU that contain parallel ALUs. All waves in a workgroup are assigned to the same CU. |
Kernel | Kernel | Functions launched to the GPU that are executed by multiple parallel workers on the GPU. Kernels can run in parallel with the CPU. |
Warp | Wavefront | Collection of operations that execute in lockstep, run the same instructions, and follow the same control-flow path. Individual lanes can be masked off. Think of this as a vector thread. A 64-wide wavefront is a 64-wide vector op. |
Thread Block | Workgroup | Group of wavefronts that are on the GPU at the same time. Can synchronize together and communicate through local memory. |
Thread | Work Item / Thread | Individual lane in a wavefront. On AMD GPUs, must run in lockstep with other work items in the wavefront. Lanes can be individually masked off. GPU programming models can treat this as a separate thread of execution, though you do not necessarily get forward sub-wavefront progress. |
SM sub-partition | SIMD | Each SM has 4 sub-partitions and each GCN CU has 4 SIMD units. |
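
To make the Workgroup / Wavefront / Work Item rows concrete, here is a minimal Python sketch of how a workgroup splits into wavefronts. It assumes a 64-wide wavefront (GCN-style) and assumes work items are assigned to wavefronts in order of their flat ID inside the workgroup; the function names are hypothetical and are not part of any ROCm API.

```python
import math

# Assumption: GCN-style wave64. CUDA warps are 32 lanes wide.
WAVEFRONT_SIZE = 64


def wavefronts_per_workgroup(workgroup_size, wave_size=WAVEFRONT_SIZE):
    """Number of wavefronts needed to hold one workgroup.

    If workgroup_size is not a multiple of wave_size, the last
    wavefront runs with some of its lanes masked off.
    """
    return math.ceil(workgroup_size / wave_size)


def locate_work_item(local_id, wave_size=WAVEFRONT_SIZE):
    """Split a flat work-item ID into (wavefront index, lane index)."""
    return divmod(local_id, wave_size)


if __name__ == "__main__":
    print(wavefronts_per_workgroup(256))      # 4 wavefronts of 64 lanes
    print(wavefronts_per_workgroup(256, 32))  # 8 warps of 32 lanes
    print(locate_work_item(130))              # (2, 2)
```

The same 256-item workgroup therefore occupies half as many scheduling units on a wave64 device as it does warps on a CUDA device, which matters when porting code that hard-codes a width of 32.
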
ROCm does not currently support managed memory.
Scalar Unit && Scalar Registers (todo) https://www.youtube.com/watch?v=uu-3aEyesWQ&list=PLx15eYqzJifehAxhWRD6T35GZwAqM9IK4&index=5&t=332s
AMD ROCm Profiler
https://rocmdocs.amd.com/en/latest/ROCm_Tools/ROCm-Tools.html
Similar to NVIDIA's ncu, but it exposes far fewer hardware counters than ncu. The public counters are listed in:
https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/test/tool/metrics.xml
https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/test/tool/gfx_metrics.xml
The counters to collect must be specified in an input file.
Official quick-start tutorials:
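
As a rough illustration of such an input file, the Python sketch below writes one `pmc :` line per counter group, which is the format I understand `rocprof -i` to accept. The counter names (`Wavefronts`, `VALUInsts`, `SALUInsts`, `FetchSize`, `WriteSize`) are assumed to be available on the target GPU; check them against the metrics files linked above. The helper name `build_rocprof_input` and the application `./my_app` are hypothetical.

```python
def build_rocprof_input(counter_groups):
    """Render one 'pmc :' line per group of counters.

    Each group is collected together; splitting counters across
    several groups is a common way to work around hardware limits
    on how many can be sampled at once.
    """
    lines = ["pmc : " + " ".join(group) for group in counter_groups]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: these counter names exist for the target GPU.
    groups = [
        ["Wavefronts", "VALUInsts", "SALUInsts"],
        ["FetchSize", "WriteSize"],
    ]
    with open("input.txt", "w") as f:
        f.write(build_rocprof_input(groups))
    # Then, on a machine with ROCm installed:
    #   rocprof -i input.txt ./my_app
```

Writing the file by hand works just as well; the script form is only convenient when sweeping over many counter groups.
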
https://developer.amd.com/resources/rocm-resources/rocm-learning-center/
https://www.youtube.com/watch?v=hSwgh-BXx3E&list=PLx15eYqzJifehAxhWRD6T35GZwAqM9IK4
Official documentation:
https://rocmdocs.amd.com/en/latest/
RDNA white paper:
https://www.amd.com/system/files/documents/rdna-whitepaper.pdf