PyTorch 1.10 with DTK 22.04 Installation Guide

1) Directory containing the local whl files

/public/software/apps/DeepLearning/whl/dtk-22.04

2) Create a conda environment (Python 3.7 is used as the example)

conda create -n pytorch_1.10-dtk_22.04 python=3.7

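3) Install PyTorch 1.10 in the conda environment (using the Python 3.7 build of PyTorch 1.10 as the example). Note that the wheel path below uses the directory dtk-22.04.1, whereas step 1 lists dtk-22.04; check which directory exists on your system before running the command.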

conda activate pytorch_1.10-dtk_22.04

pip install /public/software/apps/DeepLearning/whl/dtk-22.04.1/torch-1.10.0a0+git450cdd1.dtk22.4-cp37-cp37m-linux_x86_64.whl
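The wheel filename encodes which Python it was built for (cp37 here), and pip refuses a wheel whose tag does not match the active interpreter. The following is a minimal sketch, not part of the original guide, that splits a wheel filename into its tags and compares the Python tag with an interpreter version. The helper names wheel_tags and matches_interpreter are hypothetical, and the check only handles CPython-style tags such as cp37.

```python
import sys

# Wheel filename taken from the pip command in step 3.
WHEEL = "torch-1.10.0a0+git450cdd1.dtk22.4-cp37-cp37m-linux_x86_64.whl"


def wheel_tags(filename):
    """Split a wheel filename into (distribution, version, python, abi, platform)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # A wheel name has 5 fields, or 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return parts[0], parts[1], python_tag, abi_tag, platform_tag


def matches_interpreter(filename, version_info=None):
    """Return True if the wheel's Python tag matches the given (major, minor)."""
    if version_info is None:
        version_info = sys.version_info
    expected = "cp{}{}".format(version_info[0], version_info[1])
    # A Python tag may list several interpreters separated by dots.
    return expected in wheel_tags(filename)[2].split(".")


if __name__ == "__main__":
    print(wheel_tags(WHEEL))
    print("matches this interpreter:", matches_interpreter(WHEEL))
```

Running this inside the activated conda environment before calling pip gives an early hint if the environment was created with a different Python version than the wheel expects.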

4) Install the dependencies

pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple

5) Verify the installation (check whether the DCU can be accessed)

List the available queues (partitions):

whichpartition

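Request a node, then log in to the compute node to run the test. Replace <queue_name> with one of the queues listed above.

salloc -p <queue_name> -N 1 --gres=dcu:2

Log in to the node (use the name of the node that was allocated to you)

ssh <node_name>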

Switch the ROCm compiler version (load DTK 22.04)

module switch compiler/dtk/22.04.1

6) Create a file named pytorch_env.sh in your home directory and add the environment variables to it

vi ~/pytorch_env.sh

Add the following line to the file (it must be a single line):

export LD_LIBRARY_PATH=/public/software/apps/DeepLearning/PyTorch_Lib/lib:/public/software/apps/DeepLearning/PyTorch_Lib/lmdb-0.9.24-build/lib:/public/software/apps/DeepLearning/PyTorch_Lib/opencv-2.4.13.6-build/lib:/public/software/apps/DeepLearning/PyTorch_Lib/openblas-0.3.7-build/lib:$LD_LIBRARY_PATH

Save the file, then load it into the current shell:

source ~/pytorch_env.sh
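A mistyped entry in LD_LIBRARY_PATH is silently ignored by the loader and only surfaces later as a missing shared library when importing torch. The sketch below, which is not part of the original guide, lists the entries of LD_LIBRARY_PATH that do not exist as directories on the current node. The function name missing_library_dirs is a hypothetical helper chosen for illustration.

```python
import os


def missing_library_dirs(ld_library_path=None):
    """Return the LD_LIBRARY_PATH entries that are not existing directories."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Entries are separated by ':'; empty entries are skipped.
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if not os.path.isdir(p)]


if __name__ == "__main__":
    missing = missing_library_dirs()
    if missing:
        for path in missing:
            print("missing:", path)
    else:
        print("all LD_LIBRARY_PATH entries exist")
```

Run it on the compute node after sourcing pytorch_env.sh, since the login node and the compute node may not see the same file systems.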

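Activate the pytorch_1.10-dtk_22.04 environment (logging in to the compute node leaves the previously activated environment, so it has to be activated again)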

conda activate pytorch_1.10-dtk_22.04

Inside the environment, start Python and run the following statements in order:

python

import torch
torch.cuda.is_available()
torch.__version__
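The same check can be run non-interactively, which is convenient inside a batch job. The following is a minimal sketch under the assumption that this DTK build exposes DCU devices through the torch.cuda interface, as ROCm-based PyTorch builds generally do; in that case torch.cuda.is_available() is expected to return True on a node with an allocated DCU. The function name check_torch and the report keys are hypothetical.

```python
import importlib


def check_torch():
    """Collect basic facts about the torch installation without raising."""
    report = {
        "installed": False,
        "version": None,
        "device_available": False,
        "device_count": 0,
        "error": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # Raised when torch is absent or one of its shared libraries cannot be loaded.
        report["error"] = str(exc)
        return report
    report["installed"] = True
    report["version"] = torch.__version__
    report["device_available"] = bool(torch.cuda.is_available())
    if report["device_available"]:
        report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in check_torch().items():
        print("{}: {}".format(key, value))
```

If installed is False and the error mentions a shared library, the environment variables from step 6 are the first thing to recheck.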
