Install dependencies and PyTorch

conda create -n pt python=3.8
conda activate pt
pip install requests bs4 argparse oauthlib pyyaml
# Install the nightly build of PyTorch. Look up the CUDA version that matches your system at https://pytorch.org/get-started/locally/
pip install --pre torch torchvision torchtext torchaudio -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html
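To confirm that the nightly packages actually landed in the active environment, a quick check along these lines may help. This is a minimal sketch that relies only on the standard library's importlib.metadata. The helper name installed_versions is hypothetical, and the package list simply mirrors the pip command above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names mirror the pip install command above.
    wanted = ["torch", "torchvision", "torchtext", "torchaudio"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Because the versions are read from package metadata, nothing is imported, so the check stays fast even when no GPU is available.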

Install TorchBench

conda install git-lfs
git clone git@github.com:pytorch/benchmark.git
cd benchmark
git lfs install ; git lfs fetch --all ; git lfs checkout .
# install all models into the conda environment
python install.py

Update model

If some models have been updated, for example the detectron2 models, reinstall them:

pip uninstall detectron2
python install.py detectron2_fasterrcnn_r_50_fpn

Run

python3 run.py -d cuda -t train detectron2_fasterrcnn_r_50_fpn

Profile

python3 run.py -d cuda -t train --profile --profile-detailed --profile-devices cpu,gpu detectron2_fasterrcnn_r_50_fpn
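When running or profiling several models, it can be convenient to assemble these commands programmatically. The sketch below only builds the argument list, using the flags that appear in the two commands above. build_run_command is a hypothetical helper, not part of TorchBench.

```python
import shlex


def build_run_command(model, device="cuda", test="train", profile=False,
                      profile_devices=("cpu", "gpu")):
    """Build the argument list for run.py, using the flags shown above."""
    cmd = ["python3", "run.py", "-d", device, "-t", test]
    if profile:
        cmd += ["--profile", "--profile-detailed",
                "--profile-devices", ",".join(profile_devices)]
    cmd.append(model)
    return cmd


if __name__ == "__main__":
    cmd = build_run_command("detectron2_fasterrcnn_r_50_fpn", profile=True)
    # Only print the command here. To execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) from inside the benchmark directory.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list avoids shell quoting problems if it is later handed to subprocess.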

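By default, the profile reports are written to the ./logs/ directory under the current directory. Note that, because of the PyTorch profiler's restriction on report file names, each file name must end with .pt.trace.json.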
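Before copying the directory anywhere, it may be worth checking that every report follows the .pt.trace.json naming rule. This is a minimal sketch, assuming the reports sit somewhere under ./logs. split_trace_files is a hypothetical helper name.

```python
from pathlib import Path

TRACE_SUFFIX = ".pt.trace.json"


def split_trace_files(log_dir):
    """Split the files under log_dir into (valid, other) by the required suffix."""
    valid, other = [], []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith(TRACE_SUFFIX):
            valid.append(path)
        else:
            other.append(path)
    return valid, other


if __name__ == "__main__":
    valid, other = split_trace_files("./logs")
    print(f"{len(valid)} report(s) with the expected suffix")
    for path in other:
        print(f"unexpected file name: {path}")
```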

Copy the logs directory to your local machine and, in your local Python environment, run:

pip install torch-tb-profiler tensorboard

Then, from the directory that contains the logs directory, run:

tensorboard --logdir ./logs

TensorBoard automatically parses all reports under the logs directory and starts a local HTTP server, by default at http://127.0.0.1:6006. Open that address in a browser to view the generated profile reports.

Alternatively, in VS Code, press Ctrl+Shift+P, search for TensorBoard, and launch it from there.

Conda Tips

# clone env 
conda create --name opt --clone pt
# remove env
conda env remove -n pt
# upgrade to the latest nightly torch (keep the CUDA tag consistent with the original install, cu116 here)
pip install --pre torch torchvision torchtext -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html --upgrade

Disable TF32 on NVIDIA Tensor Cores

export NVIDIA_TF32_OVERRIDE=0
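The same override can be applied per process when a benchmark is launched from a script rather than from an interactive shell. This is a minimal sketch, assuming the child process is started with subprocess. env_with_tf32_disabled is a hypothetical helper name.

```python
import os


def env_with_tf32_disabled(base_env=None):
    """Return a copy of the environment with NVIDIA_TF32_OVERRIDE set to 0."""
    env = dict(os.environ if base_env is None else base_env)
    env["NVIDIA_TF32_OVERRIDE"] = "0"
    return env


if __name__ == "__main__":
    env = env_with_tf32_disabled()
    # Hand `env` to a child process so only that run is affected, e.g.
    #   subprocess.run(["python3", "run.py", "-d", "cuda", "-t", "train",
    #                   "detectron2_fasterrcnn_r_50_fpn"], env=env, check=True)
    print("NVIDIA_TF32_OVERRIDE =", env["NVIDIA_TF32_OVERRIDE"])
```

Copying the environment instead of mutating os.environ keeps the setting from leaking into later runs in the same script.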

Install PyTorch from manually downloaded wheel files

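In the nightly builds, torchvision and torchtext do not always depend on the same torch version. For example, torchtext-0.14.0.dev20221025-cp38-cp38-linux_x86_64.whl depends on torch 1.14.0.dev20221025, while torchvision-0.15.0.dev20221025+cu116-cp38-cp38-linux_x86_64.whl depends on torch 1.14.0.dev20221024+cu116. You therefore cannot pick a working combination of packages by build date alone.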

The pkginfo package can report which dependency versions a given wheel file requires.

pip install pkginfo
pkginfo -f requires_dist torchtext-0.14.0.dev20221025-cp38-cp38-linux_x86_64.whl
# output:
requires_dist: ['tqdm', 'requests', 'torch (==1.14.0.dev20221025)', 'numpy']
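The same information can also be read without an extra package, since a wheel is a zip archive whose *.dist-info/METADATA file lists dependencies in Requires-Dist headers. The following is a minimal sketch using only the standard library. wheel_requires_dist is a hypothetical helper, and the raw header strings may be formatted slightly differently from the pkginfo output.

```python
import os
import zipfile
from email.parser import Parser


def wheel_requires_dist(wheel_path):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_name = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses email-style headers, so the email parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


if __name__ == "__main__":
    # File name taken from the pkginfo example above.
    wheel_path = "torchtext-0.14.0.dev20221025-cp38-cp38-linux_x86_64.whl"
    if os.path.exists(wheel_path):
        print(wheel_requires_dist(wheel_path))
```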

Use the pkginfo output to work out a consistent combination of nightly versions.
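Once the requires_dist lists are collected, the comparison itself can be automated. This is a minimal sketch, assuming each list looks like the pkginfo output above. torch_pin and pins_agree are hypothetical helpers, and the local build suffix such as +cu116 is deliberately ignored when comparing. The torchvision entry in the demo is built from the version quoted earlier, with its other dependencies omitted.

```python
import re

# Matches entries such as 'torch (==1.14.0.dev20221025)'
# or 'torch==1.14.0.dev20221024+cu116', but not 'torchvision ...'.
_TORCH_PIN = re.compile(r"^torch\s*\(?\s*==\s*([^\s;)]+)")


def torch_pin(requires_dist):
    """Return the pinned torch version without its local suffix, or None."""
    for entry in requires_dist:
        match = _TORCH_PIN.match(entry.strip())
        if match:
            return match.group(1).split("+")[0]
    return None


def pins_agree(requirements):
    """Return True if every package that pins torch pins the same version."""
    pins = {torch_pin(reqs) for reqs in requirements.values()}
    pins.discard(None)  # packages without a torch pin do not constrain anything
    return len(pins) <= 1


if __name__ == "__main__":
    requirements = {
        "torchtext": ["tqdm", "requests", "torch (==1.14.0.dev20221025)", "numpy"],
        "torchvision": ["torch (==1.14.0.dev20221024+cu116)"],  # abbreviated
    }
    for name, reqs in requirements.items():
        print(name, "->", torch_pin(reqs))
    print("consistent:", pins_agree(requirements))
```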

The 20221026 builds of torchtext, torchvision, torchdata, and torchaudio all depend on the same PyTorch version. The download links are:

https://download.pytorch.org/whl/nightly/torchtext-0.14.0.dev20221026-cp38-cp38-linux_x86_64.whl

https://download.pytorch.org/whl/nightly/cu116/torchvision-0.15.0.dev20221026%2Bcu116-cp38-cp38-linux_x86_64.whl

https://download.pytorch.org/whl/nightly/cu116/torch-1.14.0.dev20221026%2Bcu116-cp38-cp38-linux_x86_64.whl

https://download.pytorch.org/whl/nightly/torchdata-0.6.0.dev20221026-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

https://download.pytorch.org/whl/nightly/cu116/torchaudio-0.14.0.dev20221026%2Bcu116-cp38-cp38-linux_x86_64.whl
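For other packages or dates, links of this shape can be generated rather than typed by hand. This is a minimal sketch, assuming the URL pattern inferred from the five links above also holds elsewhere, which is not guaranteed. nightly_wheel_url is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://download.pytorch.org/whl/nightly"


def nightly_wheel_url(name, version, cuda=None, python_tag="cp38",
                      platform="linux_x86_64"):
    """Build a nightly wheel URL following the pattern of the links above."""
    if cuda:
        # CUDA builds sit in a per-CUDA folder and carry a '+cuXXX' suffix,
        # which quote() percent-encodes as '%2B'.
        filename = f"{name}-{version}+{cuda}-{python_tag}-{python_tag}-{platform}.whl"
        return f"{BASE_URL}/{cuda}/{quote(filename)}"
    filename = f"{name}-{version}-{python_tag}-{python_tag}-{platform}.whl"
    return f"{BASE_URL}/{quote(filename)}"


if __name__ == "__main__":
    date = "20221026"
    print(nightly_wheel_url("torch", f"1.14.0.dev{date}", cuda="cu116"))
    print(nightly_wheel_url("torchtext", f"0.14.0.dev{date}"))
    print(nightly_wheel_url(
        "torchdata", f"0.6.0.dev{date}",
        platform="manylinux_2_17_x86_64.manylinux2014_x86_64",
    ))
```

Note that torchdata uses a different platform tag from the other packages, so the platform is left as a parameter.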

The TensorBoard version that matches the 20221026 builds should be 2.10.0.

Download script: https://github.com/FindHao/ml_scripts/blob/main/download.sh

Install script: https://github.com/FindHao/ml_scripts/blob/main/install.sh

CUDA install script: https://github.com/FindHao/ml_scripts/blob/main/install_cuda.sh

Reference

https://aiqm.github.io/torchani/start.html

https://stackoverflow.com/questions/50170588/list-dependencies-of-python-wheel-file


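This article is copyrighted by FindHao. By default, this site is licensed under CC-BY-NC-SA 4.0.
Reposts must include this notice and credit the author, FindHao, with a hyperlink to the original article:
https://findhao.net/easycoding/2598.html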
