Install dependencies and pytorch
conda create -n pt python=3.8
conda activate pt
pip install requests bs4 argparse oauthlib pyyaml
# Install the nightly build of PyTorch. Find the CUDA version compatible with your system at https://pytorch.org/get-started/locally/
pip install --pre torch torchvision torchtext torchaudio -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html
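After the install, it can help to confirm that the wheel pip selected carries the same CUDA tag as the index URL. The sketch below is one way to check this. The helper names build_tag and check_torch_build are made up for these notes, and the check assumes the nightly version string has a local suffix such as +cu116.

```shell
# Hypothetical helpers, not part of PyTorch or TorchBench.

# Print the local build tag of a version string,
# e.g. "1.14.0.dev20221026+cu116" -> "cu116".
# Prints nothing if the version has no "+" suffix.
build_tag() {
    case "$1" in
        *+*) printf '%s\n' "${1##*+}" ;;
    esac
}

# Compare the installed torch build tag with the expected one (e.g. cu116)
# and report whether CUDA is usable from this environment.
check_torch_build() {
    expected="$1"
    version="$(python -c 'import torch; print(torch.__version__)')" || return 1
    tag="$(build_tag "$version")"
    if [ "$tag" != "$expected" ]; then
        echo "torch $version does not match expected tag $expected" >&2
        return 1
    fi
    python -c 'import torch; print("CUDA available:", torch.cuda.is_available())'
}
```

Inside the activated pt environment, this would be called as: check_torch_build cu116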
Install TorchBench
conda install git-lfs
git clone git@github.com:pytorch/benchmark.git
cd benchmark
git lfs install ; git lfs fetch --all ; git lfs checkout .
# install all models into the conda environment
python install.py
Update model
If a model has been updated, for example one of the detectron2 models, uninstall the old package and reinstall the model:
pip uninstall detectron2
python install.py detectron2_fasterrcnn_r_50_fpn
Run
python3 run.py -d cuda -t train detectron2_fasterrcnn_r_50_fpn
Profile
python3 run.py -d cuda -t train --profile --profile-detailed --profile-devices cpu,gpu detectron2_fasterrcnn_r_50_fpn
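By default, the profile reports are written to ./logs/ under the current directory. Note that, because of the PyTorch profiler's restrictions on report file names, each file name must end with .pt.trace.json.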
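Before copying the logs anywhere, a quick check for files that break the naming rule above can save a round trip. This is a minimal sketch: list_misnamed_traces is a hypothetical helper name, and it assumes the reports are regular files somewhere under ./logs.

```shell
# Hypothetical helper: list files under a log directory whose names
# do NOT end with .pt.trace.json. Defaults to ./logs.
list_misnamed_traces() {
    dir="${1:-./logs}"
    find "$dir" -type f ! -name '*.pt.trace.json' | sort
}
```

Running list_misnamed_traces ./logs prints nothing when every file follows the naming rule. Any path it does print is a file to rename or ignore.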
Copy the logs directory to your local machine and, in your local Python environment, run:
pip install torch-tb-profiler tensorboard
Then, in the directory that contains the reports, run:
tensorboard --logdir ./logs
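TensorBoard automatically parses all reports under the logs directory and starts a local HTTP server, by default at http://127.0.0.1:6006. Open that address in a browser to view the generated profile reports.
Alternatively, in VS Code press Ctrl+Shift+P, search for TensorBoard, and launch it from there.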
Conda Tips
# clone env
conda create --name opt --clone pt
# remove env
conda env remove -n pt
# upgrade to the latest nightly torch (use the CUDA tag that matches your install, e.g. cu116)
pip install --pre torch torchvision torchtext -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html --upgrade
Disable TF32 on NVIDIA Tensor Cores
export NVIDIA_TF32_OVERRIDE=0
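The export affects every later command in the same shell. To compare runs with and without the override, it may be more convenient to scope the variable to a single command. A minimal sketch follows; run_without_tf32 is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run one command with NVIDIA_TF32_OVERRIDE=0 set
# only for that command, leaving the current shell's environment untouched.
run_without_tf32() {
    env NVIDIA_TF32_OVERRIDE=0 "$@"
}
```

For example: run_without_tf32 python3 run.py -d cuda -t train detectron2_fasterrcnn_r_50_fpn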
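Install PyTorch from manually downloaded wheel files
In the nightly builds, torchvision and torchtext do not always depend on the same torch version. For example, torchtext-0.14.0.dev20221025-cp38-cp38-linux_x86_64.whl depends on torch 1.14.0.dev20221025, whereas torchvision-0.15.0.dev20221025+cu116-cp38-cp38-linux_x86_64.whl depends on torch 1.14.0.dev20221024+cu116. You therefore cannot pick the set of packages to install by date alone.
You can use the pkginfo package to check which dependency versions a given wheel file requires.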
pip install pkginfo
pkginfo -f requires_dist torchtext-0.14.0.dev20221025-cp38-cp38-linux_x86_64.whl
# output:
requires_dist: ['tqdm', 'requests', 'torch (==1.14.0.dev20221025)', 'numpy']
Use the pkginfo output to determine a consistent combination of nightly builds.
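With several wheels downloaded, checking them one at a time gets tedious. The sketch below prints the torch pin of every wheel in a directory so that mismatches stand out. The function names torch_pin and list_torch_pins are hypothetical, and the parsing assumes the 'torch (==VERSION)' form shown in the output above; other pkginfo or wheel metadata versions may format the requirement differently.

```shell
# Hypothetical helper: extract the pinned torch version from one line of
# `pkginfo -f requires_dist` output. Assumes the "torch (==VERSION)" form;
# prints nothing if no such pin is present.
torch_pin() {
    printf '%s\n' "$1" | sed -n "s/.*'torch (==\([^)]*\))'.*/\1/p"
}

# Print "wheel-file<TAB>torch-pin" for every wheel in a directory
# (default: current directory). Wheels without a torch pin show "none".
list_torch_pins() {
    dir="${1:-.}"
    for whl in "$dir"/*.whl; do
        [ -e "$whl" ] || continue
        pin="$(torch_pin "$(pkginfo -f requires_dist "$whl")")"
        printf '%s\t%s\n' "$(basename "$whl")" "${pin:-none}"
    done
}
```

If every line printed by list_torch_pins shows the same version, and that version matches the torch wheel itself, the set should install together.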
The 20221026 builds of torchtext, torchvision, torchdata, and torchaudio all depend on the same PyTorch version. The download links are:
https://download.pytorch.org/whl/nightly/torchtext-0.14.0.dev20221026-cp38-cp38-linux_x86_64.whl
https://download.pytorch.org/whl/nightly/cu116/torchvision-0.15.0.dev20221026%2Bcu116-cp38-cp38-linux_x86_64.whl
https://download.pytorch.org/whl/nightly/cu116/torch-1.14.0.dev20221026%2Bcu116-cp38-cp38-linux_x86_64.whl
https://download.pytorch.org/whl/nightly/torchdata-0.6.0.dev20221026-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
https://download.pytorch.org/whl/nightly/cu116/torchaudio-0.14.0.dev20221026%2Bcu116-cp38-cp38-linux_x86_64.whl
The TensorBoard version that matches the 20221026 builds should be 2.10.0.
Download script: https://github.com/FindHao/ml_scripts/blob/main/download.sh
Install script: https://github.com/FindHao/ml_scripts/blob/main/install.sh
CUDA install script: https://github.com/FindHao/ml_scripts/blob/main/install_cuda.sh
Reference
https://aiqm.github.io/torchani/start.html
https://stackoverflow.com/questions/50170588/list-dependencies-of-python-wheel-file