This post summarizes my logistics for debugging the default PyTorch compiler, TorchInductor.

1. Installation

1.1 Install Conda

# Follow the promptions and install the miniconda.
conda create -n pt_compiled python=3.11
conda activate pytorch

You can download the miniconda installation script for your platform from

conda install -y magma-cuda121 cmake ninja mkl mkl-include libpng libjpeg-turbo graphviz -c pytorch
# Make sure you use the correct numpy version if you'd like to use TorchBench 
pip install numpy==1.23.5

1.2 Use Nightly PyTorch and Its Dependencies

pip3 install --pre torch torchvision torchaudio --index-url

git clone
cd benchmark

1.3 Compiled From Source

For reference, see Pytorch源码编译.

I prefer the following folder structure:

├── audio
├── benchmark
├── data
├── logs   # created by myself
├── ml_scripts # cloned from
├── pytorch
├── text
└── vision

2. Testing Installation

If you installed PyTorch from source, you can use the benchmarks in the source code folder. Run the following command to test the installation of PyTorch compiler.

cd pytorch
python benchmarks/dynamo/  --performance -dcuda --inference --inductor  --only BartForCausalLM

The output is as follows.

loading model: 0it [00:03, ?it/s]
cuda eval  BartForCausalLM            
running benchmark: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:01<00:00, 22.01it/s]

There are multiple model collections(benchmark suites) you can use, such as,, and

Read Dynamo Benchmarks for more information.

3. Debugging

There are some debug flags you can set to obtain debugging information. The common flags I used are

TORCH_COMPILE_DEBUG=1 INDUCTOR_POST_FUSION_SVG=1 TORCH_LOGS="inductor,dynamo" INDUCTOR_TEST_DISABLE_FRESH_CACHE=1 python benchmarks/dynamo/  --performance -dcuda --inference --inductor  --only BartForCausalLM

Check more environment flags in

The terminal output shows a lot of information, and debugging files are generated in pytorch/benchmarks/dynamo/torch_compile_debug. The folder name follows the rule run_date_time-pid_XX.

The following is the example structure of a test run.

├── torchdynamo
│   └── debug.log
└── torchinductor
    ├── aot_resnet18___0_debug.log
    └── resnet18__0_inference_0.0
        ├── ir_post_fusion.txt
        ├── ir_pre_fusion.txt

The is the final compiled code! It contains the compiled triton kernels and wrapper code , so you can directly run it for testing! fx_graph_* shows the fx graphs generated after graph lowering, and ir_*.txt are TorchInductor IR pre- and post- operation fusions.

3.1 Debug Using VSCode

I usually debug torchinductor via VSCode. For me, GUI debugging is easier and faster than using the terminal. Here are the steps to debug torchinductor via VSCode.

First, install python extension in VSCode. It provides the python debugger.

Second, create launch.json file from the sidebar/Run and Debug/create a launch.json file. Here is an example file:

    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit:
    "version": "0.2.0",
    "configurations": [
            "name": "resnet18",
            "type": "debugpy",
            "request": "launch",
            "program": "/scratch/yhao/p9/pytorch/benchmarks/dynamo/",
            "console": "integratedTerminal",
            "justMyCode": false,
            // if you are debuging torchrun program
            // "subProcess": true,
            // "redirectOutput": true,
            "args": [
            "env": {
                "TORCH_COMPILE_DEBUG": "1",
                "INDUCTOR_POST_FUSION_SVG": "1",
                // "CUDA_VISIBLE_DEVICES": "2",
                // "INDUCTOR_WRITE_SCHEDULER_GRAPH": "1",

After creating a launch.json, the Run and Debug will automatically show the testing configs you added in this file. Add a breakpoint in this line


This function is a key point can help you understand how and what TorchInductor do.

The following screenshot shows an example of how VSCode debugging looks like.


I enabled the secondary sidebar in VSCode and placed the call stack and breakpoints there. The left sidebar includes the local/global variables, and you can add your own variables to watch.

You can use shortcuts like Step Into and Step Over to debug your PyTorch models with TorchInductor enabled. Check the menu bar/Run to see their shortcuts.




文章版权归 FindHao 所有丨本站默认采用CC-BY-NC-SA 4.0协议进行授权|
转载必须包含本声明,并以超链接形式注明作者 FindHao 和本文原始地址: