PyTorch Integration
IREE supports compiling and running PyTorch programs represented as nn.Module classes, as well as models defined using functorch.
Prerequisites
Install the IREE pip packages, either from pip or by building from source:

pip install \
  iree-compiler \
  iree-runtime
Install torch-mlir, which is necessary for compiling PyTorch models to a format IREE is able to execute:
pip install -f https://llvm.github.io/torch-mlir/package-index/ torch-mlir
A special iree_torch package makes it easy to compile PyTorch programs and run them on IREE:
pip install git+https://github.com/iree-org/iree-torch.git
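To verify the installation, a quick sanity check is to import each package (a minimal sketch; the import paths below correspond to the packages installed above):

# These imports should all succeed if the packages installed correctly.
import iree.compiler
import iree.runtime
import torch_mlir
import iree_torch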
Running a model
Going from a loaded PyTorch model to one that's executing on IREE happens in four steps:
1. Compile the model to MLIR
2. Compile the MLIR to an IREE VM flatbuffer
3. Load the VM flatbuffer into IREE
4. Execute the model via IREE
Note: In the following steps, we'll be borrowing the model from this BERT colab and assuming it is available as model.
Compile the model to MLIR
First, we need to trace and compile our model to MLIR:
import torch_mlir

model = ...          # the model we're compiling
example_input = ...  # an input to the model with the expected shape and dtype

mlir = torch_mlir.compile(
    model,
    example_input,
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)
The full list of available output types can be found here and includes linalg-on-tensors, MHLO, and TOSA.
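As a sketch, lowering the same traced model to TOSA instead only changes the output type (this assumes every op in the model has a TOSA lowering):

# Lower to the TOSA dialect instead of linalg-on-tensors.
mlir_tosa = torch_mlir.compile(
    model,
    example_input,
    output_type=torch_mlir.OutputType.TOSA,
    use_tracing=True)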
Compile the MLIR to an IREE VM flatbuffer
Next, we compile the resulting MLIR to IREE's deployable file format:
import iree_torch

iree_backend = "llvm-cpu"
iree_vmfb = iree_torch.compile_to_vmfb(mlir, iree_backend)
Here we have a choice of which backend to target. See the Deployment Configurations section of this site for a full list of targets and configurations.
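For example, a sketch of targeting a GPU through Vulkan instead of the CPU (this assumes a Vulkan-capable device and driver are available):

# "vulkan-spirv" is one of IREE's compiler target backends.
iree_vmfb_vulkan = iree_torch.compile_to_vmfb(mlir, "vulkan-spirv")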
The generated flatbuffer can now be serialized and stored for another time or loaded and executed immediately.
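For example, a minimal sketch of persisting it to disk, assuming the flatbuffer is returned as a bytes-like object:

# Save the compiled flatbuffer so it can be reloaded later without
# recompiling. Assumes iree_vmfb is bytes-like.
with open("model.vmfb", "wb") as f:
    f.write(iree_vmfb)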
Load the VM flatbuffer into IREE
Next, we load the flatbuffer into the IREE runtime. iree_torch provides a convenience method for loading this flatbuffer from Python:
invoker = iree_torch.load_vmfb(iree_vmfb, iree_backend)
Execute the model via IREE
Finally, we can execute the loaded model:
result = invoker.forward(example_input)
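As a sanity check, the IREE result can be compared against eager PyTorch execution (a sketch assuming the model returns a single tensor):

import torch

# Small numerical differences between backends are expected, so compare
# with a tolerance rather than exact equality.
expected = model(example_input)
print(torch.allclose(result, expected, atol=1e-5))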
Training
Training with PyTorch in IREE is supported via functorch. The steps for loading the model into IREE, once defined, are nearly identical to the above example.
You can find a full end-to-end example of defining a basic regression model, training with it, and running inference on it here.
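At a high level, the idea is to turn the model into a pure function of its parameters so that a whole training step (forward, loss, and gradients) can be traced and compiled like any other function. A rough sketch, assuming a simple model and MSE loss (the helper names below are illustrative, not part of iree_torch):

import torch
import functorch

# make_functional splits the model into a pure function of
# (params, inputs) plus its parameters, which is what tracing needs.
func_model, params = functorch.make_functional(model)

def loss_fn(params, x, y):
    return torch.nn.functional.mse_loss(func_model(params, x), y)

# functorch.grad returns a function computing the gradient of the loss
# with respect to params; this is the function that gets compiled.
grad_fn = functorch.grad(loss_fn)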
Native / On-device Training
A small (~100-250KB), self-contained binary can be built for deploying to resource-constrained environments. This binary runs a model without needing a Python interpreter; see this example for an illustration.
Samples
Colab notebooks:
- Inference on BERT

Example scripts:
- Basic Inference and Training Example
- Native On-device Training Example