Training a policy using deep reinforcement learning consists of an agent interacting with the environment in a continuous loop. In practice, the agent is often modelled by deep networks that can take advantage of GPU parallelization, but the environment is still modelled by simulators that rely on the CPU. While the poor sample efficiency of RL algorithms remains a huge bottleneck, a significant amount of time is also spent on moving tensors from the CPU to the GPU, not to mention the additional delays caused by the lack of parallelism in CPU-based simulation.
NVIDIA’s Isaac Gym is a simulation framework designed to address these limitations. It runs entirely on the GPU, thus eliminating the CPU bottleneck. This post is a brief walkthrough of Isaac Gym. We shall install
isaacgym, learn about its core principles, and train a policy for object manipulation using the AllegroHand. To learn more about Isaac Gym, I highly recommend watching these videos from RSS 2021 and reading the technical paper available on arXiv.
To download Isaac Gym, you need to head over to NVIDIA’s website and join the developer programme. At the time of writing, the latest release is Isaac Gym Preview 3, which is what we’ll be working with throughout this post. For ease of development, I recommend using a Linux-based machine with NVIDIA GPUs. Once you download and extract the archive, documentation is available at
docs/index.html. To install, head over to the instructions at
The first thing to check after installing Isaac Gym is to make sure that it runs fine. Head over to
python/examples and run one of the example scripts, say
joint_monkey.py. You should see the simulation window pop up where all the joints of the humanoid are being animated.
$ cd python/examples
$ python joint_monkey.py
The python/examples directory only has a few scripts to test things out. NVIDIA has another repo of benchmarks trained using Isaac Gym, called IsaacGymEnvs (IGE), available on GitHub.
Follow the instructions in the README to install IGE. To test things out, go to the
isaacgymenvs directory and try running the training script. Once the window loads, you should notice that the cartpole starts balancing itself pretty quickly, in around 10 seconds or so.
$ cd isaacgymenvs
$ python train.py task=Cartpole
After cartpole, let’s try out something that is a bit more involved. We shall focus on object manipulation using the Allegro Hand. IGE already has a script that we can use out of the box. To test it out, you can simply set
task=AllegroHand while running the training script from the previous section. By default, the script spawns 16,384 environments, which can be really slow to visualize. We can change this to 16 instead. IGE makes extensive use of Hydra configs, so a lot of parameters are customizable directly through command line arguments. Try the following command. You should now see a window with 16 environments in parallel.
$ python train.py task=AllegroHand num_envs=16 train.params.config.minibatch_size=16
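For intuition about what those command line arguments do, here is a minimal sketch of how dotted `key=value` overrides can be folded into a nested config. This is not Hydra's implementation: `apply_overrides` and the default values are made up for illustration, and the real thing handles types, lists, and config groups such as `task=...` far more carefully.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Hypothetical helper for illustration only; Hydra's real override
    grammar is much richer than this.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        keys = dotted_key.split(".")
        node = result
        # Walk (or create) the nested dictionaries down to the last key.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # Treat integer-looking values as ints, everything else as strings.
        try:
            value = int(raw_value)
        except ValueError:
            value = raw_value
        node[keys[-1]] = value
    return result


# Placeholder defaults, not the values shipped with IGE.
defaults = {
    "task": "Cartpole",
    "num_envs": 512,
    "train": {"params": {"config": {"minibatch_size": 8192}}},
}

config = apply_overrides(
    defaults,
    ["task=AllegroHand", "num_envs=16", "train.params.config.minibatch_size=16"],
)
print(config["num_envs"])
print(config["train"]["params"]["config"]["minibatch_size"])
```

The point of the sketch is only that a dotted path such as `train.params.config.minibatch_size` addresses one leaf in a nested structure, which is why a single command line argument can reach deep into the train config.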
At first, the simulator window might not look like Fig. 3 above. You may need to move the camera a bit. All camera movements in Isaac Gym need to be performed while holding the right mouse button. To pan or tilt the camera, simply move the mouse while holding the right mouse button. To move forward and backward (dolly), use the
S keys. To move left and right (truck), use
D. To move up and down (pedestal), use the
Q keys. The next two shortcuts are specific to IGE and don’t need the right mouse button. To stop simulation and preempt training, press
ESC. To pause simulation but continue training, press
IGE uses a model of the Allegro Hand with BioTac sensors on its fingertips (Fig. 4). However, that’s an additional accessory and might not be the desired setup for many. The default fingertips look like the ones in Fig. 5, and the corresponding URDF file can be found in the official repo for the AllegroHand by Wonik Robotics (github.com/simlabrobotics/allegro_hand_ros).
After some inspection, it is easy to see that IGE also uses a URDF file to render its version of the AllegroHand, the path to which is defined in this config file. I tried replacing this URDF file with the one from simlabrobotics, but unfortunately I ran into segmentation faults.
Ultimately, I had to edit
allegro.urdf manually and swap the BioTac fingertips with the original ones. If you plan to edit URDF files manually, you should definitely check out gkjohnson’s online URDF visualizer, which I found extremely helpful. To save yourself some time, you are free to use my implementation called
allegro_ros.urdf. You will also need a bunch of STL files for the default fingertips for this to work correctly. For reference, you can go through my fork of IGE.
Alright. So what’s going on here? Behind the scenes, IGE relies on this package called
rl_games. I couldn’t find much info about this package, except that it seems to implement a bunch of common RL algorithms particularly suited for Isaac Gym. At the time of this writing, IGE depends on
rl_games version 1.1.4. Note how IGE’s
train.py essentially calls
Following common practices,
rl_games loads algorithms dynamically according to a config object. When you run the training script with
task=AllegroHand, Hydra loads two configuration files.
$ python train.py task=AllegroHand
The first is
task/AllegroHand.yaml (env config), which contains parameters to set up the environment, and the other is
train/AllegroHandPPO.yaml (train config), which contains parameters to train the agent. The train config specifies
rl_games.torch_runner.Runner parses this config, creates an
A2CAgent and calls
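The idea of picking an algorithm from a config object can be sketched as a small registry. This is a generic illustration of the pattern, not the rl_games source: `AlgoRegistry`, `DummyA2CAgent`, the registered name, and the config keys below are all assumptions made for this example.

```python
class DummyA2CAgent:
    """Stand-in agent; a real one would build networks and run updates."""

    def __init__(self, params):
        self.params = params

    def train(self):
        return f"training with minibatch_size={self.params['minibatch_size']}"


class AlgoRegistry:
    """Maps algorithm names from a config to the classes that build them."""

    def __init__(self):
        self._builders = {}

    def register(self, name, builder):
        self._builders[name] = builder

    def create(self, name, params):
        if name not in self._builders:
            raise KeyError(f"unknown algorithm: {name}")
        return self._builders[name](params)


registry = AlgoRegistry()
registry.register("a2c_continuous", DummyA2CAgent)

# Hypothetical train config; the key names are assumptions for this sketch.
train_config = {
    "algo": {"name": "a2c_continuous"},
    "config": {"minibatch_size": 16},
}

agent = registry.create(train_config["algo"]["name"], train_config["config"])
print(agent.train())
```

With this layout, supporting a new algorithm means registering one more builder; the code that reads the config does not need to change.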
Somewhere during training,
rl_games calls the
step() function (shown below) of the environment defined in the env config. Two important methods to pay attention to are
post_physics_step(), which contain all the environment-specific code that should run just before and after stepping through the environment.
For the AllegroHand, these methods are defined in
allegro_hand.py. If you’re planning to make any changes in the existing environment, this is the file you should look at.
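The pattern described here, a shared `step()` that wraps the physics update with two task-specific hooks, can be sketched without Isaac Gym at all. The hook names follow the ones IGE uses as far as I can tell; `ToyVecTask`, `CounterTask`, and the fake `simulate()` method are hypothetical and only stand in for the real GPU simulation calls.

```python
from abc import ABC, abstractmethod


class ToyVecTask(ABC):
    """Base class that owns the step() loop; subclasses fill in the hooks."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.obs = [0.0] * num_envs
        self.rewards = [0.0] * num_envs

    @abstractmethod
    def pre_physics_step(self, actions):
        """Apply actions before the simulation advances."""

    @abstractmethod
    def post_physics_step(self):
        """Compute observations and rewards after the simulation advances."""

    def simulate(self):
        # Placeholder for the real physics update.
        pass

    def step(self, actions):
        self.pre_physics_step(actions)
        self.simulate()
        self.post_physics_step()
        return self.obs, self.rewards


class CounterTask(ToyVecTask):
    """Toy task: each environment accumulates the actions it receives."""

    def __init__(self, num_envs):
        super().__init__(num_envs)
        self.state = [0.0] * num_envs
        self.pending = [0.0] * num_envs

    def pre_physics_step(self, actions):
        # Store the actions so the fake simulation can consume them.
        self.pending = list(actions)

    def simulate(self):
        self.state = [s + a for s, a in zip(self.state, self.pending)]

    def post_physics_step(self):
        self.obs = list(self.state)
        # Reward is higher the closer the state is to 1.0.
        self.rewards = [-abs(1.0 - s) for s in self.state]


task = CounterTask(num_envs=2)
obs, rewards = task.step([0.5, 1.0])
print(obs, rewards)
```

The base class never needs to know what the task is about. That is why changing an existing environment usually comes down to editing the two hooks in its own file.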
If you’re curious, the interface between
isaacgymenvs is provided in
This post is a work in progress. More updates coming soon.