A Walkthrough of Isaac Gym

Introduction

Training a policy with deep reinforcement learning consists of an agent interacting with the environment in a continuous loop. In practice, the agent is often modelled by deep networks that can take advantage of GPU parallelization, but the environment is still modelled by simulators that run on the CPU. While the poor sample efficiency of RL algorithms remains a huge bottleneck, a significant amount of time is also spent moving tensors between the CPU and the GPU, not to mention the additional delays caused by the lack of parallelism in CPU-based simulation.

NVIDIA’s Isaac Gym is a simulation framework designed to address these limitations. It runs entirely on the GPU, thus eliminating the CPU bottleneck. This post is a brief walkthrough of Isaac Gym. We shall install isaacgym, learn about its core principles, and train a policy for object manipulation using the AllegroHand. To learn more about Isaac Gym, I highly recommend watching these videos from RSS 2021 and reading the technical paper available on arXiv.

Getting Started

To download Isaac Gym, you need to head over to NVIDIA’s website and join the developer program. At the time of this writing, the latest release is Isaac Gym Preview 3, which is what we’ll be working with throughout this post. For ease of development, I recommend using a Linux-based machine with an NVIDIA GPU. Once you download and extract the archive, the documentation is available at docs/index.html. To install, follow the instructions at docs/install.html.

The first thing to do after installing Isaac Gym is to make sure that it runs correctly. Head over to python/examples and run one of the example scripts, say joint_monkey.py. You should see a simulation window pop up in which all the joints of the humanoid are animated.

$ cd python/examples
$ python joint_monkey.py
Fig. 1: Joint Monkey from isaacgym3/python/examples/joint_monkey.py.
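To see what such a script boils down to, here is a minimal sketch of the core gymapi loop: create a sim, add a ground plane, open a viewer, and step physics until the window is closed. It follows the standard isaacgym Python bindings, but it is a simplified sketch, not a copy of joint_monkey.py.

from isaacgym import gymapi

# Acquire the gym interface and create a simulation (PhysX backend, GPU device 0).
gym = gymapi.acquire_gym()
sim_params = gymapi.SimParams()
sim_params.dt = 1.0 / 60.0
sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)

# Add a ground plane and open a viewer window.
gym.add_ground(sim, gymapi.PlaneParams())
viewer = gym.create_viewer(sim, gymapi.CameraProperties())

# Step physics and render until the viewer is closed.
while not gym.query_viewer_has_closed(viewer):
    gym.simulate(sim)
    gym.fetch_results(sim, True)
    gym.step_graphics(sim)
    gym.draw_viewer(viewer, sim, True)
    gym.sync_frame_time(sim)

gym.destroy_viewer(viewer)
gym.destroy_sim(sim)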

IsaacGymEnvs

The python/examples directory only has a few scripts to test things out. NVIDIA maintains another repo of benchmark environments built on Isaac Gym, called IsaacGymEnvs (IGE), available on GitHub.

Follow the instructions in the README to install IGE. To test things out, go to the isaacgymenvs directory and try running the training script. Once the window loads, you should notice that the cartpole starts balancing itself pretty quickly, within about 10 seconds.

$ cd isaacgymenvs
$ python train.py task=Cartpole
Fig. 2: Cartpole training on IsaacGymEnvs.

AllegroHand

After cartpole, let’s try something a bit more involved. We shall focus on object manipulation using the Allegro Hand. IGE already supports this task out of the box. To test it out, you can simply set task=AllegroHand while running the training script from the previous section. By default, the script spawns 16,384 environments, which can be really slow to visualize, so we will change this to 16 instead. IGE makes extensive use of Hydra configs, so many parameters can be customized directly through command line arguments. Try the following command. You should now see a window with 16 environments running in parallel.

$ python train.py task=AllegroHand num_envs=16 train.params.config.minibatch_size=16
Fig. 3: AllegroHand on IsaacGymEnvs.

Camera Movements

At first, the simulator window might not look like Fig. 3 above. You may need to move the camera a bit. All camera movements¹ in Isaac Gym need to be performed while holding the right mouse button. To pan or tilt the camera, simply move the mouse while holding the right mouse button. To move forward and backward (dolly), use the W and S keys. To move left and right (truck), use A and D. To move up and down (pedestal), use the E and Q keys. The next two shortcuts are specific to IGE and don’t need the right mouse button. To stop the simulation and preempt training, press ESC. To pause the simulation but continue training, press V.

Changing Assets

IGE uses a model of the Allegro Hand with BioTac sensors on its fingertips (Fig. 4). However, that’s an additional accessory and might not be the desired setup for many. The default fingertips look like the ones in Fig. 5, and the corresponding URDF file can be found in the official repo for the AllegroHand by Wonik Robotics (github.com/simlabrobotics/allegro_hand_ros).

Fig. 4: AllegroHand with BioTac fingertips
Fig. 5: AllegroHand with default fingertips

A little inspection shows that IGE also loads its version of the AllegroHand from a URDF file, the path to which is defined in this config file. I tried replacing this URDF file with the one from simlabrobotics, but unfortunately I ran into segmentation faults.
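For reference, any Isaac Gym asset, the hand included, is loaded through gym.load_asset() with an asset root and a relative URDF path. The sketch below is illustrative: the paths and options are assumptions, and in IGE the file name actually comes from the task config rather than being hard-coded.

from isaacgym import gymapi

# Illustrative paths; IGE reads the real URDF path from the task config.
asset_root = "../assets"
asset_file = "urdf/allegro_hand/allegro.urdf"

asset_options = gymapi.AssetOptions()
asset_options.fix_base_link = True          # keep the hand's base fixed in space
asset_options.collapse_fixed_joints = True  # merge links joined by fixed joints

# gym and sim are the objects created in the earlier minimal example.
hand_asset = gym.load_asset(sim, asset_root, asset_file, asset_options)
print("DOF count:", gym.get_asset_dof_count(hand_asset))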

Ultimately, I had to edit allegro.urdf manually and swap the BioTac fingertips with the original ones. If you plan to edit URDF files manually, you should definitely check out gkjohnson’s online URDF visualizer, which I found extremely helpful. To save yourself some time, you are free to use my implementation called allegro_ros.urdf. You will also need a bunch of STL files for the default fingertips for this to work correctly. For reference, you can go through my fork of IGE.

Walkthrough

Alright. So what’s going on here? Behind the scenes, IGE relies on a package called rl_games. I couldn’t find much information about this package, except that it implements a collection of common RL algorithms that are particularly well suited to Isaac Gym. At the time of this writing, IGE depends on rl_games version 1.1.4. Note how IGE’s train.py essentially calls rl_games.torch_runner.Runner.
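Stripped of the Hydra plumbing, the call into rl_games looks roughly like this. This is a paraphrase of what IGE’s train.py does rather than the verbatim source, and rlg_config_dict stands in for the resolved train config.

from rl_games.torch_runner import Runner

# rlg_config_dict is the resolved train config (e.g. train/AllegroHandPPO.yaml)
# converted to a plain dictionary before being handed to rl_games.
runner = Runner()
runner.load(rlg_config_dict)
runner.reset()

# 'train' vs 'play' selects training a new policy or evaluating a checkpoint.
runner.run({'train': True, 'play': False})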

Following common practice, rl_games loads algorithms dynamically according to a config object. When you run the training script with task=AllegroHand, Hydra loads two configuration files.

$ python train.py task=AllegroHand

The first is task/AllegroHand.yaml (env config), which contains parameters to set up the environment, and the other is train/AllegroHandPPO.yaml (train config), which contains parameters to train the agent. The train config specifies train.params.algo.name: a2c_continuous. rl_games.torch_runner.Runner parses this config, creates an A2CAgent, and calls agent.train().
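Concretely, the resolved train config is a nested dictionary roughly of the following shape. This is an abridged sketch showing only the keys relevant here; the actual file contains many more entries.

# Abridged sketch of the resolved train config handed to the Runner.
rlg_config_dict = {
    'params': {
        'algo': {'name': 'a2c_continuous'},  # tells the Runner to build an A2CAgent
        'config': {
            'name': 'AllegroHand',
            'minibatch_size': 16,            # the key we overrode on the command line earlier
            # ... many more PPO hyperparameters ...
        },
    },
}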

Somewhere during training, rl_games calls the step() function of the environment defined in the env config (a simplified version is sketched below). Two important methods to pay attention to are pre_physics_step() and post_physics_step(), which contain all the environment-specific code that should run just before and after stepping through the simulation.
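Roughly, step() clamps the incoming actions, lets the task apply them, advances the physics, and then lets the task compute observations, rewards, and resets. The sketch below is paraphrased from IGE’s base VecTask and omits details such as action noise and CPU/GPU pipeline handling.

import torch

def step(self, actions):
    # Let the task apply the (clamped) actions, e.g. as DOF position targets.
    action_tensor = torch.clamp(actions, -self.clip_actions, self.clip_actions)
    self.pre_physics_step(action_tensor)

    # Advance the physics simulation, possibly several substeps per control step.
    for _ in range(self.control_freq_inv):
        self.render()
        self.gym.simulate(self.sim)

    # Let the task compute observations, rewards, and reset flags.
    self.post_physics_step()

    return self.obs_dict, self.rew_buf, self.reset_buf, self.extras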

For the AllegroHand, these methods are defined in allegro_hand.py. If you’re planning to make any changes to the existing environment, this is the file you should look at. If you’re curious, the interface between rl_games and isaacgymenvs is provided in rlgames_utils.RLGPUEnv.
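To give a flavour of what lives in these methods, pre_physics_step() for the hand essentially turns the actions into joint position targets and pushes them to the simulator in a single tensor call. The sketch below is heavily simplified and assumes position control; the attribute names are placeholders, and the real implementation adds action scaling options, moving averages, and relative control.

from isaacgym import gymtorch
import torch

def pre_physics_step(self, actions):
    # Map normalized actions in [-1, 1] to joint position targets within the DOF limits.
    self.actions = actions.clone().to(self.device)
    targets = self.hand_dof_lower_limits + 0.5 * (self.actions + 1.0) * (
        self.hand_dof_upper_limits - self.hand_dof_lower_limits)
    self.cur_targets = torch.clamp(
        targets, self.hand_dof_lower_limits, self.hand_dof_upper_limits)

    # Push the targets for all environments to the GPU-side simulator in one call.
    self.gym.set_dof_position_target_tensor(
        self.sim, gymtorch.unwrap_tensor(self.cur_targets))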

Resources


  1. Read more about the different types of camera movements here.