Introduction
Training a policy using deep reinforcement learning consists of an agent interacting with the environment in a continuous loop. In practice, the agent is often modelled by deep networks that can take advantage of GPU parallelization, but the environment is still modelled by simulators that rely on the CPU. While the poor sample efficiency of RL algorithms remains a huge bottleneck, a significant amount of time is also spent moving tensors between the CPU and the GPU, on top of the delays caused by the lack of parallelism in CPU-based simulation.
NVIDIA’s Isaac Gym is a simulation framework designed to address these limitations. It runs entirely
on the GPU, thus eliminating the CPU bottleneck. This post is a brief walkthrough of Isaac Gym. We
shall install isaacgym, learn about its core principles, and train a policy for object
manipulation using the AllegroHand. To learn more about Isaac Gym, I highly recommend watching
these videos from RSS 2021 and reading the technical paper available on arXiv.
Getting Started
To download Isaac Gym, you need to head over to NVIDIA’s website and join the developer programme.
At the time of this writing, the latest release is Isaac Gym Preview 3, which is what we’ll be
working with throughout this post. For ease of development, I recommend using a Linux-based machine
with an NVIDIA GPU. Once you download and extract the archive, the documentation is available at
docs/index.html. To install, follow the instructions at docs/install.html.
The first thing to do after installing Isaac Gym is to make sure that it runs fine. Head over to
python/examples and run one of the example scripts, say joint_monkey.py. You should see the
simulation window pop up with all the joints of the humanoid being animated.
$ cd python/examples
$ python joint_monkey.py
IsaacGymEnvs
The python/examples directory only has a few scripts to test things out. NVIDIA has another repo
of benchmarks trained using Isaac Gym, called IsaacGymEnvs (IGE), available on GitHub.
Follow the instructions in the README to install IGE. To test things out, go to the isaacgymenvs
directory and try running the training script. Once the window loads, you should notice that the
cartpole starts balancing itself pretty quickly, in around 10 seconds or so.
$ cd isaacgymenvs
$ python train.py task=Cartpole
AllegroHand
After cartpole, let’s try out something that is a bit more involved. We shall focus on object
manipulation using the Allegro Hand. IGE already has a script that we can use out of the box. To
test it out, you can simply set task=AllegroHand while running the training script from the
previous section. By default, the script spawns 16,384 environments, which can be really slow to
visualize. We can change this to 16 instead. IGE makes extensive use of Hydra configs, so a lot of
parameters are customizable directly through command line arguments. Try the following command. You
should now see a window with 16 environments in parallel.
$ python train.py task=AllegroHand num_envs=16 train.params.config.minibatch_size=16
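To make the command above a little less magical, here is a minimal sketch of how dotted key=value overrides can be applied to a nested config. This is not Hydra's implementation: apply_overrides is a hypothetical helper, and the default values (including horizon_length and the default minibatch_size) are placeholders, so check IGE's YAML files for the real ones. The last lines illustrate one plausible reason for shrinking minibatch_size along with num_envs: PPO-style trainers typically split a rollout batch of num_envs × horizon_length samples into equal minibatches, so the minibatch size has to divide the batch size.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Simplified illustration only; Hydra's real override grammar is richer.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        # Keep integers as integers; leave everything else as a string.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return result


# Placeholder defaults (assumptions); the real values live in IGE's YAML files.
defaults = {
    "task": "Cartpole",
    "num_envs": 16384,
    "train": {"params": {"config": {"minibatch_size": 32768, "horizon_length": 8}}},
}

cfg = apply_overrides(
    defaults,
    ["task=AllegroHand", "num_envs=16", "train.params.config.minibatch_size=16"],
)

ppo = cfg["train"]["params"]["config"]
batch_size = cfg["num_envs"] * ppo["horizon_length"]
divisible = batch_size % ppo["minibatch_size"] == 0
print(batch_size, divisible)  # 128 True
```

With the placeholder horizon length of 8, 16 environments produce 128 samples per rollout, which a minibatch of 16 divides evenly, whereas the placeholder default of 32768 would not.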
Camera Movements
At first, the simulator window might not look like Fig. 3 above. You may need to move the
camera a bit. All camera movements in Isaac Gym need to be performed while holding the
right mouse button. To pan or tilt the camera, simply move the mouse while holding the right mouse
button. To move forward and backward (dolly), use the W and S keys. To move left and right
(truck), use A and D. To move up and down (pedestal), use the E and Q keys. The next two
shortcuts are specific to IGE and don’t need the right mouse button. To stop simulation and preempt
training, press ESC. To pause simulation but continue training, press V.
Changing Assets
IGE uses a model of the Allegro Hand with BioTac sensors on its fingertips (Fig. 4). However, that’s an additional accessory and might not be the desired setup for many. The default fingertips look like the ones in Fig. 5, and the corresponding URDF file can be found in the official repo for the AllegroHand by Wonik Robotics (github.com/simlabrobotics/allegro_hand_ros).
After some inspection, it is easy to see that IGE also uses a URDF file to render its version of the AllegroHand, the path to which is defined in this config file. I tried replacing this URDF file with the one from simlabrobotics, but unfortunately I ran into segmentation faults.
Ultimately, I had to edit allegro.urdf manually and swap the BioTac fingertips with the original
ones. If you plan to edit URDF files manually, you should definitely check out gkjohnson’s online
URDF visualizer, which I found extremely helpful. To save yourself some time, you are free to use my
implementation called allegro_ros.urdf. You will also need a bunch of STL files for the default
fingertips for this to work correctly. For reference, you can go through my fork of IGE.
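When editing a URDF by hand, it helps to know which mesh files it expects before loading it into the simulator. The sketch below uses Python's standard xml.etree.ElementTree to list every mesh filename referenced in a URDF. The helper mesh_files and the toy URDF string are made up for illustration; they are not the contents of allegro.urdf or allegro_ros.urdf.

```python
import xml.etree.ElementTree as ET


def mesh_files(urdf_text):
    """Return the sorted, de-duplicated mesh filenames referenced by a URDF string."""
    root = ET.fromstring(urdf_text)
    return sorted(
        {mesh.get("filename") for mesh in root.iter("mesh") if mesh.get("filename")}
    )


# Toy URDF for illustration only (hypothetical link and file names).
SAMPLE_URDF = """
<robot name="toy_hand">
  <link name="palm">
    <visual><geometry><mesh filename="meshes/palm.stl"/></geometry></visual>
    <collision><geometry><mesh filename="meshes/palm.stl"/></geometry></collision>
  </link>
  <link name="fingertip">
    <visual><geometry><mesh filename="meshes/tip.stl"/></geometry></visual>
  </link>
</robot>
"""

for name in mesh_files(SAMPLE_URDF):
    print(name)
# meshes/palm.stl
# meshes/tip.stl
```

For a real file, you could read its text from disk, pass it to the same function, and compare the result against the STL files actually present in the asset directory.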
Walkthrough
Alright. So what’s going on here? Behind the scenes, IGE relies on a package called rl_games.
I couldn’t find much info about this package, except that it seems to implement a bunch of common
RL algorithms particularly suited for Isaac Gym. At the time of this writing, IGE depends on
rl_games version 1.1.4. Note how IGE’s train.py essentially calls rl_games.torch_runner.Runner.
Following common practices, rl_games loads algorithms dynamically according to a config object.
When you run the training script with task=AllegroHand, Hydra loads two configuration files.
$ python train.py task=AllegroHand
The first is task/AllegroHand.yaml (env config), which contains parameters to set up the
environment, and the other is train/AllegroHandPPO.yaml (train config), which contains parameters
to train the agent. The train config specifies train.algo.name: a2c_continuous.
rl_games.torch_runner.Runner parses this config, creates an A2CAgent and calls agent.train().
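The pattern described above, where an algorithm name in a config selects which agent gets built, can be sketched with a small registry. Everything here (ToyRunner, ToyA2CAgent, the register method) is hypothetical and only illustrates the idea; it is not the rl_games API.

```python
class ToyA2CAgent:
    """Hypothetical stand-in for an agent class; not rl_games code."""

    def __init__(self, params):
        self.params = params
        self.trained = False

    def train(self):
        # A real agent would run the training loop here.
        self.trained = True
        return self


class ToyRunner:
    """Maps an algorithm name from the config to a builder, then trains."""

    def __init__(self):
        self._builders = {}

    def register(self, name, builder):
        self._builders[name] = builder

    def run(self, config):
        name = config["algo"]["name"]
        if name not in self._builders:
            raise KeyError(f"unknown algorithm: {name}")
        agent = self._builders[name](config)
        return agent.train()


runner = ToyRunner()
runner.register("a2c_continuous", ToyA2CAgent)

agent = runner.run({"algo": {"name": "a2c_continuous"}})
print(type(agent).__name__, agent.trained)  # ToyA2CAgent True
```

The appeal of this design is that switching algorithms becomes a one-line config change rather than a code change.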
Somewhere during training, rl_games calls the step() function (shown below) of the environment
defined in the env config. Two important methods to pay attention to are pre_physics_step() and
post_physics_step(), which contain all the environment specific code that should run just before
and after stepping through the environment.
For the AllegroHand, these methods are defined in allegro_hand.py. If you’re planning to make any
changes in the existing environment, this is the file you should look at. If you’re curious, the
interface between rl_games and isaacgymenvs is provided in rlgames_utils.RLGPUEnv.
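The split between pre_physics_step() and post_physics_step() is an instance of the template method pattern: a base class fixes the order of operations in step(), and each task fills in the hooks. The sketch below is a toy version under that assumption. ToyVecTask, ToyHand, and the simulate() placeholder are hypothetical names, and the body of step() is not copied from IGE.

```python
class ToyVecTask:
    """Hypothetical base class that fixes the order of operations in step()."""

    def __init__(self):
        self.log = []
        self.obs, self.reward, self.done = [], 0.0, False

    def pre_physics_step(self, actions):
        raise NotImplementedError

    def post_physics_step(self):
        raise NotImplementedError

    def simulate(self):
        # Placeholder for advancing the physics simulation by one step.
        self.log.append("simulate")

    def step(self, actions):
        self.pre_physics_step(actions)   # e.g. turn actions into joint targets
        self.simulate()                  # advance the simulator
        self.post_physics_step()         # e.g. compute observations and rewards
        return self.obs, self.reward, self.done


class ToyHand(ToyVecTask):
    """Hypothetical task that only implements the two hooks."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def pre_physics_step(self, actions):
        self.log.append("pre")
        self.targets = list(actions)

    def post_physics_step(self):
        self.log.append("post")
        self.obs = list(self.targets)
        # Toy reward: penalize large targets.
        self.reward = -sum(abs(t) for t in self.targets)


env = ToyHand()
obs, reward, done = env.step([0.5, -0.25])
print(env.log)  # ['pre', 'simulate', 'post']
```

Under this structure, customizing an environment mostly means editing the two hooks, which matches the advice above to start with the task file.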
Resources
- I highly recommend reading the technical paper about Isaac Gym available on arXiv.
- Google Brax is another GPU based simulation platform to look at. More info on GitHub.
- There’s also a GitHub repo of resources related to Isaac Gym available at awesome-isaac-gym.