DreamerV3 Report

Papers:

  1. RSSM

  2. DreamerV1

  3. DreamerV2

  4. DreamerV3

Projects

There are currently three codebases for DreamerV3. The most readable one is sheeprl's.

  1. The author's implementation, written in Jax: https://github.com/danijar/dreamerv3
  2. A PyTorch implementation by NM512: https://github.com/NM512/dreamerv3-torch
  3. A PyTorch implementation by sheeprl: https://github.com/Eclectic-Sheep/sheeprl

Besides, there are some implementations of DreamerV2 and DreamerV1:

  • adityabingi's Dreamerv1 & v2 codebase
  • EasyDreamer: A Simplified Version of the DreamerV1 Algorithm with Pytorch

There are also some explanations of the code:

  1. some code notes from a Reddit user
  2. implementations of the tricks in cleanRL, applied to PPO: this doesn't include the world-model architecture or loss functions, only the new tricks introduced by DreamerV3

sheeprl

Installation:

Make sure you have g++ installed.

conda create -n sheeprl python=3.9
conda activate sheeprl

git clone [email protected]:Eclectic-Sheep/sheeprl.git
cd sheeprl
sudo apt install swig
pip install swig
pip install .
pip install .\[atari,box2d,dev,mujoco,test\]
pip install sheeprl\[crafter\]

Install osmesa:

sudo apt-get install libgl1-mesa-glx libosmesa6

Set:

export MUJOCO_GL=osmesa

Hafner's version

Hafner's codebase

See my fork: https://github.com/LYK-love/dreamerv3

NM512's version

Github: NM512's PyTorch implementation

conda create -n DreamerTorch python=3.9
conda activate DreamerTorch

git clone [email protected]:NM512/dreamerv3-torch.git
cd dreamerv3-torch
pip install setuptools==65.5.0 "wheel<0.40.0"
pip install -r requirements.txt

In addition, you need to install Atari ROMs to run Atari envs. Follow the steps here to download and install the ROMs:

wget http://www.atarimania.com/roms/Atari-2600-VCS-ROM-Collection.zip
unzip ./Atari-2600-VCS-ROM-Collection.zip
python -m atari_py.import_roms ./ROMS

This should print out the names of ROMs as it imports them. The ROMs will be copied to your atari_py installation directory.
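As a quick sanity check (this helper is my own sketch, not part of any repo; the `./ROMS` path matches the import command above), you can count the `.bin` files the unzip step produced before importing:

```shell
# Count ROM files under a directory; the Atari-2600 collection should
# yield a few thousand .bin files. The default path is an assumption.
count_roms() {
  find "${1:-./ROMS}" -type f -name '*.bin' | wc -l
}

count_roms ./ROMS
```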

Commands

Here are my commands for running these codebases.

For Hafner's:

python dreamerv3/train.py --logdir ./logdir --configs atari --batch_size 16 --run.train_ratio 32
python dreamerv3/train.py \
--logdir ~/logdir/$(date "+%Y%m%d-%H%M%S") \
--configs custom --batch_size 16 --run.train_ratio 32
python dreamerv3/train.py \
--logdir ./logdir/$(date "+%Y%m%d-%H%M%S") \
--configs custom --batch_size 16 --run.train_ratio 32

Tricks

Multi-GPU

Use multi-GPU training supported by Lightning Fabric:

fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 

Mixed-precision

Use mixed precision supported by Lightning Fabric:

fabric.precision=16-mixed

Log

See logs:

tensorboard --logdir logs

Map generated videos to my photoview:

ln -s /home/lyk/Projects/sheeprl/logs $IMAGE_HOME/sheeprl_log

Commands for Hafner's

Log: --run.log_every 3

export CKPT="logdir/BouncingBall/checkpoint.ckpt" 
export LOGDIR="logdir/BouncingBall"
python dreamerv3/train.py --logdir $LOGDIR --configs bouncing_ball small --batch_size 16 --run.train_ratio 32 --run.log_every 3 --run.from_checkpoint $CKPT --run.only_train True

Making checkpoints:

WANDB_MODE=online python dreamerv3/train.py --logdir ./logdir/$(date "+%Y%m%d-%H%M%S") --configs bouncing_ball small --batch_size 16 --run.train_ratio 32 --run.steps 5000000 --run.only_train False
WANDB_MODE=online python dreamerv3/train.py --logdir ./logdir/$(date "+%Y%m%d-%H%M%S") --configs grid_world small --batch_size 16 --run.train_ratio 32
python dreamerv3/train.py --logdir ./logdir/$(date "+%Y%m%d-%H%M%S") --configs grid_world debug --batch_size 16 --run.train_ratio 32

Video Pinball (if a logdir exists, load from it; otherwise train from scratch):

export LOGDIR="logdir/VideoPinball"
WANDB_MODE=online python dreamerv3/train.py --logdir $LOGDIR --configs atari small --batch_size 16 --run.train_ratio 32

Scripts

https://github.com/Eclectic-Sheep/sheeprl/blob/main/howto/configs.md

https://github.com/Eclectic-Sheep/sheeprl/tree/main/howto

DMC

Box2D

CarRacing


python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

8 GPUs:

python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=8 fabric.precision=16-mixed algo.learning_starts=1024

For dev:

python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=200 env.num_envs=1

For testing videos:

python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed env.num_envs=2  algo.learning_starts=1024 algo.total_steps=2000000

total_steps should be small.

I didn't set env.num_envs, by default it should be 4.

Eval from checkpoint:

export CKPT="/home/lyk/Projects/sheeprl/logs/runs/dreamer_v3/VideoPinballNoFrameskip-v4/2024-02-18_02-56-29_dreamer_v3_VideoPinballNoFrameskip-v4_42/version_0/checkpoint/ckpt_4800000_0.ckpt"

sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True

Or

seeds=(5 1024 42 1337 8 2)

for seed in "${seeds[@]}"; do
sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True seed=$seed
done

Atari

See Gym's Atari game list for all Atari envs.

Alien

Single GPU:

python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.devices=1 fabric.precision=16-mixed algo.learning_starts=1024

2 GPUs:

python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

8 GPUs:

python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=8

For testing videos:

python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed env.num_envs=1  algo.learning_starts=1024 algo.total_steps=200000

Video Pinball

python sheeprl.py exp=dreamer_v3 env=atari env.id=VideoPinballNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

For testing videos:

python sheeprl.py exp=dreamer_v3 env=atari env.id=VideoPinballNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed env.num_envs=1  algo.learning_starts=1024 algo.total_steps=2000000

total_steps should be small.


Venture:

python sheeprl.py exp=dreamer_v3 env=atari env.id=VentureNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

Star Gunner

python sheeprl.py exp=dreamer_v3 env=atari env.id=StarGunnerNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

8 GPUs:

python sheeprl.py exp=dreamer_v3 env=atari env.id=StarGunnerNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=8

Private Eye:

python sheeprl.py exp=dreamer_v3 env=atari env.id=PrivateEyeNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

Riverraid:

python sheeprl.py exp=dreamer_v3 env=atari env.id=RiverraidNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

Boxing

python sheeprl.py exp=dreamer_v3_100k_boxing fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

Crafter

python sheeprl.py exp=dreamer_v3_XL_crafter fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

Ms. Pac-Man

python sheeprl.py exp=dreamer_v3_100k_ms_pacman fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

Custom envs

pip install matplotlib

Errors

Render backend:

https://github.com/google-deepmind/dm_control/issues/123

Try using osmesa.

MuJoCo/DMC supports three different OpenGL rendering backends: EGL (headless), GLFW (windowed), and OSMesa (headless). For each of them, you need to install some packages:

  • GLFW: sudo apt-get install libglfw3 libglew2.2
  • EGL: sudo apt-get install libglew2.2
  • OSMesa: sudo apt-get install libgl1-mesa-glx libosmesa6

To use one of these rendering backends, set the MUJOCO_GL environment variable to "glfw", "egl", or "osmesa", respectively.

Note

The libglew2.2 package may have a different name depending on your OS (libglew2.2 is the name on Ubuntu 22.04.2 LTS).

It may also be necessary to install the PyOpenGL-accelerate package (pip install PyOpenGL-accelerate) and the mesalib package (conda install conda-forge::mesalib).

For more information: https://github.com/deepmind/dm_control and https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl.
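The backend choice above can be scripted. This is a sketch of my own (not from sheeprl or MuJoCo) that probes the dynamic linker cache for the available libraries and falls back to glfw:

```shell
# Pick a MuJoCo rendering backend based on which OpenGL libraries are
# visible to ldconfig. Preference order here (osmesa, egl, glfw) is my
# own choice for headless training boxes.
pick_mujoco_gl() {
  if ldconfig -p 2>/dev/null | grep -q 'libOSMesa'; then
    echo osmesa
  elif ldconfig -p 2>/dev/null | grep -q 'libEGL\.so'; then
    echo egl
  else
    echo glfw
  fi
}

export MUJOCO_GL="$(pick_mujoco_gl)"
echo "MUJOCO_GL=$MUJOCO_GL"
```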


When using osmesa, you may get:

libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)

The solution is copied from here. Since we have a swrast_dri.so in /usr/lib/x86_64-linux-gnu/dri/, we can simply create a symbolic link to it:

sudo mkdir /usr/lib/dri
sudo ln -s /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so /usr/lib/dri/swrast_dri.so

If you interrupt the running command, you might see the following error on the next execution:

Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 534: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!

Solution: reboot the system.


Error:

Error in call to target 'gymnasium.envs.registration.make':
DependencyNotInstalled('Box2D is not installed, run `pip install gymnasium[box2d]`')

Solution:

pip install gymnasium[box2d]

When installing box2d-py:

gcc -pthread -B /home/lyk/miniconda3/envs/sheeprl/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/lyk/miniconda3/envs/sheeprl/include -I/home/lyk/miniconda3/envs/sheeprl/include -fPIC -O2 -isystem /home/lyk/miniconda3/envs/sheeprl/include -fPIC -I/home/lyk/miniconda3/envs/sheeprl/include/python3.9 -c Box2D/Box2D_wrap.cpp -o build/temp.linux-x86_64-cpython-39/Box2D/Box2D_wrap.o -I. -Wno-unused
gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for box2d

Solution:

This problem can happen if different versions of g++ and gcc are installed. Check them with:

g++ --version
gcc --version

Reason:

Ubuntu 22.04's default GCC version does not match the version used to build the latest default kernel. On Ubuntu 22.04, the default GNU C compiler is gcc-11, but the latest default kernel (6.5.0-14-generic at the time of writing) is built with gcc-12.
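A small sketch (my own helper, with illustrative logic) that makes the version check explicit by comparing the major versions reported by both compilers:

```shell
# Report whether gcc and g++ agree on their major version; a mismatch
# is the likely cause of the cc1plus error above.
compiler_major_check() {
  command -v gcc >/dev/null 2>&1 && command -v g++ >/dev/null 2>&1 \
    || { echo "gcc or g++ not installed"; return 1; }
  local c_major cxx_major
  c_major=$(gcc -dumpversion | cut -d. -f1)
  cxx_major=$(g++ -dumpversion | cut -d. -f1)
  if [ "$c_major" = "$cxx_major" ]; then
    echo "ok: both are major version $c_major"
  else
    echo "mismatch: gcc=$c_major g++=$cxx_major"
  fi
}

compiler_major_check
```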


ValueError: bad marshal data (unknown type code)

If you get this error, the compiled Python module (the .pyc file) is probably corrupt. Gentoo Linux provides python-updater, but on Debian the easier fix is to just delete the .pyc file. If you don't know which one is broken, delete all of them (as root):

find <the error dir> -name '*.pyc' -delete
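A slightly safer variant of the command above (my own sketch): print each .pyc file as it is deleted so you can see what was removed. Pass the directory from the error message as the argument.

```shell
# List and delete stale .pyc files under a directory in one pass.
# -print before -delete shows each file as it is removed.
purge_pyc() {
  find "$1" -type f -name '*.pyc' -print -delete
}
```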

Training process


I use the sheeprl codebase to train the dreamer_v3_XS model (max_steps = 5,000,000); it takes about 50 hours:

python sheeprl.py exp=dreamer_v3 env=atari env.id=VideoPinballNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024

So on average, every 1,000,000 steps takes about 10 hours. Meanwhile, the decoder output is blurry and has a lot of colorful noise at step 1,200,000. I'll wait until step 5,000,000 to see whether the noise persists.
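The throughput estimate can be checked with shell arithmetic (numbers taken from the run described above):

```shell
# 5,000,000 steps in ~50 hours: steps per hour and hours per 1M steps.
steps=5000000
hours=50
echo "$(( steps / hours )) steps/hour"                    # 100000 steps/hour
echo "$(( 1000000 * hours / steps )) hours per 1M steps"  # 10 hours per 1M steps
```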