DreamerV3 Report
Papers:
RSSM
DreamerV1
DreamerV2
DreamerV3
Projects
There are currently three codebases for DreamerV3. The most readable one is sheeprl's.
- The author's implementation, written in Jax: https://github.com/danijar/dreamerv3
- A PyTorch implementation by NM512: https://github.com/NM512/dreamerv3-torch
- A PyTorch implementation by sheeprl: https://github.com/Eclectic-Sheep/sheeprl
Besides, there are some implementations of DreamerV1 and DreamerV2:
- adityabingi's Dreamerv1 & v2 codebase
- EasyDreamer: A Simplified Version of the DreamerV1 Algorithm with Pytorch
There're also some explanations of the code:
- some code notes from a Reddit user
- implementations of the tricks in CleanRL, applied to PPO. This doesn't include the world model architecture or loss functions, only the new tricks introduced by DreamerV3.
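One of the tricks those CleanRL notes cover is the symlog/symexp transform DreamerV3 uses to squash prediction targets of widely varying magnitude. A minimal sketch from the paper's definition (not taken from any of the listed codebases):

```python
# symlog(x) = sign(x) * ln(|x| + 1); symexp is its inverse.
# DreamerV3 applies symlog to targets so one set of hyperparameters
# works across environments with very different reward/observation scales.
import math

def symlog(x: float) -> float:
    return math.copysign(math.log(abs(x) + 1.0), x)

def symexp(x: float) -> float:
    return math.copysign(math.exp(abs(x)) - 1.0, x)

# Round-trip sanity check: symexp undoes symlog.
print(round(symexp(symlog(123.0)), 6))  # 123.0
```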
sheeprl
Installation:
Make sure you have g++ installed.
conda create -n sheeprl python=3.9
Install osmesa:
sudo apt-get install libgl1-mesa-glx libosmesa6
Set:
export MUJOCO_GL=osmesa
Hafner's version
Hafner's codebase
See my fork: https://github.com/LYK-love/dreamerv3
NM512's version
Github: NM512's PyTorch implementation
conda create -n DreamerTorch python=3.9
In addition, you need to install Atari ROMs to run Atari envs; follow here to download and install the ROMs:
wget http://www.atarimania.com/roms/Atari-2600-VCS-ROM-Collection.zip
This should print out the names of ROMs as it imports them. The ROMs will be copied to your atari_py installation directory.
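The full flow is roughly the following; this is a sketch assuming `atari_py` is already installed, and the `roms` extraction directory name is my choice, not from the original instructions:

```shell
# Download the ROM collection, unzip it, and import the ROMs into atari_py.
wget http://www.atarimania.com/roms/Atari-2600-VCS-ROM-Collection.zip
unzip Atari-2600-VCS-ROM-Collection.zip -d roms
# Prints each ROM name as it is copied into the atari_py install directory.
python -m atari_py.import_roms roms
```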
Commands
Here are my commands for running the codebases.
For Hafners'
python dreamerv3/train.py --logdir ./logdir --configs atari --batch_size 16 --run.train_ratio 32
Tricks
Multi-GPU
Use multi-GPU training supported by Lightning Fabric:
fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2
Mixed-precision
Use mixed precision supported by Lightning Fabric:
fabric.precision=16-mixed
Log
See logs:
tensorboard --logdir logs
Map generated videos to my Photoview:
ln -s /home/lyk/Projects/sheeprl/logs $IMAGE_HOME/sheeprl_log
Commands for Hafner's
Log: --run.log_every 3
export CKPT="logdir/BouncingBall/checkpoint.ckpt"
Making ckpts:
WANDB_MODE=online python dreamerv3/train.py --logdir ./logdir/$(date "+%Y%m%d-%H%M%S") --configs bouncing_ball small --batch_size 16 --run.train_ratio 32 --run.steps 5000000 --run.only_train False
WANDB_MODE=online python dreamerv3/train.py --logdir ./logdir/$(date "+%Y%m%d-%H%M%S") --configs grid_world small --batch_size 16 --run.train_ratio 32
python dreamerv3/train.py --logdir ./logdir/$(date "+%Y%m%d-%H%M%S") --configs grid_world debug --batch_size 16 --run.train_ratio 32
Video pinball (if a logdir exists, load it; otherwise train from scratch):
export LOGDIR="logdir/VideoPinball"
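The load-or-train logic can be sketched as a shell check on the logdir; the branch bodies here only echo, and the actual `train.py` invocation would be one of the commands above (how Hafner's codebase resumes from an existing logdir is my assumption):

```shell
# If $LOGDIR already exists, resume from it; otherwise train from scratch.
export LOGDIR="logdir/VideoPinball"
if [ -d "$LOGDIR" ]; then
  echo "loading existing logdir: $LOGDIR"
  # python dreamerv3/train.py --logdir "$LOGDIR" ...
else
  echo "training from scratch into: $LOGDIR"
  # python dreamerv3/train.py --logdir "$LOGDIR" ...
fi
```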
Scripts
https://github.com/Eclectic-Sheep/sheeprl/blob/main/howto/configs.md
https://github.com/Eclectic-Sheep/sheeprl/tree/main/howto
DMC
Box2D
CarRacing
python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
8 GPUs:
python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=8 fabric.precision=16-mixed algo.learning_starts=1024
For dev:
python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=200 env.num_envs=1
For testing videos:
python sheeprl.py exp=dreamer_v3 env=gym env.id=CarRacing-v2 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed env.num_envs=2 algo.learning_starts=1024 algo.total_steps=2000000
total_steps should be small. I didn't set env.num_envs; by default it is 4.
Eval from checkpoint:
export CKPT="/home/lyk/Projects/sheeprl/logs/runs/dreamer_v3/VideoPinballNoFrameskip-v4/2024-02-18_02-56-29_dreamer_v3_VideoPinballNoFrameskip-v4_42/version_0/checkpoint/ckpt_4800000_0.ckpt"
Or
seeds=(5 1024 42 1337 8 2)
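The seeds array presumably drives repeated eval runs; a minimal loop sketch (the `echo` stands in for whichever sheeprl eval invocation you use with a per-run seed):

```shell
# Iterate over the seeds and launch one eval run per seed.
seeds=(5 1024 42 1337 8 2)
for seed in "${seeds[@]}"; do
  echo "running eval with seed=$seed"
  # python sheeprl.py ... seed=$seed   # hypothetical eval command
done
```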
Atari
See gym's Atari game list for all Atari envs.
Alien
Single gpu:
python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.devices=1 fabric.precision=16-mixed algo.learning_starts=1024
2 GPUs:
python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
8 GPUs:
python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=8
For testing videos:
python sheeprl.py exp=dreamer_v3 env=atari env.id=AlienNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed env.num_envs=1 algo.learning_starts=1024 algo.total_steps=200000
Video pinball
python sheeprl.py exp=dreamer_v3 env=atari env.id=VideoPinballNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
For testing videos:
python sheeprl.py exp=dreamer_v3 env=atari env.id=VideoPinballNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed env.num_envs=1 algo.learning_starts=1024 algo.total_steps=2000000
total_steps should be small.
Alien:
Venture:
python sheeprl.py exp=dreamer_v3 env=atari env.id=VentureNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
Star Gunner
python sheeprl.py exp=dreamer_v3 env=atari env.id=StarGunnerNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
8 GPUs:
python sheeprl.py exp=dreamer_v3 env=atari env.id=StarGunnerNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo.mlp_keys.encoder=\[\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=8
Private Eye:
python sheeprl.py exp=dreamer_v3 env=atari env.id=PrivateEyeNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
Riverraid:
python sheeprl.py exp=dreamer_v3 env=atari env.id=RiverraidNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_M fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
Boxing
python sheeprl.py exp=dreamer_v3_100k_boxing fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda
Crafter
python sheeprl.py exp=dreamer_v3_XL_crafter fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda
Pacman
python sheeprl.py exp=dreamer_v3_100k_ms_pacman fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda
Custom envs
pip install matplotlib
Errors
Render backend:
https://github.com/google-deepmind/dm_control/issues/123
Try to use osmesa.
MuJoCo/DMC supports three different OpenGL rendering backends: EGL (headless), GLFW (windowed), and OSMesa (headless). For each of them, you need to install some packages:
- GLFW:
sudo apt-get install libglfw3 libglew2.2
- EGL:
sudo apt-get install libglew2.2
- OSMesa:
sudo apt-get install libgl1-mesa-glx libosmesa6
In order to use one of these rendering backends, you need to set the MUJOCO_GL environment variable to "glfw", "egl", or "osmesa", respectively.
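A tiny stdlib-only sketch of selecting a backend from Python before any MuJoCo import (the `set_render_backend` helper and `BACKENDS` set are hypothetical names, not part of any of these libraries):

```python
# Set MUJOCO_GL programmatically; MuJoCo reads it at import time,
# so this must run before importing mujoco/dm_control.
import os

BACKENDS = {"glfw", "egl", "osmesa"}

def set_render_backend(name: str) -> None:
    if name not in BACKENDS:
        raise ValueError(f"unknown rendering backend: {name}")
    os.environ["MUJOCO_GL"] = name

set_render_backend("osmesa")
print(os.environ["MUJOCO_GL"])  # osmesa
```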
Note
The libglew2.2 package could have a different name depending on your OS (e.g., libglew2.2 is for Ubuntu 22.04.2 LTS).
It may also be necessary to install the PyOpenGL-accelerate package with pip install PyOpenGL-accelerate and the mesalib package with conda install conda-forge::mesalib.
For more information: https://github.com/deepmind/dm_control and https://mujoco.readthedocs.io/en/stable/programming/index.html#using-opengl.
When using osmesa, you may get:
libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
The solution is copied from here. We have a swrast_dri.so in /usr/lib/x86_64-linux-gnu/dri/, so we can just create a symbolic link to it:
sudo mkdir /usr/lib/dri
sudo ln -s /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so /usr/lib/dri/swrast_dri.so
If you interrupt the running command, at the next execution you might get:
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 534: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
Solution: reboot the system.
Error:
Error in call to target 'gymnasium.envs.registration.make':
Solution:
pip install gymnasium[box2d]
When installing box2d-py, you may see:
gcc -pthread -B /home/lyk/miniconda3/envs/sheeprl/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/lyk/miniconda3/envs/sheeprl/include -I/home/lyk/miniconda3/envs/sheeprl/include -fPIC -O2 -isystem /home/lyk/miniconda3/envs/sheeprl/include -fPIC -I/home/lyk/miniconda3/envs/sheeprl/include/python3.9 -c Box2D/Box2D_wrap.cpp -o build/temp.linux-x86_64-cpython-39/Box2D/Box2D_wrap.o -I. -Wno-unused
Solution:
This problem can happen if different versions of g++ and gcc are installed. Check with:
g++ --version
Reason:
Ubuntu 22.04's default GCC version does not match the version that built the latest default kernel. On Ubuntu 22.04, the default GNU C compiler is gcc-11, but the latest default kernel (6.5.0-14-generic as of writing) was built with gcc-12.
ValueError: bad marshal data (unknown type code)
If you get this error, the compiled version of the Python module (the .pyc file) is probably corrupt. Gentoo Linux provides python-updater, but on Debian the easier fix is to just delete the .pyc file. If you don't know which .pyc it is, delete all of them (as root):
find <the error dir> -name '*.pyc' -delete
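The same cleanup can be done from Python with the stdlib; the `delete_pyc` helper below is my own sketch, equivalent to the find command:

```python
# Recursively delete all .pyc files under a directory tree and
# return how many were removed.
from pathlib import Path

def delete_pyc(root: str) -> int:
    count = 0
    for p in Path(root).rglob("*.pyc"):
        p.unlink()
        count += 1
    return count
```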
Training process
I use the sheeprl codebase to train the dreamer_v3_XS model (max_steps = 5,000,000); it takes about 50 hours.
python sheeprl.py exp=dreamer_v3 env=atari env.id=VideoPinballNoFrameskip-v4 algo.cnn_keys.encoder=\[rgb\] algo=dreamer_v3_XS fabric.accelerator=gpu fabric.strategy=ddp fabric.devices=2 fabric.precision=16-mixed algo.learning_starts=1024
So on average, every 1,000,000 steps costs about 10 hours. Meanwhile, the output of the decoder is blurry and has a lot of colorful noise at step 1,200,000. I'll wait until step 5,000,000 to see whether the noise persists.
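The back-of-envelope arithmetic behind that estimate:

```python
# 5,000,000 steps in ~50 hours works out to 10 hours per 1,000,000 steps.
total_steps = 5_000_000
total_hours = 50
hours_per_million = total_hours / (total_steps / 1_000_000)
print(hours_per_million)  # 10.0
```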