Innovative Design Choices in a New Reinforcement Learning Algorithm
From Mridutpal Bhattacharyya, Chief Policy Advisor
We are excited to share insights from a recent paper introducing a novel reinforcement learning (RL) algorithm. The full paper is available here. The algorithm centers on separating the deterministic and stochastic components of world modeling. Although designed for RL, its foundational ideas can be applied to a wide range of machine learning challenges.
Balancing Uncertainty in Machine Learning
A robust world model must capture both deterministic dynamics (such as physical laws) and stochastic dynamics (such as random events). Because machine learning methods are inherently stochastic, typically producing probability distributions rather than exact outcomes, modeling deterministic dynamics accurately is a genuine challenge. The paper addresses this with a discrete autoencoder whose latent variables are represented as discrete tokens, an approach that successfully learns deterministic dynamics.
Example: Imagine trying to predict the movement of a pendulum. The deterministic part is the pendulum's swing, governed by gravity and its length. The autoencoder acts like a set of rules that capture this swinging motion precisely. However, if a gust of wind occasionally pushes the pendulum, that's the stochastic part, and it needs a different approach to model.
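To make the idea concrete, here is a minimal sketch of what a discrete autoencoder of this kind might look like in PyTorch. The architecture, the dimensions, and the vector-quantization scheme below are illustrative assumptions on our part, not the paper's exact design:

```python
import torch
import torch.nn as nn

class DiscreteAutoencoder(nn.Module):
    """Sketch of a vector-quantized autoencoder: encodes an observation into
    a discrete codebook token and reconstructs it from that token."""

    def __init__(self, obs_dim=32, latent_dim=16, codebook_size=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, obs_dim))
        # Codebook of discrete latent vectors; each row is one "token".
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, obs):
        z = self.encoder(obs)                         # continuous latent
        # Snap to the nearest codebook entry -> a discrete token index.
        dists = torch.cdist(z, self.codebook.weight)  # (B, codebook_size)
        tokens = dists.argmin(dim=-1)                 # (B,) integer tokens
        z_q = self.codebook(tokens)                   # quantized latent
        # Straight-through estimator: gradients pass through the
        # quantization step as if it were the identity.
        z_st = z + (z_q - z).detach()
        recon = self.decoder(z_st)
        return recon, tokens

model = DiscreteAutoencoder()
obs = torch.randn(4, 32)       # batch of 4 toy observations
recon, tokens = model(obs)
print(tokens)                  # e.g. tensor([17, 93,  5, 40])
```

Because the latent is forced through a finite codebook, the same observation always maps to the same token, which is what lets a model like this commit to deterministic predictions instead of hedging with a spread-out distribution.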
Future Predictions with Transformers
The paper also tackles the challenge of making mid- to long-term predictions despite the uncertainty introduced by random events. To manage this, it uses an autoregressive transformer. Inspired by MPEG's I-frames, the authors introduce I-tokens that periodically reset the system to a deterministic state. Each I-token summarizes the history up to its point of insertion, reducing how far back the transformer needs to attend and improving performance.
Example: Think of a weather forecast. The weather has predictable patterns (deterministic), like temperature changes through the day. But sudden rain showers (stochastic events) make predictions tricky. I-tokens work like weather stations that periodically update the forecast, summarizing past weather and resetting the prediction model to improve accuracy.
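Here is a rough sketch of how periodic I-tokens could shorten the transformer's attention window. The reserved token id, the insertion interval, and the masking scheme are all illustrative assumptions; in the paper the I-token carries a learned summary of history, whereas here it is just a marker:

```python
import torch

I_TOKEN = 0         # assumed: a reserved vocabulary id for the I-token
I_INTERVAL = 4      # assumed: one I-token every 4 steps

def insert_i_tokens(step_tokens, interval=I_INTERVAL):
    """Interleave the reserved I-token into the stream every `interval`
    steps, marking points where the history is summarized and reset."""
    out = []
    for i, t in enumerate(step_tokens):
        if i % interval == 0:
            out.append(I_TOKEN)
        out.append(t)
    return out

def local_causal_mask(tokens):
    """Causal attention mask that also cuts off at the most recent I-token,
    so no position looks further back than the last reset point."""
    n = len(tokens)
    mask = torch.zeros(n, n, dtype=torch.bool)
    last_i = 0
    for q in range(n):
        if tokens[q] == I_TOKEN:
            last_i = q
        mask[q, last_i:q + 1] = True   # attend from last I-token up to self
    return mask

seq = insert_i_tokens([5, 9, 9, 2, 7, 3, 3, 8])
print(seq)                          # [0, 5, 9, 9, 2, 0, 7, 3, 3, 8]
print(local_causal_mask(seq).int())
```

The payoff is visible in the mask: every row is truncated at the nearest I-token on its left, so attention cost stays bounded no matter how long the rollout gets.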
Integrating Autoencoder and Transformer
The paper ties the two components together by conditioning the autoencoder on past tokens. The tokens the autoencoder produces then serve as the transformer's vocabulary, giving a single sequence model a natural way to handle both deterministic and stochastic elements.
Example: Imagine teaching a robot to understand a story. The autoencoder learns basic story elements (deterministic), like characters and setting. The transformer, using these elements as building blocks, can then predict the plot's progression even if unexpected twists (stochastic events) occur. The autoencoder’s tokens act like a dictionary for the transformer, making sense of the story as it unfolds.
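As a final sketch, here is how an autoregressive transformer might consume the autoencoder's tokens as its vocabulary. This is a generic GPT-style layout under assumed sizes; the paper's additional step of conditioning the autoencoder on past tokens is omitted here for brevity:

```python
import torch
import torch.nn as nn

class TokenWorldModel(nn.Module):
    """Autoregressive transformer over the autoencoder's discrete tokens:
    the codebook indices double as the transformer's vocabulary."""

    def __init__(self, vocab_size=128, d_model=64, n_head=4,
                 n_layers=2, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)  # next-token logits

    def forward(self, tokens):
        B, T = tokens.shape
        pos = torch.arange(T, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Standard causal mask: each position sees only its past.
        causal = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
        h = self.backbone(x, mask=causal)
        return self.head(h)                          # (B, T, vocab_size)

# Tokens produced by the discrete autoencoder become transformer inputs.
wm = TokenWorldModel()
tokens = torch.randint(0, 128, (2, 10))  # stand-in for autoencoder output
logits = wm(tokens)
next_token = logits[:, -1].argmax(-1)    # greedy prediction of the next token
```

Because both components speak the same token vocabulary, predicting the future reduces to next-token prediction, the same recipe that powers language models.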
This innovative approach offers a promising method for modeling complex systems in reinforcement learning and other areas of machine learning.
This post is inspired by and adapted from content originally shared on LinkedIn: https://www.linkedin.com/feed/update/urn:li:activity:7212938982202994688