Title: Maximum Entropy Diffusion Policies for Offline Reinforcement Learning
DNr: Berzelius-2024-29
Project Type: LiU Berzelius
Principal Investigator: Per Mattsson <per.mattsson@it.uu.se>
Affiliation: Uppsala universitet
Duration: 2024-01-22 – 2024-08-01
Classification: 20202
Keywords:

Abstract

The core idea is to construct a tractable stochastic differential equation (SDE) and to sample actions from its reverse-time process, which serves as a diffusion policy. Because the proposed SDE is tractable, the log probability of the policy can be computed, and the final action can be estimated from any sampled intermediate diffusion step. The entropy term adds diversity beyond the pre-collected actions, yielding more robust value-function estimation and more expressive policy generation for offline reinforcement learning.
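As a rough illustration of why tractability matters here, the sketch below discretizes a generic variance-preserving reverse-time SDE with Euler–Maruyama steps. Each reverse step is Gaussian, so a per-step log-density (and hence an approximate log probability for the sampled action) falls out in closed form. All names (`reverse_step`, `sample_action`) and the specific SDE coefficients are illustrative assumptions, not the project's actual method.

```python
import numpy as np

# Assumed forward SDE (VP-type): dx = -0.5*beta*x dt + sqrt(beta) dW.
# The corresponding reverse-time SDE has drift f(x) - g^2 * score(x),
# and each discretized reverse step is Gaussian, so its log-density
# is available in closed form.

def reverse_step(x, score, beta, dt, rng):
    """One Euler-Maruyama step of the reverse-time SDE."""
    drift = -0.5 * beta * x - beta * score(x)  # reverse drift uses the score
    mean = x - drift * dt                      # step backward in time
    std = np.sqrt(beta * dt)
    x_prev = mean + std * rng.standard_normal(x.shape)
    # Gaussian log-density of this transition (tractable by construction).
    logp = -0.5 * np.sum(((x_prev - mean) / std) ** 2
                         + np.log(2 * np.pi * std ** 2))
    return x_prev, logp

def sample_action(score, dim, n_steps=50, beta=0.1, seed=0):
    """Sample an action by integrating the reverse SDE from Gaussian noise,
    accumulating an approximate log probability along the way."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)  # start from the SDE's Gaussian prior
    total_logp = 0.0
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x, logp = reverse_step(x, score, beta, dt, rng)
        total_logp += logp
    return x, total_logp
```

In practice the score function would be a learned network conditioned on the state; here any callable works, e.g. `score = lambda x: -x` (the score of a standard Gaussian). The accumulated `total_logp` is what an entropy term in the offline RL objective could act on.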