Maximum Entropy Diffusion Policies for Offline Reinforcement Learning
Title: |
Maximum Entropy Diffusion Policies for Offline Reinforcement Learning |
DNr: |
Berzelius-2024-29 |
Project Type: |
LiU Berzelius |
Principal Investigator: |
Per Mattsson <per.mattsson@it.uu.se> |
Affiliation: |
Uppsala universitet |
Duration: |
2024-01-22 – 2024-08-01 |
Classification: |
20202 |
Keywords: |
|
Abstract
The core idea is to construct a tractable stochastic differential equation (SDE) and then sample actions from its reverse-time process, which serves as a diffusion policy. Because the proposed SDE is tractable, we can obtain the log-probability of the policy as well as an estimate of the action at any sampled diffusion step. The entropy term adds diversity to the pre-collected actions, yielding more robust value-function estimation and expressive policy generation for offline reinforcement learning.
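As a rough illustration of sampling actions with a reverse-time SDE, the sketch below uses a standard variance-preserving SDE and Euler-Maruyama integration. This is an assumed setup, not the project's actual construction: the score function here is the analytic score of a known Gaussian action distribution, standing in for the learned score network a real diffusion policy would use, and all names (`beta`, `marginal_coeffs`, `sample_actions`) are illustrative.

```python
import numpy as np

# Assumed forward SDE (variance-preserving):
#   dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dW
# Actions are drawn by integrating the reverse-time SDE from t = 1 to t = 0.

def beta(t, beta_min=0.1, beta_max=10.0):
    # Linear noise schedule, a common illustrative choice.
    return beta_min + t * (beta_max - beta_min)

def marginal_coeffs(t, beta_min=0.1, beta_max=10.0):
    # For the VP-SDE, x_t | x_0 ~ N(alpha_t * x_0, (1 - alpha_t^2)).
    log_alpha = -0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t**2)
    alpha = np.exp(log_alpha)
    return alpha, np.sqrt(1.0 - alpha**2)

def score(x, t, mu=1.0, sigma=0.5):
    # Analytic score of the perturbed marginal when x_0 ~ N(mu, sigma^2).
    # In the actual method this would be a learned network.
    alpha, s = marginal_coeffs(t)
    var = (alpha * sigma) ** 2 + s**2
    return (alpha * mu - x) / var

def sample_actions(n=2000, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps
    x = rng.standard_normal(n)  # start from the prior at t = 1
    for i in range(steps, 0, -1):
        t = i * dt
        b = beta(t)
        # Reverse-time drift: f(x, t) - g(t)^2 * score(x, t)
        drift = -0.5 * b * x - b * score(x, t)
        x = x - drift * dt + np.sqrt(b * dt) * rng.standard_normal(n)
    return x

actions = sample_actions()
```

With the analytic Gaussian score, the sampled actions should concentrate around the target distribution N(1.0, 0.5^2), which makes the correctness of the reverse-time integration easy to check.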