Kent Academic Repository

Deconstructing deep active inference: a contrarian information gatherer

Champion, Théophile, Grześ, Marek, Bonheme, Lisa, Bowman, Howard (2024) Deconstructing deep active inference: a contrarian information gatherer. Neural Computation, pp. 1-43. ISSN 0899-7667. E-ISSN 1530-888X. (doi:10.1162/neco_a_01697) (KAR id:106996)

The full text of this publication is not currently available from this repository. You may be able to access a copy if URLs are provided.
Official URL: https://doi.org/10.1162/neco_a_01697

Abstract

Active inference is a theory of perception, learning, and decision making that can be applied to neuroscience, robotics, psychology, and machine learning. Recently, intensive research has been taking place to scale up this framework using Monte Carlo tree search and deep learning, with the goal of solving more complicated tasks using deep active inference. First, we review the existing literature and then progressively build a deep active inference agent as follows: we (1) implement a variational autoencoder (VAE), (2) implement a deep hidden Markov model (HMM), and (3) implement a deep critical hidden Markov model (CHMM). For the CHMM, we implemented two versions: one minimizing expected free energy, CHMM[EFE], and one maximizing rewards, CHMM[reward]. Then we experimented with three different action selection strategies: the ε-greedy algorithm, softmax selection, and best-action selection. According to our experiments, the models able to solve the dSprites environment are the ones that maximize rewards. On further inspection, we found that the CHMM minimizing expected free energy almost always picks the same action, which makes it unable to solve the dSprites environment. In contrast, the CHMM maximizing reward keeps selecting all the actions, enabling it to solve the task successfully. The only difference between these two CHMMs is the epistemic value, which aims to make the outputs of the transition and encoder networks as close as possible. Thus, the CHMM minimizing expected free energy repeatedly picks a single action and becomes an expert at predicting the future when selecting this action, which effectively makes the KL divergence between the outputs of the transition and encoder networks small. Additionally, when selecting the action down, the average reward is zero, while for all other actions the expected reward is negative. Therefore, if the CHMM has to stick to a single action to keep the KL divergence small, then the action down is the most rewarding. We also show in simulation that the epistemic value used in deep active inference can behave degenerately and, in certain circumstances, effectively lose rather than gain information. As the agent minimizing EFE is not able to explore its environment, the appropriate formulation of the epistemic value in deep active inference remains an open question.
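
The abstract's central claim is that the only difference between CHMM[EFE] and CHMM[reward] is the epistemic value, implemented as a KL divergence between the outputs of the transition and encoder networks. The sketch below is not the authors' code: it is a minimal, hypothetical PyTorch illustration assuming diagonal-Gaussian encoder and transition networks and a critic that predicts the expected reward of each action. The function names (chmm_objective, select_action) are invented for illustration, and the three selection strategies mirror those named in the abstract (ε-greedy, softmax, and best-action selection).

    # Hypothetical sketch (not the authors' implementation), assuming
    # diagonal-Gaussian encoder q(s_t | o_t) and transition p(s_t | s_{t-1}, a_{t-1}).
    import torch
    from torch.distributions import Normal, kl_divergence

    def chmm_objective(encoder_mu, encoder_logvar,
                       transition_mu, transition_logvar,
                       expected_reward, use_epistemic_value):
        """Return a scalar loss for one (state, action) pair.

        encoder_*       : parameters of q(s_t | o_t) from the encoder network
        transition_*    : parameters of p(s_t | s_{t-1}, a_{t-1}) from the transition network
        expected_reward : critic's predicted reward for the considered action (hypothetical)
        """
        q_encoder = Normal(encoder_mu, torch.exp(0.5 * encoder_logvar))
        p_transition = Normal(transition_mu, torch.exp(0.5 * transition_logvar))

        # Extrinsic term: both CHMM variants try to maximize reward,
        # i.e. minimize its negative (this is all CHMM[reward] uses here).
        loss = -expected_reward

        if use_epistemic_value:
            # CHMM[EFE]: add the epistemic value, here the KL divergence between
            # the encoder and transition outputs, as described in the abstract.
            loss = loss + kl_divergence(q_encoder, p_transition).sum(dim=-1)

        return loss.mean()

    def select_action(action_values, strategy="softmax", epsilon=0.1):
        """Pick an action index from per-action values (e.g. negative EFE or
        predicted reward) using one of the three strategies named in the abstract."""
        if strategy == "best":
            return int(torch.argmax(action_values))
        if strategy == "epsilon_greedy":
            if torch.rand(()) < epsilon:
                return int(torch.randint(len(action_values), ()))
            return int(torch.argmax(action_values))
        # Default: softmax sampling over the action values.
        probs = torch.softmax(action_values, dim=-1)
        return int(torch.multinomial(probs, 1))

Under this formulation, the epistemic term can be driven toward zero simply by repeating a single, highly predictable action, which is consistent with the degenerate behaviour the abstract reports for the CHMM minimizing expected free energy.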

Item Type: Article
DOI/Identification number: 10.1162/neco_a_01697
Subjects: Q Science > QA Mathematics (inc Computing science)
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
SWORD Depositor: JISC Publications Router
Depositing User: JISC Publications Router
Date Deposited: 23 Aug 2024 13:49 UTC
Last Modified: 27 Aug 2024 14:29 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/106996 (The current URI for this page, for reference purposes)

University of Kent Author Information

Grześ, Marek.

Creator's ORCID: https://orcid.org/0000-0003-4901-1539

Bonheme, Lisa.

