On the Convergence of Techniques that Improve Value Iteration

Grzes, Marek, Hoey, Jesse (2013) On the Convergence of Techniques that Improve Value Iteration. In: Neural Networks (IJCNN), The 2013 International Joint Conference. Proceedings of International Joint Conference on Neural Networks (IJCNN). . pp. 1-8. ISBN 978-1-4673-6128-6. (doi:10.1109/IJCNN.2013.6706982) (KAR id:48658)

PDF Language: English
Download this file (PDF/437kB)	Preview
Request a format suitable for use with assistive technology e.g. a screenreader
Official URL: http://dx.doi.org/10.1109/IJCNN.2013.6706982

Abstract

Prioritisation of Bellman backups or updating only a small subset of actions represent important techniques for speeding up planning in MDPs. The recent literature showed new efficient approaches which exploit these directions. Backward value iteration and backing up only the best actions were shown to lead to a significant reduction of the planning time. This paper conducts a theoretical and empirical analysis of these techniques and shows new important proofs. In particular, (1) it identifies weaker requirements for the convergence of backups based on best actions only, (2) a new method for evaluation of the Bellman error is shown for the update that updates one best action once, (3) it presents the theoretical proof of backward value iteration and establishes required initialisation, (4) and shows that the default state ordering of backups in standard value iteration can significantly influence its performance. Additionally, (5) the existing literature did not compare these methods, either empirically or analytically, against policy iteration. The rigorous empirical and novel theoretical parts of the paper reveal important associations and allow drawing guidelines on which type of value or policy iteration is suitable for a given domain. Finally, our chief message is that standard value iteration can be made far more efficient by simple modifications shown in the paper.

Item Type:	Conference or workshop item (Paper)
DOI/Identification number:	10.1109/IJCNN.2013.6706982
Subjects:	Q Science > Q Science (General) Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User:	Marek Grzes
Date Deposited:	26 May 2015 20:54 UTC
Last Modified:	05 Nov 2024 10:32 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/48658 (The current URI for this page, for reference purposes)

University of Kent Author Information

Grzes, Marek.

Creator's ORCID:	https://orcid.org/0000-0003-4901-1539
CReDIT Contributor Roles:

Depositors only (login required):

Altmetric

Total Views

Total unique views of this page since July 2020. For more details click on the image.