Grzes, Marek and Poupart, Pascal (2014) POMDP Planning and Execution in an Augmented Space. In: Alessio Lomuscio, Paul Scerri, Ana Bazzan, and Michael Huhns (eds.), Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), pp. 757-764. (KAR id:48655)
PDF (Publisher pdf; restricted due to publisher copyright policy — access limited to repository staff only)
Language: English
Official URL: http://www.aamas-conference.org/Proceedings/aamas2...
Abstract
In planning with partially observable Markov decision processes (POMDPs), pre-compiled policies are often represented as finite state controllers or sets of alpha-vectors, which provide a lower bound on the value of the optimal policy. Some algorithms (e.g., HSVI2, SARSOP, GapMin) also compute an upper bound to guide the search and to offer performance guarantees, but for computational reasons they do not derive a policy from this upper bound. Executing a policy derived from an upper bound requires a one-step lookahead simulation to determine the next best action, and evaluating the upper bound at the reachable beliefs is complicated and costly (i.e., it requires linear programming or the sawtooth approximation). The first aim of this paper is to show principled and computationally cheap ways of executing upper-bound policies, which can be even faster than executing lower-bound policies based on alpha-vectors. The second, complementary contribution is a new method for finding better upper-bound policies that outperforms those obtained by existing algorithms, such as HSVI2, SARSOP, and GapMin, on a suite of benchmarks. Our approach is based on a novel synthesis of augmented and deterministic POMDPs, and it facilitates efficient optimization of upper-bound policies.
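The sawtooth approximation mentioned in the abstract is the standard point-set interpolation used by solvers such as HSVI2 and SARSOP to evaluate a POMDP upper bound at an arbitrary belief. The sketch below is an illustrative implementation of that standard interpolation, not code from the paper; all function and variable names are chosen for this example.

```python
import numpy as np

def sawtooth_upper_bound(b, corner_values, points):
    """Sawtooth interpolation of an upper bound at belief b.

    b             : belief to evaluate, shape (S,), entries sum to 1
    corner_values : upper-bound value at each corner belief e_s, shape (S,)
    points        : list of (b_i, v_i) pairs: belief points with their
                    upper-bound values, each v_i below the corner interpolation
    """
    # Value obtained from the corner beliefs alone (a valid upper bound)
    c = float(b @ corner_values)
    best = c
    for b_i, v_i in points:
        c_i = float(b_i @ corner_values)
        support = b_i > 0
        # min_s b(s)/b_i(s) over the support of b_i: how far b can move
        # toward b_i while staying inside the simplex
        ratio = float(np.min(b[support] / b_i[support]))
        # Each point (b_i, v_i) carves a "sawtooth" below the corner hull
        best = min(best, c + (v_i - c_i) * ratio)
    return best
```

This is cheaper than the linear-programming formulation of the same interpolation (a minimum over a handful of dot products per stored point rather than an LP solve), which is why the sawtooth form is the usual choice during search, even though, as the abstract notes, it is still costly to apply at every reachable belief during execution.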
Item Type: Conference or workshop item (Paper)
Uncontrolled keywords: Planning under uncertainty; POMDP; Point-based value iteration
Subjects: Q Science > Q Science (General) > Q335 Artificial intelligence
Divisions: Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Depositing User: Marek Grzes
Date Deposited: 26 May 2015 20:05 UTC
Last Modified: 05 Nov 2024 10:32 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/48655