Abstract:
It is desirable that a complex decision-making problem in an uncertain world be adequately modeled by a Markov Decision Process (MDP) whose structural representation is adaptively designed by a parsimonious resource allocation process. Resources include the time and cost of exploration, and the amount of memory and computational time allowed for representing the policy or value function. Concerned with making the best use of the available resources, we address the problem of efficiently estimating where adding extra resources is most needed in order to improve the expected performance of the resulting policy. One possible application, in reinforcement learning (RL) when real-world exploration is highly costly, is the detection of those areas of the state space that most need to be explored in order to improve the policy. Another application is the approximation of continuous state-space stochastic control problems using adaptive discretization techniques, for which highly efficient allocation of grid points is mandatory to cope with high dimensionality. Perhaps surprisingly, these two problems can be formulated under a common framework: for a given resource allocation, which defines a belief state over possible MDPs, find where adding new resources (thus decreasing the uncertainty of some parameters, such as transition probabilities or rewards) will most likely increase the expected performance of the new policy. To do so, we use sampling techniques to estimate the contribution of each parameter's probability distribution function (pdf) to the expected loss of using an approximate policy (such as the optimal policy of the most probable MDP) instead of the true (but unknown) optimal policy.
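
As a minimal illustrative sketch (not the paper's exact procedure), the following Python code assumes a Dirichlet belief over each state-action transition distribution with known rewards, and estimates each parameter's contribution to the expected loss by resampling one transition pdf at a time while keeping the others fixed at the most probable MDP. The function names (value_iteration, policy_value, loss_contributions) and the Dirichlet-counts parameterization are hypothetical choices introduced only for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a finite MDP with transition tensor P[s, a, s'] and rewards R[s, a]."""
    n_s, n_a = R.shape
    V = np.zeros(n_s)
    while True:
        Q = R + gamma * P @ V                # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal value and greedy policy
        V = V_new

def policy_value(P, R, policy, gamma=0.95):
    """Exact value of a fixed deterministic policy via the linear Bellman equations."""
    n_s = R.shape[0]
    P_pi = P[np.arange(n_s), policy]         # P_pi[s, s']
    R_pi = R[np.arange(n_s), policy]
    return np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)

def loss_contributions(alpha, R, n_samples=200, gamma=0.95, rng=None):
    """
    Estimate, for each state-action pair, how much the uncertainty in its
    transition pdf contributes to the expected loss of using the policy that
    is optimal for the most probable MDP instead of the true optimal policy.

    alpha : Dirichlet counts, shape (n_s, n_a, n_s), encoding the belief state
            over transition probabilities (rewards assumed known for brevity).
    """
    rng = np.random.default_rng(rng)
    n_s, n_a, _ = alpha.shape

    # Most probable MDP: transition probabilities at the mean of the belief.
    P_map = alpha / alpha.sum(axis=2, keepdims=True)
    _, pi_hat = value_iteration(P_map, R, gamma)   # approximate policy

    contrib = np.zeros((n_s, n_a))
    for s in range(n_s):
        for a in range(n_a):
            losses = []
            for _ in range(n_samples):
                # Resample only the (s, a) transition pdf; keep the rest fixed.
                P = P_map.copy()
                P[s, a] = rng.dirichlet(alpha[s, a])
                V_star, _ = value_iteration(P, R, gamma)     # optimal value of sampled MDP
                V_hat = policy_value(P, R, pi_hat, gamma)    # value of the approximate policy
                losses.append(np.mean(V_star - V_hat))
            contrib[s, a] = np.mean(losses)
    return contrib   # large entries suggest where extra resources help most
```

Under these assumptions, the state-action pairs with the largest estimated contributions are the ones where further exploration (or finer discretization) would most likely improve the resulting policy.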