Journal of Applied Mathematics and Stochastic Analysis
Volume 12 (1999), Issue 2, Pages 151-160
doi:10.1155/S1048953399000155
Abstract
We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which generates rewards, we explicitly solve the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.
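To illustrate the setting, the following is a minimal simulation sketch, not the paper's construction: a symmetric two-armed bandit in which exactly one (hidden) arm emits rewards as a Poisson process with rate λ, the posterior probability of the rewarding arm is updated by Bayes' rule after each unit-time observation, and the myopic policy simply plays the currently more likely arm. The function names, the unit observation interval, and the parameter values are assumptions made for this sketch.

```python
import math
import random

def poisson_count(rng, lam):
    """Number of events of a rate-`lam` Poisson process in a unit interval,
    counted via exponential inter-arrival times."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(lam)
        if t > 1.0:
            return k
        k += 1

def simulate_myopic(lam=2.0, beta=0.9, horizon=50, seed=0):
    """Run the myopic rule on a symmetric two-armed Poisson bandit and
    return (discounted total reward, final posterior for arm 0)."""
    rng = random.Random(seed)
    good = rng.choice([0, 1])       # hidden rewarding arm
    p = 0.5                         # P(arm 0 rewards); symmetric prior
    total = 0.0
    for t in range(horizon):
        arm = 0 if p >= 0.5 else 1  # myopic: play the more likely arm
        k = poisson_count(rng, lam) if arm == good else 0
        total += (beta ** t) * k    # beta-discounted reward
        if k > 0:
            # any reward identifies the rewarding arm with certainty
            p = 1.0 if arm == 0 else 0.0
        else:
            # no reward: the good arm is silent with probability e^{-lam}
            e = math.exp(-lam)
            if arm == 0:
                p = p * e / (p * e + (1 - p))
            else:
                p = p / (p + (1 - p) * e)
    return total, p
```

Because the silent arm never rewards, a single observed event collapses the posterior to 0 or 1, after which the myopic rule locks onto the rewarding arm; until then it alternates as the no-reward evidence shifts the posterior across 1/2.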