Abstract and Applied Analysis
Volume 2009 (2009), Article ID 103723, 17 pages
doi:10.1155/2009/103723
          Policy iteration for continuous-time average reward Markov decision processes in Polish spaces
          
            Quanxin Zhu1
            , Xinsong Yang2
             and Chuangxia Huang3
          
          1Department of Mathematics, Ningbo University, Ningbo 315211, China
          2Department of Mathematics, Honghe University, Mengzi 661100, China
          3The College of Mathematics and Computing Science, Changsha University of Science and Technology, Changsha 410076, China
          
          Abstract
We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The optimality criterion we consider is the expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then, under two slightly different sets of conditions, we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
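To fix ideas, the classical discrete-time, finite-state version of the PIA alternates two steps: evaluate the current stationary policy (compute its average reward, or gain, together with a relative value, or bias, function), then improve the policy by acting greedily with respect to that bias. The sketch below implements this classical finite version only, for a unichain aperiodic MDP; it is an illustration, not the paper's continuous-time algorithm on Polish spaces with unbounded transition and reward rates. All data structures and the reference-state normalization `h[0] = 0` are our own assumptions.

```python
def evaluate(policy, P, r, tol=1e-10):
    """Policy evaluation by relative value iteration for a fixed stationary
    policy (assumes the induced chain is unichain and aperiodic).
    Returns (gain g, bias h) normalized so that h[0] = 0."""
    n = len(r)
    h = [0.0] * n
    while True:
        # One application of the policy's evaluation operator T:
        # (Th)(s) = r(s, policy(s)) + sum_t P(t | s, policy(s)) * h(t).
        Th = [r[s][policy[s]] + sum(P[s][policy[s]][t] * h[t] for t in range(n))
              for s in range(n)]
        g = Th[0] - h[0]                 # gain estimate at the reference state
        new_h = [Th[s] - Th[0] for s in range(n)]    # renormalize: h[0] = 0
        if max(abs(new_h[s] - h[s]) for s in range(n)) < tol:
            return g, new_h
        h = new_h

def policy_iteration(P, r):
    """P[s][a][t]: transition probability s -> t under action a;
    r[s][a]: one-step reward.  Returns an average-optimal policy and its gain."""
    n, m = len(r), len(r[0])
    policy = [0] * n
    while True:
        g, h = evaluate(policy, P, r)
        # Improvement step: for each state, pick an action maximizing the
        # right-hand side of the average reward optimality equation.
        new_policy = [max(range(m),
                          key=lambda a: r[s][a] + sum(P[s][a][t] * h[t]
                                                      for t in range(n)))
                      for s in range(n)]
        if new_policy == policy:
            return policy, g             # fixed point: policy is unimproved
        policy = new_policy
```

On a toy two-state example in which state 1 pays reward 1 and each action moves to the intended state with probability 0.9, the iteration stops at the policy "head for state 1 and stay there," whose gain is the long-run fraction of time spent in state 1.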