Abstract and Applied Analysis
Volume 2009 (2009), Article ID 103723, 17 pages
doi:10.1155/2009/103723

Policy iteration for continuous-time average reward Markov decision processes in Polish spaces

Quanxin Zhu¹, Xinsong Yang², and Chuangxia Huang³

¹Department of Mathematics, Ningbo University, Ningbo 315211, China
²Department of Mathematics, Honghe University, Mengzi 661100, China
³The College of Mathematics and Computing Science, Changsha University of Science and Technology, Changsha 410076, China

Abstract

We study the policy iteration algorithm (PIA) for continuous-time jump Markov decision processes in general state and action spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. The criterion that we are concerned with is expected average reward. We propose a set of conditions under which we first establish the average reward optimality equation and present the PIA. Then under two slightly different sets of conditions we show that the PIA yields the optimal (maximum) reward, an average optimal stationary policy, and a solution to the average reward optimality equation.
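
To make the shape of the PIA concrete, the following is a minimal sketch for a toy model only: finitely many states and actions, bounded rates, and every stationary policy assumed to induce a single recurrent class. None of this is taken from the paper, which works in Polish spaces with unbounded rates. The names `evaluate`, `improve`, `policy_iteration`, the data layout `q[i][a][j]` and `r[i][a]`, and the normalization h(0) = 0 are assumptions made for illustration. Evaluation solves g = r(i, f(i)) + Σ_j q(j | i, f(i)) h(j) for a stationary policy f; improvement picks, in each state, an action maximizing r(i, a) + Σ_j q(j | i, a) h(j).

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                for c in range(col, n + 1):
                    m[k][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(policy, q, r):
    """Return (g, h) solving g = r_f(i) + sum_j q_f(j | i) h(j), with h(0) = 0."""
    n = len(policy)
    # Unknown vector: x[0] = g and x[j] = h(j) for j >= 1 (h(0) is fixed to 0).
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n):
        act = policy[i]
        a[i][0] = 1.0
        for j in range(1, n):
            a[i][j] = -q[i][act][j]
        b[i] = r[i][act]
    x = solve_linear(a, b)
    return x[0], [0.0] + x[1:]


def improve(policy, q, r, h, tol=1e-12):
    """Switch an action only when it is strictly better, to avoid cycling."""
    new_policy = list(policy)
    for i in range(len(policy)):
        def score(act):
            return r[i][act] + sum(qij * hj for qij, hj in zip(q[i][act], h))
        best = max(range(len(r[i])), key=score)
        if score(best) > score(policy[i]) + tol:
            new_policy[i] = best
    return new_policy


def policy_iteration(q, r, policy=None):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = list(policy) if policy is not None else [0] * len(r)
    while True:
        g, h = evaluate(policy, q, r)
        new_policy = improve(policy, q, r, h)
        if new_policy == policy:
            return g, h, policy
        policy = new_policy


# Hypothetical two-state, two-action model: q[i][a] is a rate row summing to 0.
q = [
    [[-1.0, 1.0], [-2.0, 2.0]],   # state 0, actions 0 and 1
    [[1.0, -1.0], [3.0, -3.0]],   # state 1, actions 0 and 1
]
r = [
    [1.0, 0.0],                   # reward rates in state 0
    [4.0, 4.0],                   # reward rates in state 1
]

g, h, policy = policy_iteration(q, r)
print(g, h, policy)
```

For this toy data the iteration starts from the policy [0, 0], with average reward 2.5, and stops after one improvement at [1, 0], with average reward 8/3. The linear solve fails (division by zero) if a policy has more than one recurrent class, which is why the single-class assumption is stated up front.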
