This is an Open Access publication published under CSC-OpenAccess Policy.
An Algorithm of Policy Gradient Reinforcement Learning with a Fuzzy Controller in Policies
Harukazu Igarashi, Seiji Ishihara
Pages - 17 - 26     |    Revised - 05-04-2013     |    Published - 30-04-2013
Volume - 4   Issue - 1    |    Publication Date - April 2013
KEYWORDS
Reinforcement Learning, Policy Gradient Method, Fuzzy Inference, Membership Function
ABSTRACT
Typical fuzzy reinforcement learning algorithms take value-function-based approaches, such as fuzzy Q-learning in Markov Decision Processes (MDPs), and use constant or linear functions in the consequent parts of fuzzy rules. We instead propose a fuzzy reinforcement learning algorithm based on the policy gradient approach. Our method can handle fuzzy sets even in the consequent parts of the rules and can also learn the weights of the fuzzy rules. Specifically, we derive learning rules for the membership functions and the rule weights, both for the case in which the input/output variables to/from the control system are discrete and for the case in which they are continuous.
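As an illustration of the general idea only, and not the authors' algorithm, the following minimal Python sketch shows a REINFORCE-style update of rule weights for a small fuzzy controller. Everything specific here is an assumption made for the example: the single continuous input x, the three discrete actions, the Gaussian antecedent membership functions (CENTERS, WIDTH), the consequent membership grades (CONSEQUENT), the Boltzmann (softmax) policy with TEMPERATURE, and all function names. The membership functions are held fixed and only the rule weights are learned, whereas the paper also derives learning rules for the membership functions.

```python
import math

# Hypothetical toy setup: one continuous input x and three discrete actions.
CENTERS = [-1.0, 0.0, 1.0]  # centers of the Gaussian antecedent membership functions
WIDTH = 0.5                 # shared width of the antecedent membership functions
# CONSEQUENT[i][a]: grade of action a in the consequent fuzzy set of rule i (assumed values)
CONSEQUENT = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.3],
    [0.0, 0.3, 1.0],
]
TEMPERATURE = 1.0           # softmax temperature (assumed)


def antecedent(x):
    """Degree to which input x matches the antecedent of each rule."""
    return [math.exp(-((x - c) ** 2) / (2.0 * WIDTH ** 2)) for c in CENTERS]


def policy(x, weights):
    """Softmax policy over actions; score(a) = sum_i w_i * mu_i(x) * m_i(a)."""
    mu = antecedent(x)
    n_actions = len(CONSEQUENT[0])
    scores = [
        sum(w * m * rule[a] for w, m, rule in zip(weights, mu, CONSEQUENT)) / TEMPERATURE
        for a in range(n_actions)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_policy(x, action, weights):
    """Gradient of log pi(action | x) with respect to each rule weight."""
    mu = antecedent(x)
    probs = policy(x, weights)
    grads = []
    for m, rule in zip(mu, CONSEQUENT):
        expected = sum(p * g for p, g in zip(probs, rule))
        grads.append(m * (rule[action] - expected) / TEMPERATURE)
    return grads


def reinforce_update(weights, episode, reward, learning_rate=0.1):
    """One REINFORCE step: w += lr * reward * sum_t grad log pi(a_t | x_t).

    `episode` is a list of (x, action) pairs; `reward` is the episode return.
    """
    new_weights = list(weights)
    for x, action in episode:
        grads = grad_log_policy(x, action, weights)
        for i, g in enumerate(grads):
            new_weights[i] += learning_rate * reward * g
    return new_weights
```

For example, calling reinforce_update([1.0, 1.0, 1.0], [(0.2, 2)], reward=1.0) returns weights under which action 2 is more probable at x = 0.2 than before, since the rewarded action's log-probability is pushed up.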
CITED BY (1)  
1 Sugimoto Masaya, Igarashi Harukazu, Ishihara Seiji, & Tanaka Ichi-ki (2014). Policy gradient method with policies expressed by fuzzy control rules: Action decision in RoboCup Small Size League. Intelligence and Information, 26 (3), 647-657.
Professor Harukazu Igarashi
Shibaura Institute of Technology - Japan
arashi50@sic.shibaura-it.ac.jp
Associate Professor Seiji Ishihara
Tokyo Denki University - Japan