Policy Modeling for Efficient Reinforcement Learning in Adversarial Multi-Agent Environments

Kwon, Ki-Duk; Kim, In-Cheol

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 35, Issue 3, pp. 179-188, 2008, pISSN 1229-6848

Korean Institute of Information Scientists and Engineers (한국정보과학회)

적대적 멀티 에이전트 환경에서 효율적인 강화 학습을 위한 정책 모델링

Kwon, Ki-Duk (Department of Computer Science, Kyonggi University);
Kim, In-Cheol (Department of Computer Science, Kyonggi University)

Published : 2008.03.15

Abstract

An important issue in multiagent reinforcement learning is how an agent should learn its optimal policy through trial-and-error interactions in a dynamic environment where other agents can influence its performance. Most previous work on multiagent reinforcement learning either applies single-agent reinforcement learning techniques without any extension or, even when it builds and uses explicit models of other agents, rests on unrealistic assumptions. In this paper, the basic concepts that constitute the common foundation of multiagent reinforcement learning techniques are first formulated, and then, based on these concepts, previous works are compared in terms of their characteristics and limitations. After that, a policy model of the opponent agent and a new multiagent reinforcement learning method using this model are introduced. Unlike previous works, the proposed method utilizes a policy model instead of a Q-function model of the opponent agent. Moreover, this learning method can improve learning efficiency by using a simpler policy model than richer but more time-consuming ones such as finite state machines (FSMs) and Markov chains. In this paper, the Cat and Mouse game is introduced as an adversarial multiagent environment, and the effectiveness of the proposed method is analyzed through experiments using this game as a testbed.

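An important problem in multi-agent reinforcement learning is how an agent can learn its optimal behavior policy through trial-and-error interaction in a dynamic environment containing other agents that can affect its task performance. Most existing studies on multi-agent reinforcement learning either apply single-agent MDP-based reinforcement learning techniques largely unchanged or, even when they use a separate model of the other agents, are limited in that the information or assumptions they require about those agents are unrealistic. This paper formalizes the basic concepts underlying multi-agent reinforcement learning and, on that basis, compares the characteristics and limitations of existing studies. It then introduces a new behavior policy model and describes a reinforcement learning method that uses it. The proposed multi-agent reinforcement learning method learns a behavior policy model of the opponent agent instead of the model of the opponent agent's Q evaluation function that earlier opponent-modeling studies mainly attempted, and it improves learning efficiency by using a relatively simple form of behavior policy model compared with models such as finite state automata or Markov chains, which are expressive but demand much time and effort to learn. In addition, this paper introduces the Cat and Mouse game, a representative adversarial multi-agent environment, and analyzes the effect of the proposed policy-model-based multi-agent reinforcement learning by conducting comparative experiments with this game as a testbed and explaining the results.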
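
The abstract does not specify the form of the opponent policy model, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes the simplest possible policy model, a per-state frequency count of the opponent's observed actions, and combines it with a tabular Q function over joint actions. The class names `OpponentPolicyModel` and `PolicyModelQLearner`, the Laplace-smoothed counts, and all hyperparameter defaults are assumptions introduced here for illustration.

```python
import random
from collections import defaultdict


class OpponentPolicyModel:
    """Hypothetical policy model: estimates P(opponent action | state) from counts."""

    def __init__(self, n_opp_actions):
        # Every count starts at 1 (Laplace smoothing), so unseen states are uniform.
        self.counts = defaultdict(lambda: [1] * n_opp_actions)

    def update(self, state, opp_action):
        self.counts[state][opp_action] += 1

    def probs(self, state):
        c = self.counts[state]
        total = sum(c)
        return [x / total for x in c]


class PolicyModelQLearner:
    """Tabular Q-learning over joint actions, weighted by the opponent policy model."""

    def __init__(self, n_actions, n_opp_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_actions = n_actions
        self.n_opp = n_opp_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # keyed by (state, own action, opponent action)
        self.model = OpponentPolicyModel(n_opp_actions)

    def expected_q(self, state, action):
        # Expectation of Q over the opponent's predicted action distribution.
        p = self.model.probs(state)
        return sum(p[o] * self.q[(state, action, o)] for o in range(self.n_opp))

    def value(self, state):
        return max(self.expected_q(state, a) for a in range(self.n_actions))

    def act(self, state):
        # Epsilon-greedy with respect to the expected Q values.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.expected_q(state, a))

    def learn(self, state, action, opp_action, reward, next_state):
        # Update the opponent model first, then the joint-action Q value.
        self.model.update(state, opp_action)
        target = reward + self.gamma * self.value(next_state)
        key = (state, action, opp_action)
        self.q[key] += self.alpha * (target - self.q[key])
```

In a Cat and Mouse style game, the learner would call `act` to choose a move, observe the opponent's move and the reward, and then call `learn` with both. The design choice this sketch highlights is that the agent only needs to observe the opponent's actions, not its rewards or Q values, which is the kind of weaker assumption the abstract argues for.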

