- Size: 3 KB; file type: .py; published: 2021-06-15
- Language: Python
- Tags: reinforcement learning, model-free, Q-learning
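Resource overview
Q-learning is a model-free reinforcement learning method. This file uses Q-learning to solve a simple search task, which helps beginners understand reinforcement learning in general and Q-learning in particular.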
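To illustrate what "model-free" means in practice: the agent never needs the environment's transition probabilities, it only adjusts a table of values from observed (state, action, reward, next state) samples. The following is a minimal sketch of a single tabular Q-learning update, not code taken from the downloadable file; the function name `q_update`, its parameters, and the example values are assumptions made for illustration.

```python
import numpy as np


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    q is a 2-D array indexed as q[state, action]. All names here are
    hypothetical and chosen only for this sketch.
    """
    # A terminal transition has no future value to bootstrap from.
    target = r if terminal else r + gamma * np.max(q[s_next])
    td_error = target - q[s, a]
    q[s, a] += alpha * td_error
    return td_error


# Six states and two actions (0 = left, 1 = right), all values start at zero.
q = np.zeros((6, 2))

# Reaching the goal from state 4 with reward 1 raises q[4, 1].
q_update(q, s=4, a=1, r=1.0, s_next=5, terminal=True)

# A later move from state 3 to state 4 propagates part of that value backwards.
q_update(q, s=3, a=1, r=0.0, s_next=4)
print(q)
```

Only the visited entry changes on each call, which is why repeated episodes are needed before values reach the states far from the goal.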
Code snippet and file information
"""
A simple example for Reinforcement Learning using the table lookup Q-learning method.
An agent "o" is on the left of a 1-dimensional world; the treasure is at the rightmost location.
Run this program to see how the agent improves its strategy for finding the treasure.
View more on my tutorial page: https://morvanzhou.github.io/tutorials/
"""
import numpy as np
import pandas as pd
import time

np.random.seed(2)  # reproducible

N_STATES = 6   # the length of the 1-dimensional world
ACTIONS = ['left', 'right']     # available actions
EPSILON = 0.9   # greedy policy (epsilon): probability of acting greedily
ALPHA = 0.1     # learning rate
GAMMA = 0.9    # discount factor
MAX_EPISODES = 20   # maximum episodes
FRESH_TIME = 0.3    # refresh time for one move
def build_q_table(n_states, actions):  # build a Q-table
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),     # q_table initial values
        columns=actions,    # action names
    )
    # print(table)    # show table
    return table
def choose_action(state, q_table):
    # This is how to choose an action
    state_actions = q_table.iloc[state, :]
    # act non-greedily, or this state's actions have no value yet
    if (np.random.uniform() > EPSILON) or (state_actions == 0).all():
        action_name = np.random.choice(ACTIONS)
    else:   # act greedily
        # idxmax returns the column label ('left'/'right'); argmax would return a position
        action_name = state_actions.idxmax()
    return action_name
def get_env_feedback(S, A):
    # This is how the agent interacts with the environment
    if A == 'right':    # move right
        if S == N_STATES - 2:   # terminate
            S_ = 'terminal'
            R = 1
        else:
            S_ = S + 1
            R = 0
    else:   # move left
        R = -1
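The snippet above breaks off inside `get_env_feedback`, so the left-move transition and the training loop are not shown. A hedged sketch of how the remaining pieces could fit together follows. It assumes that moving left from state 0 leaves the agent in place and that any other left move decrements the state, and it keeps the snippet's reward of -1 for a left move. The names `step` and `run_q_learning` are hypothetical, and the loop is the standard tabular Q-learning procedure, not necessarily what the downloadable file does.

```python
import numpy as np
import pandas as pd

N_STATES = 6
ACTIONS = ['left', 'right']
EPSILON = 0.9   # probability of acting greedily
ALPHA = 0.1     # learning rate
GAMMA = 0.9     # discount factor


def step(S, A):
    """Hypothetical environment step: returns (next_state, reward)."""
    if A == 'right':
        if S == N_STATES - 2:
            return 'terminal', 1
        return S + 1, 0
    # Assumed left-move behaviour: stay at the wall (state 0) or step back.
    return max(S - 1, 0), -1


def run_q_learning(n_episodes=20, seed=2):
    """Train a Q-table and return it with the step count of each episode."""
    rng = np.random.default_rng(seed)
    q_table = pd.DataFrame(np.zeros((N_STATES, len(ACTIONS))), columns=ACTIONS)
    steps_per_episode = []
    for _ in range(n_episodes):
        S, steps = 0, 0
        while S != 'terminal':
            row = q_table.iloc[S, :]
            # Explore with probability 1 - EPSILON, or when nothing is learned yet.
            if rng.uniform() > EPSILON or (row == 0).all():
                A = str(rng.choice(ACTIONS))
            else:
                A = row.idxmax()
            S_, R = step(S, A)
            q_predict = q_table.loc[S, A]
            if S_ == 'terminal':
                q_target = R    # no future value after the terminal state
            else:
                q_target = R + GAMMA * q_table.iloc[S_, :].max()
            q_table.loc[S, A] += ALPHA * (q_target - q_predict)
            S, steps = S_, steps + 1
        steps_per_episode.append(steps)
    return q_table, steps_per_episode


q_table, steps_per_episode = run_q_learning()
print(q_table)
print(steps_per_episode)
```

The sketch omits the on-screen rendering and the `FRESH_TIME` delay so that it runs without pauses; under these assumptions the step counts printed at the end give a rough view of how quickly the agent learns to walk right.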