Reinforcement Learning: the Q-learning Algorithm

Size: 3 KB

File type: .py

Published: 2021-06-15
Language: Python
Tags: reinforcement learning, model-free, Q-learning

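Resource description

Q-learning is a model-free reinforcement learning method. This resource uses Q-learning to solve a simple search task: an agent in a one-dimensional world learns to reach a treasure in the rightmost cell. It is meant to help beginners understand reinforcement learning in general and Q-learning in particular.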
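To make "model-free" concrete, here is a minimal sketch of a single tabular Q-learning update. The function name `q_update`, its signature, and the use of a plain NumPy array for the table are assumptions made for illustration; they are not taken from the resource itself. The update only needs an observed transition (state, action, reward, next state), never a model of the environment.

```python
import numpy as np


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    Hypothetical helper: q is a 2-D array indexed as q[state, action].
    """
    # A terminal transition has no future value, so the target is just the reward.
    target = r if terminal else r + gamma * np.max(q[s_next])
    # Move Q(s, a) a fraction alpha of the way toward the target.
    q[s, a] += alpha * (target - q[s, a])
    return q[s, a]


q = np.zeros((6, 2))  # 6 states, 2 actions (0 = left, 1 = right)
new_value = q_update(q, s=4, a=1, r=1.0, s_next=None, terminal=True)
print(new_value)  # 0.1
```

Starting from an all-zero table, the first rewarded step moves Q(4, right) from 0 to 0.1, because the learning rate of 0.1 takes one tenth of the gap to the target of 1.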

Code snippet and file information

"""
A simple example for Reinforcement Learning using table lookup Q-learning method.
An agent "o" is on the left of a 1 dimensional world, the treasure is on the rightmost location.
Run this program to see how the agent will improve its strategy of finding the treasure.
View more on my tutorial page: https://morvanzhou.github.io/tutorials/
"""
import numpy as np
import pandas as pd
import time

np.random.seed(2)  # reproducible


N_STATES = 6   # the length of the 1 dimensional world
ACTIONS = ['left', 'right']     # available actions
EPSILON = 0.9   # probability of acting greedily (epsilon-greedy policy)
ALPHA = 0.1     # learning rate
GAMMA = 0.9    # discount factor
MAX_EPISODES = 20   # maximum episodes
FRESH_TIME = 0.3    # refresh time for one move


def build_q_table(n_states, actions):  # build a Q-table
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),     # q_table initial values
        columns=actions,    # one column per action name
    )
    # print(table)    # show table
    return table


def choose_action(state, q_table):
    # Choose an action with an epsilon-greedy policy
    state_actions = q_table.iloc[state, :]
    # Act non-greedy with probability 1 - EPSILON, or when no action in this state has a value yet
    if (np.random.uniform() > EPSILON) or (state_actions == 0).all():
        action_name = np.random.choice(ACTIONS)
    else:   # act greedy
        # idxmax returns the column label ('left' or 'right'); argmax would return an integer position
        action_name = state_actions.idxmax()
    return action_name


def get_env_feedback(S, A):
    # This is how the agent interacts with the environment
    if A == 'right':    # move right
        if S == N_STATES - 2:   # terminate
            S_ = 'terminal'
            R = 1
        else:
            S_ = S + 1
            R = 0
    else:   # move left
        R = -1
        # The snippet is cut off at this point; the two lines below are an assumed completion.
        S_ = max(S - 1, 0)   # assumption: the agent stays put when it hits the left wall
    return S_, R
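The snippet stops before the training loop, so the following is only a sketch of how the pieces might fit together, not the original file's code. It is self-contained and uses a NumPy array instead of a pandas table. The names `step` and `train`, the use of `None` as the terminal marker, and the behavior at the left wall are assumptions; the constants and the reward scheme (+1 at the treasure, -1 for moving left, 0 otherwise) follow the snippet.

```python
import numpy as np

N_STATES = 6          # cells in the 1-D world; the treasure is the rightmost cell
ACTIONS = ["left", "right"]
EPSILON = 0.9         # probability of acting greedily
ALPHA = 0.1           # learning rate
GAMMA = 0.9           # discount factor
MAX_EPISODES = 20


def step(state, action):
    """Hypothetical environment step returning (next_state, reward)."""
    if action == "right":
        if state == N_STATES - 2:
            return None, 1.0          # None marks the terminal state
        return state + 1, 0.0
    return max(state - 1, 0), -1.0    # assumption: the left wall keeps the agent in place


def train(seed=2):
    """Run tabular Q-learning and return the table plus the step count of each episode."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))
    steps_per_episode = []
    for _ in range(MAX_EPISODES):
        state, steps = 0, 0
        while state is not None:
            row = q[state]
            if rng.random() > EPSILON or not row.any():
                a = int(rng.integers(len(ACTIONS)))     # explore: random action
            else:
                a = int(np.argmax(row))                 # exploit: best known action
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state
            if next_state is None:
                target = reward
            else:
                target = reward + GAMMA * q[next_state].max()
            q[state, a] += ALPHA * (target - q[state, a])
            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


q_table, steps_per_episode = train()
print(np.round(q_table, 3))
print(steps_per_episode)
```

The shortest possible episode takes five moves (cells 0 through 4, each followed by a move right), so the per-episode step counts give a quick check on whether the agent is improving. Rendering and the `FRESH_TIME` delay are left out to keep the sketch short.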
