Resource Overview

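Q-learning is a model-free reinforcement learning method. This document applies Q-learning to a simple search task to help beginners understand reinforcement learning in general and Q-learning in particular.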
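
To make "model-free" concrete: the agent never needs the environment's transition rules, only the observed tuple (state, action, reward, next state). The block below is a minimal sketch of the standard tabular Q-learning update, not the author's code. The function name `q_update`, its parameters, and the 6 x 2 table of plain lists are assumptions chosen for illustration.

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # A terminal transition has no successor value to bootstrap from.
    target = r if terminal else r + gamma * max(q_table[s_next])
    # Move the current estimate a fraction alpha of the way toward the target.
    q_table[s][a] += alpha * (target - q_table[s][a])
    return q_table[s][a]


# Hypothetical table: 6 states x 2 actions (index 0 = left, 1 = right), all zeros.
q = [[0.0, 0.0] for _ in range(6)]

# Reaching the goal from state 4 with reward 1 raises Q(4, right) from 0 to 0.1.
first = q_update(q, s=4, a=1, r=1.0, s_next=None, terminal=True)

# A later step from state 3 to state 4 earns no reward but inherits discounted value.
second = q_update(q, s=3, a=1, r=0.0, s_next=4)
```

The second call shows how value propagates backward from the goal: state 3 gains a small positive estimate purely because state 4 already has one.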

Code Snippet and File Information

"""
A simple example for Reinforcement Learning using table lookup Q-learning method.
An agent "o" is on the left of a 1 dimensional world, the treasure is on the rightmost location.
Run this program to see how the agent will improve its strategy of finding the treasure.
View more on my tutorial page: https://morvanzhou.github.io/tutorials/
"""
import numpy as np
import pandas as pd
import time

np.random.seed(2)  # reproducible


N_STATES = 6   # the length of the 1 dimensional world
ACTIONS = ['left', 'right']     # available actions
EPSILON = 0.9   # greedy policy: probability of taking the best-known action
ALPHA = 0.1     # learning rate
GAMMA = 0.9    # discount factor
MAX_EPISODES = 20   # maximum episodes
FRESH_TIME = 0.3    # refresh time for one move


def build_q_table(n_states, actions):  # build the Q-table
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),     # q_table initial values
        columns=actions,    # action names
    )
    # print(table)    # show table
    return table


def choose_action(state, q_table):
    # This is how to choose an action
    state_actions = q_table.iloc[state, :]
    # act non-greedy, or this state's actions have no value yet
    if (np.random.uniform() > EPSILON) or (state_actions == 0).all():
        action_name = np.random.choice(ACTIONS)
    else:   # act greedy
        action_name = state_actions.idxmax()    # idxmax returns the action label rather than a position
    return action_name


def get_env_feedback(S, A):
    # This is how the agent interacts with the environment
    if A == 'right':    # move right
        if S == N_STATES - 2:   # terminate
            S_ = 'terminal'
            R = 1
        else:
            S_ = S + 1
            R = 0
    else:   # move left
        R = -1
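
The listing stops partway through `get_env_feedback`, so the rest of the program is not shown. The block below is a self-contained sketch of how such a corridor task could be finished and trained, not a reconstruction of the missing code. Assumptions: moving left decrements the state, position 0 acts as a wall, and the left move keeps the reward of -1 seen in the listing. The names `step` and `train` are hypothetical, and plain dicts with the standard `random` module stand in for pandas and NumPy so the sketch runs without third-party packages.

```python
import random

N_STATES = 6
ACTIONS = ["left", "right"]
EPSILON, ALPHA, GAMMA = 0.9, 0.1, 0.9
MAX_EPISODES = 20


def step(state, action):
    """Assumed transition rule; the left-move handling is a guess at the missing code."""
    if action == "right":
        if state == N_STATES - 2:
            return "terminal", 1      # the treasure sits just right of the last walkable cell
        return state + 1, 0
    # Left move: reward -1 as in the listing; position 0 is assumed to be a wall.
    return max(state - 1, 0), -1


def train(seed=2):
    """Run epsilon-greedy tabular Q-learning and return the learned table."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(MAX_EPISODES):
        state = 0
        while state != "terminal":
            values = q[state]
            # Explore with probability 1 - EPSILON, or when nothing has been learned here yet.
            if rng.random() > EPSILON or all(v == 0 for v in values.values()):
                action = rng.choice(ACTIONS)
            else:
                action = max(values, key=values.get)
            next_state, reward = step(state, action)
            if next_state == "terminal":
                target = reward       # no future value after the episode ends
            else:
                target = reward + GAMMA * max(q[next_state].values())
            values[action] += ALPHA * (target - values[action])
            state = next_state
    return q


q_table = train()
for index, row in enumerate(q_table):
    print(index, row)
```

Every episode ends with exactly one rightward move from state 4, so that entry is updated once per episode toward a target of 1. The values for states further left grow more slowly because they depend on discounted estimates from their right-hand neighbours.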
