Resource Description
The Mountain Car example program from Sutton's textbook, a basic simulation experiment in reinforcement learning.

Code Snippet and File Information
/*
This is an example program for reinforcement learning with linear
function approximation.  The code follows the pseudo-code for linear
gradient-descent Sarsa(lambda) given in Figure 8.8 of the book
"Reinforcement Learning: An Introduction" by Sutton and Barto.
One difference is that we use the implementation trick mentioned on
page 189 to only keep track of the traces that are larger
than "min-trace".
Before running the program you need to obtain the tile-coding
software, available at http://envy.cs.umass.edu/~rich/tiles.C and tiles.h
(see http://envy.cs.umass.edu/~rich/tiles.html for documentation).
The code below is in three main parts: 1) Mountain Car code, 2) General
RL code, and 3) top-level code and misc.
Written by Rich Sutton 12/19/00
 */
#include <math.h>                          // declares cos(), used by MCarStep
#include "tiles.h"
#include "stdio.h"
#include "stdlib.h"
// the names of the other system headers are missing from this excerpt
//////////     Part 1: Mountain Car code     //////////////
// Global variables:
float mcar_position, mcar_velocity;       // position and velocity values
#define mcar_min_position -1.2
#define mcar_max_position 0.6
#define mcar_max_velocity 0.07            // the negative of this is the minimum velocity
#define mcar_goal_position 0.5
#define POS_WIDTH (1.7 / 8)               // the tile width for position
#define VEL_WIDTH (0.14 / 8)              // the tile width for velocity
// Profiles
void MCarInit();                              // initialize car state
void MCarStep(int a);                         // update car state for given action
bool MCarAtGoal ();                           // is car at goal?
void MCarInit()
// Initialize state of Car
  { mcar_position = -0.5;
    mcar_velocity = 0.0;}
void MCarStep(int a)
// Take action a, update state of car
  { mcar_velocity += (a-1)*0.001 + cos(3*mcar_position)*(-0.0025);
    if (mcar_velocity > mcar_max_velocity) mcar_velocity = mcar_max_velocity;
    if (mcar_velocity < -mcar_max_velocity) mcar_velocity = -mcar_max_velocity;
    mcar_position += mcar_velocity;
    if (mcar_position > mcar_max_position) mcar_position = mcar_max_position;
    if (mcar_position < mcar_min_position) mcar_position = mcar_min_position;
    if (mcar_position==mcar_min_position && mcar_velocity<0) mcar_velocity = 0;}
bool MCarAtGoal ()
// Is Car within goal region?
  { return mcar_position >= mcar_goal_position;}


//////////     Part 2: Semi-General RL code     //////////////
#define MEMORY_SIZE 10000                        // number of parameters to theta, memory size
#define NUM_ACTIONS 3                            // number of actions
#define NUM_TILINGS 10
// Global RL variables:
float Q[NUM_ACTIONS];                            // action values
float theta[MEMORY_SIZE];                        // modifiable parameter vector, aka memory, weights
float e[MEMORY_SIZE];                            // eligibility traces
int F[NUM_ACTIONS][NUM_TILINGS];
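
The Part 1 dynamics in the listing can be tried on their own, without the tile-coding software. What follows is a minimal C++ sketch of that idea, not part of the original program: the constants and the update rule are copied from the listing, while `CarState`, `step`, `momentum_policy` and `steps_to_goal` are names made up for this illustration. The policy is a hand-written heuristic (push in the direction the car is already moving), assumed here only so that an episode can be run without any learning.

```cpp
#include <cassert>
#include <cmath>

// Constants copied from Part 1 of the listing.
const double kMinPosition = -1.2;
const double kMaxPosition = 0.6;
const double kMaxVelocity = 0.07;
const double kGoalPosition = 0.5;

// Illustrative state type (the listing uses two global floats instead).
struct CarState {
    double position;
    double velocity;
};

// One simulation step. Action a is 0 (reverse), 1 (coast) or 2 (forward).
CarState step(CarState s, int a) {
    assert(a >= 0 && a <= 2);
    s.velocity += (a - 1) * 0.001 + std::cos(3 * s.position) * (-0.0025);
    if (s.velocity > kMaxVelocity) s.velocity = kMaxVelocity;
    if (s.velocity < -kMaxVelocity) s.velocity = -kMaxVelocity;
    s.position += s.velocity;
    if (s.position > kMaxPosition) s.position = kMaxPosition;
    if (s.position < kMinPosition) s.position = kMinPosition;
    // The left boundary acts as a wall: the car stops when it hits it.
    if (s.position == kMinPosition && s.velocity < 0) s.velocity = 0;
    return s;
}

// Hypothetical hand-written policy, not a learned one:
// accelerate in the direction of the current velocity.
int momentum_policy(const CarState& s) {
    return s.velocity >= 0 ? 2 : 0;
}

// Runs one episode from the listing's start state (-0.5, 0.0).
// Returns the number of steps taken to reach the goal,
// or -1 if the goal is not reached within max_steps.
int steps_to_goal(int max_steps) {
    CarState s = {-0.5, 0.0};
    for (int t = 0; t < max_steps; ++t) {
        if (s.position >= kGoalPosition) return t;
        s = step(s, momentum_policy(s));
    }
    return s.position >= kGoalPosition ? max_steps : -1;
}
```

Calling `steps_to_goal(1000)` runs a single episode under the heuristic policy. A learning agent such as the Sarsa(lambda) program would replace `momentum_policy` with an action choice based on its estimated action values.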
 Attribute       Size     Date       Time   Name
----------- ---------  ---------- -----  ----
     File      13541  2015-08-24 10:28  mountaincar1.cpp
     File       4306  2015-08-24 10:29  tiles.C
     File        339  2015-08-20 09:42  tiles.h
----------- ---------  ---------- -----  ----
                18186                    3
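
The header comment of the listing mentions two ideas that a small example may make more concrete: action values that are linear in binary features, and the trick of tracking only the eligibility traces that have not decayed below "min-trace". The following C++ sketch is an illustration under stated assumptions, not the code in mountaincar1.cpp: `TraceTable`, `action_value` and `update_weights` are hypothetical names, replacing traces are assumed, and a trace is dropped once it falls strictly below the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: traces are stored densely, and a separate list
// remembers which indices currently hold a nonzero trace.
struct TraceTable {
    std::vector<float> e;      // one trace per weight
    std::vector<int> active;   // indices f with e[f] != 0
    float min_trace;           // traces below this value are dropped

    TraceTable(std::size_t memory_size, float min_trace_value)
        : e(memory_size, 0.0f), min_trace(min_trace_value) {
        assert(min_trace > 0.0f);
    }

    // Replacing trace (an assumption of this sketch): the trace of an
    // active binary feature is set to 1.
    void set(int f) {
        assert(f >= 0 && static_cast<std::size_t>(f) < e.size());
        if (e[f] == 0.0f) active.push_back(f);
        e[f] = 1.0f;
    }

    // Scale every tracked trace by rate (gamma * lambda) and stop
    // tracking the ones that fall below min_trace.
    void decay(float rate) {
        std::size_t kept = 0;
        for (std::size_t i = 0; i < active.size(); ++i) {
            int f = active[i];
            e[f] *= rate;
            if (e[f] < min_trace) e[f] = 0.0f;
            else active[kept++] = f;
        }
        active.resize(kept);
    }
};

// Linear action value with binary features: the sum of the weights
// of the features that are active for this state and action.
float action_value(const std::vector<float>& theta,
                   const std::vector<int>& features) {
    float q = 0.0f;
    for (int f : features) q += theta[f];
    return q;
}

// Gradient-descent update restricted to the tracked traces:
// theta[f] += alpha * delta * e[f].
void update_weights(std::vector<float>& theta, const TraceTable& traces,
                    float alpha, float delta) {
    for (int f : traces.active) theta[f] += alpha * delta * traces.e[f];
}
```

With this layout the cost of each decay and update grows with the number of tracked traces rather than with the full memory size, which is the point of the trick the header comment refers to.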