Resource overview
Q-learning; very helpful. Introduces the basic usage of Q-learning.
Code snippet and file information
%% Q-learning with epsilon-greedy exploration algorithm for the deterministic cleaning robot, V1
%  Matlab code : Reza Ahmadzadeh
%  email: reza.ahmadzadeh@iit.it
%  March-2014
%% The deterministic cleaning-robot MDP
% A cleaning robot has to collect a used can and also has to recharge its
% batteries. The state describes the position of the robot and the action
% describes the direction of motion. The robot can move to the left or to
% the right. The first (1) and the final (6) states are the terminal
% states. The goal is to find an optimal policy that maximizes the return
% from any initial state. Here the Q-learning algorithm with epsilon-greedy
% exploration (from reinforcement learning) is used.
% Algorithm 2-3 from:
% @book{busoniu2010reinforcement,
%   title={Reinforcement learning and dynamic programming using function approximators},
%   author={Busoniu, Lucian and Babuska, Robert and De Schutter, Bart and Ernst, Damien},
%   year={2010},
%   publisher={CRC Press}
% }
% Note: the code uses 1-based indexing instead of 0-based indexing.
%
% V1: the initial evaluation of the algorithm
%
%% This is the main function, including the initialization and the algorithm.
% The inputs are: initial Q matrix, set of actions, set of states,
% discount factor, learning rate, exploration probability,
% number of iterations, and the initial state.
function qlearning
% learning parameters
gamma = 0.5;    % discount factor  % TODO : we need learning rate schedule
alpha = 0.5;    % learning rate    % TODO : we need exploration rate schedule
epsilon = 0.9;  % exploration probability (1-epsilon = exploit / epsilon = explore)
% states
state = [0, 1, 2, 3, 4, 5];
% actions
action = [-1, 1];
% initial Q matrix
Q = zeros(length(state), length(action));
K = 1000;       % maximum number of iterations
state_idx = 3;  % the initial state to begin from
%% the main loop of the algorithm
for k = 1:K
    disp(['iteration: ' num2str(k)]);
    r = rand; % get 1 uniform random number
    x = sum(r >= cumsum([0 1-eps
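The listing is cut off at the action-selection step. What follows is not the author's code but a minimal sketch of how a Q-learning loop with epsilon-greedy exploration could look for this problem. It reuses the parameter values from the listing. The reward values (5 for reaching the last state, 1 for reaching the first state, 0 otherwise), the random restart after a terminal state, and the names `action_idx`, `next_idx`, `reward`, `best` and `policy` are assumptions made for illustration. The explore/exploit choice is written as a plain `if rand < epsilon` test rather than the `cumsum` form that the listing starts to use.

```matlab
% Hypothetical sketch: Q-learning with epsilon-greedy exploration on the
% six-state corridor. Rewards and the restart rule are assumptions.
gamma   = 0.5;                    % discount factor (value from the listing)
alpha   = 0.5;                    % learning rate (value from the listing)
epsilon = 0.9;                    % exploration probability (value from the listing)
state   = [0, 1, 2, 3, 4, 5];     % positions; rows 1..6 of Q (1-based indexing)
action  = [-1, 1];                % move left / move right
Q = zeros(length(state), length(action));
K = 1000;                         % number of iterations
state_idx = 3;                    % initial (non-terminal) state index

for k = 1:K
    % epsilon-greedy selection: explore with probability epsilon
    if rand < epsilon
        action_idx = randi(length(action));       % random action
    else
        [~, action_idx] = max(Q(state_idx, :));   % greedy action
    end

    % deterministic transition: one step left or right
    next_idx = state_idx + action(action_idx);

    % assumed rewards: 5 at the last state, 1 at the first state
    if next_idx == length(state)
        reward = 5;
    elseif next_idx == 1
        reward = 1;
    else
        reward = 0;
    end

    % Q-learning update (temporal-difference step)
    td_target = reward + gamma * max(Q(next_idx, :));
    Q(state_idx, action_idx) = Q(state_idx, action_idx) ...
        + alpha * (td_target - Q(state_idx, action_idx));

    % assumed restart rule: after a terminal state, pick a random
    % non-terminal state; otherwise continue from the new state
    if next_idx == 1 || next_idx == length(state)
        state_idx = randi([2, length(state) - 1]);
    else
        state_idx = next_idx;
    end
end

% greedy policy read off the learned Q matrix (one action per state)
[~, best] = max(Q, [], 2);
policy = action(best);
```

Under these assumed rewards and a discount factor of 0.5, the greedy policy should move left from state index 2 (reward 1 is worth more than the discounted 0.625) and right from state indices 3 to 5. The rows for the two terminal states are never updated and stay at zero.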