The figure above shows the domain of a more complex game. There are 25 grid locations the agent

could be in. A prize could be on one of the corners, or there could be no prize. When the agent

lands on a prize, it receives a reward of 10 and the prize disappears. When there is no prize, for

each time step there is a probability that a prize appears on one of the corners. Monsters can

appear at any time on one of the locations marked M. The agent gets damaged if a monster

appears on the square the agent is on. If the agent is already damaged, it receives a reward of

-10. The agent can get repaired (i.e., so it is no longer damaged) by visiting the repair station

marked R.
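
The prize, monster, and repair rules above could be sketched roughly as follows. This is only an illustration: the cell coordinates, the two probabilities, and the function name events are assumptions, since the figure and the exact probabilities are not given here. It also assumes the -10 reward applies when a monster appears on an agent that is already damaged.

```python
import random

NO_PRIZE = 4
# All coordinates and probabilities below are placeholders (assumptions);
# the figure that fixes them is not reproduced here.
PRIZE_CELLS = [(0, 0), (0, 4), (4, 0), (4, 4)]  # P0..P3, assumed corner order
MONSTER_CELLS = {(0, 1), (1, 3), (2, 2)}        # hypothetical M squares
REPAIR_CELL = (1, 4)                            # hypothetical R square
PRIZE_PROB = 0.3    # assumed chance per step that a prize appears
MONSTER_PROB = 0.4  # assumed chance per step that a monster shows up

def events(x, y, prize, damaged, rng=random):
    """Apply the prize, monster and repair rules for an agent at (x, y).

    Returns the updated (prize, damaged, reward).
    """
    reward = 0
    if prize != NO_PRIZE and (x, y) == PRIZE_CELLS[prize]:
        reward += 10           # agent landed on the prize
        prize = NO_PRIZE       # the prize disappears
    elif prize == NO_PRIZE and rng.random() < PRIZE_PROB:
        prize = rng.randrange(4)  # a prize appears on one of the corners
    if (x, y) in MONSTER_CELLS and rng.random() < MONSTER_PROB:
        if damaged:
            reward -= 10       # monster hits an already damaged agent
        else:
            damaged = True     # first hit only damages the agent
    if (x, y) == REPAIR_CELL:
        damaged = False        # visiting R repairs the agent
    return prize, damaged, reward
```

Because monsters are re-sampled every step, nothing about them has to be carried over to the next step.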

In this example, the state consists of four components: ⟨X,Y,P,D⟩, where X is the X-coordinate of

the agent, Y is the Y-coordinate of the agent, P is the position of the prize (P=0 if there is a prize

on P0, P=1 if there is a prize on P1, similarly for 2 and 3, and P=4 if there is no prize), and D is

Boolean and is true when the agent is damaged. Because the monsters are transient, it is not

necessary to include them as part of the state. There are thus 5×5×5×2 = 250 states. The

environment is fully observable, so the agent knows what state it is in. But the agent does not

know the meaning of the states; it has no idea initially about being damaged or what a prize is.
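
One possible way to hand the agent an opaque state number is to pack ⟨X,Y,P,D⟩ into a single integer from 0 to 249. The sketch below assumes zero-based coordinates and one particular component ordering; any fixed one-to-one mapping would do, and the names encode and decode are illustrative.

```python
# Sizes of the four state components: X, Y, prize position (4 = no prize),
# and the damaged flag.
NX, NY, NP, ND = 5, 5, 5, 2
NUM_STATES = NX * NY * NP * ND  # 5 * 5 * 5 * 2 = 250

def encode(x: int, y: int, p: int, d: bool) -> int:
    """Pack (x, y, p, d) into one integer in 0..249 (assumed ordering)."""
    return ((x * NY + y) * NP + p) * ND + int(d)

def decode(s: int) -> tuple[int, int, int, bool]:
    """Inverse of encode: recover (x, y, p, d) from the state index."""
    s, d = divmod(s, ND)
    s, p = divmod(s, NP)
    x, y = divmod(s, NY)
    return x, y, p, bool(d)
```

The learner only ever sees the integer, which matches the requirement that it does not know what the components mean.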

The agent has four actions: up, down, left, and right. These move the agent one step - usually

one step in the direction indicated by the name, but sometimes in one of the other directions. If

the agent crashes into an outside wall or one of the interior walls (the thick lines near the

location R), it remains where it was and receives a reward of -1.
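
A minimal sketch of the noisy movement rule might look like this. The slip probability is an assumption (the text only says "sometimes"), the y-axis is assumed to point up, and the interior wall set is left empty because the wall layout comes from the figure.

```python
import random

GRID = 5
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SLIP_PROB = 0.3  # assumed chance of moving in one of the other directions
# Interior walls as pairs of adjacent cells that cannot be crossed.
# Left empty here because the figure with the wall layout is not available.
WALLS: set[frozenset] = set()

def move(x, y, action, rng=random):
    """Return (new_x, new_y, reward) for one noisy movement step."""
    if rng.random() < SLIP_PROB:
        # Slip: replace the chosen action with one of the other three.
        action = rng.choice([a for a in ACTIONS if a != action])
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    off_grid = not (0 <= nx < GRID and 0 <= ny < GRID)
    blocked = frozenset({(x, y), (nx, ny)}) in WALLS
    if off_grid or blocked:
        return x, y, -1  # crash: stay in place, reward -1
    return nx, ny, 0
```

An interior wall between two cells could then be added as, for example, WALLS.add(frozenset({(1, 3), (2, 3)})), with the actual cells taken from the figure.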

The agent does not know any of the story given here. It just knows there are 250 states and 4

actions, which state it is in at each time step, and what reward was received each time. You need to:


(i) Build a simulator that replicates the above behaviour of the agent moving in the grid.


(ii) Then, use Q-learning on the simulator built in (i) to learn the best policy for the

agent to move in this environment.
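
For step (ii), a tabular Q-learning loop could be sketched as below. It assumes a hypothetical simulator interface step(state, action, rng) that returns (next_state, reward), treats the game as a continuing task with no terminal state, and uses placeholder values for the learning rate, discount, and exploration rate that would need tuning.

```python
import random

def q_learning(step, num_states, num_actions, start_state=0,
               num_steps=100_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(state, action, rng)` is a hypothetical simulator interface that
    returns (next_state, reward). The hyperparameter defaults are assumptions.
    """
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    s = start_state
    for _ in range(num_steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            a = rng.randrange(num_actions)
        else:
            a = max(range(num_actions), key=lambda i: q[s][i])
        s2, r = step(s, a, rng)
        # Q-learning update: bootstrap from the best action in the next state.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2
    return q

def greedy_policy(q):
    """Pick the highest-valued action index in every state."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]
```

For this game the call would use num_states=250 and num_actions=4, and greedy_policy(q) then gives one action index per state.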

Skills: AI (Artificial Intelligence) HW/SW


About the employer:
(0 reviews) Singapore, Singapore

Project ID: #22303611