
# LunarLander-v2

## Overview of LunarLander-v2

The landing pad is always at coordinates (0, 0), and the coordinates are the first two numbers in the state vector. Moving from the top of the screen to the landing pad and coming to rest at zero speed earns about 100 to 140 points. If the lander moves away from the landing pad, it loses that reward again. The episode ends when the lander crashes or comes to rest, which adds -100 or +100 points respectively. Each leg in contact with the ground is worth +10 points, and firing the main engine costs 0.3 points per frame. The environment counts as solved at 200 points. Landing outside the landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt. Four discrete actions are available: do nothing, fire the left orientation engine, fire the main engine, and fire the right orientation engine.
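The reward rules and the action set described above can be written down as a few constants. This is a minimal sketch, not part of the environment's code: the names `Action`, `main_engine_cost`, and `is_solved` are made up for illustration, and the integer indices 0 to 3 assume the actions are numbered in the order the overview lists them.

```python
from enum import IntEnum


class Action(IntEnum):
    """The four discrete actions, numbered in the order listed in the overview (assumed)."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


# Reward figures quoted in the overview.
MAIN_ENGINE_COST_PER_FRAME = -0.3
LEG_CONTACT_BONUS = 10.0
CRASH_PENALTY = -100.0
REST_BONUS = 100.0
SOLVED_SCORE = 200.0


def main_engine_cost(actions):
    """Total penalty for the frames in which the main engine fired."""
    frames = sum(1 for a in actions if a == Action.FIRE_MAIN_ENGINE)
    return frames * MAIN_ENGINE_COST_PER_FRAME


def is_solved(episode_score):
    """True when an episode score reaches the 'solved' threshold."""
    return episode_score >= SOLVED_SCORE
```

For example, `main_engine_cost([0, 2, 2, 3])` counts two main-engine frames, so a long burn eats noticeably into the 100 to 140 points earned for reaching the pad.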

### Here are the steps for the movement of the lunar lander:

#### 1. Importing Different Libraries

In [ ]:
# Import the packages and set the environment name.

import numpy as np
import gym
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
ENV_NAME = 'LunarLander-v2'
# Get the environment and extract the number of actions.
env = gym.make(ENV_NAME)


#### 2. Play with random actions, without training

In [ ]:
# Classic Gym API (gym < 0.26): reset() returns the observation and step() returns four values.
for i_episode in range(5):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()  # pick a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
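To compare the untrained baseline with a trained agent later, it helps to record the total reward of each random episode rather than only printing observations. The sketch below shows one way to do that. `StubEnv` is a hypothetical stand-in so the example runs without Gym or Box2D, and `run_random_episode` and `sample_action` are illustrative names, not Gym functions.

```python
import random


class StubEnv:
    """Hypothetical stand-in for a Gym environment using the classic 4-tuple step API."""

    def __init__(self, episode_length=10, n_actions=4, seed=0):
        self.episode_length = episode_length
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 8  # LunarLander observations contain 8 numbers

    def sample_action(self):
        return self.rng.randrange(self.n_actions)

    def step(self, action):
        self.t += 1
        reward = -0.3 if action == 2 else 0.0  # only the main-engine cost is modelled
        done = self.t >= self.episode_length
        return [0.0] * 8, reward, done, {}


def run_random_episode(env, max_steps=100):
    """Play one episode with random actions and return (total_reward, steps)."""
    env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = env.sample_action()
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps


total, steps = run_random_episode(StubEnv(episode_length=10))
print("random policy: reward", total, "after", steps, "steps")
```

With the real environment, the random action would come from `env.action_space.sample()` as in the cell above. A random policy normally scores far below the 200 points needed to solve the task.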

In [ ]:
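# Seed NumPy and the environment for reproducibility (env.seed() belongs to the classic Gym API).
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

model = Sequential()
# Add the network layers here before printing the summary. With window_length=1 the
# input shape is (1,) + env.observation_space.shape, and the output layer needs nb_actions units.
print(model.summary())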

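# Finally, we configure and compile our agent. You can use every built-in Keras optimizer and
# even the metrics!
memory = SequentialMemory(limit=50000, window_length=1)
policy = EpsGreedyQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=50,
               target_model_update=200, train_interval=4, policy=policy)
# The compile call is not shown in the source. Adam with lr=1e-3 and the 'mae' metric
# follow the keras-rl examples and are an assumption here.
from keras.optimizers import Adam
dqn.compile(Adam(lr=1e-3), metrics=['mae'])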
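The agent above combines an epsilon-greedy policy with a bounded replay memory. The following sketch illustrates both ideas in plain Python. It is a simplified stand-in, not keras-rl's implementation: `eps_greedy_action` and `ReplayMemory` are illustrative names, and the default `eps=0.1` is an assumption chosen for the example.

```python
import random
from collections import deque


def eps_greedy_action(q_values, eps=0.1, rng=random):
    """With probability eps pick a random action, otherwise the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


class ReplayMemory:
    """Fixed-size buffer that discards the oldest transition once the limit is reached."""

    def __init__(self, limit):
        self.buffer = deque(maxlen=limit)

    def append(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        """Draw a random batch of stored transitions without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: one Q-value per LunarLander action.
q = [0.1, 0.9, 0.3, 0.2]
greedy = eps_greedy_action(q, eps=0.0)  # always the best-valued action

memory = ReplayMemory(limit=3)
for step in range(5):
    memory.append((step, greedy))  # only the last three transitions are kept
```

Occasional random actions keep the agent exploring, while sampling past transitions from the memory breaks the correlation between consecutive frames during training.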

In [ ]:
# Now it's time to learn something. Set visualize=True to watch the training, but this
# slows it down quite a lot. You can always safely abort the training prematurely using
# Ctrl + C.
# Uncomment this section to train your model:
# dqn.fit(env, nb_steps=10000, visualize=False, verbose=2)
#
# Uncomment this to save your own weights:
# dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)
# While training, comment out the two lines below.
weights_filename = 'All_weights/dqn_{}_weights.h5f'.format(ENV_NAME)