When the agent takes action, it gets the reward on the basis of the result. This way the learning process continues depending on the positive and the negative reward. The learning is based on interaction with the environment. The agent discovers which action will give the maximum reward. Depending on that, the agent takes action.