This paper addresses the problem of precise autonomous landing of an Unmanned Aerial Vehicle (UAV). The problem is difficult because appropriate landing trajectories must be generated in the presence of dynamic constraints such as sudden changes in wind velocity and direction, downwash effects, and changes in payload. It is further compounded by uncertainties arising from inaccurate model information and noisy sensor readings. We partially solve the problem by proposing a Reinforcement Learning (RL) based controller that uses Least-Squares Policy Iteration (LSPI) to learn the optimal control policies required for generating these trajectories. The efficacy of the approach is demonstrated through both simulation and real-world experiments with a Parrot AR.Drone 2.0. To the best of our knowledge, this is the first time such experimental results have been presented for an RL-based drone-landing controller, making it a novel contribution to the field.
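
For orientation, one iteration of LSPI in its standard textbook form can be sketched as below. This is a generic statement of the algorithm, not the specific controller design of this paper: the feature map $\phi$, weight vector $w$, discount factor $\gamma$, and sample set $\mathcal{D}$ of transitions $(s, a, r, s')$ are assumed notation, and the choice of states, actions, and reward for the landing task is left unspecified here.

```latex
% Linear approximation of the action-value function (assumed notation)
\hat{Q}^{\pi}(s,a) = \phi(s,a)^{\top} w

% Policy evaluation (LSTD-Q) from samples (s, a, r, s') in D
A = \sum_{(s,a,r,s') \in \mathcal{D}}
      \phi(s,a)\,\bigl(\phi(s,a) - \gamma\,\phi\bigl(s', \pi(s')\bigr)\bigr)^{\top},
\qquad
b = \sum_{(s,a,r,s') \in \mathcal{D}} \phi(s,a)\, r,
\qquad
w = A^{-1} b

% Policy improvement: act greedily with respect to the new weights
\pi_{\text{new}}(s) = \arg\max_{a}\; \phi(s,a)^{\top} w
```

The evaluation and improvement steps are repeated until the weights $w$ stop changing between iterations. Because the evaluation step reuses the same sample set $\mathcal{D}$ for every policy, no dynamics model of the vehicle is needed, which is consistent with the stated motivation of inaccurate model information.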