This paper proposes a dual-timescale learning and adaptation framework that learns a probabilistic model of beam dynamics and concurrently exploits it to design adaptive beam-training with low overhead. On the long timescale, a deep recurrent variational autoencoder (DR-VAE) uses noisy beam-training observations to learn a probabilistic model of beam dynamics. On the short timescale, an adaptive beam-training procedure is formulated as a partially observable Markov decision process (POMDP) and optimized via point-based value iteration (PBVI), leveraging beam-training feedback and probabilistic predictions of the strongest beam pair provided by the DR-VAE. In turn, beam-training observations are used to refine the DR-VAE via stochastic gradient ascent, in a continuous process of learning and adaptation. It is shown that the proposed framework learns accurate beam dynamics and that, as learning progresses, the training overhead decreases and the spectral efficiency increases. Moreover, the proposed dual-timescale approach achieves near-optimal spectral efficiency, with gains of 85% over a policy that scans exhaustively over the dominant beam pairs and of 18% over a state-of-the-art POMDP policy.
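To make the dual-timescale loop concrete, the sketch below pairs a short-timescale belief-based beam scan with a long-timescale model refinement via stochastic gradient ascent. It is a deliberately minimal illustration under stated assumptions, not the paper's implementation: the DR-VAE is replaced by a first-order Markov transition model trained on the observation log-likelihood, the PBVI-optimized POMDP policy by a greedy top-k scan under the current belief, and feedback is assumed noiseless; all names and constants (`N_BEAMS`, `K_TRAIN`, `train_beams`, ...) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BEAMS = 16      # number of beam pairs (toy value, not from the paper)
K_TRAIN = 4       # beam pairs scanned per beam-training step
LR = 0.05         # stochastic-gradient-ascent step size
EPISODES = 200    # long-timescale refinement rounds
STEPS = 20        # short-timescale beam-training steps per round

# Ground-truth beam dynamics, unknown to the learner: a "sticky" Markov chain.
true_T = np.full((N_BEAMS, N_BEAMS), 0.02 / (N_BEAMS - 1))
np.fill_diagonal(true_T, 0.98)

# Learned model: transition logits trained by gradient ascent on the
# log-likelihood of observed beam transitions (a crude stand-in for the
# DR-VAE's variational objective).
logits = np.zeros((N_BEAMS, N_BEAMS))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def train_beams(belief, k):
    """Greedy stand-in for the PBVI policy: scan the k likeliest beam pairs."""
    return np.argsort(belief)[-k:]

state = rng.integers(N_BEAMS)
for episode in range(EPISODES):
    belief = np.full(N_BEAMS, 1.0 / N_BEAMS)
    prev_obs, hits = None, 0
    for _ in range(STEPS):
        T_hat = softmax(logits, axis=1)
        belief = belief @ T_hat                       # predict with learned model
        scanned = train_beams(belief, K_TRAIN)        # short-timescale adaptation
        state = rng.choice(N_BEAMS, p=true_T[state])  # channel evolves
        if state in scanned:
            # Noiseless feedback reveals the strongest scanned beam pair.
            hits += 1
            belief = np.zeros(N_BEAMS)
            belief[state] = 1.0
            if prev_obs is not None:
                # SGA on log p(state | prev_obs): gradient of the log-softmax.
                grad = -softmax(logits[prev_obs])
                grad[state] += 1.0
                logits[prev_obs] += LR * grad
            prev_obs = state
        else:
            # Only learn that the strongest pair was outside the scanned set.
            belief[scanned] = 0.0
            belief /= belief.sum()
            prev_obs = None

    if (episode + 1) % 50 == 0:
        stickiness = softmax(logits, axis=1).diagonal().mean()
        print(f"round {episode + 1}: hit rate {hits / STEPS:.2f}, "
              f"learned self-transition {stickiness:.3f}")
```

Even in this simplified form, the two timescales interact as the abstract describes: the learned dynamics sharpen the belief prediction, which lets the scan concentrate on fewer beam pairs (lower training overhead), and the resulting observations in turn refine the dynamics model.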