investigated. By considering the unknown spatio-temporal content popularity and user
preference, a user request model based on a hidden Markov process is proposed to
characterize the fluctuating spatio-temporal traffic demands in F-RANs. Then, the Q-learning
method based on the reinforcement learning (RL) framework is put forth to seek the optimal
caching policy in a distributed manner, which enables fog access points (F-APs) to learn and …