While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration, to discover new information about the MDP, against exploitation of the current …
The multi-armed bandit is a reinforcement learning model where a learning agent repeatedly chooses an action (pull a bandit arm) and the environment responds with a …
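The repeated choose-and-respond loop of the multi-armed bandit can be sketched in a few lines. The sketch below is an illustration only, not the method of this work (its details are elided). It assumes Bernoulli-reward arms with hypothetical success probabilities (`arm_probs`) and uses the standard UCB1 index as one common way to balance exploration and exploitation. The function name `ucb1` and its parameters are illustrative choices.

```python
import math
import random


def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return (pull counts, total reward).

    arm_probs: hypothetical success probability of each arm; the agent never
    sees these values directly, only the sampled rewards.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k   # number of times each arm has been pulled
    means = [0.0] * k  # empirical mean reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each estimate is defined.
            arm = t - 1
        else:
            # Empirical mean (exploitation) plus a confidence bonus that
            # shrinks as an arm is pulled more often (exploration).
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # The environment responds with a stochastic reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts, "total reward:", total)
```

Over a long horizon, most pulls should concentrate on the arm with the highest success probability. The weaker arms are still sampled occasionally because their confidence bonus grows slowly with `t`.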
We develop a novel homomorphic encryption-based approach to privacy preservation in a dynamic personalized pricing setting. In each period, the firm offers a personalized price to …