We present a new class of structured reinforcement learning policy-architectures, Implicit Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of …
In this thesis, we study the intersection of optimization and machine learning, especially how to use machine learning and optimization tools to make decisions. In Chapter 1, we propose …