The effect of choosing optimizer algorithms to improve computer vision tasks: a comparative study

E Hassan, MY Shams, NA Hikal, S Elmougy - Multimedia Tools and …, 2023 - Springer
Optimization algorithms are used to improve model accuracy. The optimization process
undergoes multiple cycles until convergence. A variety of optimization strategies have been …
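
To make the iterative update cycle concrete, here is a toy sketch (mine, not the paper's benchmark) of two commonly compared optimizers, plain SGD and Adam, minimizing the scalar loss w**2; all step sizes and names are illustrative:

    def sgd_step(w, grad, lr=0.1):
        # Vanilla stochastic gradient descent update.
        return w - lr * grad

    def adam_step(w, grad, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
        # Adam keeps running first/second moment estimates of the gradient.
        m, v, t = state
        t += 1
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad * grad
        m_hat = m / (1 - b1 ** t)   # bias correction
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (v_hat ** 0.5 + eps), (m, v, t)

    w_sgd, w_adam, state = 5.0, 5.0, (0.0, 0.0, 0)
    for _ in range(200):                     # cycles until (near) convergence
        w_sgd = sgd_step(w_sgd, 2 * w_sgd)   # gradient of w**2 is 2w
        w_adam, state = adam_step(w_adam, 2 * w_adam, state)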

Finite-time analysis of whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
The Whittle index policy is a heuristic for the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …
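
For context, the index policy itself is simple once the indices are known; the hard part the paper addresses is learning them via Q-learning. A hypothetical top-M selection sketch (names mine, not the paper's):

    def whittle_policy(indices, num_active):
        # Activate the num_active arms whose current-state Whittle
        # indices are largest; the remaining arms stay passive.
        ranked = sorted(range(len(indices)), key=lambda a: indices[a],
                        reverse=True)
        return set(ranked[:num_active])

    # e.g. four arms with indices for their current states, activate two:
    assert whittle_policy([0.3, 1.2, -0.5, 0.9], 2) == {1, 3}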

Markovian interference in experiments

V Farias, A Li, T Peng, A Zheng - Advances in Neural …, 2022 - proceedings.neurips.cc
We consider experiments in dynamical systems where interventions on some experimental
units impact other units through a limiting constraint (such as a limited supply of products) …

Breaking the deadly triad with a target network

S Zhang, H Yao, S Whiteson - International Conference on …, 2021 - proceedings.mlr.press
The deadly triad refers to the instability of a reinforcement learning algorithm when it
employs off-policy learning, function approximation, and bootstrapping simultaneously. In …
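
As a toy illustration of the target-network idea (a tabular sketch under assumptions of mine, not the paper's linear-approximation analysis): bootstrapping uses periodically frozen weights rather than the online ones.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, gamma, lr, sync_every = 5, 0.9, 0.1, 50
    w = np.zeros(n_states)      # online value estimates (one-hot features)
    w_target = w.copy()         # frozen copy used in the bootstrap target
    s = 0
    for t in range(5000):
        s_next = (s + rng.integers(0, 2)) % n_states  # toy random-walk chain
        r = 1.0 if s_next == 0 else 0.0
        # Bootstrap from the frozen target weights, not the online ones:
        td_error = r + gamma * w_target[s_next] - w[s]
        w[s] += lr * td_error
        if t % sync_every == 0:
            w_target = w.copy()  # periodic synchronization of the target
        s = s_next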

Finite Sample Analysis of Average-Reward TD Learning and Q-Learning

S Zhang, Z Zhang, ST Maguluri - Advances in Neural …, 2021 - proceedings.neurips.cc
The focus of this paper is on sample complexity guarantees of average-reward
reinforcement learning algorithms, which are known to be more challenging to study than …
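
For reference, a standard form of the average-reward TD(0) update (notation mine; the exact variants analyzed may differ) maintains a running average-reward estimate \bar{R}_t alongside the value estimates:

    \begin{aligned}
    \delta_t &= R_{t+1} - \bar{R}_t + V_t(S_{t+1}) - V_t(S_t),\\
    V_{t+1}(S_t) &= V_t(S_t) + \alpha_t\,\delta_t,\\
    \bar{R}_{t+1} &= \bar{R}_t + \beta_t\,\delta_t.
    \end{aligned}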

Model-free robust average-reward reinforcement learning

Y Wang, A Velasquez, GK Atia… - International …, 2023 - proceedings.mlr.press
Robust Markov decision processes (MDPs) address the challenge of model
uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In …
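
The robust average-reward objective referred to here can be written as (notation mine, with \mathcal{P} the uncertainty set of transition kernels):

    \max_{\pi}\; \min_{P \in \mathcal{P}}\; \liminf_{T \to \infty}\, \frac{1}{T}\,
    \mathbb{E}_{P}^{\pi}\!\left[\sum_{t=1}^{T} r(S_t, A_t)\right].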

Optimal uniform OPE and model-based offline reinforcement learning in time-homogeneous, reward-free and task-agnostic settings

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
This work studies the statistical limits of uniform convergence for offline policy evaluation
(OPE) problems with model-based methods (for episodic MDP) and provides a unified …
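
A minimal model-based OPE sketch under my own simplifying assumptions (tabular, discounted, infinite-horizon, whereas the paper treats episodic MDPs and uniform guarantees): fit an empirical model from logged data, then evaluate the target policy in it.

    import numpy as np

    def ope_model_based(transitions, n_states, n_actions, pi, gamma=0.95):
        # transitions: logged (s, a, r, s') tuples from the behavior policy;
        # pi: target policy as an (n_states, n_actions) probability matrix.
        counts = np.zeros((n_states, n_actions, n_states))
        reward_sums = np.zeros((n_states, n_actions))
        for s, a, r, s2 in transitions:
            counts[s, a, s2] += 1
            reward_sums[s, a] += r
        n_sa = counts.sum(axis=2, keepdims=True)
        P_hat = np.divide(counts, n_sa,
                          out=np.full_like(counts, 1.0 / n_states),
                          where=n_sa > 0)
        r_hat = np.divide(reward_sums, n_sa[:, :, 0],
                          out=np.zeros_like(reward_sums),
                          where=n_sa[:, :, 0] > 0)
        # Marginalize over pi and solve (I - gamma * P_pi) V = r_pi:
        P_pi = np.einsum('sa,sat->st', pi, P_hat)
        r_pi = np.einsum('sa,sa->s', pi, r_hat)
        return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)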

Stochastic first-order methods for average-reward Markov decision processes

T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-order
methods with strong theoretical guarantees for both policy optimization and policy …
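
First-order policy methods of this kind typically build on the average-reward policy gradient (standard form, notation mine; the paper's actual algorithms and guarantees differ in the details):

    \nabla_{\theta}\,\rho(\pi_{\theta})
    = \mathbb{E}_{s \sim d_{\pi_{\theta}},\, a \sim \pi_{\theta}(\cdot \mid s)}
      \big[\nabla_{\theta}\log \pi_{\theta}(a \mid s)\, Q_{\pi_{\theta}}(s,a)\big],

where \rho is the long-run average reward, d_{\pi_{\theta}} the stationary state distribution, and Q_{\pi_{\theta}} the differential action-value function.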

Off-policy average reward actor-critic with deterministic policy search

N Saxena, S Khastagir, S Kolathaya… - International …, 2023 - proceedings.mlr.press
The average reward criterion is less studied, as most existing work in the reinforcement learning
literature considers the discounted reward criterion. There are few …
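
For orientation, the off-policy deterministic policy gradient that actor-critic methods of this kind estimate has the standard form (notation mine; in the average-reward setting Q is the differential action-value function):

    \nabla_{\theta} J(\mu_{\theta}) \approx
    \mathbb{E}_{s \sim d_{\beta}}\big[\nabla_{\theta}\mu_{\theta}(s)\,
    \nabla_{a} Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)}\big],

where d_{\beta} is the state distribution induced by the behavior policy \beta.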

Modified Retrace for off-policy temporal difference learning

X Chen, X Ma, Y Li, G Yang… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Off-policy learning is key to extending reinforcement learning, as it allows learning a target
policy from a different behavior policy that generates the data. However, it is well known as …
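
As background, the original Retrace operator (Munos et al., 2016), which the paper modifies, truncates the importance weights as c_s = \lambda \min(1, \pi(a_s \mid x_s)/\mu(a_s \mid x_s)) and applies the off-policy correction:

    \mathcal{R}Q(x,a) = Q(x,a) + \mathbb{E}_{\mu}\!\left[\sum_{t \ge 0}
    \gamma^{t}\Big(\prod_{s=1}^{t} c_{s}\Big)
    \big(r_{t} + \gamma\,\mathbb{E}_{\pi}Q(x_{t+1}, \cdot)
    - Q(x_{t}, a_{t})\big)\right].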