Finite-time analysis of Whittle index based Q-learning for restless multi-armed bandits with neural network function approximation

G Xiong, J Li - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB)
problem. Although it is provably asymptotically optimal, finding Whittle indices remains …

Learning infinite-horizon average-reward restless multi-action bandits via index awareness

G Xiong, S Wang, J Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the online restless bandits with average-reward and multiple actions, where the
state of each arm evolves according to a Markov decision process (MDP), and the reward of …

Index-aware reinforcement learning for adaptive video streaming at the wireless edge

G Xiong, X Qin, B Li, R Singh, J Li - Proceedings of the Twenty-Third …, 2022 - dl.acm.org
We study adaptive video streaming for multiple users in wireless access edge networks with
unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and …

Prioritized information bottleneck theoretic framework with distributed online learning for edge video analytics

Z Fang, S Hu, J Wang, Y Deng… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Collaborative perception systems leverage multiple edge devices, such as surveillance
cameras or autonomous cars, to enhance sensing quality and eliminate blind spots. Despite …

Online restless multi-armed bandits with long-term fairness constraints

S Wang, G Xiong, J Li - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Restless multi-armed bandits (RMAB) have been widely used to model sequential decision
making problems with constraints. The decision maker (DM) aims to maximize the expected …

Towards Foundation-model-based Multiagent System to Accelerate AI for Social Impact

Y Zhao, N Boehmer, A Taneja, M Tambe - arXiv preprint arXiv:2412.07880, 2024 - arxiv.org
AI for social impact (AI4SI) offers significant potential for addressing complex societal
challenges in areas such as public health, agriculture, education, conservation, and public …

DOPL: Direct online preference learning for restless bandits with preference feedback

G Xiong, U Dinesha, D Mukherjee, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Restless multi-armed bandits (RMAB) have been widely used to model constrained
sequential decision making problems, where the state of each restless arm evolves …

Crowd2: Multi-agent Bandit-based Dispatch for Video Analytics upon Crowdsourcing

Y Chen, S Zhang, Y Yan, Y Jin, N Chen… - … -IEEE Conference on …, 2023 - ieeexplore.ieee.org
Many crowdsourcing platforms are emerging, leveraging the resources of recruited workers
to execute various outsourcing tasks, mainly for those computing-intensive video analytics …

Congestion-aware routing and content placement in elastic cache networks

J Zhang, E Yeh - IEEE INFOCOM 2024-IEEE Conference on …, 2024 - ieeexplore.ieee.org
Caching can be leveraged to significantly improve network performance and mitigate
congestion. However, characterizing the optimal tradeoff between routing cost and cache …

Whittle index-based Q-learning for wireless edge caching with linear function approximation

G Xiong, S Wang, J Li, R Singh - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
We consider the problem of content caching at the wireless edge to serve a set of end users
via unreliable wireless channels so as to minimize the average latency experienced by end …