Get a head start: On-demand pedagogical policy selection in intelligent tutoring

G Gao, X Yang, M Chi - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Reinforcement learning (RL) is broadly employed in human-involved systems to improve human outcomes. Off-policy evaluation (OPE) has been pivotal for RL in these settings, since online policy learning and evaluation can be high-stakes. Intelligent tutoring has drawn tremendous attention as a highly challenging domain for applying OPE to human-involved systems, because student subgroups can favor different pedagogical policies and because policies must be induced fully offline and then deployed directly in the upcoming semester, a costly procedure. In this work, we formulate on-demand pedagogical policy selection (ODPS) to tackle these challenges for OPE in intelligent tutoring. We propose a pipeline, EduPlanner, as a concrete solution for ODPS. Our pipeline yields a theoretically unbiased estimator and enables efficient, customized policy selection by identifying subgroups over both historical data and on-arrival initial logs. We evaluate our approach on the Probability ITS, which has been used in real classrooms for over eight years. Our study shows significant improvement in students' learning outcomes with EduPlanner, especially for those associated with low-performing subgroups.
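The core idea of subgroup-aware policy selection described above can be illustrated with a minimal sketch: estimate each candidate policy's value from logged data via inverse propensity scoring (a standard OPE estimator), separately per student subgroup, and pick the best policy for each subgroup. All function names and data shapes here are hypothetical illustrations, not the paper's actual EduPlanner implementation.

```python
import numpy as np

def ips_estimate(rewards, behavior_probs, target_probs):
    """Inverse-propensity-scoring (IPS) estimate of a target policy's value,
    computed from (action, reward) logs collected under a behavior policy.
    Each entry gives the probability the respective policy assigns to the
    logged action; IPS reweights logged rewards by target/behavior ratios."""
    weights = np.asarray(target_probs) / np.asarray(behavior_probs)
    return float(np.mean(weights * np.asarray(rewards)))

def select_policy_per_subgroup(logs_by_subgroup, candidate_policies):
    """For each student subgroup, return the candidate policy with the
    highest IPS value estimate on that subgroup's logged interactions.

    logs_by_subgroup:   {subgroup: {"rewards": [...], "behavior_probs": [...]}}
    candidate_policies: {policy_name: {subgroup: [target action probs ...]}}
    """
    selection = {}
    for subgroup, logs in logs_by_subgroup.items():
        values = {
            name: ips_estimate(logs["rewards"],
                               logs["behavior_probs"],
                               target_probs[subgroup])
            for name, target_probs in candidate_policies.items()
        }
        selection[subgroup] = max(values, key=values.get)
    return selection
```

In this sketch, subgroups would be identified beforehand (e.g., by clustering historical and on-arrival initial logs, as the abstract indicates), after which each incoming student is routed to the policy selected for their subgroup.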