Enabling identification of component processes in perceptual learning with nonparametric hierarchical Bayesian modeling
Author Affiliations
  • Yukai Zhao
    Center for Neural Science, New York University, New York, NY, USA
    zhaoyukai@nyu.edu
  • Jiajuan Liu
    Department of Cognitive Sciences and Institute of Mathematical Behavioral Sciences, University of California, Irvine, CA, USA
    jiajuanl@gmail.com
  • Barbara Anne Dosher
    Department of Cognitive Sciences and Institute of Mathematical Behavioral Sciences, University of California, Irvine, CA, USA
    bdosher@uci.edu
  • Zhong-Lin Lu
    Division of Arts and Sciences, NYU Shanghai, Shanghai, China
    Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
    NYU-ECNU Institute of Brain and Cognitive Neuroscience, Shanghai, China
    zhonglin@nyu.edu
Journal of Vision May 2024, Vol.24, 8. doi:https://doi.org/10.1167/jov.24.5.8
Abstract

Perceptual learning is a multifaceted process, encompassing general learning, between-session forgetting or consolidation, and within-session fast relearning and deterioration. The learning curve constructed from threshold estimates in blocks or sessions, based on tens or hundreds of trials, may obscure component processes; high temporal resolution is necessary. We developed two nonparametric inference procedures: a Bayesian inference procedure (BIP) to estimate the posterior distribution of contrast threshold in each learning block for each learner independently and a hierarchical Bayesian model (HBM) that computes the joint posterior distribution of contrast threshold across all learning blocks at the population, subject, and test levels via the covariance of contrast thresholds across blocks. We applied the procedures to the data from two studies that investigated the interaction between feedback and training accuracy in Gabor orientation identification over 1920 trials across six sessions and estimated the learning curves with block sizes of L = 10, 20, 40, 80, 160, and 320 trials. The HBM generated significantly better fits to the data and more precise estimates (smaller standard deviations) than the BIP across all block sizes. In addition, the HBM generated unbiased estimates, whereas the BIP only generated unbiased estimates with large block sizes but exhibited increased bias with small block sizes. With L = 10, 20, and 40, we were able to consistently identify general learning, between-session forgetting, and rapid relearning and adaptation within sessions. The nonparametric HBM provides a general framework for fine-grained assessment of the learning curve and enables identification of component processes in perceptual learning.

Introduction
Perceptual learning, which refers to performance improvements in perceptual tasks through practice or training, occurs in all sensory modalities (Dosher & Lu, 2020; Fahle & Poggio, 2002; Green, Banai, Lu, & Bavelier, 2018; Laurent et al., 2001; Lu, Hua, Huang, Zhou, & Dosher, 2011; Lu & Dosher, 2022; Proulx, Brown, Pasqualotto, & Meijer, 2014; Sagi, 2011a; Shams & Seitz, 2008; Wright & Zhang, 2009). It can significantly enhance human performance (Ball & Sekuler, 1982; Dosher & Lu, 1998; Fiorentini & Berardi, 1980; Huang, Zhou, & Lu, 2008; Karni & Sagi, 1991; Petrov, Van Horn, & Ratcliff, 2011; Poggio, Fahle, & Edelman, 1992) and has been shown to persist for years (Zhou et al., 2006). The study of perceptual learning not only reveals the functions and mechanisms underlying these phenomena but also enhances our understanding of brain plasticity and the development of training procedures for perceptual expertise (Hoffman et al., 2013) or rehabilitation of clinical conditions (Huxlin, 2009; Levi, 2020; Maniglia, Visscher, & Seitz, 2021; Yu, Cheung, Legge, & Chung, 2010). 
In this study, we developed and compared two nonparametric methods to estimate the learning curve at multiple temporal resolutions to uncover the component processes in perceptual learning. The learning curve is a fundamental empirical measure of perceptual learning. It is typically constructed empirically by connecting the average performance estimated from each block or session of a perceptual learning experiment across multiple time points throughout learning and is used to demonstrate learning, evaluate the time course and functional form of learning, and assess specificity and transfer of learning (e.g., Dale, Cochrane, & Green, 2021; Dosher & Lu, 2007; Karni & Sagi, 1991; Xiao et al., 2008). While average performance accuracy or response times are typically measured in the constant stimuli paradigm (Ball & Sekuler, 1982; Fahle, Edelman, & Poggio, 1995; Fahle & Morgan, 1996; Karni & Sagi, 1991; Kattner, Cochrane, & Green, 2017; Petrov, Dosher, & Lu, 2005; Petrov et al., 2011), average contrast threshold or the difference threshold between the to-be-discriminated stimuli is measured in adaptive training paradigms (Donovan, Szpiro, & Carrasco, 2015; Dosher & Lu, 1998; Polat, Ma-Naim, Belkin, & Sagi, 2004; Xiao et al., 2008). 
Although most perceptual learning studies have primarily focused on general learning through practice or training, some studies have identified additional component processes. These include between-session consolidation (Karni, Tanne, Rubenstein, Askenasy, & Sagi, 1994; McDevitt, Rokem, Silver, & Mednick, 2014; Sasaki & Watanabe, 2015; Stickgold et al., 2002; Tamaki, Berard, et al., 2020; Tamaki, Wang, Watanabe, & Sasaki, 2019; Tamaki, Wang, et al., 2020; Yotsumoto et al., 2009), offline gain (Bang et al., 2018; Shibata et al., 2017), between-session forgetting (Beard, Levi, & Reich, 1995; Mascetti et al., 2013), within-session adaptation (Censor, Karni, & Sagi, 2006; Censor, Harris, & Sagi, 2016; Sagi, 2011b), and deterioration (Dosher & Lu, 2020; Zenger-Landolt & Fahle, 2001). For example, Yang et al. (2022) conducted a study involving 49 adult participants in seven perceptual tasks and examined both session-by-session (700 trials/session) and block-by-block (100 trials/block) learning curves. From the session-by-session curves, they identified general learning. However, the block-by-block analysis revealed ubiquitous long-term general learning and within-session relearning in most tasks. They also observed between-session forgetting in some tasks, including Vernier-offset discrimination, face view discrimination, and auditory-frequency discrimination. Furthermore, between-session offline gain was observed in the visual shape search task, whereas the contrast detection task exhibited within-session adaptation and offline gain. 
To uncover the component processes in perceptual learning, two prerequisites are essential: distinct temporal characteristics of these processes and sufficient temporal resolution in the data to distinguish them. In Figure 1a, a trial-by-trial generative model of the learning curve (black curve) across six sessions is depicted, consisting of four latent component processes: general learning (yellow), between-session forgetting (purple), within-session re-learning (olive), and within-session adaptation (orange). Figures 1b through 1f illustrate how block size affects the predicted learning curve and the visibility of latent component processes. Larger block sizes decrease temporal resolution, making it challenging to identify these processes. Although we “know” the latent component processes in this example, in reality, we can only observe the learning curve and must infer these underlying processes. Without adequate temporal resolution, we cannot accurately estimate key aspects such as the true rate of general learning, as seen in the contrast between Figure 1f and Figure 1a. This example underscores the critical role of high temporal resolution in disentangling component processes in perceptual learning. 
Figure 1.
(a) The trial-by-trial generative model of the learning curve (black curve) across six sessions, comprising four latent component processes: general learning (yellow), between-session forgetting (purple), within-session re-learning (olive), and within-session adaptation (orange). The predicted learning curves (black) are shown for block averaging with different sizes: 20 trials (b), 40 trials (c), 80 trials (d), 160 trials (e), and 320 trials (f). These curves are based on averaging the true latent components from the generative model for each of these block sizes.
However, constructing the learning curve at high temporal resolutions presents significant challenges. Figure 2 shows the average standard deviations of the estimated block thresholds as a function of block size, using data from Liu, Lu, and Dosher (2010) and Liu, Lu, and Dosher (2012), where the trial-by-trial data in each block for each subject were fitted with a Bayesian inference procedure (see the Methods and Results sections below). Notably, as the block size decreases from 320 to 20 trials, the average standard deviation increases substantially, from 0.035 to 0.199 log10 units. This rise in standard deviation renders the threshold estimates highly unreliable. 
Figure 2.
Average standard deviation (SD) of the estimated block thresholds across all blocks and subjects as a function of block size using data from Liu et al. (2010), Liu et al. (2012).
Several parametric approaches have been developed to enhance the temporal resolution of the learning curve (Kattner, Cochrane, & Green, 2017; Zhang, Zhao, Dosher, & Lu, 2019a). Kattner et al. (2017) introduced a parametric method for estimating trial-by-trial thresholds in perceptual learning, modeling the threshold and slope of the psychometric function as parametric functions of time or trial number. This approach allows the construction of high-quality trial-by-trial learning curves for each observer during training and transfer, based on the model's best-fitting parameters (Dale et al., 2021). Zhang et al. (2019a) applied the qCD method (Zhao, Lesmes, & Lu, 2019), originally designed for measuring trial-by-trial dark adaptation curves, to adaptively measure and quantify trial-by-trial learning curves. The qCD method models perceptual sensitivity as a parametric function over time, uses the Bayesian adaptive testing framework (Lu & Dosher, 2013) to select the optimal stimulus and updates a joint probability distribution of parameters for the model of perceptual sensitivity change over time. It generated highly precise and accurate trial-by-trial learning curves in a 4-alternative forced-choice global motion direction perceptual learning experiment (Zhang et al., 2019a; Zhang, Zhao, Dosher, & Lu, 2019b). 
These parametric methods perform well when the assumptions of the underlying parametric functional forms of learning are met. However, in most cases, we lack prior knowledge of the candidate functional form, which could be quite complex, depending on the number and nature of the component processes. The aim of this study is to develop nonparametric methods to enhance the temporal resolution of the estimated learning curves, enabling visualization of the dynamic processes at high temporal resolution. This, in turn, helps us specify the functional form in parametric models and quantify the component processes in perceptual learning. 
We have developed two nonparametric methods for estimating the learning curve in perceptual learning: a simple Bayesian inference procedure (BIP) with an uninformative uniform prior and a hierarchical Bayesian model (HBM), which uses information across subjects, tests, and blocks to construct informative priors. Whereas the BIP estimates the posterior threshold distribution in each block for each subject independently, the HBM estimates the joint posterior threshold distribution across all blocks and subjects in the entire dataset. 
In the Bayesian inference framework, the accuracy and precision of the estimated posterior distribution are heavily influenced by two key elements: the prior and the data. The prior signifies the probability distribution of the to-be-estimated variable before new data is collected. Ideally, the prior serves as a mathematical representation of our understanding of the to-be-estimated variable before data collection, often depicted as a uniform distribution when there's little or no prior knowledge (Figure 3a, column 1), or as a concentrated informative distribution when ample knowledge is available (Figure 3b, column 1). However, misinformed priors can introduce bias (Figure 3c, column 1). Generally, the more informative the prior, the higher the accuracy and precision of the estimated posterior distribution with the same amount of data (Figure 3, columns 2–6). The most informative prior (Figure 3b) requires the least amount of data to achieve target levels of precision and accuracy, whereas the biased prior necessitates the most data (Figure 3c). As the amount of data increases, the influence of the prior diminishes (Figure 3, column 6). This paper primarily focuses on constructing informative priors for the to-be-estimated variable (i.e., contrast threshold in each block of trials) using the HBM, because of the limited data availability as the temporal resolution of the analysis increases. By explicitly modeling the covariance of thresholds at the population level, along with conditional dependencies across the three levels, the HBM generates an informative prior for the threshold at each block for each subject by incorporating information across all blocks and subjects in a dataset. 
Figure 3.
Effects of prior and data on the estimated posterior distribution of log10 threshold with 10, 20, 40, and 80 trials of data in Bayesian inference. (a) An uninformative uniform prior and estimated posterior distributions. (b) An informative concentrated prior and estimated posterior distributions, and (c) A biased prior and estimated posterior distributions. The dotted vertical line indicates the true contrast threshold.
The HBM consists of three levels: population, subject, and test, in which all subjects belong to a population and may in principle run the same experiment (called “test”1) multiple times. In the HBM, the distributions of the thresholds at the test level are conditioned on the distributions at the subject level, which, in turn, are conditioned on the distributions at the population level. It also includes covariance hyperparameters at the population level to capture the relationship between thresholds across blocks, whereas the BIP does not. 
Hierarchical structures are frequently employed in behavioral experiments (Kim, Pitt, Lu, Steyvers, & Myung, 2014; Yin, Qin, Sargent, Erlichman, & Shi, 2018). Typically, the top-level study population is divided into groups with varying training/transfer procedures, each consisting of multiple subjects. While many standard statistical procedures assume either homogeneity (e.g., fixed effects) or complete independence across groups and subjects, hierarchical models have been developed to effectively combine heterogeneous information across subjects and groups in a quantitative and coherent manner (Kruschke, 2014; Rouder & Lu, 2005). By combining sub-models and probability distributions at multiple levels of the hierarchy, the HBM computes the joint posterior distributions of the parameters and hyperparameters using Bayes' theorem and all available data, rather than data from a single subject or group (Kruschke, 2014; Kruschke & Liddell, 2018). These models take advantage of conditional dependencies within and across levels, leading to a reduction in the variance of the estimated posterior distributions through two primary mechanisms: (1) decomposition of variabilities from different sources using parameters and hyperparameters (Song, Behmanesh, Moaveni, & Papadimitriou, 2020), and (2) shrinkage of estimated parameters at lower levels towards the modes of higher levels when there is insufficient data at the lower level (Kruschke, 2014; Rouder, Sun, Speckman, Lu, & Zhou, 2003; Rouder & Lu, 2005). 
HBMs have been developed and widely applied in cognitive science, spanning signal detection (Prins, 2024; Rouder et al., 2003; Rouder & Lu, 2005), decision making (Lee, 2006; Merkle, Smithson, & Verkuilen, 2011), and functional magnetic resonance imaging (Ahn, Krawitz, Kim, Busmeyer, & Brown, 2011; Palestro et al., 2018; Wilson, Cranmer, & Lu, 2020). Previously, parametric (Zhao, Lesmes, Dorr, & Lu, 2021; Zhao, Lesmes, Hou, & Lu, 2021; Zhao, Lesmes, Dorr, & Lu, 2023a) and nonparametric (Zhao et al., 2023c, Zhao et al., 2023b) HBMs were developed for visual acuity and contrast sensitivity tests. In the parametric HBMs, visual acuity (VA) and contrast sensitivity function (CSF) were modeled with parametric models separately for each modality (Zhao, Lesmes, Dorr, et al., 2021; Zhao, Lesmes, Hou, et al., 2021), and jointly for both VA and CSF (Zhao, Lesmes, Dorr, & Lu, 2023a). In all cases, the HBMs generated more accurate and precise estimates compared to the BIP for each individual modality. In recent developments of nonparametric methods, it was found that the contrast sensitivity (CS) estimates at multiple spatial frequencies (SFs) from the HBM were the most accurate and precise, whereas those from the BIP were less precise and exhibited biases (Zhao et al., 2023b; Zhao et al., 2023c). 
In this article, we commence by introducing a generative model of trial-by-trial performance which is grounded in the contrast psychometric function for orientation identification. Following this, we provide an overview of the BIP, enabling the independent estimation of thresholds in each block. Subsequently, we present the three-level HBM designed to collectively estimate the block-by-block learning curves for all subjects. We applied these procedures to data from two studies that investigated the interaction between feedback and training accuracy in Gabor orientation identification over 1920 trials across six sessions (Liu et al., 2010; Liu, Lu, & Dosher, 2012). The learning curves were estimated with block sizes of 10, 20, 40, 80, 160, and 320 trials. Our analysis involved comparing the goodness of fit and the standard deviations of estimated block thresholds obtained from the two methods. Furthermore, we assessed the quality of the estimated learning curves at different block sizes, considering the reliability of estimated rates of general learning and their capacity to elucidate the component processes involved in perceptual learning. 
Methods
Data
In this study, we analyzed data from two previously published studies (Liu et al., 2010; Liu et al., 2012) that examined the interaction between feedback and training accuracy in a Gabor orientation identification task (with orientations of 45° ± 10°). The first dataset comprised 24 naïve subjects who were randomly assigned to one of four groups: low training accuracy (65% correct) with and without feedback, and high training accuracy (85% correct) with and without feedback. The second dataset included 36 naïve subjects who were randomly assigned to one of six groups: low training accuracy (65% correct) with and without feedback, high training accuracy (85% correct) with and without feedback, and mixed training accuracy (65% and 85% correct) with and without feedback. 
For our analysis in this study, we combined subjects from the low and high training accuracy conditions across the two datasets. We labeled subjects in the low training accuracy conditions without and with feedback as Groups 1 and 2, respectively, and those in the high training accuracy conditions without and with feedback as Groups 5 and 6, respectively. Subjects in the mixed training accuracy conditions without and with feedback were labeled as Groups 3 and 4. 
All participants in the original studies had normal or corrected-to-normal vision. They completed the orientation identification task using an accelerated stochastic staircase procedure (Kesten, 1958) with 320 trials in each of the six daily sessions (s = 1, 2, …, S; S = 6). Fifty-nine of the subjects received 60 to 80 trials of training using a QUEST procedure (Watson & Pelli, 1983) at the beginning of the first session, whereas one subject did not receive pretraining. Detailed descriptions of the experimental procedures can be found in the original articles (Liu et al., 2010; Liu et al., 2012). 
Before the experiment, written consent was obtained from all participants. The study protocol received approval from the institutional review board for human subject research at the University of Southern California and adhered to the principles of the Declaration of Helsinki. 
Apparatus
The experiments on human subjects were conducted using MATLAB (MathWorks Corp., Natick, MA, USA) on a Macintosh Power PC G4 computer with a Nanao Technology Flexscan 6600 monitor. Subjects viewed the displays binocularly at 72 cm in a dimly lit room, using a chin rest to maintain head positions. Data analyses were performed on a Dell computer with an Intel Xeon W-2145 @ 3.70GHz CPU (8 cores and 16 threads) and 64GB of installed memory (RAM), using MATLAB and JAGS (Plummer, 2003) in R (R Core Team, 2003). 
Theoretical framework
In this section, we introduce the BIP and the HBM with population, subject, and test levels as well as covariance hyperparameters. These methods are used to estimate the threshold distributions within each block of perceptual learning data. 
To begin, for each subject i ∈ [1, I] (I = 60) in each test j, we partitioned the data into K blocks. For generality, we keep the index j in the development because subjects could in principle run the same experiment multiple times or we could split the data for repeated analysis. Here, we set j = 1 because all the subjects ran the experiment only once. All data collected during pretraining were treated as the initial block (k = 1). Subsequently, we evenly divided the staircase data into block sizes L of 320, 160, 80, 40, 20, and 10 trials, resulting in 6, 12, 24, 48, 96, and 192 blocks, respectively. Therefore we had K = 7, 13, 25, 49, 97, and 193 blocks, respectively. These two methods were then applied to the data in a blind manner, without considering the training accuracy or feedback conditions. 
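As a concrete sketch of this partitioning (illustrative MATLAB of our own; the number of pretraining trials is a placeholder because it varied between roughly 60 and 80 trials across subjects), each trial can be assigned a block index as follows:

```matlab
% Partition one subject's trials into blocks: all pretraining (QUEST) trials form
% block 1, and the 6 x 320 staircase trials are cut into equal blocks of size L.
nPretrain  = 70;                         % placeholder: 60-80 QUEST trials (varies by subject)
nStaircase = 6 * 320;                    % staircase trials across the six sessions
L          = 40;                         % block size (320, 160, 80, 40, 20, or 10)
K          = 1 + nStaircase / L;         % total number of blocks (49 when L = 40)

blockIndex = zeros(nPretrain + nStaircase, 1);
blockIndex(1:nPretrain)     = 1;                               % pretraining -> block 1
blockIndex(nPretrain+1:end) = 1 + ceil((1:nStaircase)' / L);   % staircase -> blocks 2..K
```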
Generative model of trial-by-trial performance
For subject i in block k of test j,  where θijk represents the log10 contrast threshold, the probability of obtaining a correct response, denoted as \({r_{{l_{ijk}}}} = 1\), to a stimulus with contrast \({c_{{l_{ijk}}}}\) in trial l is described using a Weibull psychometric function (Figure 4a):  
\begin{equation} p\left( r_{l_{ijk}} = 1 \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) = g\lambda + \left( 1 - \lambda \right)\left( g + \left( 1 - g \right)\left( 1 - {\rm exp}\left( -\left( \frac{c_{l_{ijk}}}{\vartheta_{ijk}} \right)^{\beta} \right) \right) \right) \end{equation}
(1a)
 
\begin{equation} {\rm{lo}}{{\rm{g}}_{10}}\left( {{\vartheta _{ijk}}} \right) = {\theta _{ijk}} - \frac{1}{\beta }{\rm{lo}}{{\rm{g}}_{10}}\left( {{\rm{log}}\left( {\frac{{1 - g}}{{1 - {p_{1.5}}}}} \right)} \right),\end{equation}
(1b)
where 
  • g = 0.5 represents the guessing rate.
  • β represents the slope of the Weibull psychometric function in a two-alternative forced-choice task (2AFC).
  • λ = 0.04 represents the lapse rate, that is, the probability that the subject lapses and responds with a random guess regardless of stimulus contrast.
  • p1.5 = 0.856 is the probability of making a correct response when d′ = 1.5 in a 2AFC task.
  • ϑijk is the threshold at d′ = 1.5.
The probability of obtaining an incorrect response is given by:  
\begin{equation} p\left( r_{l_{ijk}} = 0 \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) = 1 - p\left( r_{l_{ijk}} = 1 \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right). \end{equation}
(2)
 
Figure 4.
(a) Psychometric functions: Four psychometric functions are parameterized with different θijk values but share the same slope β. These functions serve as the generative model for the analysis. β is set as the same across all subjects, tests, and blocks. (b) The BIP is used to compute the threshold distribution in each block of each test for each subject independently. θijk represents the threshold for subject i in block k of test j. (c) The HBM estimates the joint distribution of thresholds across subjects, tests, and blocks. It utilizes mean μ and covariance Σ hyperparameters for the population and mean ρik and standard deviation ε hyperparameters for individual subjects in each block k. ε is assumed to be the same for all subjects and blocks.
Equations 1 and 2 define the likelihood function by quantifying the probability of a correct or incorrect response based on the stimulus contrast \({c_{{l_{ijk}}}}\;\)and contrast threshold θijk of subject i in trial l of block k of test j. The overall probability of observing all the responses (\({r_{1:{L_{ijk}}}})\) for subject i in block k of test j is determined by the product of individual probabilities \(p( {{r_{{l_{ijk}}}}{\rm{|}}{\theta _{ijk}},\beta,\;{c_{{l_{ijk}}}}} )\) for all trials in that specific block, assuming independence across trials:  
\begin{equation} p\left( {{r_{1:{L_{ijk}}}}{\rm{|}}{\theta _{ijk}},\beta,{c_{1:{L_{ijk}}}}} \right) = \mathop \prod \limits_{{l_{ijk}} = 1}^{{L_{ijk}}} p\left( {{r_{{l_{ijk}}}}{\rm{|}}{\theta _{ijk}},\beta,{\rm{\;}}{c_{{l_{ijk}}}}} \right). \end{equation}
(3)
 
The specific values of Lij1 and Lij2: K depend on the subject, block, and test: the number of pretraining trials Lij1 varies across subjects, whereas the numbers of trials in the subsequent blocks Lij2: K are determined by the block sizes (320, 160, 80, 40, 20, 10) corresponding to the different values of K (7, 13, 25, 49, 97, 193). In essence, these equations allow the generative model to calculate the likelihood of both correct and incorrect responses for each trial, as well as the overall likelihood of obtaining the observed data for subject i in block k of test j, given the contrast threshold θijk. 
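To make the generative model concrete, the following MATLAB sketch (our own illustration, not the authors' code; the function name and variables are ours) evaluates Equations 1 to 3 for a single block of trials and returns the log likelihood of the observed responses given a candidate log10 threshold and slope:

```matlab
function LL = blockLogLikelihood(theta, beta, c, r)
% Log likelihood of one block of 2AFC trials under the Weibull model (Eqs. 1-3).
% theta : candidate log10 contrast threshold (theta_ijk)
% beta  : slope of the Weibull psychometric function
% c     : vector of stimulus contrasts, one per trial
% r     : vector of responses, 1 = correct, 0 = incorrect
g      = 0.5;        % guessing rate in 2AFC
lambda = 0.04;       % lapse rate
p15    = 0.856;      % percent correct at d' = 1.5

% Eq. 1b: convert the log10 threshold theta to the Weibull scale parameter.
logScale = theta - (1/beta) * log10(log((1 - g) / (1 - p15)));
scale    = 10.^logScale;

% Eq. 1a: probability of a correct response on each trial.
pCorrect = g*lambda + (1 - lambda) .* ...
           (g + (1 - g) .* (1 - exp(-(c ./ scale).^beta)));

% Eqs. 2-3: Bernoulli likelihood of each response, accumulated in log space.
pTrial = pCorrect .* (r == 1) + (1 - pCorrect) .* (r == 0);
LL     = sum(log(pTrial));
end
```

Working in log space avoids numerical underflow when the product in Equation 3 involves many trials.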
BIP
The BIP is used to estimate the posterior distribution of θijk from the trial-by-trial data \({Y_{ijk}} = \{ {( {{r_{1:{L_{ijk}}}},{c_{1:{L_{ijk}}}}} )} \}\;\)of subject i in block k of test j via Bayes’ rule (Figure 4b):  
\begin{equation} p\left( \theta_{ijk}, \beta \,|\, Y_{ijk} \right) = \frac{ \prod_{l_{ijk} = 1}^{L_{ijk}} p\left( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) p_0\left( \theta_{ijk}, \beta \right) }{ \int \prod_{l_{ijk} = 1}^{L_{ijk}} p\left( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) p_0\left( \theta_{ijk}, \beta \right) d\theta_{ijk}\, d\beta }, \end{equation}
(4)
where p(θijk, β|Yijk) is the posterior distribution of θijk and β, the contrast threshold and slope of the psychometric function, given the trial-by-trial data Yijk; \(\prod_{l_{ijk} = 1}^{L_{ijk}} p( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} )\) is the likelihood term, which quantifies the probability of observing the responses \(r_{l_{ijk}}\) in the trials given θijk and \(c_{l_{ijk}}\); and p0(θijk, β) is the prior probability distribution of θijk and β. In this application, the priors of θijk are set as uniform distributions in log10 units, spanning a wide range from 3% (−1.5 log10 units) to 100% (0 log10 units) contrast:  
\begin{equation}{p_0}\left( {{\theta _{ijk}}} \right) = \mathcal{U}\left( { - 1.5,0} \right),\end{equation}
(5a)
 
The denominator is an integral across all possible values of θijk and β. β is a single parameter shared across all subjects, blocks, and tests. Its prior is set as a uniform distribution:  
\begin{equation}{p_0}\left( \beta \right) = \mathcal{U}\left( {1,4} \right).\end{equation}
(5b)
 
For the prior of θijk, the lower bound of −1.5 log10 units was based on results from previous studies (Liu et al., 2010; Liu et al., 2012), and upper bound of 0 log10 units was based on the physical limit of stimulus contrast. The bounds of the prior of slope β were based on the typical range of observations in the literature (Foley & Legge, 1981; Hou et al., 2015; Legge, Kersten, & Burgess, 1987; Lu & Dosher, 1999). 
The BIP is applied independently in each of the K blocks to obtain the posterior contrast threshold distributions for all blocks of each test for a subject. It is then repeated across all subjects to obtain posterior contrast threshold distributions for all blocks in the entire experiment. This procedure allows estimation of the contrast threshold distributions based only on the trial data available within each block. 
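Because Equation 4 involves only two unknowns for a single block, the BIP posterior can also be approximated on a discrete grid, which is a useful sanity check. The sketch below (ours; it reuses the hypothetical blockLogLikelihood function above, whereas the actual analyses used MCMC sampling in JAGS) normalizes prior × likelihood over the grid and summarizes the marginal posterior of the threshold:

```matlab
% Grid approximation of the BIP posterior (Eqs. 4-5) for one block.
% c and r are the contrasts and responses of the block (see blockLogLikelihood).
thetaGrid = linspace(-1.5, 0, 151);    % uniform prior support for theta (Eq. 5a)
betaGrid  = linspace(1, 4, 61);        % uniform prior support for beta (Eq. 5b)

logPost = zeros(numel(thetaGrid), numel(betaGrid));
for it = 1:numel(thetaGrid)
    for ib = 1:numel(betaGrid)
        % Uniform priors contribute only an additive constant in log space.
        logPost(it, ib) = blockLogLikelihood(thetaGrid(it), betaGrid(ib), c, r);
    end
end

% Normalize (the denominator of Eq. 4) and marginalize over beta.
post      = exp(logPost - max(logPost(:)));
post      = post / sum(post(:));
postTheta = sum(post, 2);                          % marginal posterior of theta
thetaMean = sum(thetaGrid(:) .* postTheta);        % posterior mean
thetaSD   = sqrt(sum((thetaGrid(:) - thetaMean).^2 .* postTheta));  % posterior SD
```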
HBM
The HBM is a three-level hierarchical Bayesian model used to estimate contrast thresholds across tests, subjects, and blocks with the inclusion of covariance of hyperparameters across blocks at the population level (Figure 4c). Here's an overview of the key components of the HBM. 
Population level
The probability distribution of the contrast threshold hyperparameter η, which consists of contrast thresholds in all blocks at the population level, is modeled as a mixture of a truncated K-dimensional Gaussian distribution with mean μ and covariance Σ:  
\begin{eqnarray}\vphantom{\sum} p\left( \eta \right) = \mathcal{N}\left( {\eta,\mu,\Sigma } \right)T\left( {a,b} \right)p\left( \mu \right)p\left( \Sigma \right),\qquad\end{eqnarray}
(6a)
where the truncation boundaries a = −1.5 and b = 0 (log10 units) are identical to the boundaries of the uniform prior used in the BIP. In this application, we set Σ = δΣHBMv, in which ΣHBMv is the covariance matrix of the estimated contrast threshold across all subjects from another HBM model (HBMv) that only considers variance but not the covariance between thresholds across subjects (see Supplementary Materials B), and δ is a scaling factor:  
\begin{eqnarray}p\left( \eta \right) = \mathcal{N}\left( {\eta,\mu,\delta {\Sigma _{{\rm{HBMv}}}}} \right)T\left( {a,b} \right)p\left( \mu \right)p\left( \delta \right),\qquad\end{eqnarray}
(6b)
where p(μ) and p(δ) are distributions of μ and δ. 
Subject level
The probability distribution of the contrast threshold hyperparameter τik of subject i at the subject level is modeled as a mixture of truncated Gaussian distributions with mean ρik and standard deviation ε, with distributions p(ρik|ηk) and p(ε):  
\begin{eqnarray}p\left( {{\tau _{ik}}{\rm{|}}{\eta _k}} \right) = \mathcal{N}\left( {{\tau _{ik}},{\rho _{ik}},\varepsilon } \right)T\left( {a,b} \right)p\left( {{\rho _{ik}}|{\eta _k}} \right)p\left( \varepsilon \right),\quad\end{eqnarray}
(7)
in which ρik is conditioned on ηk
Test level
The probability distribution of the contrast threshold parameters θijk is conditioned on τik. The probability of obtaining the entire dataset is computed using probability multiplication, which involves all levels of the model and the likelihood function based on the trial data:  
\begin{eqnarray} p\left( Y_{1:I,1:J,1:K} \,|\, X \right) &=& \prod_{i = 1}^{I} \prod_{j = 1}^{J} \prod_{k = 1}^{K} \prod_{l_{ijk} = 1}^{L_{ijk}} p\left( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) p\left( \theta_{ijk} \,|\, \tau_{ik} \right) p\left( \tau_{ik} \,|\, \eta_{k} \right) p\left( \eta \right) p\left( \beta \right) \nonumber\\ &=& \prod_{i = 1}^{I} \prod_{j = 1}^{J} \prod_{k = 1}^{K} \prod_{l_{ijk} = 1}^{L_{ijk}} p\left( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) p\left( \theta_{ijk} \,|\, \tau_{ik} \right) p\left( \rho_{ik} \,|\, \eta_{k} \right) \mathcal{N}\left( \tau_{ik}, \rho_{ik}, \varepsilon \right) T\left( a, b \right) p\left( \varepsilon \right) \nonumber\\ && \times\, p\left( \eta, \mu, \delta \Sigma_{\rm HBMv} \right) T\left( a, b \right) p\left( \mu \right) p\left( \delta \right) p\left( \beta \right), \end{eqnarray}
(8)
where X = (θ1: I, 1: J, ρ1: I, 1: K, μ, δ, ε, β) are all the parameters and hyperparameters in the HBM. 
Bayes' rule is used to compute the joint posterior distribution of X, which includes all parameters and hyperparameters of contrast thresholds across all blocks. This computation involves integrating over all possible values of X:  
\begin{equation} p\left( X \,|\, Y_{1:I,1:J,1:K} \right) = \frac{ \prod_{i = 1}^{I} \prod_{j = 1}^{J} \prod_{k = 1}^{K} \prod_{l_{ijk} = 1}^{L_{ijk}} p\left( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) p\left( \theta_{ijk} \,|\, \tau_{ik} \right) p\left( \rho_{ik} \,|\, \eta_{k} \right) \mathcal{N}\left( \tau_{ik}, \rho_{ik}, \varepsilon \right) T\left( a, b \right) p_0\left( \varepsilon \right) p\left( \eta, \mu, \delta \Sigma_{\rm HBMv} \right) T\left( a, b \right) p_0\left( \mu \right) p_0\left( \delta \right) p_0\left( \beta \right) }{ \int \prod_{i = 1}^{I} \prod_{j = 1}^{J} \prod_{k = 1}^{K} \prod_{l_{ijk} = 1}^{L_{ijk}} p\left( r_{l_{ijk}} \,|\, \theta_{ijk}, \beta, c_{l_{ijk}} \right) p\left( \theta_{ijk} \,|\, \tau_{ik} \right) p\left( \rho_{ik} \,|\, \eta_{k} \right) \mathcal{N}\left( \tau_{ik}, \rho_{ik}, \varepsilon \right) T\left( a, b \right) p_0\left( \varepsilon \right) p\left( \eta, \mu, \delta \Sigma_{\rm HBMv} \right) T\left( a, b \right) p_0\left( \mu \right) p_0\left( \delta \right) p_0\left( \beta \right) dX }, \end{equation}
(9)
where the denominator is an integral across all possible values of X and is a constant for a given dataset and HBM; p0(μ), p0(δ), p0(ε), and p0(β) are the prior distributions of μ, δ, ε, and β:  
\begin{equation}{p_0}\left( \mu \right) = {\mathcal{U}_K}\left( { - 1.5,0} \right),\end{equation}
(10a)
 
\begin{equation}{p_0}\left( \delta \right) = \mathcal{U}\left( {0.8,1.25} \right),\end{equation}
(10b)
 
\begin{equation}{p_0}\left( {\frac{1}{{{\varepsilon ^2}}}} \right) = {\rm{\Gamma }}\left( {15,1} \right),\end{equation}
(10c)
 
\begin{equation}{p_0}\left( \beta \right) = \mathcal{U}\left( {1,4} \right),\end{equation}
(10d)
where \({\mathcal{U}_K}( {a,b} )\) denotes a uniform distribution between a and b in each of the K dimensions. The priors for μ and β, denoted as p0(μ) and p0(β), respectively, remain consistent with those used in the BIP. However, due to the additional variability introduced by the Gaussian random variables, the priors in the HBM are less informative than those in the BIP. 
The HBM estimates the joint posterior distribution of contrast thresholds in all blocks across all tests and subjects in which the contrast threshold estimates mutually constrain each other. This allows for more robust and interconnected estimates of contrast threshold. 
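To make the conditional structure of Equations 6 to 8 concrete, the following sketch draws one set of thresholds from the hierarchy in the generative direction (population → subject → test). It is only an illustration under simplifying assumptions: the covariance matrix, the subject-level conditional p(ρik|ηk), the test-level distribution, and all numerical values are invented, and rejection sampling and clamping stand in for the truncation. The actual estimation inverts this hierarchy with MCMC (Equation 9) rather than sampling forward:

```matlab
% Forward draw from the three-level HBM hierarchy (illustration only).
K     = 49;                          % number of blocks
a     = -1.5;  b = 0;                % truncation bounds in log10 contrast
mu    = linspace(-0.4, -0.9, K)';    % assumed population mean learning curve
Sig   = 0.05 * exp(-abs((1:K)' - (1:K)) / 10);   % assumed delta*Sigma_HBMv (smooth covariance)

% Population level (Eq. 6b): truncated multivariate Gaussian via rejection sampling.
eta = mu + chol(Sig, 'lower') * randn(K, 1);
while any(eta < a | eta > b)
    eta = mu + chol(Sig, 'lower') * randn(K, 1);
end

% Subject level (Eq. 7): rho_ik conditioned on eta_k, then tau_ik around rho_ik.
epsSD = 0.1;                          % assumed subject-level SD (epsilon)
rho_i = eta + 0.05 * randn(K, 1);     % assumed form of p(rho_ik | eta_k)
tau_i = min(max(rho_i + epsSD * randn(K, 1), a), b);   % clamping stands in for truncation

% Test level: theta_i1k conditioned on tau_ik (assumed Gaussian with a small SD).
theta_i1 = tau_i + 0.05 * randn(K, 1);
```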
Computing the joint posterior distribution
We used the R (R Core Team, 2003) function run.jags to call JAGS (Plummer, 2003) and generate representative samples of θijk in three Markov chain Monte Carlo (MCMC) chains for subject i in block k, using the BIP through a random walk procedure (Kruschke, 2014). Each chain produced 5000 kept samples (with a thinning ratio of 10) after a burn-in phase of 10,000 steps and 200,000 adaptation steps. Similarly, for the HBM we calculated 5000 kept samples (with a thinning ratio of 10) of the joint posterior distribution of θijk (K × 60 − 1 parameters), ρik (K × 60 − 1 parameters), δ (1 parameter), μ (K parameters), ε (1 parameter), and β (1 parameter) in three MCMC chains after a burn-in phase of 20,000 steps and 100,000 adaptation steps. The number of adaptation steps was determined to ensure convergence, following the Gelman and Rubin diagnostic rule (Gelman & Rubin, 1992), which is based on the ratio of between- and within-MCMC variances of each parameter, that is, the variance of the samples across MCMC chains divided by the variance of the samples within each chain. The model was deemed “converged” when the ratios for all the parameters were smaller than 1.05. We applied these models to both pretraining and training data, which were grouped into block sizes of L = 320, 160, 80, 40, 20, and 10 trials, with K = 7, 13, 25, 49, 97, and 193. 
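Convergence can be checked with a standard Gelman-Rubin potential scale reduction statistic; the sketch below is our own implementation of the common formula for a single parameter (the exact variant computed by run.jags may differ):

```matlab
function Rhat = gelmanRubin(chains)
% Gelman-Rubin potential scale reduction factor for one parameter.
% chains : nSamples x nChains matrix of kept MCMC samples.
[n, ~]  = size(chains);
chainMu = mean(chains, 1);                 % per-chain means
W       = mean(var(chains, 0, 1));         % within-chain variance
B       = n * var(chainMu, 0);             % between-chain variance
varPlus = (n - 1) / n * W + B / n;         % pooled posterior variance estimate
Rhat    = sqrt(varPlus / W);               % "converged" when Rhat < 1.05 for all parameters
end
```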
Statistical analysis
We initially estimated and evaluated the block contrast thresholds (θi1k) for all subjects using the two methods, regardless of training accuracy and feedback conditions. In this case, j = 1 because each subject underwent testing only once. Subsequently, we unblinded the data to conduct group-level statistical analyses and recover the trial-by-trial learning curve from the estimated thresholds. 
Goodness of fit
We assessed and compared goodness of fit among the two methods and block sizes using the Bayesian predictive information criterion (BPIC) (Ando, 2007; Ando, 2011). The BPIC calculates the likelihood of the data based on the joint posterior distribution of the model's parameters while also penalizing for model complexity. 
Standard deviation
To measure variability or uncertainty at the test level for the two methods (Clayton & Hills, 1993; Edwards, Lindman, & Savage, 1963), we used the standard deviation (SD) of the posterior distribution of θi1k
Comparing the learning curves
To assess the quality of the learning curves obtained using the BIP and HBM at different block sizes, we conducted a linear regression analysis using the R function lm(R Core Team, 2003). The steps involved in this analysis were as follows: 
  • 1. We constructed the average learning curves for the six experimental groups based on a single random sample (m∈[1,M], M is the number of total samples) drawn from the joint distribution of log contrast threshold θ1: I, 1, 1: K obtained from the BIP or HBM solution at a specific block size.
  • 2. We fitted a linear regression model to the average learning curve \({\bar \theta _{mgk}}\), which is expressed as:  
    \begin{eqnarray}{\hat \theta _{mgk}} = {b_{mg}} + {\gamma _{mg}}{\rm{lo}}{{\rm{g}}_{10}}\left( {{{\bar L}_{gk}}} \right) + {\xi _{mg}}.\qquad\end{eqnarray}
    (11)
    Here, \({\hat \theta _{mgk}}\) represents the predicted average log10 contrast threshold for sample m of group g, and bmg, γmg, and ξmg are the intercept, slope (learning rate), and residual noise for sample m of group g. For the QUEST data (k = 1), we used the average midpoint of each group: \({\bar L_{g1}}\) = 36.7, 32.5, 31.7, 36.7, 34.2, and 34.5 for g = 1, …, 6. For the staircase data (k = 2, 3, …, K ), \({\bar L_{gk}}\) is the trial number at the midpoint of each block.
  • 3. We repeated steps (1) and (2) for a total of M = 3000 times.
  • 4. We computed the mean learning rate as:  
    \begin{eqnarray}{\bar \gamma _g} = \mathop \sum \limits_{m = 1}^M \frac{{{\gamma _{mg}}}}{M},\qquad\end{eqnarray}
    (12a)
    We also calculated the standard deviation of \({\bar \gamma _g}\;\)as:  
    \begin{eqnarray}{\widehat {SD}_{{{\bar \gamma }_g}}} = \sqrt {\frac{{\mathop \sum \nolimits_{m = 1}^M {{\left( {{\zeta _{{\gamma _{mg}}}}/4} \right)}^2}}}{M} + \frac{{\mathop \sum \nolimits_{m = 1}^M {{\left( {{\gamma _{mg}} - {{\bar \gamma }_g}} \right)}^2}}}{{M - 1}}} \qquad\end{eqnarray}
    (12b)
    Here, \({\zeta _{{\gamma _{mg}}}}\) represents the 95% credible interval of γmg obtained from the linear regression, which is equivalent to four standard deviations of the underlying γmg distribution in MCMC sample m.
  • 5. We repeated steps (1) to (4) for each of the five block sizes.
We then used the 95% confidence intervals (\( \pm 2{\widehat {SD}_{{{\bar \gamma }_g}}}\)) of the γmg distributions to determine whether there was a significant difference in the level of learning between any two groups. 
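The resampling procedure in steps 1 to 5 can be sketched as follows (MATLAB of our own; the paper used R's lm, the containers thetaSamples, groupOf, and Lmid are hypothetical, a single vector of block midpoints is used even though the QUEST midpoint differs by group, and the per-fit credible-interval term of Equation 12b is omitted for brevity):

```matlab
% thetaSamples : M x I x K array of posterior samples of theta_{i,1,k}
% groupOf      : 1 x I vector of group labels (1..6)
% Lmid         : 1 x K vector of trial numbers at the block midpoints
M        = size(thetaSamples, 1);
gamma_mg = zeros(M, 6);                        % learning-rate samples per group
for m = 1:M
    for g = 1:6
        % Step 1: group-average learning curve for this MCMC sample.
        thetaBar = squeeze(mean(thetaSamples(m, groupOf == g, :), 2));
        % Step 2: linear regression of threshold on log10 trial number (Eq. 11).
        coefs = polyfit(log10(Lmid(:)), thetaBar(:), 1);
        gamma_mg(m, g) = coefs(1);             % slope = learning rate gamma_mg
    end
end
% Step 4: mean learning rate per group (Eq. 12a) and its spread across samples.
gammaBar = mean(gamma_mg, 1);
gammaSD  = std(gamma_mg, 0, 1);                % omits the per-fit CI term of Eq. 12b
```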
Identifying component processes
Our primary focus was on the method that produced the most precise contrast threshold estimates, which turned out to be the HBM. For each block size, we calculated the average learning curve from the HBM estimates of the five groups (Groups 2 to 6) that exhibited similar levels of learning based on the results of the linear regression analysis, excluding the low training accuracy without feedback group (Group 1), which exhibited a very small amount of learning. This average learning curve is denoted as \({\bar \theta _k}\) and is computed as follows:  
\begin{equation}{\bar \theta _k} = \frac{1}{5M}\mathop \sum \limits_{g = 2}^6 \mathop \sum \limits_{m = 1}^M {{\bar \theta }_{mgk}}.\end{equation}
(13)
 
We presented a generative model in Figure 1a, which comprises four latent component processes. Below, we provide a mathematical framework based on this generative model and apply it to identify these component processes from the data. The generative models include the following components: 
  • General learning: γlog10(l) + b, where b represents the initial threshold and γ is the learning rate.
  • Between-session forgetting, or consolidation: This is depicted as a step function at the beginning of each daily session, characterized by a height δs, which can vary across sessions (s);
  • Within-session rapid relearning: Modeled as an elbow function with a rapid linear learning rate τ and an asymptotic level ds, which can vary across sessions (s);
  • Within-session adaptation or deterioration: Represented as a linear function with a rate φs, which can vary across sessions (s).
We then used the average predicted learning curves from the generative models, corresponding to the same block size (Figures 1b through 1f), to fit the data. 
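To make the four components and the effect of block averaging concrete, the sketch below generates a trial-by-trial learning curve from one specific parameterization of the generative model and then block-averages it, mirroring Figures 1a through 1f. The functional forms follow the list above; all parameter values and the constant-offset treatment of forgetting are invented for illustration:

```matlab
% Trial-by-trial generative learning curve with four component processes
% (all parameter values are invented for illustration).
nSessions = 6;  trialsPerSession = 320;  L = 20;   % L: block size for averaging
gammaRate = -0.15;  b0 = -0.4;   % general learning: gammaRate*log10(l) + b0
delta     =  0.05;               % between-session forgetting: step at each session start
tauRate   = -0.002; d = -0.05;   % within-session rapid relearning: elbow down to asymptote d
phi       =  2e-4;               % within-session adaptation: slow linear rise

theta = zeros(nSessions * trialsPerSession, 1);
for s = 1:nSessions
    t = (1:trialsPerSession)';                 % trial index within session s
    l = (s - 1) * trialsPerSession + t;        % cumulative trial index
    general    = gammaRate * log10(l) + b0;
    forgetting = delta * (s > 1);              % offset added from session 2 onward
    relearning = max(tauRate * t, d);          % rapid linear drop until the asymptote d
    adaptation = phi * t;
    theta(l)   = general + forgetting + relearning + adaptation;
end

% Block-average the latent trial-by-trial curve, as in Figures 1b-1f.
thetaBlocks = mean(reshape(theta, L, []), 1)';
```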
To identify the best-fitting model for the average learning curve at each block size, we constructed a model lattice. The saturated model contained a total of 19 parameters: γ, b, τ, with δs (five parameters), ds (five parameters), and φs (six parameters) allowed to vary across sessions. In contrast, the most reduced model had only two parameters for general learning. 
We used the MATLAB function fminsearch, which implements a simplex search method (Lagarias, Reeds, Wright, & Wright, 1998), for data fitting. The fit minimized the sum of squared errors (SSE) between model predictions and observed values, \({\rm{SSE}} = \sum_{k = 1}^{K} ( {\bar \theta }_k - {\widehat {\bar \theta }}_k )^2\). We assessed the goodness of fit using R2, quantifying the proportion of variance explained:  
\begin{eqnarray}{\rm{\;}}{R^2} = 1 - \frac{{\mathop \sum \nolimits_{k = 1}^K {{\left( { {{\bar \theta }_k} - {{\widehat {\bar \theta }}_k}} \right)}^2}}}{{\mathop \sum \nolimits_{k = 1}^K {{\left( { {{\bar \theta }_k} - \skew3\bar{\bar \theta } } \right)}^2}}},\qquad\end{eqnarray}
(14a)
 
\begin{equation}\skew3\bar{\bar \theta } = \frac{1}{K}\mathop \sum \limits_{k = 1}^K {\bar \theta _k}.\end{equation}
(14b)
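As an illustration of this fitting step, the following sketch (ours; thetaBar and Lmid are hypothetical stand-ins for the average learning curve and the block-midpoint trial numbers) fits the two-parameter general-learning model with fminsearch and computes R2 per Equation 14; the fuller models are fitted the same way with a richer prediction function:

```matlab
% Fit the reduced generative model (general learning only) to the average
% block-by-block learning curve thetaBar (K x 1) at block midpoints Lmid (K x 1).
pred  = @(p, x) p(1) * log10(x) + p(2);                 % gamma*log10(l) + b
sse   = @(p) sum((thetaBar - pred(p, Lmid)).^2);        % objective for fminsearch
pHat  = fminsearch(sse, [-0.1; -0.5]);                  % starting guess for [gamma; b]
resid = thetaBar - pred(pHat, Lmid);
R2    = 1 - sum(resid.^2) / sum((thetaBar - mean(thetaBar)).^2);   % Eq. 14
```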
 
To statistically compare the goodness of fit between any two nested models, we used an F-test:  
\begin{eqnarray}F\left( {d{f_1},d{f_2}} \right) = \frac{{\left( {r_{full}^2 - r_{reduced}^2} \right)/d{f_1}}}{{\left( {1 - r_{full}^2} \right)/d{f_2}}}.\qquad\end{eqnarray}
(15)
 
Here, df1 = kfullkreduced, df2 = Kkfull, kfull and kreduced are the numbers of parameters of the full and reduced models, respectively, and K is the number of blocks (data points). 
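A minimal implementation of Equation 15 (ours; the p value is obtained from base MATLAB's regularized incomplete beta function) is:

```matlab
function [F, p] = nestedFtest(r2Full, r2Reduced, kFull, kReduced, K)
% F-test comparing nested models fitted to K block thresholds (Eq. 15).
df1 = kFull - kReduced;                 % extra parameters in the full model
df2 = K - kFull;                        % residual degrees of freedom
F   = ((r2Full - r2Reduced) / df1) / ((1 - r2Full) / df2);
% p value: upper tail of the F distribution via the regularized incomplete beta.
p   = betainc(df2 / (df2 + df1 * F), df2/2, df1/2);
end
```

For example, comparing the 19-parameter saturated model against a 13-parameter reduced model over K = 97 blocks would use df1 = 6 and df2 = 78.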
For block sizes L = 10, 20, 40, and 80, model comparisons were conducted in two steps. First, the full model was compared to seven reduced models where parameters of between-session forgetting, within-session rapid relearning, or within-session adaptation were constrained across sessions. Therefore the seven reduced models corresponded to all combinations of possible cross-session constraints placed on the parameters versus the full model, which did not constrain any of the parameters. Second, the best-fitting model from step one was compared to seven reduced models without one or more components. For block sizes L = 160 and 320, the first step used for L = 10, 20, 40, and 80 was skipped because there were not sufficient data in each session to constrain the models in the step. Instead, a model with δs, τs, ds, and φs constrained across sessions was compared with reduced models without one or more component processes. 
Results
Goodness of fit
With block sizes of 10, 20, 40, 80, 160, and 320 trials, we calculated the BPIC values for the two models (Table 1). Among these models, the HBM with a block size of 10 trials achieved the best fit. Notably, for a given block size, the fit provided by the HBM outperformed that of the BIP. As the block size decreased, the fit improved for the HBM. However, in the case of the BIP, the fit improved from L = 320 to 160 but then worsened from 160 to 10, likely because of biases introduced by the uninformative prior. 
Table 1.
BPIC values for the BIP and HBM.
Posterior distributions
Figure 5a provides a visual representation of a two-dimensional marginal posterior distribution of hyperparameters at the population level obtained from the HBM, demonstrating strong correlation between blocks. By incorporating covariance hyperparameters, the HBM allowed us to quantify the relationships between contrast thresholds across blocks and subjects, generating more informative priors (Figure 3b) that improved the accuracy and precision of the estimates, especially with smaller block sizes. The correlations between contrast thresholds in pairs of blocks ranged from −0.03 to 0.54, −0.04 to 0.58, −0.06 to 0.65, −0.09 to 0.78, −0.13 to 0.84, and −0.13 to 0.85 for block sizes of 10, 20, 40, 80, 160, and 320 trials, respectively. 
Figure 5.
Illustrations of posterior distributions of threshold hyperparameters for two consecutive blocks (k = 86, 87) at the population level at L = 20 from the HBM (a); hyperparameters at the subject level of a typical subject (i = 27) in blocks (k = 86, 87) from the HBM (b); and parameters at the test level from the HBM (c) and BIP (d) of the same subject and blocks in (b).
Figure 5b presents a two-dimensional marginal posterior distribution of hyperparameters at the subject level from HBM. Figures 5c, d provide a visualization of two-dimensional marginal posterior distributions of parameters at the test level from the HBM and BIP, respectively. The HBM exhibited much higher precision than the BIP at the test level (Figures 5c vs. 5d), attributable to the informative prior provided by the subject level distribution (Figure 5b). 
Table 2 shows the average standard deviations of the posterior distributions of θi1k across all subjects. Notably, the HBM consistently produced more precise θi1k estimates across all block sizes. For the QUEST data, the average SD was largely consistent across block sizes, and the HBM reduced the average SD (0.045 log10 units) by 64% relative to the BIP (0.125 log10 units). For the staircase data with block sizes of L = 10, 20, 40, 80, 160, and 320 trials, the average SD was 0.089, 0.077, 0.065, 0.053, 0.043, and 0.034 log10 units for the HBM; and 0.270, 0.200, 0.125, 0.067, 0.047, and 0.035 log10 units for the BIP. The HBM reduced the SD by 67%, 61%, 48%, 21%, 9%, and 3% relative to the BIP for these block sizes. 
Table 2.
Average standard deviations of the posterior distributions of θi1k.
Lastly, considering block sizes of L = 10, 20, 40, 80, 160, and 320 trials, the mean and standard deviations of β were 2.36 ± 0.055, 2.10 ± 0.043, 2.11 ± 0.043, 2.18 ± 0.044, 2.16 ± 0.043, and 2.08 ± 0.039 from the BIP, and 2.00 ± 0.034, 2.05 ± 0.040, 2.10 ± 0.043, 2.12 ± 0.044, 2.11 ± 0.043, and 2.06 ± 0.040 from the HBM. In comparison, the HBM reduced the SD of β by 38% and 7% relative to BIP for L = 10 and 20, whereas SDs of β from the two methods were similar for all the other larger block sizes. 
Linear regression analysis
Figure 6 presents the results of linear regression analysis on the average learning curves of Group 6 using the BIP and HBM solutions in varying block sizes. The patterns of results of the other five groups are similar to that of Group 6. They are presented in the Supplementary Materials
Figure 6.
Results of the linear regression analysis on the average learning curves of Group 6 (high training accuracy with feedback) using the BIP and HBM solutions with block sizes of L = (a) 10, (b) 20, (c) 40, (d) 80, (e) 160, and (f) 320 trials. Error bars: standard error.
Table 3 summarizes the average learning rates \({\bar \gamma _g}\) and their standard deviations \({\widehat {SD}_{{{\bar \gamma }_g}}}\;\)for the six groups across different block sizes from the BIP and HBM. The HBM generated more precise learning rate estimates in all groups and across all block sizes than the BIP. 
Table 3.
Average learning rate \({{\rm{\bar \gamma }}_{\rm{g}}}\) and standard deviation \({\widehat {{\rm{SD}}}_{{{{\rm{\bar \gamma }}}_{\rm{g}}}}}\).
Based on the 95% confidence intervals of the estimated learning rates (\( \pm 2{\widehat {SD}_{{{\bar \gamma }_g}}}\)), the BIP did not detect significant learning consistently across block sizes. On the other hand, the HBM detected significant learning for all groups in all block sizes except Group 1 (training at 65% accuracy without feedback) when L = 160 and 320. The learning rates estimated from the HBM solutions were not significantly different across block sizes within each group. Averaged across block sizes, the learning rate of Group 1 (low training accuracy without feedback) was significantly lower (all p < 0.01) than that of all the other five groups, which were not significantly different from each other (all p > 0.27) (Figure 7). These results are consistent with the original studies (Liu et al., 2010; Liu et al., 2012), except that the original studies concluded that there was no significant learning in Group 1 based on analysis at a coarse temporal resolution (L = 80 in Liu et al. (2010) and 160 in Liu et al. (2012)), whereas we found significant learning in Group 1 in the high temporal resolution analysis. In all six groups, the most precise estimates were obtained with a block size of L = 10. Across all six groups, \({\widehat {SD}_{{{\bar \gamma }_g}}}\) decreased with decreasing block size; the magnitude of reduction became very small between block sizes of 10 and 20. Although the standard deviation of the estimated threshold in each block increased with decreasing block size (Table 2), the increased number of data points in smaller block sizes improved the overall precision of the estimated learning rate. 
Figure 7.
Distributions of the average learning rate (γmg) across six block sizes of the six groups: Group 1 (low training accuracy without feedback; solid line), Group 2 (low training accuracy with feedback; dashed line), Group 3 (mixed training accuracy without feedback; dotted line), Group 4 (mixed training accuracy with feedback; dot-dash line), Group 5 (high training accuracy without feedback; long dash line), and Group 6 (high training accuracy with feedback; two-dash line). γmg distributions of Groups 3, 4, 5 and 6 overlap almost completely and appear as the dark orange distributions on the left.
In addition to the different precision of the estimated slopes, there are obvious differences between the estimated learning curves from the two methods in smaller block sizes, suggesting large biases in some of the estimates (Figure 6). Because the posterior from Bayesian inference approaches the truth with more data, we used the results from the linear regression of the 320-trial HBM as the truth to compute the bias of the BIP and HBM estimates with various block sizes. As shown in Table 4, the HBM generated unbiased estimates in all groups across all block sizes. We conducted t-tests on the estimated thresholds from both methods at each block size across all groups. The BIP generated unbiased estimates only with large block sizes (p = 0.961 and 0.077 for L = 320 and 160, respectively), but exhibited significant bias with small block sizes (all p < 0.001 for L = 80, 40, 20, and 10). The results reflected the influence of the uninformative (BIP) and informative (HBM) priors on the posterior distributions (Figure 3). 
Table 4.
Average bias of the estimated thresholds from the BIP and HBM.
Functioning of the methods
To analyze the functioning of the two methods, we examined how they generated the priors and posteriors of θi1k. The priors for two typical subjects in two distinct blocks (i = 39, 42; k = 23, 44) are shown in Figure 8, and the posteriors for one of the two subjects in one of the blocks (i = 39, k = 44) are depicted in Figure 9, both with a block size of L = 40. 
Figure 8.
 
Prior distributions of θi1k for two subjects in two distinct blocks (i = 39, 42; k = 23, 44) generated by (a) BIP and (b) HBM. In (a), all four priors are identical. In (b), priors for the two subjects in the two blocks differ (solid line: i = 39, k = 23; dashed line: i = 39, k = 44; dotted line: i = 42, k = 23; dot-dashed line: i = 42, k = 44).
Figure 9.
 
(a) Prior and (b) posterior distributions of θi1k for subject (i = 39) in block (k = 44) obtained from the BIP and HBM.
To compute the prior distributions of θi1k for these two subjects in those two blocks with the two methods, we removed the data of one subject in one block at a time while keeping the data from all the other subjects and blocks unchanged, refit both the BIP and HBM to the data, and repeated the procedure four times (once for each subject-block combination). Because there were no data for these two subjects in those blocks, the posterior distributions of θi1k obtained from the models can be viewed as priors. 
Figure 8 shows the solutions from the two methods. The BIP generated the same uninformative θi1k prior for both subjects in both blocks because it models the contrast threshold of each subject in each block independently and does not model any relationship across blocks or subjects (Figure 8a). The HBM generated different θi1k priors for the two subjects in the two blocks because it incorporates relationships across all subjects and blocks (Figure 8b). 
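The intuition can be illustrated with a toy Gaussian example (a sketch, not the actual HBM): when block thresholds are modeled jointly with a between-block covariance, the conditional distribution of a held-out block given the observed blocks is far narrower than its marginal distribution, which plays the role of the uninformative BIP-style prior. All numbers below are hypothetical.

```python
# Minimal sketch: an informative "prior" for a missing block emerges from the
# covariance with observed blocks, via standard Gaussian conditioning.
import numpy as np

n_blocks = 5
mu = np.full(n_blocks, -1.0)                       # mean log10 threshold per block
rho, sigma = 0.9, 0.1                              # strong between-block correlation
Sigma = sigma**2 * (rho * np.ones((n_blocks, n_blocks)) + (1 - rho) * np.eye(n_blocks))

held_out = 2                                       # index of the "missing" block
obs = [i for i in range(n_blocks) if i != held_out]
y_obs = np.array([-1.12, -1.08, -1.05, -1.02])     # observed thresholds (toy values)

# theta_missing | theta_obs ~ N(m, v)
S_oo = Sigma[np.ix_(obs, obs)]
S_mo = Sigma[held_out, obs]
m = mu[held_out] + S_mo @ np.linalg.solve(S_oo, y_obs - mu[obs])
v = Sigma[held_out, held_out] - S_mo @ np.linalg.solve(S_oo, S_mo)

print(f"marginal SD (ignoring other blocks, BIP-like): {sigma:.3f}")
print(f"conditional SD (using covariance, HBM-like):   {np.sqrt(v):.3f}, mean {m:.3f}")
```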
To illustrate how the two methods generate posterior distributions of θi1k for one of the two subjects in one of the blocks (i = 39, k = 44), we compared the priors of θi1k from the two methods obtained as described above (Figure 9a) with the corresponding posteriors (Figure 9b). Both methods produced θi1k posteriors with distinct peaks. Because the HBM began with a more informative prior, its posterior distribution of θi1k had a smaller SD (0.060 log10 units) than that of the BIP (0.085 log10 units); that is, the BIP SD was approximately 42% larger. 
Component processes
Because of its significantly higher precision compared to the BIP, we conducted analyses of component processes based only on the joint posterior distributions generated by the HBM. For each block size, we fit the full 19-parameter generative model and its various reduced forms to the average empirical learning curve across the five experimental groups that exhibited similar magnitudes of learning. We used fminsearch in MATLAB to find the best-fitting parameters of each model and conducted F-tests on the R² of the nested models for each block size. The best-fitting models for the six block sizes are presented in Figure 10, and their parameters are summarized in Table 5. The largest number of component processes was identified in the best-fitting model when L = 10, 20, and 40; as the block size increased, fewer component processes were identified. 
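A minimal sketch of this model-comparison step is given below in Python (the original analysis used MATLAB's fminsearch); it fits a full and a reduced toy model with a Nelder-Mead search and compares them with a nested-model F-test on the residual sums of squares. The two-parameter power-law stand-in and the simulated learning curve are assumptions for illustration only and are not the paper's 19-parameter generative model.

```python
# Illustrative sketch: nested-model comparison with a Nelder-Mead fit and an F-test.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import f as f_dist

rng = np.random.default_rng(2)
t = np.arange(1, 49) * 40.0                                   # block midpoints in trials
y = -1.0 - 0.15 * np.log10(t) + rng.normal(0, 0.03, t.size)   # toy average learning curve

def rss(pred, y):
    return np.sum((y - pred) ** 2)

# Full model: intercept + log-trial slope; reduced model: intercept only.
fit_full = minimize(lambda p: rss(p[0] + p[1] * np.log10(t), y),
                    x0=[-1.0, 0.0], method="Nelder-Mead")
fit_red = minimize(lambda p: rss(np.full_like(t, p[0]), y),
                   x0=[-1.0], method="Nelder-Mead")

k_full, k_red, n = 2, 1, t.size
F = ((fit_red.fun - fit_full.fun) / (k_full - k_red)) / (fit_full.fun / (n - k_full))
p = f_dist.sf(F, k_full - k_red, n - k_full)
print(f"F({k_full - k_red}, {n - k_full}) = {F:.2f}, p = {p:.4f}")
```

Here a significant F value indicates that the additional parameter of the full model improves the fit beyond what is expected by chance, the same logic used to decide whether a component process is retained in the lattice.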
Figure 10.
 
The best fitting models (solid curves) for six block sizes: (a) 10, (b) 20, (c) 40, (d) 80, (e) 160, (f) 320 (data: orange asterisks).
Table 5.
 
Parameters of the best-fitting models. (General learning rate γ, initial threshold b, within-session adaptation or deterioration φs, between-session forgetting or consolidation δs, within-session rapid relearning rate τ and asymptotic level ds).
For L = 10, 20, and 40, the best-fitting model included all the component processes, with variations in the magnitudes of between-session forgetting. Adding variations in adaptation or within-session rapid relearning by session did not significantly improve the fit (all p > 0.97), whereas removing any component process resulted in a significantly worse fit (all p < 0.004). Notably, the estimated magnitudes of the components were quite consistent across block sizes. 
For L = 80, the best-fitting model included general learning, between-session gain, and within-session adaptation, with variations in the magnitudes of between-session gain. Adding within-session rapid relearning did not significantly improve the fit (p = 0.084), whereas removing any component process resulted in a significantly worse fit (all p < 0.033). For L = 160, the best-fitting model consisted of general learning and between-session gain; adding adaptation and/or relearning did not significantly enhance the fit (all p > 0.096), but removing between-session gain resulted in a significantly worse fit (p = 0.009). For L = 320, the best-fitting model included only general learning, as adding any other component process did not significantly improve the fit (all p > 0.094). 
Effects of lapse rate
At the request of one reviewer, we conducted additional analyses with 10% and 15% lapse rates in the generative model (Equation 1a) to evaluate the impact of the lapse rate, using a block size of L = 40. With both 10% and 15% lapse rates, the estimated block thresholds were not significantly different from the estimates with a 4% lapse rate across all blocks at the subject level (α = 0.05, with Bonferroni correction). We then averaged the estimated block thresholds across the 48 subjects in the five groups that exhibited significant learning (Groups 2 to 6). Relative to the 4% lapse rate, the 10% and 15% lapse rates led to average threshold changes of −0.033 ± 0.003 and −0.063 ± 0.005 log10 units, respectively. Although none of the threshold changes was significant with the 10% lapse rate, 42 out of the 49 threshold changes with the 15% lapse rate were significant at the group level (α = 0.05, with Bonferroni correction). 
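For readers unfamiliar with how the lapse rate enters the model, the sketch below assumes a generic Weibull psychometric function with guessing rate 0.5 and lapse rate λ; the exact parameterization of Equation 1a is not reproduced here, so this form and all parameter values are illustrative assumptions.

```python
# Minimal sketch of a psychometric function with a lapse rate.
import numpy as np

def p_correct(contrast, threshold, beta=2.0, g=0.5, lapse=0.04):
    """Probability correct vs. contrast; Weibull core with guessing and lapse (assumed form)."""
    weibull = 1.0 - np.exp(-(contrast / threshold) ** beta)
    return g + (1.0 - g - lapse) * weibull

c = np.logspace(-2, 0, 7)  # contrasts from 0.01 to 1
for lam in (0.04, 0.10, 0.15):
    print(f"lapse {lam:.2f}:", np.round(p_correct(c, threshold=0.1, lapse=lam), 3))
```

Because a larger assumed lapse rate lowers the upper asymptote, the same response data map onto somewhat different threshold estimates, which is the effect quantified above.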
Next, we submitted the average thresholds from the five groups to the component-process analysis. With both 10% and 15% lapse rates, we identified significant contributions from all component processes, with variations in the magnitudes of between-session forgetting, qualitatively consistent with the results obtained with the 4% lapse rate in the original analysis. Because the larger lapse rates caused consistent reductions in the estimated thresholds across blocks, the overall learning curve was shifted vertically, producing only slight changes in the magnitudes of the parameters of the component processes (Table 6). 
Table 6.
 
Parameters of the best-fitting models with L = 40 with three lapse rates.
Discussion
To uncover the component processes involved in perceptual learning, we developed two nonparametric Bayesian inference procedures aimed at conducting high-temporal-resolution analyses of the learning curve. By integrating between-subject covariance of contrast thresholds across blocks, the HBM generated significantly better fits to the data, smaller standard deviations, and more precise estimates of the learning rates, compared to the BIP across all block sizes. In addition, the HBM generated unbiased block threshold estimates across all groups and block sizes, whereas the BIP only generated unbiased estimates with larger block sizes but exhibited increased bias with smaller block sizes. 
Among the HBM solutions at various temporal resolutions, the analysis at the highest temporal resolution (block size L = 10) yielded the best fit to the trial-by-trial data based on BPIC. It also provided the most precise estimate of the rate of general learning in all six experimental groups, allowing for the identification of the majority of component processes from the average learning curve of the five experimental groups that exhibited a similar rate of general learning. 
With the HBM, we observed significant learning in all groups at all block sizes, except for Group 1 (trained at 65% accuracy without feedback) with block sizes of 160 and 320. Notably, the learning rates of Groups 2 to 6 remained relatively consistent across block sizes and did not differ significantly from each other, whereas Group 1 exhibited significantly lower learning rates than the other five groups. These findings align with the original results (Liu et al., 2010; Liu et al., 2012). In those studies, the learning curves were analyzed at a single coarse temporal resolution (L = 80 and 160, respectively), and significant general learning was reported in five of the six experimental conditions, all with similar learning rates, but not in Group 1. Our results largely corroborate these earlier findings but reveal significant learning in Group 1 when analyzed at higher temporal resolutions, albeit with a much smaller learning rate than the other groups, underscoring the enhanced analytical power of high-temporal-resolution analysis. 
We established a generative model framework incorporating four distinct component processes and applied this model lattice, at varying temporal resolutions, to the average learning curve of the five experimental groups that displayed similar rates of general learning. When we analyzed the data with L = 10, 20, and 40, we identified significant contributions from all component processes, with variations in the magnitudes of between-session forgetting, and the estimated magnitudes of the components were quite consistent across block sizes. However, as we decreased the temporal resolution of the analysis, fewer component processes were identifiable. Specifically, for L = 80 and 160, some or all of the parameters interpreted as between-session forgetting at L = 10, 20, and 40 changed sign, becoming indicative of between-session gain. This suggests that conducting analyses at lower temporal resolutions may lead to misidentification of the underlying component processes. It is worth noting that Yang et al. (2022) reported between-session gain in a contrast detection task analyzed at L = 96; it would be intriguing to re-examine their data at higher temporal resolutions for a more comprehensive understanding. 
Although our primary focus has been on analyzing the learning curve, the methods we have developed can also be effectively used to investigate specificity and transfer in perceptual learning. Many claims regarding specificity or transfer in perceptual learning have relied solely on initial transfer estimates, typically measured as performance during the first assessment after a task switch. Some studies, albeit fewer in number, have examined learning rates following a task switch and have reported instances of accelerated learning or “learning to learn” (Bejjanki et al., 2014; Kattner, Cochrane, Cox, et al., 2017; Liu & Weinshall, 2000; Zhang et al., 2021). However, the granularity of analysis in these studies has often been coarse, preventing a thorough examination of how prior learning impacts the underlying component processes. 
This issue is illustrated in Figure 11, where we present a generative model alongside two levels of averaging for five initial training sessions (days) followed by one transfer session. Figures 11a through 11c illustrate a scenario with complete specificity (no initial transfer) and no change in the general learning rate, whereas Figures 11d through 11f depict complete specificity (no initial transfer) but faster general learning in the transfer task. The magnitude of transfer estimated from the observed performance curves is contingent on the learning rates (Figures 11c vs. 11f). This example underscores the critical role of an accurate estimate of the general learning rate, which requires a fine-grained analysis, in correctly estimating initial transfer. Task switches may affect all the component processes, potentially contaminating empirical estimates of initial transfer. As demonstrated in our study, precise estimation of the component processes can only be achieved through high-temporal-resolution analysis, emphasizing the need for such an approach to accurately estimate initial transfer. 
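A toy simulation makes the point concrete. Assuming, purely for illustration, a power-function learning curve that restarts at the naive threshold after the task switch (complete specificity), the sketch below shows that two learners with identical trial-1 post-switch thresholds can nonetheless yield different first-block averages, which could be misread as different amounts of transfer. The functional form, block size, and rates are all hypothetical.

```python
# Toy illustration: observed "initial transfer" from a block average depends on
# the post-switch learning rate even when true initial transfer is identical.
import numpy as np

def learning_curve(n_trials, start=1.0, rate=0.15):
    t = np.arange(1, n_trials + 1)
    return start * t ** (-rate)          # threshold decreases as a power function (assumed)

L = 40                                    # first post-switch block (assumed block size)
slow = learning_curve(L, start=1.0, rate=0.05)
fast = learning_curve(L, start=1.0, rate=0.30)

print(f"trial-1 threshold (true initial transfer): slow {slow[0]:.3f}, fast {fast[0]:.3f}")
print(f"first-block average threshold:             slow {slow.mean():.3f}, fast {fast.mean():.3f}")
# Identical trial-1 thresholds (no transfer in either case), but the block average
# is lower for the fast learner and could be misinterpreted as partial transfer.
```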
Figure 11.
 
(a, d) The trial-by-trial generative model of the learning curve (black curve) for five sessions (days) of training followed by one session in a transfer task. In both rows, there is complete specificity. In the top row (a, b, c), the general learning rate is the same in the initial and transfer tasks; in the bottom row (d, e, f), the general learning rate is faster for the transfer task. The observed immediate transfer depends on the post-transfer learning rate and cannot be estimated accurately if measured at a coarse temporal resolution without an accurate estimate of the learning rate. (Learning curve: black; general learning: yellow; between-session forgetting: purple; within-session re-learning: olive; within-session adaptation: orange.)
The best-fitting generative model not only serves as a basis for understanding the learning curve but also offers a parametric functional form that can be used to construct parametric HBMs for estimating trial-by-trial learning curves. Traditional parametric approaches to learning curve estimation, such as exponential or power functions, are typically applied to individual subjects and are based on relatively simple functional forms. The multicomponent functional form derived from the nonparametric HBM analysis, coupled with the HBM framework, has the potential to enhance these parametric approaches. 
Furthermore, the multicomponent functional form of the learning curve can enhance adaptive testing procedures. In prior work, we developed the adaptive qCD method, employing exponential functions to assess perceptual sensitivity changes (Zhao et al., 2019). This method demonstrated superior accuracy and precision in estimating trial-by-trial learning curves compared to traditional staircase procedures. The functional form developed within the nonparametric HBM framework can augment and further enhance the qCD method. 
In the current article, we assumed that the covariance matrix in the HBM is proportional to the covariance matrix estimated from the HBMv solutions and estimated only the proportionality constant. With this assumption, it took one and two weeks to compute the HBM at 20 trials/block and 10 trials/block, respectively. The approximation was necessary because estimating large covariance matrices (97 × 97 with 20 trials/block; 193 × 193 with 10 trials/block) directly is not feasible within a reasonable time with current computational power. We demonstrated that incorporating the covariance structure greatly improved the precision of the estimated thresholds compared to the HBMv. With more computational power, it may become possible to estimate the covariance structure within the HBM directly and further improve the solutions. The consistency of the linear regression results and of the quantification of the components across small block sizes (L = 10, 20, and 40) suggests that the current solutions are likely close to optimal. 
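The following sketch illustrates the parameter savings of this approximation; the base covariance matrix below is a made-up stand-in for the matrix estimated from the HBMv solutions, and only a single scaling constant would be estimated in its place.

```python
# Sketch of the proportional-covariance approximation: Sigma_HBM = c * Sigma_base,
# with Sigma_base fixed (here a hypothetical matrix) and only the scalar c free.
import numpy as np

K = 97                                             # e.g., 97 blocks at 20 trials/block
idx = np.arange(K)
Sigma_base = 0.01 * 0.9 ** np.abs(idx[:, None] - idx[None, :])  # hypothetical fixed shape

def hbm_covariance(c, base=Sigma_base):
    """Only the scalar c is estimated; all K*(K+1)/2 entries of the base stay fixed."""
    return c * base

print("free parameters with a full covariance matrix:", K * (K + 1) // 2)
print("free parameters with the proportional approximation: 1")
```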
The MCMC sampling algorithm in JAGS is sequential; therefore, the analysis may not run much faster on a high-performance computing cluster. However, parallel MCMC sampling methods currently under active development may reduce computation time and potentially allow us to estimate the covariance matrices in the HBM directly. 
Although we have developed the method within the context of perceptual learning, the nonparametric HBM framework is versatile and can be applied to the study of learning curves in various learning domains. It can also be harnessed to improve estimates of human performance parameters, such as d′, response time, and threshold, across multiple time points in learning or longitudinal studies, or across diverse experimental conditions, such as different spatial frequencies in contrast sensitivity function (CSF) tests or varying temporal frequencies in assessments of temporal modulation functions. 
In summary, the nonparametric HBM offers a powerful tool to significantly increase the temporal resolution of learning curves and unveil crucial component processes in perceptual learning. It provides a versatile framework to generate accurate and precise estimates of human performance in experiments with hierarchical designs. 
Acknowledgments
Supported by the National Eye Institute (EY017491 and EY032125). 
Commercial relationships: Y. Zhao, None; J. Liu, None; B.A. Dosher, None; Z.-L. Lu, Adaptive Sensory Technology, Inc. (I), Jiangsu Juehua Medical Technology Co, LTD (I). 
Corresponding author: Zhong-Lin Lu. 
Email: zhonglin@nyu.edu. 
Address: 4 Washington Place, Center for Neural Science, New York University, New York, NY 10003, USA. 
Footnotes
1  The test level is an essential component of the general HBM framework, enabling the modeling of repeated tests. We have retained it in the development to maintain continuity with our previous work and to enable researchers to fit and test the HBM using split data (e.g., interleaved staircases) in perceptual learning.
References
Ahn, W.-Y., Krawitz, A., Kim, W., Busemeyer, J. R., & Brown, J. W. (2011). A model-based fMRI analysis with hierarchical Bayesian parameter estimation. Journal of Neuroscience, Psychology, and Economics, 4(2), 95–110. [CrossRef] [PubMed]
Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika, 94(2), 443–458, https://doi.org/10.1093/biomet/asm017. [CrossRef]
Ando, T. (2011). Predictive Bayesian model selection. American Journal of Mathematical and Management Sciences, 31(1–2), 13–38, https://doi.org/10.1080/01966324.2011.10737798.
Ball, K., & Sekuler, R. (1982). A specific and enduring improvement in visual motion discrimination. Science, 218(4573), 697–698. [CrossRef] [PubMed]
Bang, J. W., Shibata, K., Frank, S. M., Walsh, E. G., Greenlee, M. W., Watanabe, T., & Sasaki, Y. (2018). Consolidation and reconsolidation share behavioural and neurochemical mechanisms. Nature Human Behaviour, 2(7), Article 7, https://doi.org/10.1038/s41562-018-0366-8. [CrossRef]
Beard, B. L., Levi, D. M., & Reich, L. N. (1995). Perceptual-learning in parafoveal vision. Vision Research, 35(12), 1679–1690. [PubMed]
Bejjanki, V. R., Zhang, R., Li, R., Pouget, A., Green, C. S., Lu, Z.-L., … Bavelier, D. (2014). Action video game play facilitates the development of better perceptual templates. Proceedings of the National Academy of Sciences, 111(47), 16961–16966, https://doi.org/10.1073/pnas.1417056111.
Censor, N., Harris, H., & Sagi, D. (2016). A dissociation between consolidated perceptual learning and sensory adaptation in vision. Scientific Reports, 6(1), Article 1, https://doi.org/10.1038/srep38819.
Censor, N., Karni, A., & Sagi, D. (2006). A link between perceptual learning, adaptation and sleep. Vision Research, 46(23), 4071–4074, https://doi.org/10.1016/j.visres.2006.07.022. [PubMed]
Clayton, D., & Hills, M. (1993). Statistical models in epidemiology. Oxford, UK: Oxford University Press.
Dale, G., Cochrane, A., & Green, C. S. (2021). Individual difference predictors of learning and generalization in perceptual learning. Attention, Perception, & Psychophysics, 83, 2241–2255, https://doi.org/10.3758/s13414-021-02268-3. [PubMed]
Donovan, I., Szpiro, S., & Carrasco, M. (2015). Exogenous attention facilitates location transfer of perceptual learning. Journal of Vision, 15(10), 11–11. [PubMed]
Dosher, B. A., & Lu, Z.-L. (1998). Perceptual learning reflects external noise filtering and internal noise reduction through channel reweighting. Proceedings of the National Academy of Sciences, 95(23), 13988–13993, https://doi.org/10.1073/pnas.95.23.13988.
Dosher, B. A., & Lu, Z.-L. (2007). The functional form of performance improvements in perceptual learning: Learning rates and transfer. Psychological Science, 18(6), 531–539, https://doi.org/10.1111/j.1467-9280.2007.01934.x. [PubMed]
Dosher, B. A., & Lu, Z.-L. (2020). Perceptual Learning: How Experience Shapes Visual Perception. Cambridge, MA: MIT Press.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70(3), 193–242, https://doi.org/10.1037/h0044139.
Fahle, M., Edelman, S., & Poggio, T. (1995). Fast perceptual learning in hyperacuity. Vision Research, 35(21), 3003–3013. [PubMed]
Fahle, M., & Morgan, M. (1996). No transfer of perceptual learning between similar stimuli in the same retinal position. Current Biology, 6(3), 292–297.
Fahle, M., & Poggio, T. (2002). Perceptual Learning. Cambridge, MA: MIT Press.
Fiorentini, A., & Berardi, N. (1980). Perceptual learning specific for orientation and spatial frequency. Nature, 287, 43–44. [PubMed]
Foley, J. M., & Legge, G. E. (1981). Contrast detection and near-threshold discrimination in human vision. Vision Research, 21(7), 1041–1053, https://doi.org/10.1016/0042-6989(81)90009-2. [PubMed]
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
Green, C. S., Banai, K., Lu, Z., & Bavelier, D. (2018). Perceptual learning. Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience, 2, 1–47.
Hoffman, R. R., Ward, P., Feltovich, P. J., DiBello, L., Fiore, S. M., & Andrews, D. H. (2013). Accelerated Expertise: Training for High Proficiency in a Complex World. New York, NY: Psychology Press.
Hou, F., Lesmes, L. A., Kim, W., Gu, H., Pitt, M. A., Myung, J. I., … Lu, Z.-L. (2016). Evaluating the performance of the quick CSF method in detecting contrast sensitivity function changes. Journal of Vision, 16(6), 18, https://doi.org/10.1167/16.6.18. [PubMed]
Hou, F., Lesmes, L., Bex, P., Dorr, M., & Lu, Z.-L. (2015). Using 10AFC to further improve the efficiency of the quick CSF method. Journal of Vision, 15(9), 2, https://doi.org/10.1167/15.9.2. [PubMed]
Huang, C.-B., Zhou, Y., & Lu, Z.-L. (2008). Broad bandwidth of perceptual learning in the visual system of adults with anisometropic amblyopia. Proceedings of the National Academy of Sciences, 105(10), 4068–4073, https://doi.org/10.1073/pnas.0800824105.
Huxlin, K. R. (2009). Perceptual relearning of complex visual motion after V1 damage in humans. Journal of Neuroscience, 29(13), 3981–3991, https://doi.org/10.1523/JNEUROSCI.4882-08.2009.
Karni, A., & Sagi, D. (1991). Where practice makes perfect in texture discrimination: Evidence for primary visual cortex plasticity. Proceedings of the National Academy of Sciences, 88(11), 4966–4970.
Karni, A., Tanne, D., Rubenstein, B. S., Askenasy, J. J. M., & Sagi, D. (1994). Dependence on rem-sleep of overnight improvement of a perceptual skill. Science, 265(5172), 679–682. [PubMed]
Kattner, F., Cochrane, A., Cox, C. R., Gorman, T. E., & Green, C. S. (2017). Perceptual learning generalization from sequential perceptual training as a change in learning rate. Current Biology: CB, 27(6), 840–846, https://doi.org/10.1016/j.cub.2017.01.046. [PubMed]
Kattner, F., Cochrane, A., & Green, C. S. (2017). Trial-dependent psychometric functions accounting for perceptual learning in 2-AFC discrimination tasks. Journal of Vision, 17(11), 3–3, https://doi.org/10.1167/17.11.3. [PubMed]
Kesten, H. (1958). Accelerated stochastic approximation. The Annals of Mathematical Statistics, 29(1), 41–59, https://doi.org/10.1214/aoms/1177706705.
Kim, W., Pitt, M. A., Lu, Z.-L., Steyvers, M., & Myung, J. I. (2014). A hierarchical adaptive approach to optimal experimental design. Neural Computation, 26(11), 2465–2492, https://doi.org/10.1162/NECO_a_00654. [PubMed]
Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. (2nd ed.). Cambridge, MA: Academic Press.
Kruschke, J. K., & Liddell, T. M. (2018). Bayesian data analysis for newcomers. Psychonomic Bulletin & Review, 25(1), 155–177, https://doi.org/10.3758/s13423-017-1272-1. [PubMed]
Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the Nelder—Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1), 112–147, https://doi.org/10.1137/S1052623496303470.
Laurent, G., Stopfer, M., Friedrich, R. W., Rabinovich, M. I., Volkovskii, A., & Abarbanel, H. D. (2001). Odor encoding as an active, dynamical process: Experiments, computation, and theory. Annual Review of Neuroscience, 24(1), 263–297. [PubMed]
Lee, M. D. (2006). A hierarchical Bayesian model of human decision-making on an optimal stopping problem. Cognitive Science, 30(3), 1–26, https://doi.org/10.1207/s15516709cog0000_69. [PubMed]
Legge, G. E., Kersten, D., & Burgess, A. E. (1987). Contrast discrimination in noise. Journal of the Optical Society of America. A, Optics and Image Science, 4(2), 391–404, https://doi.org/10.1364/josaa.4.000391. [PubMed]
Lesmes, L. A., Lu, Z.-L., Baek, J., & Albright, T. D. (2010). Bayesian adaptive estimation of the contrast sensitivity function: The quick CSF method. Journal of Vision, 10(3), 17.1–21, https://doi.org/10.1167/10.3.17. [PubMed]
Levi, D. M. (2020). Rethinking amblyopia 2020. Vision Research, 176, 118–129. [PubMed]
Liu, J., Lu, Z.-L., & Dosher, B. A. (2010). Augmented Hebbian reweighting: Interactions between feedback and training accuracy in perceptual learning. Journal of Vision, 10(10), 29–29, https://doi.org/10.1167/10.10.29. [PubMed]
Liu, J., Lu, Z.-L., & Dosher, B. A. (2012). Mixed training at high and low accuracy levels leads to perceptual learning without feedback. Vision Research, 61, 15–24, https://doi.org/10.1016/j.visres.2011.12.002. [PubMed]
Liu, Z. L., & Weinshall, D. (2000). Mechanisms of generalization in perceptual learning. Vision Research, 40(1), 97–109. [PubMed]
Lu, Z. L., & Dosher, B. A. (1999). Characterizing human perceptual inefficiencies with equivalent internal noise. Journal of the Optical Society of America A-Optics Image Science and Vision, 16(3), 764–778, https://doi.org/10.1364/JOSAA.16.000764. [PubMed]
Lu, Z.-L., & Dosher, B. A. (2013). Visual psychophysics: From laboratory to theory. Cambridge, MA: MIT Press.
Lu, Z.-L., & Dosher, B. A. (2022). Current directions in visual perceptual learning. Nature Reviews Psychology, 1(11), 654–668, https://doi.org/10.1038/s44159-022-00107-2. [PubMed]
Lu, Z.-L., Hua, T., Huang, C.-B., Zhou, Y., & Dosher, B. A. (2011). Visual perceptual learning. Neurobiology of Learning and Memory, 95(2), 145–151. [PubMed]
Maniglia, M., Visscher, K. M., & Seitz, A. R. (2021). Perspective on vision science-informed interventions for central vision loss. Frontiers in Neuroscience, 15, 734970. [PubMed]
Mascetti, L., Muto, V., Matarazzo, L., Foret, A., Ziegler, E., Albouy, G., ... Balteau, E. (2013). The impact of visual perceptual learning on sleep and local slow-wave initiation. Journal of Neuroscience, 33(8), 3323–3331, https://doi.org/10.1523/JNEUROSCI.0763-12.2013.
McDevitt, E. A., Rokem, A., Silver, M. A., & Mednick, S. C. (2014). Sex differences in sleep-dependent perceptual learning. Vision Research, 99, 172–179, https://doi.org/10.1016/j.visres.2013.10.009. [PubMed]
Merkle, E. C., Smithson, M., & Verkuilen, J. (2011). Hierarchical models of simple mechanisms underlying confidence in decision making. Journal of Mathematical Psychology, 55(1), 57–67, https://doi.org/10.1016/j.jmp.2010.08.011.
Palestro, J. J., Bahg, G., Sederberg, P. B., Lu, Z.-L., Steyvers, M., & Turner, B. M. (2018). A tutorial on joint models of neural and behavioral measures of cognition. Journal of Mathematical Psychology, 84, 20–48, https://doi.org/10.1016/j.jmp.2018.03.003.
Petrov, A. A., Dosher, B. A., & Lu, Z.-L. (2005). The dynamics of perceptual learning: An incremental reweighting model. Psychological Review, 112(4), 715–743. [PubMed]
Petrov, A. A., Van Horn, N. M., & Ratcliff, R. (2011). Dissociable perceptual-learning mechanisms revealed by diffusion-model analysis. Psychonomic Bulletin & Review, 18(3), 490–497. [PubMed]
Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, https://www.r-project.org/conferences/DSC-2003/.
Poggio, T., Fahle, M., & Edelman, S. (1992). Fast perceptual learning in visual hyperacuity. Science, 256(5059), 1018–1021. [PubMed]
Polat, U., Ma-Naim, T., Belkin, M., & Sagi, D. (2004). Improving vision in adult amblyopia by perceptual learning. Proceedings of the National Academy of Sciences, 101(17), 6692–6697.
Prins, N. (2024). Easy, bias-free Bayesian hierarchical modeling of the psychometric function using the Palamedes Toolbox. Behavior Research Methods, 56, 485–499, https://doi.org/10.3758/s13428-023-02061-0. [PubMed]
Proulx, M. J., Brown, D. J., Pasqualotto, A., & Meijer, P. (2014). Multisensory perceptual learning and sensory substitution. Neuroscience & Biobehavioral Reviews, 41, 16–25.
R Core Team. (2003). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/.
Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604, https://doi.org/10.3758/bf03196750. [PubMed]
Rouder, J. N., Sun, D. C., Speckman, P. L., Lu, J., & Zhou, D. (2003). A hierarchical Bayesian statistical framework for response time distributions. Psychometrika, 68(4), 589–606, https://doi.org/10.1007/BF02295614.
Sagi, D. (2011a). Perceptual learning in vision research. Vision Research, 51(13), 1552–1566. [PubMed]
Sagi, D. (2011b). Perceptual learning in vision research. Vision Research, 51(13), 1552–1566, https://doi.org/10.1016/j.visres.2010.10.019. [PubMed]
Sasaki, Y., & Watanabe, T. (2015). Visual perceptual learning and sleep. In Kansaku, K., Cohen, L. G., & Birbaumer, N. (Eds.), Clinical Systems Neuroscience (pp. 343–357). Springer Japan, https://doi.org/10.1007/978-4-431-55037-2_19.
Shams, L., & Seitz, A. R. (2008). Benefits of multisensory learning. Trends in Cognitive Sciences, 12(11), 411–417. [PubMed]
Shibata, K., Sasaki, Y., Bang, J. W., Walsh, E. G., Machizawa, M. G., Tamaki, M., ... Watanabe, T. (2017). Overlearning hyperstabilizes a skill by rapidly making neurochemical processing inhibitory-dominant. Nature Neuroscience, 20(3), Article 3, https://doi.org/10.1038/nn.4490.
Song, M., Behmanesh, I., Moaveni, B., & Papadimitriou, C. (2020). Accounting for modeling errors and inherent structural variability through a hierarchical Bayesian model updating approach: An overview. Sensors, 20(14), 3874, https://doi.org/10.3390/s20143874.
Stickgold, R., Mednick, S., Cantero, J. L., Atienza, M., Pathak, N., & Nakayama, K. (2002). Power napping and burnout: The restorative effect of naps after perceptual learning. Sleep, 25, A518–A519.
Tamaki, M., Berard, A. V., Barnes-Diana, T., Siegel, J., Watanabe, T., & Sasaki, Y. (2020). Reward does not facilitate visual perceptual learning until sleep occurs. Proceedings of the National Academy of Sciences, 117(2), 959–968, https://doi.org/10.1073/pnas.1913079117.
Tamaki, M., Wang, Z., Barnes-Diana, T., Guo, D., Berard, A. V., Walsh, E., ... Sasaki, Y. (2020). Complementary contributions of non-REM and REM sleep to visual learning. Nature Neuroscience, 23(9), Article 9, https://doi.org/10.1038/s41593-020-0666-y.
Tamaki, M., Wang, Z., Watanabe, T., & Sasaki, Y. (2019). Trained-feature–specific offline learning by sleep in an orientation detection task. Journal of Vision, 19(12), 12, https://doi.org/10.1167/19.12.12. [PubMed]
Watson, A. B., & Pelli, D. G. (1983). Quest: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33(2), 113–120, https://doi.org/10.3758/BF03202828. [PubMed]
Wilson, J. D., Cranmer, S., & Lu, Z.-L. (2020). A hierarchical latent space network model for population studies of functional connectivity. Computational Brain & Behavior, 3, 384–399, https://doi.org/10.1007/s42113-020-00080-0.
Wright, B. A., & Zhang, Y. (2009). A review of the generalization of auditory learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1515), 301–311.
Xiao, L.-Q., Zhang, J.-Y., Wang, R., Klein, S. A., Levi, D. M., & Yu, C. (2008). Complete transfer of perceptual learning across retinal locations enabled by double training. Current Biology, 18(24), 1922–1926.
Yang, J., Yan, F.-F., Chen, L., Fan, S., Wu, Y., Jiang, L., ... Huang, C.-B. (2022). Identifying long- and short-term processes in perceptual learning. Psychological Science, 33(5), 830–843, https://doi.org/10.1177/09567976211056620. [PubMed]
Yin, J., Qin, R., Sargent, D. J., Erlichman, C., & Shi, Q. (2018). A hierarchical Bayesian design for randomized Phase II clinical trials with multiple groups. Journal of Biopharmaceutical Statistics, 28(3), 451–462, https://doi.org/10.1080/10543406.2017.1321007. [PubMed]
Yotsumoto, Y., Sasaki, Y., Chan, P., Vasios, C. E., Bonmassar, G., Ito, N., ... Watanabe, T. (2009). Location-specific cortical activation changes during sleep after training for perceptual learning. Current Biology: CB, 19(15), 1278–1282, https://doi.org/10.1016/j.cub.2009.06.011. [PubMed]
Yu, D., Cheung, S.-H., Legge, G. E., & Chung, S. T. (2010). Reading speed in the peripheral visual field of older adults: Does it benefit from perceptual learning? Vision Research, 50(9), 860–869. [PubMed]
Zenger-Landolt, B., & Fahle, M. (2001). Discriminating contrast discontinuities: Asymmetries, dipper functions, and perceptual learning. Vision Research, 41(23), 3009–3021. [PubMed]
Zhang, P., Zhao, Y., Dosher, B. A., & Lu, Z.-L. (2019a). Assessing the detailed time course of perceptual sensitivity change in perceptual learning. Journal of Vision, 19(5), 9, https://doi.org/10.1167/19.5.9. [PubMed]
Zhang, P., Zhao, Y., Dosher, B., & Lu, Z.-L. (2019b). Evaluating the performance of the staircase and qCD methods in measuring specificity/transfer of perceptual learning. Journal of Vision, 19(10), 29, https://doi.org/10.1167/19.10.29. [PubMed]
Zhang, R.-Y., Chopin, A., Shibata, K., Lu, Z.-L., Jaeggi, S. M., Buschkuehl, M., ... Bavelier, D. (2021). Action video game play facilitates “learning to learn.” Communications Biology, 4(1), Article 1, https://doi.org/10.1038/s42003-021-02652-7.
Zhao, Y., Lesmes, L. A., Dorr, M., & Lu, Z.-L. (2021). Quantifying uncertainty of the estimated visual acuity behavioral function with hierarchical Bayesian modeling. Translational Vision Science & Technology, 10(12), 18, https://doi.org/10.1167/tvst.10.12.18. [PubMed]
Zhao, Y., Lesmes, L. A., Dorr, M., & Lu, Z.-L. (2023a). Collective endpoint of visual acuity and contrast sensitivity function from hierarchical Bayesian joint modeling. Journal of Vision, 23(6), 13, https://doi.org/10.1167/jov.23.6.13. [PubMed]
Zhao, Y., Lesmes, L. A., Dorr, M., & Lu, Z.-L. (2023b). Non-parametric hierarchical Bayesian modeling enables statistical inference on contrast sensitivity at individual spatial frequencies. Investigative Ophthalmology & Visual Science, 64(8), 4988.
Zhao, Y., Lesmes, L. A., Dorr, M., & Lu, Z.-L. (2023c). Non-parametric hierarchical Bayesian modeling of the contrast sensitivity function. Journal of Vision, 23(9), 5312, https://doi.org/10.1167/jov.23.9.5312.
Zhao, Y., Lesmes, L. A., Hou, F., & Lu, Z.-L. (2021). Hierarchical Bayesian modeling of contrast sensitivity functions in a within-subject design. Journal of Vision, 21(12), 9, https://doi.org/10.1167/jov.21.12.9. [PubMed]
Zhao, Y., Lesmes, L., & Lu, Z.-L. (2019). Efficient assessment of the time course of perceptual sensitivity change. Vision Research, 154, 21–43, https://doi.org/10.1016/j.visres.2018.10.009. [PubMed]
Zhou, Y., Huang, C., Xu, P., Tao, L., Qiu, Z., Li, X., … Lu, Z.-L. (2006). Perceptual learning improves contrast sensitivity and visual acuity in adults with anisometropic amblyopia. Vision Research, 46(5), 739–750, https://doi.org/10.1016/j.visres.2005.07.031. [PubMed]