Target output distribution and distribution of bias for statistical model validation given a limited number of test data

MY Moon, KK Choi, D Lamb - Structural and Multidisciplinary Optimization, 2019 - Springer
Abstract
A simulation model must be validated with experimental data before it can be used with confidence to predict the outputs of engineered systems. Pointwise comparison between the output predicted by a simulation model and experimental data is not appropriate for model verification and validation (V&V), because real-world phenomena are not deterministic owing to irreducible uncertainty. Thus, the output prediction of a simulation model needs to be represented by a probability density function (PDF), and statistical model validation methods are necessary to compare the model prediction with physical test data. Validating a simulation model entails acquiring extraordinarily detailed test data, which are expensive to generate, and practicing engineers can afford only a very limited number of test data. This paper proposes an effective method to validate a simulation model by using a target output distribution, which closely approximates the true output distribution. Furthermore, the proposed target output distribution accounts for a biased simulation model with stochastic outputs (specifically, a simulation output distribution) using limited numbers of input and output test data. Since limited test data may contain outliers or be sparse, a data quality checking process is proposed to determine whether a given set of output test data needs to be balanced. If necessary, stratified sampling using cluster analysis is employed to sample balanced test data. Next, Bayesian analysis is used to obtain many possible candidates for the target output distribution, from which the one at the posterior median is selected. Then, the distribution of bias can be identified using Monte Carlo convolution.
Three engineering examples are used to demonstrate that (1) the developed target output distribution closely approximates the true output distribution and is robust under different sets of test data; (2) the test dataset reallocated by the quality checking process and balanced sampling leads to a better match to the true output distribution; and (3) the distribution of bias can be used effectively to understand the model’s accuracy and model confidence in comparison studies.
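
The final step of the abstract, identifying the distribution of bias by Monte Carlo convolution, can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that the bias is the difference between a draw from the target output distribution and an independent draw from the simulation output distribution, and that both are normal with the illustrative parameters below. The function name and all numbers are hypothetical and are not taken from the paper.

```python
import random
import statistics


def monte_carlo_bias_samples(sample_target, sample_simulation,
                             n_samples=50_000, seed=0):
    """Draw samples of (target output - simulation output).

    Assumption: the two outputs are sampled independently, so the
    distribution of the difference is the convolution of the target
    distribution with the reflected simulation distribution.
    """
    rng = random.Random(seed)
    return [sample_target(rng) - sample_simulation(rng)
            for _ in range(n_samples)]


# Hypothetical stand-ins for the two output distributions.
def sample_target(rng):
    return rng.gauss(10.5, 1.2)   # assumed target output distribution


def sample_simulation(rng):
    return rng.gauss(10.0, 1.0)   # assumed simulation output distribution


bias = monte_carlo_bias_samples(sample_target, sample_simulation)
bias_mean = statistics.fmean(bias)
bias_std = statistics.stdev(bias)
print(f"bias mean ~ {bias_mean:.3f}, bias std ~ {bias_std:.3f}")
```

For these assumed normal inputs the sample mean should land near 0.5 and the sample standard deviation near sqrt(1.2^2 + 1.0^2), which gives a quick sanity check on the sampling. With non-normal outputs the same sampling loop applies, and the resulting samples can be summarized by a histogram or a kernel density estimate.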