Model reproducibility is a point of emphasis for the National Institutes of Health (NIH) and in science more broadly. As the use of computational modeling in biomechanics and orthopedics grows, so does the need to assess the reproducibility of modeling workflows and simulation predictions. The long-term goal of the KneeHub project is to understand the influence of potentially subjective decisions, i.e., the modeler's “art”, on the reproducibility and predictive uncertainty of computational knee joint models. In this paper, we report on the model calibration phase of this project, during which five teams calibrated computational knee joint models of the same specimens using the same specimen-specific joint mechanics dataset. We investigated model calibration approaches and decisions, and compared calibration workflows and model outcomes among the teams. The selection of calibration targets differed greatly among the teams and was influenced by modeling decisions related to the representation of structures, as well as by considerations of computational cost and the implementation of optimization. Although calibration improved model performance, the postcalibration ligament properties and predicted kinematics differed among the teams; these differences were quantified and discussed in the context of modeling decisions. Even for teams with demonstrated expertise, model calibration is difficult to foresee and plan in detail, and the results of this study underscore the importance of identifying and standardizing best practices for data sharing and calibration.