Methods
Data from a cross-sectional study were used in this study. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset for the constructing of the decision-tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of decision-tree. Two models were evaluated in this study. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables and in model II, age, gender, WBC, RBC, HGB, HCT MCV, MCH, PLT, RDW and PDW were considered as input variables. The validation of the model was assessed by constructing a receiver operating characteristic (ROC) curve.
Results
The prevalence rates of hypertension were 32% in our population. For the decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively.