grained Chinese characters and the severe imbalance across classes, which together cause a serious
overfitting problem. We propose applying Maximum Entropy Regularization to regularize the
training process: it simply adds a negative entropy term to the canonical cross-entropy
loss, requiring no additional parameters and no modification of the model. We theoretically
derive the probability distribution at convergence and analyze how the regularization influences …
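
As a minimal sketch of the loss described above, the regularized objective for one sample can be written as the cross-entropy minus a weighted entropy of the predicted distribution. The function names, the weight `lam`, and the stabilizing constant `eps` are assumptions made for illustration; the text does not specify how the entropy term is weighted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_entropy_regularized_loss(logits, target, lam=0.1):
    """Cross-entropy plus a negative entropy term for a single sample.

    `lam` (assumed hyperparameter) scales the entropy term; lam = 0
    recovers the plain cross-entropy loss.
    """
    eps = 1e-12  # assumed constant to avoid log(0)
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target] + eps)
    # Shannon entropy of the predicted distribution (non-negative).
    entropy = -sum(p * math.log(p + eps) for p in probs)
    # Adding the negative entropy rewards less peaked predictions.
    return cross_entropy - lam * entropy


if __name__ == "__main__":
    confident = [8.0, 0.0, 0.0, 0.0]
    print(max_entropy_regularized_loss(confident, target=0, lam=0.0))
    print(max_entropy_regularized_loss(confident, target=0, lam=0.5))
```

Because the entropy is non-negative, the extra term can only lower the loss, and it lowers it most for high-entropy predictions, which discourages the over-confident outputs associated with overfitting. No model parameters are added; only the scalar loss changes.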