Anomaly detection in surveillance videos, as a special case of video-based action recognition, is an important topic in the multimedia community and in public security. Currently, most state-of-the-art methods utilize deep learning to recognize the patterns of anomalies or actions. However, whether deep neural networks really learn the essence of the anomaly, or merely memorize the background, is an important but often neglected question. In this paper, we develop a series of experiments to validate the existence of a background-bias phenomenon, which causes deep networks to learn background information rather than the pattern of anomalies when recognizing abnormal behavior. To address this problem, we first re-annotate the largest anomaly detection dataset and design a new evaluation metric that measures whether a model really learns the essence of anomalies. Then, we propose an end-to-end trainable, anomaly-area guided framework, in which a novel region loss explicitly drives the network to learn where the anomalous region is. Moreover, since the networks are very deep and the anomaly training data are scarce, our architecture is trained with a meta-learning module to prevent severe overfitting. Extensive experiments on the benchmark show that, by reducing the influence of background information, our approach outperforms other methods on both the previous evaluation metric and our proposed one.