Video summarization has become an important tool for multimedia information processing, but automatically evaluating a video summarization system remains a challenge. A major issue is that no ideal "best" summary exists, although people can easily distinguish "good" summaries from "bad" ones. A similar situation arises in machine translation and text summarization, where dedicated automatic procedures, BLEU and ROUGE respectively, evaluate the quality of a candidate by measuring its local similarities to several human-generated references. These procedures are now routinely used in various benchmarks. In this paper, we extend this idea to the video domain and propose the VERT (Video Evaluation by Relevant Threshold) algorithm to automatically evaluate the quality of video summaries. VERT follows the principles of BLEU and ROUGE: it counts the weighted number of selected units that overlap between the computer-generated video summary and several human-made references. Several variants of VERT are proposed and compared.
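
To make the idea of a weighted overlap count concrete, the following is a minimal sketch of a ROUGE-style recall over selected units. It is not the VERT definition: the abstract does not specify the weighting or the normalization, so the rank-based weight (`rank_weight`), the function name `weighted_overlap_recall`, and the representation of a summary as a ranked list of unit identifiers are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def rank_weight(position: int, length: int) -> float:
    """Hypothetical weight: units ranked earlier in a reference count more."""
    return float(length - position)


def weighted_overlap_recall(
    candidate: Sequence[Hashable],
    references: Sequence[Sequence[Hashable]],
) -> float:
    """Weighted share of reference units that also appear in the candidate.

    Assumption: each summary is a ranked list of selected unit identifiers
    (for example keyframe ids), most important first.
    """
    selected = set(candidate)
    matched = 0.0  # total weight of reference units found in the candidate
    total = 0.0    # total weight of all reference units
    for ref in references:
        for pos, unit in enumerate(ref):
            w = rank_weight(pos, len(ref))
            total += w
            if unit in selected:
                matched += w
    # Return 0.0 when there is nothing to compare against.
    return matched / total if total else 0.0


# Toy data: one machine summary and two human-made references.
candidate = ["f3", "f7", "f9"]
references = [["f3", "f5", "f7"], ["f7", "f3", "f8"]]
score = weighted_overlap_recall(candidate, references)
```

With this toy data, the matched weight is 9 out of a total reference weight of 12, so `score` is 0.75. Swapping in a different `rank_weight` is one way to obtain different variants of such a measure.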