The large variance in human pose and the misalignment of detected human images significantly increase the difficulty of matching pedestrian images in person Re-Identification (Re-ID). Moreover, the massive volume of visual data produced by surveillance cameras calls for highly efficient person Re-ID systems. To address the first problem, this work proposes a robust and discriminative pedestrian image descriptor, the Global-Local-Alignment Descriptor (GLAD). To address the second, this work treats person Re-ID as image retrieval and proposes an efficient indexing and retrieval framework. GLAD explicitly leverages local and global cues of the human body to generate a discriminative and robust representation. It consists of a part extraction module and a descriptor learning module: several part regions are first detected, and deep neural networks are then designed to learn representations on both the local and global regions. A hierarchical indexing and retrieval framework performs offline relevance mining to eliminate the large redundancy of person identities in the gallery set and to accelerate the online Re-ID procedure. Extensive experiments on widely used public benchmark datasets show that GLAD achieves accuracy competitive with state-of-the-art methods. On a large-scale person Re-ID dataset containing more than 520K images, our retrieval framework significantly accelerates the online Re-ID procedure while also improving Re-ID accuracy. These results suggest that this work has the potential to perform better on person Re-ID in real-world scenarios.
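The abstract does not specify how descriptors are fused or how the gallery is indexed, so the following is only a minimal sketch under our own assumptions, not the paper's algorithm. We assume the final descriptor concatenates L2-normalized global and part features, and we stand in for offline relevance mining with a hypothetical greedy grouping: gallery images whose cosine similarity to a group centroid exceeds a threshold are merged. Online search then ranks group centroids first and compares the query only against images in the best groups. The names `global_local_descriptor`, `build_index`, `search`, `threshold`, and `top_groups` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Group = Tuple[Vector, List[str]]  # (centroid, member image ids)


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def global_local_descriptor(global_feat: Sequence[float],
                            part_feats: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical fusion: concatenate the normalized global feature with
    each normalized part feature, then renormalize the whole descriptor."""
    desc = l2_normalize(global_feat)
    for part in part_feats:
        desc.extend(l2_normalize(part))
    return l2_normalize(desc)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def build_index(gallery: Dict[str, Vector], threshold: float) -> List[Group]:
    """Offline step (assumed): put each gallery image into the most similar
    existing group if similarity >= threshold, otherwise open a new group.
    A group's centroid is the renormalized mean of its members."""
    groups: List[Group] = []
    for img_id, desc in gallery.items():
        best_idx, best_sim = None, threshold
        for idx, (centroid, _) in enumerate(groups):
            sim = cosine(desc, centroid)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is None:
            groups.append((list(desc), [img_id]))
        else:
            _, members = groups[best_idx]
            members.append(img_id)
            summed = [sum(col) for col in zip(*(gallery[m] for m in members))]
            groups[best_idx] = (l2_normalize(summed), members)
    return groups


def search(query: Vector, gallery: Dict[str, Vector], groups: List[Group],
           top_groups: int, top_k: int) -> List[str]:
    """Online step: rank group centroids first, then rank only the images
    inside the best `top_groups` groups."""
    ranked = sorted(groups, key=lambda g: cosine(query, g[0]), reverse=True)
    candidates = [m for _, members in ranked[:top_groups] for m in members]
    candidates.sort(key=lambda m: cosine(query, gallery[m]), reverse=True)
    return candidates[:top_k]


# Toy gallery: two identities ("a", "b") with two images each.
gallery = {
    "a1": l2_normalize([1.0, 0.0, 0.1]),
    "a2": l2_normalize([0.9, 0.1, 0.0]),
    "b1": l2_normalize([0.0, 1.0, 0.1]),
    "b2": l2_normalize([0.1, 0.9, 0.0]),
}
index = build_index(gallery, threshold=0.9)
query = l2_normalize([0.95, 0.05, 0.05])
print(len(index), search(query, gallery, index, top_groups=1, top_k=2))
# prints: 2 ['a1', 'a2']
```

In this toy run the query is compared against two centroids and then two images, instead of all four gallery images; with many near-duplicate images per identity, this coarse-to-fine pruning is where a speedup would come from. The greedy threshold grouping is a simple stand-in chosen for readability, not the relevance mining used in the work.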