The objective of this paper is to come up with a scalable modular hardware solution for real-time Face Recognition (FR) on large databases. Existing hardware solutions use algorithms with low recognition accuracy suitable for real-time response. In addition, database size for these solutions is limited by on-chip resources making them unsuitable for practical real-time applications. Due to high computational complexity we do not choose algorithms in literature with superior recognition accuracy. Instead, we come up with a combination of Weighted Modular Principle Component Analysis (WMPCA) and Radial Basis Function Neural Network (RBFNN) which outperforms algorithms used in existing hardware solutions on highly illumination and pose variant face databases. We propose a hardware solution for real-time FR which uses parallel streams to perform independent modular computations. A salient feature of proposed hardware solution is that we store a major part of data on off-chip memory in a novel format, so that latencies experienced accessing off-chip memory does not impact performance. This enables us to work on databases of very large sizes. To test functional correctness, the proposed architecture is synthesized and tested on Virtex-6 LX550T FPGA. This emulated system is able to perform 450 recognitions per second on images of size 128 × 128 with 450 classes.