People with hearing loss can communicate through sign language. This paper presents a system that converts American Sign Language (ASL) hand gestures into text, specifically letters of the alphabet. The proposed Convolutional Neural Network (CNN) consists of eleven layers, each with carefully chosen parameters. These layers comprise convolution, activation, max-pooling, and flattening stages, followed by dense layers with dropout regularization; the final dense layer uses softmax activation. First, the Sign Language MNIST dataset is preprocessed. Next, essential features of the preprocessed hand gesture images are extracted. The dataset covers 24 classes, representing the letters A to Y excluding J. No examples exist for label 9 (J) or label 25 (Z) because these letters involve gesture motion. The dataset contains 34,627 images in total, split 80% for training and 20% for testing. All images are 28 × 28 pixels in grayscale, with values from 0 to 255. The proposed approach achieves a classification accuracy of 98.75%.
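The label scheme and data preparation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the names `LABEL_TO_LETTER`, `normalize`, and `split_counts` are hypothetical, scaling pixels to [0, 1] is an assumed preprocessing step, and rounding the 80% split down is an assumption, since the abstract does not state the exact counts.

```python
import string

# Labels index the alphabet directly: 0 -> 'A', 1 -> 'B', ..., 25 -> 'Z'.
# Labels 9 (J) and 25 (Z) have no examples because those signs involve motion.
MOTION_LABELS = {9, 25}
LABEL_TO_LETTER = {
    index: letter
    for index, letter in enumerate(string.ascii_uppercase)
    if index not in MOTION_LABELS
}


def normalize(pixels):
    """Scale 8-bit grayscale values (0 to 255) to floats in [0, 1].

    Assumed preprocessing step; the abstract does not specify it.
    """
    return [p / 255.0 for p in pixels]


def split_counts(total, train_fraction=0.8):
    """Return (train, test) sizes for a fractional split, rounding train down."""
    train = int(total * train_fraction)
    return train, total - train


if __name__ == "__main__":
    print(len(LABEL_TO_LETTER))    # 24 static-gesture classes
    print(LABEL_TO_LETTER[24])     # 'Y', the highest label with examples
    print(split_counts(34627))     # (27701, 6926) under the rounding assumed here
```

Keeping the original label indices (0 to 25 with gaps at 9 and 25) makes the mapping to letters trivial; a classifier with exactly 24 softmax outputs would additionally need these labels remapped to a contiguous range.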