Convolutional neural networks (CNNs) can automatically learn features from hyperspectral image (HSI) data, avoiding the difficulty of manual feature extraction. However, the number of training samples available for HSI classification is usually limited, which makes it difficult for a CNN to learn effective features and results in low classification accuracy. To address this problem, this article proposes a pixel-cluster CNN and spectral-spatial fusion (SSF) algorithm for HSI classification with small training sets. First, spatial information is extracted using the gray-level co-occurrence matrix (GLCM). Then, the spatial and spectral information are fused by band superposition to form spectral-spatial features. To expand the number of training samples, the pixels obtained after SSF are combined into pixel clusters according to a predefined rule. Finally, a CNN framework is used to extract effective features from the pixel clusters. Experiments on three standard HSIs demonstrate that the proposed algorithm outperforms a conventional CNN and the other algorithms studied when the training set is small.
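
The preprocessing steps summarized above (co-occurrence texture extraction, fusion by band superposition, and pixel-cluster construction) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this article, and several details are assumptions made only for illustration: the texture statistic (co-occurrence contrast with a horizontal offset of one pixel), the use of the band mean as the gray-level image, the window size and number of gray levels, and in particular the clustering rule, which the text does not specify. Here the hypothetical rule groups every combination of k same-class training pixels into one cluster. The function names `texture_map`, `fuse`, and `pixel_clusters` are likewise hypothetical, and the data cube is synthetic.

```python
import itertools
import numpy as np


def glcm_contrast(window, levels):
    """Contrast of the co-occurrence matrix of one window, offset (0, 1)."""
    glcm = np.zeros((levels, levels), dtype=np.float64)
    left = window[:, :-1].ravel()
    right = window[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1.0)  # count horizontal neighbor pairs
    glcm += glcm.T                       # make the matrix symmetric
    glcm /= glcm.sum()                   # normalize to joint probabilities
    i, j = np.indices((levels, levels))
    return float(np.sum(glcm * (i - j) ** 2))


def texture_map(band, levels=8, win=5):
    """Per-pixel contrast over a sliding win x win window (assumed settings)."""
    lo, hi = band.min(), band.max()
    # Quantize the image to integer gray levels 0 .. levels - 1.
    q = np.floor((band - lo) / (hi - lo + 1e-12) * levels).astype(np.int64)
    q = np.clip(q, 0, levels - 1)
    pad = win // 2
    qp = np.pad(q, pad, mode="reflect")
    out = np.empty(band.shape, dtype=np.float64)
    for r in range(band.shape[0]):
        for c in range(band.shape[1]):
            out[r, c] = glcm_contrast(qp[r:r + win, c:c + win], levels)
    return out


def fuse(cube, texture):
    """Band superposition: append the texture map as one extra band."""
    return np.concatenate([cube, texture[..., None]], axis=2)


def pixel_clusters(pixels, labels, k=3, max_per_class=None):
    """Hypothetical rule: each k-combination of same-class pixels is a cluster."""
    clusters, cluster_labels = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        combos = itertools.combinations(idx, k)
        if max_per_class is not None:
            combos = itertools.islice(combos, max_per_class)
        for combo in combos:
            clusters.append(pixels[list(combo)])  # shape (k, bands)
            cluster_labels.append(cls)
    return np.stack(clusters), np.array(cluster_labels)


# Synthetic example: a 12 x 12 cube with 6 spectral bands.
rng = np.random.default_rng(0)
cube = rng.random((12, 12, 6))
texture = texture_map(cube.mean(axis=2))  # band mean as gray image (assumption)
fused = fuse(cube, texture)               # 6 spectral bands + 1 texture band

# Ten training pixels, five per class, with placeholder labels.
train_pixels = fused.reshape(-1, fused.shape[2])[:10]
train_labels = np.array([0] * 5 + [1] * 5)
clusters, y = pixel_clusters(train_pixels, train_labels, k=3)
print(fused.shape, clusters.shape)
```

Under this assumed rule, five pixels per class yield C(5, 3) = 10 clusters per class, so ten labeled pixels become twenty training inputs of shape (3, 7) that a CNN could consume. The `max_per_class` argument caps the combinatorial growth for larger training sets.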