Monaural speech separation has long remained a challenging problem. It can be addressed with a supervised learning approach that uses features of the noisy input to predict an accurate time-frequency mask. Effective acoustic-phonetic features help predict the mask accurately at low signal-to-noise ratios (SNRs). Because individual features capture specific attributes of the audio signal, it is essential to employ a set of features. This work examines different combinations of monaural features as input to a deep neural network (DNN) model, with the ideal ratio mask as the training target. Feature combination sets are constructed by first examining single features and then combining the most relevant ones. The combinations are evaluated under non-stationary noises at low SNR levels using speech intelligibility and quality measures. A combination of two features performs best: it yields a significant increase in speech intelligibility compared with individual features and with combinations of more than two features.
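As an illustration of the two ingredients named above, the following minimal sketch computes an ideal ratio mask from clean-speech and noise power and concatenates per-frame feature vectors into a combined input. It assumes the commonly used definition IRM(t, f) = (S / (S + N))^β with β = 0.5; the function names, the exponent, and the plain-list data layout are illustrative assumptions and are not taken from this work.

```python
from typing import List

Matrix = List[List[float]]  # frames x bins (or frames x feature dimensions)


def ideal_ratio_mask(speech_power: Matrix, noise_power: Matrix,
                     beta: float = 0.5) -> Matrix:
    """Assumed IRM definition: (S / (S + N)) ** beta per time-frequency unit.

    S and N are clean-speech and noise power; beta = 0.5 is a common choice,
    not a value stated in the text.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # Units with no energy at all get a mask value of 0.
            row.append((s / total) ** beta if total > 0.0 else 0.0)
        mask.append(row)
    return mask


def combine_features(*feature_sets: Matrix) -> Matrix:
    """Concatenate per-frame feature vectors from several feature types."""
    n_frames = len(feature_sets[0])
    if any(len(fs) != n_frames for fs in feature_sets):
        raise ValueError("all feature sets must have the same number of frames")
    return [sum((list(fs[t]) for fs in feature_sets), [])
            for t in range(n_frames)]
```

In a setup of this kind, the output of `combine_features` would serve as the DNN input for each frame and the output of `ideal_ratio_mask` as the regression target; the mask values lie in [0, 1], with values near 1 marking speech-dominated units.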