Acoustic analysis helps to discriminate emotions from non-verbal information, while linguistic analysis aims to capture verbal information from written or transcribed sources. Both can be applied in scenarios where information related to emotions, mood, or affect is involved. The Arousal-Valence plane is commonly used to model emotional states in a multidimensional space. This study proposes a methodology for modeling the user's emotional state on the Arousal-Valence plane in different scenarios. Acoustic and linguistic information is used as input to different deep learning architectures, mainly based on convolutional and recurrent neural networks, which are trained to model the Arousal-Valence plane. The proposed approach is applied to the evaluation of customer satisfaction in call centers and to health-care applications, namely the assessment of depression in Parkinson's disease and the discrimination of Alzheimer's disease. F-scores of up to 0.89 are obtained for customer satisfaction, up to 0.82 for depression in Parkinson's patients, and up to 0.80 for Alzheimer's patients. These results confirm that the Arousal-Valence plane embeds information that can be exploited for different applications.
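To make the kind of architecture mentioned above concrete, the following is a minimal sketch, assuming PyTorch and a sequence of acoustic feature frames (e.g., log-Mel spectrogram frames) as input; the layer sizes, feature dimension, and the convolutional-plus-recurrent layout are illustrative assumptions, not the configuration used in the study.

```python
# Illustrative sketch (assumptions): a convolutional-recurrent regressor mapping
# acoustic feature frames to arousal and valence values. Dimensions are made up.
import torch
import torch.nn as nn


class ArousalValenceNet(nn.Module):
    def __init__(self, n_features: int = 64, hidden: int = 128):
        super().__init__()
        # 1-D convolutions over time extract local acoustic patterns
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # A recurrent layer models longer-term temporal dynamics
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        # Two regression outputs: predicted arousal and valence
        self.head = nn.Linear(hidden, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, time, hidden)
        _, last = self.rnn(h)                              # final hidden state
        return self.head(last.squeeze(0))                  # (batch, 2)


if __name__ == "__main__":
    model = ArousalValenceNet()
    frames = torch.randn(4, 300, 64)  # 4 utterances, 300 frames, 64 features each
    print(model(frames).shape)        # torch.Size([4, 2])
```

In a setup like this, the predicted arousal and valence values (or features derived from them) could then feed the downstream classification tasks, such as customer satisfaction or clinical assessment, although the exact pipeline is described in the body of the paper rather than here.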