We propose a set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition. The descriptors proposed are easy to implement, produce relatively small-sized feature sets, and the multi-class classification scheme is fast and suitable for real-time applications. We intuitively characterize actions using pairwise affinities between view-invariant joint angles features over the performance of an action. Additionally, a new descriptor for spatio-temporal feature extraction from color and depth images is introduced. This descriptor involves an application of a modified histogram of oriented gradients (HOG) algorithm. The application produces a feature set at every frame, and these features are collected into a 2D array which then the same algorithm is applied to again (the approach is termed HOG^ 2). Both feature sets are evaluated in a bag-of-words scheme using a linear SVM, showing state-of-the-art results on public datasets from different domains of human-computer interaction.