Daumé III (2009) have mostly been considered for sparse binary-valued features, but not for
dense real-valued features such as those used in neural networks. In this paper, we
describe simple neural extensions of these techniques. First, we propose a natural
generalization of the feature augmentation method that uses K + 1 LSTMs, where one model
captures global patterns across all K domains and the remaining K models capture domain …