particularly in few-shot and zero-shot settings. As a consequence, researchers have been
investigating both the kind of information these networks learn and how such information
can be encoded in the parameters of the model. We survey the literature on changes in the
network during training, drawing from work outside of NLP when necessary, and on learned
representations of linguistic features in large language models. We note in particular the …