performance in downstream tasks. This geometry depends on the structure of the inputs, the
structure of the target outputs, and the architecture of the network. By studying the learning
dynamics of networks with one hidden layer, we discovered that the network's activation
function has an unexpectedly strong impact on the representational geometry: Tanh
networks tend to learn representations that reflect the structure of the target outputs, while …