M Simon, Y Gao, T Darrell, J Denzler… - … on Computer Vision …, 2017 - ieeexplore.ieee.org
Most recent CNN architectures use average pooling as a final feature encoding step. In the
field of fine-grained recognition, however, recent global representations like bilinear pooling …