A typical technique in knowledge distillation (KD) is to regularize the training of a limited-capacity model (the student) by pushing its responses to match a powerful model's (the teacher's) …
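As a concrete illustration of response matching, the sketch below implements the classic softened-logit KD loss of Hinton et al. in PyTorch; the temperature `T` and mixing weight `alpha` are placeholder hyperparameters, not values prescribed by this work.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Match the student's softened output distribution to the teacher's,
    mixed with the ordinary cross-entropy on ground-truth labels."""
    # KL divergence between temperature-softened teacher and student outputs;
    # the T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```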
Known as a key to the success of many convolutional neural networks (CNNs), convolutional blocks serve as local feature extractors. Yet, explicit supervision of intermediate layers becomes a …
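To make "explicit supervision of intermediate layers" concrete, a minimal sketch of one common form, a FitNets-style hint loss, follows; the `HintLoss` module name and the 1x1 projection aligning channel widths are illustrative assumptions, not this paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """Match a student feature map to a teacher feature map (FitNets-style).

    A 1x1 convolution projects the student's channels to the teacher's width;
    mismatched spatial resolutions are handled by bilinear interpolation.
    """

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        h = self.proj(f_student)
        # Resize if the two stages produce different spatial resolutions.
        if h.shape[-2:] != f_teacher.shape[-2:]:
            h = F.interpolate(h, size=f_teacher.shape[-2:], mode="bilinear",
                              align_corners=False)
        # Teacher features act as fixed targets, so gradients do not flow back.
        return F.mse_loss(h, f_teacher.detach())
```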