M Cho, S Adya, D Naik - Advances in Neural Information …, 2024 - proceedings.neurips.cc
… w is destined to be pruned for some reason, instead of having a new parameter to denote
"to-prune", PDP lets SGD gradually make w itself smaller relatively against other parameters in …