FeatOut

Remove features to avoid shortcut learning

I had an idea for tackling shortcut learning in neural networks. Shortcut learning is the problem that a network learns to classify samples based on only a few features, e.g. distinguishing dogs from cats only by their ears. Such models fail to generalize, e.g. when the ears are not visible in the test images. The idea is to use gradient-based methods to detect which features the model currently relies on, then blur these features and retrain. I started a code base and ran some initial tests, and we gave it to students in a one-day hackathon, but I would still like to explore the idea in more depth.
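To make the detect-and-blur step concrete, here is a minimal sketch. It is not the code from the actual code base. As an assumption, a toy linear classifier with a sigmoid output stands in for the neural network, so the gradient with respect to the input can be written by hand; with a real network it would come from automatic differentiation. The names `input_gradient`, `box_blur` and `featout`, the patch size and the box blur are all hypothetical choices for illustration.

```python
import numpy as np

def input_gradient(weights, bias, image):
    """Gradient of sigmoid(w . x + b) with respect to the input pixels x."""
    logit = float(np.sum(weights * image) + bias)
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob * (1.0 - prob) * weights

def box_blur(image, k=3):
    """Mean filter of odd size k, using edge padding at the borders."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def featout(image, saliency, patch=3, blur_k=3):
    """Blur a patch x patch window centred on the most salient pixel."""
    y, x = np.unravel_index(np.argmax(np.abs(saliency)), saliency.shape)
    half = patch // 2
    y0, y1 = max(0, y - half), min(image.shape[0], y + half + 1)
    x0, x1 = max(0, x - half), min(image.shape[1], x + half + 1)
    blurred = box_blur(image, blur_k)
    out = image.astype(float).copy()
    out[y0:y1, x0:x1] = blurred[y0:y1, x0:x1]  # only the window is changed
    return out, (int(y0), int(y1), int(x0), int(x1))

# Toy data (assumption): the "model" looks at a single pixel, its shortcut.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
weights = np.zeros((8, 8))
weights[2, 5] = 2.0

saliency = input_gradient(weights, bias=0.0, image=image)
masked, box = featout(image, saliency)
```

In a training loop, `masked` would replace `image` in the next round of training, so the model can no longer rely on the blurred region and has to pick up other features.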

Overview of our approach

Resources: