DINO Read

Questions

Multi-crop training.
Caron, Mathilde, et al. “Unsupervised learning of visual features by contrasting cluster assignments.” Advances in neural information processing systems 33 (2020): 9912-9924.
Use a noise contrastive estimator (NCE) to compare instances instead of classifying them.
Wu, Zhirong, et al. “Unsupervised feature learning via non-parametric instance discrimination.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
Mean Teacher self-distillation.
Tarvainen, Antti, and Harri Valpola. “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results.” Advances in neural information processing systems 30 (2017).
knowledge distillation.
Buciluǎ, Cristian, Rich Caruana, and Alexandru Niculescu-Mizil. “Model compression.” Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006.
Extra learnable token in ViT.
Dosovitskiy, Alexey. “An image is worth 16x16 words: Transformers for image recognition at scale.” arXiv preprint arXiv:2010.11929 (2020).
Devlin, Jacob. “Bert: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).
Attention mechanism.
Bahdanau, Dzmitry. “Neural machine translation by jointly learning to align and translate.” arXiv preprint arXiv:1409.0473 (2014).