Title

基于自监督学习的小样本图像识别研究

Alternative Title
RESEARCH ON FEW-SHOT IMAGE RECOGNITION BASED ON SELF-SUPERVISED LEARNING
Author
李淏泉
Name pinyin
LI Haoquan
School number
12032492
Degree
Master
Discipline
0809 Electronic Science and Technology
Subject category of dissertation
08 Engineering
Supervisor
张建国
Mentor unit
Department of Computer Science and Engineering
Publication Years
2023-05-13
Submission date
2023-06-27
University
南方科技大学
Place of Publication
Shenzhen
Abstract

The success of deep learning depends largely on scaling up data and model capacity; however, this heavy dependence on data has also become a bottleneck that limits its wider application and development. Few-shot learning aims to reduce this dependence, enabling a model to fit new tasks from only a small number of manually labeled samples, and few-shot image recognition is one of its main research directions. Focusing on local image features, this thesis proposes a new training framework, TransVLAD, and shows that masked-image self-supervised pre-training can alleviate the overfitting of Transformer models on common few-shot datasets. The method comprehensively outperforms approaches that use a convolutional neural network (CNN) as the feature extractor, opening a new direction for few-shot learning research.
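
As background for the masked-image pre-training mentioned above, the sketch below (Python/PyTorch; not the thesis implementation, and all names are illustrative assumptions) shows MAE-style random patch masking: the encoder only receives a small visible subset of patch tokens, and a decoder is trained to reconstruct the masked patches. MAE commonly uses a mask ratio around 0.75; the abstract reports a 70% ratio during fine-tuning.

# Minimal MAE-style random patch masking (illustrative sketch, not the thesis code).
import torch

def random_masking(tokens: torch.Tensor, mask_ratio: float = 0.75):
    """Keep a random subset of patch tokens; return visible tokens and kept indices.

    tokens: (batch, num_patches, dim) patch embeddings.
    """
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n, device=tokens.device)        # one random score per patch
    ids_keep = noise.argsort(dim=1)[:, :n_keep]           # lowest-scoring patches are kept
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))
    return visible, ids_keep

# Example: 196 patch tokens (14x14 grid of a 224x224 image), 75% masked.
patches = torch.randn(2, 196, 768)
visible, kept = random_masking(patches, mask_ratio=0.75)
print(visible.shape)  # torch.Size([2, 49, 768])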

The TransVLAD model consists of a standard Transformer encoder and a local-feature aggregation module (NeXtVLAD). After MAE (Masked Autoencoder) self-supervised pre-training, the encoder produces local features of an image. The NeXtVLAD module then aggregates these local features so that features of the same class are distributed more compactly and features of different classes are more dispersed, which effectively improves metric-based few-shot classification. Few-shot learning also suffers from two inherent feature biases: supervision bias and simple-feature bias. For supervision bias, this thesis shows that randomly masking the input during fine-tuning makes the model treat different local features more evenly; a 70% mask ratio not only improves few-shot classification accuracy but also speeds up training threefold. For simple-feature bias, the proposed soft focal loss increases the attention paid to hard-to-classify features.
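
To make the aggregation step concrete, the following is a simplified NetVLAD-style soft-assignment layer (a sketch only: the actual NeXtVLAD module adds feature grouping and gating that are omitted here, and all names are illustrative assumptions). Each local feature is softly assigned to a set of learned cluster centers and its residuals are accumulated, producing one pooled descriptor per image.

# Simplified VLAD-style aggregation of local features (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftVLAD(nn.Module):
    def __init__(self, dim: int, num_clusters: int = 64):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)             # soft-assignment logits
        self.centers = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim) local features from the encoder
        a = F.softmax(self.assign(x), dim=-1)                  # (b, n, K) soft assignments
        residuals = x.unsqueeze(2) - self.centers              # (b, n, K, dim)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)        # (b, K, dim) accumulated residuals
        vlad = F.normalize(vlad, dim=-1)                       # intra-normalization
        return F.normalize(vlad.flatten(1), dim=-1)            # (b, K*dim) pooled descriptor

layer = SoftVLAD(dim=768, num_clusters=64)
pooled = layer(torch.randn(2, 196, 768))
print(pooled.shape)  # torch.Size([2, 49152])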
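
The soft focal loss itself is described only at a high level in the abstract; as background, here is a minimal implementation of the standard focal loss [13], on which it is presumably based. It down-weights well-classified examples so that training focuses on hard ones; this is illustrative and not the thesis's proposed variant.

# Standard focal loss (Lin et al. [13]); background only, not the proposed soft focal loss.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0):
    """loss_i = -(1 - p_i)^gamma * log(p_i), with p_i the probability of the true class."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

logits = torch.randn(4, 5)                 # 4 samples, 5-way classification
targets = torch.tensor([0, 2, 1, 4])
print(focal_loss(logits, targets).item())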

Experimental results show that our method achieves state-of-the-art accuracy on all 10 benchmarks across 5 commonly used few-shot datasets, with an average improvement of more than 2%, and it also performs well in cross-domain few-shot learning.

Keywords
Language
Chinese
Training classes
Independent training
Enrollment Year
2020
Year of Degree Awarded
2023-06
References List

[1] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding with unsupervised learning[M]. Technical report, OpenAI, 2018.
[2] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI blog, 2019, 1(8): 9.
[3] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877-1901.
[4] TURING A M. Computing machinery and intelligence[J]. Mind, 1950, 59(236): 433-460.
[5] HOSPEDALES T, ANTONIOU A, MICAELLI P, et al. Meta-learning in neural networks: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(9): 5149-5169.
[6] REN M, TRIANTAFILLOU E, RAVI S, et al. Meta-learning for semi-supervised few-shot classification[M]//International Conference on Learning Representations. 2018.
[7] BERTINETTO L, HENRIQUES J F, TORR P H, et al. Meta-learning with differentiable closed-form solvers[M]//International Conference on Learning Representations. 2019.
[8] ORESHKIN B, RODRÍGUEZ LÓPEZ P, LACOSTE A. Tadam: Task dependent adaptive metric for improved few-shot learning[J]. Advances in Neural Information Processing Systems, 2018, 31.
[9] WAH C, BRANSON S, WELINDER P, et al. The caltech-ucsd birds-200-2011 dataset[M]. California Institute of Technology, 2011.
[10] LIN R, XIAO J, FAN J. Nextvlad: An efficient neural network to aggregate frame-level features for large-scale video classification[C]//Proceedings of the European Conference on Computer Vision Workshops. 2018: 0.
[11] JÉGOU H, DOUZE M, SCHMID C, et al. Aggregating local descriptors into a compact image representation[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010: 3304-3311.
[12] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[13] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2017: 2980-2988.
[14] RAVI S, LAROCHELLE H. Optimization as a model for few-shot learning[C]//International Conference on Learning Representations. 2017.
[15] ZHANG R, CHE T, GHAHRAMANI Z, et al. Metagan: An adversarial approach to few-shot learning[J]. Advances in Neural Information Processing Systems, 2018, 31.
[16] VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[J]. Advances in Neural Information Processing Systems, 2016, 29.
[17] CHEN Y, LIU Z, XU H, et al. Meta-baseline: Exploring simple meta-learning for few-shot learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 9062-9071.
[18] TIAN Y, WANG Y, KRISHNAN D, et al. Rethinking few-shot image classification: A good embedding is all you need?[C]//European Conference on Computer Vision. Springer, 2020: 266-282.
[19] WANG Y, CHAO W L, WEINBERGER K Q, et al. Simpleshot: Revisiting nearest-neighbor classification for few-shot learning[A]. 2019. arXiv:1911.04623.
[20] MÜLLER R, KORNBLITH S, HINTON G E. When does label smoothing help?[J]. Advances in Neural Information Processing Systems, 2019, 32.
[21] ZHANG H, CISSE M, DAUPHIN Y N, et al. Mixup: Beyond empirical risk minimization[M]//International Conference on Learning Representations. 2018.
[22] YUN S, HAN D, OH S J, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 6023-6032.
[23] SNELL J, SWERSKY K, ZEMEL R. Prototypical networks for few-shot learning[J]. Advances in Neural Information Processing Systems, 2017, 30.
[24] SUNG F, YANG Y, ZHANG L, et al. Learning to compare: Relation network for few-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 1199-1208.
[25] YANG S, LIU L, XU M. Free lunch for few-shot learning: Distribution calibration[M]//International Conference on Learning Representations. 2021.
[26] ZHANG B, LI X, YE Y, et al. Prototype completion with primitive knowledge for few-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 3754-3762.
[27] ZHANG C, CAI Y, LIN G, et al. Deepemd: Few-shot image classification with differentiable earth mover’s distance and structured classifiers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 12203-12213.
[28] LIU Y, ZHANG W, XIANG C, et al. Learning to affiliate: Mutual centralized learning for few-shot classification[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 14411-14420.
[29] WANG D, CHENG Y, YU M, et al. A hybrid approach with optimization-based and metric-based meta-learner for few-shot learning[J]. Neurocomputing, 2019, 349: 202-211.
[30] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//International Conference on Machine Learning. Proceedings of Machine Learning Research, 2017: 1126-1135.
[31] NICHOL A, ACHIAM J, SCHULMAN J. On first-order meta-learning algorithms[A]. 2018. arXiv:1803.02999.
[32] LEE K, MAJI S, RAVICHANDRAN A, et al. Meta-learning with differentiable convex optimization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10657-10665.
[33] CHENG G, CAI L, LANG C, et al. SPNet: Siamese-prototype network for few-shot remote sensing image scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-11.
[34] HARIHARAN B, GIRSHICK R. Low-shot visual recognition by shrinking and hallucinating features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2017: 3018-3027.
[35] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[36] TRIANTAFILLOU E, ZHU T, DUMOULIN V, et al. Meta-dataset: A dataset of datasets for learning to learn from few examples[M]//International Conference on Learning Representations. 2020.
[37] REQUEIMA J, GORDON J, BRONSKILL J, et al. Fast and flexible multi-task classification using conditional neural adaptive processes[J]. Advances in Neural Information Processing Systems, 2019, 32.
[38] BATENI P, GOYAL R, MASRANI V, et al. Improved few-shot visual classification[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 14493-14502.
[39] DOERSCH C, GUPTA A, ZISSERMAN A. Crosstransformers: Spatially-aware few-shot transfer[J]. Advances in Neural Information Processing Systems, 2020, 33: 21981-21993.
[40] RUSSAKOVSKY O, DENG J, SU H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[41] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[42] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.
[43] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
[44] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[45] LAKE B M, SALAKHUTDINOV R, TENENBAUM J B. Human-level concept learning through probabilistic program induction[J]. Science, 2015, 350(6266): 1332-1338.
[46] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2017: 4700-4708.
[47] HOWARD A G, ZHU M, CHEN B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[A]. 2017. arXiv:1704.04861.
[48] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[M]//International Conference on Learning Representations. 2021.
[49] CHEN H, LI H, LI Y, et al. Sparse spatial transformers for few-shot learning[A]. 2021. arXiv:2109.12932.
[50] LIU L, HAMILTON W, LONG G, et al. A universal representation transformer layer for few-shot image classification[M]//International Conference on Learning Representations. 2021.
[51] HE K, FAN H, WU Y, et al. Momentum contrast for unsupervised visual representation learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 9729-9738.
[52] CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]//International Conference on Machine Learning. Proceedings of Machine Learning Research, 2020: 1597-1607.
[53] CHEN X, XIE S, HE K. An empirical study of training self-supervised vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 9640-9649.
[54] DEVLIN J, CHANG M W, LEE K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[M]//North American Chapter of the Association for Computational Linguistics–Human Language Technologies. 2018.
[55] BAO H, DONG L, WEI F. Beit: Bert pre-training of image transformers[M]//International Conference on Learning Representations. 2022.
[56] HE K, CHEN X, XIE S, et al. Masked autoencoders are scalable vision learners[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[57] EL-NOUBY A, IZACARD G, TOUVRON H, et al. Are large-scale datasets necessary for self-supervised pre-training?[M]//Computing Research Repository. 2021.
[58] MANGLA P, KUMARI N, SINHA A, et al. Charting the right manifold: Manifold mixup for few-shot learning[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020: 2218-2227.
[59] CHEN Z, GE J, ZHAN H, et al. Pareto self-supervised training for few-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 13663-13672.
[60] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[M]//International Conference on Learning Representations. 2019.
[61] CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//International Conference on Machine Learning. Proceedings of Machine Learning Research, 2020: 1691-1703.
[62] LOSHCHILOV I, HUTTER F. Sgdr: Stochastic gradient descent with warm restarts[M]//International Conference on Learning Representations. 2016.
[63] GOYAL P, DOLLÁR P, GIRSHICK R, et al. Accurate, large minibatch sgd: Training imagenet in 1 hour[A]. 2017. arXiv:1706.02677.
[64] CUBUK E D, ZOPH B, SHLENS J, et al. Randaugment: Practical automated data augmentation with a reduced search space[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020: 702-703.
[65] HUANG G, SUN Y, LIU Z, et al. Deep networks with stochastic depth[C]//European Conference on Computer Vision. Springer, 2016: 646-661.
[66] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11).
[67] LIU P, YUAN W, FU J, et al. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing[A]. 2021. arXiv:2107.13586.
[68] LEE K, MAJI S, RAVICHANDRAN A, et al. Meta-learning with differentiable convex optimization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10657-10665.
[69] DHILLON G S, CHAUDHARI P, RAVICHANDRAN A, et al. A baseline for few-shot image classification[M]//International Conference on Learning Representations. 2020.
[70] KANG D, KWON H, MIN J, et al. Relational embedding for few-shot classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 8822-8833.
[71] XIE J, LONG F, LV J, et al. Joint distribution matters: Deep brownian distance covariance for few-shot classification[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 7972-7981.
[72] HILLER M, MA R, HARANDI M, et al. Rethinking generalization in few-shot classification[C]//Advances in Neural Information Processing Systems. 2022.

Academic Degree Assessment Subcommittee
Electronic Science and Technology
Domestic book classification number
TP18
Data Source
Manual submission
Document Type
Thesis
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/544093
Department
Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
李淏泉. 基于自监督学习的小样本图像识别研究[D]. 深圳: 南方科技大学, 2023.
Files in This Item:
File Name/Size DocType Version Access License
12032492-李淏泉-计算机科学与工(9034KB) Restricted Access--Fulltext Requests
