Title

An extensive study on pre-trained models for program understanding and generation

Author
Zhang,Yuqun (Corresponding Author)
DOI
Publication Years
2022-07-18
Source Title
Pages
39-51
Abstract
Automatic program understanding and generation techniques could significantly advance the productivity of programmers and have been widely studied by academia and industry. Recently, the advent of the pre-training paradigm has inspired researchers to develop general-purpose pre-trained models that can be applied to a broad range of program understanding and generation tasks. Such pre-trained models, derived via self-supervised objectives on large unlabelled corpora, can be fine-tuned on downstream tasks (such as code search and code generation) with minimal adaptations. Although these pre-trained models claim superiority over prior techniques, they seldom follow equivalent evaluation protocols, e.g., they are hardly evaluated on identical benchmarks, tasks, or settings. Consequently, there is a pressing need for a comprehensive study of the pre-trained models' effectiveness, versatility, and limitations to provide implications and guidance for future development in this area. To this end, we first perform an extensive study of eight open-access pre-trained models over a large benchmark on seven representative code tasks to assess their reproducibility. We further compare the pre-trained models against domain-specific state-of-the-art techniques to validate the effectiveness of pre-training. Finally, we investigate the robustness of the pre-trained models by inspecting their performance variations under adversarial attacks. Through the study, we find that while we can in general replicate the original performance of the pre-trained models on their evaluated tasks and adopted benchmarks, subtle performance fluctuations can refute the findings in their original papers. Moreover, none of the existing pre-trained models dominates all the others. We also find that the pre-trained models can significantly outperform non-pre-trained state-of-the-art techniques in program understanding tasks. Furthermore, we perform the first study of the robustness of natural language-programming language pre-trained models via adversarial attacks and find that a simple random attack approach can easily fool the state-of-the-art pre-trained models and thus incur security issues. Lastly, we provide multiple practical guidelines for advancing future research on pre-trained models for program understanding and generation.
Keywords
SUSTech Authorship
First ; Corresponding
Language
English
URL [Source Record]
Indexed By
EI Accession Number
20223512667335
EI Keywords
Deep learning ; Natural language processing systems
ESI Classification Code
Ergonomics and Human Factors Engineering: 461.4 ; Data Processing and Image Processing: 723.2
Scopus EID
2-s2.0-85134872006
Data Source
Scopus
Citation statistics
Cited Times [WOS]: 0
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/401636
Department
Southern University of Science and Technology
Affiliation
1.Southern University of Science and Technology,China
2.Southern University of Science and Technology,Hong Kong Polytechnic University,China
3.Kwai,China
4.Hong Kong Polytechnic University,China
5.University of Illinois at Urbana-Champaign,United States
6.Research Institute of Trustworthy Autonomous Systems,Shenzhen,China
First Author Affiliation
Southern University of Science and Technology
Corresponding Author Affiliation
Southern University of Science and Technology
First Author's First Affiliation
Southern University of Science and Technology
Recommended Citation
GB/T 7714
Zeng,Zhengran,Tan,Hanzhuo,Zhang,Haotian,et al. An extensive study on pre-trained models for program understanding and generation[C],2022:39-51.
Files in This Item:
There are no files associated with this item.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.