Title

Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?

Author
Corresponding Author
Zhang, Yuqun
DOI
Publication Years
2023-07-12
Source Title
Pages
1383-1395
Abstract
Existing software composition analysis (SCA) techniques for the C/C++ ecosystem tend to identify reused components through feature matching between the target software project and collected third-party libraries (TPLs). However, feature duplication caused by internal code clones can lead to inaccurate SCA results. To mitigate this issue, Centris, a state-of-the-art SCA technique for the C/C++ ecosystem, adopts function-level code clone detection to derive TPL dependencies and eliminate redundant features before performing SCA tasks. Although Centris was shown to be effective in the original paper, the accuracy of the derived TPL dependencies was not evaluated. Additionally, the dataset used to evaluate the impact of TPL dependency on SCA is limited. To further investigate the efficacy and limitations of Centris, we first construct two large-scale ground-truth datasets for evaluating the accuracy of the derived TPL dependencies and of the SCA results, respectively. We then extensively evaluate Centris; the results suggest that the accuracy of the TPL dependencies derived by Centris may not generalize well to our evaluation dataset. We further infer that the key factors degrading its performance may be inaccurate function birth times and threshold-based recall. In addition, the impact of the TPL dependencies derived by Centris on SCA can be somewhat limited. Inspired by our findings, we propose TPLite, which uses function-level origin TPL detection and graph-based dependency recall to enhance the accuracy of TPL reuse detection in the C/C++ ecosystem. Our evaluation results indicate that, compared with Centris, TPLite increases the precision of deriving TPL dependencies from 35.71% to 88.33% and the recall from 49.44% to 62.65%. Moreover, TPLite increases the precision from 21.08% to 75.90% and the recall from 57.62% to 64.17% compared with the state-of-the-art academic SCA tool B2SFinder, and even outperforms the widely adopted commercial SCA tool BDBA, increasing the precision from 72.46% to 75.90% and the recall from 58.55% to 64.17%.
Keywords
SUSTech Authorship
First; Corresponding
Language
English
Scopus EID
2-s2.0-85167705720
Data Source
Scopus
Citation statistics
Cited Times [WOS]:0
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/559843
Department
Southern University of Science and Technology
Affiliation
1. Southern University of Science and Technology, Shenzhen, China
2. Tencent Security Keen Lab, Shanghai, China
3. Research Institute of Trustworthy Autonomous Systems, Shenzhen, China
4. Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, China
First Author Affiliation
Southern University of Science and Technology
Corresponding Author Affiliation
Southern University of Science and Technology
First Author's First Affiliation
Southern University of Science and Technology
Recommended Citation
GB/T 7714
Jiang, Ling, Yuan, Hengchen, Tang, Qiyi, et al. Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?[C], 2023: 1383-1395.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.