Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?
Existing software composition analysis (SCA) techniques for the C/C++ ecosystem typically identify reused components by matching features between the target software project and a collection of third-party libraries (TPLs). However, feature duplication caused by internal code clones can make SCA results inaccurate. To mitigate this issue, Centris, a state-of-the-art SCA technique for the C/C++ ecosystem, adopts function-level code clone detection to derive TPL dependencies and eliminate redundant features before performing SCA. Although Centris was shown to be effective in its original paper, the accuracy of the TPL dependencies it derives was not evaluated, and the dataset used to evaluate the impact of TPL dependencies on SCA was limited. To further investigate the efficacy and limitations of Centris, we first construct two large-scale ground-truth datasets, one for evaluating the accuracy of derived TPL dependencies and one for evaluating SCA results. We then extensively evaluate Centris; the results suggest that the accuracy of the TPL dependencies derived by Centris may not generalize well to our evaluation dataset. We further infer that the key factors degrading its performance are likely inaccurate function birth times and threshold-based recall. In addition, the TPL dependencies derived by Centris appear to have only a limited impact on SCA. Inspired by these findings, we propose TPLite, which uses function-level origin TPL detection and graph-based dependency recall to enhance the accuracy of TPL reuse detection in the C/C++ ecosystem. Our evaluation results indicate that, compared with Centris, TPLite increases the precision of deriving TPL dependencies from 35.71% to 88.33% and the recall from 49.44% to 62.65%. Moreover, TPLite increases the precision from 21.08% to 75.90% and the recall from 57.62% to 64.17% compared with the state-of-the-art academic SCA tool B2SFinder, and even outperforms the widely adopted commercial SCA tool BDBA, increasing the precision from 72.46% to 75.90% and the recall from 58.55% to 64.17%.
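The precision and recall figures above compare derived TPL dependencies against a ground truth. As a minimal sketch of how such metrics are conventionally computed over dependency edges (the function name, the edge representation as pairs, and the library names are illustrative assumptions, not taken from the paper or from any tool it evaluates):

```python
def precision_recall(derived, ground_truth):
    """Return (precision, recall) for derived dependency edges.

    Each edge is a hypothetical (dependent, dependency) pair; this
    representation is an assumption made for illustration only.
    """
    derived, ground_truth = set(derived), set(ground_truth)
    true_positives = len(derived & ground_truth)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(derived) if derived else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up example data, not results from the paper.
ground_truth = {("libA", "libX"), ("libA", "libY"), ("libB", "libX"), ("libC", "libZ")}
derived = {("libA", "libX"), ("libB", "libX"), ("libB", "libY")}

precision, recall = precision_recall(derived, ground_truth)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Here two of the three derived edges are correct and two of the four true edges are found, so a wrongly derived edge lowers precision while a missed edge lowers recall.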
|Document Type||Conference paper|
|Department||Southern University of Science and Technology|
1. Southern University of Science and Technology, Shenzhen, China
2. Tencent Security Keen Lab, Shanghai, China
3. Research Institute of Trustworthy Autonomous Systems, Shenzhen, China
4. Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, China
|First Author Affiliation||Southern University of Science and Technology|
|Corresponding Author Affiliation||Southern University of Science and Technology|
|First Author's First Affiliation||Southern University of Science and Technology|
Jiang, Ling; Yuan, Hengchen; Tang, Qiyi; et al. Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We? [C], 2023: 1383-1395.