Title

A Model Training Method and System Based on Ensemble Learning and Sample Importance

Alternative Title
Model training method and system based on ensemble learning and sample importance
Author
First Inventor
Li Shuxian (李淑娴)
Original applicant
Southern University of Science and Technology; Huawei Technologies Co., Ltd.
First applicant
Southern University of Science and Technology
Address of First applicant
518055, No. 1088 Xueyuan Avenue, Taoyuan Subdistrict, Nanshan District, Shenzhen, Guangdong Province
Current applicant
Southern University of Science and Technology; Huawei Technologies Co., Ltd.
Address of Current applicant
518055, No. 1088 Xueyuan Avenue, Taoyuan Subdistrict, Nanshan District, Shenzhen, Guangdong Province (Guangdong, Shenzhen, Nanshan District)
First Current Applicant
Southern University of Science and Technology
Address of First Current Applicant
518055, No. 1088 Xueyuan Avenue, Taoyuan Subdistrict, Nanshan District, Shenzhen, Guangdong Province (Guangdong, Shenzhen, Nanshan District)
Application Number
CN202210446487.0
Application Date
2022-04-26
Open (Notice) Number
CN115063623A
Date Available
2022-09-16
Status of Patent
Substantive examination
Legal Date
2022-10-04
Subtype
Invention patent application
SUSTech Authorship
First
Abstract
The invention discloses a model training method and system based on ensemble learning and sample importance. Before model training, the method obtains the training samples and determines an imbalance factor for each sample from its class information. It also obtains density information for the training samples and uses it to determine a density factor for each sample. During model training, the sub-classifiers are used to determine the probability that a training sample belongs to each class, and a boundary factor for the sample is determined from these class probabilities. The imbalance, density, and boundary factors are then combined into sample importance information, which is incorporated into ensemble-learning model training to complete the training. During training, the model therefore pays more attention to minority-class samples, samples lying in the main (dense) regions of the data distribution, and samples that are harder to classify, which improves its predictive performance.
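The abstract names three per-sample factors and a fusion step, but it does not give their formulas. The sketch below is one minimal, hypothetical way to realize each step in pure Python. It is not the patented method. Every formula is an assumption:

- **Imbalance factor** (`imbalance_factor`): inverse class frequency.
- **Density factor** (`density_factor`): inverse mean distance to the k nearest neighbours.
- **Boundary factor** (`boundary_factor`): one minus the margin between the true-class probability and the strongest competing class.
- **Combination** (`sample_importance`): the product of the three factors.
- **Fusion** (`reweight`): multiplying boosting-style sample weights by the importance scores.

All function names are illustrative.

```python
import math
from collections import Counter


def imbalance_factor(labels):
    """Hypothetical imbalance factor: inverse class frequency, scaled so the
    largest class gets 1.0 and minority classes get larger values."""
    counts = Counter(labels)
    n_max = max(counts.values())
    return [n_max / counts[y] for y in labels]


def density_factor(points, k=3):
    """Hypothetical density factor: inverse mean distance to the k nearest
    neighbours, normalised so the densest sample gets 1.0. Samples in the
    main (dense) regions score higher than isolated ones."""
    k = min(k, len(points) - 1)  # needs at least two points
    raw = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        raw.append(1.0 / (sum(dists[:k]) / k + 1e-12))  # epsilon avoids /0
    top = max(raw)
    return [r / top for r in raw]


def boundary_factor(probas, labels):
    """Hypothetical boundary factor from sub-classifier class probabilities:
    1 - (p_true - best competing p). The margin lies in [-1, 1], so the
    factor lies in [0, 2]; ambiguous or misclassified samples score higher."""
    out = []
    for p, y in zip(probas, labels):
        competitor = max(v for c, v in enumerate(p) if c != y)
        out.append(1.0 - (p[y] - competitor))
    return out


def sample_importance(imb, den, bnd):
    """Combine the three factors by product (an assumption) and normalise
    the result into a distribution that sums to 1."""
    raw = [a * b * c for a, b, c in zip(imb, den, bnd)]
    total = sum(raw)
    return [r / total for r in raw]


def reweight(boost_weights, importance):
    """Fuse importance into boosting-style sample weights for the next
    sub-classifier, then renormalise."""
    w = [b * s for b, s in zip(boost_weights, importance)]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    # Toy data: three class-0 samples in a tight cluster, one class-1 outlier.
    labels = [0, 0, 0, 1]
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
    # Hypothetical class probabilities from the current sub-classifier.
    probas = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
    imp = sample_importance(
        imbalance_factor(labels),
        density_factor(points, k=2),
        boundary_factor(probas, labels),
    )
    print(reweight([0.25] * 4, imp))
```

Taking the product means a sample must score on all three criteria to receive a large weight. A weighted sum would be a reasonable alternative if one factor should not be able to zero out the others.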
CPC Classification Number
G06V10/764 ; G06N20/20
IPC Classification Number
G06V10/764 ; G06N20/20
INPADOC Legal Status
(ENTRY INTO FORCE OF REQUEST FOR SUBSTANTIVE EXAMINATION)[2022-10-04][CN]
INPADOC Patent Family Count
1
Extended Patent Family Count
1
Priority date
2022-04-26
Patent Agent
Xu Kaikai (徐凯凯)
Agency
Shenzhen Junsheng Intellectual Property Agency (General Partnership) (深圳市君胜知识产权代理事务所(普通合伙))
Data Source
PatSnap
Document Type
Patent
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/533599
Department
Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
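Li Shuxian, Song Liyan, Yao Xin, et al. A Model Training Method and System Based on Ensemble Learning and Sample Importance (一种基于集成学习和样本重要性的模型训练方法及系统).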

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.