Title | 一种基于集成学习和样本重要性的模型训练方法及系统 |
Alternative Title | Model training method and system based on ensemble learning and sample importance
|
Author | |
First Inventor | 李淑娴
|
Original applicant | 南方科技大学
; 华为技术有限公司
|
First applicant | 南方科技大学
|
Address of First applicant | 518055 广东省深圳市南山区桃源街道学苑大道1088号
|
Current applicant | 南方科技大学
; 华为技术有限公司
|
Address of Current applicant | 518055 广东省深圳市南山区桃源街道学苑大道1088号 (广东,深圳,南山区)
|
First Current Applicant | 南方科技大学
|
Address of First Current Applicant | 518055 广东省深圳市南山区桃源街道学苑大道1088号 (广东,深圳,南山区)
|
Application Number | CN202210446487.0
|
Application Date | 2022-04-26
|
Open (Notice) Number | CN115063623A
|
Date Available | 2022-09-16
|
Status of Patent | Under substantive examination
|
Legal Date | 2022-10-04
|
Subtype | Invention application
|
SUSTech Authorship | First
|
Abstract | 本发明公开了一种基于集成学习和样本重要性的模型训练方法及系统,方法包括:在模型训练前,获取训练样本,并基于训练样本所对应的类别信息,确定训练样本所对应的不平衡因子;获取训练样本的密度信息,并基于密度信息,确定训练样本所对应的密度因子;在模型训练过程中,基于子分类器,确定训练样本被分到各个类别的概率,并基于训练样本被分到各个类别的概率确定样本的边界因子;基于不平衡因子、密度因子以及边界因子,确定训练样本的样本重要性信息,并将样本重要性信息融入集成学习的模型训练中,以完成模型训练。本发明可使得模型在训练过程中更关注小类样本、分布在主要区域的样本以及分类难度更高的样本,以提高模型预测的性能。 |
Other Abstract | The invention discloses a model training method and system based on ensemble learning and sample importance. The method comprises: before model training, obtaining training samples and determining the imbalance factor of each training sample based on its class information; obtaining density information of the training samples and determining the density factor of each training sample based on that density information; during model training, determining, based on the sub-classifiers, the probability that each training sample is assigned to each class, and determining the boundary factor of the sample from those probabilities; and determining sample importance information of the training samples based on the imbalance factor, the density factor, and the boundary factor, and incorporating the sample importance information into the ensemble-learning model training to complete the training. The invention enables the model to pay more attention during training to minority-class samples, samples located in the main regions of the distribution, and samples that are harder to classify, thereby improving the model's prediction performance. |
CPC Classification Number | G06V10/764
; G06N20/20
|
IPC Classification Number | G06V10/764
; G06N20/20
|
INPADOC Legal Status | (ENTRY INTO FORCE OF REQUEST FOR SUBSTANTIVE EXAMINATION)[2022-10-04][CN]
|
INPADOC Patent Family Count | 1
|
Extended Patent Family Count | 1
|
Priority date | 2022-04-26
|
Patent Agent | 徐凯凯
|
Agency | 深圳市君胜知识产权代理事务所(普通合伙)
|
URL | [Source Record] |
Data Source | PatSnap
|
Document Type | Patent |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/533599 |
Department | Department of Computer Science and Engineering |
Recommended Citation GB/T 7714 |
李淑娴,宋丽妍,姚新,等. 一种基于集成学习和样本重要性的模型训练方法及系统.
|
Files in This Item: | There are no files associated with this item. |
|
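Illustrative sketch | The abstract names three per-sample factors (imbalance, density, boundary) and says they are combined into a sample importance that is fed into ensemble training, but it gives no formulas. The Python sketch below is one possible reading, not the patented method. Every formula in it is an assumption made for illustration: inverse class frequency for the imbalance factor, inverse mean k-nearest-neighbour distance for the density factor, one minus the gap between the two largest class probabilities for the boundary factor, and a normalized product as the combination rule. All function names are hypothetical.

```python
import numpy as np

def imbalance_factor(y):
    # Assumption: inverse class frequency, scaled so the largest class gets 1.0.
    classes, counts = np.unique(y, return_counts=True)
    freq = dict(zip(classes.tolist(), counts.tolist()))
    n_max = counts.max()
    return np.array([n_max / freq[c] for c in np.asarray(y).tolist()], dtype=float)

def density_factor(X, k=3):
    # Assumption: density = 1 / mean distance to the k nearest neighbours,
    # scaled so the densest sample gets 1.0.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)          # ignore distance to self
    knn = np.sort(dist, axis=1)[:, :k]
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    return density / density.max()

def boundary_factor(proba):
    # Assumption: 1 - (largest probability - second largest probability).
    # Values near 1 mean the sub-classifier is undecided between two classes.
    top2 = np.sort(proba, axis=1)[:, -2:]
    return 1.0 - (top2[:, 1] - top2[:, 0])

def sample_importance(imb, dens, bound, eps=1e-12):
    # Assumption: combine the three factors by product, then normalize to sum to 1.
    w = imb * dens * (bound + eps)
    return w / w.sum()

# Toy data: three majority-class samples, one isolated minority-class sample.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]])
y = np.array([0, 0, 0, 1])
# Hypothetical class probabilities from one sub-classifier.
proba = np.array([[0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.5, 0.5]])

w = sample_importance(imbalance_factor(y), density_factor(X, k=2), boundary_factor(proba))

# One way to use the weights: importance-weighted resampling of the
# training set for the next sub-classifier in the ensemble.
rng = np.random.default_rng(0)
resampled_idx = rng.choice(len(w), size=len(w), p=w)
```

The imbalance and density factors depend only on the data, so they can be computed once before training, while the boundary factor has to be recomputed as each sub-classifier produces new probabilities; this matches the "before training" and "during training" split described in the abstract. The resulting weights could equally be passed as per-sample weights to a learner that accepts them instead of being used for resampling.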