Title

高维拟似然模型下的广义似然比检验

Alternative Title
GENERALIZED LIKELIHOOD RATIO TESTS IN HIGH DIMENSIONAL QUASI-LIKELIHOOD MODELS
Author
Name pinyin
WANG Haofeng
School number
11749262
Degree
PhD
Discipline
0701 Mathematics
Subject category of dissertation
07 Science
Supervisor
蒋学军
Mentor unit
Department of Statistics and Data Science
Publication Years
2022-10-22
Submission date
2023-01-03
University
Harbin Institute of Technology
Place of Publication
Harbin
Abstract

Massive data sets arise in many fields such as biology, chemistry, economics, finance, genetics, neuroscience, and physics, bringing data scientists both new challenges and new opportunities. The principal challenges posed by large-scale data are the curse of dimensionality and memory constraints. The curse of dimensionality means that the dimension may grow exponentially with the sample size, so estimation and hypothesis testing must be studied under non-polynomial dimensionality. Memory constraints mean that a single machine either cannot hold the whole data set or can barely store it with no room left for computation. Many traditional estimation and inference methods therefore need to be re-examined. To meet the challenges of big data, researchers have proposed many innovative algorithms for statistical inference on large-scale data sets. However, these algorithms were not developed from statistical principles, so the accuracy and stability of their computational results lack theoretical guarantees. This dissertation studies significance tests for a single parameter, or for a specific linear structure of a group of parameters, under high-dimensional quasi-likelihood models, and proposes distributed algorithms for the test statistics under massive data.

First, for exponential-family generalized linear models of non-polynomial dimensionality, this dissertation applies high-dimensional ordinary least-squares projection (HOLP) to reduce the dimension of the alternative hypothesis space. Based on the sub-model selected by HOLP, a likelihood ratio statistic is constructed. However, the sub-model selected by HOLP may be random, and this randomness degrades the power of the likelihood ratio test. To handle the randomness of the HOLP-selected sub-model, the dissertation proposes a refitted likelihood ratio test together with the corresponding Wald and Score tests. Simulation and real data analysis show that the likelihood ratio test and its refitted versions have high power in finite samples.

Second, for quasi-likelihood models of non-polynomial dimensionality, a generalized likelihood ratio test is proposed based on high-dimensional penalized quasi-likelihood estimation. Although HOLP performs well in exponential-family generalized linear models, it is difficult to apply to quasi-likelihood models. The idea of the generalized likelihood ratio test is to first select a sub-model by penalized quasi-likelihood estimation, thereby reducing the dimension of the alternative hypothesis space, and then to carry out the hypothesis test in the reduced space. If the penalized quasi-likelihood estimator achieves strong variable-selection consistency, the generalized likelihood ratio test attains power close to that of the oracle generalized likelihood ratio test. In ultrahigh-dimensional models, however, penalized estimation may pick up spurious variables and thus degrade the performance of the test. To address this problem, the dissertation proposes a refitted likelihood ratio test and the corresponding generalized Wald and Score tests. These refitted tests are not only free of spurious-variable effects but also robust to the tuning parameter. Simulation and real data analysis show that the generalized likelihood ratio test and its refitted versions are feasible and effective in finite samples.

Finally, distributed methods are given for computing the generalized likelihood ratio statistic and its refitted versions. Since HOLP has a closed form, a block-matrix recursive algorithm can be used to invert the large-scale matrix involved; the focus here is therefore on computing the generalized likelihood ratio statistic and its refitted versions. As noted above, the key step behind these statistics is variable selection. Based on divide-and-conquer and weighted voting, the dissertation proposes a distributed penalized quasi-likelihood estimator: the whole data set is split into subsets stored on different machines, the model is fitted on each machine, and the results from all machines are combined by weighted voting. Given the variables selected by the distributed penalized quasi-likelihood estimator, the Newton-Raphson method is used to compute the maximum quasi-likelihood estimator on the full data. Each Newton-Raphson iteration has a closed form involving only the Score function and the Fisher information matrix, so the full-data maximum quasi-likelihood estimator can be obtained by aggregating the Score functions and Fisher information matrices from all machines. From the maximum quasi-likelihood estimator, the generalized likelihood ratio statistic and its refitted versions are obtained by a similar averaging scheme. Simulation and real data analysis show that the distributed penalized quasi-likelihood estimator has the same asymptotic efficiency as the full-sample penalized quasi-likelihood estimator.

Other Abstract

There exist massive high-dimensional data sets in various fields such as biology, chemistry, economics, finance, genetics, neuroscience, and physics, and data scientists face both new challenges and new opportunities. The major challenges are the curse of dimensionality and memory constraints. The curse of dimensionality means that the dimensionality can grow exponentially with the sample size, so estimation and hypothesis testing must be studied under non-polynomial dimensionality. The memory constraints mean that the memory of one machine either cannot contain the whole data set or can barely hold it with no extra room for computation. Thus, improving the traditional estimation and inference methods is necessary. To overcome the challenges of big data, many innovative algorithms have been proposed to address statistical inference problems for high-dimensional data sets. However, these algorithms are not developed from statistical principles, and the accuracy and stability of their computational results lack theoretical support; there thus remains a gap between statistical theory and computational performance. Under high-dimensional quasi-likelihood models, this dissertation focuses on significance tests for a single parameter or for a specific linear structure of a group of parameters, and proposes distributed algorithms for the test statistics under massive data.

First, for exponential-family generalized linear models with non-polynomial dimensionality, this thesis proposes to reduce the dimension of the alternative hypothesis space with high-dimensional ordinary least-squares projection (HOLP). Based on the sub-model selected by HOLP, the likelihood ratio statistic is constructed. However, the sub-model selected by HOLP may be random, which deteriorates the performance of the test. To remedy this, the refitted likelihood ratio test and the corresponding Score and Wald tests are proposed. Simulation results and real data analysis demonstrate that the likelihood ratio test and its refitted versions are powerful in finite samples.
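
To make the HOLP step concrete: in the linear-model special case of Wang and Leng (2016), HOLP ranks predictors by the minimum-norm least-squares solution, which is computable even when p greatly exceeds n. Below is a minimal sketch under that simplification (the dissertation works with exponential-family GLMs); the function name holp_screen, the sub-model size d, and the toy data are illustrative, not from the thesis.

```python
# A minimal sketch of HOLP screening in the linear-model case; the
# dissertation adapts the idea to exponential-family GLMs.
import numpy as np

def holp_screen(X, y, d):
    """Rank predictors by the HOLP estimator and keep the top d."""
    n, p = X.shape                      # here p >> n
    # HOLP: beta_hat = X^T (X X^T)^{-1} y, the minimum-norm least-squares
    # solution; X X^T is only n x n, so the inverse is cheap even for huge p.
    beta_hat = X.T @ np.linalg.solve(X @ X.T, y)
    return np.argsort(np.abs(beta_hat))[::-1][:d]

# Toy usage: n = 100 samples, p = 2000 predictors, 5 active ones.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2000))
beta = np.zeros(2000); beta[:5] = 2.0
y = X @ beta + rng.standard_normal(100)
print(holp_screen(X, y, d=10))   # indices 0..4 should typically appear
```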

Second, for quasi-likelihood models with non-polynomial dimensionality, the generalized likelihood ratio test is proposed based on high-dimensional penalized quasi-likelihood estimators. Although HOLP performs well in exponential-family generalized linear models, it is hard to apply to quasi-likelihood models. The idea of the generalized likelihood ratio test is to reduce the dimension of the alternative hypothesis space by high-dimensional penalized quasi-likelihood estimation and then to conduct the hypothesis test in the dimension-reduced space. If variable-selection consistency is achieved, the generalized likelihood ratio test performs nearly as well as its oracle counterpart. In ultrahigh-dimensional models, however, the penalized estimation may include spurious variables, which degrades the performance of the tests. To tackle this problem, we introduce the refitted generalized likelihood ratio test and the corresponding generalized Score and Wald tests. These refitted tests are free of spurious-variable effects and robust against the tuning parameter. Simulation studies and real data analysis show that the methodology is practicable and effective in finite samples.
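
A rough illustration of the two-step scheme, in the logistic special case: an l1-penalized fit stands in for the concave-penalized quasi-likelihood selection, and a nearly unpenalized refit on the selected sub-model yields the likelihood ratio. The scikit-learn-based fitting, the helper names, and the tuning value are our assumptions, not the dissertation's implementation.

```python
# Hedged sketch: select a sub-model, then test H0: beta_j = 0 within it.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

def logistic_loglik(X, y, model):
    """Binomial log-likelihood at the fitted coefficients."""
    eta = X @ model.coef_.ravel()
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def glr_test(X, y, j, lam=0.1):
    """Two-step test of H0: beta_j = 0 after penalized selection."""
    # Step 1: reduce the alternative space with an l1-penalized fit
    # (a stand-in for the SCAD/MCP-type penalized quasi-likelihood).
    sel = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear",
                             fit_intercept=False).fit(X, y)
    M = sorted(set(np.flatnonzero(sel.coef_.ravel())) | {j})
    M0 = [k for k in M if k != j]
    # Step 2: (nearly) unpenalized refits on the selected sub-model, with
    # and without the tested coordinate; huge C approximates the MLE.
    fit = lambda cols: LogisticRegression(penalty="l2", C=1e8, max_iter=1000,
                                          fit_intercept=False).fit(X[:, cols], y)
    full, null = fit(M), fit(M0)
    lr = 2.0 * (logistic_loglik(X[:, M], y, full)
                - logistic_loglik(X[:, M0], y, null))
    return lr, chi2.sf(lr, df=1)

# Toy usage: H0 is true for coordinate j = 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))
beta = np.zeros(50); beta[:3] = 1.5
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))
print(glr_test(X, y, j=10))   # the p-value should typically be non-small
```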

Finally, a distributed method for computing the generalized likelihood ratio statistic and the corresponding refitted statistics is proposed in this dissertation. Since HOLP has a closed form, a block-recursive algorithm can efficiently compute the inverse of the large-scale matrix involved; we therefore focus on the computation of the generalized likelihood ratio statistic and its refitted versions. As mentioned above, the key step for these statistics is variable selection. Based on divide-and-conquer and weighted voting, we propose the distributed penalized quasi-likelihood estimation (DPQE): the whole data set is split into subsets stored on different machines, each machine fits the model on its subset, and the results from all machines are aggregated via weighted voting. Given the sub-model selected by the DPQE, we adopt the Newton-Raphson method to compute the maximum quasi-likelihood estimators (MQLEs). Each Newton-Raphson iteration has a closed-form update that depends only on the Score function and the Fisher information matrix, and these are obtained by averaging the corresponding quantities computed from the subsample on each machine; the MQLEs on the full data are thus obtained. Lastly, the generalized likelihood ratio statistic and the corresponding refitted statistics are computed by a similar averaging scheme. Simulation and real data analysis show that the distributed estimator shares the same asymptotic efficiency as the estimator based on the full sample.
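
The distributed Newton-Raphson step can be sketched as follows for a logistic quasi-likelihood: each "machine" (here, a shard in a list) returns its Score vector and Fisher information matrix, and the driver sums them to take a full-data Newton step. The voting-based variable selection is assumed to have already fixed the covariates in use; all names and the toy data are illustrative.

```python
# Schematic of the aggregated Newton-Raphson iteration described above.
import numpy as np

def score_and_fisher(X, y, beta):
    """Per-machine Score vector and Fisher information on one shard."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # logistic mean function
    U = X.T @ (y - mu)                        # Score contribution
    W = mu * (1.0 - mu)                       # variance function
    I = (X * W[:, None]).T @ X                # Fisher information contribution
    return U, I

def distributed_newton(shards, beta0, n_iter=20):
    beta = beta0.copy()
    for _ in range(n_iter):
        # Each machine communicates only a q-vector and a q x q matrix,
        # so the cost per round is independent of the local sample sizes.
        stats = [score_and_fisher(X, y, beta) for X, y in shards]
        U = sum(u for u, _ in stats)
        I = sum(i for _, i in stats)
        beta = beta + np.linalg.solve(I, U)   # full-data Newton step
    return beta

# Toy usage: 4 "machines", 3 selected covariates.
rng = np.random.default_rng(1)
beta_true = np.array([1.0, -1.0, 0.5])
shards = []
for _ in range(4):
    X = rng.standard_normal((500, 3))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
    shards.append((X, y))
print(distributed_newton(shards, np.zeros(3)))   # close to beta_true
```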

Keywords
Other Keyword
Language
Chinese
Training classes
Joint training
Enrollment Year
2017
Year of Degree Awarded
2022-12

Academic Degree Assessment Subcommittee
Department of Mathematics
Domestic book classification number
O212.1
Data Source
Manual submission
Document Type
Thesis
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/418360
Department
Department of Statistics and Data Science
Recommended Citation
GB/T 7714
王浩枫. 高维拟似然模型下的广义似然比检验[D]. 哈尔滨: 哈尔滨工业大学, 2022.
