Variable selection for high-dimensional data using known and novel graph information

Source: School of Mathematical Sciences. Published: 2018-11-20

Title: Variable selection for high-dimensional data using known and novel graph information

Speaker: Professor Qi Long (University of Pennsylvania)

Time: Wednesday, 26 December 2018, 14:00

Venue: Lecture Hall 1417, 14th floor, Administration Building, School of Management, Zijingang Campus

Abstract: Variable selection for structured high-dimensional covariates lying on an underlying graph has drawn considerable interest. However, most existing methods may not scale to high-dimensional settings involving tens of thousands of variables lying on known pathways, as is the case in genomics studies, and they assume that the graph information is fully known. This talk will focus on addressing these two challenges. In the first part, I will present an adaptive Bayesian shrinkage approach that incorporates known graph information through shrinkage parameters and is scalable to high-dimensional settings (e.g., p ~ 100,000 or in the millions). We also establish theoretical properties of the proposed approach for fixed and diverging p. In the second part, I will tackle the issue that graph information is not fully known. For example, the role of miRNAs in regulating gene expression is not well understood, and the miRNA regulatory network is often not validated. We propose an approach that treats unknown graph information as missing data (i.e., missing edges), introduce the idea of imputing the unknown graph information, and define the imputed information as the novel graph information. In addition, we propose a hierarchical group penalty to encourage sparsity at both the pathway level and the within-pathway level, which, combined with the imputation step, allows for incorporation of known and novel graph information. The methods are assessed via simulation studies and applied to analyses of cancer data.

All faculty and students are welcome to attend!

Contact: Zhang Lixin (stazlx@zju.edu.cn)