From High Dimensional Data to Big Data

We introduce a new family of robust semiparametric methods for analyzing large, complex, and noisy datasets. These methods are based on the transelliptical distribution family, which assumes that the variables follow an elliptical distribution after a set of unknown marginal transformations. The transelliptical family includes many existing distributions as special cases and can be used to robustify a wide range of multivariate methods, including sparse covariance matrix estimation, principal component analysis, graphical models, discriminant analysis, regression analysis, and principal component regression. We present a hierarchical representation of the transelliptical distribution and propose a new estimation technique based on robust rank correlations. The theoretical properties of the transelliptical methods rely on the concentration behavior of a new type of random matrix under different norms. I will lay out some existing results and introduce a remaining open problem. This talk is based on joint work with Fang Han.
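
As a rough illustration of why rank correlations suit this setting, the sketch below estimates a latent correlation from Kendall's tau using the classical identity rho = sin(pi/2 * tau), which holds for elliptical distributions. The abstract does not specify which rank correlation or transform the talk uses, so this choice is an assumption; the function names, the simulated Gaussian data, and the monotone transformations (exp and cube) are likewise hypothetical. Because Kendall's tau depends only on ranks, the estimate is unchanged by strictly increasing marginal transformations.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences (O(n^2) pair count)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def rank_correlation_matrix(columns):
    """Correlation estimate from Kendall's tau via the sine transform.

    Assumption: R_jk = sin(pi/2 * tau_jk), the classical identity for
    elliptical distributions; not taken from the abstract itself.
    """
    d = len(columns)
    R = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            r = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
            R[j][k] = R[k][j] = r
    return R


# Hypothetical data: latent bivariate Gaussian with correlation rho,
# observed only through strictly increasing marginal transformations.
random.seed(0)
rho, n = 0.6, 400
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in z1]
x1 = [math.exp(a) for a in z1]   # monotone transform of the first margin
x2 = [b ** 3 for b in z2]        # monotone transform of the second margin

R_latent = rank_correlation_matrix([z1, z2])
R_observed = rank_correlation_matrix([x1, x2])
print(R_latent[0][1], R_observed[0][1])
```

The two printed values should agree, since the transformed data have the same ranks as the latent data, whereas a Pearson correlation computed on x1 and x2 would generally differ from rho.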



Han Liu


Princeton University