Thesis Title: Hypothesis Test for Manifolds and Networks
Dr. Yao Xie, School of Industrial and Systems Engineering, Georgia Tech
Dr. Alexander Shapiro, School of Industrial and Systems Engineering, Georgia Tech
Dr. Andy Sun, School of Industrial and Systems Engineering, Georgia Tech
Dr. Feng Qiu, Department of Energy Systems, Argonne National Laboratory
Date and Time:
Wednesday, April 21, 2021 @ 10:30am (EST)
Meeting URL (BlueJeans):
Meeting ID (BlueJeans):
Statistical inference for high-dimensional data is crucial in science and engineering. Such data are often structured: for example, they may lie on a manifold or come from a large network. Motivated by problems arising in recommendation systems, power systems, and social media, this dissertation develops statistical models for such problems and performs statistical inference. It focuses on two problems: (i) statistical modeling for smooth manifolds and inference for the corresponding characteristic rank; (ii) detection of change-points in sequential data over a network.
In chapter 2, we study the problem of matrix completion. From a geometric perspective, we address the following questions: (i) What is the minimum achievable rank in the minimum rank matrix completion (MRMC) problem? (ii) Under what conditions does the MRMC problem have a locally unique solution? We also provide a statistical model for low-rank matrix approximation problems and, based on this model, present a statistical test of the rank. Numerical experiments verify our theoretical results and show the performance of the proposed test procedure.
In chapter 3, we generalize the results of chapter 2 and develop a general theory for testing the goodness-of-fit of non-linear models under additive Gaussian observation noise. Our main result shows that the “residual” of the model fit (obtained by solving a non-linear least-squares problem) follows a (possibly non-central) $\chi^2$ distribution. A natural use of this result is to select the order of a model via a sequential test procedure that chooses between two nested models at each step. We demonstrate applications of this general theory to real and complex matrix completion from incomplete and noisy observations, signal source identification, and determining the number of hidden nodes in neural networks.
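To make the shape of such a goodness-of-fit test concrete, the block below sketches the standard form a result of this kind takes. It is an illustration under stated assumptions, not the dissertation's exact statement: the symbols $y$, $g$, $\theta$, $\sigma$, $m$, $d$, $\delta$, and $\alpha$ are introduced here for illustration, the noise level $\sigma$ is assumed known, and the usual regularity conditions are assumed to hold.

```latex
% Assumed observation model: m noisy observations of a non-linear model g
% with d free parameters (d is the dimension of the model manifold).
y = g(\theta^{*}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^{2} I_{m})

% Test statistic: the scaled residual of the non-linear least-squares fit.
T = \frac{1}{\sigma^{2}} \min_{\theta} \, \lVert y - g(\theta) \rVert_{2}^{2}

% Approximate distribution: central if the model is correctly specified
% (\delta = 0), non-central with parameter \delta > 0 otherwise.
T \;\overset{\text{approx.}}{\sim}\; \chi^{2}_{m-d}(\delta)

% Level-\alpha test: reject the candidate model when
T > \chi^{2}_{1-\alpha,\; m-d}

% Example: for rank-r matrices of size n_1 \times n_2 with m observed entries,
% the manifold dimension is d = r\,(n_1 + n_2 - r).
```

Under these assumptions, one way to select a model order is to test increasing orders in turn and keep the smallest one that is not rejected.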
In chapter 4, we develop an online change-point detection procedure for cascading failures in power systems using multi-dimensional measurements over the network. We incorporate the characteristics of cascading failures into the detection procedure and model the multiple changes they cause using a diffusion process over the network. The model captures the property that the risk of a component failing increases as more components around it fail. Our change-point detection procedure uses generalized likelihood ratio statistics, treating the post-change parameters of the measurements and the true failure time (change-point) at each node as unknown. We also provide a fast algorithm to perform the change-point detection. Numerical experiments show that the proposed method performs well and scales to large systems.
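The idea behind a generalized likelihood ratio (GLR) detector with unknown post-change parameters can be illustrated on a much simpler problem than the one studied in the dissertation. The sketch below assumes a single data stream that is N(0, 1) before the change and N(mu, 1) after it, with mu unknown; it does not model the network or the diffusion of failures. The function name `glr_mean_shift_detector` and its parameters `threshold` and `window` are hypothetical names chosen for this example.

```python
from typing import Optional, Sequence


def glr_mean_shift_detector(
    xs: Sequence[float], threshold: float, window: int = 50
) -> Optional[int]:
    """Window-limited GLR detector for a mean shift in unit-variance Gaussian data.

    Assumed model (illustration only): observations are N(0, 1) before the
    change and N(mu, 1) after it, with mu unknown. For a candidate change
    point k and current time t, maximizing the log-likelihood ratio over mu
    gives S^2 / (2 n), where S = x_{k+1} + ... + x_t and n = t - k.

    Returns the first time t (1-based) at which the statistic exceeds
    `threshold`, or None if no alarm is raised.
    """
    prefix = [0.0]  # prefix[t] = x_1 + ... + x_t
    for t, x in enumerate(xs, start=1):
        prefix.append(prefix[-1] + x)
        best = 0.0
        # Scan candidate change points within the most recent `window` samples.
        for k in range(max(0, t - window), t):
            n = t - k
            s = prefix[t] - prefix[k]
            best = max(best, s * s / (2.0 * n))
        if best > threshold:
            return t
    return None


if __name__ == "__main__":
    # Noise-free toy stream: mean 0 for 20 samples, then mean 2.
    stream = [0.0] * 20 + [2.0] * 20
    print(glr_mean_shift_detector(stream, threshold=7.5))  # 24
```

Each time step costs O(window) operations because the candidate change points are scanned with prefix sums. A larger threshold lowers the false alarm rate at the cost of a longer detection delay.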
In chapter 5, we propose a change-point detection procedure based on scan score statistics in a multivariate Hawkes network. The scan score statistics are computationally efficient because they do not require estimating the post-change parameters, which is important for online detection. We present theoretical results for the proposed procedure, including an analysis of its false alarm rate (FAR) and average run length (ARL) under the null hypothesis. We use simulation studies to validate our theoretical results and compare our method with an existing change-point detection procedure based on generalized likelihood ratio statistics. We also apply the proposed procedure to real-world data, such as MemeTracker and stock market data, with promising results in detecting an abrupt change in the network.
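The computational advantage of a score statistic comes from evaluating everything at the pre-change parameter only. The block below gives the generic (Rao) score form over a sliding window as a sketch. The notation $\ell_{t,w}$, $\theta_0$, $S_t$, $I_{t,w}$, $\Gamma_t$, and $b$ is assumed here for illustration, and the dissertation's exact statistic for the Hawkes network may differ.

```latex
% \ell_{t,w}(\theta): log-likelihood of the events observed in the window (t-w, t].
% \theta_0: pre-change parameter, assumed known or estimated from historical data.

% Score vector, evaluated at the pre-change parameter only:
S_t = \nabla_{\theta} \, \ell_{t,w}(\theta) \,\big|_{\theta = \theta_0}

% Score statistic, normalized by the Fisher information of the window:
\Gamma_t = S_t^{\top} \, I_{t,w}(\theta_0)^{-1} \, S_t

% Detection rule: raise an alarm the first time the statistic exceeds a threshold b.
T = \inf \{\, t : \Gamma_t > b \,\}

% For comparison, a GLR statistic needs a maximization over the unknown
% post-change parameter at every t:
% \sup_{\theta} \, \ell_{t,w}(\theta) - \ell_{t,w}(\theta_0)
```

Because no maximization over $\theta$ appears in $\Gamma_t$, each update requires only a gradient evaluation at $\theta_0$.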