Speaker: Ziqi Zhang, School of CSE Ph.D. student
Date and Time: October 26, 12:00-1:00 p.m.
Location: Coda 1315
Title: scDisInFact: disentangled learning for integration and prediction of multi-batch, multi-condition single-cell RNA-sequencing data
Abstract: Single-cell RNA-sequencing (scRNA-seq) measures the expression levels of genes in individual cells within an experimental batch. scRNA-seq has been widely used in disease studies, where samples are collected from donors at different stages of a disease. As a result, each sample's scRNA-seq count matrix is associated with one or more biological conditions, such as age, sex, drug treatment, or disease severity. At the same time, samples from different donors are often obtained in different experimental batches, which introduce technical confounders known as "batch effects". In practice, samples typically come from different conditions and different batches, so the differences among their count matrices arise from a mixture of technical batch effects and biological condition effects. Computational methods should remove the batch effects while preserving the biological variation caused by condition effects. Existing batch effect removal methods remove all systematic differences among samples, including both batch and condition effects. In contrast, existing perturbation prediction methods treat the differences among samples solely as condition effects and therefore predict gene expression inaccurately because they ignore batch effects. Here we propose scDisInFact, a computational framework based on variational autoencoders that models both batch effects and condition effects among samples in scRNA-seq data. scDisInFact simultaneously performs three tasks: batch effect removal, condition-associated key gene detection, and perturbation prediction. We tested scDisInFact on both simulated and real datasets and compared it with baseline methods for each task. The results show that, by jointly performing these three tasks, scDisInFact outperforms existing methods designed for each individual task.
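To make the core idea concrete, the sketch below illustrates, in Python/PyTorch, a variational autoencoder whose latent space is split into a shared biological factor and a condition-associated factor, with the batch label supplied to the decoder as a known covariate. This is only an illustrative sketch under simplifying assumptions (single condition type, arbitrary layer sizes, invented module names), not the authors' scDisInFact implementation.

    # Minimal illustrative sketch of a disentangled VAE for scRNA-seq counts.
    # Names, dimensions, and the single-condition setup are assumptions for
    # illustration only; they do not reflect the scDisInFact code base.
    import torch
    import torch.nn as nn

    class DisentangledVAE(nn.Module):
        def __init__(self, n_genes, n_batches, d_shared=8, d_cond=2):
            super().__init__()
            # encoder for shared (condition-independent) biological variation
            self.enc_shared = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU(),
                                            nn.Linear(128, 2 * d_shared))
            # encoder for condition-associated variation
            self.enc_cond = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU(),
                                          nn.Linear(128, 2 * d_cond))
            # decoder receives both latent factors plus a one-hot batch label,
            # so batch effects are modeled explicitly as a covariate
            self.dec = nn.Sequential(nn.Linear(d_shared + d_cond + n_batches, 128),
                                     nn.ReLU(), nn.Linear(128, n_genes))

        @staticmethod
        def reparameterize(stats):
            mu, logvar = stats.chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return z, mu, logvar

        def forward(self, x, batch_onehot):
            z_shared, mu_s, lv_s = self.reparameterize(self.enc_shared(x))
            z_cond, mu_c, lv_c = self.reparameterize(self.enc_cond(x))
            recon = self.dec(torch.cat([z_shared, z_cond, batch_onehot], dim=-1))
            return recon, (mu_s, lv_s), (mu_c, lv_c)

In this picture, batch effect removal corresponds to decoding every cell with a common reference batch label, and perturbation prediction corresponds to swapping the condition-associated factor between samples, while the shared factor retains the biology used for integration.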