Johann Gagnon-Bartsch
       

Johann Gagnon-Bartsch

Assistant Professor
Department of Statistics
University of Michigan


Contact Info

E-mail: johanngb@umich.edu

Office: 441 West Hall

Phone: (734) 763-1427

Mailing address:
Department of Statistics
323 West Hall
1085 South University
Ann Arbor, MI 48109-1107

 

I am an Assistant Professor in the Department of Statistics and am affiliated with the Michigan Institute for Data Science, the Child and Adolescent Data Lab, and the Michigan Center for Single-Cell Genomic Data Analytics. My research focuses on causal inference, machine learning, and nonparametric methods with applications in the biological and social sciences.

Quick Links

Working and Recent Papers

Working Papers

Recently Published / Available Online

Projects

Remove Unwanted Variation (RUV)

About

RUV is a set of methods originally developed to remove batch effects and other unwanted variation from gene expression data. More generally, RUV attempts to adjust high dimensional data for unobserved confounders, by making use of negative controls and replicates. A negative control is a variable that is known a priori to be (1) unaffected by the factor of interest, and (2) affected by the unobserved confounders. Negative controls and replicates can be used to help identify unwanted variation and separate it from variation of interest, even when the wanted and unwanted variation are correlated, and even when the factors causing the unwanted variation are unknown.

Resources


Balance Testing

About

In many studies in the social sciences and medicine the researcher does not control treatment assignment and instead may rely upon natural experiments or matching methods as a substitute to experimental randomization. In such cases it is helpful to check whether observed covariates are balanced across treatment conditions. The Classification Permutation Test (CPT) is a covariate balance test that first trains a classifier to distinguish treated units from control units, and then, using permutation inference, determines whether the classifier is able to do so better than would be expected by chance.

Resources


Estimating Cell Type Proportions

About

Biological tissues are typically composed of several distinct cell types. dtangle is a method to estimate the proportions of different cell types comprising a tissue sample from gene expression data. (This is sometimes referred to as "cell type deconvolution.") Similar to other deconvolution methods, dtangle requires reference expression profiles for each cell type, as well as a list of marker genes that are expressed primarily in one cell type. Where dtangle is unique is in its treatment of scale; gene expression values are considered on both linear and log scales, with the dual aims of a scientifically plausible mixing model, and statistical robustness of the fitting procedure.

Resources


Covariate Adjustment in Randomized Experiments

About

Two advantages of randomized trials are (1) potential confounding variables are largely balanced across treatment conditions, and (2) design-based inference may be used, in which statistical assumptions are largely justified by the physical act of randomization. Randomization does not balance potential confounders perfectly, however, and there are typically small observed imbalances in baseline covariates. Adjusting for these imbalances can improve the precision of treatment effect estimates, but methods that do so are not always design-based. We are working on developing new design-based estimators to fill this gap.

Resources


Stably Expressed Genes

About

Genes with relatively stable expression are of biological interest, and also useful for normalization (see RUV above). We are interested in discovering such genes, and especially interested in genes that are stable even at the level of single cells. At the single cell level, the notion of stability is somewhat ambiguous; for example, a gene could be stable in terms of the absolute quantity of transcripts, or in terms of concentration (proportional to cell size). Our goal is to identify different sets of genes that satisfy different notions of stability.

Resources


Social Media and Surveys

About

Traditional surveys, such as those used for political polling and economic research, are expensive and suffer increasingly from non-response. Social media has been suggested as an alternative data source to augment or even replace survey data. We are exploring the feasibility of using social media data in this manner.

Resources


Microenvironment Microarrays

About

The immediate physical and bio-chemical surroundings of a cell, the cellular microenvironment, is an important component of many fundamental cell and tissue level processes and is implicated in many diseases and dysfunctions. To study perturbations of cellular microenvironments a novel image-based cell-profiling technology called the microenvironment microarray (MEMA) has been recently employed. We are helping to develop an analysis pipeline for MEMA data, which will allow for the integration of image features, as well as adjustments for spatial and other technical artifacts in the data.

Resources