Yixin Wang works in the fields of Bayesian statistics, machine learning, and causal inference, with applications to recommender systems, text data, and genetics. She also works on algorithmic fairness and reinforcement learning, often via connections to causality. Her research centers around developing practical and trustworthy machine learning algorithms for large datasets that can enhance scientific understanding and inform daily decision-making. Her research interests lie at the intersection of theory and applications.

I am an applied statistician working on statistical machine learning methods for analyzing complex biomedical data sets. I develop multivariate statistical methods such as probabilistic graphical models, cluster analysis, discriminant analysis, and dimension reduction to uncover patterns in massive data sets. Recently, I have also worked on topics related to robust statistics, non-convex optimization, and data integration from multiple sources.

Yang Chen received her Ph.D. (2017) in Statistics from Harvard University and then joined the University of Michigan as an Assistant Professor of Statistics and Research Assistant Professor at the Michigan Institute for Data Science (MIDAS). She received her B.A. in Mathematics and Applied Mathematics from the University of Science and Technology of China. Her research interests include computational algorithms in statistical inference and applied statistics in the fields of biology and astronomy.

Yuekai Sun, PhD, is Assistant Professor in the department of Statistics at the University of Michigan, Ann Arbor.

My research is motivated by the challenges of analyzing massive data sets in data-driven science and engineering. I focus on statistical methodology for high-dimensional problems, i.e., problems where the number of unknown parameters is comparable to or exceeds the sample size. My recent work focuses on two problems that arise in learning from high-dimensional data, in contrast to black-box approaches that yield no insight into the underlying data-generating process:

1. model selection and post-selection inference: discover the latent low-dimensional structure in high-dimensional data and perform inference on the learned structure;

2. distributed statistical computing: design scalable estimators and algorithms that avoid communication and minimize “passes” over the data.
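One classical instance of the second idea, sketched here for illustration and not as a description of any specific method of mine, is divide-and-conquer estimation: each machine fits a model on its local shard of the data, and only the low-dimensional local estimates are communicated and averaged. This needs a single pass over the data and one round of communication. A minimal NumPy sketch with simulated least-squares data (all dimensions and settings are made up):

```python
import numpy as np

rng = np.random.default_rng(7)

# Data for a linear model, spread across m "machines"; each machine fits
# ordinary least squares locally, and only the d-dimensional estimates
# are communicated and averaged (one pass, one communication round).
d, m, n_per = 5, 10, 500
beta_true = np.arange(1.0, d + 1)

local_estimates = []
for _ in range(m):
    X = rng.normal(size=(n_per, d))              # this machine's local shard
    y = X @ beta_true + rng.normal(size=n_per)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)    # local fit only
    local_estimates.append(b)

# Averaging the local fits recovers the full-data estimate to first order,
# without ever pooling the raw observations.
beta_avg = np.mean(local_estimates, axis=0)
```

The communication cost is m vectors of length d, independent of the total sample size.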

A recurring theme in my work is exploiting the geometry of latent low-dimensional structure for statistical and computational gains. More broadly, I am interested in the geometric aspects of high-dimensional data analysis.

Keshav Pokhrel, PhD, is Assistant Professor of Statistics at the University of Michigan, Dearborn.

Prof. Pokhrel’s research interests include the epidemiology of cancer, time series forecasting, quantile regression, and functional data analysis. Skewed and non-normal data are more common than ever, and observations in the extreme tails are important in their own right; hence the importance of quantile regression, which models conditional quantiles rather than the conditional mean. Increasingly, the available information is also functional in nature. My current work centers on functional data analysis techniques such as principal differential analysis, which estimates a system of differential equations to reveal the dynamics of real data.
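To illustrate why conditional quantiles matter for skewed data, here is a small, self-contained sketch (not Prof. Pokhrel's code; the function name and settings are illustrative choices) that fits linear quantile regression by subgradient descent on the pinball loss, using simulated data whose upper tail spreads out with x:

```python
import numpy as np

def quantile_regression(X, y, tau, lr=0.05, n_iter=5000):
    """Fit a linear conditional tau-quantile by subgradient descent
    on the pinball (check) loss."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta = np.zeros(X1.shape[1])
    for t in range(n_iter):
        r = y - X1 @ beta
        # Subgradient of the mean pinball loss with respect to beta
        g = -X1.T @ np.where(r > 0, tau, tau - 1.0) / len(y)
        beta -= (lr / np.sqrt(t + 1)) * g        # diminishing step size
    return beta

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 2000)
# Skewed, heteroscedastic noise: the upper tail spreads out as x grows
y = 2.0 * x + rng.exponential(scale=1.0 + 0.5 * x)

b_med = quantile_regression(x[:, None], y, tau=0.5)   # conditional median
b_hi = quantile_regression(x[:, None], y, tau=0.9)    # upper tail
```

The fitted 0.9-quantile line is steeper than the median line, a feature of the extreme end of the data that a single mean-based fit would not reveal.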

My primary project, election forensics, concerns using statistical analysis to determine whether election results are accurate. Election forensics methods use data about voters and votes that are as highly disaggregated as possible: typically polling station (precinct) data, sometimes ballot box data. The data can comprise hundreds of thousands or millions of observations, and geographic information is used because geographic structure is relevant. Estimation involves complex statistical models. Frontiers include: distinguishing fraud from the effects of strategic behavior; estimating fraud probabilities for individual observations (e.g., polling stations); and incorporating non-vote data, such as reports from in-person election observations.

Elizaveta (Liza) Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. Our current focus is on developing rigorous and computationally efficient statistical inference for realistic network models. Current directions include community detection in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks, with applications to brain imaging, especially fMRI data from studies of mental health.
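As a toy illustration of the community detection problem (a minimal sketch under simple assumptions, not the group's methodology), spectral clustering can recover the planted partition of a two-block stochastic block model from the sign pattern of the adjacency matrix's second eigenvector:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_sbm(n_per_block, p_in, p_out):
    """Sample a symmetric adjacency matrix from a two-block stochastic
    block model with within- and between-block edge probabilities."""
    labels = np.repeat([0, 1], n_per_block)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    U = rng.random(P.shape)
    upper = np.triu((U < P).astype(float), 1)   # upper triangle, no self-loops
    return upper + upper.T, labels

def spectral_two_communities(A):
    """Assign nodes to two communities by the sign of the eigenvector
    associated with the second-largest adjacency eigenvalue."""
    _, vecs = np.linalg.eigh(A)                 # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)

A, truth = sample_sbm(100, p_in=0.30, p_out=0.05)
est = spectral_two_communities(A)
# Community labels are only identified up to a swap of the two names
accuracy = max(np.mean(est == truth), np.mean(est != truth))
```

With this much separation between within- and between-block edge probabilities, the sign split recovers the planted communities almost perfectly; the interesting statistical questions arise as the gap shrinks or the model becomes more realistic.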

Johann Gagnon-Bartsch, PhD, is Assistant Professor of Statistics in the College of Literature, Science, and the Arts at the University of Michigan, Ann Arbor.

Prof. Gagnon-Bartsch’s research currently focuses on the analysis of high-throughput biological data as well as other types of high-dimensional data. More specifically, he and his collaborators are developing methods that can be used when the data are corrupted by systematic measurement errors of unknown origin, or when the data suffer from the effects of unobserved confounders. For example, gene expression data suffer from both systematic measurement errors of unknown origin (due to uncontrolled variations in laboratory conditions) and the effects of unobserved confounders (such as whether a patient had just eaten before a tissue sample was taken). They are developing methodology that corrects for these systematic errors using “negative controls”: variables that (1) are known to have no true association with the biological signal of interest, and (2) are corrupted by the systematic errors, just like the variables of interest. The negative controls allow them to learn about the structure of the errors, so that the errors can then be removed from the other variables.
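The negative-control idea can be sketched in a few lines (an illustrative simulation in the spirit of RUV-type methods, not the exact methodology; all names and dimensions below are made up): estimate the unwanted factors by a singular value decomposition of the negative-control columns, then regress those factors out of the full data matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated samples-by-genes matrix with one unwanted factor (e.g., a
# batch effect that perturbs every gene's measurement).
n, p, n_ctl = 50, 200, 40
W = rng.normal(size=(n, 1))                   # unobserved unwanted factor
alpha = rng.normal(size=(1, p))               # its effect on every gene
X = rng.binomial(1, 0.5, size=(n, 1)).astype(float)  # biological signal
beta = np.zeros((1, p))
beta[0, n_ctl:] = 1.0   # first n_ctl genes are negative controls: no true signal
Y = X @ beta + W @ alpha + 0.1 * rng.normal(size=(n, p))

def negative_control_clean(Y, ctl_idx, k=1):
    """Estimate k unwanted factors from the negative-control columns by
    SVD, then regress them out of the full matrix."""
    U, s, _ = np.linalg.svd(Y[:, ctl_idx], full_matrices=False)
    W_hat = U[:, :k] * s[:k]                  # estimated unwanted factors
    B, *_ = np.linalg.lstsq(W_hat, Y, rcond=None)
    return Y - W_hat @ B

Y_clean = negative_control_clean(Y, np.arange(n_ctl), k=1)
```

Because the controls carry only the unwanted variation, their SVD recovers the hidden factor, and regressing it out strips the batch effect from the other genes while leaving the biological signal largely intact.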