CfS Annual Conference 2020

The Centre for Statistics Annual Conference brings together researchers working with data from across the University of Edinburgh and Associated Institutions.

The conference will feature talks from invited speakers on December 7th and will culminate in a public lecture on December 8th.

Running Order (all times GMT):

December 7th

12.55-13.00: Housekeeping and Introductions

13.00-14.20: Session 1

14.20-14.35: Coffee Break

14.35-15.35: Session 2

15.35-16.00 onwards: GatherTown Social

December 8th

15.00-16.00: Public Lecture - Register here

The online events will be hosted on Zoom. Ticket holders will receive an email on the day of each event, with instructions on how to join.

December 7th

13.00 - Dr. Belen Martin-Barragan, University of Edinburgh Business School

On Explainable Classification and Regression methods for Functional Data Analysis

When classification or regression is applied to high-dimensional data, selecting a subset of the predictors may lead to an improvement in the predictive ability of the estimated model, in addition to reducing the model complexity. In Functional Data Analysis (FDA), the data provide information about a curve or a function. Explainable models can be obtained by selecting instants or short intervals that capture the information needed to predict the response variable. In this talk we will present several methods that address this problem by taking advantage of the functional nature of the data. A novel continuous optimization algorithm will be used to fit the SVM or SVR parameters and select intervals and instants.

13.20 - Dr. Gail Robertson, Statistical Consultancy Unit, University of Edinburgh

Bayesian networks and chain event graphs as decision making tools in forensic science

Bayes’ theorem and likelihood ratios are used in forensic statistics to compare evidence supporting different propositions put forward during court proceedings. There is widespread interest among forensic scientists in using Bayesian network models to evaluate the extent to which scientific evidence supports hypotheses proposed by the prosecution and defence. Bayesian networks are frequently used to compare support for source-level propositions, e.g. propositions concerned with determining the source of samples found at crime scenes such as hair, fibres, and DNA. While comparing source-level propositions is useful, propositions which refer to criminal activities (i.e. those concerned with understanding how a sample came to be at the crime scene) are of increasing interest to courts. In this study, we used Bayesian networks and chain event graphs to combine different types of evidence supporting activity-level propositions from a real-world drug trafficking case. We compared the use of Bayesian networks and chain event graphs in evaluating evidence to support activity-level propositions associated with the case, and demonstrated how graphical methods can be used to evaluate the extent to which evidence from a drug trafficking case supports prosecution or defence propositions.
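As a rough illustration of the likelihood-ratio reasoning mentioned in this abstract, the sketch below applies the odds form of Bayes' theorem: posterior odds equal prior odds multiplied by the likelihood ratio. The function names and every number are hypothetical and are not taken from the case study; this is not the speakers' Bayesian network or chain event graph model.

```python
def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """Ratio of the probability of the evidence under the prosecution
    proposition (Hp) to its probability under the defence proposition (Hd)."""
    if p_evidence_given_hd <= 0:
        raise ValueError("P(E | Hd) must be positive")
    return p_evidence_given_hp / p_evidence_given_hd


def posterior_odds(prior_odds, lr):
    """Odds form of Bayes' theorem: posterior odds = prior odds * LR."""
    return prior_odds * lr


def odds_to_probability(odds):
    """Convert odds in favour of Hp into a probability for Hp."""
    return odds / (1 + odds)


# Hypothetical values chosen only for illustration.
lr = likelihood_ratio(0.5, 0.125)       # evidence is more probable under Hp
odds = posterior_odds(0.25, lr)         # prior odds of 1 to 4 in favour of Hp
print(lr, odds, odds_to_probability(odds))
```

A likelihood ratio above 1 shifts the odds towards the prosecution proposition and a ratio below 1 shifts them towards the defence proposition; the prior odds themselves are a matter for the court rather than the forensic scientist.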

13.40 - Dr. Marcelo Pereyra, School of Mathematical and Computer Sciences, Heriot-Watt University

Bayesian inference with data-driven image priors encoded by neural networks

This talk presents a mathematical and computational methodology for performing Bayesian inference in problems where prior knowledge is available in the form of a training dataset or set of training examples. This prior information is encoded into the model by using a deep neural network, which is combined with an explicit likelihood function by using Bayes' theorem to derive the posterior distribution for the quantities of interest given the available data. Bayesian computation is then performed by using appropriate Markov chain Monte Carlo stochastic algorithms. We study the properties of the proposed models and computation algorithms and illustrate performance on a range of imaging inverse problems involving point estimation, uncertainty quantification, hypothesis testing, and model misspecification diagnosis. Joint work with Matthew Holden and Kostas Zygalakis.

14.00 - Dr. Roxanne Connelly, School of Social and Political Science, University of Edinburgh

Reflections on a Transparent and Reproducible Sociological Study of General Cognitive Ability and Social Class Inequalities

Despite an increasing desire and requirement to make research more transparent, and to actively render it reproducible, sociology lags behind other disciplines in transparency and reproducibility. Our research made use of Jupyter notebooks, an internationally recognized open source research platform, which allows third parties to fully reproduce the complete workflow behind the production of our article, and to duplicate the empirical results. In addition to increasing transparency, this approach enables other researchers to extend the work, for example with different measures, additional data or alternative techniques. In developing an open and published workflow, we have drawn upon ideas advanced in computer science, especially the concept of ‘literate computing’.

The presentation will reflect on the experience of undertaking an empirical study in a transparent and reproducible framework. We will argue that improving research practices in sociology requires new skills and new perspectives in research methods training and practice.

14.20-14.35: Coffee Break

14.35 - Dr. Javier Palarea-Albaladejo, BioSS

Recent advances in compositional data analysis and its scientific applications

Multivariate observations referring to parts of a whole, so-called compositions or compositional data in statistics, are common across varied scientific fields. This is the case, for example, when dealing with chemical concentrations, food nutritional contents, election vote shares, time-use and behavioural patterns, or relative abundances. Their distinctive feature is that there is an inherent interplay between the parts of the composition and, hence, the information they convey is fundamentally relative to each other, rather than absolute as ordinary statistical methods assume. This can in the first instance cause some technical difficulties with ordinary data analysis and modelling, such as perfect multicollinearity and spurious correlations; but beyond that, it affects the way we think about a scientific problem, the statistical analysis to be conducted, and the interpretation of its results. In this talk I will discuss some recent developments and applications across different domains of the natural and health sciences.
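To make the idea of relative information concrete, here is a minimal sketch of the centred log-ratio (clr) transform, a standard tool in compositional data analysis. The abstract does not say which methods the talk covers, so the choice of clr, the function names, and the sample values are all assumptions made for illustration.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio transform: log of each part minus the mean log.

    Requires strictly positive parts (zeros need separate treatment).
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical raw amounts of three parts, and the same sample in other units.
sample = [2.0, 6.0, 12.0]
rescaled = [10.0 * p for p in sample]

print(closure(sample))   # proportions that sum to 1
print(clr(sample))       # clr coordinates, which sum to 0
print(clr(rescaled))     # same coordinates up to rounding: only ratios matter
```

Because the clr coordinates depend only on ratios between parts, rescaling the whole sample leaves them unchanged, which reflects the relative nature of compositional information described above.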

14.55 - Dr. Catalina Vallejos Meneses, Institute of Genetics and Molecular Medicine, University of Edinburgh

scMET: Bayesian modelling of DNA methylation heterogeneity at single-cell resolution

High-throughput measurements of molecular phenotypes (such as gene expression or DNA methylation) at single-cell resolution are a promising resource to quantify cell-to-cell heterogeneity and its role in a variety of complex biological processes. However, technological limitations result in sparse and noisy datasets, which makes it challenging to robustly quantify genuine cellular heterogeneity. In this talk, I will focus on the statistical challenges associated with the analysis of single-cell DNA methylation data. I will introduce scMET, a hierarchical Bayesian model which overcomes data sparsity by sharing information across cells and genomic features, resulting in a robust and biologically interpretable quantification of variability. scMET can be used both to identify highly variable features that drive epigenetic heterogeneity and to perform differential methylation and differential variability analysis between pre-specified groups of cells. We demonstrate scMET’s effectiveness on some recent large-scale single-cell methylation datasets, showing that the scMET feature selection approach facilitates the characterisation of epigenetically distinct cell populations. Moreover, we illustrate how scMET variability estimates enable the formulation of novel biological hypotheses on the epigenetic regulation of gene expression in early development.

An R package implementation of scMET is publicly available at https://github.com/andreaskapou/scMET.

15.15 - Prof. Gabi Hegerl, School of GeoSciences, University of Edinburgh

What are the implications of past extreme weather and climate events for the future?

Much work is being done on analysing the contribution of greenhouse gases and other factors to changing extreme events. This talk gives a short introduction to the topic, and then discusses some extreme events in the more distant past, showing both statistical methods used in the climate research community and lessons we can learn from past rare and exceptional events. As examples, I will discuss the ‘year without a summer’ of 1816 and the Dust Bowl heatwaves and drought of the 1930s. Record-breaking summer heatwaves were experienced across the contiguous United States during the decade-long ‘Dust Bowl’ drought. Climate model simulations suggest that the heatwaves were influenced by anomalous sea surface temperatures, particularly in the Atlantic, favouring dry springs, which amplify summer heatwaves in the region. Atmospheric model simulations show that human-induced deterioration of vegetation can further sharply amplify heatwaves. Similar heatwaves would have been slightly less warm without greenhouse gas forcing at the time, and the return period of a rare heatwave summer (as observed in 1936) would be much reduced at the present time due to greenhouse warming. The drought and heat of the 1930s Dust Bowl, as well as their socioeconomic consequences, highlight the danger that naturally occurring compound events, amplified by vegetation feedbacks, can create very extreme events and possibly tipping elements in a warmer world.

15.35-16.00 onwards: GatherTown Social

December 8th

15.00-16.00 - Public Lecture, Genevera Allen, Rice University

Can we Trust Data-Driven Scientific Discoveries? Machine Learning & Scientific Reproducibility

As more and more scientific domains are collecting vast troves of data, we rely on machine learning techniques to analyze the data and help make data-driven scientific discoveries. In this talk, I will discuss how machine learning has been used to advance science. But we pause to ask: are these data-driven discoveries reproducible? And how can we use machine learning to draw reliable scientific conclusions? I will discuss these questions by giving examples from my own research, including an extended example on clustering. Additionally, I will both outline new research directions and offer practical advice for improving the reliability and reproducibility of data-driven discoveries.


Dec 07 2020 13.00 - 16.00