Jack Baker

I'm currently a third-year PhD student at the STOR-i Doctoral Training Centre at Lancaster University. My PhD is in collaboration with the University of Washington. I work on scalable Markov chain Monte Carlo methods under the supervision of Professor Paul Fearnhead and Dr Christopher Nemeth at Lancaster University, and Professor Emily Fox at the University of Washington.

Research

My research interests are in scalable Markov chain Monte Carlo (MCMC) algorithms. A common problem when fitting complex models is overfitting, and Bayesian methods provide a natural solution to this. However, MCMC, one of the most popular methods for fitting Bayesian models, does not scale well with dataset size. As datasets have grown, the need to improve the scalability of MCMC has become clear.

Publications

  • J. Baker, P. Fearnhead, E. B. Fox and C. Nemeth (2018), Large-Scale Stochastic Sampling from the Probability Simplex. Submitted. ArXiv. Code
  • J. Baker, P. Fearnhead, E. B. Fox and C. Nemeth (2017), sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo. Journal of Statistical Software. Accepted. ArXiv. Code
  • J. Baker, P. Fearnhead, E. B. Fox and C. Nemeth (2017), Control Variates for Stochastic Gradient MCMC. Statistics and Computing. Accepted. ArXiv. Code

Software

  • sgmcmc: R package for stochastic gradient MCMC. Website. Github

Unpublished Technical Reports

  • Comparison of Markov Chain Monte Carlo Algorithms for Large Datasets Article.

Talks, Tutorials and Awards


Talks
Warwick University (2018). Invited Speaker. Algorithms and Computation Group.
Afternoon on Bayesian Computation 2018: Control Variates for Stochastic Gradient MCMC
University of Edinburgh (2017). Invited Speaker.
RSC 2016: A Comparison of MCMC for Big Data
Afternoon on Bayesian Computation 2016: A Comparison of MCMC for Big Data

Grants
STOR-i Research Fund, research visit to the University of Washington, Summer 2017
Travel and accommodation grant, data science consulting week, Alan Turing Institute, Spring 2017
Young researchers travel grant, MCMSki, Spring 2016

Posters
ISBA 2018: Control Variates for Stochastic Gradient MCMC
BayesComp 2018: Control Variates for Stochastic Gradient MCMC
i-like Workshop 2016: A Comparison of MCMC for Big Data
MCMSki 2016: A Comparison of MCMC for Big Data

Tutorials
Lancaster University Mathematics Summer School
School Impact Sessions: Lancaster Royal Grammar School
STORC: Linux Terminal Tools for the STOR-i Computing Group.
STORC: An Overview of Coding Tools for the STOR-i Computing Group.
CSML: A Tutorial on STAN for Lancaster University CSML Group.
STORC: A Tutorial on using Git for Version Control for the STOR-i Computing Group.
STORC: An Overview of Useful Linux Command Line Tools for the STOR-i Computing Group.

Masters Research Projects

Lancaster University: Computational statistics for big data

This report formed my master's dissertation and PhD proposal for the STOR-i MRes at Lancaster University. Companies are under increased pressure to make the most of their customer data; those that do not are at a massive competitive disadvantage. As a result, the amount of data stored by organisations is increasing at an astonishing rate.

As statistical models grow in complexity and size, traditional machine learning algorithms are struggling to scale well to the large datasets required for model fitting. Markov chain Monte Carlo (MCMC) is a particularly important algorithm that has been left behind by the big data rush. This report discusses, implements and compares a number of possible solutions that enable MCMC to scale more effectively to large datasets.


Lancaster University: Exact approximate MCMC

This report and presentation formed one of two research topics completed as part of the STOR-i MRes at Lancaster University. Many applications involve a process of interest that can only be observed indirectly. Examples include the analysis of biological sequences, speech recognition and time series analysis. Statistical analysis of problems with missing data or unobservable processes can be difficult when the unobserved process cannot be marginalised out. This report and presentation introduce a new breed of MCMC algorithms, which allow us to simulate from distributions that cannot be evaluated explicitly, but for which unbiased estimators can be produced.

It is recommended that you open the presentation in Adobe PDF reader, as some of the animations won't work otherwise. The code is part of an example included in the report.


Lancaster University: Artificial neural networks and deep learning

This report and poster formed one of two research topics completed as part of the STOR-i MRes at Lancaster University. Artificial neural networks are currently a very active area of research in statistical learning and data science. Since 2009, algorithms based on artificial neural networks have won many pattern recognition contests, which has made them a very hot topic. They were also the first statistical learning method to reach 'superhuman performance' in such a contest.

Artificial neural networks were originally inspired by the study of biological neural networks, such as those found in the brains and nervous systems of animals. This family of algorithms has considerable, flexible use in pattern recognition and machine learning. The report and poster aim to give a brief review of the area.

About Me

Presentation I gave to other Mathematics students about my summer internship at the Met Office.

Academic & Technical Background

I'm a second-year PhD student at the STOR-i CDT at Lancaster University. I completed my master's degree with distinction in statistics and OR as part of the CDT. During it I did a number of projects on Markov chain Monte Carlo, and that's where my interest in these methods really began. During the summer of the first year of my PhD I took a data science internship at Improve Digital in Amsterdam.

I completed my undergraduate degree at the University of Edinburgh, where I was awarded a First Class Honours BSc in Mathematics and Statistics. During my time at Edinburgh, I had a couple of summer internships. One was at the Met Office, where I modelled the performance and usage of the MASS storage system, which stores the Met Office's climate data. The other was at Motorola Solutions, where I developed an alternative compression algorithm in C for the data logs on Motorola's wireless products.

Email: j.baker1 [at] lancaster.ac.uk

STOR-i

About STOR-i

I am currently enrolled in STOR-i, a Doctoral Training Centre based at Lancaster University. It offers a four year PhD programme that focuses on research at the interface between Statistics, Operational Research and industry.

The programme is developed and delivered with industrial partners, and offers the chance to perform research with considerable collaboration with industry. I joined STOR-i in the first year of intake since it was awarded a new round of funding by EPSRC. The funding will allow 60 new students to join STOR-i over the course of the next 5 years.