Centre for Statistics

Buzzwords unwrapped 2: what do people mean by "model"?

Prof. Chris Dent continues our series in which he "unwraps" current prominent "buzzwords".

Prof. Chris Dent, Co-Director at the CfS

Different communities’ differing understandings of the word “model” have caused confusion within a number of projects on which I have worked. This note clarifies the various elements that make up a computer model. It does not attempt a definitive definition of the single word “model”: given the range of meanings the word takes on, that would probably be futile. As with my Centre for Statistics blog on the meanings of the terms “Digital Twin” and “AI”, my intention is to explain the relevant concepts, to help people follow others’ arguments and ask the right questions.

I should stress that I am not claiming that the term “model” is a buzzword: it has been in general use in this context for many years (the Oxford English Dictionary’s first relevant example is from the Journal of the Royal Statistical Society in 1901). Nevertheless, it fits well with the theme of this series, as a term whose meaning causes confusion in communication, particularly between different research and professional disciplines.

A computer model typically consists of:

  • An underlying mathematical model structure, which would typically be a set of equations which one writes out on paper, and which is independent of the computer implementation. In a physical system such as an electricity network, this underlying model structure would often be referred to as the governing equations.
  • Data.
  • Possibly a numerical scheme which is used to evaluate model outputs. Again, this is something which one writes down on paper, and is thus not the same thing as the computer implementation.
  • The computer implementation of the model structure and numerical scheme, and the combination of these with the data.
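
To make these four elements concrete, here is a minimal Python sketch. The example (exponential decay, dx/dt = -kx, solved by forward Euler) and all names in it, such as decay_rhs, DecayData and simulate, are hypothetical and chosen only for illustration; it is not drawn from any of the projects mentioned above.

```python
from dataclasses import dataclass


# 1. Model structure: the governing equation dx/dt = f(x; k), here f(x; k) = -k * x.
#    Hypothetical example (exponential decay), chosen only for illustration.
def decay_rhs(x: float, k: float) -> float:
    return -k * x


# 2. Data: parameter values and initial condition, held separately from the equations.
@dataclass
class DecayData:
    k: float      # decay rate
    x0: float     # initial state at t = 0
    t_end: float  # time horizon


# 3. Numerical scheme: forward Euler, x_{n+1} = x_n + h * f(x_n).
#    Like the equations, this could be written down on paper.
def forward_euler(f, x0: float, h: float, n_steps: int, **params) -> float:
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x, **params)
    return x


# 4. Computer implementation (the "simulator"): brings structure, scheme and data together.
def simulate(data: DecayData, n_steps: int = 1000) -> float:
    h = data.t_end / n_steps
    return forward_euler(decay_rhs, data.x0, h, n_steps, k=data.k)
```

For example, `simulate(DecayData(k=0.5, x0=1.0, t_end=2.0))` returns a value close to the exact solution exp(-1) ≈ 0.368. Supplying a different dataset changes neither the equations nor the numerical scheme, only what the implementation is fed.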

A particular computer implementation of the underlying mathematical model, i.e. the last of the above elements, is often referred to in the statistical uncertainty quantification community as a “simulator”. Thus in many circumstances “simulator” and “computer model” may be regarded as synonymous, but one has to be careful to avoid ambiguity.

It is common in the mathematical sciences community to use the term “model” to refer only to the first of the above elements. This is linked to the general good practice, in computer implementations, of separating the model structure from the data so that each can be managed independently. A major drawback of working in Excel, for instance, is that where worksheet formulae are used there is no such separation: changing the size or structure of a dataset inevitably also means copying formulae. More generally, spreadsheets contain multiple copies of the same formula within the model’s structure. These issues reduce transparency and create opportunities for hard-to-spot bugs.
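
As a minimal sketch of that separation, assuming a hypothetical dataset of power plants (the plant names, capacities and capacity factors below are invented), each formula is defined once in code and applied to however many rows the data happen to contain:

```python
# Model structure: each formula is written once, independently of how many rows
# the dataset has. (Hypothetical example: annual energy output of power plants.)
HOURS_PER_YEAR = 8760  # non-leap year


def annual_energy_mwh(capacity_mw: float, capacity_factor: float) -> float:
    """Energy produced over a year by a plant of the given capacity and capacity factor."""
    return capacity_mw * capacity_factor * HOURS_PER_YEAR


def evaluate(plants: list[dict]) -> dict[str, float]:
    """Apply the single model formula to every row of whatever dataset is supplied."""
    return {
        p["name"]: annual_energy_mwh(p["capacity_mw"], p["capacity_factor"])
        for p in plants
    }


# Data: can be extended, shrunk or replaced (e.g. read from a file) without
# touching, or copying, any of the formulae above.
plants = [
    {"name": "A", "capacity_mw": 100.0, "capacity_factor": 0.3},
    {"name": "B", "capacity_mw": 50.0, "capacity_factor": 0.9},
]
results = evaluate(plants)
```

In a spreadsheet the equivalent would typically be a formula copied down one row per plant, so adding a third plant means copying it again; here a third plant is just one more entry in `plants`.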

The energy modelling framework TIMES[1], which allows users to input data describing a system of their choice and then combines these data with a common underlying model structure, provides an instructive example. When people in the energy application community refer to building a TIMES model (such as Scottish TIMES or UK TIMES), in the above classification this means developing a dataset for a particular region or country, rather than specifying a new underlying model structure. The TIMES framework then combines the underlying mathematical model specification with the data to form an optimization model for the particular question posed, and passes it to an efficient general-purpose optimization code for solution.
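
The toy Python sketch below mimics that division of labour. It is emphatically not how TIMES itself is implemented: the one-constraint least-cost supply problem, the RegionData and LinearProgram types, the build_lp function and all figures are hypothetical. The point is only that a single shared function encodes the model structure, and “building a model” for a new region amounts to supplying a new dataset.

```python
from dataclasses import dataclass


@dataclass
class RegionData:
    """Hypothetical regional dataset: technologies, costs, capacities and demand."""
    name: str
    techs: list[str]
    cost_per_mwh: list[float]
    capacity_mwh: list[float]
    demand_mwh: float


@dataclass
class LinearProgram:
    """Generic LP: minimise c . x subject to A_eq x = b_eq and lower <= x_i <= upper."""
    c: list[float]
    A_eq: list[list[float]]
    b_eq: list[float]
    bounds: list[tuple[float, float]]


def build_lp(data: RegionData) -> LinearProgram:
    """Shared model structure: meet demand at least cost within capacity limits.

    The same equations are used for every region; only the data differ.
    """
    n = len(data.techs)
    return LinearProgram(
        c=list(data.cost_per_mwh),                        # objective: total supply cost
        A_eq=[[1.0] * n],                                  # balance: total supply = demand
        b_eq=[data.demand_mwh],
        bounds=[(0.0, cap) for cap in data.capacity_mwh],  # 0 <= x_i <= capacity_i
    )


# "Building a model" for a new region = supplying a new dataset.
region_1 = RegionData("Region 1", ["wind", "gas"], [5.0, 60.0], [400.0, 800.0], 1000.0)
region_2 = RegionData("Region 2", ["hydro", "wind", "gas"], [10.0, 5.0, 70.0],
                      [200.0, 300.0, 900.0], 1100.0)
lps = [build_lp(r) for r in (region_1, region_2)]
```

Each LinearProgram holds c, A_eq, b_eq and bounds in the layout accepted by general-purpose LP solvers; for instance, `scipy.optimize.linprog(lp.c, A_eq=lp.A_eq, b_eq=lp.b_eq, bounds=lp.bounds)` could stand in for the external optimization code.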

Chris Dent is Professor of Industrial Mathematics; Director of the Statistical Consulting Unit at the University of Edinburgh; and a Turing Fellow at the Alan Turing Institute.