How we build simulations

Scientific model implementation done right

At a talk at CSER Cambridge, Doyne Farmer, the Director of the Complexity Economics Programme at the University of Oxford, summarized the impact of computation on science as follows:

In my lifetime computation has revolutionized science, it’s been the biggest driver of progress in science, it’s been the new technology that’s made the difference...

From competition and innovation policy to EU monetary policy, from pandemic interventions to economic policies that prevent climate change, computation now helps policymakers tackle the world’s most pressing problems. However, despite the impact of computation on science, the idea of writing sustainable, composable, and reliable implementations is still in its infancy in scientific research.

A 2016 Nature survey showed that 90% of scientists believe there is a reproducibility crisis in science (while only 3% believe there isn’t). One of the top reasons given for the lack of reproducibility is “Methods, code unavailable”, and around half of the top 10 reasons boil down to matters of good software engineering practice in computational disciplines.

Richard McElreath, anthropology professor and author of one of the best-selling Bayesian statistics textbooks, argues that many of the reproducibility problems academia currently faces have already been solved in the software industry. This aligns with our observation: while the software industry is often criticized for being short-sighted and preferring technological quick fixes over sustainable development, industry-developed code is generally still significantly more sustainable than research code and decades ahead of it in terms of best practices.

Our goal is to bridge this well-known gap by implementing our Baseline model in the way scientific models should be implemented.

To achieve the best development productivity, our model is written in Clojure, a productivity-focused1, high-level, functional language that leverages the Java Virtual Machine (JVM) for performance, cross-platform deployment, and reliability. Alternatively, the model can be run directly in the browser, providing an interactive, visually attractive tool for researchers to explore scenarios that simulate competition for the development of transformative AI. Because the model is implemented in a Lisp-family language, it can in the future serve as a basis for research on program-induction-based approaches to human-level AI (e.g. Bayesian program learning) as well as recursively self-improving AI (e.g. Gödel machines2). This is because Clojure’s meta-circular evaluator makes it possible to write algorithms that can rewrite the code of the model as well as their own code.
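To make the last point concrete, here is a minimal Clojure sketch of what code-as-data buys here. The rule and the `:capability` key are invented for illustration and are not taken from the Baseline model; the point is only that a quoted form can be rewritten as ordinary data and evaluated back into a running function:

```clojure
(require '[clojure.walk :as walk])

;; A model rule represented as plain data (a quoted form).
(def rule '(fn [agent] (update agent :capability + 1)))

;; Rewriting the rule is ordinary data manipulation:
;; here we replace the literal 1 with 2 everywhere in the form.
(def improved-rule (walk/postwalk-replace {1 2} rule))

;; eval turns the rewritten data back into a running function.
(def step (eval improved-rule))

(step {:capability 0}) ;; => {:capability 2}
```

The same mechanism applies to any part of a program, which is what makes Lisp-family languages a natural substrate for programs that inspect and modify their own code.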

Most importantly, we identified and addressed three foundational problems with scientific model implementations: reliability, composability, and sustainability.


Reliability

Software issues are an abundant reliability problem for scientific papers: from three Science papers being retracted3 at the same time, to one in five genetics papers containing errors because of Excel, to hundreds of studies being affected by a bug in a Python script, a vast number of problems are caused by unreliable code. These problems have real-world consequences. For example, a paper whose conclusion was due to a trivial coding error influenced the eurozone countries’ response to the Great Recession and was waved around as an argument for austerity on the floor of the US Congress. Furthermore, even when results aren’t necessarily wrong, a study’s unreliable code can be used to discredit its conclusions.

Some steps have been taken, such as those elaborated in Nature: “Worried about a rising tide of results that fail to measure up, journals are starting to take action. In the latest such move, Nature Biotechnology announced on 7 April a plan to prevent such embarrassing episodes in its pages (Nature Biotechnol. 33, 319; 2015). Its peer reviewers will now be asked to assess the availability of documentation and algorithms used in computational analyses, not just the description of the work. The journal is also exploring whether peer reviewers can test complex code using services such as Docker, a piece of software that allows study authors to create a shareable representation of their computing environment. Researchers say that such measures are badly needed.”

To prevent reliability issues, the Baseline model implementation includes over 300 assertions that are checked as part of the tests, and it can autogenerate a nearly endless number of checks verifying that important invariants are maintained throughout model runs. Additionally, we have manually reproduced certain model runs in spreadsheets and verified that the results are equal to those produced by our implementation.
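As an illustration of how such assertions and autogenerated invariant checks can work in Clojure (the invariant, keys, and function below are hypothetical, not the Baseline model's actual specs), clojure.spec lets an invariant be stated once, asserted at runtime, and used to generate test cases:

```clojure
(require '[clojure.spec.alpha :as s])

;; Illustrative invariant: an agent's capability is a
;; non-negative number.
(s/def ::capability (s/and number? (complement neg?)))
(s/def ::agent (s/keys :req-un [::capability]))

;; A model step with a post-condition assertion that the
;; invariant still holds after the step runs.
(defn step [agent]
  {:post [(s/valid? ::agent %)]}
  (update agent :capability + 1))

(step {:capability 0}) ;; => {:capability 1}

;; With org.clojure/test.check on the classpath, spec can then
;; autogenerate checks from a function spec, e.g.:
;; (s/fdef step :args (s/cat :agent ::agent) :ret ::agent)
;; (clojure.spec.test.alpha/check `step)
```

The generative `check` call runs the function against many randomly generated conforming inputs, which is one way a few invariant declarations can turn into a nearly endless supply of concrete checks.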

The code for the Baseline model is thoroughly documented, from variables and functions to architectural decisions, and the reasoning behind the evolution of the code is described in detail in commit messages as well as architecture decision records. Furthermore, we have written extensive instructions to make sure that anyone is able to run our code.


Composability

In the paper The Future of Agent-Based Modeling, Richiardi arrives at the following conclusion: “I envisage the foundation of a Modular Macroeconomic Science, where new models with heterogeneous interacting agents, endowed with partial information and limited computational ability, can be created by recombining and extending existing models in a unified computational framework.” Furthermore, he elaborates ‘...modularity has not been exploited so far in [agent-based] modeling because of the high fixed cost involved in developing a flexible simulator, where modules can be easily combined, replaced, or extended. The relative novelty of the methodology, and the incentives for immediate returns in terms of publications and funding, has resulted in highly specific computational architecture and “disposable” models.’

Richiardi’s conclusions are very similar to our own realizations when we started writing the agent-based model code. While implementing the Baseline model, we used our software engineering expertise to simultaneously lay the foundations for the kind of flexible simulator Richiardi describes. The Baseline model is implemented as a composition of individual model features, and its components can be recombined and extended at will to compose new models. It is possible to build upon our work by adding or removing any desired components without adjusting a single line of code in the original implementation.
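To illustrate the composition idea in miniature (the feature names and state keys below are invented for illustration and are not the Baseline model's actual components), a model can be represented as a map from feature names to state-transforming functions; composing a new model is then just adding or removing map entries:

```clojure
;; A model is a map from feature names to functions that
;; transform the simulation state. Names are illustrative only.
(def baseline-features
  {:research  (fn [state] (update state :capability + (:funding state)))
   :spillover (fn [state] (update state :capability inc))})

;; One model step applies every feature to the state in turn.
(defn run-step [features state]
  (reduce (fn [s f] (f s)) state (vals features)))

;; A new model is composed by adding (or dissoc-ing) features,
;; without touching a line of the original definitions.
(def extended-model
  (assoc baseline-features
         :taxation (fn [state] (update state :funding - 1))))

(run-step extended-model {:capability 0, :funding 10})
;; => {:capability 11, :funding 9}
```

Because features are plain data and pure functions, removing a feature is a `dissoc`, swapping one is an `assoc`, and the original model definition never has to change.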


Sustainability

The scientific community wouldn’t accept a journal that loses a significant part of its past publications each year (and especially without any backup copies being available), yet we are losing the ability to run an ever-increasing amount of the code that produced our scientific conclusions. As elaborated in Nature: “Although computation plays a key and ever-larger part in science, scientific articles rarely include their underlying code, Rougier says. Even when they do, it can be difficult for others to execute it, and even the original authors might encounter problems sometime later. Programming languages evolve, as do the computing environments in which they run, and code that works flawlessly one day can fail the next.”

This problem extends far beyond the scientific community. Even best-practice code that is currently considered “sustainable” is rarely expected to work in a decade or two. Writing code that will work decades into the future requires not only new technological solutions but also a fundamental culture change in how software is developed, namely a thorough prioritization of long-term backward compatibility.

To solve the sustainability issue, we chose a programming language whose design and community have embraced such a culture. Furthermore, we have built our model implementation with backward compatibility in mind at every level and our aim is to write code and use technologies that still work decades from now.


Conclusion

In our model implementation, we aimed for scientific rigor and have therefore paid special attention to three common weaknesses of scientific model implementations, namely reliability, composability, and sustainability. In doing this, we also started laying the technological foundation for others to produce reliable, composable, and sustainable agent-based models in the future.

A sustainable approach to scientific software means that our scientific conclusions are understandable and reproducible decades into the future. When sustainability is combined with composability, it opens the door to a scientific process where researchers truly “stand on the shoulders of giants” by building on each other’s code over decades instead of writing and re-writing the same disposable models from scratch. An additional focus on reliability would allow a whole new level of thoroughness in computational science, since components of models would be scrutinized, improved, and recombined over decades. We hope that the Baseline model implementation can provide a small but valuable step in that direction.

As acclaimed MIT electrical engineering professor Gerald Jay Sussman points out, we usually think of computers as doing work for us, but the computing revolution is also an epistemological revolution:

The computer revolution is a revolution in the way we think and in the way we express what we think.

Yet it is precisely this aid to clarity of thinking that is most often missing from scientific software. We hope that a more reliable, composable, and sustainable approach to software in science can contribute to a future where software reaches its full potential, both in terms of computing our scientific results and in terms of aiding clarity of thinking.

1 From the paper: “The panels of the figure plot model-based predictions of the number of bug-fixing commits as function of commits for two extreme cases: C++ (most bugs) (...) and Clojure (least bugs).”

2 From the paper: “In this paper we have presented a novel Gödel Machine specification geared towards implementation. Our own approach so far has been to implement a virtual machine capable of running a specially invented programming language with self-referential constructs to attain the self-reflexivity needed for a Gödel Machine. The solver, searcher, and scheduler are then implemented in this language. It should be noted though, that a simpler existing technique can be used to attain self-reflexivity, namely by using meta-circular evaluators [1]. A meta-circular evaluator is basically an interpreter for the same programming language as the one in which the interpreter is written. Especially suitable for this technique are homoiconic languages such as Scheme [5], which is very close to λ-calculus and is often used to study meta-circular evaluators and self-reflection in programming in general [1, 6, 4].”

3 From the link: “Due to an error caused by a homemade data-analysis program, on page 1875, Geoffrey Chang and his colleagues retract three Science papers and report that two papers in other journals also contain erroneous structures.”