On consequences of Goodhart’s law (2024)

This post was originally written in French, in the winter of 2021.

As Marilyn Strathern put it, Goodhart’s Law says that “when a measure becomes a goal, it ceases to be a good measure.” The law has many applications in economics, but it also helps us understand the dangers of algorithmic decisions, and explains why the data available since the beginning of the SARS-CoV-2 (COVID-19) pandemic have been so difficult to use.

Goodhart’s Law, public policy evaluation, and Facebook

“When a measure becomes a goal, it ceases to be a good measure” is the simplest formulation of Goodhart’s Law. Some also speak of Campbell’s law[i], Donald Campbell having stated that “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor”. This corruption of metrics, or statistics, has been observed in many areas, particularly in health, justice and education.

The budgets of some schools have been tied directly to the results obtained on certain tests. Understandably, this gives teachers a strong incentive to prepare students for the test instead of teaching more broadly, or even to quietly remove some children who might fail, as Strauss (2015) relates. Once the pass rate is set as a goal to be maximized, it no longer means anything, because it induces behaviors that no longer reflect the quality of the teaching. The same criticism can be levelled at many actuarial associations around the world, which have introduced ‘professional exams’ that students with actuarial training can take. Many universities have shifted their focus from general education to preparation for these exams. The exams are then no longer a ‘measure’ of students’ knowledge, as students find it more and more difficult to step outside the very academic framework of highly formatted exercises.

In healthcare, in the United States, Poku (2015) notes that beginning in 2012, under the Affordable Care Act, Medicare began imposing financial penalties on hospitals with “higher than expected” 30-day readmission rates. As a result, the average 30-day hospital readmission rate for fee-for-service beneficiaries has declined. Is this due to improved transition and care coordination efforts by hospitals, or is it related to an increase in “observation” stays during the same period? Often, setting a target based on a specific measure (in this case, the 30-day readmission rate) not only renders this measure completely useless for quantifying the risk of falling ill again, but also has a direct influence on other measures (in this case, the number of “observation” stays), which are difficult to monitor over time.

On the Internet, algorithms are increasingly asked to sort content, to judge the defamatory or racist nature of tweets, to see if a video is a deepfake, to give a reliability score to a Facebook account, etc. And many would like to know how these scores are created. Unfortunately, as Dwoskin (2018) noted “not knowing how [Facebook is] judging us is what makes us uncomfortable. But the irony is that they can’t tell us how they are judging us – because if they do, the algorithms that they built will be gamed,”[ii] exactly as Goodhart’s Law implies.

Lucas’ critique and Friedman’s thermostat

In the early 1970s, Robert Lucas explained that economic decision-makers should avoid “naively” relying on past statistics to predict the future behavior of agents: “given that the structure of an econometric model consists of optimal decision rules of economic agents, and that optimal decision rules vary systematically with changes in the structure of series relevant to the decision maker, it follows that any change in policy will systematically alter the structure of econometric models”. The underlying idea was simply that agents adapt to the signals they receive.

As Charles Goodhart put it a few years later, “as the statistical relationships derived from the past depended on the particular kind of policy aim pursued by the authorities over the period considered, there would be no guarantee of their exact continuation in the future, should that policy be altered”. Put another way, any observed statistical relationship (think of a strong correlation between two variables) will tend to disappear once pressure is put on it for control purposes. In fact, Goodhart (1975) goes further than Robert Lucas, suggesting that in many cases agents will modify their behaviour to their own advantage, even if this is at the expense of collective well-being (think of the examples of education or health). The law is often illustrated by Margaret Thatcher’s government, which in the 1980s targeted the money supply to control inflation, only to find that monetary aggregates had lost their previously strong relationship with inflation. Inflation got out of control even when the government put strong pressure on the money supply.

Friedman (2003) used the thermostat analogy to explain the problem: the central bank is the thermostat of the economy. The policymaker has information ( ) that allows him to act on a control variable (c), to ensure that a variable (t) stays close to the target value (t^*). If expectations are rational, the forecast errors (t-t^*) must be uncorrelated with both the information and the control c, which may seem paradoxical. To use Farrell’s (2012) image, imagine a driver travelling at a constant speed on a very hilly road. Through the ups and downs, the driver balances acceleration and braking perfectly to keep the speed constant. Seen from a distance, however, one would be tempted to say that acceleration has no impact on speed: if we were to regress the speed of the car on its acceleration, the correlation between the two would be zero, as if accelerating and decelerating had no influence on speed…
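The hilly-road image can be reproduced with a small simulation. This is only a sketch under assumptions that are not in the original text: the hill slope and an unanticipated shock are drawn from normal distributions, the throttle has a causal effect of exactly +1 on speed, and the function names (simulate_drive, ols_slope) are made up for the illustration.

```python
import numpy as np

def simulate_drive(policy, n=5000, target=90.0, seed=0):
    """Simulate speed on a hilly road under a given throttle policy.

    Assumed (hypothetical) model: speed = target - slope + throttle + shock,
    so the throttle has a causal effect of +1 on speed by construction.
    """
    rng = np.random.default_rng(seed)
    slope = rng.normal(0.0, 1.0, size=n)   # hills, observed by the driver
    shock = rng.normal(0.0, 0.1, size=n)   # disturbances nobody can anticipate
    if policy == "perfect":
        throttle = slope                   # exactly offsets each hill
    elif policy == "random":
        throttle = rng.normal(0.0, 1.0, size=n)  # ignores the road
    else:
        raise ValueError("policy must be 'perfect' or 'random'")
    speed = target - slope + throttle + shock
    return throttle, speed

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    return np.polyfit(x, y, 1)[0]

for policy in ("perfect", "random"):
    throttle, speed = simulate_drive(policy)
    corr = np.corrcoef(throttle, speed)[0, 1]
    print(policy, "correlation:", round(corr, 3),
          "regression slope:", round(ols_slope(throttle, speed), 3))
```

With the perfect policy, the throttle is only correlated with speed through the unanticipated shock, so the estimated effect is close to zero even though the true effect is one. With the random policy, the regression recovers a slope close to one, but the speed is no longer under control.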

Beyond control

The problem raised by Goodhart’s law also arises in statistical modeling and in machine learning. In the latter case, the objective function to be optimized depends on the problem to be solved: in predictive regression algorithms, the prediction is compared with the realization, and the sum of squared errors may be used; for a classification or labeling problem (fraud/non-fraud), the number of classification errors is counted (possibly with different costs if the two types of errors have different impacts). But often we are not looking for a model that is perfect, without errors, on the data at hand: we want a model that will predict well on new data! We therefore avoid judging the predictive quality of a model on the data that were used to build it. Instead, we use part of the data to build the model and another part to judge its predictive power, and to see when the model starts to fit noise instead of capturing a genuine link between the explanatory variables and the variable of interest. This approach, generalized with the notion of “cross-validation”, makes it possible to separate the objective from the measurement.
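The split between data used to build the model and data used to judge it can be sketched as follows. The data-generating process (a noisy sine curve), the polynomial degrees and the function name holdout_errors are assumptions chosen for the illustration, and a single split is used here rather than full cross-validation.

```python
import numpy as np

def holdout_errors(degrees=(1, 3, 9), n=60, seed=1):
    """Fit polynomials on one half of the data, evaluate on both halves.

    Returns {degree: (mse_train, mse_test)}. The simulated data are a
    hypothetical example: y = sin(3x) + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, size=n)
    is_train = np.arange(n) < n // 2          # first half builds the model
    is_test = ~is_train                       # second half judges it
    results = {}
    for degree in degrees:
        coefs = np.polyfit(x[is_train], y[is_train], degree)
        mse_train = np.mean((np.polyval(coefs, x[is_train]) - y[is_train]) ** 2)
        mse_test = np.mean((np.polyval(coefs, x[is_test]) - y[is_test]) ** 2)
        results[degree] = (float(mse_train), float(mse_test))
    return results

for degree, (mse_train, mse_test) in holdout_errors().items():
    print(f"degree {degree}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")
```

The training error can only go down as the degree increases, since each polynomial family contains the previous one. That is exactly why it stops being a good measure once it is the quantity being optimized, and why the held-out error is the one to watch.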

In dynamic programming, Goodhart’s law is also well known (even if sometimes under another name). In dynamic systems, the agent is interested in a quantity x_t, for example a stock of goods that it will sell, and will seek to maximize a function of the form f(x_1,x_2, \cdots,x_T), from a given initial value x_0, for example the sum (discounted or not) of all the x_t, or perhaps only the terminal value x_T. The dynamics of (x_t) depend on a control variable (u_t), which the agent can choose, knowing that x_{t+1} will depend directly on u_t, and possibly on other quantities, like x_t. Bellman (1957) laid the mathematical foundations for solving this kind of problem, which we find generalized in reinforcement learning [ii], where the agent will have to explore, trying different controls, in order to learn how u_t influences x_{t+1}. A recent example could be the control of a pandemic, where x_t would be the number of infected people, or the number of deaths, and u_t a control lever, such as the number of tests offered, or the number of people allowed to go to work.
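The backward-induction idea behind this kind of problem can be sketched on a toy version of the stock-selling example. Everything specific here is an assumption made for the illustration, not something taken from Bellman (1957): the stock is an integer, the control u_t is the number of units sold in period t, the dynamics are x_{t+1} = x_t - u_t, the per-period revenue is u(10 - u), and the objective is the undiscounted sum of revenues.

```python
def solve_selling_problem(stock=6, horizon=3):
    """Finite-horizon dynamic programming by backward induction.

    Hypothetical setting: selling u units in a period earns u * (10 - u),
    and the remaining stock evolves as x_{t+1} = x_t - u_t.
    Returns (optimal total revenue, optimal plan of sales per period).
    """
    def revenue(u):
        return u * (10 - u)

    # value[t][x]: best revenue obtainable from period t on, holding stock x.
    value = [[0] * (stock + 1) for _ in range(horizon + 1)]
    policy = [[0] * (stock + 1) for _ in range(horizon)]

    # Work backwards from the last period to the first.
    for t in range(horizon - 1, -1, -1):
        for x in range(stock + 1):
            best_u = max(range(x + 1),
                         key=lambda u: revenue(u) + value[t + 1][x - u])
            policy[t][x] = best_u
            value[t][x] = revenue(best_u) + value[t + 1][x - best_u]

    # Roll the optimal policy forward from the initial stock.
    x, plan = stock, []
    for t in range(horizon):
        u = policy[t][x]
        plan.append(u)
        x -= u
    return value[0][stock], plan

total, plan = solve_selling_problem()
print(total, plan)  # 48 [2, 2, 2]
```

Because the assumed revenue is concave, the optimal plan spreads sales evenly over the three periods. In reinforcement learning the agent would not know the revenue function or the dynamics in advance and would have to learn them by trying different controls.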

Of course, if x_t measures the number of people testing positive at date t, a control variable that can easily lower x_t is the number of tests performed, but this will do nothing to slow the spread of the epidemic (which seems to be the natural goal). In fact, as noted by immunologist Anthony Fauci, director since 1984 of the National Institute of Allergy and Infectious Diseases in the United States, “if it looks like you’re overreacting, you’re probably doing the right thing” (quoted in Budryk (2020)), given feedback effects.

On quantification overdose and metrics control

Before returning to the pandemic of 2020, let us note that metrics are often introduced for the sake of transparency, as a semblance of democratic accountability, as the concrete expression of a collective, but also often individual, objective. But every metric, every statistic, hides a much more complex reality. Unemployment statistics are probably one of the most studied examples, as Desrosières (2008) shows. The short-term statistical series on unemployment, published by INSEE, was eagerly awaited by politicians and the press, to the point of becoming the stated objective of several governments: “to bring down the unemployment figures”, as Errard (2015) recalls, for example. To give the illusion of controlling (and lowering) unemployment, pressure was put on Pôle Emploi counselors to increase removals from the rolls, to offer schemes for young people, and to encourage taking two part-time jobs rather than one full-time job. Once it was understood how the target metric was calculated, it was easy to lower it artificially. As Desrosières (2000) wrote, “quantitative indicators retroact on quantified actors,” which echoes the feedback mentioned in the mathematical formalization of dynamic optimal control. But more than the economist (or econometrician) Charles Goodhart, it is above all Donald Campbell who sought to understand how metrics distort behavior and lead participants to exploit them. More recently, Bruno & Didier (2013) and Muller (2018) have shown how to keep Goodhart’s law from applying.

As Charles Goodhart already noted, the explicit optimization of a system using a metric can end up rendering the metric unusable, because it is no longer correlated with the objective. This is the case with many punishment and reward systems, which aim to create incentives. One can think of class attendance, for example, where absences are punished in order to encourage students to work. Once such measures are in place, class attendance increases, but students do not necessarily work any harder. This happens even though there is a causal relationship between the measure and the objective, not just a correlation. As in the Facebook example, keeping the metric secret is one easy solution; another is to use multiple metrics.

Goodhart’s Law and the current health crisis

One of the objectives, regularly hammered home since March 2020, is that health systems must not be saturated, in any country: the famous “flatten the curve”, evoked by Ferguson et al. (2020). It seemed essential to ensure, at all costs, that hospitals were not overburdened. In the spring of 2020, television news channels reported around the clock the number of people in intensive care and the number of deaths in hospitals, measurements that were then turned into graphs, updated every week, or even every evening, on dedicated websites. In this period of crisis, at the height of hospital saturation, the N.H.S. in England asked each hospital to estimate its bed capacity, in order to reallocate resources globally. Announcing that few beds were available was the best strategy to obtain more funding. This raises the question of how full the system really was, each hospital having understood the rule and being able to manipulate the measure as it saw fit. Just as worrying, while governments focused on hospitals (which provided the official data used to construct most of the indicators), nursing homes experienced disastrous death tolls, which took a long time to be quantified. Giles (2020) reports that in England, some doctors reportedly asked their elderly patients to think carefully about whether they really wanted to go to hospital and use the emergency services, at the risk of spending several weeks isolated from their families.

Statistics on the number of (officially) positive people have never ceased to baffle statisticians, because they are easily manipulated. We all remember Donald Trump’s statements at the beginning of the summer of 2020, reported for example by Sheth (2020), claiming that in order to reduce the number of ‘positive’ people, it was sufficient to test less. At the beginning of the pandemic, a clearly stated goal was to detect asymptomatic positives, and therefore targeted testing was necessary. A high positive rate was then a sign that the targeting of tests was working. On the other hand, in order to monitor the evolution of the pandemic, it was essential to test as randomly as possible.
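Both effects, fewer tests giving fewer reported positives and targeted testing inflating the positive rate, can be illustrated with a toy simulation. All numbers are assumptions made up for the sketch (a population of 100,000, a 5% true prevalence, symptoms in 60% of infected and 5% of non-infected people, a perfectly accurate test), as is the function name simulate_testing.

```python
import numpy as np

def simulate_testing(n_tests, strategy, population=100_000,
                     prevalence=0.05, seed=0):
    """Count positives found with n_tests under a testing strategy.

    Hypothetical setting: the test is perfectly accurate, and 'targeted'
    testing gives priority to people with symptoms.
    Returns (positives found, positive rate, true number of infected).
    """
    rng = np.random.default_rng(seed)
    # The same seed yields the same population, whatever the strategy.
    infected = rng.random(population) < prevalence
    symptomatic = rng.random(population) < np.where(infected, 0.6, 0.05)
    if strategy == "random":
        tested = rng.choice(population, size=n_tests, replace=False)
    elif strategy == "targeted":
        order = rng.permutation(population)
        # Stable sort: symptomatic people first, random order within groups.
        order = order[np.argsort(~symptomatic[order], kind="stable")]
        tested = order[:n_tests]
    else:
        raise ValueError("strategy must be 'random' or 'targeted'")
    positives = int(infected[tested].sum())
    return positives, positives / n_tests, int(infected.sum())

for n_tests, strategy in [(10_000, "random"), (5_000, "random"),
                          (5_000, "targeted")]:
    positives, rate, truly_infected = simulate_testing(n_tests, strategy)
    print(f"{strategy:>8}, {n_tests} tests: {positives} positives, "
          f"rate {rate:.3f}, truly infected {truly_infected}")
```

Halving the number of random tests roughly halves the count of positives while the number of infected people is unchanged. Targeted testing finds many more cases per test, which is the point of targeting, but its positive rate says little about prevalence in the population.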

The crisis created by the SARS-CoV-2 (COVID-19) pandemic, with its excessive quantification and statistics updated in real time, reminded us of the dangers of Goodhart’s Law. As Laroussie (2021) noted, the flood of statistics also allowed many of us to try our hand at predicting the future evolution of the curves, but also to question the reliability of the data and of their construction. Following the number of people supposedly positive without understanding who was tested, and with what type of test, made no sense. The dynamics of the curves were themselves affected by a feedback loop, resulting from the decisions of policymakers, who had decided, for example, to test fewer elderly people when it was time to go back to work. How do you make sound public policy decisions under these conditions? This is ultimately the profound question posed by Goodhart’s law, which also reminds us that policymakers must learn to distinguish between the spirit of the law and the letter of the law (the road to hell being paved with good intentions), by keeping a measured mind.

References

Bellman, R. (1957). Dynamic Programming. Princeton University Press.

Bruno, I. & Didier, E. (2013). Benchmarking. L’État sous pression statistique. Paris, La Découverte.

Budryk, Z. (2020). ‘If it looks like you’re overreacting, you’re probably doing the right thing’. The Hill, 15 March 2020.

Campbell, D. T. (1975). Assessing the impact of planned social change. In G. M. Lyons (ed.), Social Research and Public Policies: The Dartmouth/OECD Conference (pp. 3–45). Hanover, NH: Public Affairs Center.

Charpentier, A., Elie, R. & Remlinger, C. (2020). Reinforcement Learning in Economics and Finance. arXiv:2003.10014

Daston, L. (2010). Why statistics tend not only to describe the world but to change it. The London Review of Books, 22:8.

Desrosières, A. (2000). La Politique des grands nombres : Histoire de la raison statistique. La Découverte.

Desrosières, A. (2008). Gouverner par les nombres. Presses de l’École des Mines.

Dwoskin, E. (2018). Facebook is rating the trustworthiness of its users on a scale from zero to one. The Washington Post, 21 August 2018.

Errard, G. (2015). Le contrôle des chômeurs peut-il faire baisser le chômage ? Le Figaro, 26 August 2015.

Farrell, H. (2012). Milton Friedman’s Thermostat. Monkey Cage, 31 July 2012.

Ferguson, N. et al. (2020). Impact of non-pharmaceutical interventions to reduce covid-19 mortality and healthcare demand. Imperial College COVID-19 Response Team 9.

Friedman, M. (2003). The Fed’s Thermostat. The Wall Street Journal, 19 August 2003.

Giles, C. (2020). Goodhart’s law comes back to haunt the UK’s Covid strategy. Financial Times, 14 May 2020.

Goodhart, C.A.E. (1975) Problems of monetary management: The UK experience. Papers in Monetary Economics, Volume I. Sydney: Reserve Bank of Australia.

Laroussie, D. (2021). Covid-19 : ces modélisateurs qui anticipent la pandémie. Le Monde, 5 January 2021.

Muller, J. Z. (2018). The tyranny of metrics. Princeton University Press.

Rodamar, J. (2018) There ought to be a law! Campbell versus Goodhart. Significance, 15:6.

Sheth, S. (2020). Trump says that ‘if we stop testing right now, we’d have very few cases’ of the coronavirus. Business Insider, 15 June 2020.

Strauss, V. (2015). How and why convicted Atlanta teachers cheated on standardized tests. The Washington Post, 1 April 2015.

[i] Rodamar (2018) revisits the comparison between the two publications, Goodhart (1975) and Campbell (1975), which state the same principle, in very different contexts.

[ii] As described in Charpentier et al (2020).

Cite this blog post
Arthur Charpentier (2022, July 10). On consequences of Goodhart’s law. Freakonometrics. Retrieved June 29, 2024, from https://doi.org/10.58079/ovk2
