Averting statistical tragedies with better data
In a July 2020 article for Brookings Papers on Economic Activity, Tristan Reed and I showed that, contrary to expectations, Covid-19 deaths per capita were much lower in poorer countries than in richer ones. Readers immediately countered that this finding must be due to mismeasurement or a lack of data for these countries. Our result has since withstood scrutiny and the test of time, but the initial response was revealing: statistics originating in developing countries tend to be met with suspicion (and often are dismissed outright).
Is this bias justified? In a recent Journal of Economic Perspectives paper, "Why is Growth in Developing Countries so Hard to Measure?", my co-authors and I find that it is not. Notwithstanding a few highly publicised cases of data manipulation, growth estimates from developing countries are, on average, as reliable as those from advanced economies.
To be sure, there is no single, well-defined metric for judging the quality of a country's growth estimates. But the traditional approach in the economics literature is to check how closely estimates obtained from different data sources correlate. Using this method, we compared estimates based on three distinct sources: the System of National Accounts (SNA), household survey data, and newly available satellite data (mainly on nighttime lights, and occasionally on vegetation).
Such comparisons show that the differences in average growth rates across the three data sources are small, typically around 1.5 percentage points or less. Though gaps of this magnitude might be considered large for high-income countries (where annual growth rates have recently been in the 3-4% range), they are relatively modest for many fast-growing developing economies. Given the uncertainty inherent in any growth estimate, an average discrepancy of about 1.5 percentage points does not seem grave.
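To make the two yardsticks concrete (how closely sources correlate, and how far apart their growth rates are on average), here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in the paper: the growth figures below are invented for five hypothetical countries, and the helper names `pearson` and `mean_abs_gap` are ours.

```python
from itertools import combinations
from math import sqrt

# Hypothetical annual GDP growth rates (%) for five made-up countries,
# one list per data source. These numbers are invented for illustration
# only; they are NOT estimates from the paper.
growth = {
    "SNA":       [6.0, 4.5, 7.2, 3.1, 5.5],
    "survey":    [5.2, 4.0, 6.1, 3.8, 4.9],
    "satellite": [6.8, 3.9, 7.5, 2.6, 6.0],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_abs_gap(x, y):
    """Average absolute difference between two sources, in percentage points."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Compare every pair of sources on both yardsticks.
for s1, s2 in combinations(growth, 2):
    r = pearson(growth[s1], growth[s2])
    gap = mean_abs_gap(growth[s1], growth[s2])
    print(f"{s1} vs {s2}: correlation = {r:.2f}, mean gap = {gap:.2f} pp")
```

Both statistics are worth reporting because they answer different questions: correlation is unchanged if one source is shifted up or down by a constant, so two sources can move closely together while still disagreeing about the level of growth.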
Moreover, a new database assembled by the International Monetary Fund and the World Bank shows no evidence that SNA data from low-income countries are systematically manipulated for political reasons. Interestingly, it is middle-income countries that seem more problematic, suggesting that politically motivated manipulation may be more feasible above some threshold level of statistical capacity and sophistication. To a certain extent, these findings are a reason for optimism, because they show that statistics generated in developing countries are indeed meaningful, and that it would be ill-informed to dismiss them out of hand.
Nonetheless, poorer countries could obviously benefit from greater statistical capacity. As Shanta Devarajan of the World Bank argued in an influential 2013 article, low-income countries, especially in Africa, suffer from a "statistical tragedy". Owing to a lack of resources for data collection, management, and dissemination, and to an absence of coordination among relevant agencies and stakeholders, policymakers in many low-income countries must rely on old data and outdated methods.
In fact, several highly publicised cases of unreliable growth estimates resulted from outdated methods rather than politically motivated manipulation. And even in these instances, local authorities seem to have done a miraculous job of producing relatively reliable numbers, given the constraints they faced.
The question is how developing economies can increase their statistical capacity. This is easier said than done. As my co-authors and I explain in the Journal of Economic Perspectives paper:
"International efforts to support national statistics offices are often focused on one-off data collection activities with limited attention to building the skills and knowledge of national statisticians or to developing data systems. Collecting data is a relatively well-defined task with a clear end date that usually wraps up with a completion report. Investments to improve statistical capacity are much more difficult to monitor, less certain to succeed, time-consuming, and often lacking clear outcome deliverables."
Given the high costs and uncertainty associated with such investments, they are unlikely to be pursued at a time when governments are already under fiscal pressure from the Covid-19 crisis.
Fortunately, there is a more feasible approach: harness technology and use new data sources to ease resource constraints. An explosion of new and publicly available data (from web scraping, Google searches, digital transactions, mobile-phone metadata, social-media usage, and satellites) has helped researchers estimate important economic variables at lower cost.
Such data sources proved especially useful during the pandemic, allowing economists to obtain much-needed real-time information about poverty, inflation, business prospects, and people's well-being. And because these data can be obtained faster and far more cheaply than data gathered through traditional methods (such as in-person, door-to-door surveys), they are a source of hope for resource-starved developing countries.
But these new data are not without limitations. While traditional data sources seek complete coverage of the relevant population, newer data tend to suffer from selection problems. Although they can yield massive samples and be very timely, they are rarely representative of a country's population; mobile-phone metadata, for example, capture only people who own phones. They are best used to complement, not replace, traditional data.
It may seem naive to appeal for better data in the midst of a pandemic, when many low-income countries have still not managed to secure life-saving vaccines for their populations. Yet one of the biggest risks of the pandemic is that it will cause important development agendas to be neglected or postponed indefinitely. Just as policymakers must insist on additional measures to empower women and increase investments in human capital, so must they pursue greater statistical capacity. We cannot improve what we cannot measure.
©2021 Project Syndicate
Pinelopi Koujianou Goldberg, a former World Bank Group chief economist and editor-in-chief of the American Economic Review, is Professor of Economics at Yale University.