What a month… the end of February seems like an age ago, and life for everyone has changed beyond comprehension since then.
The dramatic rise of the COVID-19 pandemic has highlighted the crucial underlying importance of rigorous analytical methods, both to understand what is going on and to inform decisions around the best course of action.
Given this, we thought we would dedicate the April edition of our Royal Statistical Society Data Science Section newsletter to highlighting features and articles on the Data Science of COVID-19.
As always- any and all feedback is most welcome! If you like these, do please send them on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:
Industrial Strength Data Science April 2020 Newsletter
RSS Data Science Section
Data Science and COVID-19
One thing that became apparent pretty quickly as COVID-19 started spreading was that all sorts of data on the extent of the pandemic were readily available, and that these were being reported on in all sorts of ways…
Identifying trusted resources that allow you to cut through the click-enticing headlines and put the figures in context has been crucial. Some of the sites we have found useful follow below:
- Johns Hopkins University of Medicine has been at the forefront, and their tracker has become widely used for understanding how COVID-19 has spread globally.
- Although the Financial Times have been just as guilty as others in their reporting of scientific research, their visualisation of deaths over time by country is a good example of putting the figures in context, allowing for quick comparison of the efficacy of the different actions taken around the world.
- Another useful resource is Our World in Data, with detailed descriptions of metrics and data sources and clean visualisations of the different exponential growth rates in different countries.
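For readers who want to compare growth rates themselves, here is a minimal sketch of one common approach: fit a straight line to the logarithm of cumulative counts, read the slope as the daily exponential growth rate, and convert it to a doubling time. The function names and the counts are our own illustrative assumptions (the series is synthetic, not real case data), and none of the sites above necessarily calculate their figures this way.

```python
import math

def exponential_growth_rate(counts):
    """Least-squares slope of log(count) against day index (0, 1, 2, ...)."""
    logs = [math.log(c) for c in counts]
    n = len(logs)
    mean_x = (n - 1) / 2
    mean_y = sum(logs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(logs))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def doubling_time(rate):
    """Days needed for counts to double at a constant daily growth rate."""
    return math.log(2) / rate

# Synthetic series (an assumption for illustration): doubles every 3 days.
synthetic_counts = [100 * 2 ** (day / 3) for day in range(10)]
rate = exponential_growth_rate(synthetic_counts)
print(f"daily growth rate: {rate:.3f}, doubling time: {doubling_time(rate):.1f} days")
```

Real series are noisier and are affected by testing and reporting changes, so any fitted rate should be read as a rough summary rather than a forecast.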
Forecasting and predictions- the story so far…
Scroll back a couple of weeks, and the UK Government’s initial response was focused on containment and ‘herd immunity’. Although this was heavily influenced by scientific experts including (amongst others) an experienced team from Imperial College London, it was at odds with much of the rest of the world. This generated consternation from a wide variety of commentators, with a widely read article (40m views and counting) from Tomás Pueyo perhaps best summarising the concerns. Other trusted sources on the pandemic who are easily followed on Twitter include: @CT_Bergstrom, @mlipsitch, @maiamajumder and @MackayIM.
These concerns were not unchallenged (an interesting counter-point to the Pueyo post is here from Thomas House) but became less relevant on Monday 16th March when the Imperial College COVID-19 Response team issued a new paper, apparently based on updated data from Italy, depicting a very different future and urging stronger action. Almost immediately the UK Government began the process of moving the country into the current state of lockdown to attempt to stem the spread of the virus.
‘Model Addiction’ and Best Practices
The UK Government has come in for a good deal of criticism for the decisions made, and the apparent clouding of responsibility behind the banner of ‘science’. Nassim Taleb (of Black Swan and Fooled by Randomness fame) wrote an opinion piece in the Guardian taking the government to task on their over-reliance on forecasting models without thoroughly understanding the underlying assumptions. Coronadaily makes a similar point in a thoughtful post about Model Addiction. (For anyone interested in the basics of how the underlying model works, try this explainer on YouTube).
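To make the point about assumptions concrete, here is a minimal sketch of the simplest textbook epidemic model, the SIR (susceptible, infected, recovered) compartmental model, stepped forward with a basic Euler scheme. This is our own illustration and not the model used in the Imperial paper; the function name, the parameter values (`beta`, `gamma`) and the population figures are all assumptions chosen purely for demonstration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Euler integration of the basic SIR equations.

    beta  - assumed transmission rate per day
    gamma - assumed recovery rate per day (1 / infectious period)
    Returns a list of (susceptible, infected, recovered), one entry per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history

# Two hypothetical scenarios that differ only in the assumed transmission rate.
for beta in (0.5, 0.3):
    run = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=beta, gamma=0.2, days=200)
    peak_day, peak = max(enumerate(run), key=lambda item: item[1][1])
    print(f"beta={beta}: infections peak on day {peak_day} at about {peak[1]:,.0f}")
```

Even in this toy version, changing a single assumed parameter shifts both the timing and the height of the peak considerably, which is why understanding a model's assumptions matters as much as reading its headline output.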
There are other aspects of the models informing policy which do not seem to adhere to best practices from a data science perspective. Code transparency and reproducibility are core components of good data science, and although Neil Ferguson and his team at Imperial are attempting to provide more details, it was disconcerting to hear that the approach was based on “thousands of lines of undocumented C”. A well formulated approach to reproducible research, such as that advocated by Kirstie Whitaker at the Turing Institute, would go a long way to help.
Although the models used in the Imperial paper have had success historically (particularly in the less developed world with outbreaks such as Ebola), the area of infectious diseases has, unfortunately, been extremely underfunded. Thus the main people working on these models, and who are best placed to advise policy, are in a poorly resourced area of academia.
Regardless of the accuracy of a given predictive model, there will always be assumptions and alternatives, and another area in which the combined government/research group have foundered is in communicating this uncertainty. This is certainly far from straightforward, but one entity we could all learn from is the IPCC and the way they assimilate different approaches to modelling climate change impact, producing a number of well articulated alternative scenarios with clearly documented assumptions.
Martin Goodson, the RSS Data Science section chairman, wrote a provocative post bringing together all these threads, advocating 6 rules for policy-makers, journalists and scientists.
Calls to action and collaboration
The increased attention on, and importance of, ‘experts’ and mathematical modelling in general has opened up numerous ways for the community to participate. There are many calls to action and ways to get involved including:
- an extensive list of collaborations here
- Rapid Assistance in Modelling the Pandemic (RAMP) being co-ordinated by the Royal Society
- And a Kaggle competition co-sponsored by the NIH and the White House
In addition, a number of Data Science and AI related tools are being made available to the community for free:
Other Posts we Like
It’s sometimes hard to remember, but of course there are other things going on in the world- here are a few posts on data science we enjoyed this month.
- Some useful pointers on remote working as it pertains to data science
- A great read on the ups and downs of autonomous driving, and the challenges still ahead
- A useful review of AI research over the last year, with a focus on Transformers
- An interesting new approach to understanding the complexity in Deep Learning architectures from Mehmet Suzen
- An intriguing concept- building 3D representations from text
- And if anyone needs a project…. playing automatic Pong on a Raspberry Pi
Upcoming Events and Section and Member Activities
Sadly, but not surprisingly, we have had to put a number of our upcoming events on hold. However, we are still keen to continue in an adapted way, and are looking to re-work our programme in an online format- more details to follow. Many of the Data Science and AI meetups are doing the same, so keep checking back to meetup.com for details.
Finally, it was great to see RSS Data Science Committee members Richard Pugh and Jim Weatherall make the dataIQ top 100 list.
Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here: