Happy New Year! I hope you have all had a festive holiday period and found some time to catch up on those deep learning research papers you had been meaning to dig into… Fingers crossed 2021 proves better than 2020. As a start, how about welcoming in the new year with a few curated data science reading materials?
Following is the January edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …
As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.
Industrial Strength Data Science January 2021 Newsletter
RSS Data Science Section
A new year but sadly not much change in the story – however with vaccinations now actively happening, an end does seem in sight, even if it seems tantalisingly far away.
- A new strain of COVID-19 materialised in south east England. Although virus mutations happen all the time, this one was important as the strain appears significantly more transmissible. Its prevalence in positive tests appears strongly linked to dramatic rises in new cases.
- An Imperial College study modelling the case rates concludes that the new strain “has a transmission advantage of 0.4 to 0.7 in reproduction number compared to the previously observed strain.”
- This report also highlights how statisticians and data scientists still need to work on the art of communication….
"Using whole genome prevalence of different genetic variants through time and phylodynamic modelling (dynamics of epidemiological and evolutionary processes), researchers show that this variant is growing rapidly."
- And in case you missed it, the UK government managed to lose case data again… Clearly the lessons from last time have not been learned.
- There has however been fantastic news on the vaccine front, with vaccinations now rolling out around the world. As discussed in our previous newsletter, the mRNA approach used in the Moderna and BioNTech vaccines is a huge breakthrough. There is an excellent interview on the Andreessen Horowitz a16z podcast with Stéphane Bancel, the Moderna CEO, where he goes through the development process in detail, including how they generated the vaccine blueprint within 48 hours of receiving the virus's genetic sequence.
We used to grow our vaccines, now we can “print” them.
- For more detail on what the ‘code’ of the mRNA vaccine actually means check out this excellent post from Bert Hubert.
- There was more great news on the vaccine front, with the Oxford-AstraZeneca vaccine gaining approval in the UK. As discussed in the last newsletter, the phase 3 trial results were somewhat inconclusive regarding the optimal dosing regime, but this good NY Times summary talks through the rationale behind the approved dosing approach. The approach being taken by the UK government to the vaccine roll-out is not without pushback, however.
- More innovation on the Covid testing and identification front:
- A small study suggests that the Oura smart ring could identify Covid infections using its continuous monitoring capabilities.
- Researchers at MIT have found a way of identifying potential Covid infection from the sound of a cough.
- Moving from prototype study to trusted repeatable performance out of sample is still difficult however, as the Lancet and the Guardian discuss.
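To get a feel for what the quoted additive advantage of 0.4 to 0.7 in reproduction number could mean in practice, here is a minimal sketch. It assumes a very simple generation-based growth model in which every case infects R others per generation. The baseline R of 0.9 and the choice of five generations are illustrative assumptions of ours, not figures from the study.

```python
def cases_in_generation(r: float, generations: int, initial_cases: float = 1.0) -> float:
    """Expected new cases in a given generation if each case infects r others."""
    return initial_cases * r ** generations


def relative_growth(advantage: float, baseline_r: float = 0.9, generations: int = 5) -> float:
    """How many times more cases the new strain produces than the old one.

    baseline_r and generations are assumed, illustrative values.
    """
    new_strain = cases_in_generation(baseline_r + advantage, generations)
    old_strain = cases_in_generation(baseline_r, generations)
    return new_strain / old_strain


# Lower and upper ends of the additive advantage quoted from the study.
for advantage in (0.4, 0.7):
    ratio = relative_growth(advantage)
    print(f"advantage {advantage}: about {ratio:.1f}x the cases after 5 generations")
```

The point of the sketch is simply that a seemingly modest additive difference in R compounds quickly, because it is applied again in every generation of infection.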
We are all conscious that times are incredibly hard for many people and are keen to help however we can. If there is anything we can do to help those who have been laid off (networking and introductions help, advice on development etc.), don’t hesitate to drop us a line.
As we announced just before Christmas, we are all incredibly excited about our upcoming fireside chat with none other than Andrew Ng on February 10th – save the date! We want to make the discussion as relevant to our community as possible, so do please send any topics or questions on becoming an AI technical leader to Martin (@martingoodson).
As we previously announced we are looking forward to our first AI Ethics Happy Hour event – details to follow.
The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon; we will keep you posted.
Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 13th January, where Jakob Foerster from Facebook AI will discuss Zero-Shot (Human-AI) Co-ordination. Videos are posted on the meetup YouTube channel – and future events will be posted here.
Elsewhere in Data Science
Lots of non-Covid data science going on, as always!
Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…
- Google caused a publicity storm by ousting Timnit Gebru from her position as technical co-lead of Google’s Ethical Artificial Intelligence Team. A good summary of what happened is here.
- Gebru has been incredibly influential in researching algorithmic bias – Rachel Thomas of FastAI details her accomplishments in this compelling thread, including the groundbreaking Gender Shades work, which highlighted the dramatic differences in error rates of facial recognition programs for women by shades of skin colour.
- MIT Technology Review digs into the research paper at the eye of the storm.
- The storm continues to reverberate through the AI community.
- In case we forget how important it is to think through these critical issues, the NY Times reported: “As China Tracked Muslims, Alibaba Showed Customers How They Could, Too”.
- Even the venerable Co-op in the UK is getting in on the act.
- Of course not all implementations of AI result in unintended or nefarious consequences, but making sure we learn from those that do is critical. This piece from MIT Technology Review looks at a team of lawyers who are actively pursuing these types of cases, while the Partnership On AI has introduced a database to document When AI Systems Fail
Real world data science applications …
All sorts of great applications of data science and machine learning are regularly coming to light.
- Airbnb have released an elegant new approach to dealing with positional bias in search rankings. If you are learning preferences from historical data, how do you deal with the fact that actions (clicks, likes etc.) will be influenced by the position rank of the given item?
This creates a feedback loop, where listings ranked highly by previous models continue to maintain higher positions in the future, even when they could be misaligned with guest preferences.
- Interesting article from the National Bureau of Economic Research showing how corporate disclosures have changed over time to take advantage of known biases in how machines read them.
- Impressive results from Google’s Loon project, where they have used AI to keep their balloons aloft for over 300 days in a row.
- While the Cruise team announced their breakthrough in self-driving cars, Uber decided enough was enough and is selling off its self-driving unit. Note this was not a small initiative… Uber Advanced Technologies Group (which was its autonomous vehicle unit) numbered 1,200 employees…
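The positional bias problem mentioned in the Airbnb item above can be illustrated with a small sketch. Note that this uses inverse propensity weighting, a common textbook correction that we have swapped in for illustration; it is not necessarily the approach Airbnb describes. The examination probabilities, the listing names and the click logs are all hypothetical.

```python
from collections import defaultdict

# Assumed probability that a user even looks at each rank position (hypothetical).
EXAMINE_PROB = {1: 1.0, 2: 0.5, 3: 0.25}

# Hypothetical logs: (listing, position shown, 1 if clicked else 0).
LOGS = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 1, 0),
    ("B", 3, 1), ("B", 3, 0), ("B", 3, 0), ("B", 3, 0),
]


def naive_ctr(logs):
    """Raw click-through rate per listing, ignoring where it was shown."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for listing, _position, click in logs:
        shown[listing] += 1
        clicked[listing] += click
    return {listing: clicked[listing] / shown[listing] for listing in shown}


def ipw_ctr(logs, examine_prob=EXAMINE_PROB):
    """Click rate with each click up-weighted by 1 / P(position was examined)."""
    shown, weighted = defaultdict(int), defaultdict(float)
    for listing, position, click in logs:
        shown[listing] += 1
        weighted[listing] += click / examine_prob[position]
    return {listing: weighted[listing] / shown[listing] for listing in shown}


print("naive:", naive_ctr(LOGS))
print("position-corrected:", ipw_ctr(LOGS))
```

In this toy data the naive estimate favours listing A, which was always shown first, while the corrected estimate favours listing B. That reversal is exactly the feedback loop at work: a model trained on the naive numbers would keep ranking A on top.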
Developments in Data Science…
As always, lots of new developments…
- The researchers at DeepMind have been busy again… As we mentioned last time, AlphaFold has developed a solution to a 50-year-old grand challenge in biology. But what is it and how does it work? Fabian Fuchs does a good job of breaking it all down in this enlightening post.
"In short: this module is a neural network that iteratively refines the structure predictions while respecting and leveraging an important symmetry of the problem, namely that of roto translations."
- In what may prove to be even more groundbreaking, DeepMind have also released MuZero, their approach to Deep Reinforcement Learning when the “rules of the game” are unknown. This BBC post summarises the work and gives insight into practical applications such as video compression.
"The results of DeepMind's work are quite astounding and I marvel at what they are going to be able to achieve in the future given the resources they have available to them"
- If you feel like you missed out on AI research developments last year (what with all that pandemic stuff…), this is an excellent summary of key papers from Louis Bouchard, and another one based on NeurIPS 2020 papers from Prabhu Prakash Kagitha – both well worth a read.
Getting AI into production…
Still one of the biggest obstacles…
- Useful summary of all the different aspects of a robust MLOps architecture, by Alejandro Saucedo, with a particular focus on monitoring which is so critical and so easy to miss. Another quick guide to monitoring here.
- Interesting update from Facebook AI talking through their plans to make their AI infrastructure and modules interoperable.
- Useful step-by-step summary from Michael Skarlinski of the lessons learned by the Weight Watchers data science team in developing their MLOps framework, Primrose.
- Keeping on top of data quality in an automated way is a key requirement for successful ML and AI implementation. Anomalo looks like a useful option in this space.
- How important is ‘real-time’ to your model… or even what does ‘real-time’ really mean? Good breakdown of the different aspects of real-time machine learning and how to implement them from Chip Huyen
- Best practices in this space are still hard to find, although Andrew Ng highlights this as a key area of opportunity in 2021:
"While building good models is important, many organizations now realize that much more needs to be done to put them into practical use, from data management to deployment and monitoring. In 2021, I hope we will get much better at understanding the full cycle of machine learning projects, at building MLOps tools to support this work, and at systematically building, productionizing, and maintaining AI models."
Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:
- Murphy’s classic book, “Probabilistic Machine Learning: an Introduction”, just got a new edition
- Elegant tutorial on Markov Chains using real world data from Carolina Bento
- Excellent hands-on example from Alex Orona of tuning GPT-2 language models to create a bot based on the legal writings of the late, great Ruth Bader Ginsburg.
- Identifying chess pieces on a board with Convolutional Neural Networks.
- A useful new open source language data set (‘the Pile’) to test out all your newfound NLP skills… This data set is large enough to train models such as GPT-3, so it is a big step towards truly open versions of large language models.
Updates from Members and Contributors
- Mani Sarkar has been busy updating his NLP Profiler Python library; he has a useful notebook working through the different features here.
- Kevin O’Brien draws our attention to JuliaCon 2021, which will be free and virtual, with the main conference taking place Wednesday 28th July to Friday 30th July 2021 (workshops will be held the week before). Julia is a high-performance dynamic language designed to address the requirements of high-level numerical and scientific computing, and is becoming increasingly popular in Machine Learning and Data Science. Stay up to date on further announcements by joining the JuliaCon 2021 event page on LinkedIn.
Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:
The views expressed are our own and do not necessarily represent those of the RSS