Well, January seemed to flash by in the blink of an eye – certainly the holiday period seems a long time ago already. All is not lost – the Winter Olympics seems to have crept up on us and is just about to start, which will no doubt provide some entertainment and distraction… as I hope will some thought-provoking data science reading materials.
Following is the February edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity. Check out our new ‘Jobs!’ section… an extra incentive to read to the end!
As always, any and all feedback most welcome! If you like these, do please send on to your friends – we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.
We are all conscious that times are incredibly hard for many people and are keen to help however we can – if there is anything we can do to help those who have been laid off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.
The committee is busy planning out our activities for the year with lots of exciting events and even hopefully some in-person socialising… Watch this space for upcoming announcements.
We do in fact have a couple of spaces opening up on our committee (RSS Data Science and AI Section) – if you are interested in learning more please contact James Weatherall.
Anyone interested in presenting their latest developments and research at the Royal Statistical Society Conference? The organisers of this year’s event – which will take place in Aberdeen from 12-15 September – are calling for submissions for 20-minute and rapid-fire 5-minute talks to include on the programme. Submissions are welcome on any topic related to data science and statistics. Full details can be found here. The deadline for submissions is 5 April.
Meanwhile, Martin Goodson continues to run the excellent London Machine Learning meetup and is very active with events. The next talk will be tomorrow (February 2nd), when Sebastian Flennerhag, research scientist at DeepMind, will give a talk entitled “Towards machines that teach themselves”. Videos are posted on the meetup YouTube channel – and future events will be posted here.
This Month in Data Science
Lots of exciting data science going on, as always!
Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…
- With the anniversary of the January 6th attack on the US Capitol, there is commentary in the mainstream press about misinformation and how algorithms can both exacerbate and help curb the problem – see here in the Washington Post for example.
"The provocative idea behind unrest prediction is that by designing an AI model that can quantify variables — a country’s democratic history, democratic “backsliding,” economic swings, “social-trust” levels, transportation disruptions, weather volatility and others — the art of predicting political violence can be more scientific than ever."
- Security screening is another example where AI solutions come with significant ethical and privacy tradeoffs – here, identifying concealed weapons at baseball stadia.
- Insightful research from the Montreal AI Ethics Institute about the debate in China on the societal and ethical implications of AI.
- Some dry but useful materials published regarding military use of AI:
  - Ethical Principles for Artificial Intelligence published by the JAIC (part of the US Department of Defense). “DoD personnel will exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use of AI capabilities.” – it’s a start.
  - Responsible AI Guidelines in Practice from the US Defense Innovation Unit.
- We’ve posted previously about bias in recruiting and hiring algorithms – so it’s welcome to see the Data and Trust Alliance’s publication of their Algorithmic Bias Safeguards for Workforce: criteria and education for HR teams to evaluate vendors on their ability to detect, mitigate, and monitor algorithmic bias in workforce decisions.
- There was an interesting recent recommendation from the UK Law Commission that users of self-driving cars should have immunity from a wide range of motoring offences. This is increasingly relevant, as the various self-driving car providers move towards commercial propositions: Waymo (Google/Alphabet’s self-driving unit), for instance, recently announced its first commercial autonomous trucking customer (interesting background on how Waymo does what it does here).
"While a vehicle is driving itself, we do not think that a human should be required to respond to events in the absence of a transition demand (a requirement for the driver to take control). It is unrealistic to expect someone who is not paying attention to the road to deal with (for example) a tyre blow-out or a closed road sign. Even hearing ambulance sirens will be difficult for those with a hearing impairment or listening to loud music."
- Thought-provoking article in Wired about the changing dynamics of interpersonal communication when mediated through “auto suggestions” and other AI-driven tools.
"People were more likely to roll with a positive suggestion than a negative one— participants also often found themselves in a situation where they wanted to disagree, but were only offered expressions of agreement. The effect is to make a conversation go faster and more smoothly" ... ... "This technology (combined with our own suggestibility) could discourage us from challenging someone, or disagreeing at all. In making our communication more efficient, AI could also drum our true feelings out of it, reducing exchanges to bouncing “love it!” and “sounds good!” back at each other"
Developments in Data Science…
As always, lots of new developments on the research front and plenty of arXiv papers to read…
- The research theme around making models more ‘efficient’ (whether that’s in terms of power consumption, model size, data usage etc) continues:
  - Focusing on reducing computational cost for low power network-edge usage, ‘Mobile-Former’ breaks all sorts of records
  - Interesting research into reducing/simplifying inputs to neural net models looks promising … and they said feature engineering was dead ;-)
  - More progress on ‘few-shot learning’ (making accurate predictions with limited examples) – this time with ‘HyperTransformers’
  - Active Learning is an elegant approach to improving sample efficiency by focusing efforts in the most productive areas of the data space – however, watch out for outliers
- Then some more random research directions…
  - An exploration of ‘Prospective Learning’ (as opposed to retrospective learning, i.e. learning from past experience) – how do you ‘learn’ a new object and put it in the right context?
  - Transformers have been the ‘breakout hit’ of the 2020s so far – can good old ConvNets compete?
  - The AI Economist… optimal economic policy with two-level Deep Reinforcement Learning!
  - Automated model search and training (or AutoML) has become relatively commonplace and accessible in supervised learning tasks – could it work in Reinforcement Learning? Yes, but it’s hard – AutoRL
“However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL, that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games such as Go.”
- Excellent summary of progress in ML and NLP in 2021 from Sebastian Ruder – well worth a read.
- Intriguing assessment of opportunities and research direction for Geometric and Graph ML from Michael Bronstein
- As always, the industry powerhouses keep producing great work:
  - Data2Vec from Facebook looks promising as a generalised approach to self-supervised learning
  - And an excellent post from Jeff Dean, SVP of Google Research, summarising AI research themes – bit of a must read
"Over the last several decades, I've witnessed a lot of change in the fields of machine learning (ML) and computer science. Early approaches, which often fell short, eventually gave rise to modern approaches that have been very successful. Following that long-arc pattern of progress, I think we'll see a number of exciting advances over the next several years, advances that will ultimately benefit the lives of billions of people with greater impact than ever before"
Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!
- What seems like our now monthly update from ETH Zürich’s Robotic Systems Lab: this time, ‘robots learning to hike’ (cue robot-dog interaction videos…).
- In order for robots to take action, they have to understand the world around them, a far from trivial task: a couple of useful developments in this space using large language models to understand the relationship between objects and relevant actions, from MIT and also from Carnegie Mellon/Google Brain
“In an effort to solve this problem, MIT researchers have developed a model that understands the underlying relationships between objects in a scene. Their model represents individual relationships one at a time, then combines these representations to describe the overall scene. This enables the model to generate more accurate images from text descriptions, even when the scene includes several objects that are arranged in different relationships with one another.”
- More examples of robots leaving the lab … this time from Google X (how about ‘Chief Robot Officer’ for a job title….)
- And John Deere is putting self-driving tractors ‘in the fields’ so to speak
“It's a monumental shift,” says Jahmy Hindman, Deere’s chief technology officer, of the new machine, revealed at the 2022 Consumer Electronics Show in Las Vegas. “I think it's every bit as big as the transition from horse to tractor.”
- Useful review in Nature of current AI use cases in Health and Medicine experimental data points and actively guiding the most productive areas for future research.
- A number of applications have struggled to gain traction amongst healthcare professionals: an interesting analysis of why that is happening and what to do about it
- Are we all about to lose our jobs?! A new coding assistant for data science…
- Pouring a little cold water on our progress with chatbots… (our very own Martin Goodson has done the same here)
"'Is it safe to walk downstairs backwards if I close my eyes?' GPT-3: Yes, there is nothing to worry about. It’s safe because the spiral stairs curve outwards, it will make your descent uncomfortable. I asked the same question three more times and got three authoritative, confusing, and contradictory answers: GPT-3: That depends. Do you have a TV? GPT-3: No, it is not safe. GPT-3: Yes, it is safe to walk downstairs backwards if you close your eyes."
- Quick summary of how AI is increasingly used for foreign language subtitles
- Great in-depth article on how ‘AI conquered poker’ (I guess solvers are officially ‘AI’ now…)
“You’re playing a pot that’s effectively worth half a million dollars in real money,” he said afterward. “It’s just so much goddamned stress.”
- Amazon is increasingly good at getting AI into production in real world situations, focusing on the outcomes, not necessarily the underlying research or approach.
  - Firstly, they have announced how they use Deep Learning to reduce packaging waste
  - Then we have the upcoming roll-out of their first ever physical apparel store in Los Angeles, which is set to include all sorts of ML-based real-time recommendations. Our friends at ‘The Batch’ previously highlighted a number of areas of Amazon research that will likely be incorporated, including Outfit-VITON for trying clothes on virtually, Visio-linguistic attention learning for honing product search, and category-based subspace attention network for product pairing. Impressive stuff.
How does that work?
A new section on understanding different approaches and techniques
- For those with a programming background, vectorisation may come naturally, but it can be hard to think through if you are new to it … it does speed things up though, so worth digging into: good Python tutorial here.
- We are a section of the Royal Statistical Society, so it’s good to see a bit of stats once in a while: ‘Six Statistical Critiques That Don’t Quite Work’
- If you’ve not come across Streamlit, you should definitely check it out – very quick and easy way to create apps in Python.
- JAX is a relatively new but very scalable framework for numerical methods (Bayesian sampling etc) developed at Google and widely used at DeepMind – definitely worth exploring
- It’s always good to understand at a low level how different modelling approaches work. If you’re unclear on the fundamentals of neural networks, this is an excellent introductory guide from Simon Hørup Eskildsen (love that it’s called ‘Napkin Math’!)
"In this edition of Napkin Math, we'll invoke the spirit of the Napkin Math series to establish a mental model for how a neural network works by building one from scratch"
- I know, we’ve had a fair few ‘this is how Transformers work’ posts over the last few months… but they are so central to many of the image processing and NLP improvements over the last few years that checking out another good one couldn’t hurt.
"It was in the year 2017, the NLP made the key breakthrough. Google released a research paper “Attention is All you need” which introduced a concept called Attention. Attention helps us to focus only on the required features instead of focusing on all features. Attention mechanism led to the development of the Transformer and Transformer-based models.."
- Finally, variational autoencoders... unsupervised learning is an area of data science that can sometimes feel neglected, and variational autoencoders are a fantastic tool in the unsupervised learning arsenal, leveraging the power of Deep Learning.
- For anyone interested in learning more about how DeepMind does what it does, I definitely recommend Hannah Fry’s podcast – the last episode, ‘A breakthrough unfolds’, tells the story well of how they went from winning at Go to predicting protein structures…
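On the vectorisation point in the list above, here is a minimal sketch (not taken from the linked tutorial) of the same calculation written twice: once as an element-by-element Python loop and once as whole-array NumPy operations. The function names and the toy data are made up for illustration.

```python
import numpy as np


def zscore_loop(values):
    """Standardise values one element at a time with plain Python loops."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = variance ** 0.5
    return [(v - mean) / std for v in values]


def zscore_vectorised(values):
    """Same calculation expressed as operations on the whole array at once."""
    arr = np.asarray(values, dtype=float)
    # arr.std() defaults to the population standard deviation (ddof=0),
    # which matches the divide-by-n variance in the loop version.
    return (arr - arr.mean()) / arr.std()


# Illustrative data only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
loop_result = zscore_loop(data)
vec_result = zscore_vectorised(data)

# Both versions should agree up to floating point error.
print(np.allclose(loop_result, vec_result))
```

The vectorised version has no explicit loop: the subtraction and division are applied to every element by NumPy, which is where the speed-up on large arrays typically comes from.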
How to drive analytics and ML into production
- More commentary on why a data driven (rather than model driven) approach to ML problems often leads to better outcomes … the Andrew Ng philosophy!
- “Real-Time” machine learning means different things to different people: useful post talking through the definitions, challenges and some options for solving them in production from Chip Huyen
- Not sure about this one: ‘Offline Replay Experimentation’. It sounds incredibly useful (running the equivalent of A/B tests on offline data) but I need to dig more into how it works…
- Useful practical tips on dealing with data drift in production ML systems from Elena Samuylova
- Another useful tutorial on approaches to testing for ML models … to stop data drift before it occurs!
- Managing models and experiments is a whole can of worms on its own… so when DeepMind releases their framework for managing model experiments (xmanager) it’s worth playing around with!
- SQL is code too!
- Ian Ozsvald (PyData London founder) writes an excellent newsletter on all things Python which is well worth subscribing to. Recent updates include commentary on Kubernetes and TDD/linting.
- Finally a few ‘life as a data scientist’ tips and tricks:
  - First of all, a review of the job search process in AI research …
  - Fun Twitter thread on ‘what to spend your learning allowance on’
  - Thinking about joining a company? What are some of the ‘red flags’ to look out for?
  - And finishing with a thoughtful review on the first year of being a data science manager.
"I’m not a management expert, but I did try really hard during my first year managing and I’ve since spent time digesting the experience. My hope is that others will find a few of the things I learned useful when they’re at the start of their own management journey."
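On the data drift items in the list above, one simple way to put a number on drift is to compare the distribution of a feature in production against a reference sample. The sketch below uses the Population Stability Index with equal-width bins. It is an illustrative assumption rather than the method from the linked posts, and the function name, bin count and small-value floor are all hypothetical choices.

```python
import math


def population_stability_index(reference, current, n_bins=5):
    """PSI between two numeric samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins

    def bin_shares(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width) if width > 0 else 0
            # Clamp values outside the reference range into the first or last bin.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Toy data: a reference sample and the same sample shifted upwards.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(psi_same, psi_shifted)
```

Identical samples give a PSI of zero, and the value grows as the two distributions move apart. What counts as "too much" drift is a judgment call for each feature and system.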
Bigger picture ideas
Longer thought provoking reads – lean back and pour a drink!
- Good article in the Guardian digging into recent trends in scientific research where research direction is guided by machine learning models: “Are we witnessing the dawn of post-theory science?”
"Isaac Newton apocryphally discovered his second law – the one about gravity – after an apple fell on his head. Much experimentation and data analysis later, he realised there was a fundamental relationship between force, mass and acceleration. He formulated a theory to describe that relationship – one that could be expressed as an equation, F=ma – and used it to predict the behaviour of objects other than apples. His predictions turned out to be right (if not always precise enough for those who came later). Contrast how science is increasingly done today."
- Two interesting articles on what it means to ‘understand’ and whether or not our current versions of AI truly do so:
"These schemas were the subject of a competition held in 2016 in which the winning program was correct on only 58% of the sentences — hardly a better result than if it had guessed. Oren Etzioni, a leading AI researcher, quipped, 'When AI can’t determine what ‘it’ refers to in a sentence, it’s hard to believe that it will take over the world.'”
- Lex Fridman chatting with Yann LeCun – has to be worth a listen even if only to figure out what he means by ‘dark matter of intelligence’
- Ok, so this is more physics than data science, but it’s pretty cool and does highlight how the world moves in mysterious ways…. Dice Become Ordered When Stirred, Not Shaken
"Repeatedly tap on a box of marbles or sand and the pieces will pack themselves more tightly with each tap. However, the contents will only approach its maximum density after a long time and if you use a carefully crafted tapping sequence. But in new experiments with a cylinder full of dice vigorously twisted back and forth, the pieces achieved their maximum density quickly. The experiments could point to new methods to produce dense and technologically useful granular systems, even in the zero gravity environments of space missions."
Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:
- Fancy contributing to an open source project? Useful thread on Reddit highlighting a few candidates
- Definitely on point – using a GAN to create an NFT?
Although there are still some Covid restrictions in place, the UK Government has eased a number of rules: to be fair, it’s quite hard to keep track. Omicron is far from gone though…
- Official recorded Covid cases in the UK are certainly decreasing – however, with changing policy on testing, and constraints on testing capacity, it’s clear that recorded cases are not necessarily as representative of Covid infections as they once were.
- Last month, we called the ONS Coronavirus infection survey’s estimate of 1 in 25 people with Covid in England “astonishing”, given that it was 1 in 1000 back in May. Well, as the government eases restrictions, the latest ONS Coronavirus infection survey estimates 1 in 20 people in England have Covid. I guess it’s still astonishing…
- Thankfully, Covid hospitalisations do seem to be falling
Updates from Members and Contributors
- Kevin O’Brien highlights a couple of excellent events:
  - The inaugural SciMLCon (of the Scientific Machine Learning Open Source Software Community) will take place online on Wednesday 23rd March 2022. SciMLCon is focused on the development and applications of the Julia-based SciML tooling, with expansion into R and Python planned in the near future.
  - JuliaCon, which will be free and virtual, with the main conference taking place Wednesday 27th July to Friday 29th July 2022. (Julia is a high performance, high-level dynamic language designed to address the requirements of high-level numerical and scientific computing, and is becoming increasingly popular in Machine Learning, IoT, Robotics, Energy Trading and Data Science)
- Harald Carlens launched a very useful Discord server to help facilitate easier matchmaking for teams in the competitive ML community spanning across Kaggle and other platforms (AIcrowd/Zindi/DrivenData/etc), to go along with the mlcontests.com website. There are over 250 people on the server already and the audience is growing daily. More info here
- Prithwis De contributed as chair at the 6th International Conference on Data Management, Analytics & Innovation, held during January 14-16, 2022.
- Sarah Parker calls out the work of Professor Simon Maskell (Professor of Autonomous Systems and Director of the EPSRC Centre for Doctoral Training in Distributed Algorithms at the University of Liverpool), who has developed a Bayesian model used by the UK Government to estimate the UK’s R number – the reproduction number – of COVID-19. More info here.
A new section highlighting relevant job openings across the Data Science and AI community (let us know if you have anything you’d like to post here…)
- Holisticai, a startup focused on providing insight, assessment and mitigation of AI risk, has a number of relevant AI-related job openings – see here for more details
- EvolutionAI are looking for a machine learning research engineer to develop their award-winning AI-powered data extraction platform, putting state-of-the-art deep learning technology into production use. Strong background in machine learning and statistics required.
- AstraZeneca are looking for a Data Science Training Developer – more details here
- Cazoo is looking for an experienced Principal Data Scientist to lead technical development of a wide range of ML projects – more details here (I’m biased… but this is an amazing job for the right person 😉 )
Again, hope you found this useful. Please do send on to your friends – we are looking to build a strong community of data science practitioners – and sign up for future updates here.
The views expressed are our own and do not necessarily represent those of the RSS