The UK AI Roadmap: your expert views needed

This year will see the first version of an AI Strategy from the UK government. Led by the Office for AI, this strategy will build on the AI Roadmap (which was published in January 2021).

If you work in data science or AI, the AI strategy will affect your career.

The Data Science Section of the Royal Statistical Society will ensure the voice of technical practitioners is heard – and that decisions are made with your interests in mind.

However, we cannot do this without your help. Please fill in the UK Artificial Intelligence Strategy Survey and give us your expert views. It will take less than 5 minutes to complete.

This is a great opportunity. The government is attempting to embrace data science and AI. Help us make sure the strategy focuses on the areas that will really make a difference.

Many thanks!

RSS plans for engaging with the development of the government’s AI strategy

The government’s AI Roadmap, published at the start of 2021, sets the direction for the development of a national AI strategy.
As the roadmap is developed into a strategy, there is a vital role for the RSS to play in setting out the role that statistics and data science have to play in the wider national strategy. There are two senses in which this perspective is important:

  • The RSS as a membership organisation has access to the experience and expertise of over a thousand professional data scientists who, as practitioners, focus on the types of issues which are central to the roadmap – there is an opportunity to strengthen the strategy by ensuring that these experiences are represented in the development of the strategy.
  • As they stand, the government’s plans do not show an appreciation of the role that statistics will have to play in the strategy. The disciplines of statistics and data science are closely related, and the role of statistics – as well as data science – should be reflected in the AI strategy.

The RSS – led by our Data Science Section – is kicking off a programme of work to shape the AI strategy and ensure that both the discipline of statistics and the experience of working data scientists are reflected in the strategy.

To start this work, we are planning to highlight a number of questions concerning the practice of data science in order to further inform the roadmap. We will be launching a survey soon to help gather intelligence from the community to support this work.

We are also planning a series of events and roundtables to discuss these issues. These events will help us share knowledge and refine our thinking, as well as engage directly with government stakeholders.

This is an important point in the development of AI as countries seek to position themselves as leaders in the field. The UK is well-positioned to lead on many areas of AI – but the strategy must be right and we hope to be able to help shape the strategy in the coming months.

RSS Chief Executive Stian Westlake said:

“The RSS welcomes the development of a national AI strategy, but it is important that the views of practitioners are represented in the process. With our strong data science section, the RSS is uniquely placed to access the perspective of practitioners and there is a vital role for us to play in ensuring that this is represented as the strategy develops.”

June Newsletter

Hi everyone-

It’s a bank holiday weekend – again – so that means it’s June and hopefully some warmer weather as May has definitely not delivered on that front … perhaps a few curated data science reading materials might prove useful for sunshine in the garden?

Following is the June edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science June 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

We are now ‘two for two’ on our ‘Fireside chat’ series! Following on from our fantastic discussion with Andrew Ng, Giles Pavey hosted an engaging and enlightening conversation with Anthony Goldbloom on May 20th. Anthony is founder and CEO of Kaggle (now a Google company), the world’s largest data science and machine learning community. There was a great deal of insight into the evolution of data science over the 10 years Kaggle has been running, as well as lots of audience questions. We will distill the session down and publish a summary shortly.

We will soon be releasing a survey to our readers and members focused on the UK Government’s proposed AI Strategy. We are passionate about making sure the government focuses on the right things in this area, and feel like true Data Science and AI practitioners need to feed into this process. So when you see the survey, do please take the time to fill it out if you can!

The full programme for this year’s RSS Conference, which takes place in Manchester from 6-9 September, has been confirmed. The programme includes keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers. Registration is open with early-bird discounts available until Friday 4 June.
In addition, the RSS now has a new accreditation – Data Analyst.

Data Analyst is a registered form of professional membership status that provides formal recognition of a member’s statistical training and work-based experience at entry level.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. The last event was on 24th May where Christian Szegedy, machine learning and AI researcher at Google Research, gave a talk titled ‘The Inverse Mindset of Machine Learning’. Videos are posted on the meetup YouTube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

The real danger wasn’t “Deep Fakes.” The real danger is cheap fakes, fakes that can be produced quickly, easily, in bulk, and at virtually no cost
  • Regulators are rightly becoming increasingly active in an attempt to combat these issues. This HBR article helps map out what organisations need to know to be prepared.
  • We all know how complex ML models are becoming and the scale at which some of them now operate, and so we have to be open to the fact that mistakes will happen. The critical question becomes: what do you do about it when the issue surfaces? Twitter has taken a positive and transparent approach to dealing with some of their previous bias related issues in automated cropping, releasing a detailed and technical analysis about why it was happening and the steps they are taking to remove the bias:
We want to thank you for sharing your open feedback and criticism of this algorithm with us. As we discussed in our recent blog post about our Responsible ML initiatives, Twitter is committed to providing more transparency around the ways we’re investigating and investing in understanding the potential harms that result from the use of algorithmic decision systems like ML.
  • Really interesting discussion on Kara Swisher’s Sway podcast with Daniel Kahneman (renowned behavioural economist – “Thinking, Fast and Slow”) delving into why we require much higher accuracy from computers and technology than from humans before we are willing to trust them.
  • And in a similar vein, this is thought provoking– does more data necessarily mean better decision making?
  • Less specifically focused on bias and ethics, but really interesting commentary from Benedict Evans on Amazon and how much it really knows about what it sells, touching on how much of a responsibility a platform has for moderation of its own recommendation content.
Of Amazon’s top 50 best-sellers in “Children's Vaccination & Immunisation”, close to 20 are by anti-vaccine polemicists, and 5 are novels about fictional pandemics

Developments in Data Science…
As always, lots of new developments…

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

How does that work?
A new section on understanding different approaches and techniques

Getting it live
How to drive ML into production

"For me, teaching this course was an unusual experience. MLOps standards and tools are still evolving, so it was exciting to survey the field and try to convey to you the cutting edge. I hope you will find it equally exciting to learn about this frontier of ML development, and that the skills you gain from this will help you build and deploy valuable ML systems." Andrew Ng

The Art of Visualisation
Making data science look right..

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

Covid Corner

Again, more positive progress in the UK on the Covid front with over 40m people now having received their first vaccine dose and over 25m fully vaccinated. However, the new variant originating in India is cause for concern.

 Experts gave a median estimate of 30,000 Covid deaths by the end of the year, whereas the non-experts said 20,000. The truth was around 75,000

Updates from Members and Contributors

  • Harald Carlens has put together a very useful comparison of cloud GPU services and pricing – definitely check it out if you are using deep learning in the cloud.
  • Lucie Burgess would like to announce an interesting set of discussions around the provenance and legality of automated decisions taking place on June 15th and June 22nd. Helix Data Innovation are running the sessions on behalf of the PLEAD project (King’s College London, University of Southampton, with partners Experian, Roke and Southampton Connect) – sign up here for what should be a good discussion on a very relevant topic
  • Kevin O’Brien highlights the upcoming UseR! 2021 conference on 5-9th of July – a must see for those R users out there

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

May Newsletter

Hi everyone-

It’s a bank holiday weekend, so it’s probably May and another month has flown by… I hope the excitement of venturing out from our cave-like lockdown has not proved too overwhelming … perhaps a few curated data science reading materials might prove relaxing?

Following is the May edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … A particularly strong section from Members and Contributors this month- good reason to read to the end! Also we are moving Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science May 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Fresh on the heels of our incredibly successful event with Andrew Ng, we are excited to announce the next instalment in the series at 6.30pm on Thursday May 20th. The RSS Data Science section invites you to a fireside conversation with Anthony Goldbloom – founder and CEO of Kaggle (now a Google company), the world’s largest data science and machine learning community with over 6MM members. Hear Anthony share his thoughts and experiences from the past 10 years at the forefront of competitive Machine Learning – sign up here to attend.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. The next event is on 10th May where Noam Brown, research scientist at Facebook AI in New York, will give a talk titled ‘AI for Imperfect-Information Games: Poker and Beyond’. Videos are posted on the meetup YouTube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"This is like trying to write a single law covering 'cars', that covers drunk driving, emissions standards, parking, and the tax treatment of highways.."
  • In addition, it is very hard to regulate ‘AI’ when it is far from clear we have a good definition of what ‘AI’ actually is, as our very own Martin Goodson points out in his recent blog post.
"The Act has already caused dismay amongst statisticians, who had no idea they were actually doing AI all along."
"accountability (n) - The act of holding someone else responsible for the consequences when your AI system fails."

Developments in Data Science…
As always, lots of new developments…

Real world applications of Data Science
Making a difference in the real world

Practical pointers on recommenders and search
Lots of good tips on search and recommendations this month

How does that work?
A new section on understanding different approaches and techniques

It’s all about the data
Which is more important… the data or the algorithm?

"Having clean data is in this category of “ghost knowledge” that, if you’ve been working in data for a long time, you know painfully from your own experience."
"Systematic improvement of data quality on a basic model is better than chasing the state-of-the-art models with low-quality data."

The Art of Visualisation
Making data science look right..

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Covid Corner

Again, more positive progress in the UK on the Covid front with over 35m people now having received their first vaccine dose and other metrics, such as deaths and hospitalisations, all progressing in the right direction.

  • The latest ONS Coronavirus infection survey estimates the current prevalence of Covid in the community in England to be roughly 1 in 1000 so we have come a considerable way from January, when prevalence peaked at around 1 in 50. It is interesting to note though that we are not yet back to where we were last summer, when it dropped to 1 in 2000.
  • Some very positive results regarding the efficacy of the various vaccines ‘in the wild’ against the current variants have been recently published in the BMJ.
"Vaccination with a single dose of Oxford-AstraZeneca or Pfizer-BioNTech vaccines, [] significantly reduced new SARS-CoV-2 infections in this large community surveillance study"
 “The new vaccine can be mass-produced in chicken eggs — the same eggs that produce billions of influenza vaccines every year in factories around the world”

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

Anthony Goldbloom Fireside Chat – Sign Up Now!

We are delighted to announce the second instalment of our ‘Fireside Chat’ series.

Following fresh on the heels of our excellent conversation with Andrew Ng, Giles Pavey will be chatting by the virtual fireside with Anthony Goldbloom, the founder of Kaggle (now a Google company) – the world’s most influential competitive data science platform.

The event will take place at 6.30pm on Thursday May 20th – sign up here to attend.

Forbes has twice named Anthony one of the 30 under 30 in technology, the MIT Technology Review has named him as one of the 35 Innovators Under 35 and the University of Melbourne has given Anthony an Alumni of Distinction Award.

Join us to hear Anthony’s reflections on 10 years at the heart of applied AI, the wisdom of Kaggle’s global crowd of 7 million members and what he believes this new decade has in store for Data Science.

Don’t miss out on what we are sure will be a compelling discussion- sign up for the event here, and send any topics or questions you would like to see covered to Giles Pavey.

April Newsletter

Hi everyone-

Another month flies by… still cold, but I’ve definitely seen the sun once or twice… I hope the on-again off-again dreams of a proper summer holiday aren’t proving too painful … perhaps a few curated data science reading materials might ease the burden over the Easter weekend?

Following is the April edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science April 2021 Newsletter

RSS Data Science Section

Covid Corner

It definitely feels like progress, at least in the UK, on the Covid front, with over 30m people now having received their first vaccine dose. Supply issues notwithstanding, it is clear that the vaccine roll-out is progressing very well.

  • It is now over a year since the UK first went into lockdown to attempt to restrict the spread of the virus. It’s interesting to reflect on how much data and statistics have become part of general public discussion: we still have daily updates of a number of different metrics on the news and published in papers. ‘More or Less’ has a nice summary of the UK’s efforts to collate and disseminate the figures and how the centralised healthcare setup contrasts favourably with the US, which required volunteers to generate national figures in the Covid Tracking Project.
  • Despite (or perhaps because of) the proliferation of data, the statistics have been made to argue many sides of the same case as highlighted in this research from MIT, stressing the importance of good visualisations.
  • It has been quite a month for AstraZeneca …
 “Overall it’s a win for the world”

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Our first ‘Ethics Happy Hour’ on March 17th was very well received – see the write up here. The video recording will shortly be posted on YouTube and we will publish links to it when it is available. Please let us know if you have any comments or would like to suggest topics for future events via email to dss.ethics@gmail.com.

Fresh on the heels of our incredibly successful event with Andrew Ng, we are excited to announce the next instalment in the series. The RSS Data Science section invites you to a fireside conversation with Anthony Goldbloom – founder and CEO of Kaggle (now a Google company), the world’s largest data science and machine learning community with over 6MM members. Forbes has twice named Anthony one of the 30 under 30 in technology, the MIT Technology Review has named him as one of the 35 Innovators Under 35 and the University of Melbourne has given Anthony an Alumni of Distinction Award. Hear Anthony share his thoughts and experiences from the past 10 years at the forefront of competitive Machine Learning. Watch this space for more details!

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 7th April where Mike Lewis, research scientist at Facebook AI Research in Seattle, will give a talk titled ‘Beyond BERT: Representation Learning for Natural Language at Scale’. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"I will have a lot more to say about this later. But announcing a new org by a Black woman as if we’re all interchangeable while harassing, terrorizing and gaslighting my team and doing absolutely ZERO to acknowledge & redress the harm that’s been done is beyond gaslighting."
"Everything the company does and chooses not to do flows from a single motivation: Zuckerberg’s relentless desire for growth."

Developments in Data Science…
As always, lots of new developments…

The Practical side … getting stuff to work in production

"When a system isn’t performing well, many teams instinctually try to improve the Code. But for many practical applications, it’s more effective instead to focus on improving the Data."
"It’s a common joke that 80 percent of machine learning is actually data cleaning, as though that were a lesser task. My view is that if 80 percent of our work is data preparation, then ensuring data quality is the important work of a machine learning team."

How does that work?
A new section on understanding different approaches and techniques

Thinking about intelligence and bigger picture stuff
Stepping back from the code for a bit…

  • Thought provoking article proposing that “Computers will never write good novels” – definitely worth thinking through how much of this you agree with
"The best that computers can do is spit out word soups. They leave our neurons unmoved."
"Employees are far happier when they are led by people with deep expertise in the core activity of the business."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Marco Gorelli is running an excellent workshop on 10th April about contributing to Pandas. The workshop is being run in collaboration with PyLadies and is specifically targeting people from underrepresented genders in tech. Sign up for the morning session or the afternoon session.
  • Emre Kasim is running the brilliant Algo Conference which this year is taking place online on April 29th with a number of very relevant streams, including ‘Foundational AI’, ‘AI and Innovation’ and ‘Implications of AI and other Disruptive Technologies’ – well worth signing up for here.
  • Alex Spanos highlights the upcoming Data Science Festival which in April is focused on Fintech- check out his talk on Data Science/Machine Learning and Open Banking APIs on April 15th.
  • Vijay Kumar Mishra, Research Scientist at Public Health Foundation of India, is running a 5-day online international workshop on “Designing and Conducting Clinical Trials” from the 3rd to the 7th of May. The workshop will be jointly conducted by Public Health Foundation of India, Sitaram Bhartia Institute of Science and Research, Paropakar Maternity and Women Hospital and University College London and will be aimed at providing a theoretical understanding of designing and conducting clinical trials. Contact Vijay (vijay.mishra@phfi.org) for more details.
  • Harin Sellahewa draws our attention to the 35 of 70 masters students entering their final assessment for the University of Buckingham MSc in Applied Data Science- best of luck to everyone!

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

The inaugural RSS Data Science Ethics Happy Hour

Event report by Giles Pavey, RSS Data Science Section Committee member

Wednesday March 17th saw the RSS Data Science Section host its first ‘Ethics Happy Hour’. Events in this new series provide an opportunity to discuss and meet other people interested in questions of AI ethics and data science ethics more broadly. Taking place in a relaxed and informal setting, our aim for these sessions is to stimulate intellectual exchange and contribute to community building around ethics in the context of data science.

The inaugural event took place virtually and focused on COVID-19. The discussion took the form of a panel chaired by RSS Data Science Section Committee member Dr Florian Ostmann with three experts sharing their thoughts on the ethics of data science in addressing the public health crisis:

  • Dr Zachary Lipton (Carnegie Mellon University) is a machine learning researcher and jazz saxophonist. He is currently an Assistant Professor of Operations Research and Machine Learning at Carnegie Mellon University, where he runs the Approximately Correct Machine Intelligence lab.
  • Dr Anjali Mazumder (RSS Data Science Section Committee / The Alan Turing Institute) is the Theme Lead on AI and Justice & Human Rights at the Alan Turing Institute and also a member of the Data Science Section Committee, among other RSS roles. Her research interests include Bayesian decision support systems, causal reasoning, detecting bias and algorithmic fairness, and responsible data sharing practices.
  • Dr Nicola Stingelin (RSS Data Ethics and Governance Section Committee) is a member of the RSS Data Ethics and Governance Section Committee and an associate researcher at the University of Basel. Building on business experience in the pharmaceutical sector, she acts in various advisory roles with a focus on the ethics of data innovation including big data, algorithms and public health data ethics in health care research and practice.

The event attracted around 35 attendees from across academia, business, and the public and charitable sectors. After the introduction there was a lively debate covering multiple aspects of the use of data and AI around the world in response to the pandemic and its ramifications.

Access to data was a major discussion point, with Nicola arguing that the importance of data in creating competitive advantage, especially in commerce, was causing obstacles to data sharing which could stall progress.

Zach conjectured that we should not rely on technical solutions to what might be considered primarily societal and philosophical problems such as access to data or AI resource. In answer to this point, Anjali suggested that there are at least some relevant problems where technical solutions can help. For instance, Privacy Enhancing Technologies, such as Differential Privacy, can enable insights whilst protecting individuals’ privacy.

Another area of debate concerned differences between what we should expect from data and AI when thinking about micro (individual) level predictions and classifications versus more high/macro-level decision making. For example, is it acceptable to use thermal imaging AI models in public spaces for social control?

The discussion also touched on areas such as whether society is becoming too reliant on data and whether this is creating a digital divide; how we feel about a society where one has to have a smartphone to access services; and how things will develop as the populace is asked to share more and more data. Lastly, on the subject of speed of change: COVID-19 has highlighted tensions between the desire and need to move at a quick pace and the academic norm of considered peer review.

The event was drawn to a close (having overrun by 15 minutes) with a general agreement that the ethical issues are many and complex and that data science and statistical methods will offer key tools for us to navigate the future.

Please contact the RSS Data Science Section if you have any comments or would like to suggest topics for future events via email to dss.ethics@gmail.com. To stay informed about future ethics happy hours and other events organised by the Data Science Section, we recommend signing up to the Data Science Section mailing list.

Our First Ethics Happy Hour: March 17th, 5-6pm

Ethics Happy Hour on March 17th:
“Ethical data science in the context of COVID-19 — What are the most important issues?”

We are excited to host our first ‘Ethics Happy Hour’. Events in this new series provide an opportunity to discuss and meet other people interested in questions of AI ethics and data science ethics more broadly. Taking place in a relaxed and informal setting, our aim for these sessions is to stimulate intellectual exchange and contribute to community building around ethics in the context of data science. 

Our first event will take place virtually and focus on COVID-19. We are delighted that the following three experts have agreed to share their thoughts on the ethics of data science in addressing the public health crisis:

The hour-long session will take place on March 17th from 5 to 6pm. It will begin with each expert offering an initial take on the topic, drawing on their different areas of experience. This will be followed by an open discussion with the opportunity for all participants to share questions, comments, and contributions. 

To sign up for the event, please register here. 

As previously announced, the organising team is always looking for submissions of data science-related ethical challenges and dilemmas that community members have encountered in their professional lives and that are suitable to serve as case studies to be discussed in future sessions. If you have a suitable story to share, we look forward to hearing from you with an initial brief summary sent to dss.ethics@gmail.com 

March Newsletter

Hi everyone-

Another month flies by… At least it’s getting a bit lighter in the mornings and I’ve even seen the sun once or twice. I hope you are staying as sane as possible despite home-schooling, home-working, home-everything else (delete as appropriate…) … perhaps a few curated data science reading materials might lighten the mood?

Following is the March edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science March 2021 Newsletter

RSS Data Science Section

Covid Corner

The vaccination roll-out in the UK continues to progress well with now over 20m first doses delivered, and we even have a road-map out of lockdown… perhaps some light at the end of the tunnel.

"Both of these are working spectacularly well"
  • In further positive vaccine news, a recent FDA review showed that the new Johnson and Johnson ‘one-shot’ vaccine appeared safe and effective in trials; and we also saw the first shipment of the AstraZeneca vaccine as part of the COVAX program, delivered to Ghana.
  • As we all know, the pandemic has thrown up a wide variety of new terms, metrics and statistics that can be easily misinterpreted or misunderstood – the RSS has published an excellent FAQ on Covid-19 measures and statistics which is well worth circulating.
  • The UK government has charted a cautious route out of lockdown. In sobering reading, this cautiousness was apparently linked to research commissioned from the teams at Imperial and Warwick University by the modelling group (SPI-M) in SAGE.
    • These models have proved surprisingly accurate, at least in terms of predicting the surge in cases over the winter.
    • This time both teams were asked to independently model the effect of different lockdown exit strategies and both reached similar conclusions- that lifting all restrictions by April 26th would likely drive another wave comparable in size to January 2021, resulting in a further 62,000 to 107,000 deaths in England.
  • The NHS Test and Trace App did not have the most auspicious beginnings, but recent research from the Alan Turing Institute indicates that it has indeed had a positive effect in reducing the impact of Covid.
  • The virus does seem to be in retreat in a number of countries around the world. The recent decrease in positive cases in the US is puzzling researchers somewhat (also covered by More or Less)- decreased testing? improved behaviour? vaccination roll-out? seasonality? herd immunity? … the upshot seems to be, a little bit of everything and we don’t really know.
  • Although the retreat is great news, the results in the US and elsewhere have been devastating and disproportionately felt. This recent study published in PNAS shows how life expectancy in the US has fallen by 1.13 years due to Covid, with “estimated reductions for the Black and Latino populations 3 to 4 times that for Whites”.
  • Finally a thoughtful piece from the Ada Lovelace Institute about vaccination passports and what role they could or should play in society.

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Our Fireside Chat with Andrew Ng on February 10th was a roaring success. We had over 500 people attend what proved to be an entertaining and thought-provoking discussion on technical leadership in AI, artfully hosted by our chairman Martin Goodson, and introduced by the RSS President Sylvia Richardson. For those who missed it, here are the 5-minute edited highlights (and below if you are viewing on the blog) – check out the full video here

We are excited to host our first ‘Ethics Happy Hour’, which will take place on March 17th from 5 to 6pm. As previously announced, events in this new series provide an opportunity to discuss and meet other people interested in questions of AI ethics and data science ethics more broadly. The first event will take place virtually and focus on COVID-19. We are delighted that the following three experts have agreed to share their thoughts on the ethics of data science in addressing the public health crisis:

  • Dr Zachary Lipton (Carnegie Mellon University)
  • Dr Anjali Mazumder (RSS Data Science Section Committee / The Alan Turing Institute)
  • Dr Nicola Stingelin (RSS Data Ethics and Governance Section Committee)

The event will begin with each expert offering an initial take on the topic, drawing on their different areas of experience. This will be followed by an open discussion with the opportunity for all participants to share questions, comments, and contributions. To sign up for the event, please register here.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon; we will keep you posted.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 8th March, where Mingxing Tan, a research scientist at Google Brain, will talk about AutoML for Efficient Vision Learning. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"We found that falsehood diffuses significantly farther, faster, deeper, and more broadly than the truth, in all categories of information, and in many cases by an order of magnitude... False news is more novel, and people are more likely to share novel information"
  • In addition they talk about what could be done to limit the power and reach of incumbent social networks
    • Portability and interoperability – as happened with mobile phone numbers, and instant messenger apps – is much more likely to succeed than splitting up the leading players, since the network effects naturally lead to another dominant player taking over.
  • Clearly, flagging or removing false information and inflammatory posts would be beneficial all round, but automating and scaling this process is very difficult, as this article about how ads for clothing for people with disabilities have been repeatedly banned highlights.

Developments in Data Science…
As always, lots of new developments…

“By having the human iteratively teach the model, it's possible to make a better model, in less time, with much less labelled data.”

The Art of Visualisation…

How does that work?
A new section on understanding different approaches and techniques

Thinking about intelligence…
How does the brain really work, how should we think about AI morality…

"Imagine it’s 2026. An autonomous public robocar is driving your two children to school, unsupervised by a human. Suddenly, three unfamiliar kids appear on the street ahead – and the pavement is too slick to stop in time. The only way to avoid killing the three kids on the street is to swerve into a flooded ditch, where your two children will almost certainly drown."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

February Newsletter

Hi everyone-

Well… January seemed to fly by. 2021 has certainly started with a bang (Brexit!, Impeachment!, New President!, Vaccinations!) and the holidays seem an age ago. I hope you are surviving lockdown 3.0 as best you can… maybe there is room in the long dark evenings for a few curated data science reading materials?

Following is the February edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science February 2021 Newsletter

RSS Data Science Section

Covid Corner

I keep thinking we might be able to drop the ‘Covid Corner’ section from the newsletter, but sadly the pandemic is still very much alive. The vaccination roll-out in the UK does seem to be going well, however, with over 9m first dose vaccinations made (as of Feb 1st) which is great news.

"one side claims that the tests are more than 90% effective at what they do; the other side says they could be as low as 3%, depending on what you mean by “effective”."
  • Finally, this feels like a very exciting development. The recent breakthroughs in natural language processing (NLP) and language models (like BERT and GPT-2/3) are at heart based on understanding the likelihood of different sequences of letters and words, codified into word embeddings (vector representations). Applying this approach to other fields (remember chess?) feels very elegant, and the MIT researchers in this case have used the underlying gene sequences (‘letters’) of viruses to train their model. From this they are able to predict likely virus mutations using sequence data alone:
"The model achieved 0.85 AUC in predicting SARS-CoV-2 variants that were highly infectious and capable of evading antibodies."

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

There is still time to register for our upcoming fireside chat with none other than Andrew Ng on February 10th. We are very excited for what is going to be a fantastic event: don’t miss out, sign up here.

As we previously announced we are looking forward to our first AI Ethics Happy Hour event – details to follow.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon; we will keep you posted.

Giles Pavey has been discussing what it takes to build world class data science teams.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 11th February, where Manzil Zaheer, a research scientist at Google, will talk about Big Bird: Transformers for Longer Sequences. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Finally, we are really pleased to include a call for contributions to the RSS 2021 Conference, 6-9 September in Manchester. The organisers are seeking submissions for contributed talks, which can be on any topic related to statistics and data science (deadline April 6th).

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Big Government and AI
Governments around the world mapping out grand AI plans…

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

AI in Healthcare
Increasing utilisation of AI and machine learning in healthcare…

  • Exciting announcement from the Korea Institute of Science and Technology who have developed a prostate cancer urine screening test using machine learning.
  • Interesting comment published in Nature discussing how recent applications of AI to ageing research are leading to the emergence of the field of longevity medicine.
  • We have seen a number of studies in recent times highlighting the power of deep learning techniques in medical imaging and the automatic assessment of resulting scans- this review article in Nature assesses the overall gains over the last decade.
  • As the previous article alludes to, going from prototype to real world production in a healthcare setting is far from simple, and this article from Rachel Thomas of fast.ai highlights some of the underlying issues.
  • Interestingly, the FDA in the US has released an action plan focused on methods for approving AI and machine learning based applications in health care.

Developments in Data Science…
As always, lots of new developments…

  • Fresh on the heels of GPT-3, OpenAI have released an amazing application, called DALL-E (Salvador Dali crossed with Pixar’s WALL-E…), a 12 billion parameter version of GPT-3 trained to generate images from text descriptions. You have to try this… Good summary here from MIT Technology Review.
“In the long run, you’re going to have models which understand both text and images. AI will be able to understand language better because it can see what words and sentences mean.”
  • Not to be outdone on the ‘my model has more parameters than your model’ stakes, Google recently announced their Switch Transformer Language Model with 1.6 trillion parameters.
  • Great summary, from Jeff Dean, head of Google AI, of Google’s research output in 2020 (over 800 publications) and what lies ahead for 2021. This is long, but well worth a read as it highlights the amazing breadth and depth of the output from the Google researchers.
"I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples"

How does that work?
A new section on understanding different approaches and techniques

Teams, people and production…
Still one of the biggest obstacles…

  • Interesting commentary from Gergely Orosz on the approach to motivating and empowering software engineers in Silicon Valley, very relevant also for Data Scientists and ML engineers.
  • What skills do you really need in your data team? Is it all about the models, or do you need more breadth, on both the business and the engineering side?
  • How do you scale a team at different stages of development? Useful advice here from Peter Gao.
  • If you want to put in place proper monitoring of your ML systems but aren’t quite ready for a full blown MLOps solution, how about giving this a try, from Jeremy Jordan?
  • A pretty bland ‘top x trends in data’ title, but some useful pointers on best practices in building out a modern data stack.

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Adriano Soares Koshiyama highlights what looks like an excellent upcoming UCL webinar on AI in the Judicial System on Feb 25 at 1pm: “In this webinar we welcome Dr Pamela Ugwudike (University of Southampton, Alan Turing Institute) and Charles Kerrigan (CMS partner and global head of Fintech) to present their perspectives from academia and industry”. Register here.
  • Rafael Garcia-Navarro has been doing some impressive work in conversational AI, implementing on top of Metaflow (Netflix’s MLOps framework) – definitely worth a read.
  • Kevin O’Brien draws our attention to a great write-up on the Climate Modeling Alliance (CliMA) project and how they use Julia (“Meet the team shaking up climate models”). Also, don’t forget JuliaCon 2021 Wednesday 28th July to Friday 30th July 2021.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

Andrew Ng Fireside Chat – Sign Up Now!

We are incredibly honoured and excited to be hosting Andrew Ng for a fireside chat on 10th February at 6.30pm (Sign up here)

Many data scientists’ first encounter with Andrew Ng was through his Stanford University machine learning course – which has enrolled almost 4m people! However, some may be unaware that his contribution to AI, machine learning and data science goes much further.

In addition to his Professorship at Stanford University and co-founding of Coursera, he has been one of the most important drivers of the success of Deep Learning over the last decade. He founded and led the “Google Brain” project, which developed massive-scale deep learning algorithms, before moving on to lead Baidu’s 1,300-person AI Group, developing technologies in deep learning, speech, computer vision, NLP, and other areas.

More recently he has set up a number of initiatives including DeepLearning.AI, Landing.AI and the AI Fund, focused on promoting the practical use of AI to solve real world problems.

Our fireside chat will focus on technical leadership in artificial intelligence. We’ll be asking Andrew’s advice on:

  • How technical people can become effective AI leaders or entrepreneurs.
  • How to run a successful R&D team for AI product development.
  • How the UK can support a new generation of AI leaders.

The discussion will be hosted by Martin Goodson, Chair of the Data Science Section and CEO of artificial intelligence startup, Evolution AI. The event will be opened by the President of the Royal Statistical Society, Sylvia Richardson.

Don’t miss out on what we are sure will be a compelling discussion- sign up for the event here, and send any topics or questions you would like to see covered to Martin Goodson.

January Newsletter

Hi everyone-

Happy New Year! I hope you have all had a festive holiday period and found some time to catch up on those deep learning research papers you had been meaning to dig into… Fingers crossed 2021 proves better than 2020…. as a start, how about welcoming in the new year with a few curated data science reading materials!

Following is the January edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science January 2021 Newsletter

RSS Data Science Section

Covid Corner

A new year but sadly not much change in the story – however with vaccinations now actively happening, an end does seem in sight, even if it seems tantalisingly far away.

  • A new strain of COVID-19 materialised in south east England. Although virus mutations happen all the time, this one was important as the strain appears significantly more transmissible. Its prevalence in positive tests appears strongly linked to dramatic rises in new cases.
  • An Imperial study modelling the case rates concludes that the new strain “has a transmission advantage of 0.4 to 0.7 in reproduction number compared to the previously observed strain.”
  • This report also highlights how statisticians and data scientists still need to work on the art of communication….
"Using whole genome prevalence of different genetic variants through time and phylodynamic modelling (dynamics of epidemiological and evolutionary processes), researchers show that this variant is growing rapidly."

Yes, quite…

  • And in case you missed it, the UK government managed to lose case data again… Clearly the lessons from last time have not been learned.
  • There has however been fantastic news on the vaccine front with vaccinations now rolling out around the world. As discussed in our previous newsletter, the mRNA approach used in the Moderna and BioNTech vaccines is a huge breakthrough- there is an excellent interview on the Andreessen Horowitz a16z podcast with Stéphane Bancel, the Moderna CEO, where he goes through the development process in detail, including how they generated the vaccine blueprint within 48 hours of receiving the virus’s genetic sequence.
We used to grow our vaccines, now we can “print” them.

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

As we announced just before Christmas, we are all incredibly excited about our upcoming fireside chat with none other than Andrew Ng on February 10th – save the date! We want to make the discussion as relevant to our community as possible, so do please send any topics or questions on becoming an AI technical leader to Martin (@martingoodson).

As we previously announced we are looking forward to our first AI Ethics Happy Hour event – details to follow.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon; we will keep you posted.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 13th January, where Jakob Foerster from Facebook AI will discuss Zero-Shot (Human-AI) Co-ordination. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

Real world data science applications …
All sorts of great applications of data science and machine learning, regularly coming to light.

  • AirBnB have released an elegant new approach to dealing with positional bias in search rankings. If you are learning preferences from historical data, how do you deal with the fact that actions (clicks, likes etc) will be influenced by the position rank of the given item?
This creates a feedback loop, where listings ranked highly by previous models continue to maintain higher positions in the future, even when they could be misaligned with guest preferences.

Developments in Data Science…
As always, lots of new developments…

"In short: this module is a neural network that iteratively refines the structure predictions while respecting and leveraging an important symmetry of the problem, namely that of roto translations."
"The results of DeepMind's work are quite astounding and I marvel at what they are going to be able to achieve in the future given the resources they have available to them"

Getting AI into production…
Still one of the biggest obstacles…

"While building good models is important, many organizations now realize that much more needs to be done to put them into practical use, from data management to deployment and monitoring. In 2021, I hope we will get much better at understanding the full cycle of machine learning projects, at building MLOps tools to support this work, and at systematically building, productionizing, and maintaining AI models."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Mani Sarkar has been busy updating his NLP Profiler Python library; he has a useful notebook working through the different features here.
  • Kevin O’Brien draws our attention to JuliaCon 2021, which will be free and virtual with the main conference taking place Wednesday 28th July to Friday 30th July 2021 (workshops will be held the week before). Julia is a high performance dynamic language designed to address the requirements of high-level numerical and scientific computing, and is becoming increasingly popular in Machine Learning and Data Science. Stay up to date on further announcements by joining the JuliaCon 2021 event page on LinkedIn.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

Andrew Ng at the RSS Data Science Section!

We are going to have the great honour of hosting a fireside chat with Andrew Ng in February. The Data Science Section of the Royal Statistical Society have invited Andrew to come and talk to us about how technical people can become leaders in artificial intelligence and data science.

Andrew needs little introduction to the world of Machine Learning and AI. He is a successful scientist, inventor and writer, and a huge contributor to a field for which we all share an enthusiasm.

Our topic of discussion will be the art and science of creating successful R&D teams which are able to deliver business value consistently. We would like to invite some guest questions. Let us know what you’d like to ask Andrew Ng. For example:
– How do you structure an effective R&D team?
– How do you decide what’s important to research?
– How can the UK government support its technical data scientists?

I’m sure you have more ideas!

Please sign up to our mailing list to find out more: 

December Newsletter

Hi everyone-

It’s hard to believe it’s December… in some respects the year has felt incredibly slow as we have watched the pandemic run its inexorable course; and then in other ways it feels like a blur of zoom calls and box sets that has gone by in a flash. Let’s hope 2021 proves better… for the time being, how about hunkering down with some ‘home-improvement’ via a few curated data science reading materials!

Following is the December edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:

Industrial Strength Data Science December 2020 Newsletter

RSS Data Science Section

Covid Corner

As the virus continues to spread, how the holiday period will affect infection rates, and how quickly vaccines can be distributed, are some of the hot topics. As always numbers, statistics and models are front and centre in all sorts of ways.

  • The dramatic rise in positive cases and hospitalisations led to a national lockdown in the UK at the end of November which is set to be lifted at the beginning of December. Although positive case rates do seem to have slowed somewhat, there are still over 16,000 people hospitalised with Covid, not far off the peak of 20,000 in April, so the risk of increased fatalities is still high, particularly with the loosening of restrictions set to take place over Christmas.
  • For those planning a get together over the holidays, this tool can give you an understanding of the risks involved.
  • Again, the communication of numbers, statistics and trends is very much front and centre: David Spiegelhalter argues how important trust is in this process.
  • Minimising unnecessary fatalities seems increasingly important as the first positive news in a while came to light.
    • Finally (for the time being with many more vaccines potentially on the way) on November 23rd, the Oxford University/AstraZeneca partnership announced that their vaccine candidate showed an average efficacy of 70%, with a specific dosage regime exhibiting efficacy of 90%.
  • The first two vaccines (Pfizer and Moderna) have been developed using a relatively new approach – mRNA – and the speed of the development and their success is a true scientific breakthrough, as Adam Finn discusses.
  • For those of us used to the realms of ‘Big Data’ where machine learning data sets of millions of records are common, the numbers involved in the clinical trials are striking. Clearly it is impossible to conduct these trials at the scale of web data but with the relatively low incidence of Covid, the 30 to 40 thousand participants in the phase 3 trials result in fewer than 100 infections combined across the test and control groups. For instance, with Moderna:
"This first interim analysis was based on 95 cases, of which 90 cases of COVID-19 were observed in the placebo group versus 5 cases observed in the mRNA-1273 group, resulting in a point estimate of vaccine efficacy of 94.5% (p <0.0001)"
  • While these figures are very unlikely to have been generated by chance, it highlights the intractability of understanding effects in sub-groups (age bands etc), and also the critical importance of true randomisation in test and control group selection.
  • The Oxford-AstraZeneca vaccine has a number of positives compared to the other two vaccines, notably cost (it is likely to be far more affordable and so practical globally) and distribution (it does not need to be kept at the very cold temperatures of the other two). However, questions are now being raised about the trial results (see Wired, and New Scientist) which may mean that further trials are required before regulatory approval is gained.
  • Testing is of course still key- MIT Researchers have produced a prototype AI model to detect Covid from recordings of coughs … perhaps soon Alexa will be able to diagnose us!
  • A different but important Covid related topic is understanding the economic impact of the pandemic. Raj Chetty, Professor of Public Economics at Harvard University, has been using a variety of publicly available data in novel ways to attempt to understand how Covid has affected different socio-economic groups from a financial perspective. He has produced elegant visualisations highlighting the disparity in outcomes.
Declines in high-income spending led to significant employment losses among low-income individuals working in the most affluent ZIP codes in the country,
  • For those looking for a good listen, Chetty was interviewed by Kara Swisher for the NYTimes Sway podcast. The study using anonymised tax records to build longitudinal understanding of the changes in inequality, and the impact of small changes in location, was particularly insightful.
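The Moderna interim figures quoted above (95 cases, 90 in the placebo group and 5 in the vaccine group) can be turned into the headline efficacy number with a few lines of arithmetic. The following is only a minimal sketch under simplifying assumptions: it assumes the two trial arms are exactly the same size (the ARM_SIZE value is a made-up placeholder, not a figure from the trial), and it uses a simple one-sided binomial test rather than whatever analysis the trial statisticians actually ran. Under those assumptions the point estimate comes out at roughly 94.4%, close to the reported 94.5%.

```python
from math import comb


def vaccine_efficacy(cases_vaccine, cases_placebo, n_vaccine, n_placebo):
    """Point estimate of efficacy: 1 minus the ratio of attack rates."""
    attack_rate_vaccine = cases_vaccine / n_vaccine
    attack_rate_placebo = cases_placebo / n_placebo
    return 1 - attack_rate_vaccine / attack_rate_placebo


def one_sided_binomial_p(cases_vaccine, total_cases, p_null=0.5):
    """P(at most `cases_vaccine` of the cases fall in the vaccine arm)
    if the vaccine had no effect, so that each case lands in either
    arm with probability p_null (0.5 for equal-sized arms)."""
    return sum(
        comb(total_cases, k) * p_null**k * (1 - p_null) ** (total_cases - k)
        for k in range(cases_vaccine + 1)
    )


# ASSUMPTION: equal-sized arms; 15,000 is an illustrative placeholder only.
ARM_SIZE = 15_000

ve = vaccine_efficacy(cases_vaccine=5, cases_placebo=90,
                      n_vaccine=ARM_SIZE, n_placebo=ARM_SIZE)
p_value = one_sided_binomial_p(cases_vaccine=5, total_cases=95)

print(f"Estimated efficacy: {ve:.1%}")
print(f"One-sided p-value under 'no effect': {p_value:.2e}")
```

With equal arms the arm size cancels out, so the estimate depends only on the 5-versus-90 split. This is also why sub-group estimates are so fragile: divide 95 cases across several age bands and each band is left with only a handful of infections.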

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

As previewed in our last newsletter, and our recent release, we are excited to be launching a new initiative: AI Ethics Happy Hours. We are now working on organising the first event based on suggestions we have received.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon; we will keep you posted.

We are excited to announce that both Jim Weatherall and Anjali Mazumder from the Data Science Section committee have been elected to the full Royal Statistical Society Council.

Anjali was also one of the four co-chairs organising the very successful ‘AI and Data Science in the age of COVID-19‘ conference at the Alan Turing Institute. There were representatives from 35 countries, 58 government departments, 62 institutes and 158 universities engaged in the audience! The talks will be published on YouTube shortly.

Piers Stobbs has won the Digital Masters Award for Data Excellence, 2020.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and has been active with virtual events. The most recent event was on December 2nd, where Sasha Rush from Hugging Face discussed deep probabilistic structure in NLP. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

  • Following the theme of BMW publishing their AI code of ethics which we highlighted in the last newsletter, AstraZeneca have done the same.
  • Is it possible to conduct facial recognition research ethically? Interesting discussion in Nature.
  • What do we really mean by ‘model explainability’? Useful breakdown of different terms and approaches.
  • In a slightly different vein (perhaps data science for transparency), Sophie Hill has created a graph-based visualisation (engagingly titled ‘My Little Crony‘) that uses public data to highlight the links between politicians and companies awarded contracts during the pandemic.
  • Another piece pointing the finger at recommendation algorithms for exacerbating ‘filter bubbles’ – this time at Facebook. Interestingly, it contrasts the way the Facebook algorithm works with that at Reddit and highlights some key differences:
    • At Facebook, you are recommended things based on people who have agreed to be friends with you- so you are unlikely to ever see content from different viewpoints.
    • Reddit prioritises content based on what users vote to be the most interesting or informative, but Facebook gives priority to what has garnered the most engagement.
  • Another example of flawed governmental use of algorithms and machine learning models- this time in housing.
The recurring theme here is an assumption that by just using an
algorithm you can find a completely objective solution to any issue.
That all these algorithms have struggled as they come into contact
with the real world suggests otherwise

Real world data science applications …
All sorts of great applications of data science and machine learning are regularly coming to light.

We have been stuck on this one problem – how do proteins fold up –
for nearly 50 years. To see DeepMind produce a solution for this,
having worked personally on this problem for so long and after so
many stops and starts, wondering if we’d ever get there, is a very
special moment.

Developments in Data Science…
As always, lots of new developments…

"Neural nets are surprisingly good at dealing with a rather small
amount of data, with a huge numbers of parameters, but people are
even better"

Some practical tips and tricks to try..
And as always, lots of fantastic tutorials out there…

Practical Projects
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Kevin O’Brien mentioned what looks to be an excellent webinar series: X-Europe Webinars. X-Europe Webinars is an organization for joint online events of Vienna Data Science Group, Frankfurt Data Science, Budapest Data Science Meetup, BCN Analytics, Budapest.AI, Barcelona Data Science and Machine Learning Meetup, Budapest Deep Learning Reading Seminar and Warsaw R Users Group.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

The views expressed are our own and do not necessarily represent those of the RSS

November Newsletter

Hi everyone-

It’s all go… US Presidential Elections, second waves, third tiers and second lockdowns, all while struggling to maintain some semblance of professionalism for the next Zoom call… Definitely time for ‘self-care’ via some selected data science reading materials!

Following is the November edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


Industrial Strength Data Science November 2020 Newsletter

RSS Data Science Section

Covid Corner

Lockdowns are looming again across Europe as COVID-19 cases continue to rise. As always numbers, statistics and models are front and centre in all sorts of ways.

  • Positive Covid case numbers are still rising- the New York Times is a useful resource for comparing case rates across countries and regions on a like for like basis (US local area data, UK local area data). We often think we are faring better than the US, but right now that is not the case: London currently has double the infection rate of New York.
  • Back in September, we were told by Sir Patrick Vallance, the government’s Chief Scientific Advisor, that unless we took collective measures to stem the spread of the virus, the estimated trajectory “would lead to 200 deaths a day by mid-November”.
    • Since then we have seen the various alert levels, tiers and associated restrictions implemented.
    • Sadly, though, we passed 200 deaths a day by mid-October, ahead of the schedule laid out by Vallance.
    • Clearly the measures we have attempted to implement so far have not been successful. Hence the recently announced lockdown… let’s hope it proves more effective.
  • In a similar vein, it appears that while the “Eat Out to Help Out” program may well have provided significant economic benefit for the hospitality sector it may also have unintentionally helped drive the rise in Covid infections.
"Between 8 and 17% of the newly detected COVID-19 infection clusters
can be attributed to the scheme"
  • One initiative we have been repeatedly told is critical to bringing the virus spread under control is our ‘Test and Trace’ program, on which a great deal of money has been spent.
    • Indeed, we have certainly made great strides in the volume of tests being run, and on a per-capita basis are now doing better than many European countries.
    • However, testing on its own does not necessarily help. An interdisciplinary team at UCL have put together this insightful dashboard highlighting the point. Leveraging data science best practice, they have broken the process down into different stages (Find, Test, Trace, Isolate, Support), defining key success metrics for each stage.
    • Not only are the key metrics we can currently measure very weak (only 14% of close contacts were advised to isolate in the recent period), but we are not even capturing the data to measure certain pieces of the puzzle (we do not actually know how many of those advised to isolate do indeed do so although survey data indicates this may be as low as 20%).
  • It’s clear that we are still learning about how the virus spreads. Originally we thought it was only spread by symptomatic individuals via surface transmission but now we know asymptomatic people can also spread the virus, and there is increasing consensus that understanding ‘aerosol transmission’ could be the key to slowing the spread.
  • Clearly, numbers, statistics and logical analytics are front and centre in this crisis: Carlo Rovelli, the renowned physicist, argues powerfully for far more widespread training in these fields.
In this uncertain world, it is foolish to ask for absolute 
certainty. Whoever boasts of being certain is usually the least 
reliable. But this doesn’t mean either that we are in the dark. 
Between certainty and complete uncertainty there is a precious
intermediate space – and it is in this intermediate space that our
lives and our decisions unfold.

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

As previewed in our last newsletter, and our recent release, we are excited to be launching a new initiative: AI Ethics Happy Hours. We are now working on organising the first event based on suggestions we have received.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these.

Janet Bastiman recently gave a talk at the Minds Mastering Machines conference with the provocative title “Your testing sucks”, using lessons from space exploration to highlight the areas where people in Data Science and AI should apply testing thinking.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and has been active with virtual events. Next up, on Wednesday November 4th, is “Beyond Accuracy: Behavioural Testing of NLP Models with CheckList“, by Marco Ribeiro, Senior Researcher at Microsoft Research. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Bias and more bias…
Bias, ethics and diversity continue to be hot topics in data science…

By defining unfairness as the presence of a harmful influence from 
the sensitive attribute in the graph, CBNs provides a simple and
intuitive visual representation for describing different possible
unfairness scenarios underlying a dataset

Science Data Science …
An exciting application of data science and machine learning is in enabling scientific research. In fact, Demis Hassabis, CEO and co-founder of DeepMind, talks about this being the area of AI application that he is most excited about. In this excellent interview with Azeem Azhar, he discusses the foundations of DeepMind, the systematic approach they have taken to generalising their AI breakthroughs, and specifically applications in scientific research.

  • The first (and most currently accessible) avenue to advancing scientific discovery is in leveraging machine learning best practice to parse through the increasingly sizeable volumes of data generated in scientific experimentation.
  • The area with more potential long term impact is in leveraging machine learning to help drive the direction of new scientific research, similar to the way in which AlphaZero allowed Go players to explore previously unknown strategic options.
    • DeepMind had previously discussed their AlphaFold system, allowing exploration of potential protein structures.
    • Recently they have released FermiNet, a novel Deep Learning architecture that allows scientists to explore the structure of complex molecules from first principles by estimating solutions to Schrödinger’s equation at scale.

Developments in Data Science…
As always, lots of new developments…

Applications of Data Science…
And as always, lots of new applications as well…

AI Trends and Business

To create such innovations, Apple relies on a structure that centers
on functional expertise. Its fundamental belief is that those with
the most expertise and experience in a domain should have decision
rights for that domain.
Apple’s managers at every level, from senior vice president on down,
have been expected to possess three key leadership characteristics:
deep expertise that allows them to meaningfully engage in all the
work being done within their individual functions; immersion in the
details of those functions; and a willingness to collaboratively
debate other functions during collective decision-making

Having senior technical leaders who can “meaningfully engage in all the work being done within their function” is something we are big believers in for data science, and something that is lacking in many organisations.

“We’re seeing that this blending of humans and machines is where
companies are performing well” 

Practical Projects
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

This pandemic belief is mad, bad and dangerous to know

I know reality doesn’t matter anymore. I get that beliefs no longer require an underlying connection to facts about the world.

I’ve made my peace with all of that. Because, generally speaking, I don’t care what you think. If you want to believe that 5G towers can spread viruses, be my guest.

But a false belief that could cause me and my family direct harm is starting to take hold in the UK. This is the idea that a COVID-19 second wave is not really happening here.

People who believe this are unlikely to comply with Test and Trace or a temporary lockdown. If this idea spreads, large scale vaccination programmes will also become impossible. Nearly all of the population need to be vaccinated to achieve population immunity.

It’s not just conspiracy theorists; I was shocked to see a friend on Twitter last week praising a fringe ‘researcher’ arguing that the second wave is fake. My friend is a sharp and level-headed software engineer. I respect his opinion.

It’s not hard to see why he was sucked in by this particular misinformationist: she is a figure of authority (a doctor); she is proposing something we all want to believe (that COVID is over); and she presents copious data in graphs and tables.

Terrible ideas from astrology to numerology have been supported by graphs and tables. Tellingly, our ‘researcher’ is neither a statistician nor an epidemiologist, and yet presents her statistical analysis as fact.

She argues that the spike in infections we are witnessing is simply a result of false positives produced by the PCR based COVID-19 test. This is the common thread within this brand of COVID-19 misinformation; people who wouldn’t know a PCR primer from a Dulux Wood Primer are pontificating about false positive rates on social media as we speak.

I’m going to demonstrate the wrong-headedness of this parasitic meme, using one simple argument. Please help me spread this information. Perhaps together we can inoculate the population against this one dumb idea while there is still time.

This graph is taken from the latest government report on testing.

In the first week of July the number of people testing positive was low: roughly 5,000, out of about 500,000 people tested. So the data doesn’t allow you to believe that the false positive rate is more than 1%. The most sceptical position possible is that all positive cases in July were false positives (even the most insane QAnon-grade sceptic can’t believe fewer than zero people had COVID in July).

If the false positive rate is at most 1%— and there is no evidence the test protocol has changed since July—then the 1.5M tests carried out this week in October could generate at most 15,000 false positives. But 90,000 people tested positive in total. What about the other 75,000 people who tested positive?

We are forced to accept the reality of at least 75,000 true COVID-19 infections this week, no matter how many charts they throw at us. The second wave is real.
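
The bound above is simple arithmetic, and a short Python sketch makes each step explicit. The function name is hypothetical, the figures are the rounded ones quoted in this post, and the key assumption (as in the text) is that the false positive rate has not changed since July.

```python
def min_true_positives(baseline_positives, baseline_tests,
                       current_positives, current_tests):
    """Lower bound on true positives in the current week.

    Assumes the most sceptical case: every positive in the baseline week
    was a false positive, and the false positive rate has not changed.
    """
    # Upper bound on the false positive rate implied by the baseline week.
    max_fpr = baseline_positives / baseline_tests
    # Most false positives the current week's tests could generate.
    max_false_positives = baseline_positives * current_tests / baseline_tests
    # Whatever is left over must be true infections.
    min_true = max(0.0, current_positives - max_false_positives)
    return max_fpr, max_false_positives, min_true


# Rounded figures from the text: first week of July vs. this week in October.
max_fpr, max_fp, min_tp = min_true_positives(5_000, 500_000, 90_000, 1_500_000)
print(f"max false positive rate: {max_fpr:.1%}")
print(f"max false positives:     {max_fp:,.0f}")
print(f"min true positives:      {min_tp:,.0f}")
```

Any less sceptical assumption about July (some of those positives were real) only lowers the false positive rate and raises the number of true infections in October.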

Sign up for future updates, news, events & articles from us here:


Thanks to Piers Stobbs, Magda Piatkowska, Lucy Hayes and Karl Ove Kovalsgaard for reading a draft of this post.

How not to lose 16,000 COVID-19 test results: a data scientist’s view

A critical piece of the UK Test and Trace infrastructure failed hard this week. All contacts of almost 16,000 COVID-19 infected people were allowed to circulate unknowingly for an entire seven days in the community. That’s about 50,000 people.

I’m not going to complain about Public Health England (PHE) using Excel to merge the test results from each test centre. That was obviously wrong.

This is about something far more worrying: I don’t understand why there wasn’t proper monitoring in place. This is a shameful failure of technical leadership. I’m not calling for Excel to be replaced; I’m calling for the NHS Test and Trace leadership to be replaced by people who understand data.

Monitoring is basic data science. If your team can’t perform at this level then you shouldn’t be handling any important data, certainly not the data that our national pandemic strategy depends on. The firm I work at only captures data for financial institutions; nobody’s life is in our hands. Yet we’d never deploy a data pipeline without proper monitoring.

Why isn’t this basic skill at the core of the NHS Test and Trace system? The UK government have spent £10B on the programme, most of which has gone to external consultants. I guess £10B doesn’t buy you a serious technical culture, where throwing things together in Excel without proper monitoring would be a point of professional embarrassment.

Experienced data scientists would put (at least) two simple monitors on this process:

  • A plot showing the total number of new positive tests each day
  • A plot showing the number of tests from each test centre

Let’s take these in turn.

Plot of the total number of new positive tests each day

This plot looks like this when there is a problem in data collection:

A plot like this just looks weird. Why has the trajectory suddenly flattened off? Does this reflect a trend in the underlying data or is it an artefact of the data collection process? This needs to be investigated.

This is where the second plot comes in.

Plot of the counts of tests from each test centre

This plot looks like this if data ingestion has failed:

The counts are bunched up at the right-hand side (implying lots of files have the same number of rows). This is a dead give-away that some artificial limit has been hit. Naturally occurring counts never look like this.
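
The same check can be automated alongside the plot. The following is a minimal sketch, not a description of any real PHE tooling: the function name, the centre names and the counts are all made up for illustration, and the repeat threshold is an arbitrary assumption. It flags files whose row count equals a well-known spreadsheet row limit, or whose exact count is shared by several files.

```python
from collections import Counter

# Worksheet row limits for the older .xls format and the newer .xlsx format.
KNOWN_ROW_LIMITS = {65_536, 1_048_576}


def flag_suspicious_counts(rows_per_file, min_repeats=3):
    """Return {file name: [reasons]} for row counts that look artificial.

    rows_per_file maps a (hypothetical) file or centre name to its row count.
    min_repeats is an assumed threshold: naturally occurring counts rarely
    coincide exactly across several files.
    """
    tally = Counter(rows_per_file.values())
    flagged = {}
    for name, count in rows_per_file.items():
        reasons = []
        if count in KNOWN_ROW_LIMITS:
            reasons.append("matches a known spreadsheet row limit")
        if tally[count] >= min_repeats:
            reasons.append(f"{tally[count]} files share this exact count")
        if reasons:
            flagged[name] = reasons
    return flagged


# Illustrative data only.
example_counts = {
    "centre_a": 41_203,
    "centre_b": 65_536,
    "centre_c": 65_536,
    "centre_d": 65_536,
    "centre_e": 12_877,
}
flagged = flag_suspicious_counts(example_counts)
for name, reasons in flagged.items():
    print(name, "->", "; ".join(reasons))
```

A check like this does not replace looking at the plot; it simply turns the "dead give-away" into an alert that fires even when nobody is looking.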

A Data Science Culture

Each test centre should speak briefly with PHE on a daily basis, to verify that their data has been collected properly. This week we collected X positive samples from Test Centre Y. Yes, that’s the correct number.

This, the world’s most boring meeting, will seem like a complete waste of time until one day somebody says ‘we collected 65,536 rows of data’ and a sharp junior scientist says ‘wtf!’. (65,536 is the maximum number of rows allowed in older versions of Excel.)

Using these two simple plots and a regular meeting we have created an early warning system for data collection issues. Please don’t tell me ‘we’re in a pandemic crisis, there just wasn’t time to set this up’. There was time, because we’re in a pandemic, and because people will die because this process has failed.

I’m chair of the organisation representing the professional interests of data scientists in the UK, the Data Science Section of the Royal Statistical Society. Many of our members complain they are prevented from being effective because they don’t have a technical manager who understands their work. This kind of manager would be deeply troubled until basic data monitoring was in place.

High quality data science work requires an organisation-wide understanding of data. I do mean organisation-wide: I know companies where the CEO herself checks over data ingestion figures every day.

An experienced data science leader should be installed into NHS Test and Trace immediately, and given authority to run the data pipeline to a professional standard. A culture of rigour and discipline will prevent another catastrophic error. Unfortunately, you can’t buy a culture like this, even (or perhaps especially) by hurling billions of pounds at large consultancy companies.

Sign-up for future updates, news & events from us here:


Thanks to Piers Stobbs and Ryan Procter for reading a draft of this post.

October Newsletter

Hi everyone-

As the rain pours down it definitely feels like winter has arrived- all the more reason to spend some time indoors huddled up with some good data science reading materials!

Following is the October edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity while figuring out the difference between second waves and spikes…

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


Industrial Strength Data Science October 2020 Newsletter

RSS Data Science Section

Covid Corner

As Trump tests positive, the inevitable seems to be happening, with COVID-19 cases on the rise again in many areas of the world. As always numbers, statistics and models are front and centre in all sorts of ways.

#19 Publish your results in a newspaper first (all criticism 
of the study by scientists will be old news and sour grapes 
by the time they get a chance to make it, and government policy 
will already have been made)

The RSS has been concerned that, during the Covid-19 outbreak, 
many new diagnostic tests for SARS-CoV-2 antigen or antibodies 
have come to market for use both in clinical practice and for 
surveillance without adequate provision for statistical 
evaluation of their clinical and analytical performance. 
  • Of course, this is all rather undermined when you discover the official national case tracking data is being managed in Excel.
  • Elsewhere, Wired gives a good analysis of the different approaches being taken by the various vaccine research groups to show whether or not their vaccine actually works. A recent paper, Machine Learning for Clinical Trials in the Era of COVID-19 in the Statistics in Biopharmaceutical Research Journal, highlights how machine learning can help with some of these issues.
  • On the epidemiological front, a recent article in Nature highlights how anonymised mobile phone data can be used in innovative ways to track the virus spread.
  • Is dispersion (k) the overlooked variable in our quest to understand the spread of the virus? Breaking down the distribution of infection events (rather than using the average, as with R) could help better explain super-spreaders and inform test and trace programs. Really interesting article from the Atlantic.
  • If anyone is keen to roll up their sleeves and dig in to the data, the c3.ai COVID-19 Grand Challenge might be of interest…
  • Finally, The Alan Turing Institute is convening a public conference “AI and Data Science in the Age of COVID-19” on November 24th. In addition to public discussion there will be a series of closed workshop sessions to assess the response of the UK’s data science and AI community to the current pandemic- if you are interested in participating in the closed sessions you can apply here.

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

As previewed in our last newsletter, and our recent release, we are excited to be launching a new initiative: AI Ethics Happy Hours. If you have encountered or witnessed ethical challenges in your professional life as a data scientist that you think would make for an interesting discussion, we would love to hear from you at dss.ethics@gmail.com (deadline October 15th).

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and has been active in lockdown with virtual events. Next up, on Monday October 12th, is “From Machine Learning to Machine Reasoning“, by Drew Hudson from Stanford University. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Anjali Mazumder is helping organise the Turing Institute event mentioned above in Covid Corner.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Bias and more bias…

The more we collectively dig into the underlying models driving our everyday activities, the more issues we uncover…

"I don't trust linear regressions when it's harder to guess the
direction of the correlation from the scatter plot than 
to find new constellations on it"

Recommenders Gone Wild …

One example that Rachel Thomas discussed in the talk above is recommendation systems. With the proliferation of content and product choices now available online, we could all use some help curating and narrowing down the options available. When implemented well, recommendation systems can elegantly assist in this. Many work through some form of collaborative filtering, which really boils down to identifying similar behaviours and extrapolating:

If Alice likes oranges, pineapples and mangos, 
and Bob likes oranges and pineapples, 
maybe Bob will also like mangos...
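
The Alice and Bob example can be sketched in a few lines of Python. This is a toy illustration of user-based collaborative filtering, not how any production recommender works: the function names are hypothetical, Jaccard overlap is just one assumed choice of similarity measure, and "Carol" is an extra made-up user added to show that dissimilar users contribute nothing.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes):
    """Rank items the target has not liked, weighted by user similarity."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        similarity = jaccard(likes[target], items)
        if similarity == 0:
            continue  # nothing in common, so no evidence to extrapolate from
        # Items the similar user likes that the target has not seen yet.
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


likes = {
    "Alice": {"oranges", "pineapples", "mangos"},
    "Bob": {"oranges", "pineapples"},
    "Carol": {"kiwis"},
}
print(recommend("Bob", likes))
```

Note that Bob is only ever offered things that people like him already chose, which is the seed of the feedback loops discussed next.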

However, depending on how these similarities are codified and calculated, it has now been shown that feedback loops can quite easily be generated.

  • Wired dug into the YouTube recommender in 2019 with Guillaume Chaslot, one of the original engineers on the project, highlighting the importance the metric chosen to optimise – in this case viewing time – has in driving the material recommended and so consumed.
  • In a recent follow up, “YouTube’s Plot to Silence Conspiracy Theories” , they highlight some of the changes that have been implemented to reduce the issues identified. Interestingly the focus seems to be on identifying potentially hazardous material that is then excluded from the recommender rather than changing the recommender itself.
  • DeepMind recently released research digging into these feedback loops (“Degenerate Feedback Loops in Recommender Systems”) giving a theoretical grounding to the concepts of “echo chambers” and “filter bubbles” and why they occur.
  • In “Overcoming Echo Chambers in Recommender Systems“, Ryan Millar digs into alternative methods using the fabled MovieLens data set, giving an example of how, through different objective functions, you can reduce the feedback loop effects in the recommender system itself. This feels similar to the concepts of “explore” vs “exploit” in Thompson sampling, and an approach well worth considering if you are building a system yourself.
  • Finally Eugene Yan gives a useful summary of RecSys 2020, highlighting a number of research papers on the topic of feedback loops and bias in recommender systems.

Yet more GPT-3 …
Continuing our regular feature on GPT-3 (OpenAI’s 175 billion parameter NLP model) as it continues to generate news and commentary.

AI Trends and Business

Practical Projects
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

Lessons on Algorithm ethics from UK Exam algorithm story

This document chronicles the recent controversy surrounding the UK exams fiasco where an algorithmic approach was rejected and demonstrates areas where those building and deploying models must be vigilant and have training, process and governance in place.
Giles Pavey – September 2021

Background

In March 2020, the UK Government made the significant policy decision, in response to the COVID-19 pandemic, to close all schools and cancel all GCSE and A level examinations in summer 2020. This policy decision reflects a trade-off between the public health benefits of potentially slowing the spread of the virus, and the costs to the economy and to society, including disruption to the education system.

The Government instructed OFQUAL (the organisation responsible for UK exams) to create a system that:

  1. Ensured grades were comparable to previous years – specifically to minimise grade inflation. This was deemed important for colleges, universities and employers in order that they could have confidence in exam results.
  2. Was unbiased (did not discriminate) with respect to students’ protected characteristics (race, religion, gender…).
  3. Could therefore be used by students and colleges in order for pupils to progress fairly to the next stage of their education.

Method

The solution OFQUAL created was based upon an algorithm with the following inputs:

  • Each school’s previous results, for each subject, over the previous 3 years.
  • A rank order of the projected success of each pupil within each school, by subject on their predicted result if they were to take that exam.
  • The schools also produced a predicted grade for each pupil, for each subject; although these were not used by the algorithm unless required.

The algorithm worked as follows: it looked at each school’s results in every subject – for example Blackfriars High School’s results in Mathematics over the past 3 years. This then gave a prediction of how many students would be expected to receive an A, B, C, D or E grade this year. Assuming the algorithm predicted 10 A’s and 16 B’s, then the top 10 students taking maths at Blackfriars would be allocated an A grade and the next 16 would receive a B. Importantly, this was regardless of what they had been predicted.

The exception to this approach would be if the number of pupils taking a given subject in a given school was deemed too small to use this approach. If the number of students at Blackfriars High School taking Latin was less than 5 then all pupils would receive their predicted grade (irrespective of the school’s previous results). If it was 6-15 then the grade was a weighted combination of the algorithm and predicted grade.
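
The allocation step described above can be sketched as follows. This is a simplified illustration of the rank-and-quota idea only, not OFQUAL's actual model: the function name and pupil names are hypothetical, the thresholds are taken from the description in this article, and the weighted blend for mid-sized groups is deliberately left unimplemented because its weights are not given here.

```python
def allocate_grades(ranked_pupils, grade_quota, predicted,
                    small_cohort=5, blended_up_to=15):
    """Allocate grades by rank order within a school and subject.

    ranked_pupils: pupil names, strongest first (the school's rank order).
    grade_quota:   list of (grade, number of pupils), best grade first,
                   derived from the school's previous results.
    predicted:     teacher-predicted grade for each pupil.
    """
    n = len(ranked_pupils)
    if n < small_cohort:
        # Very small groups: predicted grades are used as they stand.
        return {pupil: predicted[pupil] for pupil in ranked_pupils}
    if n <= blended_up_to:
        # Mid-sized groups were a weighted combination; weights not given.
        raise NotImplementedError("blend weights are not described in the text")
    allocated = {}
    position = 0
    for grade, places in grade_quota:
        for pupil in ranked_pupils[position:position + places]:
            allocated[pupil] = grade
        position += places
    # Assumption: anyone beyond the quota receives the lowest listed grade.
    for pupil in ranked_pupils[position:]:
        allocated[pupil] = grade_quota[-1][0]
    return allocated


# The Blackfriars maths example: 10 A's and 16 B's, everyone predicted an A.
maths = [f"pupil_{i:02d}" for i in range(1, 27)]
maths_grades = allocate_grades(maths, [("A", 10), ("B", 16)],
                               {pupil: "A" for pupil in maths})
print(maths_grades["pupil_10"], maths_grades["pupil_11"])

# A small Latin group keeps its predicted grades.
latin_grades = allocate_grades(["x", "y", "z"], [("B", 3)],
                               {"x": "A", "y": "A", "z": "B"})
print(latin_grades)
```

In the maths example the eleventh-ranked pupil drops from a predicted A to a B purely because of the school's historical quota, while the Latin pupils are untouched.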

Algorithmic results and ensuing calamity

Separate algorithms were used by the constituent countries of the UK and for the exams taken at 16 (GCSEs) and 18 (A levels). In all cases the algorithms were shown to broadly hit the target of having a spread of grades similar to previous years. OFQUAL also said that analysis showed that the algorithms did not have any bias due to protected characteristics (although more of this later!).

However, the grades received by many students were dramatically different – usually lower – than those predicted by their teachers. This is predominantly driven by the fact that teachers tend to be optimistic about their students’ attainment – they are prone to predict their pupils’ results “on a good day”. (This also happens in non-COVID years, where more than half of students’ actual grades are lower than predicted, e.g. predicted A-B-B and receiving B-B-C.)

OFQUAL claimed that if the algorithm’s dampening effect had not been used the proportion of top grades would have risen from 7.7% to 13.9%. But it was when viewed at an individual pupil level that problems arose. The algorithm downgraded teachers’ predictions in 39% of cases.

This resulted in tens of thousands of students not achieving the grades that they needed to move on to their first choice universities.

There was also a dramatic difference in the number of downgrades received by government funded schools compared to “Private” (fee paying) schools. This was driven by the fact that Private schools are more likely to have both smaller numbers of students and offer more unusual exam subjects. Both of these factors mean that a greater proportion of Private schools had students doing subjects in groups of 5 or fewer, leading to a higher proportion of students receiving predicted rather than algorithm grades.

A more detailed write up of the fall out of the exam algorithm can be found here, along with a complete explanation of the formula here.

Lessons learned for those building and deploying models

  1. There is a dramatic difference between the outcome of an algorithm when viewed at a total population (or macro) level compared to the individual experiences of those students (the micro level), where literally thousands received what they viewed as unfair treatment.
    • Organisations deploying models must make sure to consider the impact on individuals of actions taken from their AI models in addition to the overall effectiveness.
  2. Even though precautions were taken, and validated, to check that the algorithms were not biased towards or against certain protected groups, differences in the profiles of those attending different schools inevitably led to different outcomes, because the profile of those attending smaller private schools and doing less mainstream subjects is more privileged than that of the UK as a whole.
    • Specialist statistical analysis is required to design and assess algorithms. Even when an algorithm can be shown not to use protected characteristics directly, organisations must check that it does not result in discriminatory recommendations.
  3. The exam results were significantly influenced by previous outcomes achieved by the students’ school, in which the pupils themselves had no part. This resulted in the so-called “sins of my forefathers” problem – where students were penalised for events which did not reflect their personal behaviour.
    • Organisations must take immense care when using models that consider historical data that is out of the model subjects’ control.
  4. It remains to be seen whether the predicted results used were in fact subject to the teachers’ (human) biases. It has, however, been widely accepted that because these are “human decisions” they are more valid.
    • Organisations that use algorithms in decision making should expect the results to be scrutinised and prepare accordingly. When replacing previously human decision making with algorithmic solutions, they should be prepared for previously unknown bias to be uncovered and for the possibility that there is no perfect solution.
    • Organisations must prepare clear communications and put in place a fair and proportionate appeals process where applicable.
  5. Many of the problems with the algorithm arose because it was asked to produce fair results from an existing unfair process, i.e. that of judging a student’s ability on exams rather than on their performance over the whole 2 years of their study of the curriculum.
    • Organisations must be aware that while AI is a powerful tool to drive effectiveness and efficiency, it is not magical in its ability to correct all wrongs or mitigate an unfair system. They must educate themselves accordingly.
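
As an illustration of the checks described in lessons 1 and 2, here is a minimal sketch of how an overall outcome can be compared with outcomes by group. Everything in it is an assumption made for illustration: the records are invented, the integer grade coding and the names `records` and `downgrade_rates` are hypothetical, and it is not the analysis that was actually carried out on the exam algorithm.

```python
from collections import defaultdict

# Hypothetical records: (school_type, teacher_predicted_grade, awarded_grade).
# Grades are coded as integers, higher is better. These values are invented
# purely for illustration and are not real exam data.
records = [
    ("state", 5, 4),
    ("state", 4, 4),
    ("state", 6, 5),
    ("state", 3, 3),
    ("private", 6, 6),
    ("private", 5, 5),
    ("private", 4, 3),
    ("private", 6, 6),
]


def downgrade_rates(rows):
    """Return, per group, the share of students awarded less than predicted."""
    totals = defaultdict(int)
    downgraded = defaultdict(int)
    for group, predicted, awarded in rows:
        totals[group] += 1
        if awarded < predicted:
            downgraded[group] += 1
    return {group: downgraded[group] / totals[group] for group in totals}


# Macro view: one number for the whole population.
overall = sum(1 for _, predicted, awarded in records if awarded < predicted) / len(records)

# Micro view: the same outcome broken down by group.
rates = downgrade_rates(records)
gap = max(rates.values()) - min(rates.values())

print(f"Overall downgrade rate: {overall:.1%}")
for group, rate in rates.items():
    print(f"  {group}: {rate:.1%}")
print(f"Gap between groups: {gap:.1%}")
```

The point of the sketch is that the school type is never an input to any grading rule here, yet the breakdown can still reveal a gap between groups that the single overall figure hides. A real assessment would need proper statistical testing and far larger samples.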

Ethics Happy Hours

Announcing our new Data Science Section Initiative:
Share and discuss ethical challenges encountered in your professional life during Ethics Happy Hours

Ethical questions are ubiquitous in pursuing real-world data science projects. During the Covid-19 crisis, controversies around the design of contact tracing apps or the moderation of GCSE and A-level results have recently served as a reminder of the diverse forms that such questions can take.

To raise awareness and promote informed debate about the role of ethics in data science, we are planning to launch a new initiative that draws on the wealth of professional experience that exists within the Data Science Section and the RSS community more broadly. The core of this initiative will be a series of virtual ‘ethics happy hours’ dedicated to discussions of ethical challenges and dilemmas encountered by community members in their professional lives, thus relying on the power of concrete case studies to spark ethical deliberation.

Our hope is for these sessions to provide an opportunity for valuable intellectual exchange, but also for DSS members to get together and get to know each other during these times of social distancing. Each session will feature two to three pre-identified discussants, including Florian Ostmann (Alan Turing Institute), Anjali Mazumder (Alan Turing Institute), Danielle Belgrave (Microsoft Research) and other DSS committee members. Drawing on their ethics expertise and covering conceptual and philosophical as well as technical angles, discussants will share initial reflections on the selected case study, followed by an open conversation among all participants.

The single most important ingredient will be your stories. If you have encountered or witnessed ethical challenges in your professional life as a data scientist that you think would make for an interesting discussion, the DSS committee would like to hear from you.

Initially, a short summary of 200 to 500 words will be all we need. Following a review of submissions received, we will be in touch with proponents of selected stories to explore the best way of presenting them and, where needed, develop them in more detail. In doing so, we will be guided by authors’ confidentiality preferences. Stories may be anonymised as needed and contributors will be free to decide whether or not to present their story themselves or otherwise identify themselves during the discussion.

We look forward to receiving your stories by October 15th via email to dss.ethics@gmail.com. If you have any questions prior to submitting a story, please do not hesitate to reach out via the same address.

— Florian, Anjali, Danielle
