
The UK AI Strategy: are we listening to the experts?

The emerging UK National AI Strategy is out of step with the needs of the nation’s technical community and, as it stands, is unlikely to result in a well-functioning AI industry. The Data Science & Artificial Intelligence Section (Royal Statistical Society) asks whether the government has actively sought the views of expert practitioners.

The UK government has released plans for a new AI Strategy, with the stated goal of making ‘the UK a global centre for the development, commercialisation and adoption of responsible AI’. We asked our members—UK-based technical practitioners of artificial intelligence—their opinion of the plans. Our results point to a fundamental disconnect between the roadmap for the Strategy and the views of those actually building AI-based products and services in the UK.

The basis of the AI Strategy is the AI Council’s ‘AI Roadmap’, which was developed with input mainly from public sector leaders and university researchers. The AI Council does not appear to have engaged with engineers and scientists from the commercial technology sector.

Tech companies commercialise AI, not universities. Yet among the 52 individuals who contributed to the Roadmap, only four software companies are represented. There are 19 CBEs and OBEs but not one startup CTO.

Hoping to fill this gap, we surveyed our community of practising data scientists and AI specialists, asking for their thoughts on the Roadmap. We received 284 detailed responses; clearly the technical community cares deeply about this subject.

Only by direct engagement with technical specialists can we hope to uncover the key ingredients of a successful AI industry. For example, while the AI Roadmap focusses on moonshots and flagship institutes, the community seems to care more about practical issues such as open-source software, startup funding and knowledge-sharing.

The economic opportunity of AI represents at least 5% of GDP (compare to fisheries, at about 0.05% of GDP). If the National AI Strategy does not correctly identify the challenges that lie ahead, this opportunity will be squandered.

We will publish our findings in four parts, covering the different sections of the AI Roadmap. This first covers AI research and development.

Comparison with the AI Roadmap for R&D

Three areas are central to the Roadmap’s plans for R&D: the Alan Turing Institute, moonshots (such as ‘digital twin’ technology) and ‘AI to transform research, development and innovation’. These topics were scarcely mentioned by our respondents, despite being listed as potential subjects for discussion.

For example, the Alan Turing Institute was mentioned only 4 times by respondents, and two of those mentions were negative.

There were 7 responses on the topic of moonshots, 3 of them negative. ‘Digital twins’ were not mentioned at all:

“moonshotting” […] without a solid foundation and shared values would destroy the field in perpetuity.

The central concerns of the Roadmap may sound plausible on paper but they don’t resonate strongly with the technical community.

Better collaboration between academia and industry

By far the most frequently mentioned topic was better collaboration between academia and industry, which was addressed by 52 respondents. To summarise: knowledge transfer between academia and companies is not currently working. The UK’s strength in academic research will be wasted if industry and academia cannot easily learn from each other.

The Roadmap barely addresses this topic, other than one mention of the pre-existing Knowledge Transfer Partnerships (KTP) scheme. Yet our practitioner community think that clearing this obstacle should be at the core of the strategy. A typical request was:

Better sharing of knowledge and experience between universities and industry, specifically industry use case examples.

There were many voices suggesting the knowledge transfer should also operate in the opposite direction:

The knowledge transfer deficit is in the opposite direction: industry making investment and research headway while universities cannot compete.

Encourage adoption of good software engineering practices amongst researchers.

Another key concern is the brain drain from academia to industry:

UK universities were leading in the AI space until the industry (Google, Msft, Amz, FB) started poaching all the top professors […]

There needs to be strong support for this area in academia to stop ‘brain drain’ to big tech companies and allow UK to make research advances that will allow competitive advantages for startups.

Open source

40 respondents recommend that the Strategy focus on open-source. This makes it the second most mentioned issue in the entire survey. Strikingly, the AI Roadmap doesn’t contain a single mention of the term ‘open-source’.

Many respondents agreed that funding positions for contributors to key open-source projects would bring many benefits. This is well-founded: when Columbia University hired core developers of the scikit-learn open-source project, they facilitated knowledge transfer and training on cutting-edge techniques.

Open source should be embraced by the Government, it sends a positive message about intent and helps to draw in the right talent to the field (most people learning practical machine learning will start their experience in open source).

Support for startups

40 responses agreed on a need to support startups through direct funding, incubators, tax breaks and other approaches such as access to compute infrastructure.

More funding and assistance for AI startups, and assisting their collaboration with UK-based research and universities.

Funding for AI and Deep Tech startups.

Funding/grants for startups for the use of cloud computing infrastructure.

Ethics

26 responses want to see consideration of ethics at the heart of future AI innovation. For example:

Finally, I think governance of how AI and DS are used by the private sector is very important, and something that, in my opinion, should be a priority for any government AI roadmap.

If you fail to identify and analyze the obstacles, you don’t have a strategy

We draw attention to the work of UCLA strategy researcher Richard Rumelt. He makes a specific warning: ‘If you fail to identify and analyze the obstacles, you don’t have a strategy’. Has the AI Roadmap made this mistake? Its 37 pages do not appear to contain a clear analysis of the obstacles in the way of a strong AI industry.

Identification and analysis of these obstacles requires close and sustained collaboration with AI practitioners; our survey is just a starting point. We urge the Office for AI to engage directly with the technical community before moving forward to finalising their AI Strategy.

Sign up to the Data Science & AI Section if you are interested in this topic


Data Science and AI Section (Royal Statistical Society) Committee

Chair: Dr Martin Goodson (CEO & Chief Scientist, Evolution AI)

Vice Chair: Dr Jim Weatherall (VP, Data Science & AI, AstraZeneca)

Trevor Duguid Farrant (Senior Principal Statistician, Mondelēz International)

Rich Pugh (Chief Data Scientist, Mango Solutions (an Ascent Company))

Dr Janet Bastiman (Head of Analytics, Napier AI. AI Venture Partner)

Dr Adam Davison (Head of Insight & Data Science, The Economist)

Dr Anjali Mazumder (AI and Justice & Human Rights Theme Lead, Alan Turing Institute)

Giles Pavey (Global Director – Data Science, Unilever)

Piers Stobbs (Chief Data Officer, Cazoo)

Magda Woods (Data Director, New Statesman Media Group)

Dr Danielle Belgrave (Senior Staff Research Scientist, DeepMind)

Appendix: Analysis

Our survey was designed to bring out the voice of the technical community. We asked leading questions – prompting the respondents with topics from the AI Roadmap as well as other topics we thought might be of interest to the community. We collected free-text responses.

Our analysis is subjective and we will make our full dataset available for independent analysis. We do not make any quantitative claims, because our sample is biased (for example, geographically).

We included a single quantitative question: ‘To what extent do you agree that these are the top priorities for the UK in AI Research, Development & Innovation? (5 means ‘Strongly agree’)’. Responses could range from 0 to 5. The average response was 3.4 (neither agree nor disagree).

We received 284 responses in total. We selected qualified respondents by requiring:

  • They declared they were either “a practising data scientist” or “used to be a practising data scientist”
  • They declared they were “an individual data science contributor”, “a line manager of data scientists” or “a senior leader involved in data science”

After applying these requirements 245 qualified responses remained. 118 (47%) of respondents identified as either ‘Managers’ or ‘Senior leaders’.

In order to interpret our results we made a crude manual classification of every comment and focused on those topics which at least 20 respondents mentioned.
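
As a rough illustration of the selection and tallying steps described above, here is a minimal Python sketch. The field names (`practitioner`, `role`, `topics`), the toy records and the non-qualifying answer strings are hypothetical, since the survey's data schema is not described in this post; the `topics` labels stand in for our manual classification of each comment. It is not the analysis code we actually used.

```python
from collections import Counter

# Answer options quoted in the qualification criteria above.
PRACTITIONER_ANSWERS = {
    "a practising data scientist",
    "used to be a practising data scientist",
}
ROLE_ANSWERS = {
    "an individual data science contributor",
    "a line manager of data scientists",
    "a senior leader involved in data science",
}


def is_qualified(response):
    """A response qualifies only if it meets both declared criteria."""
    return (
        response["practitioner"] in PRACTITIONER_ANSWERS
        and response["role"] in ROLE_ANSWERS
    )


def topic_counts(responses, min_respondents=20):
    """Count qualified respondents per topic and keep the frequent topics."""
    counts = Counter()
    for response in responses:
        if is_qualified(response):
            # set() ensures a respondent is counted once per topic.
            counts.update(set(response["topics"]))
    return {topic: n for topic, n in counts.items() if n >= min_respondents}


# Toy data (hypothetical): two qualified responses, two unqualified ones.
responses = [
    {"practitioner": "a practising data scientist",
     "role": "an individual data science contributor",
     "topics": ["open source", "startups"]},
    {"practitioner": "used to be a practising data scientist",
     "role": "a line manager of data scientists",
     "topics": ["open source", "open source"]},
    {"practitioner": "not a data scientist",  # hypothetical answer option
     "role": "a senior leader involved in data science",
     "topics": ["open source", "startups"]},
    {"practitioner": "a practising data scientist",
     "role": "other",  # hypothetical answer option
     "topics": ["startups"]},
]

frequent = topic_counts(responses, min_respondents=2)
```

With the default threshold of 20, the toy data yields no topics; the lower threshold is used here only so that the small example produces a result.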

The declared demographic of our qualified responses was primarily male (77%) and white (75%). We note that only 60% answered questions on demographics.

The Data Science and AI section is grateful for the support of our partner communities PyLadies London, PyData London, PyDataUK, London Machine Learning and the Apache Spark+AI Meetup, representing a combined (overlapping) membership of 27K data scientists and technologists.

December Newsletter

Hi everyone-

Properly dark and cold now in the UK, and even some initial sightings of Christmas trees so it must be getting to the end of year… perhaps time for some satisfying data science reading materials while pondering what present to buy for your long lost auntie!

Following is the December edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science December 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

On Tuesday 23rd November we hosted our latest event “The National AI Strategy – boom or bust to your career in data science?” and it was another great success with a strong turnout.

  • First of all Seb Krier, Senior Technology Policy Researcher at the Stanford University Cyber Policy Centre, gave an excellent overview of the published National AI strategy using his extensive experience to provide insight into the strengths and weaknesses of the different focus areas, and how it compares to different approaches around the world.
  • Next, Adam Davison and Martin Goodson talked through the results of our recent data science practitioner survey on the government strategy proposals, highlighting areas of discrepancy and omission.
  • We then finished with a lively round-table discussion, additionally including Stian Westlake, Chief Executive of the RSS and Janet Bastiman, Chief Data Scientist at Napier AI.

We will publish a more detailed review and video in the coming weeks for those who missed out.

If anyone is interested in getting more involved in this discussion, we are collaborating with the UK Government’s Office for AI to host a roundtable event on AI Governance and Regulation which is one of the 3 main pillars of the UK AI Strategy. We are seeking Data Science and AI experts and practitioners to participate – please express any interest by emailing weatheralljames@hotmail.com.

Many congratulations to DSS section committee’s Rich Pugh who has been elected to the RSS Council – joining the DSS’s Anjali Mazumder and Jim Weatherall… all part of our cunning plan for global domination!

Martin Goodson continues to run the excellent London Machine Learning meetup and is very active with events. The last talk was on October 27th where Anees Kazi, senior research scientist at the chair of Computer Aided Medical Procedure and Augmented Reality (CAMPAR) at Technical University of Munich, discussed “Graph Convolutional Networks for Disease Prediction”. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"This change will represent one of the largest shifts in facial recognition usage in the technology’s history. More than a third of Facebook’s daily active users have opted in to our Face Recognition setting and are able to be recognized, and its removal will result in the deletion of more than a billion people’s individual facial recognition templates."
For example, asking AI to cure cancer as quickly as possible could be dangerous. “It would probably find ways of inducing tumours in the whole human population, so that it could run millions of experiments in parallel, using all of us as guinea pigs,” said Russell. “And that’s because that’s the solution to the objective we gave it; we just forgot to specify that you can’t use humans as guinea pigs and you can’t use up the whole GDP of the world to run your experiments and you can’t do this and you can’t do that.”

Developments in Data Science…
As always, lots of new developments…

“The brain is able to use information coming from the skin as if it were coming from the eyes. We don’t see with the eyes or hear with the ears, these are just the receptors, seeing and hearing in fact goes on in the brain.”
"This trend of massive investments of dozens of millions of dollars going into training ever more massive AI models appears to be here to stay, at least for now. Given these models are incredibly powerful this is very exciting, but the fact that primarily corporations with large monetary resources can create these models is worrying"

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

“Biology is likely far too complex and messy to ever be encapsulated as a simple set of neat mathematical equations. But just as mathematics turned out to be the right description language for physics, biology may turn out to be the perfect type of regime for the application of AI.”
“There was no problem with the algorithm as long as they stay within the boundaries of the business model and buy cookie-cutter homes that are easier to sell. There are a lot of things that affect the valuation of homes that even very sophisticated algorithms cannot catch"

How does that work?
A new section on understanding different approaches and techniques

"Before we start, just a heads-up. We're going to be talking a lot about matrix multiplications and touching on backpropagation (the algorithm for training the model), but you don't need to know any of it beforehand. We'll add the concepts we need one at a time, with explanation.."
For example, speech recognition systems need to disambiguate between phonetically similar phrases like “recognize speech” and “wreck a nice beach”, and a language model can help pick the one that sounds the most natural in a given context. For instance, a speech recognition system transcribing a lecture on audio systems should likely prefer "recognize speech", whereas a news flash about an extraterrestrial invasion of Miami should likely prefer "wreck a nice beach".
"But I am going to define this stuff three times. Once for mum, once for dad, and once for the country."

Practical tips
How to drive analytics and ML into production

  • We’ve previously highlighted the importance of MLOps and the standardisation of processes for updating and monitoring ML models in production. Another good podcast from ‘The Data Exchange’, this time about ML Ops Anti-Patterns (the underlying research paper is here)
  • Speaking of MLOps – excellent summary of the platforms used across the big players, highlighting how much is still ‘home grown’ (labeled ‘IH’ below)
"Machine learning systems are extremely complex, and have a frustrating ability to erode abstractions between software components. This presents a wide array of challenges to the kind of iterative development that is essential for ML success.”

Bigger picture ideas
Longer thought provoking reads – a few more than normal, lean back and pour a drink!

"Abundant evidence and decades of sustained research suggest that the brain cannot simply be assembling sensory information, as though it were putting together a jigsaw puzzle, to perceive its surroundings. This is borne out by the fact that the brain can construct a scene based on the light entering our eyes, even when the incoming information is noisy and ambiguous."
"I would love to incorporate deep learning into the design, manufacturing, and operations of our aircraft. But I need some guarantees."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

Covid Corner

As we head into winter, we continue to experience the conflicting emotions of relaxing regulations and behaviour with increasing Covid prevalence and hospitals at breaking point. And now there is news of a new variant…

"Whatever the reason, by half-term, only around 16 per cent of vaccinations in the cohort had been achieved. Meanwhile, school-age kids had caught Covid by the truckload. Over 7 per cent of the entire Year 7 to Year 11 cohort was infected on any day in the last week of October alone. Maybe that was the unspoken plan. Certainly the JCVI’s minutes – released at the end of October after lengthy delays – make grim reading in this respect. The idea, already noted, that “natural infection” might be better than vaccination for young people was under discussion even here. Somehow, catching Covid was proffered as a better way of not getting ill with Covid than preventing its worst effects with a proven vaccine."

Updates from Members and Contributors

  • Professor Harin Sellahewa reports that nearly 50 of the University of Buckingham’s first ever master’s level data science apprentices have graduated. The Integrated Master’s level Degree Apprenticeship course was set up two years ago to help address an urgent shortage of people with advanced digital skills and to produce expert data scientists by giving them the technological and business skills to transform their workplace. The graduates receive the MSc in Applied Data Science from Buckingham as well as the Level 7 Digital and Technology Solutions Specialist degree apprenticeship certificate from ESFA. The apprenticeship is provided in partnership with AVADO who work with businesses to train staff to develop the skills needed to compete in a digital world. Industry partners such as IBM, Tableau, TigerGraph and Zizo conducted practical workshops for the learners.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

November Newsletter

Hi everyone-

The clocks have changed – officially the end of ‘daylight savings’ in the UK – does that mean we no longer try and save daylight? Certainly feels that way … definitely time for some satisfying data science reading materials while drying out from the rain!

Following is the November edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science November 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

We are pleased to announce our next virtual DSS meetup event, on Tuesday 23rd November at 5pm: “The National AI Strategy – boom or bust to your career in data science?”. Following on from our commentary on the UK Government’s AI Strategy (based on the excellent feedback from our community), and the pick-up we have received, we are going to run a focused event discussing this topic. You will hear key information about the strategy and have the opportunity to ask questions, provide input, and hear a panel of experts discuss the implications of the strategy for practitioners of AI in the UK. Save the date- all welcome!

Of course, the RSS never sleeps… so preparation for next year’s conference, which will take place in Aberdeen, Scotland from 12-15 September 2022, is already underway. The RSS is inviting proposals for invited topic sessions. These are put together by an individual, group of individuals or an organisation with a set of speakers who they invite to speak on a particular topic. The conference provides one of the best opportunities in the UK for anyone interested in statistics and data science to come together to share knowledge and network. Deadline for proposals is November 18th.

Martin Goodson continues to run the excellent London Machine Learning meetup and is very active with events. The last talk was on October 27th where Anees Kazi, senior research scientist at the chair of Computer Aided Medical Procedure and Augmented Reality (CAMPAR) at Technical University of Munich, discussed “Graph Convolutional Networks for Disease Prediction”. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

“As far as we can tell, the algorithm is using problematic and biased criteria, like nationality, to choose which “stream” you get in. People from rich white countries get “Speedy Boarding”; poorer people of colour get pushed to the back of the queue.”
"Facebook has been unwilling to accept even little slivers of profit being sacrificed for safety"
"In a competitive marketplace, it may seem easier to cut corners. But it’s unacceptable to create AI systems that will harm many people, just as it’s unacceptable to create pharmaceuticals and other products—whether cars, children’s toys, or medical devices—that will harm many people."

Developments in Data Science…
As always, lots of new developments…

  • Before delving into the research, it’s sometimes useful to step back and observe the lie of the land. Interesting perspective here on how the major players have ended up focusing in slightly different areas of deep learning research
"It is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the performance on the task with the lowest accuracy)."
"Classification, extractive question answering, and multiple choice tasks benefit so much from additional examples that collecting a few hundred examples is often "worth" billions of parameters"
  • The extent to which you can use synthetic data in machine learning always generates discussion. Microsoft Research highlights you can go far with facial analysis, with the potential benefits of improving diversity in data sets.
  • The annual ‘State of AI’ report is always a weighty tome – this year’s comes in at 188 slides… Worth a skim to see what people are working on, but perhaps be wary of the predictions…
  • This is very relevant – ‘editing’ models. We have talked about how some of the large data sets used to train the leading image and language models have questionable data quality. Is there a way of removing the influence of particular erroneous data points from the final model when they are identified? Researchers at Stanford University think so
"MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models; once trained MEND enables rapid application of new edits to the pre-trained model. Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters"
"By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks"

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

  • Google has announced it plans to include multi-modal models in its search algorithms- learning from the linkages between text and images- good commentary here
“It holds out the promise that we can ask very complex queries and break them down into a set of simpler components, where you can get results for the different, simpler queries and then stitch them together to understand what you really want.”
"To compute the embedding of the tabular context, it first uses a BERT-based architecture to encode several rows above and below the target cell (together with the header row). The content in each cell includes its data type (such as numeric, string, etc.) and its value, and the cell contents present in the same row are concatenated together into a token sequence to be embedded using the BERT encoder”

How does that work?
A new section on understanding different approaches and techniques

  • Why do neural networks generalise so well? Good question… let the BAIR help you out (well worth a read – note you may need to reload the page as it doesn’t seem to take in-bound links)
"Perhaps the greatest of these mysteries has been the question of generalization: why do the functions learned by neural networks generalize so well to unseen data? From the perspective of classical ML, neural nets’ high performance is a surprise given that they are so overparameterized that they could easily represent countless poorly-generalizing functions."

Practical tips
How to drive analytics and ML into production

"Nobody cared that I speak 5 languages, that I know a bunch about how microcontrollers work in the tiniest of details, how an analog high-frequency circuit is built from bare metal, and how computers actually work. All of that is abstracted away. You only need…algorithms & data structures pretty much.”

Bigger picture ideas
Longer thought provoking reads – a few more than normal, lean back and pour a drink!

"A number of researchers are showing that idealized versions of these powerful networks are mathematically equivalent to older, simpler machine learning models called kernel machines. If this equivalence can be extended beyond idealized neural networks, it may explain how practical ANNs achieve their astonishing results."
"I wrote earlier this year about Morioka Shoten, a bookshop in Tokyo that only sells one book, and you could see this as an extreme reaction to a problem of infinite choice. Of course, like all these solutions it really only relocates the problem, because now you have to know about the shop instead of having to know about the book"

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

"Transcribing Japanese cursive writing found in historical literary works like this one is usually an arduous task even for experienced researchers. So we tested a machine learning model called KuroNet to transcribe these historical scripts."
"A competition focused on helping advance development of next-generation virtual assistants that will assist humans in completing real-world tasks by harnessing generalizable AI methodologies such as continuous learning, teachable AI, multimodal understanding, and reasoning"

Covid Corner

Although life seems to be returning to normal for many people in the UK, there is still lots of uncertainty on the Covid front… booster vaccinations are now rolling out in the UK, which is good news, but we still have exceedingly high community covid case levels due to the Delta variant and rising hospitalisations…

"From the viewpoint of some JCVI members, children aren’t independent agents with a right to be protected from a potentially dangerous virus. Rather, because they can serve as human shields for more vulnerable adults, it’s downright good when children get sick. They explicitly stated that “natural infection in children could have substantial long-term benefits for COVID-19 in the UK.”  Not only is this scientific nonsense, as the high number of infections in the UK clearly shows, it’s a moral abomination"

Updates from Members and Contributors

  • Sorry we didn’t do more publicity around PyData Global 2021 … it just happened last week. Many congrats to Kevin O’Brien, one of the main organisers, and to Marco Gorelli for his talk on Bayesian Ordered Logistic Regression!
  • Ronald Richman has just published a new paper on explainable deep learning which looks very interesting.
  • Sarah Phelps invites everyone to what looks to be an excellent webinar hosted by the UK ONS Data Science Campus:
    • “The UK Office for National Statistics Data Science Campus and UNECE HLG-MOS invite you to join them for the ONS-UNECE Machine Learning Group 2021 Webinar on 19 November.”
    • “The webinar will provide an opportunity to learn about the progress that the Group has made this year in its different work areas, from coding and classification and satellite imagery to operationalisation and data ethics. Bringing together colleagues from across the global official statistics community, it will include contributions from senior figures in the data science divisions of various NSOs as well as discussion on the priorities for advancing the use of machine learning in official statistics in 2022.”

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

October Newsletter

Hi everyone-

I guess summer is over, what there was of it- I was hoping we might get a bit of autumn sunshine but it feels like it’s big coat weather already… definitely time for some tasty data science reading materials in front of a warm fire!

Following is the October edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science October 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

First of all, we have a new name… Data Science and AI Section! To be honest, we’ve always talked about machine learning and artificial intelligence, and have some very experienced practitioners both on the committee and in our network, so it doesn’t really change our focus. It is nice to have it officially recognised by the RSS though.

Thank you all for taking the time to fill in our survey responding to the UK Government’s proposed AI Strategy. As you may have seen, Martin Goodson, our chair, summarised some of the findings in a recent post, highlighting the significant gaps in the government’s proposed approach based on comments from you. Some of these gaps, particularly on open-source, have now been publicly acknowledged, multiple times. In addition, Martin and Jim Weatherall met with Sana Khareghani (director of the Office for AI) and Tabitha Goldstaub (chair of the AI Council) to further advocate for our community’s needs. Sana agreed that the Office for AI will run workshops together with the RSS focused on the technical practitioner community, in order to gain their perspective and identify their needs.

“Confessions of a Data Scientist” seemed to go down very well at the recent RSS conference- massive thanks to Louisa Nolan for making it so successful, and to you all for your contributions.

Of course, the RSS never sleeps… so preparation for next year’s conference, which will take place in Aberdeen, Scotland from 12-15 September 2022, is already underway. The RSS is inviting proposals for invited topic sessions. These are put together by an individual, group of individuals or an organisation with a set of speakers who they invite to speak on a particular topic. The conference provides one of the best opportunities in the UK for anyone interested in statistics and data science to come together to share knowledge and network. Deadline for proposals is November 18th.

Martin Goodson continues to run the excellent London Machine Learning meetup and is very active with events. The last talk was on September 7th, where Thomas Kipf, Research Scientist at Google Research in the Brain Team in Amsterdam, discussed “Relational Structure Discovery”. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

“Artificial intelligence can be a force for good, helping societies overcome some of the great challenges of our times. But AI technologies can have negative, even catastrophic, effects if they are used without sufficient regard to how they affect people’s human rights”
"Depoliticizing people’s feeds makes sense for a company that is perpetually in hot water for its alleged impact on politics"
"We don’t want viewers regretting the videos they spend time watching and realized we needed to do even more to measure how much value you get from your time on YouTube."

Developments in Data Science…
As always, lots of new developments… thought we’d have a more extended look at some of the new research this month

  • Plenty of great arXiv papers out there this month- I know these can be a bit dry, so will try and give a bit of context…
    • One theme of research we have been following is “fewer-shot” training of models. Fundamentally, humans don’t need millions of examples of an orange before being able to identify one, so learning from limited examples should be possible. Large language models like GPT-3 have shown great promise in this area: given a few “prompts” (question and answer examples), they seem to be able to produce remarkable results on this type of problem. Sadly, this paper, “true few-shot learning”, suggests we need a more standardised approach to example selection, as previous results may have been artificially inflated by biased approaches.
    • More positively, “Can you learn an algorithm” talks through recent research showing that simple recurrent neural networks can learn approaches that can be successfully applied to larger scale problems, just as humans can learn from toy examples. Similarly, a new sequence to sequence learning approach from MIT CSAIL includes a component that learns “grammar” across examples.
    • Another popular research theme is simplifying architecture and reducing processing. A team at Google Brain have shown (“Pay Attention to MLPs”) that you can almost replicate the performance of transformers (a more complex deep learning architecture) with a simpler approach based on basic building blocks (multi-layer perceptrons).
    • GANs (generative adversarial networks) are pretty cool – they generate new similar looking examples from input data (see here for an intro). A recent paper (GAN’s N’ Roses) takes this to a new level, generating stable video from an input and a theme. (“GAN’s N’ Roses” is clearly a popular meme – this tutorial predates the paper by 4 years!)
  • Of course the big industrial research powerhouses (Google/DeepMind, Facebook etc.) keep churning out fantastic work:
“We would like our agents to leverage knowledge acquired in previous tasks to learn a new task more quickly, in the same way that a cook will have an easier time learning a new recipe than someone who has never prepared a dish before"
  • Finally, one paper I encourage everyone to read- “A Farewell to the Bias-Variance Tradeoff?”, on one of the conundrums I still struggle to fully understand … why is it that over-parameterised models (those which seem to have far too many parameters given the data set they are trained on) are able to generalise so well?
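
For anyone who wants a reminder of the classical tradeoff the paper is saying farewell to, here is the standard textbook decomposition of expected squared error. The notation is an assumption for illustration and is not taken from the paper: data are generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}_D$ is a model fitted on a randomly drawn training set $D$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The classical intuition is that adding parameters lowers the bias term but raises the variance term; the puzzle with over-parameterised models is that the variance term does not seem to blow up in the way this intuition predicts.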

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

  • Great article in Wired on the development of large language models outside of the US, and the English language
"What's surprising about these large language models is how much they know about how the world works simply from reading all the stuff that they can find"
"It is a pioneering program that’s mixing responsible AI and science with indigenous led knowledge and solving complex environmental management problems at spots in Northern Australia"
  • We don’t hear much from Amazon about their use of AI, although clearly they have very advanced applications across their business. This was an interesting post digging into the practical problem of how you help delivery workers find the actual entrance to a given residence, from noisy data.
  • “In this project, we’ve trained physically simulated humanoids to play a simplified version of 2v2 football” …. and there’s video!
  • And the Boston Dynamics robots continue to fascinate/scare in equal measure… they can now do Parkour!
"On the Atlas project, we use parkour as an experimental theme to study problems related to rapid behavior creation, dynamic locomotion, and connections between perception and control that allow the robot to adapt – quite literally – on the fly."
"Everyone was floored, there was a lot of press, and then it was radio silence, basically. You’re in this weird situation where there’s been this major advance in your field, but you can’t build on it.”

How does that work?
A new section on understanding different approaches and techniques

  • Hyper-parameter optimisation can often require more art than science if you don’t have a systematic approach- some useful tips here using Argo
  • There are lots of different activation functions (defining the output from given inputs) you can use in neural networks, but which one should you use for a given task? Useful paper here.
  • Interesting comparison: using meme search to explore the performance of different image encoders, in particular CLIP from OpenAI vs Google’s Big Transfer
  • I’m not a massive fan of media-mix modelling (building models that optimise marketing expenditure based on historic performance) because it always feels like there is so much fundamentally missing in the underlying data sets. However, they can certainly be useful, and using a Bayesian approach would seem to be a good way to go (more detail here)
"The Bayesian approach allows prior knowledge to be elegantly incorporated into the model and quantified with the appropriate mathematical distributions."

Practical tips
How to drive analytics and ML into production

"Companies that are starting with the problem first, improving on a defined metric and reach ML as a solution naturally are the ones that will treat their models as a continuously developing product”

Bigger picture ideas
Longer thought provoking reads

"the modern data stack isn't enough. We have to create a modern data experience."
"We call for the replacement of the deep network technology to make it closer to how the brain works by replacing each simple unit in the deep network today with a unit that represents a neuron, which is already—on its own—deep"

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

What’s interesting with that system, contrary to classical game development, is that you don’t need to hard-code every interaction. Instead, you use a language model that selects what’s robot possible action is the most appropriate given user input.
Our goal is to create a formal call for blog posts at ICLR to incentivize and reward researchers to review past work and summarize the outcomes, develop new intuitions, or highlight some shortcomings.

Covid Corner

Although life seems to be returning to normal for many people in the UK, there is still lots of uncertainty on the Covid front… vaccinations keep progressing in the UK, which is good news, but we still have high community covid case levels due to the Delta variant…

"By comparing Eva’s performance against modelled counterfactual scenarios, we show that Eva identified 1.85 times as many asymptomatic, infected travellers as random surveillance testing, with up to 2-4 times as many during peak travel, and 1.25-1.45 times as many asymptomatic, infected travellers as testing policies that only utilize epidemiological metrics."

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

September Newsletter

Hi everyone-

I don’t know about you, but that didn’t feel particularly August-like…. I miss the sun! Perhaps September will save the summer, together with some inspiration from the Paralympics … How about a few curated data science materials for perusing during the lull in the wheelchair rugby final?

Following is the September edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science September 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Thank you all for taking the time to fill in our survey responding to the UK Government’s proposed AI Strategy. We are working on a series of posts digging into the results which we hope will be thought provoking.

This year’s RSS Conference is almost here (Manchester from 6-9 September, register here), with some great keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers. There is online access to over 40 hours of content at the conference covering a wide variety of topics. The full list of the online content can be found here. We really hope to see you all there, particularly at “Confessions of a Data Scientist” (11:40-13:00 Tuesday, 7 September), chaired by Data Science Section committee member Louisa Nolan.  

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with events. The next talk is on September 7th when Thomas Kipf, Research Scientist at Google Research in the Brain Team in Amsterdam, will discuss “Relational Structure Discovery”. Videos are posted on the meetup youtube channel – and future events will be posted here.

Many congratulations to Martin and the team at evolution.ai for winning the Leading Innovators in Data Extraction Award during the FinTech Awards 2021!

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"The fact that diagnostic models recognize race in medical scans is startling. The mystery of how they do it only adds fuel to worries that AI could magnify existing racial disparities in health care"
  • The Stanford Institute for Human-Centered Artificial Intelligence released a comprehensive review of the opportunities and risks of what it calls “Foundation Models” – these are models (such as BERT, DALL-E, and GPT-3) that are trained on “broad data at scale and are adaptable to a wide range of downstream tasks”
    • The research paper is a weighty tome (available here) but definitely worth a look
    • A good review can be found here
"They create a single point of failure, so any defects, any biases which these models have, any security vulnerabilities . . . are just blindly inherited by all the downstream tasks"
  • Of course the models and algorithms could be perfect, but still cause harm if they are not solving the right problem, or the outputs are not used in the right way
    • Motherboard reports that police are apparently attempting to have evidence generated by a gunshot-detecting AI system altered
    • And a short but well reasoned piece in defence of algorithms:
"These algorithms aren’t “mutant” in any meaningful sense – their outcomes are the inevitable consequence of decisions made during their design"

Developments in Data Science…
As always, lots of new developments…

  • All sorts of activity in the reinforcement learning/robotics space this month:
“As far as I know, this is an entirely unprecedented level of generality for a reinforcement-learning agent"
  • As always, lots of research is going on in the deep learning architecture space:
  • Similarly investigation into methods that learn from smaller data sets continues
    • Researchers at Facebook, PSL Research and NYU have developed an elegant unsupervised pre-training method called VICReg that attempts to minimise issues of variance (identical representations for different inputs), invariance (dissimilar representations for inputs that humans find similar) and covariance (redundant parts of a representation)- this shows great promise for aiding more efficient use of pre-training and data augmentation
    • This paper also gives a good survey of data augmentation methods for Deep Learning
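
To make the three VICReg terms a little more concrete, here is a minimal NumPy sketch of the loss. It is an illustration of the idea as described above, not the authors' implementation: the function name `vicreg_loss`, the default weights, `gamma`, `eps` and the toy inputs are all assumptions, and details such as normalisation may differ from the released code.

```python
import numpy as np


def vicreg_loss(z_a, z_b, sim_weight=25.0, var_weight=25.0, cov_weight=1.0,
                gamma=1.0, eps=1e-4):
    """Return (total, terms) for two (batch, dim) arrays of embeddings.

    z_a and z_b are embeddings of two augmented views of the same inputs.
    The weights and gamma are illustrative defaults, not tuned values.
    """
    n, d = z_a.shape

    # Invariance: embeddings of the two views of one input should be close.
    invariance = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    def variance_term(z):
        # Hinge on the per-dimension standard deviation: penalise any
        # dimension whose spread across the batch falls below gamma.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance_term(z):
        # Penalise off-diagonal covariance so that dimensions carry
        # non-redundant information.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    terms = {
        "invariance": invariance,
        "variance": variance_term(z_a) + variance_term(z_b),
        "covariance": covariance_term(z_a) + covariance_term(z_b),
    }
    total = (sim_weight * terms["invariance"]
             + var_weight * terms["variance"]
             + cov_weight * terms["covariance"])
    return float(total), terms


# Collapsed embeddings: every input maps to the same point.
collapsed = np.ones((8, 4))
collapsed_total, collapsed_terms = vicreg_loss(collapsed, collapsed)

# Spread-out, decorrelated embeddings that agree across the two views.
spread = np.array([[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]])
spread_total, spread_terms = vicreg_loss(spread, spread)

print(collapsed_total, spread_total)
```

In this toy run the collapsed embeddings are penalised purely through the variance term (the two views agree perfectly, so the invariance term is zero), while the spread-out, decorrelated embeddings incur no penalty at all. That is the sense in which the variance term guards against the "identical representations for different inputs" failure mode.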

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

"If we intervene early, the treatments can kick in early and slow down the progression of the disease and at the same time avoid more damage"
"Another method that we found to be effective was the use of unsupervised self-training. We prepared a set of 100 million satellite images from across Africa, and filtered these to a subset of 8.7 million images that mostly contained buildings. This dataset was used for self-training using the Noisy Student method, in which the output of the best building detection model from the previous stage is used as a ‘teacher’ to then train a ‘student’ model that makes similar predictions from augmented images."

How does that work?
A new section on understanding different approaches and techniques

"ML is notoriously bad at this inverse causality type of problems. They require us to answer “what if” questions, what Economists call counterfactuals. What would happen if instead of this price I’m currently asking for my merchandise, I use another price?"

Practical tips
How to drive analytics and ML into production

"Analytics isn’t primarily technical. While technical skills are useful, they’re not what separate average analysts from great ones."

Bigger picture ideas
Longer thought provoking reads

If you tell me a story and I say, ‘Oh, the same thing happened to me,’ literally the same thing did not happen to me that happened to you, but I can make a mapping that makes it seem very analogous. It’s something that we humans do all the time without even realizing we’re doing it. We’re swimming in this sea of analogies constantly.
"There’s a slightly humorous stereotype about computational complexity that says what we often end up doing is taking a problem that is solved a lot of the time in practice and proving that it’s actually very difficult"

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

All of the images in this post were synthesized by a combination of several machine learning models, directed by text that I provided, VQGAN for generation, and CLIP for directing the image to match the text.

Covid Corner

Still lots of uncertainty on the Covid front… vaccinations keep progressing in the UK, which is good news, but we still have very high community covid case levels due to the Delta variant…

“In the end, many hundreds of predictive tools were developed. None of them made a real difference, and some were potentially harmful.”

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

August Newsletter

Hi everyone-

That was quick, August already, but at least we have had the occasional day when it properly feels like summer- and now we have some Olympics to watch which is always entertaining! … How about a few curated data science materials for reading while watching the marathon?

Following is the August edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science August 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

We are still working on releasing the video and a summary of the latest in our ‘Fireside chat’ series- an engaging and enlightening conversation with Anthony Goldbloom, founder and CEO of Kaggle. Sorry for the delay- we will post a link when it is available.

Thank you all for taking the time to fill in our survey responding to the UK Government’s proposed AI Strategy (If you haven’t already, you can still contribute here). We are passionate about making sure the government focuses on the right things in this area, and are now analysing the results which we will publish shortly.

The full programme for this year’s RSS Conference, which takes place in Manchester from 6-9 September, has been confirmed. The programme includes keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers. Registration is open.

Speaking of the RSS Conference, we are running a session there, and we need your help! We would like to hear stories about your worst mistakes in data science. From these, we will select common themes and topics, and create a crowd-sourced compilation of the deadliest sins of data science. These will be presented – anonymously – to our panel, for a live, interactive discussion in front of an audience, at our session on Tuesday 7 September, 11:40 – 13:00. We hope this will both entertain and inform. Maybe your pain can help save someone else’s (data science) soul… CONFESS YOUR SINS HERE – the survey is anonymous, we won’t embarrass anyone!

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. The most recent event was on July 14th when Xavier Bresson, Associate Professor in the Department of Computer Science at the National University of Singapore, discussed “The Transformer Network for the Traveling Salesman Problem”. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"One gave our candidate a high score for English proficiency when she spoke only in German."
  • We talk about bias a fair amount, and it’s always good to define terms – this summary from the ACM (Association for Computing Machinery) gives a good overview. They split biases in AI systems into four sensible high level areas (as well as splitting out more specific types in each area):
    • Data-creation bias
    • Biases related to problem formulation
    • Biases related to the algorithm/data analysis
    • Biases related to evaluation/validation
  • It’s easy to overlook the first area highlighted above – data-creation bias. Often we train supervised learning models based on hand-labeled examples which we assume to be ‘correct’ but may not be. This article from O’Reilly talks through this issue and discusses different approaches (such as semi-supervised learning and weak supervision), while this article (from Sandeep Uttamchandani) gives some practical tips on data set selection for ML model building.
There is no such thing as gold labels: even the most well-known hand labeled datasets have label error rates of at least 5% (ImageNet has a label error rate of 5.8%!).
  • More positively, Apple has released information about their approach for face detection in photos, highlighting positive aspects such as on-device scoring, and fairness.
  • And this analysis charting the ‘data-for-good’ landscape shows it’s not all doom and gloom…

Developments in Data Science…
As always, lots of new developments…

  • When the ‘founding fathers’ of Deep Learning (Bengio, Hinton and LeCun) get together it’s always worth reading… here they discuss the future of Deep Learning and key research directions. They highlight key issues with existing approaches (large volumes of data for supervised learning or large numbers of iterations for reinforcement learning) but are not convinced by hybrid approaches including symbolic learning, believing research into more efficient learning from fewer examples will bear fruit.
“Humans and animals seem to be able to learn massive amounts of background knowledge about the world, largely by observation, in a task-independent manner. This knowledge underpins common sense and allows humans to learn complex tasks, such as driving, with just a few hours of practice.”
Interestingly, the ways that languages categorize color vary widely. Nonindustrialized cultures typically have far fewer words for colors than industrialized cultures. So while English has 11 words that everyone knows, the Papua-New Guinean language Berinmo has only five, and the Bolivian Amazonian language Tsimane’ has only three words that everyone knows, corresponding to black, white and red

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

"we’ve found that other approaches, such as reinforcement learning with human feedback, lead to faster progress in our reinforcement learning research"
"GitHub Copilot has been described as ‘magical’, ‘god send’, ‘seriously incredible work’, et cetera. I agree, it’s a pretty impressive tool, something I see myself using daily ... In my experience, Copilot excels at writing repetitive, tedious, boilerplate-y code. With minimal context, it can whip up a function that slices and dices a dataset, trains and evaluates several ml models, and, if you ask it nicely, also makes a nice batch of french fries"
  • Ok, so maybe not quite so practical, but still great fun – AI driven art out of Berkeley (‘Alien Dreams’)
"this CLIP method is more like a beautifully hacked together trick for using language to steer existing unconditional image generating models"
  • A useful rundown from DoorDash on how they use ML models to balance supply and demand, including some interesting discussion on optimisation approaches, which are often the way of turning an ML model into something that is used in decision making.

How does that work?
A new section on understanding different approaches and techniques

Diffusion models are a new type of generative models that are flexible enough to learn any arbitrarily complex data distribution while tractable to analytically evaluate the distribution

Getting it live
How to drive ML into production

  • Andrew Ng brings to life the challenges of building an AI product…
"Unsurprisingly, things did not go exactly as planned. Thus, this post is about what worked and what didn’t. I have focused on the most challenging aspects of trying to get data scientists to get review from their peers. I hope this helps others who wish to formalize peer review processes in data science"

Correlation or Causation?
A deep dive into causal analysis in machine learning

  • You have a machine learning model and it seems to perform great, not only on the training set, but even on hold out test sets- sorted right? It’s worth considering how you are going to use the model- if you are making predictions and using the output as is, then maybe you are ok; but if you are going to use the model for scenario planning, and counter-factual assessment (‘what-ifs?’) it would be worth thinking about causal analysis. Here’s a good starting point, from Jane Huang.
  • Here’s a useful example – estimating price elasticity
  • The technique often relies on something called ‘Double Machine Learning’
    • Overview here, with different implementations here and here and a worked example here
As any great technology, Double Machine Learning for causal inference has the potential to become pretty ubiquitous. But let’s calm the enthusiasm of this writer down and go back to our task
  • Finally, an intriguing approach for time series and econometrics… causal forests
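
To give a flavour of what Double Machine Learning does in the price elasticity setting, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data are made up, the helper names (`fit_linear`, `predict_linear`, `double_ml_effect`) are hypothetical, and plain linear regressions stand in for the flexible ML models (gradient boosting, random forests and so on) that would normally be used for the two "nuisance" predictions. The recipe itself is the usual one: predict the outcome and the treatment from the confounders on held-out folds, then regress the outcome residuals on the treatment residuals.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions from coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def double_ml_effect(X, t, y, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the effect of t on y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    t_res = np.empty(len(y))
    y_res = np.empty(len(y))
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Nuisance models are fitted on the other folds only (cross-fitting),
        # then used to residualise the held-out fold.
        t_res[test] = t[test] - predict_linear(fit_linear(X[train], t[train]), X[test])
        y_res[test] = y[test] - predict_linear(fit_linear(X[train], y[train]), X[test])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(t_res @ y_res / (t_res @ t_res))


# Simulated example: a confounder (think seasonality) pushes up both
# log price and log demand; the true elasticity is -1.5 by construction.
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 1))
log_price = 0.8 * X[:, 0] + rng.normal(size=n)
log_demand = -1.5 * log_price + 2.0 * X[:, 0] + rng.normal(size=n)

naive_slope = float(np.cov(log_price, log_demand)[0, 1] / np.var(log_price, ddof=1))
dml_slope = double_ml_effect(X, log_price, log_demand)

print(f"naive slope: {naive_slope:.2f}, double ML estimate: {dml_slope:.2f}")
```

In this simulation the naive regression of demand on price is pulled well away from the true elasticity by the confounder, while the residual-on-residual estimate lands close to it. With real data the confounders are rarely all observed, so treat the output of any such analysis with appropriate caution.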

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

How to get involved in the IRCAI AI Award 2021?

The International Research Centre in Artificial Intelligence under the auspices of UNESCO is launching an AI Award for individuals who have dedicated their work to solving problems related to the United Nations Sustainable Development Goals (SDGs) by means of the application of Artificial Intelligence.

Covid Corner

Not sure what to say here… vaccinations keep progressing in the UK, which is good news, but we now have what appear to be the highest covid case levels we have seen over the whole of the pandemic due to the Delta variant…

  • The latest ONS Coronavirus infection survey estimates the current prevalence of Covid in the community in England to be roughly 1 in 65 people, up from 1 in 75 the week before and an almost unbelievable increase from only June, when the estimate was 1 in 1100.
  • More or Less gives an excellent review of the Delta variant and how it has come to dominate other strains of coronavirus the world over
  • One of the core findings about Delta, as discussed by More or Less, is its apparent ability to transmit through vaccinated individuals (or those with antibodies from prior infections) – in other words vaccinations, while still protecting against the worst outcomes, are not as effective at reducing transmission.
  • This definitely raises the stakes of the recent UK governmental re-opening and relaxation of restrictions on July 19th (symbolically welcomed by the prime minister in self-isolation…) which has been roundly condemned by the scientific community
  • In addition, in a recent article in the Guardian, SAGE committee member Professor Robert West states that the government’s express intention is to allow infections to rip through the younger population, a very worrying statement.
“What we are seeing is a decision by the government to get as many people infected as possible, as quickly as possible, while using rhetoric about caution as a way of putting the blame on the public for the consequences”

Updates from Members and Contributors

  • Marco Gorelli announces the first official release (1.0.0) of his highly acclaimed nbQA repo, full of very useful code formatting features and pre-commit hooks for Jupyter notebooks
  • Alex Spanos will be presenting TrueLayer’s data science work at the RSS conference in Manchester (“An end-to-end Data Science workflow for building scalable and performant data enrichment APIs in Open Banking“) – another great reason to attend in September!
  • Mark Baillie highlights an upcoming special issue of the Biometrical Journal
    “Data scientists are frequently faced with an array of methods to choose from; often this makes selection difficult especially beyond one’s own particular interests and expertise. Neutral comparison studies are an essential cornerstone towards the improvement of this situation, providing evidence to help guide practitioners. For the special issue of Biometrical Journal we are interested in submissions that define, develop, discuss or illustrate concepts related to practical issues and improvement of neutral method comparison studies, as well as articles reporting well-designed neutral comparison studies of methods”

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

In memoriam

With great sadness I announce the untimely death of Rebecca Nettleship, a valued colleague and talented data scientist, on 22nd July 2021. She will be sorely missed. Our deepest condolences go out to her family and friends.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

July Newsletter

Hi everyone-

Not sure what happened to June – seemed to fly by – I know there were some lovely sunny days but then it got cold again… fingers crossed summer is not over already! … How about a few curated data science materials for reading in the garden, rain or shine?

Following is the July edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science July 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

We are working on releasing the video and a summary of the latest in our ‘Fireside chat’ series- an engaging and enlightening conversation with Anthony Goldbloom, founder and CEO of Kaggle. We will post a link when it is available.

We have released a survey to our readers and members focused on the UK Government’s proposed AI Strategy. We are passionate about making sure the government focuses on the right things in this area, and feel that, as the organisation representing technical Data Science and AI practitioners, we need to make sure our voice is heard. If you haven’t already, please give us your thoughts by participating here.

The full programme for this year’s RSS Conference, which takes place in Manchester from 6-9 September, has been confirmed.  The programme includes keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers.  Registration is open with early-bird discounts available until Friday 4 June. 

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. On June 30th, the meetup hosted Frank Willet (Research Scientist at Stanford University) for a talk titled “High-performance brain-to-text communication via handwriting“. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

Imagine a world where a state government, or other actor, can realistically manipulate images to show either nothing there or a different layout
"68% chose the option declaring that ethical principles focused primarily on the public good will not be employed in most AI systems by 2030"
Our method will facilitate deepfake detection and tracing in real-world settings, where the deepfake image itself is often the only information detectors have to work with.

Developments in Data Science…
As always, lots of new developments…

In remote sensing images, we can use temporal information to obtain pairs of images from the same location at different points in time, which we call seasonal positive pairs. Seasonal changes provide more semantically meaningful content than artificial transformations, and remote sensing images provide this natural augmentation for free.
  • Facebook have released ‘TextStyleBrush’ allowing you to emulate a text style in an image using just a single word
  • Generating realistic synthetic video is computationally intensive – new work out of UC Berkeley, called VideoGPT, uses novel approaches to make the whole process more efficient, allowing anyone to generate video on a standalone computer.
  • A Chinese Lab is challenging the supremacy of Google and OpenAI in the language model space with a model containing 1.7 trillion parameters. Interestingly, the original article seems to have been removed – although copies are still available online, with more technical details:
The Chinese lab claims that Wudao's sub-models achieved better performance than previous models, beating OpenAI’s CLIP and Google’s ALIGN on English image and text indexing in the Microsoft COCO dataset
"Will better engineering produce CNNs [Convolutional Neural Networks] that understand sameness and difference in the generalizable way that children do? Or are CNNs’ abstract-reasoning powers fundamentally limited, no matter how cleverly they’re built and trained?"

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

  • I’m not familiar with the underlying challenge, but I understand that this is a big breakthrough (Nature paper here): a team at Google has automated the design of the physical layout of computer chips using deep reinforcement learning.
  • This is pretty compelling- well worth a read: Facebook AI have released details of their advanced object recognition system which allows consumers to shop items from images. It uses an elegant compound approach, modelling the objects and attributes separately as well as multi-modal signals. Also good to see they are attempting to avoid bias by building and monitoring the models appropriately:
"As part of our ongoing efforts to improve the algorithmic fairness of models we build, we trained and evaluated our AI models across subgroups, including 15 countries and four age buckets."
“Welcome to Hardcore High School” bellowed the script kiddo. We had just gotten to the kindergarten level when the music and lights began to blink. I frowned. “What is that?”
“Beats me” said the A.I. As he walked down the halls, mimicking the sounds of the various musical instruments, he fiddled with the script kiddo a bit. “Welcome to Hardcore High School” He said again, a bit more softly this time.

How does that work?
A new section on understanding different approaches and techniques

Getting it live
How to drive ML into production

"On a daily average, there are over 4,000 models at Facebook running on PyTorch"
  • The importance of Data preparation and curation in the ML lifecycle is highlighted in this piece on Data Cascades from Google Research.
"One of the most common causes of data cascades is when models that are trained on noise-free datasets are deployed in the often-noisy real world. For example, a common type of data cascade originates from model drifts, which occur when target and independent variables deviate, resulting in less accurate models"

From Prediction to Decision
The art and science of decision making

  • Lovely extended essay from Hannah Fry on the history of graphs and how they help us understand data and make decisions
  • An excellent article published in HBR from Michael Ross on why company investments in AI often don’t generate the gains they expect (the asymmetric cost function is particularly interesting)
(1) They don’t ask the right question, and end up directing AI to solve the wrong problem. 
(2) They don’t recognize the differences between the value of being right and the costs of being wrong, and assume all prediction mistakes are equivalent. 
(3) They don’t leverage AI’s ability to make far more frequent and granular decisions, and keep following their old practices
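To make point (2) a little more concrete, here is a minimal sketch of how unequal costs change the decision you would take from the same prediction. It is our own illustration rather than anything from the HBR article: the function names and the cost figures are hypothetical, and it assumes the model outputs a well-calibrated probability.

```python
def expected_cost(p_positive: float, act: bool, cost_fp: float, cost_fn: float) -> float:
    """Expected cost of a decision, given the predicted probability of a positive case.

    cost_fp: cost of acting when the case turns out to be negative (hypothetical value).
    cost_fn: cost of not acting when the case turns out to be positive (hypothetical value).
    """
    if act:
        # Acting is only a mistake when the case is actually negative.
        return (1.0 - p_positive) * cost_fp
    # Not acting is only a mistake when the case is actually positive.
    return p_positive * cost_fn


def should_act(p_positive: float, cost_fp: float, cost_fn: float) -> bool:
    """Act whenever acting has the lower expected cost."""
    return expected_cost(p_positive, True, cost_fp, cost_fn) < expected_cost(
        p_positive, False, cost_fp, cost_fn
    )


def break_even_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability at which acting and not acting have equal expected cost."""
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    p = 0.2  # the model thinks a positive is fairly unlikely
    print(should_act(p, cost_fp=1.0, cost_fn=1.0))  # equal costs
    print(should_act(p, cost_fp=1.0, cost_fn=9.0))  # a miss is nine times worse
    print(break_even_threshold(cost_fp=1.0, cost_fn=9.0))
```

With equal costs the break-even probability is 0.5, so a 20% prediction would not trigger action; once a miss is assumed to be nine times as costly as a false alarm, the break-even point falls to 0.1 and the same prediction does. Treating all prediction mistakes as equivalent amounts to always using the 0.5 threshold.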

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

Covid Corner

Again, more positive progress in the UK on the Covid front with 45m people now having received their first vaccine dose and over 30m fully vaccinated. However, the new Delta variant originating in India is cause for concern and case rates and hospitalisations are now rising again.

Updates from Members and Contributors

Everyone must be out enjoying themselves as there are no specific updates from members and contributors this month- let me know if you’d like to include anything here next month.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

The UK AI Roadmap: your expert views needed

This year will see the first version of an AI Strategy from the UK government. Led by the Office for AI, this strategy will build on the AI Roadmap (which was published in January 2021).

If you work in data science or AI, the AI strategy will affect your career.

The Data Science Section of the Royal Statistical Society will ensure the voice of technical practitioners is heard – and that decisions are made with your interests in mind.

However, we cannot do this without your help. Please fill in the UK Artificial Intelligence Strategy Survey and give us your expert views. It will take less than 5 minutes to complete.

This is a great opportunity. The government is attempting to embrace data science and AI. Help us make sure the strategy focuses on the areas that will really make a difference.

Many thanks!

RSS plans for engaging with the development of the government’s AI strategy

The government’s AI Roadmap, published at the start of 2021, sets the direction for the development of a national AI strategy.
As the roadmap is developed into a strategy, there is a vital role for the RSS to play in setting out the role that statistics and data science have to play in the wider national strategy. There are two senses in which this perspective is important:

  • The RSS as a membership organisation has access to the experience and expertise of over a thousand professional data scientists who, as practitioners, focus on the types of issues which are central to the roadmap – there is an opportunity to strengthen the strategy by ensuring that these experiences are represented in the development of the strategy.
  • As they stand, the government’s plans do not show an appreciation of the role that statistics will have to play in the strategy. The disciplines of statistics and data science are closely related, and the role of statistics – as well as data science – should be reflected in the AI strategy.

The RSS – led by our Data Science Section – is kicking off a programme of work to shape the AI strategy and ensure that both the discipline of statistics and the experience of working data scientists are reflected in the strategy.

To start this work, we are planning to highlight a number of questions concerning the practice of data science in order to further inform the roadmap. We will be launching a survey soon to help gather intelligence from the community to support this work.

We are also planning a series of events and roundtables to discuss these issues. These events will help us share knowledge and refine our thinking, as well as engage directly with government stakeholders.

This is an important point in the development of AI as countries seek to position themselves as leaders in the field. The UK is well-positioned to lead on many areas of AI – but the strategy must be right and we hope to be able to help shape the strategy in the coming months.

RSS Chief Executive Stian Westlake said:

“The RSS welcomes the development of a national AI strategy, but it is important that the views of practitioners are represented in the process. With our strong data science section, the RSS is uniquely placed to access the perspective of practitioners and there is a vital role for us to play in ensuring that this is represented as the strategy develops.”

June Newsletter

Hi everyone-

It’s a bank holiday weekend – again – so that means it’s June and hopefully some warmer weather as May has definitely not delivered on that front … perhaps a few curated data science reading materials might prove useful for sunshine in the garden?

Following is the June edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science June 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

We are now ‘two for two’ on our ‘Fireside chat’ series! Following on from our fantastic discussion with Andrew Ng, Giles Pavey hosted an engaging and enlightening conversation with Anthony Goldbloom on May 20th. Anthony is founder and CEO of Kaggle (now a Google company), the world’s largest data science and machine learning community. There was a great deal of insight into the evolution of data science over the 10 years Kaggle has been running as well as lots of audience questions. We will distill the session down and publish a summary shortly.

We will soon be releasing a survey to our readers and members focused on the UK Government’s proposed AI Strategy. We are passionate about making sure the government focuses on the right things in this area, and feel like true Data Science and AI practitioners need to feed into this process. So when you see the survey, do please take the time to fill it out if you can!

The full programme for this year’s RSS Conference, which takes place in Manchester from 6-9 September, has been confirmed.  The programme includes keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers.  Registration is open with early-bird discounts available until Friday 4 June. 
In addition, the RSS now has a new accreditation – Data Analyst.

Data Analyst is a registered form of professional membership status that provides formal recognition of a member’s statistical training and work-based experience at entry level

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. The last event was on 24th May where Christian Szegedy, machine learning and AI researcher at Google Research, gave a talk titled ‘The Inverse Mindset of Machine Learning‘. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

The real danger wasn’t “Deep Fakes.” The real danger is cheap fakes, fakes that can be produced quickly, easily, in bulk, and at virtually no cost
  • Regulators are rightly becoming increasingly active in an attempt to combat these issues. This HBR article helps map out what organisations need to know to be prepared.
  • We all know how complex ML models are becoming and the scale at which some of them now operate, and so we have to be open to the fact that mistakes will happen. The critical question becomes: what do you do about it when the issue surfaces? Twitter has taken a positive and transparent approach to dealing with some of their previous bias related issues in automated cropping, releasing a detailed and technical analysis about why it was happening and the steps they are taking to remove the bias:
We want to thank you for sharing your open feedback and criticism of this algorithm with us. As we discussed in our recent blog post about our Responsible ML initiatives, Twitter is committed to providing more transparency around the ways we’re investigating and investing in understanding the potential harms that result from the use of algorithmic decision systems like ML.
  • Really interesting discussion on Kara Swisher’s Sway podcast with Daniel Kahneman (renowned behavioural economist – “Thinking, Fast and Slow”) delving into why we require much higher accuracy from computers and technology than from humans before we are willing to trust them.
  • And in a similar vein, this is thought provoking– does more data necessarily mean better decision making?
  • Less specifically focused on bias and ethics, but really interesting commentary from Benedict Evans on Amazon and how much it really knows about what it sells, touching on how much of a responsibility a platform has for moderation of its own recommendation content.
Of Amazon’s top 50 best-sellers in “Children's Vaccination & Immunisation”, close to 20 are by anti-vaccine polemicists, and 5 are novels about fictional pandemics

Developments in Data Science…
As always, lots of new developments…

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

How does that work?
A new section on understanding different approaches and techniques

Getting it live
How to drive ML into production

"For me, teaching this course was an unusual experience. MLOps standards and tools are still evolving, so it was exciting to survey the field and try to convey to you the cutting edge. I hope you will find it equally exciting to learn about this frontier of ML development, and that the skills you gain from this will help you build and deploy valuable ML systems." Andrew Ng

The Art of Visualisation
Making data science look right..

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

Covid Corner

Again, more positive progress in the UK on the Covid front with over 40m people now having received their first vaccine dose and over 25m fully vaccinated. However, the new variant originating in India is cause for concern.

 Experts gave a median estimate of 30,000 Covid deaths by the end of the year, whereas the non-experts said 20,000. The truth was around 75,000

Updates from Members and Contributors

  • Harald Carlens has put together a very useful comparison of cloud GPU services and pricing – definitely check it out if you are using deep learning in the cloud.
  • Lucie Burgess would like to announce an interesting set of discussions around the provenance and legality of automated decisions taking place on June 15th and June 22nd. Helix Data Innovation are running the sessions on behalf of the PLEAD project (King’s College London, University of Southampton, with partners Experian, Roke and Southampton Connect) – sign up here for what should be a good discussion on a very relevant topic
  • Kevin O’Brien highlights the upcoming UseR! 2021 conference on 5-9th of July – a must see for those R users out there

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

May Newsletter

Hi everyone-

It’s a bank holiday weekend, so it’s probably May and another month has flown by… I hope the excitement of venturing out from our cave-like lockdown has not proved too overwhelming … perhaps a few curated data science reading materials might prove relaxing?

Following is the May edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … A particularly strong section from Members and Contributors this month- good reason to read to the end! Also we are moving Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science May 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Fresh on the heels of our incredibly successful event with Andrew Ng, we are excited to announce the next instalment in the series at 6.30pm on Thursday May 20th. The RSS Data Science section invites you to a fireside conversation with Anthony Goldbloom – founder and CEO of Kaggle (now a Google company), the world’s largest data science and machine learning community with over 6MM members. Hear Anthony share his thoughts and experiences from the past 10 years at the forefront of competitive Machine Learning – sign up here to attend.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. The next event is on 10th May where Noam Brown, research scientist at Facebook AI in New York, will give a talk titled ‘AI for Imperfect-Information Games: Poker and Beyond‘. Videos are posted on the meetup youtube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"This is like trying to write a single law covering 'cars', that covers drunk driving, emissions standards, parking, and the tax treatment of highways.."
  • In addition, it is very hard to regulate ‘AI’ when it is far from clear we have a good definition of what ‘AI’ actually is, as our very own Martin Goodson points out in his recent blog post.
"The Act has already caused dismay amongst statisticians, who had no idea they were actually doing AI all along."
"accountability (n) - The act of holding someone else responsible for the consequences when your AI system fails."

Developments in Data Science…
As always, lots of new developments…

Real world applications of Data Science
Making a difference in the real world

Practical pointers on recommenders and search
Lots of good tips on search and recommendations this month

How does that work?
A new section on understanding different approaches and techniques

It’s all about the data
Which is more important… the data or the algorithm?

"Having clean data is in this category of “ghost knowledge” that, if you’ve been working in data for a long time, you know painfully from your own experience."
"Systematic improvement of data quality on a basic model is better than chasing the state-of-the-art models with low-quality data."

The Art of Visualisation
Making data science look right..

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Covid Corner

Again, more positive progress in the UK on the Covid front with over 35m people now having received their first vaccine dose and other metrics, such as deaths and hospitalisations, all progressing in the right direction.

  • The latest ONS Coronavirus infection survey estimates the current prevalence of Covid in the community in England to be roughly 1 in 1000 so we have come a considerable way from January, when prevalence peaked at around 1 in 50. It is interesting to note though that we are not yet back to where we were last summer, when it dropped to 1 in 2000.
  • Some very positive results regarding the efficacy of the various vaccines ‘in the wild’ against the current variants have been recently published in the BMJ.
"Vaccination with a single dose of Oxford-AstraZeneca or Pfizer-BioNTech vaccines, [] significantly reduced new SARS-CoV-2 infections in this large community surveillance study"
 “The new vaccine can be mass-produced in chicken eggs — the same eggs that produce billions of influenza vaccines every year in factories around the world”

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS

Anthony Goldbloom Fireside Chat – Sign Up Now!

We are delighted to announce the second instalment of our ‘Fireside Chat’ series.

Following fresh on the heels of our excellent conversation with Andrew Ng, Giles Pavey will be chatting by the virtual fireside with Anthony Goldbloom, the founder of Kaggle (now a Google company) – the world’s most influential competitive data science platform.

The event will take place at 6.30pm on Thursday May 20th – sign up here to attend.

Forbes has twice named Anthony one of the 30 under 30 in technology, the MIT Technology Review has named him as one of the 35 Innovators Under 35 and the University of Melbourne has given Anthony an Alumni of Distinction Award.

Join us to hear Anthony’s reflections on 10 years at the heart of applied AI, the wisdom of Kaggle’s global crowd of 7 million members and what he believes this new decade has in store for Data Science.

Don’t miss out on what we are sure will be a compelling discussion- sign up for the event here, and send any topics or questions you would like to see covered to Giles Pavey.

April Newsletter

Hi everyone-

Another month flies by… still cold, but I’ve definitely seen the sun once or twice… I hope the on-again off-again dreams of a proper summer holiday aren’t proving too painful … perhaps a few curated data science reading materials might ease the burden over the Easter weekend?

Following is the April edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science April 2021 Newsletter

RSS Data Science Section

Covid Corner

It definitely feels like progress, at least in the UK, on the Covid front, with over 30m people now having received their first vaccine dose. Supply issues notwithstanding, it is clear that the vaccine roll-out is progressing very well.

  • It is now over a year since the UK first went into lockdown to attempt to restrict the spread of the virus. It’s interesting to reflect on how much data and statistics have become part of general public discussion: we still have daily updates of a number of different metrics on the news and published in papers. ‘More or Less’ has a nice summary of the UK’s efforts to collate and disseminate the figures and how the centralised healthcare setup contrasts favourably with the US, which required volunteers to generate national figures in the Covid Tracking Project.
  • Despite (or perhaps because of) the proliferation of data, the statistics have been made to argue many sides of the same case as highlighted in this research from MIT, stressing the importance of good visualisations.
  • It has been quite a month for AstraZeneca …
 “Overall it’s a win for the world”

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Our first ‘Ethics Happy Hour’ on March 17th was very well received – see the write up here. The video recording will shortly be posted on youtube and we will publish links to it when it is available. Please let us know if you have any comments or would like to suggest topics for future events via email to dss.ethics@gmail.com

Fresh on the heels of our incredibly successful event with Andrew Ng, we are excited to announce the next instalment in the series. The RSS Data Science section invites you to a fireside conversation with Anthony Goldbloom – founder and CEO of Kaggle (now a Google company), the world’s largest data science and machine learning community with over 6MM members. Forbes has twice named Anthony one of the 30 under 30 in technology, the MIT Technology Review has named him as one of the 35 Innovators Under 35 and the University of Melbourne has given Anthony an Alumni of Distinction Award. Hear Anthony share his thoughts and experiences from the past 10 years at the forefront of competitive Machine Learning. Watch this space for more details!

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 7th April where Mike Lewis, research scientist at Facebook AI Research in Seattle, will give a talk titled ‘Beyond BERT: Representation Learning for Natural Language at Scale’. Videos are posted on the meetup youtube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"I will have a lot more to say about this later. But announcing a new org by a Black woman as if we’re all interchangeable while harassing, terrorizing and gaslighting my team and doing absolutely ZERO to acknowledge & redress the harm that’s been done is beyond gaslighting."
"Everything the company does and chooses not to do flows from a single motivation: Zuckerberg’s relentless desire for growth."

Developments in Data Science…
As always, lots of new developments…

The Practical side … getting stuff to work in production

"When a system isn’t performing well, many teams instinctually try to improve the Code. But for many practical applications, it’s more effective instead to focus on improving the Data."
"It’s a common joke that 80 percent of machine learning is actually data cleaning, as though that were a lesser task. My view is that if 80 percent of our work is data preparation, then ensuring data quality is the important work of a machine learning team."

How does that work?
A new section on understanding different approaches and techniques

Thinking about intelligence and bigger picture stuff
Stepping back from the code for a bit…

  • Thought provoking article proposing that “Computers will never write good novels” – definitely worth thinking through how much of this you agree with
"The best that computers can do is spit out word soups. They leave our neurons unmoved."
"Employees are far happier when they are led by people with deep expertise in the core activity of the business."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Marco Gorelli is running an excellent workshop on 10th April about contributing to Pandas. The workshop is being run in collaboration with PyLadies and is specifically targeting people from underrepresented genders in tech. Sign up for the morning session or the afternoon session.
  • Emre Kasim is running the brilliant Algo Conference which this year is taking place online on April 29th with a number of very relevant streams, including ‘Foundational AI’, ‘AI and Innovation’ and ‘Implications of AI and other Disruptive Technologies’- well worth signing up for here.
  • Alex Spanos highlights the upcoming Data Science Festival which in April is focused on Fintech- check out his talk on Data Science/Machine Learning and Open Banking APIs on April 15th.
  • Vijay Kumar Mishra, Research Scientist at Public Health Foundation of India, is running a 5-day online international workshop on “Designing and Conducting Clinical Trials” from the 3rd to the 7th of May. The workshop will be jointly conducted by Public Health Foundation of India, Sitaram Bhartia Institute of Science and Research, Paropakar Maternity and Women Hospital and University College London and will be aimed at providing a theoretical understanding of designing and conducting clinical trials. Contact Vijay (vijay.mishra@phfi.org) for more details.
  • Harin Sellahewa draws our attention to the 35 of 70 masters students entering their final assessment for the University of Buckingham MSc in Applied Data Science- best of luck to everyone!

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

The views expressed are our own and do not necessarily represent those of the RSS

The inaugural RSS Data Science Ethics Happy Hour

Event report by Giles Pavey, RSS Data Science Section Committee member

Wednesday March 17th saw the RSS Data Science Section host its first ‘Ethics Happy Hour’. Events in this new series provide an opportunity to discuss and meet other people interested in questions of AI ethics and data science ethics more broadly. Taking place in a relaxed and informal setting, our aim for these sessions is to stimulate intellectual exchange and contribute to community building around ethics in the context of data science.

The inaugural event took place virtually and focused on COVID-19. The discussion took the form of a panel chaired by RSS Data Science Section Committee member Dr Florian Ostmann with three experts sharing their thoughts on the ethics of data science in addressing the public health crisis:

  • Dr Zachary Lipton (Carnegie Mellon University) is a machine learning researcher and jazz saxophonist. He is currently an Assistant Professor of Operations Research and Machine Learning at Carnegie Mellon University, where he runs the Approximately Correct Machine Intelligence lab.
  • Dr Anjali Mazumder (RSS Data Science Section Committee / The Alan Turing Institute) is the Theme Lead on AI and Justice & Human Rights at the Alan Turing Institute and also a member of the Data Science Section Committee, among other RSS roles. Her research interests include Bayesian decision support systems, causal reasoning, detecting bias and algorithmic fairness, and responsible data sharing practices.
  • Dr Nicola Stingelin (RSS Data Ethics and Governance Section Committee) is a member of the RSS Data Ethics and Governance Section Committee and an associate researcher at the University of Basel. Building on business experience in the pharmaceutical sector, she acts in various advisory roles with a focus on the ethics of data innovation including big data, algorithms and public health data ethics in health care research and practice.

The event attracted around 35 attendees from across academia, business, and the public and charitable sectors. After the introduction there was a lively debate covering multiple aspects of the use of data and AI around the world in response to the pandemic and its ramifications.

Access to data was a major discussion point, with Nicola arguing that the importance of data in creating competitive advantage, especially in commerce, was causing obstacles to data sharing which could stall progress.

Zach conjectured that we should not rely on technical solutions to what might be considered primarily societal and philosophical problems such as access to data or AI resource. In answer to this point, Anjali suggested that there are at least some relevant problems where technical solutions can help. For instance, Privacy Enhancing Technologies, such as Differential Privacy, can enable insights whilst protecting individuals’ privacy.
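
As a rough illustration of the Differential Privacy idea Anjali mentioned, the minimal Python sketch below adds Laplace noise to a simple count. The function names, the example ages and the value of epsilon are all hypothetical and were not part of the panel discussion. It is a toy version for intuition only, and a real deployment would rely on a vetted library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent Exp(1) draws follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the items in `values` that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon is the
    standard choice for epsilon-differential privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of people in a small health dataset.
ages = [34, 71, 68, 25, 80, 59, 66]
rng = random.Random(42)
noisy_over_65 = private_count(ages, lambda age: age >= 65, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives a more accurate answer but weaker protection for any one individual.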

Another area of debate concerned differences between what we should expect from data and AI when thinking about micro (individual) level predictions and classifications versus higher, macro-level decision making. For example, is it acceptable to use thermal imaging AI models in public spaces for social control?

The discussion also touched on areas such as whether society is becoming too reliant on data and whether this is creating a digital divide; how we feel about a society where one has to have a smartphone to access services; and how things will develop as the populace is asked to share more and more data. Lastly, on the subject of speed of change: COVID-19 has highlighted tensions between the desire and need to move at a quick pace and the academic norm of considered peer review.

The event was drawn to a close (having overrun by 15 minutes) with a general agreement that the ethical issues are many and complex and that data science and statistical methods will offer key tools for us to navigate the future.

Please contact the RSS Data Science Section if you have any comments or would like to suggest topics for future events via email to dss.ethics@gmail.com. To stay informed about future ethics happy hours and other events organised by the Data Science Section, we recommend signing up to the Data Science Section mailing list.

Our First Ethics Happy Hour: March 17th, 5-6pm

Ethics Happy Hour on March 17th:
“Ethical data science in the context of COVID-19 — What are the most important issues?”

We are excited to host our first ‘Ethics Happy Hour’. Events in this new series provide an opportunity to discuss and meet other people interested in questions of AI ethics and data science ethics more broadly. Taking place in a relaxed and informal setting, our aim for these sessions is to stimulate intellectual exchange and contribute to community building around ethics in the context of data science. 

Our first event will take place virtually and focus on COVID-19. We are delighted that the following three experts have agreed to share their thoughts on the ethics of data science in addressing the public health crisis:

The hour-long session will take place on March 17th from 5 to 6pm. It will begin with each expert offering an initial take on the topic, drawing on their different areas of experience. This will be followed by an open discussion with the opportunity for all participants to share questions, comments, and contributions. 

To sign up for the event, please register here. 

As previously announced, the organising team is always looking for submissions of data science-related ethical challenges and dilemmas that community members have encountered in their professional lives and that are suitable to serve as case studies to be discussed in future sessions. If you have a suitable story to share, we look forward to hearing from you with an initial brief summary sent to dss.ethics@gmail.com 

March Newsletter

Hi everyone-

Another month flies by… At least it’s getting a bit lighter in the mornings and I’ve even seen the sun once or twice. I hope you are staying as sane as possible despite home-schooling, home-working, home-everything else (delete as appropriate…) … perhaps a few curated data science reading materials might lighten the mood?

Following is the March edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science March 2021 Newsletter

RSS Data Science Section

Covid Corner

The vaccination roll-out in the UK continues to progress well with now over 20m first doses delivered, and we even have a road-map out of lockdown… perhaps some light at the end of the tunnel.

"Both of these are working spectacularly well"
  • In additional positive vaccine news, a recent FDA review showed that the new Johnson and Johnson ‘one-shot’ vaccine appeared safe and effective in trials; and we also saw the first shipment of the AstraZeneca vaccine as part of the COVAX program, delivered to Ghana.
  • As we all know, the pandemic has thrown up a wide variety of new terms, metrics and statistics that can be easily misinterpreted or misunderstood – the RSS has published an excellent FAQ on Covid-19 measures and statistics which is well worth circulating.
  • The UK government has charted a cautious route out of lockdown. In sobering reading, this cautiousness was apparently linked to research commissioned from the teams at Imperial and Warwick University by the modelling group (SPI-M) in SAGE.
    • These models have proved surprisingly accurate, at least in terms of predicting the surge in cases over the winter.
    • This time both teams were asked to independently model the effect of different lockdown exit strategies and both reached similar conclusions- that lifting all restrictions by April 26th would likely drive another wave comparable in size to January 2021, resulting in a further 62,000 to 107,000 deaths in England.
  • The NHS Test and Trace App did not have the most auspicious beginnings, but recent research from the Alan Turing Institute indicates that it has indeed had a positive effect in reducing the impact of Covid.
  • The virus does seem to be in retreat in a number of countries around the world. The recent decrease in positive cases in the US is puzzling researchers somewhat (also covered by More or Less)- decreased testing? improved behaviour? vaccination roll-out? seasonality? herd immunity? … the upshot seems to be a little bit of everything, and we don’t really know.
  • Although the retreat is great news, the results in the US and elsewhere have been devastating and disproportionately felt. This recent study published in PNAS shows how life expectancy in the US has fallen by 1.13 years due to Covid, with “estimated reductions for the Black and Latino populations 3 to 4 times that for Whites”.
  • Finally a thoughtful piece from the Ada Lovelace Institute about vaccination passports and what role they could or should play in society.

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

Our Fireside Chat with Andrew Ng on February 10th was a roaring success. We had over 500 people attend what proved to be an entertaining and thought-provoking discussion on technical leadership in AI, artfully hosted by our chairman Martin Goodson, and introduced by the RSS President Sylvia Richardson. For those who missed it, here are the 5-minute edited highlights (and below if you are viewing on the blog) – check out the full video here.

We are excited to host our first ‘Ethics Happy Hour’, which will take place on March 17th from 5 to 6pm. As previously announced, events in this new series provide an opportunity to discuss and meet other people interested in questions of AI ethics and data science ethics more broadly. The first event will take place virtually and focus on COVID-19. We are delighted that the following three experts have agreed to share their thoughts on the ethics of data science in addressing the public health crisis:

  • Dr Zachary Lipton (Carnegie Mellon University)
  • Dr Anjali Mazumder (RSS Data Science Section Committee / The Alan Turing Institute)
  • Dr Nicola Stingelin (RSS Data Ethics and Governance Section Committee)

The event will begin with each expert offering an initial take on the topic, drawing on their different areas of experience. This will be followed by an open discussion with the opportunity for all participants to share questions, comments, and contributions. To sign up for the event, please register here.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon- we will keep you posted.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 8th March, where Mingxing Tan, a research scientist at Google Brain, will talk about AutoML for Efficient Vision Learning. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"We found that falsehood diffuses significantly farther, faster, deeper, and more broadly than the truth, in all categories of information, and in many cases by an order of magnitude... False news is more novel, and people are more likely to share novel information"
  • In addition they talk about what could be done to limit the power and reach of incumbent social networks
    • Portability and interoperability – as happened with mobile phone numbers, and instant messenger apps – is much more likely to succeed than splitting up the leading players, since the network effects naturally lead to another dominant player taking over.
  • Clearly, flagging or removing false information and inflammatory posts would be beneficial all around, but automating and scaling this process is very difficult, as this article about how ads for clothing for people with disabilities have been repeatedly banned highlights.

Developments in Data Science…
As always, lots of new developments…

“By having the human iteratively teach the model, it's possible to make a better model, in less time, with much less labelled data.”

The Art of Visualisation…

How does that work?
A new section on understanding different approaches and techniques

Thinking about intelligence…
How does the brain really work, how should we think about AI morality…

"Imagine it’s 2026. An autonomous public robocar is driving your two children to school, unsupervised by a human. Suddenly, three unfamiliar kids appear on the street ahead – and the pavement is too slick to stop in time. The only way to avoid killing the three kids on the street is to swerve into a flooded ditch, where your two children will almost certainly drown."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

The views expressed are our own and do not necessarily represent those of the RSS

February Newsletter

Hi everyone-

Well… January seemed to fly by. 2021 has certainly started with a bang (Brexit!, Impeachment!, New President!, Vaccinations!) and the holidays seem an age ago. I hope you are surviving lockdown 3.0 as best you can… maybe there is room in the long dark evenings for a few curated data science reading materials?

Following is the February edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science February 2021 Newsletter

RSS Data Science Section

Covid Corner

I keep thinking we might be able to drop the ‘Covid Corner’ section from the newsletter, but sadly the pandemic is still very much alive. The vaccination roll-out in the UK does seem to be going well, however, with over 9m first dose vaccinations made (as of Feb 1st) which is great news.

"one side claims that the tests are more than 90% effective at what they do; the other side says they could be as low as 3%, depending on what you mean by “effective”."
  • Finally, this feels like a very exciting development. The recent breakthroughs in natural language processing (NLP) and language models (like BERT and GPT-2/3) are at heart based on understanding the likelihood of different sequences of letters and words, codified into word embeddings (vector representations). Applying this approach to other fields (remember chess?) feels very elegant, and the MIT researchers in this case have used the underlying gene sequences (‘letters’) of viruses to train their model. From this they are able to predict likely virus mutations using sequence data alone:
"The model achieved 0.85 AUC in predicting SARS-CoV-2 variants that were highly infectious and capable of evading antibodies."

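For readers less familiar with the AUC figure quoted above, here is a minimal Python sketch of what the metric measures: the chance that a randomly chosen positive example is scored higher than a randomly chosen negative one. The function name and the toy labels and scores are hypothetical; this is not the MIT researchers' evaluation code.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC by comparing every positive score with every negative score.

    `labels` holds 1 for positives and 0 for negatives; `scores` holds the
    model's predicted scores in the same order. Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so 0.85 means the model ranks a true escape variant above a non-escape variant about 85% of the time.
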
Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

There is still time to register for our upcoming fireside chat with none other than Andrew Ng on February 10th. We are very excited for what is going to be a fantastic event: don’t miss out, sign up here.

As we previously announced we are looking forward to our first AI Ethics Happy Hour event – details to follow.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon- we will keep you posted.

Giles Pavey has been discussing what it takes to build world class data science teams.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 11th February, where Manzil Zaheer, a research scientist at Google, will talk about Big Bird: Transformers for Longer Sequences. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Finally, we are really pleased to include a call for contributions to RSS 2021 Conference, 6-9 September in Manchester. The organisers are seeking submissions for contributed talks which can be on any topic related to statistics and data science (deadline April 6th).

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Big Government and AI
Governments around the world mapping out grand AI plans…

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

AI in Healthcare
Increasing utilisation of AI and machine learning in healthcare…

  • Exciting announcement from the Korea Institute of Science and Technology who have developed a prostate cancer urine screening test using machine learning.
  • Interesting comment published in Nature discussing how recent applications of AI to ageing research are leading to the emergence of the field of longevity medicine.
  • We have seen a number of studies in recent times highlighting the power of deep learning techniques in medical imaging and the automatic assessment of resulting scans- this review article in Nature assesses the overall gains over the last decade.
  • As the previous article alludes to, going from prototype to real world production in a healthcare setting is far from simple, and this article from Rachel Thomas of fast.ai highlights some of the underlying issues.
  • Interestingly, the FDA in the US has released an action plan focused on methods for approving AI and machine learning based applications in health care.

Developments in Data Science…
As always, lots of new developments…

  • Fresh on the heels of GPT-3, OpenAI have released an amazing application, called DALL-E (Salvador Dali crossed with Pixar’s WALL-E…), a 12 billion parameter version of GPT-3 trained to generate images from text descriptions. You have to try this… Good summary here from MIT Technology Review.
“In the long run, you’re going to have models which understand both text and images. AI will be able to understand language better because it can see what words and sentences mean.”
  • Not to be outdone on the ‘my model has more parameters than your model’ stakes, Google recently announced their Switch Transformer Language Model with 1.6 trillion parameters.
  • Great summary, from Jeff Dean, head of Google AI, of Google’s research output in 2020 (over 800 publications) and what lies ahead for 2021. This is long, but well worth a read as it highlights the amazing breadth and depth of the output from the Google researchers.
"I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples"

How does that work?
A new section on understanding different approaches and techniques

Teams, people and production…
Still one of the biggest obstacles…

  • Interesting commentary from Gergely Orosz on the approach to motivating and empowering software engineers in Silicon Valley, very relevant also for Data Scientists and ML engineers.
  • What skills do you really need in your data team? Is it all about the models, or do you need more breadth on both the business and engineering sides?
  • How do you scale a team at different stages of development? Useful advice here from Peter Gao.
  • If you want to put in place proper monitoring of your ML systems but aren’t quite ready for a full blown MLOps solution, how about giving this a try, from Jeremy Jordan?
  • A pretty bland ‘top x trends in data’ title, but some useful pointers on best practices in building out a modern data stack.

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Adriano Soares Koshiyama highlights what looks like an excellent upcoming UCL webinar on AI in the Judicial System on Feb 25 at 1pm: “In this webinar we welcome Dr Pamela Ugwudike (University of Southampton, Alan Turing Institute) and Charles Kerrigan (CMS partner and global head of Fintech) to present their perspectives from academia and industry”. Register here.
  • Rafael Garcia-Navarro has been doing some impressive work in conversational AI, implementing on top of Metaflow (Netflix’s MLOps framework) – definitely worth a read.
  • Kevin O’Brien draws our attention to a great write-up on the Climate Modeling Alliance (CliMA) project and how they use Julia (“Meet the team shaking up climate models”). Also, don’t forget JuliaCon 2021 Wednesday 28th July to Friday 30th July 2021.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

The views expressed are our own and do not necessarily represent those of the RSS

Andrew Ng Fireside Chat – Sign Up Now!

We are incredibly honoured and excited to be hosting Andrew Ng for a fireside chat on 10th February at 6.30pm (Sign up here)

Many data scientists’ first encounter with Andrew Ng was through his Stanford University machine learning course – which has enrolled almost 4m people! However, some may be unaware that his contribution to AI, machine learning and data science goes much further.

In addition to his Professorship at Stanford University and co-founding of Coursera, he has been one of the most important drivers of the success of Deep Learning over the last decade. He founded and led the “Google Brain” project, which developed massive-scale deep learning algorithms, before moving on to lead Baidu’s 1,300-person AI Group, developing technologies in deep learning, speech, computer vision, NLP, and other areas.

More recently he has set up a number of initiatives including DeepLearning.AI, Landing.AI and the AI Fund, focused on promoting the practical use of AI to solve real world problems.

Our fireside chat will focus on technical leadership in artificial intelligence. We’ll be asking Andrew’s advice on:

  • How technical people can become effective AI leaders or entrepreneurs.
  • How to run a successful R&D team for AI product development.
  • How the UK can support a new generation of AI leaders.

The discussion will be hosted by Martin Goodson, Chair of the Data Science Section and CEO of artificial intelligence startup, Evolution AI. The event will be opened by the President of the Royal Statistical Society, Sylvia Richardson.

Don’t miss out on what we are sure will be a compelling discussion- sign up for the event here, and send any topics or questions you would like to see covered to Martin Goodson.

January Newsletter

Hi everyone-

Happy New Year! I hope you have all had a festive holiday period and found some time to catch up on those deep learning research papers you had been meaning to dig into… Fingers crossed 2021 proves better than 2020…. as a start, how about welcoming in the new year with a few curated data science reading materials!

Following is the January edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity …

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners.

Industrial Strength Data Science January 2021 Newsletter

RSS Data Science Section

Covid Corner

A new year but sadly not much change in the story – however with vaccinations now actively happening, an end does seem in sight, even if it seems tantalisingly far away.

  • A new strain of COVID-19 materialised in south east England. Although virus mutations happen all the time, this one was important as the strain appears significantly more transmissible. Its prevalence in positive tests appears strongly linked to dramatic rises in new cases.
  • An Imperial study modelling the case rates concludes that the new strain “has a transmission advantage of 0.4 to 0.7 in reproduction number compared to the previously observed strain.”
  • This report also highlights how statisticians and data scientists still need to work on the art of communication….
"Using whole genome prevalence of different genetic variants through time and phylodynamic modelling (dynamics of epidemiological and evolutionary processes), researchers show that this variant is growing rapidly."

Yes, quite…

  • And in case you missed it, the UK government managed to lose case data again… Clearly not learned from the last time.
  • There has, however, been fantastic news on the vaccine front, with vaccinations now rolling out around the world. As discussed in our previous newsletter, the mRNA approach used in the Moderna and BioNTech vaccines is a huge breakthrough. There is an excellent interview on the Andreessen Horowitz a16z podcast with Stéphane Bancel, the Moderna CEO, where he goes through the development process in detail, including how they generated the vaccine blueprint within 48 hours of receiving the virus’s genetic sequence.
We used to grow our vaccines, now we can “print” them.

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

As we announced just before Christmas, we are all incredibly excited about our upcoming fireside chat with none other than Andrew Ng on February 10th – save the date! We want to make the discussion as relevant to our community as possible, so do please send any topics or questions on becoming an AI technical leader to Martin (@martingoodson).

As we previously announced we are looking forward to our first AI Ethics Happy Hour event – details to follow.

The joint RSS/British Computer Society/Operational Research Society discussions on data science accreditation are picking up again and we are actively involved in these. We also hope to be posting our own version of a basic data science curriculum soon- we will keep you posted.

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and continues to be very active with virtual events. The next event is on 13th January, where Jakob Foerster from FacebookAI will discuss Zero-Shot (Human-AI) Co-ordination. Videos are posted on the meetup YouTube channel – and future events will be posted here.

Elsewhere in Data Science

Lots of non-Covid data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

Real world data science applications …
All sorts of great applications of data science and machine learning, regularly coming to light.

  • AirBnB have released an elegant new approach to dealing with positional bias in search rankings. If you are learning preferences from historical data, how do you deal with the fact that actions (clicks, likes etc) will be influenced by the position rank of the given item?
This creates a feedback loop, where listings ranked highly by previous models continue to maintain higher positions in the future, even when they could be misaligned with guest preferences.
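
To make the positional bias problem concrete, here is a minimal Python sketch of one generic correction, inverse propensity weighting, in which each click is up-weighted by how unlikely its position was to be examined. This is a textbook technique swapped in for illustration and is not a description of AirBnB’s approach; the function name, the impression log format and the examination probabilities are all hypothetical.

```python
from collections import defaultdict


def ipw_click_rates(impressions, position_propensity):
    """Estimate position-corrected click rates per item.

    `impressions` is an iterable of (item_id, position, clicked) tuples.
    `position_propensity` maps a position to the assumed probability that a
    user actually examines that position (taken as known here).
    """
    weighted_clicks = defaultdict(float)
    times_shown = defaultdict(int)
    for item_id, position, clicked in impressions:
        times_shown[item_id] += 1
        if clicked:
            # A click in a rarely examined position counts for more.
            weighted_clicks[item_id] += 1.0 / position_propensity[position]
    return {item: weighted_clicks[item] / times_shown[item] for item in times_shown}


# Hypothetical log: listing A always shown first, listing B always shown third.
log = [
    ("A", 1, True), ("A", 1, True), ("A", 1, False), ("A", 1, False),
    ("B", 3, True), ("B", 3, False), ("B", 3, False), ("B", 3, False),
]
rates = ipw_click_rates(log, position_propensity={1: 1.0, 3: 0.5})
```

In this toy log the raw click rates are 0.5 for A and 0.25 for B, but once position is accounted for both come out at 0.5, which is the kind of misalignment the feedback loop described above can hide.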

Developments in Data Science…
As always, lots of new developments…

"In short: this module is a neural network that iteratively refines the structure predictions while respecting and leveraging an important symmetry of the problem, namely that of roto translations."
"The results of DeepMind's work are quite astounding and I marvel at what they are going to be able to achieve in the future given the resources they have available to them"

Getting AI into production…
Still one of the biggest obstacles…

"While building good models is important, many organizations now realize that much more needs to be done to put them into practical use, from data management to deployment and monitoring. In 2021, I hope we will get much better at understanding the full cycle of machine learning projects, at building MLOps tools to support this work, and at systematically building, productionizing, and maintaining AI models."

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to while away the socially distanced hours:

Updates from Members and Contributors

  • Mani Sarkar has been busy updating his NLP Profiler python library– he has a useful notebook working through the different features here.
  • Kevin O’Brien draws our attention to JuliaCon 2021 which will be free and virtual with the main conference taking place Wednesday 28th July to Friday 30th July 2021 (workshops will be held the week before). Julia is a high performance dynamic language designed to address the requirements of high-level numerical and scientific computing, and is becoming increasingly popular in Machine Learning and Data Science. Stay up to date on further announcements by joining the JuliaCon 2021 event page on LinkedIn.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here:


– Piers

The views expressed are our own and do not necessarily represent those of the RSS

Andrew Ng at the RSS Data Science Section!

We are going to have the great honour of hosting a fireside chat with Andrew Ng in February. The Data Science Section of the Royal Statistical Society have invited Andrew to come and talk to us about how technical people can become leaders in artificial intelligence and data science.

Andrew needs little introduction to the world of Machine Learning and AI. He is a successful scientist, inventor and writer, and a huge contributor to a field for which we all share a common enthusiasm.

Our topic of discussion will be the art and science of creating successful R&D teams which are able to deliver business value consistently. We would like to invite some guest questions. Let us know what you’d like to ask Andrew Ng. For example:
– How do you structure an effective R&D team?
– How do you decide what’s important to research?
– How can the UK government support its technical data scientists?

I’m sure you have more ideas!

Please sign up to our mailing list to find out more: 
