June Newsletter

Hi everyone-

It’s June already – time flies – and in the UK we even had an extra bank holiday! Perhaps the data science reading materials below can help fill the void now that the Jubilee celebrations have finished…

Following is the June edition of our Royal Statistical Society Data Science and AI Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity.

As always, any and all feedback is most welcome! If you like these, please do send them on to your friends; we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically, you can do so here.

Industrial Strength Data Science June 2022 Newsletter

RSS Data Science Section

Committee Activities

Committee members continue to be actively involved in a joint initiative between the RSS and various other bodies (The Chartered Institute for IT (BCS), the Operational Research Society (ORS), the Royal Academy of Engineering (RAEng), the National Physical Laboratory (NPL), the Royal Society and the IMA (The Institute of Mathematics and its Applications)) in defining standards for data scientist accreditation, with a plan to launch the Advanced Certificate in the summer.

We will also shortly be announcing details of our next meetup – watch this space!

Janet Bastiman (Chief Data Scientist at Napier AI) recently spoke at the FinTech FinCrime Exchange Conference (FFECON) in a panel session entitled “With great AI power comes great FinCrime responsibility”, discussing how AI implementations can go wrong and what we need to do about it.

The RSS is running an in-person Discussion Meeting on Thursday June 16th at the Errol Street headquarters: “Statistical Aspects of the Covid-19 Pandemic”. Register here for free attendance.

The full programme is now available for the September RSS 2022 Conference. The Data Science and AI Section is running what will undoubtedly be the best session(!) … ‘The secret sauce of open source’, which will discuss using open source to bridge the gap between academia and industry. An early booking registration discount is available until 6 June for in-person attendance at the conference and 20 June for viewing content via the online conference platform.

Martin Goodson (CEO and Chief Scientist at Evolution AI) continues to run the excellent London Machine Learning meetup and is very active with events. The next event is on June 15th, when Ting Chen from Google Brain will discuss Pix2Seq, “A new language interface for object detection”. Videos are posted on the meetup YouTube channel, and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

"After three separate experiments, the researchers found the AI-created synthetic faces were on average rated 7.7% more trustworthy than the average rating for real faces... The three faces rated most trustworthy were fake, while the four faces rated most untrustworthy were real, according to the magazine New Scientist."
"The settlement, filed Monday in a federal court in Illinois, bars the company from selling its biometric data to most businesses and private firms across the U.S. The company also agreed to stop offering free trial accounts to individual police officers without their employers' knowing or approving, which had allowed them to run searches outside of police departments' purview"
"Even when you filter medical images past where the images are recognizable as medical images at all, deep models maintain a very high performance. That is concerning because superhuman capacities are generally much more difficult to control, regulate, and prevent from harming people."
"This brief focuses on three sub-areas within “AI safety,” a term that has come to refer primarily to technical research (i.e., not legal, political, social, etc. research) that aims to identify and avoid unintended AI behavior. AI safety research primarily seeks to make progress on technical aspects of the many socio-technical challenges that have come along with progress in machine learning over the past decade."
"The AI industry does not seek to capture land as the conquistadors of the Caribbean and Latin America did, but the same desire for profit drives it to expand its reach. The more users a company can acquire for its products, the more subjects it can have for its algorithms, and the more resources—data—it can harvest from their activities, their movements, and even their bodies."
"The answers are complex and depend to some extent on your exact threat models, but if you want a summary of the advice I usually give it boils down to:
 - Treat your training data like you do your traditional source code.
 - Treat your model files like compiled executables."
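One concrete way to act on the "treat your model files like compiled executables" advice is to pin a cryptographic digest for each released model file and refuse to load anything that does not match, much as you would verify a downloaded binary. The sketch below assumes the expected SHA-256 digest was recorded somewhere trustworthy at release time; the function names `sha256_of` and `verify_model_file` are hypothetical and not part of any particular tool. A checksum only detects tampering: it does not make an unsafe serialisation format (such as a pickle) safe to load from an untrusted source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: Path, expected_sha256: str) -> None:
    """Raise unless the file matches the digest pinned at release time (hypothetical helper)."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}")
    # Only after this check passes would the file be handed to a model loader.
```

In practice the pinned digest would live alongside the code that loads the model (and go through the same review), mirroring the advice to treat training data like source code.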

Developments in Data Science…
As always, lots of new developments on the research front and plenty of arXiv papers to read…

"Another class of specification gaming examples comes from the agent exploiting simulator bugs. For example, a simulated robot that was supposed to learn to walk figured out how to hook its legs together and slide along the ground."
"A lot of the existing video models have poor quality (especially on long videos), require enormous amounts of GPUs/TPUs, and can only solve one specific task at a time (only prediction, only generation, or only interpolation). We aimed to improve on all these problems. We do so through a Masked Conditional Video Diffusion (MCVD) approach."
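The key trick in the MCVD approach, as we understand it from the paper, is to train one conditional diffusion model while randomly hiding the past frames and/or the future frames it is conditioned on, so that the same model learns prediction, generation and interpolation. The sketch below shows only that masking scheme, not the diffusion model itself; the masking probability of 0.5, zero-filling masked frames, and the helper names are assumptions for illustration.

```python
import random
from collections import Counter


def sample_masks(rng, p_mask=0.5):
    """Independently decide whether to hide ALL past frames and/or ALL future frames."""
    return rng.random() < p_mask, rng.random() < p_mask


def task_for(mask_past, mask_future):
    """Name the task that a given mask combination effectively trains."""
    if mask_past and mask_future:
        return "unconditional generation"
    if mask_future:
        return "future prediction"
    if mask_past:
        return "past prediction"
    return "interpolation"


def apply_masks(past, future, mask_past, mask_future):
    """Replace masked conditioning frames with zeros; frames are lists of pixel values."""
    def zeros(frames):
        return [[0.0] * len(frame) for frame in frames]

    return (zeros(past) if mask_past else past, zeros(future) if mask_future else future)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(task_for(*sample_masks(rng)) for _ in range(10_000))
    print(counts)  # roughly a quarter of training steps go to each task
```

With both masks drawn independently at 0.5, every task gets a similar share of training steps, which is what lets a single set of weights handle all of them.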
"A much broader segment of the AI community needs access to these models in order to conduct reproducible research and collectively drive the field forward. With the release of OPT-175B and smaller-scale baselines, we hope to increase the diversity of voices defining the ethical considerations of such technologies."
  • DeepMind has been at its groundbreaking best again…
    • Firstly with Flamingo which elegantly combines visual and text user feedback to refine responses
    • And perhaps most impressively with Gato, a single generalist agent
"The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens."
  • Real-world applications of reinforcement learning can still be hard to come by despite the progress at DeepMind. One promising approach is Offline RL (which learns from historical data), and it looks like BAIR (Berkeley Artificial Intelligence Research) has made good progress:
"Let’s begin with an overview of the algorithm we study. While lots of prior work (Kumar et al., 2019; Ghosh et al., 2021; and Chen et al., 2021) share the same core algorithm, it lacks a common name. To fill this gap, we propose the term RL via Supervised Learning (RvS). We are not proposing any new algorithm but rather showing how prior work can be viewed from a unifying framework"
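As we read the post, RvS methods condition a policy on a desired outcome (a goal, or a target return) and fit it to logged data with ordinary supervised learning, without temporal-difference updates. The toy sketch below uses a table of action counts in place of a neural network and conditions on the exact return-to-go; real implementations use function approximators and continuous or discretised conditioning. All function names and data here are hypothetical.

```python
from collections import Counter, defaultdict


def reward_to_go(rewards):
    """Return-to-go at each step: the sum of this reward and all later ones."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def fit_outcome_conditioned_policy(trajectories):
    """Tabular 'supervised learning': count the action taken in each (state, return-to-go)
    pair. Normalised, these counts are the maximum-likelihood pi(action | state, outcome)."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        states, actions, rewards = zip(*trajectory)
        for s, a, g in zip(states, actions, reward_to_go(rewards)):
            counts[(s, g)][a] += 1
    return counts


def act(policy, state, target_return):
    """Condition on the outcome we want and imitate what achieved it in the data."""
    options = policy.get((state, target_return))
    return options.most_common(1)[0][0] if options else None


if __name__ == "__main__":
    # Hypothetical offline log of (state, action, reward) from some unknown behaviour policy
    logged = [
        [("start", "left", 0.0), ("end", "stay", 0.0)],
        [("start", "right", 0.0), ("end", "stay", 1.0)],
        [("start", "right", 0.0), ("end", "stay", 1.0)],
    ]
    policy = fit_outcome_conditioned_policy(logged)
    print(act(policy, "start", 1.0))  # → right
    print(act(policy, "start", 0.0))  # → left
```

Asking for a high return at test time is what turns plain imitation of mixed-quality data into something that behaves like a policy improvement step.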

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

"Advocates like Mr. Ward look to beneficial, low-cost, intermediate technologies that are available now. A prime example is intelligent speed assistance, or I.S.A., which uses A.I. to manage a car’s speed via in-vehicle cameras and maps. The technology will be mandatory in all new vehicles in the European Union beginning in July, but has yet to take hold in the United States."
"At Google, we’re always dreaming up new ways to help you uncover the information you’re looking for — no matter how tricky it might be to express what you need. That’s why today, we’re introducing an entirely new way to search: using text and images at the same time. With multisearch in Lens, you can go beyond the search box and ask questions about what you see."

More DALL-E fun…
A one-off section on everyone’s favourite image generation tool, DALL-E

  • Last month we highlighted the amazing examples of images generated from text prompts using OpenAI’s DALL-E 2. There’s been lots more commentary so we’ve pulled it together in one place…
  • First of all, an update from OpenAI – apparently early users have generated over 3m images to date.
  • How does it actually work? There’s a good breakdown of the underlying methods here.
  • A different take on DALL-E and what it means for design and a potential ‘vibe-shift’ – well worth a read.
  • Another great take, this time exploring how DALL-E seems to combine objects in ways that make sense but that can’t be known from the words themselves.
  • Finally, watch out DALL-E, here comes Imagen from the Google Brain team
"A marble statue of a Koala in front of a marble statue of a turntable. The Koala has large marble headphones"

How does that work?
Tutorials and deep dives on different approaches and techniques

"Graphs are a convenient way to abstract complex systems of relations and interactions. The increasing prominence of graph-structured data from social networks to high-energy physics to chemistry, and a series of high-impact successes have made deep learning on graphs one of the hottest topics in machine learning research"
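One widely used building block for deep learning on graphs is the graph convolutional network (GCN) layer of Kipf and Welling: each node takes a degree-normalised sum of its neighbours' features (including its own, via a self-loop), applies a shared linear map, then a non-linearity, i.e. H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I. This is not necessarily the approach the quoted article focuses on; the pure-Python sketch below, with hypothetical function names, just makes the message-passing idea concrete.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weights):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), using lists of lists."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own representation
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalisation stops high-degree nodes from dominating the sums
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]
    # Aggregate over neighbours, apply the shared linear map, then the ReLU
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


if __name__ == "__main__":
    # Two connected nodes with 1-d features and a 1x1 weight matrix
    print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # → [[2.0], [2.0]]
```

Because the same weights are applied at every node and the aggregation ignores node order, relabelling the nodes simply relabels the outputs, which is the property that makes such layers suit graph-structured data.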
"Recommender systems work well when we have a lot of data on user-item preferences. With a lot of data, we have high certainty about what users like. Conversely, with very little data, we have low certainty. Despite the low certainty, recommenders tend to greedily promote items that received higher engagement in the past. And because they influence how much exposure an item gets, potentially relevant items that aren’t recommended continue getting no to low engagement, perpetuating the feedback loop."
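A standard way to break the greedy feedback loop described above is to treat recommendation as a multi-armed bandit and explore in proportion to uncertainty. The sketch below uses Beta-Bernoulli Thompson sampling, one common choice that may or may not match the quoted article: each item's click rate gets a Beta posterior, and each round the item with the highest sampled rate is shown. The class and function names, item names and click rates are all made up for the simulation.

```python
import random
from collections import Counter


class ThompsonRecommender:
    """Beta-Bernoulli Thompson sampling over a fixed set of items (hypothetical class)."""

    def __init__(self, items):
        # Beta(1, 1) prior: every item starts with a uniform belief about its click rate
        self.alpha = {item: 1.0 for item in items}
        self.beta = {item: 1.0 for item in items}

    def choose(self, rng):
        # Draw one plausible click rate per item; items with little data have wide
        # posteriors, so they still win now and then instead of being starved of exposure
        draws = {i: rng.betavariate(self.alpha[i], self.beta[i]) for i in self.alpha}
        return max(draws, key=draws.get)

    def update(self, item, clicked):
        # Conjugate update: a click adds to alpha, a non-click adds to beta
        if clicked:
            self.alpha[item] += 1.0
        else:
            self.beta[item] += 1.0


def simulate(true_ctr, rounds=3000, seed=0):
    """Run the recommender against made-up click rates; count how often each item is shown."""
    rng = random.Random(seed)
    rec = ThompsonRecommender(true_ctr)
    shown = Counter()
    for _ in range(rounds):
        item = rec.choose(rng)
        shown[item] += 1
        rec.update(item, rng.random() < true_ctr[item])
    return shown


if __name__ == "__main__":
    print(simulate({"a": 0.10, "b": 0.30, "c": 0.05}))
```

Exposure ends up concentrated on the best item, but only after every item has been given enough impressions for its uncertainty to shrink.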
"The goal of structural optimization is to place material in a design space so that it rests on some fixed points or “normals” and resists a set of applied forces or loads as efficiently as possible."
"This article outlines different methods for creating confidence intervals for machine learning models. Note that these methods also apply to deep learning. This article is purposefully short to focus on the technical execution without getting bogged down in details; there are many links to all the relevant conceptual explanations throughout this article."
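One of the simplest ways to put a confidence interval around a model's test-set metric is the percentile bootstrap: resample the test set with replacement many times, recompute the metric on each resample, and read off the empirical quantiles. The sketch below is our own illustration rather than code from the quoted article; it captures the sampling variability of the test set only (not, say, randomness in training), and the function names and data are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric=accuracy, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `metric` on a fixed test set."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample test examples with replacement, keeping label/prediction pairs together
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(round(n_resamples * alpha / 2))]
    upper = stats[int(round(n_resamples * (1 - alpha / 2))) - 1]
    return lower, upper


if __name__ == "__main__":
    y_true = [1] * 100
    y_pred = [1] * 80 + [0] * 20  # hypothetical model that is right 80% of the time
    print(accuracy(y_true, y_pred), bootstrap_ci(y_true, y_pred))
```

Because `metric` is a parameter, the same function works for precision, AUC or any other statistic computed from paired labels and predictions.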
"My team spent many hours debating the most important concepts to teach. We developed extensive syllabi for various topics and prototyped course units in them. Sometimes this process helped us realize that a different topic was more important, so we cut material we had developed to focus on something else. The result, I hope, is an accessible set of courses that will help anyone master the most important algorithms and concepts in machine learning today — including deep learning but also a lot of other things — and to build effective learning systems." 

Practical tips
How to drive analytics and ML into production

"For example, when you’re in a BI tool like Looker, you inevitably think, “Do I trust this dashboard?” or “What does this metric mean?” And the last thing anyone wants to do is open up another tool (aka the traditional data catalog), search for the dashboard, and browse through metadata to answer that question."
"I actually don’t care that much about the bundling argument that I will make in this post. Truthfully, I just want to argue that feature stores, metrics layers, and machine learning monitoring tools are all abstraction layers on the same underlying concepts, and 90% of companies should just implement these “applications” in SQL on top of streaming databases."
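To make the post's argument concrete, the sketch below defines a metric, a per-customer feature and a simple monitoring check as SQL views over a single table. It uses Python's built-in `sqlite3` only so that the example runs anywhere; SQLite is not a streaming database, so this shows the shared-abstraction idea, not the incrementally maintained views the post has in mind. The table, column and view names, and the alert threshold, are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount      REAL    NOT NULL,
    ordered_at  TEXT    NOT NULL  -- ISO-8601 timestamp
);

-- "Metrics layer": one shared definition of daily revenue
CREATE VIEW daily_revenue AS
SELECT date(ordered_at) AS day, SUM(amount) AS revenue, COUNT(*) AS n_orders
FROM orders
GROUP BY date(ordered_at);

-- "Feature store": per-customer aggregates a model could consume
CREATE VIEW customer_features AS
SELECT customer_id, COUNT(*) AS order_count, AVG(amount) AS avg_order_value
FROM orders
GROUP BY customer_id;

-- "Monitoring": flag days whose revenue falls below a hypothetical threshold
CREATE VIEW revenue_alerts AS
SELECT day, revenue FROM daily_revenue WHERE revenue < 50;
"""


def build_demo():
    """Create an in-memory database with the views above and a few sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO orders (customer_id, amount, ordered_at) VALUES (?, ?, ?)",
        [
            (10, 30.0, "2022-06-01 09:00"),
            (11, 70.0, "2022-06-01 12:00"),
            (10, 20.0, "2022-06-02 10:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    for view in ("daily_revenue", "customer_features", "revenue_alerts"):
        print(view, conn.execute(f"SELECT * FROM {view}").fetchall())
```

All three "applications" read from the same table through the same query language, which is the overlap the post is pointing at.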
"At its core, data storytelling is about taking the step beyond the simple relaying of data points. It’s about trying to make sense of the world and leveraging storytelling to present insights to stakeholders in a way they can understand and act on. As data scientists, we can inform and influence through data storytelling by creating personal touch points between our audience and our analysis."

Bigger picture ideas
Longer, thought-provoking reads – lean back and pour a drink!

"But this morning I woke to a new reification, a Twitter thread that expresses, out loud, the Alt Intelligence creed, from Nando de Freitas, a brilliant high-level executive at DeepMind, Alphabet’s rightly-venerated AI wing, in a declaration that AI is “all about scale now.” Indeed, in his mind (perhaps deliberately expressed with vigor to be provocative), the harder challenges in AI are already solved. “The Game is Over!”, he declares"
"It is a tale told by an idiot, full of sound and fury, signifying nothing". —Macbeth

"AI-generated artwork is the same as a gallery of rock faces. It is pareidolia, an illusion of art, and if culture falls for that illusion we will lose something irreplaceable. We will lose art as an act of communication, and with it, the special place of consciousness in the production of the beautiful."
"AIs will make increasingly complex and important decisions, but they may make these decisions based on different criteria that could potentially go against our values. Therefore, we need a language to talk to AI for better alignment. "
"But the algorithmic summaries could make errors, include outdated information or remove nuance and uncertainty, without users appreciating this. If anyone can use LLMs to make complex research comprehensible, but they risk getting a simplified, idealized view of science that’s at odds with the messy reality, that could threaten professionalism and authority. It might also exacerbate problems of public trust in science."

Fun Practical Projects and Learning Opportunities
A few fun practical projects and topics to keep you occupied/distracted:

Covid Corner

Apparently Covid is over; certainly there are very limited restrictions in the UK now.

  • The latest results from the ONS tracking study estimate 1 in 60 people in England have Covid. This is at least moving in the right direction compared to a couple of weeks ago, when it reached 1 in 14… Still a far cry from the 1 in 1000 we had last summer.

Updates from Members and Contributors

Jobs!

A new section highlighting relevant job openings across the Data Science and AI community (let us know if you have anything you’d like to post here…)

  • Evolution AI are looking to hire someone for applied deep learning research. Must like a challenge. Any background, but needs to know how to do research properly. Remote. Apply here
  • AstraZeneca are looking for a Data Science and AI Engagement lead – more details here
  • Cazoo is looking for a number of senior data engineers – great modern stack and really interesting projects!

Again, hope you found this useful. Please do send this on to your friends (we are looking to build a strong community of data science practitioners) and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS
