July Newsletter

Hi everyone-

Not sure what happened to June – seemed to fly by – I know there were some lovely sunny days but then it got cold again… fingers crossed summer isn’t over already! … How about a few curated data science reading materials to enjoy in the garden, rain or shine?

Following is the July edition of our Royal Statistical Society Data Science Section newsletter. Hopefully some interesting topics and titbits to feed your data science curiosity … We are continuing with our move of Covid Corner to the end to change the focus a little.

As always- any and all feedback most welcome! If you like these, do please send on to your friends- we are looking to build a strong community of data science practitioners. And if you are not signed up to receive these automatically you can do so here.

Industrial Strength Data Science July 2021 Newsletter

RSS Data Science Section

Committee Activities

We are all conscious that times are incredibly hard for many people and are keen to help however we can- if there is anything we can do to help those who have been laid-off (networking and introductions help, advice on development etc.) don’t hesitate to drop us a line.

We are working on releasing the video and a summary of the latest in our ‘Fireside chat’ series- an engaging and enlightening conversation with Anthony Goldbloom, founder and CEO of Kaggle. We will post a link when it is available.

We have released a survey to our readers and members focused on the UK Government’s proposed AI Strategy. We are passionate about making sure the government focuses on the right things in this area, and feel that, as the organisation representing technical Data Science and AI practitioners, we need to make sure our voice is heard. If you haven’t already, please give us your thoughts by participating here.

The full programme for this year’s RSS Conference, which takes place in Manchester from 6-9 September, has been confirmed.  The programme includes keynote talks from the likes of Hadley Wickham, Bin Yu and Tom Chivers.  Registration is open with early-bird discounts available until Friday 4 June. 

Martin Goodson, our chair, continues to run the excellent London Machine Learning meetup and is very active with virtual events. On June 30th, the meetup hosted Frank Willett (Research Scientist at Stanford University) for a talk titled “High-performance brain-to-text communication via handwriting”. Videos are posted on the meetup YouTube channel – and future events will be posted here.

This Month in Data Science

Lots of exciting data science going on, as always!

Ethics and more ethics…
Bias, ethics and diversity continue to be hot topics in data science…

Imagine a world where a state government, or other actor, can realistically manipulate images to show either nothing there or a different layout
"68% chose the option declaring that ethical principles focused primarily on the public good will not be employed in most AI systems by 2030"
Our method will facilitate deepfake detection and tracing in real-world settings, where the deepfake image itself is often the only information detectors have to work with.

Developments in Data Science…
As always, lots of new developments…

In remote sensing images, we can use temporal information to obtain pairs of images from the same location at different points in time, which we call seasonal positive pairs. Seasonal changes provide more semantically meaningful content than artificial transformations, and remote sensing images provide this natural augmentation for free.
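The seasonal-pairs idea quoted above slots naturally into contrastive learning. Below is our own minimal sketch, not the authors' code: it assumes each image has already been turned into an embedding vector by some encoder, uses the standard InfoNCE loss, and the function names (`seasonal_positive_pairs`, `info_nce`) and toy tiles are hypothetical. Two images of the same location taken at different times form a positive pair; the partners of the other locations in the batch act as negatives.

```python
import math
import random
from collections import defaultdict


def seasonal_positive_pairs(samples, rng=random):
    """Build (anchor, positive) pairs from (location_id, timestamp, embedding) samples.

    Each location with images from at least two distinct timestamps yields one pair:
    two embeddings of the same place captured at different times.
    """
    by_location = defaultdict(list)
    for location_id, timestamp, embedding in samples:
        by_location[location_id].append((timestamp, embedding))

    pairs = []
    for location_id in sorted(by_location):
        shots = by_location[location_id]
        times = sorted({t for t, _ in shots})
        if len(times) < 2:
            continue  # no seasonal counterpart for this location
        t1, t2 = rng.sample(times, 2)
        anchor = rng.choice([e for t, e in shots if t == t1])
        positive = rng.choice([e for t, e in shots if t == t2])
        pairs.append((anchor, positive))
    return pairs


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def info_nce(pairs, temperature=0.1):
    """Mean InfoNCE loss: each anchor should be most similar to its own seasonal positive.

    The positives belonging to the other locations in the batch serve as negatives.
    """
    losses = []
    for i, (anchor, _) in enumerate(pairs):
        logits = [cosine(anchor, positive) / temperature for _, positive in pairs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy usage with hypothetical tile IDs and 2-d embeddings
samples = [
    ("tile_A", "2020-01", [1.0, 0.1]), ("tile_A", "2020-07", [0.9, 0.2]),
    ("tile_B", "2020-01", [0.1, 1.0]), ("tile_B", "2020-07", [0.2, 0.9]),
    ("tile_C", "2020-01", [0.5, 0.5]),  # only one date, so it is skipped
]
pairs = seasonal_positive_pairs(samples, rng=random.Random(0))
print(len(pairs))  # 2
print(info_nce(pairs))
```

The appeal, as the quote says, is that the "augmentation" comes for free from revisiting the same place; no hand-designed transformation decides what counts as the same content.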
  • Facebook have released ‘TextStyleBrush’, which can emulate the style of text in an image from just a single example word
  • Generating realistic synthetic video is computationally intensive – new work out of UC Berkeley, called VideoGPT, makes the process more efficient by first compressing video into discrete latent codes with a VQ-VAE and then modelling those codes with a GPT-style transformer, allowing anyone to generate video on a standalone computer.
  • A Chinese lab is challenging the supremacy of Google and OpenAI in the language model space with a model containing 1.7 trillion parameters. Interestingly, the original article seems to have been removed – although copies are still available online, with more technical details:
The Chinese lab claims that Wudao's sub-models achieved better performance than previous models, beating OpenAI’s CLIP and Google’s ALIGN on English image and text indexing in the Microsoft COCO dataset
"Will better engineering produce CNNs [Convolutional Neural Networks] that understand sameness and difference in the generalizable way that children do? Or are CNNs’ abstract-reasoning powers fundamentally limited, no matter how cleverly they’re built and trained?"

Real world applications of Data Science
Lots of practical examples making a difference in the real world this month!

  • I’m not familiar with the underlying challenge, but I understand that this is a big breakthrough (Nature paper here): a team at Google has automated the design of the physical layout of computer chips using deep reinforcement learning.
  • This is pretty compelling- well worth a read: Facebook AI have released details of their advanced object recognition system, which allows consumers to shop for items from images. It uses an elegant compound approach, modelling objects and their attributes separately and also drawing on multi-modal signals. Also good to see they are attempting to avoid bias by building and monitoring the models appropriately:
"As part of our ongoing efforts to improve the algorithmic fairness of models we build, we trained and evaluated our AI models across subgroups, including 15 countries and four age buckets."
“Welcome to Hardcore High School” bellowed the script kiddo. We had just gotten to the kindergarten level when the music and lights began to blink. I frowned. “What is that?”
“Beats me” said the A.I. As he walked down the halls, mimicking the sounds of the various musical instruments, he fiddled with the script kiddo a bit. “Welcome to Hardcore High School” He said again, a bit more softly this time.
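Going back to the Facebook shopping system: they don't publish their evaluation code, but the subgroup evaluation they describe ("15 countries and four age buckets") is easy to sketch. The following is a minimal, hypothetical illustration of disaggregated evaluation. It assumes each prediction is tagged with a subgroup key (here a made-up `(country, age_bucket)` tuple) and reports accuracy per subgroup plus the largest gap between any two. The helper names and toy records are our own, not Facebook's.

```python
from collections import defaultdict


def accuracy_by_subgroup(records, min_count=1):
    """Per-subgroup accuracy from (subgroup, y_true, y_pred) records.

    Subgroups with fewer than `min_count` examples are dropped, since accuracy
    estimated from a handful of examples is too noisy to compare.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total if total[g] >= min_count}


def max_accuracy_gap(per_group):
    """Difference between the best- and worst-served subgroups."""
    return max(per_group.values()) - min(per_group.values())


# Toy records: (subgroup, true label, predicted label)
records = [
    (("UK", "18-24"), 1, 1),
    (("UK", "18-24"), 0, 1),
    (("US", "25-34"), 1, 1),
    (("US", "25-34"), 0, 0),
]
per_group = accuracy_by_subgroup(records)
print(per_group)                    # {('UK', '18-24'): 0.5, ('US', '25-34'): 1.0}
print(max_accuracy_gap(per_group))  # 0.5
```

Accuracy is only one lens; the same grouping works for precision, recall or any other metric, and a gap can hide in one of those even when overall accuracy looks even.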

How does that work?
A new section on understanding different approaches and techniques

Getting it live
How to drive ML into production

"On a daily average, there are over 4,000 models at Facebook running on PyTorch"
  • The importance of data preparation and curation in the ML lifecycle is highlighted in this piece on Data Cascades from Google Research.
"One of the most common causes of data cascades is when models that are trained on noise-free datasets are deployed in the often-noisy real world. For example, a common type of data cascade originates from model drifts, which occur when target and independent variables deviate, resulting in less accurate models"

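The Google piece doesn't prescribe a specific check, but one widely used way to spot the input side of the drift described in the quote above is the Population Stability Index (PSI), which compares a feature's binned distribution in training data with its distribution in production. Below is a minimal sketch with equal-width bins taken from the training range; the function name, bin count and smoothing constant `eps` are our own choices. Note that it only catches shifts in the independent variables: spotting a change in their relationship to the target needs fresh labels.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a training sample (`expected`) and a production sample (`actual`).

    PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the
    fractions of each sample falling in bin i. Bins are equal-width over the
    training range; production values outside it are clipped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant training feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps stops empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


training = [i / 100 for i in range(100)]          # feature spread over [0, 1)
production = [0.5 + i / 200 for i in range(100)]  # same feature, now bunched in the upper half
print(population_stability_index(training, training))  # 0.0
print(population_stability_index(training, production))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift worth investigating, though in practice thresholds are best tuned per feature.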
From Prediction to Decision
The art and science of decision making

  • Lovely extended essay from Hannah Fry on the history of graphs and how they help us understand data and make decisions
  • An excellent article published in HBR from Michael Ross on why company investments in AI often don’t generate the gains they expect, identifying three common mistakes (the asymmetric cost function is particularly interesting):
(1) They don’t ask the right question, and end up directing AI to solve the wrong problem. 
(2) They don’t recognize the differences between the value of being right and the costs of being wrong, and assume all prediction mistakes are equivalent. 
(3) They don’t leverage AI’s ability to make far more frequent and granular decisions, and keep following their old practices
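Point (2) is easy to make concrete. For a yes/no decision based on a calibrated probability p, acting costs (1 − p) × C_FP in expectation (we act but the event doesn't happen) and not acting costs p × C_FN, so it pays to act whenever p > C_FP / (C_FP + C_FN). The sketch below is our own illustration rather than anything from the HBR article, and the churn costs are made-up numbers: when missing a churner costs nine times as much as an unnecessary retention offer, the threshold falls from 0.5 to 0.1.

```python
def decision_threshold(cost_fp, cost_fn):
    """Probability above which acting has lower expected cost than not acting."""
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(p, act, cost_fp, cost_fn):
    """Expected cost of a decision, given probability p that the event is real."""
    # Acting is wasted if the event doesn't happen; not acting misses it if it does.
    return (1 - p) * cost_fp if act else p * cost_fn


def should_act(p, cost_fp, cost_fn):
    return p > decision_threshold(cost_fp, cost_fn)


# Hypothetical churn example: an unnecessary retention offer costs 10,
# losing a customer we failed to contact costs 90.
COST_FP, COST_FN = 10, 90
p = 0.2  # model's churn probability for one customer
print(decision_threshold(COST_FP, COST_FN))  # 0.1
print(should_act(p, COST_FP, COST_FN))       # True
print(expected_cost(p, True, COST_FP, COST_FN), expected_cost(p, False, COST_FP, COST_FN))
```

With symmetric costs the threshold is 0.5, which is exactly the implicit assumption made when all prediction mistakes are treated as equivalent.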

Practical Projects and Learning Opportunities
As always here are a few potential practical projects to keep you busy:

Covid Corner

Again, more positive progress in the UK on the Covid front, with 45m people now having received their first vaccine dose and over 30m fully vaccinated. However, the new Delta variant, first identified in India, is a cause for concern, and case rates and hospitalisations are now rising again.

Updates from Members and Contributors

Everyone must be out enjoying themselves, as there are no specific updates from members and contributors this month- let me know if you’d like to include anything here next month.

Again, hope you found this useful. Please do send on to your friends- we are looking to build a strong community of data science practitioners- and sign up for future updates here.

– Piers

The views expressed are our own and do not necessarily represent those of the RSS
