More and more companies need data scientists who can apply statistical methods and technical computing tools to their design processes. Perhaps the solution to the shortage lies in the engineering community, writes Stéphane Marouani, Country Manager at MathWorks Australia.
According to Deloitte, the number of data science workers in Australia will balloon to 338,800 in the 2021-22 financial year, up from 300,900 in 2016-17, an average annual growth rate of 2.4 per cent.
This outpaces the 1.5 per cent annual growth for the Australian labour force as a whole over the same period. Amid this digital skills gap, it is no surprise that data scientists with business acumen are in such high demand that they are earning almost three times Australia's average salary. There simply aren't enough people with the right knowledge to fill these roles.
Companies are looking for data scientists who have computer science skills, knowledge of statistics and domain expertise relevant to their specific business problems. These types of candidates are proving elusive, but companies may find success by focusing on the last of these.
This third skill, domain expertise about the business, is often overlooked. Domain expertise is required to make judgement calls during the development of an analytic model. It enables one to distinguish correlation from causation, signal from noise, and an anomaly worth further investigation from a quirk that happens from time to time.
Domain knowledge is hard to teach: it requires on-the-job experience, mentorship, and time to develop. This type of expertise is often found in engineering and research departments that have built cultures around understanding the products they design and build. These teams are intimately familiar with the systems they work on.
They often use statistical methods and technical computing tools as part of their design processes, which makes the jump to the machine-learning algorithms and big data tools of the data analytics world manageable.
Leaping into new territory
With data science emerging across industries as an important differentiator, these engineers with domain knowledge need flexible and scalable environments that put the tools of the data scientist at their fingertips.
Depending on the problem, they might need traditional analysis techniques such as statistics and optimisation, data-specific techniques such as signal processing and image processing, or newer capabilities such as machine learning algorithms.
The cost of learning a new tool for each technique would be high, so having these tools together in one environment becomes very important.
So, a natural question to ask is: How can newer techniques like machine learning be made accessible to engineers with domain expertise?
The goal of machine learning is to identify the underlying trends and structure in data by fitting a statistical model to that data.
When working with a new dataset, it’s hard to know which model is going to work best; there are dozens of popular models to choose from — and thousands of less-popular choices. Trying and comparing several different model types can be very time-consuming when using ‘bleeding edge’ machine-learning algorithms.
Each of these algorithms will have an interface that is specific to the algorithm and to the preferences of the researcher who developed it. Significant time is required to try many different models and compare approaches.
One solution is an environment that makes it easy for engineers to try the most trusted machine-learning algorithms and encourages best practices, such as preventing over-fitting.
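To make that concrete, here is a minimal sketch of such a workflow. The article names no specific tool, so the library (Python's scikit-learn), the stand-in dataset, and the particular models are illustrative assumptions; the point is that a shared interface lets an engineer compare several trusted model types, with cross-validation guarding against over-fitting.

```python
# Minimal sketch: comparing several trusted model types through one
# consistent interface. Library and models are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in dataset; in practice this would be the engineer's own data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": SVC(),
}

for name, model in models.items():
    # Five-fold cross-validation scores each model on held-out data,
    # which helps guard against over-fitting to the training set.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Because every model exposes the same fit-and-score interface, adding a fourth or fifth candidate is a one-line change rather than a new tool to learn.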
Making the most of machine learning
For example, process engineers at a large semiconductor manufacturing company were considering new ways to ensure alignment between the layers on a wafer. They came across machine learning as a possible way to predict overlay between layers, but as process engineers they didn't have experience with this newer technique.
Working through different machine learning examples, they were able to identify a suitable machine-learning algorithm, train it on historical data, and integrate it into a prototype overlay controller.
Using the latest tools, these process engineers were able to apply their domain expertise to build a model that could identify systematic and random errors that might otherwise go undetected.
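As a purely hypothetical sketch of that idea (the features, data, and model below are invented for illustration and are not the manufacturer's actual method), a simple regression fitted to historical overlay measurements separates a systematic component, captured by the fitted coefficients, from a random component left in the residuals:

```python
# Hypothetical sketch of overlay prediction: synthetic data and a
# deliberately simple model stand in for the real process.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic "historical" data: two stage-position features, a built-in
# systematic drift, and random noise.
stage_x = rng.uniform(-1, 1, 500)
stage_y = rng.uniform(-1, 1, 500)
overlay = 0.8 * stage_x - 0.3 * stage_y + rng.normal(0, 0.05, 500)

X = np.column_stack([stage_x, stage_y])
model = LinearRegression().fit(X, overlay)

# Fitted coefficients approximate the systematic error; what remains
# in the residuals approximates the random error.
residuals = overlay - model.predict(X)
print("systematic terms:", model.coef_)
print("random error (std of residuals):", residuals.std())
```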
According to research and advisory company Gartner, engineers with domain expertise can “bridge the gap between mainstream self-service analytics by business users and the advanced analytics techniques of data scientists.
“They are now able to perform sophisticated analysis that would previously have required more expertise, enabling them to deliver advanced analytics without having the skills that characterise data scientists.”
A solution already at hand
As technology continues to evolve, organisations must quickly ingest, analyse, verify, and visualise a tsunami of data, delivering timely insights that capitalise on business opportunities.
Instead of spending time and money searching for those elusive data scientists, companies can stay competitive by giving their engineers and scientists a flexible tool environment that lets them do data science themselves, opening up access to the data for more people.