AutoML is ready to turn developers into data scientists – and vice versa. Here’s how AutoML will completely change data science for the better.
Over the next decade, the data scientist role as we know it will look very different than it does today. But don’t worry: no one is predicting lost jobs, just changed ones.
Data scientists will be fine – according to the Bureau of Labor Statistics, the role is still projected to grow at a higher-than-average rate through 2029. But advances in technology will be the engine for major shifts in data scientist responsibilities and in how businesses approach analytics overall. And AutoML tools, which help automate the machine learning pipeline from raw data to usable model, will lead this revolution.
In 10 years, data scientists will have completely different skill sets and tools, but their function will remain the same: serving as confident and capable technology guides who can make sense of complex data to solve business problems.
AutoML democratizes data science
Until recently, machine learning algorithms and processes were almost exclusively the domain of more traditional data science roles – those with formal education and advanced degrees, or those working for large technology corporations. Data scientists have played an invaluable role in every part of the machine learning development spectrum. But over time, their roles will become more collaborative and strategic. With tools like AutoML to automate some of their more academic skills, data scientists can focus on guiding organizations toward solutions to business problems through data.
In many ways, this is because AutoML democratizes the effort of putting machine learning into practice. Vendors from startups to cloud hyperscalers have launched solutions that are easy enough for developers to use and experiment with, without a major educational or experiential barrier to entry. Similarly, some AutoML applications are intuitive and simple enough that non-technical workers can try their hand at creating solutions to problems in their own departments, creating a “citizen data scientist” of sorts within organizations.
To explore the possibilities these types of tools open up for both developers and data scientists, we must first understand the current state of data science as it relates to machine learning development. This is easiest to understand when placed on a maturity scale.
Smaller organizations and businesses with more traditional roles in charge of digital transformation (i.e., people who are not formally trained data scientists) typically sit at the low end of this scale. Currently, they are the largest customers for turnkey machine learning applications, which are aimed at an audience unfamiliar with the complexities of machine learning.
Pros: These turnkey applications tend to be easy to implement, relatively inexpensive, and easy to deploy. For smaller companies with a very specific process to automate or improve, there are likely several viable options on the market. The low barrier to entry makes these applications perfect for data scientists wading into machine learning for the first time. Because some of the applications are so intuitive, they even give non-technical employees a chance to experiment with automation and advanced data capabilities, potentially introducing a valuable sandbox into an organization.
Cons: This class of machine learning application is notoriously inflexible. While these applications can be easy to implement, they are not easily customized. As a result, certain levels of accuracy may be out of reach for certain use cases. In addition, these applications can be severely limited by their reliance on pre-trained models and data.
Examples of these applications include Amazon Comprehend, Amazon Lex, and Amazon Forecast from Amazon Web Services, and Azure Speech Services and Azure Language Understanding (LUIS) from Microsoft Azure. These tools are often sufficient for up-and-coming data scientists to take their first steps in machine learning and move their organizations further along the maturity scale.
Customizable solutions with AutoML
Organizations with large but relatively common data sets (think customer transaction data or marketing email metrics) need more flexibility when using machine learning to solve problems. Enter AutoML. AutoML takes the steps of a manual machine learning workflow (data discovery, exploratory data analysis, hyperparameter tuning, etc.) and condenses them into a configurable stack.
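To make the idea of a “configurable stack” concrete, here is a toy sketch in plain Python. It is not how any particular AutoML product works; it only assumes that the candidate settings live in a configuration and that a loop, rather than a person, tries each one and keeps the best. The data is synthetic, and names such as `CONFIG` and `auto_search` are hypothetical.

```python
import random

def make_data(n, seed):
    """Synthetic 1-D binary data: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.1:  # flip some labels to simulate noise
            label = 1 - label
        data.append((x, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0

def accuracy(train, valid, k):
    """Fraction of validation points that k-NN classifies correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in valid)
    return correct / len(valid)

def auto_search(config, train, valid):
    """Try every configured value of k and return the best one with all scores."""
    results = {k: accuracy(train, valid, k) for k in config["k_values"]}
    best_k = max(results, key=results.get)
    return best_k, results

# Hypothetical configuration: the only thing a user of this toy stack edits.
CONFIG = {"k_values": [1, 3, 5, 9, 15]}

train = make_data(200, seed=0)
valid = make_data(100, seed=1)
best_k, results = auto_search(CONFIG, train, valid)
print("validation accuracy per k:", results)
print("selected k:", best_k)
```

Real AutoML services search over far more than one hyperparameter (model families, feature transformations, ensembling), but the division of labor is the same: the user supplies data and configuration, and the tool runs the search.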
Pros: AutoML applications allow more experiments to be run on data in a larger space. But the real power of AutoML is accessibility – custom configurations can be built and inputs can be tweaked relatively easily. Moreover, AutoML is not designed exclusively with data scientists as an audience. Developers can also easily tinker within a sandbox to bring machine learning elements into their own products or projects.
Cons: While it comes close, AutoML’s limitations mean that the accuracy of its results will be difficult to perfect. Because of this, degree-holding, card-carrying data scientists often look down on applications built with the help of AutoML, even if the results are accurate enough to solve the problem at hand.
Examples of these applications include Amazon SageMaker Autopilot and Google Cloud AutoML. Data scientists a decade from now will undoubtedly need to be familiar with tools like these. Like a developer fluent in multiple programming languages, data scientists will need to be fluent in multiple AutoML environments to be considered top talent.
“Hand-rolled” and homegrown machine learning solutions
The largest enterprise-scale businesses and Fortune 500 companies are where most of the advanced, proprietary machine learning applications are currently being developed. The data scientists at these organizations are part of large teams that perfect machine learning algorithms using troves of historical company data and build these applications from the ground up. Custom applications like these are only possible with considerable resources and talent, which is why both the rewards and the risks are so great.
Pros: Like any application built from scratch, custom machine learning is state of the art and is built on a deep understanding of the problem at hand. It is also more accurate, if only by small margins, than AutoML and turnkey machine learning solutions.
Cons: Getting a custom machine learning application to reach a given accuracy threshold can be extremely difficult and often requires teams of advanced data scientists. In addition, custom machine learning is the most time-consuming and most expensive option to develop.
An example of a hand-rolled machine learning solution is starting with a blank Jupyter notebook, manually importing data, and then working through each step from exploratory data analysis to model tuning by hand. This is usually achieved by writing custom code using open-source machine learning frameworks such as scikit-learn, TensorFlow, PyTorch, and others. This approach requires a high degree of experience and intuition, but it can produce results that often outperform both turnkey machine learning and AutoML.
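As a rough illustration of what “by hand” means, the sketch below walks through the same stages in a few lines of plain Python: load data, inspect it, split it, fit a model, and evaluate it. It is a deliberately simplified stand-in, assuming synthetic data and a closed-form least-squares line instead of a real data set and one of the frameworks named above; the function names are hypothetical.

```python
import random
import statistics

# Step 1: "import" the data. Here it is synthetic: y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]

# Step 2: exploratory data analysis, reduced to a couple of summary statistics.
print("x mean/stdev:", round(statistics.mean(xs), 2), round(statistics.stdev(xs), 2))
print("y mean/stdev:", round(statistics.mean(ys), 2), round(statistics.stdev(ys), 2))

# Step 3: hold out part of the data for evaluation.
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

def fit_line(x, y):
    """Step 4: ordinary least squares for a single feature, in closed form."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(x, y, slope, intercept):
    """Step 5: root-mean-square error of the fitted line on the given points."""
    squared_errors = [(slope * a + intercept - b) ** 2 for a, b in zip(x, y)]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

slope, intercept = fit_line(train_x, train_y)
test_rmse = rmse(test_x, test_y, slope, intercept)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("held-out RMSE:", round(test_rmse, 3))
```

Every decision here, from the split ratio to the choice of model and metric, is made and coded by a person. That is where the experience and intuition come in, and it is also what AutoML tools aim to automate.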
Tools like AutoML will change the roles and responsibilities of data science over the next 10 years. AutoML takes the burden of developing machine learning off the shoulders of data scientists and instead puts the capabilities of machine learning directly into the hands of other problem solvers. With time freed up to focus on what they know best – the data and the inputs themselves – data scientists over the next decade will serve as even more valuable guides for their organizations.
Eric Miller is senior director of engineering strategy at Rackspace, where he provides strategic advisory leadership with a proven record of building practices in the Amazon Partner Network (APN) ecosystem. An accomplished technology leader with 20 years of proven success in enterprise IT, Eric has led a number of AWS and solution architecture initiatives, including the AWS Well-Architected Framework (WAF), the Amazon EC2 for Windows Server AWS Service Delivery Program, and a wide range of AWS rewrites for multi-billion-dollar organizations.