A Brief History: 70 Years of Machine Learning

Whether it's leaping ahead in quality assurance for telecommunications companies, increasing Right First Time rates for critical infrastructure, or bolstering operational efficiency, machine learning (ML) has become an essential component of the modern business world.

Since the boom of ChatGPT at the beginning of 2023, Artificial Intelligence (AI) has been top of mind in both the private and public spheres. AI and ML implementation is increasing across a multitude of use cases and industries, while governments and regulatory authorities invest in reports on the potential of AI and host summits about its practical ramifications.

But AI and ML did not come into being with Sam Altman and OpenAI. To understand the current state of machine learning and where it is going, it’s important to know how it began.    

What is Machine Learning (ML)?

Machine learning is a subset of AI in which statistical models and algorithms are used to mimic the way the human mind learns. After being trained on data sets and learning to recognise patterns, ML programs can make decisions and predictions based on their analysis.

Ancient AI

Following the Second World War, mathematicians and scientists began exploring the creation of AI and ML. Alan Turing, the famous computer scientist who broke the Nazis' Enigma code during WWII and is often credited as the father of AI, wrote about the possibility of machines mimicking human thinking by reasoning over available data in his 1950 paper "Computing Machinery and Intelligence".

Throughout the 1950s, the first major experiments in artificial intelligence took place. While the groundwork for what would become modern machine learning was laid more than 70 years ago, progress was constrained by the technological limitations of the day. Besides lacking the sheer computing power needed for complex calculations, early computers could only execute commands, not store them, and the ability to remember what it has done is an important aspect of modern machine learning.

Computers were also very expensive, with one report stating that renting a computer for a month could run to €150,000. This put the development of AI out of reach for most. In 1956, a group of scientists presented a proof of concept called the Logic Theorist, widely regarded as the first artificial intelligence program ever created. Though research progressed into the 1970s, scientists realised that computers were still too weak to run the models and algorithms being created at the level required. As a result, AI research gradually became less prominent.

The Artificial Intelligence & Machine Learning Schism

Through the 1960s and 1970s, machine learning and neural networks were used to train AI systems and were a fundamental part of the field. But towards the end of the 1970s, AI researchers moved away from ML research, forcing the ML community to establish itself as a separate branch of study running parallel to the AI field.

While large-scale development of ML wasn't prevalent in the 1980s, important theoretical contributions were made. One prime example is the introduction of Explanation-Based Learning (EBL) by Gerald DeJong. One of the critical moments in the progression of ML was the rise of the internet in the 1990s. Thanks to the vast increase in data accessibility enabled by the World Wide Web and the steady growth of computing power, a resurgence in ML research enabled one of its first major breakthroughs.

In 1997, scientific history was made when IBM's Deep Blue chess computer beat reigning world chess champion Garry Kasparov. Thanks to the guiding light of Moore's Law, computational power had increased exponentially since the early days of AI research, making this landmark victory possible. For the first time, people saw machine intelligence outperform peak human ability.

What is "Moore's Law?"

Moore's Law is a prediction put forth by Intel co-founder Gordon Moore in 1965: the number of transistors that can be fitted on a microchip will double roughly every two years, resulting in a doubling of computing speed and power. Moore also speculated that the cost of each component would halve over the same period, resulting in better value for more power. This prediction has held remarkably well for more than five decades, as demonstrated in the following graph.
[Graph: Moore's Law]
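
To make the arithmetic concrete, here is a minimal sketch in Python of how a doubling every two years compounds; the starting transistor count (roughly that of the 1971 Intel 4004) is used purely as an illustration.

```python
# A minimal sketch of the arithmetic behind Moore's Law: doubling every two years.
def transistors_after(years: int, start_count: int = 2_300) -> int:
    """Project a transistor count forward, assuming a doubling every two years."""
    doublings = years // 2
    return start_count * 2 ** doublings

# Starting from roughly the Intel 4004's ~2,300 transistors (1971),
# 50 years of doubling every two years predicts tens of billions of
# transistors per chip, which is in line with modern processors.
print(f"{transistors_after(50):,}")  # 2,300 * 2**25 = 77,175,193,600
```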

The Classical Machine Learning Era

The Classical ML Era began around 2005. Classical machine learning — also known as statistical learning — uses models or algorithms to analyse massive data sets, identify patterns, and make predictions. Common models include linear regression, logistic regression, decision trees, random forests, k-nearest neighbours, and support vector machines, to name a few.
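
As a rough illustration of what a classical model looks like in practice, here is a minimal sketch using scikit-learn; the Iris data set and the logistic regression choice are illustrative assumptions, not examples drawn from this article.

```python
# A minimal sketch of a classical ML workflow with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small labelled data set and split it into training and test portions.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train a logistic regression model to recognise patterns in the training data.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions on unseen data and measure how often they are correct.
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")
```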

Unlike the deep learning neural network models that would follow, classical machine learning models are less computationally demanding. They depend more heavily on the quality of the training data than deep learning models do. Due to their more straightforward training and operational processes, these models are considered "Explainable AI."

This era is also referred to as the "Recommendation Era" due to the growth of recommendation algorithms embedded in platforms like Google Search and YouTube. ML was also being deployed for facial recognition and fraud detection during this time.

What is "Explainable Artificial Intelligence?"

Also referred to as XAI, explainable artificial intelligence describes models whose algorithms and decision-making processes can be easily traced by people. This allows for transparency in how a model arrives at a specific determination and helps reveal potential bias. In some regulated industries, this level of traceability is effectively a requirement, making XAI the only type of ML that can realistically be deployed.
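
As an illustration of what that traceability can look like in code, here is a minimal sketch using a decision tree in scikit-learn, whose learned rules can be printed and read directly; the data set and model choice are illustrative assumptions, not a compliance recipe.

```python
# A minimal sketch of explainability: a decision tree's learned rules can be
# printed and inspected directly (illustrative example only).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(data.data, data.target)

# Print the learned if/else rules in plain text, so a reviewer can trace
# exactly which feature thresholds lead to each decision.
print(export_text(tree, feature_names=list(data.feature_names)))
```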

The Deep Learning Era

Beginning in the 2010s, the Deep Learning Era is marked by the adoption of neural network models, including convolutional neural networks, and the shift from Central Processing Units (CPUs) to Graphics Processing Units (GPUs). Deep learning is a subset of machine learning in which a neural network has three or more layers. The added layers allow models to achieve greater accuracy, train more efficiently on larger data sets, and handle more complex computations.

What is an "Artificial Neural Network?"

Artificial neural networks (ANNs), often simply called neural networks, are a subset of machine learning and an essential part of deep learning. ANNs are designed to mimic the architecture of a biological neural network, such as the human brain. Layers of interconnected nodes, or neurons, signal to one another, creating a system that can gradually learn from data and improve its accuracy.
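
To make the idea of layered, interconnected nodes concrete, here is a minimal sketch of a forward pass through a small network in NumPy; the layer sizes, random weights, and activation function are all illustrative assumptions.

```python
# A minimal sketch of a feedforward neural network's forward pass in NumPy.
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(0, x)

# An input of 4 features passing through two hidden layers to a single output.
x = rng.normal(size=(1, 4))
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input -> hidden layer 1
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)   # hidden layer 1 -> hidden layer 2
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden layer 2 -> output

h1 = relu(x @ W1 + b1)    # each "neuron" combines its weighted inputs
h2 = relu(h1 @ W2 + b2)
output = h2 @ W3 + b3     # training would adjust the weights to reduce error
print(output)
```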

By 2007, the deep learning technique known as long short-term memory (LSTM) — a neural network architecture that can learn from events thousands of steps back in a sequence — had begun outperforming traditional speech recognition programs. Google adopted this model in 2015, resulting in a 49 per cent decrease in transcription errors on Google Voice.
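
For a sense of how such a layer is used today, here is a minimal sketch of an LSTM processing a sequence in PyTorch; the dimensions and the random input stand in for real audio features and are not Google's actual speech-recognition setup.

```python
# A minimal sketch of a long short-term memory (LSTM) layer in PyTorch
# (illustrative dimensions and input only).
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=13, hidden_size=64, num_layers=2, batch_first=True)

# A batch of one "utterance": 200 time steps of 13 audio features each.
sequence = torch.randn(1, 200, 13)
outputs, (hidden, cell) = lstm(sequence)

# The cell state lets the network carry information across many time steps,
# which is what allows it to learn from events far back in the sequence.
print(outputs.shape)   # torch.Size([1, 200, 64])
```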

One of the key developments of this era was the large-scale adoption of the GPU in place of the CPU. The simple difference between the two is that CPUs are good at performing fast, linear actions one after another (sequential computing), while GPUs can process a multitude of actions at once (parallel computing). CPUs were therefore viable in the early days of machine learning, but GPUs are necessary to perform the massively parallel calculations associated with deep learning and convolutional neural networks.
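
The sequential-versus-parallel contrast can be sketched in miniature with NumPy on the CPU; real GPU parallelism would use a framework such as PyTorch or CUDA, but the difference between looping over one element at a time and operating on a whole array at once illustrates the idea.

```python
# A minimal CPU-only sketch of sequential versus "all at once" computation.
import time
import numpy as np

a = np.random.rand(5_000_000)
b = np.random.rand(5_000_000)

# Sequential: one multiplication at a time, like a single core stepping through.
start = time.perf_counter()
result_loop = np.empty_like(a)
for i in range(len(a)):
    result_loop[i] = a[i] * b[i]
print(f"Element-by-element loop: {time.perf_counter() - start:.2f} s")

# Vectorised: the whole array is processed in one call, closer in spirit to
# how a GPU applies the same operation to many values in parallel.
start = time.perf_counter()
result_vec = a * b
print(f"Whole-array operation:   {time.perf_counter() - start:.4f} s")
```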

As predicted by Moore's Law and demonstrated in the following graph, the value-to-performance ratio of computing components such as GPUs has improved over time. In 2006, NVIDIA's GeForce 7900 GS AGP offered 54 million floating-point operations per second (FLOP/s) per US dollar. By 2021, that figure had risen to 42.59 billion FLOP/s per dollar with NVIDIA's GeForce RTX 3080 GPU, an improvement of nearly 800-fold.
[Graph: GPU computational performance over time]
This period also saw the open sourcing of AI frameworks. As tools like TensorFlow (Google), PyTorch (Meta), and Apache MXNet (backed by AWS) became freely available, researchers working on ML theory and algorithms were able to transfer their theoretical work more easily into practical systems.

This helped democratize machine learning and allowed its use cases to grow exponentially, with some predicting that open-source models will develop faster than closed-source ones. Applications that emerged from the Deep Learning Era include sentiment analysis, spam and bot detection in text, email, and online posts, as well as computer vision.

Use Cases Expand: Computer Vision

Thanks to increases in efficiency and capability, the Deep Learning Era marked an expansion of use cases and practical implementations of ML models. One of the main commercial uses became computer vision, a field of AI and computer science that interprets visual data.

It involves algorithms and systems that allow machines to process and analyze visual information from media such as images and video. Artificial neural networks are trained on data sets to do anything from learning to identify a pencil to categorizing and analyzing all the visible parts of a car’s engine and their level of degradation.
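
As a generic illustration (and not a description of any particular commercial product), here is a minimal sketch of image classification with a pretrained convolutional network, assuming PyTorch and a recent torchvision (0.13+) are installed; the model choice and the input filename are hypothetical.

```python
# A minimal, generic sketch of computer vision inference with a pretrained network.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet-style preprocessing: resize, crop, and normalise the image.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Load a convolutional network pretrained on ImageNet (torchvision >= 0.13 API).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

image = Image.open("example_photo.jpg")      # hypothetical input image
batch = preprocess(image).unsqueeze(0)       # add a batch dimension

with torch.no_grad():
    logits = model(batch)
    predicted_class = int(logits.argmax(dim=1))

print(f"Predicted ImageNet class index: {predicted_class}")
```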

Inveniam’s computer vision product is used by field workers across several critical infrastructure industries, such as telecommunications and utilities. Our AI analyses visual data to provide real-time insight. We give field technicians near-instant feedback on their work, drive predictive maintenance through asset monitoring, and speed up build completion audits in areas such as telecommunications.

Computer vision has a wide range of implementations and is being used in self-driving car technology, security, on assembly lines, for wildfire management, and in medicine, among many others.

The Large-Scale Era

Starting around 2015, the Large-Scale Era is characterised by large corporations running AI systems and by training compute two to three orders of magnitude greater than that of the Deep Learning Era systems that preceded them. As interest in deep learning methods grew, cloud providers invested in their architectures to meet the demand.

Big data — data sets too large or complex to be handled by traditional data-processing tools — and the maturing of tools for managing such data sets continue to play a crucial role in machine learning advancements. Another sea change occurred in 2017, when Google researchers released the landmark paper "Attention Is All You Need."

The research paper introduced the “Transformer,” a neural network architecture that has become the standard for natural language processing tasks. The Transformer is the basis for ChatGPT and other generative AI products.  
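
The core operation the paper introduced, scaled dot-product attention, is simple enough to sketch in a few lines. The NumPy implementation below is a minimal illustration with toy dimensions, not production Transformer code.

```python
# A minimal sketch of scaled dot-product attention, the heart of the Transformer.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_model)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                         # weighted sum of values

# Toy example: a "sentence" of 4 tokens, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output = scaled_dot_product_attention(x, x, x)                 # self-attention
print(output.shape)                                            # (4, 8)
```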

Current and Future Trends

For all the impact AI and ML had on the world between 1950 and today, they remained largely under the radar of the general public until November 2022, when OpenAI released ChatGPT. At first, its release drew little notice, but within a few months it had become a worldwide phenomenon, boasting five million visits per day.

The era of generative AI and the race to innovate and integrate had begun. Now, machine learning and AI permeate modern life. From autonomous customer service agents to computer vision that improves physical builds to a resume polished with ChatGPT, these technologies are here to stay.

But there is still a long way to go before we uncover all the latent benefits of ML adoption. Multi-scale learning is only starting and is still limited by computing resources, unsupervised learning approaches have yet to see large-scale deployment, and quantum machine learning and quantum computers are still in early development.

Interested to hear how we could benefit your business?

Let’s have a chat to explore the possibilities.

Book a meeting