An Introduction to Computer Vision

What is computer vision?

Computer vision is a field of computer science that allows machines to interpret and understand the visual world. It involves developing algorithms and techniques that enable computers to extract meaningful information from digital data such as images and videos.

The ultimate goal of computer vision is to replicate human visual capabilities in machines so that they can perceive, analyse, and make decisions based on visual data. To get there, machines are trained to interpret visual information through several methods:

Image recognition – identifying and categorising content in images.
Object detection – locating objects within an image.
Motion analysis – tracking changes and movements across frames.
Image processing – enhancing or extracting information from images.

These methods help computers to recognise and categorise objects, understand their spatial relationships, and follow their movements over time. With advances in artificial intelligence (AI) and machine learning (ML), computer vision has led to sophisticated systems that can perform complex visual tasks and detect anomalies quickly and accurately, while reducing human error.
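As a rough illustration of the motion analysis method listed above, the sketch below compares two greyscale frames pixel by pixel and reports where the brightness changed. This is simple frame differencing, only one of many possible approaches; the function name, the threshold of 30, and the tiny frames are assumptions made up for the example.

```python
def changed_pixels(prev_frame, next_frame, threshold=30):
    """Return (row, col) positions whose brightness changed by more than threshold."""
    changed = []
    for r, (row_a, row_b) in enumerate(zip(prev_frame, next_frame)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:
                changed.append((r, c))
    return changed


# Two hypothetical 3x4 greyscale frames (0 = black, 255 = white).
# A bright "object" moves one pixel to the right between the frames.
frame_1 = [
    [0, 0, 0, 0],
    [0, 200, 0, 0],
    [0, 0, 0, 0],
]
frame_2 = [
    [0, 0, 0, 0],
    [0, 0, 200, 0],
    [0, 0, 0, 0],
]

moved = changed_pixels(frame_1, frame_2)
print(moved)  # [(1, 1), (1, 2)]
```

The two reported positions are where the object was and where it now is. Real systems typically build on this idea with noise filtering and object tracking.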

The beginning of computer vision

The concept of computer vision began in the late 1950s with the development of the first digital image scanner. This technology allowed computers to convert images into grids of numbers, enabling them to recognise and process visual data. Around the same time, Lawrence "Larry" Roberts, often described as the father of computer vision, explored the possibility of extracting 3D geometrical information from 2D images.
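To make "grids of numbers" concrete, here is a minimal sketch of how a small greyscale image might be held in memory and processed. The 4x4 pixel values, the `binarise` helper, and the threshold of 128 are illustrative assumptions, not taken from any particular scanner or library.

```python
# A hypothetical 4x4 greyscale image: each number is one pixel's brightness
# (0 = black, 255 = white). The left half is dark, the right half is bright.
image = [
    [10, 20, 200, 210],
    [15, 25, 205, 220],
    [12, 22, 198, 215],
    [18, 28, 202, 225],
]


def binarise(pixels, threshold=128):
    """Map each pixel to 1 (bright) or 0 (dark)."""
    return [[1 if value >= threshold else 0 for value in row] for row in pixels]


mask = binarise(image)
for row in mask:
    print(row)  # each row prints as [0, 0, 1, 1]
```

Once an image is a grid like this, separating its dark and bright regions comes down to simple arithmetic on the numbers.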

In the following decades, researchers delved into vision tasks such as segmentation and object detection. Multiple frameworks and methods emerged, including recognition-by-components, which suggested that objects could be recognised by breaking them down into their main components. The groundwork laid during this period has allowed computer vision to advance rapidly ever since.

How does computer vision work?

In the 1980s, neuroscientist David Marr established the hierarchical nature of vision and introduced algorithms for detecting basic shapes like edges, corners, and curves. Computer scientist Kunihiko Fukushima developed Neocognitron, a network of cells capable of pattern recognition.

These advancements laid the foundation for convolutional neural networks (CNNs), which have become a popular choice for computer vision applications. In the early 2000s, a shift to object recognition led to real-time face recognition applications. The availability of large-scale labelled datasets, such as ImageNet, propelled the progress of computer vision. ImageNet, with its millions of tagged images, provided a solid foundation for training CNNs and other deep learning models (deep learning being a subset of machine learning).

CNNs process visual inputs by analysing pixels and identifying patterns to understand images. They perform a series of mathematical operations known as convolutions, in which small grids of weights (kernels) slide across the image and respond to local patterns such as edges. In essence, CNNs are a class of deep learning algorithms specifically designed for processing grid-structured data such as images. Their sophisticated pattern recognition capabilities enable high-quality results in computer vision tasks.
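The convolution step can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: a single channel, no padding, a stride of 1, and a hand-written vertical-edge kernel, whereas a real CNN learns its kernel weights from data and stacks many such layers. The names `convolve2d` and `vertical_edge` are made up for this example.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the response map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x5 image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Hand-written kernel that responds where brightness increases left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(image, vertical_edge)
print(response)  # [[0, 27, 27], [0, 27, 27]]
```

The response is zero over the uniform dark region and large wherever the kernel window straddles the dark-to-bright boundary, which is how a convolution turns raw pixels into a map of where a pattern occurs.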

Computer vision applications across different sectors

The application of computer vision is vast and diverse, impacting different industries. Let's explore some of the key use cases.

Telecommunications sector

Telcos are using computer vision to streamline operations, improve infrastructure maintenance, and enhance customer experiences. One important application is video and photo analytics for network monitoring and maintenance. By deploying computer vision algorithms, telcos can automatically inspect and analyse network infrastructure, identify potential issues, and predict maintenance needs, thereby improving network reliability and performance.

The integration of computer vision in the telco sector optimises operational processes and drives the development of new services and revenue streams. According to a report by Allied Research Group that takes 2021 as its starting point, the value of AI in telecommunications is projected to reach $38.8 billion by 2031. With continuing advances in computer vision, telcos can develop their network infrastructure to provide better services to their customers and differentiate themselves in a highly competitive market.

Automotive industry

The automotive industry has embraced computer vision to enhance safety and introduce autonomous driving. Companies like Tesla have implemented computer vision algorithms, including (but not limited to) CNNs, to develop advanced driver-assistance systems. Tesla's Autopilot features, including lane centring and self-parking, are a direct result of computer vision applications. Startups like Mighty AI have also played a crucial role in providing accurate annotations for training autonomous vehicle algorithms.

Manufacturing sector

Computer vision coupled with sensors is revolutionising the manufacturing sector. With this technology, critical equipment can be monitored and maintained more effectively. FANUC, a Japanese group of automation companies, uses computer vision software to detect potential problems in industrial robots, enabling predictive maintenance and preventing unexpected failures.

Retail industry

In the retail industry, computer vision is being used to improve customer experiences and streamline operations. For example, Walmart employs computer vision to track checkout theft and reduce shrinkage. Its Missed Scan Detection program uses cameras to identify scan errors and failures, alerting checkout managers for prompt action. Mashgin, a startup, is also using computer vision for self-checkout systems, eliminating the need for barcodes and reducing checkout times.

Healthcare sector

The healthcare sector has seen an increase in AI applications, many of which are already delivering positive results. Computer vision has the potential to bring significant value here. While it cannot replace healthcare professionals, it can complement routine diagnostics and assist in image analysis.

Gauss Surgical, for example, has developed a real-time blood monitor that accurately estimates blood loss during surgery, potentially saving billions of dollars in unnecessary blood transfusions. Microsoft's Project InnerEye uses AI to analyse radiological images, significantly speeding up the process and suggesting effective treatments.

Agriculture

Computer vision is transforming the agricultural industry by offering solutions for mapping, soil analysis, crop evaluation, and more. Startups like RSIP Vision and One Soil Platform are using deep learning and satellite imagery to estimate seasonal crop yields and monitor plant health. This technology helps farmers streamline their work and make data-driven decisions.

The future of computer vision

Computer vision has certainly had an impact, but what can we anticipate in terms of new developments? There are a few emerging fields where computer vision is being used to optimise, improve, and advance new technologies.

Multi-modal data – the combination of different types of data, like images, videos, audio, and text. By integrating this data, computer vision systems can gain a more comprehensive understanding of the world.
Edge computing – performing computation and analysis at the edge, close to where the data is generated, rather than relying on a centralised cloud infrastructure. This is particularly important for real-time applications that require immediate processing and response. For computer vision, edge computing allows for faster and more efficient analysis of visual data.
3D computer vision – understanding and interpreting the world in three dimensions. Typically, 3D models are created by extracting, processing, and analysing data from the real world. 3D computer vision can overcome many of the limitations of 2D computer vision by providing depth information.
Advanced satellite vision – computer vision applications can improve the use of remote sensing data by identifying features with precision, accuracy, and speed, helping to solve the challenges associated with large satellite image data sets.
Augmented reality – computer vision is an integral component of augmented reality, enabling computers to translate visual information into digital information. In 2024, new augmented reality devices will hit the market, including the long-awaited headsets from Meta and Apple. CV-augmented tools will be accessible to a much broader population than ever before.
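
To make the depth point in the 3D computer vision item more concrete, here is a minimal sketch of one common way to recover depth. The article does not name a method, so the choice is an assumption: stereo triangulation with two parallel cameras, where depth = focal length × baseline / disparity. The function name, parameter names, and example values are all hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate the distance (in metres) to a point seen by two parallel cameras.

    focal_length_px: camera focal length, expressed in pixels
    baseline_m:      distance between the two camera centres, in metres
    disparity_px:    horizontal shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical stereo rig: 800 px focal length, cameras 0.25 m apart.
near = depth_from_disparity(800.0, 0.25, 80.0)
far = depth_from_disparity(800.0, 0.25, 40.0)
print(near, far)  # 2.5 5.0
```

A smaller shift between the two views means the point is further away. A single 2D image cannot provide this cue on its own.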

Wrapping up

Since its beginnings in the late 1950s, computer vision has evolved from a concept into a reality, transforming industries along the way. With advancements in neural networks, deep learning, and the availability of large-scale datasets, computer vision has become a powerful tool for visual automation. As AI technology continues to progress, introducing new types of AI applications, we can expect even more exciting use cases and breakthroughs in the future.
