Computer Vision

Registering our surroundings, gauging distances, recognizing objects: these are things we do as a matter of course. We perceive our environment through a range of senses, above all our visual system. A visual understanding of the world makes many things easier for human beings. But can computers have a visual understanding too? And what would that mean for automated driving?

The numbers make it clear. Although the full-year figure is not yet known, fewer fatal accidents occurred in the first half of 2019 than in the same period of the previous year. The data so far point to a positive trend: driving is steadily becoming safer, thanks in part to advances in technology. As these technologies become more widespread, what further opportunities open up? What more could be done to support drivers? Automated driving technology is promising because it could improve road safety further by preventing the rare but serious accidents caused by human error.

If we want machines to think, we need to teach them to see.
Fei-Fei Li, Computer Science Department, Stanford University.

Understanding perception

Our senses are how we interact with the world around us. In road traffic, for example, our perception helps us gauge risks. Computers, on the other hand, have no concept of what a road is, nor do they understand driver behavior. For automated driving to work, they too need some form of visual perception, which is why cameras are installed in cars. The video these cameras capture is evaluated using techniques known as “machine vision” or “computer vision”. Computer vision is an interdisciplinary field concerned with developing a visual comprehension of environments by means of image analysis algorithms executed on a data processing unit, i.e. a computer. It comprises methods for recording, processing and analyzing digital image and video data in order to ultimately achieve understanding, since merely recording an environment visually is not enough in itself.

To process and evaluate recorded image data, artificial intelligence is applied in a way that loosely resembles the mechanisms of our own brain. Advances in artificial intelligence have substantially improved the performance of computer vision systems. Using object recognition, such systems can now identify meaningful elements of image content, such as streets, cars, and pedestrians.

But to match the capability of the human brain, computer vision must also be able to predict potential dangers arising from the perceived situation. A key question is therefore how computer vision performance can be raised to human-like levels of environment perception and action prediction.

Steadily increasing use in series production

The reliability and safety of integrated systems are essential prerequisites for usable environment perception technology. This is why a strong international network of some 120 Bosch research and development specialists is working on them at a central computer vision facility in Hildesheim. The work concentrates on three research fields: model-based multi-view scene analysis (“classic” computer vision), data-driven “deep learning computer vision”, and computer vision system design. This involves integrating algorithms from model-based and data-driven approaches into robust composite computer vision systems. Training deep learning methods requires large volumes of data, so research is focusing on new ways to generate training and validation data automatically via simulation and image synthesis. This allows highly realistic situations to be tested virtually, yielding data that then serves as the basis for training the system.

The research results are transferable to other areas too, like security technology and robotics, which are major application domains in addition to the automotive sector.

Bosch Research is working on technological solutions for analyzing video content in safety-critical areas.

Multi-camera systems for safety-critical areas

Such areas include train stations and airports in particular, where multi-camera networks are deployed for monitoring and surveillance. Here the technology could be of great benefit, since algorithms can enable earlier detection of violent incidents and the recognition of individuals within the camera network. Besides enabling rapid, targeted intervention by security personnel in dangerous situations, the technology could make it easier to find people who are being sought. Who left a suitcase unattended at the airport? The recorded footage could be searched to determine when the suitcase was left behind, making it easier to identify the person responsible. Preventing violence and finding people are just two examples of how recognizing individuals within large camera networks can increase safety.

But computer vision enhances quality of life as well as safety, be it at work, in the car, or at home. Bosch is researching technologies with substantial potential for a broad range of robotics applications. Computer vision systems are key to robust perception, semantic understanding of environments, and reliable navigation. Advances in these areas will enable future robotics products to operate intelligently and safely in human environments.

Semantic segmentation: each pixel is assigned a semantic meaning, shown here in color-coded form, e.g. road surface in gray-blue, road markings in yellow, pedestrians in light green, traffic lights in green, etc.
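To make the caption's idea concrete, here is a minimal sketch, not Bosch's method, of the last step of semantic segmentation: a model (assumed, not shown) outputs one score per class for every pixel, each pixel receives the class with the highest score, and that label is then mapped to a display color. The class list, the RGB values, and the names `CLASSES`, `PALETTE`, `label_map` and `colorize` are all hypothetical, chosen only to echo the color scheme described above.

```python
# Minimal sketch: per-pixel class scores -> label map -> color-coded image.
# ASSUMPTION: the classes and RGB values are illustrative only; they do not
# come from any real Bosch system or dataset.

CLASSES = ["road", "road_marking", "pedestrian", "traffic_light"]

# Hypothetical palette approximating the caption's color scheme.
PALETTE = {
    "road": (112, 128, 160),        # gray-blue
    "road_marking": (255, 220, 0),  # yellow
    "pedestrian": (144, 238, 144),  # light green
    "traffic_light": (0, 160, 0),   # green
}


def label_map(scores):
    """Assign each pixel the class with the highest score (argmax).

    scores[y][x] is a list holding one score per entry in CLASSES.
    Returns a grid of class names with the same height and width.
    """
    return [
        [CLASSES[max(range(len(CLASSES)), key=lambda c: pixel[c])] for pixel in row]
        for row in scores
    ]


def colorize(labels):
    """Replace every class name with its RGB color from PALETTE."""
    return [[PALETTE[name] for name in row] for row in labels]


if __name__ == "__main__":
    # A tiny 2x2 "image": each pixel holds four class scores.
    scores = [
        [[0.9, 0.05, 0.03, 0.02], [0.1, 0.8, 0.05, 0.05]],
        [[0.2, 0.1, 0.6, 0.1], [0.05, 0.05, 0.1, 0.8]],
    ]
    labels = label_map(scores)
    print(labels)  # [['road', 'road_marking'], ['pedestrian', 'traffic_light']]
    print(colorize(labels))
```

Real systems operate on full-resolution tensors and many more classes, but the principle is the same: one semantic label per pixel, then a fixed lookup from label to color for visualization.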

Automotive is one of the major application areas for Bosch Research in Hildesheim, where work is underway to develop video-based automated driving and overcome the associated challenges. Bosch Research is already developing algorithms for next-generation multi-purpose cameras that will give computers an even more sophisticated perception of their environment. Reliable perception of the entire vehicle environment is absolutely critical for realizing higher-value automated vehicle functions, such as automated passing maneuvers. Much as a driver looks over a shoulder or into the rear-view mirror, multi-camera systems are being developed to capture the entire vehicle environment, so that the computer can always determine whether a given maneuver is safe and can be executed comfortably. There is also a video-based solution for reliable, accurate vehicle localization known as “Video Road Signature”. Interior video sensing is another important focus and can improve comfort and safety for all vehicle occupants.

Imagine all the things human sight allows and you can start to realize the nearly endless applications for computer vision.
Bernard Marr, economist, best-selling author, technology advisor

These selected application areas alone make clear that computer vision is not merely an academic research project but an interdisciplinary field in which a host of applications are being developed for both current and future Bosch products.

With all of the senses

There are always challenges when introducing series products in safety-critical applications. The algorithms and systems developed by Bosch Research have to meet the highest standards of performance and reliability, because the safety of all road users is and always will be paramount, even as we approach the age of autonomous driving. Improved algorithms and new driver support technologies, such as 360-degree vehicle perception or multimodal sensor data fusion, can already be incorporated into today’s assistance systems, and the solutions developed for these applications provide a good basis for new approaches to challenges in other research areas. The aim is not only for the number of traffic fatalities to keep falling, but also for new everyday technologies to make life easier for a continuously growing number of people.