Automatic Image Processing with an Eye Open for Trouble

Video surveillance is getting smart: It doesn’t alert the guards or other security personnel until something is actually happening – inside museums and banks, in railroad stations and airports, on factory grounds or outside an embassy. Bosch researchers are developing automatic image-processing algorithms suitable for all of these applications.

Suddenly the priceless sculpture was no longer in its place. Had the security guard on duty been inattentive or distracted? At any rate, it only took the thief a few minutes to heist the masterpiece from the museum. After the crime, during a review of the recorded images taken by the surveillance cameras, a suspect was identified. But for the time being at least, the sculpture was gone.

The example is fictitious, but it points to the essence of the problem: Human beings’ capacity to pay attention is limited, and so is conventional video surveillance technology, which can only record and display images. That is why researchers in Hildesheim are developing image-processing systems with algorithms that automatically detect movements and other changes in a sequence of images. The system then “decides” whether or not an alarm should be triggered. If so, it also transmits the optimally prepared images to the monitors used by the security personnel.

The images must be transmitted to the security staff because, in the final analysis, only humans can and should decide how to respond. Studies have indicated, however, that even an undistracted observer can watch only four monitors at the same time. That is why image processing must be able to discriminate between what’s essential and what isn’t. Instead of being confronted with an unchanging video image of the sculpture, for example, the museum’s security personnel are alerted only if something is actually happening.

Image processing can identify objects from their contours. Changes in an image are detected by sensing differences in brightness or color between individual pixels. If a sculpture or a painting is moved or it is suddenly not there any more, the system immediately detects this change and triggers an alarm.
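
The pixel comparison described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the researchers' algorithm: the frames are tiny grayscale grids, and the function names, the brightness threshold of 30 and the 5 percent alarm fraction are hypothetical values chosen only for illustration.

```python
def changed_fraction(previous, current, threshold=30):
    """Return the fraction of pixels whose brightness differs by more than threshold.

    Both frames are lists of rows of grayscale values (0-255) of equal size.
    """
    changed = 0
    total = 0
    for prev_row, cur_row in zip(previous, current):
        for prev_pixel, cur_pixel in zip(prev_row, cur_row):
            total += 1
            if abs(prev_pixel - cur_pixel) > threshold:
                changed += 1
    return changed / total if total else 0.0


def should_alarm(previous, current, threshold=30, min_fraction=0.05):
    """Trigger an alarm when enough pixels changed between two frames."""
    return changed_fraction(previous, current, threshold) >= min_fraction


# Toy 4x4 frames: a dark pedestal, and the same pedestal with a bright object on it.
empty_pedestal = [[20] * 4 for _ in range(4)]
with_sculpture = [row[:] for row in empty_pedestal]
for y in (1, 2):
    for x in (1, 2):
        with_sculpture[y][x] = 200

# The object disappears between two frames: 4 of 16 pixels change.
sculpture_removed = should_alarm(with_sculpture, empty_pedestal)
# Nothing happens between two frames: no pixel changes.
nothing_happened = should_alarm(with_sculpture, with_sculpture)
```

In this toy case the alarm fires only for the frame pair in which the bright object vanishes. A real system would compare whole image sequences rather than a single pair of frames.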

Even if the sculpture is replaced by a replica, the substitution is detected, because the algorithms do more than simply analyze individual images – they always analyze image sequences as well. A typical rate of recording and analysis is 25 images per second.

The challenge that researchers have to overcome in this application, though, is not so much detecting objects or changes as suppressing irrelevant information. Of course, the relatively stable environment that is characteristic of a museum poses fewer difficulties than, say, the hustle and bustle in a crowded subway station or a busy pedestrian underpass. In their algorithms, researchers must account for changes in lighting, the play of shadows in the course of a day, rain, snow and even the falling of autumn leaves. A change-detection algorithm registers the image changes caused by falling leaves just as readily as those caused by a running person.

But irrelevant information, such as the movement of the leaves, is screened out by means of statistical analysis. Then the detected objects are tracked in sequential images, classified according to their size and other characteristics, and in a predefined situation – when a security limit is violated or when someone leaves a piece of luggage unattended, for example – an alarm is activated. Scientists are programming object classifications of image content – such as pedestrian, cyclist, car, dog, bird – not only for application in security systems but also for future driver-assistance systems in motor vehicles. Both types of system require such classifications.
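
The article does not say which statistical method is used to screen out such movement. One common approach, shown here purely as an assumed illustration, keeps a running mean and variance of each pixel's brightness: a pixel that flickers all the time (leaves, rain) acquires a large variance and is tolerated, while a pixel that is normally stable stands out as soon as it changes. The class name and the parameters `alpha`, `k` and `min_variance` are hypothetical.

```python
class PixelBackgroundModel:
    """Running mean and variance of one pixel's brightness (exponential average)."""

    def __init__(self, initial, alpha=0.1, initial_variance=25.0, min_variance=4.0):
        self.mean = float(initial)
        self.variance = initial_variance
        self.alpha = alpha                # how quickly the model adapts
        self.min_variance = min_variance  # floor so sensor noise is never "foreground"

    def is_foreground(self, value, k=3.0):
        """True if value lies more than k standard deviations from the mean."""
        std = max(self.variance, self.min_variance) ** 0.5
        return abs(value - self.mean) > k * std

    def update(self, value):
        """Blend the new observation into the running mean and variance."""
        diff = value - self.mean
        self.mean += self.alpha * diff
        self.variance = (1 - self.alpha) * self.variance + self.alpha * diff * diff


# A pixel on a quiet wall: always brightness 100.
wall = PixelBackgroundModel(100)
# A pixel covered by moving leaves: alternates between 100 and 130.
leaves = PixelBackgroundModel(100)
for frame in range(50):
    wall.update(100)
    leaves.update(130 if frame % 2 else 100)

wall_jump = wall.is_foreground(130)       # unusual for the wall pixel
leaf_flicker = leaves.is_foreground(130)  # ordinary for the leaf pixel
leaf_intruder = leaves.is_foreground(255) # far outside the usual flicker
```

The same brightness jump to 130 is flagged on the quiet wall but ignored among the leaves, whereas a much larger change is still flagged in both places.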

In a vehicle, if a pedestrian is detected just before an unavoidable accident, safety systems can be activated – for instance, to reduce the severity of potential injuries by partially raising the engine hood prior to impact. By the same token, it’s important for security personnel to know whether the creature approaching their security fence is a human or just an animal.

What cannot yet be determined from the video data is the likely intention of the “object” that is being monitored. That’s why future research will be focusing on the interpretation of scenes. Scenarios such as “two people greet one another by shaking hands,” “one person passes an object to another,” “two individuals are fighting” or “someone is being pursued” would be derived from the data, or at least a probability index would be deduced for them, so that the relevant video images can be presented to a human analyst for evaluation. This solution would be of particular interest for security at events attended by large crowds, such as rock concerts or football games, but also for high-risk locations in inner cities.

In applications designed to protect sensitive areas against burglary, hold-ups or terrorist attacks, for instance, it’s possible to devise classifications such as “a red car stops in front of the bank” or “a panel van is parked next to the embassy.” If the system is equipped with microphones, the camera can turn directly toward the source of the sound when a window is broken during a break-in.

Standardized processes are used to compress the generated video data, link them with metadata and store them. The metadata can then be used later to search the database, using a query such as “display all situations in the past two weeks when a red car drove past the bank.”
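
A metadata query of this kind might look as follows in Python. This is only a sketch under assumptions: the article does not describe the metadata format, so the `SceneEvent` record, its field names, the `find_events` helper and the sample clips are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SceneEvent:
    """Hypothetical metadata record stored alongside a compressed video clip."""
    timestamp: datetime
    object_class: str
    color: str
    action: str
    location: str
    clip_id: str


def find_events(events, *, now, within, **criteria):
    """Return events from the last `within` period whose fields match all criteria."""
    earliest = now - within
    return [
        event for event in events
        if earliest <= event.timestamp <= now
        and all(getattr(event, name) == value for name, value in criteria.items())
    ]


# Invented sample archive, for illustration only.
archive = [
    SceneEvent(datetime(2024, 6, 10, 9, 30), "car", "red", "drove_past", "bank", "clip-001"),
    SceneEvent(datetime(2024, 5, 20, 14, 0), "car", "red", "drove_past", "bank", "clip-002"),
    SceneEvent(datetime(2024, 6, 12, 11, 15), "car", "blue", "drove_past", "bank", "clip-003"),
    SceneEvent(datetime(2024, 6, 14, 8, 0), "van", "white", "parked", "embassy", "clip-004"),
    SceneEvent(datetime(2024, 6, 2, 18, 0), "car", "red", "drove_past", "bank", "clip-005"),
]

# "Display all situations in the past two weeks when a red car drove past the bank."
matches = find_events(
    archive,
    now=datetime(2024, 6, 15, 12, 0),
    within=timedelta(weeks=2),
    object_class="car",
    color="red",
    action="drove_past",
    location="bank",
)
matching_clips = [event.clip_id for event in matches]
```

Because the search runs on the metadata rather than on the video itself, only the matching clips need to be retrieved and decompressed for display.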

Such powerful possibilities make it imperative that protecting privacy be a key requirement for future image-processing technology. Depending on the application, programmers can ensure that faces or license plates are automatically rendered unrecognizable.
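
One simple way to render a region unrecognizable is to pixelate it, that is, to replace each small block of pixels with its average brightness. The sketch below assumes that the position of the face or license plate has already been found by some other step, which is not shown; the function name and its parameters are hypothetical, and the article does not say which technique is actually used.

```python
def pixelate_region(image, top, left, height, width, block=2):
    """Return a copy of a grayscale image with one rectangle pixelated.

    Each block x block tile inside the rectangle is replaced by its
    integer average; pixels outside the rectangle are left untouched.
    """
    result = [row[:] for row in image]
    bottom = min(top + height, len(image))
    right = min(left + width, len(image[0]) if image else 0)
    for tile_y in range(top, bottom, block):
        for tile_x in range(left, right, block):
            ys = range(tile_y, min(tile_y + block, bottom))
            xs = range(tile_x, min(tile_x + block, right))
            values = [image[y][x] for y in ys for x in xs]
            average = sum(values) // len(values)
            for y in ys:
                for x in xs:
                    result[y][x] = average
    return result


# Toy 4x4 frame; the top-left 2x2 corner stands in for a detected face.
frame = [
    [0, 100, 7, 7],
    [100, 200, 7, 7],
    [7, 7, 7, 7],
    [7, 7, 7, 7],
]
anonymized = pixelate_region(frame, top=0, left=0, height=2, width=2)
```

The four distinct values in the corner collapse into a single average, while the rest of the frame and the original image remain unchanged. Larger block sizes remove more detail.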