Optimum Camera Specifications Based on Computer Vision Algorithm Goals
23 Jun
It is often difficult to determine what camera specifications are required to meet your high-level system goals, whether that goal is detection, navigation, depth estimation, or image restoration. In this post we describe the fundamental camera properties that affect algorithm performance, show how to determine performance limits, and explain how to use those limits to select off-the-shelf components that minimize system size, weight, power, and cost while maintaining acceptable performance. We use a real-world face detection problem to demonstrate the workflow that takes us from an abstract goal to hardware components.
The Three Fundamental Camera Properties:
There are only a small number of camera properties that ultimately determine whether your algorithm will succeed or fail – namely Field of View, Resolution, and Signal-to-Noise Ratio.
Field of View: The field of view (FOV) determines if you can capture your object of interest or see enough of the region around you to complete your task. The field of view is a function of the lens and sensor properties, and the distance from the camera to the object.
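The relationship between the lens, the sensor, and the field of view can be sketched with simple paraxial geometry. The function names below are illustrative, not from Imager; this is a thin-lens approximation that ignores distortion:

```python
import math

def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Paraxial horizontal field of view for a given lens/sensor pair."""
    return 2 * math.degrees(math.atan(sensor_width_mm / (2 * focal_length_mm)))

def coverage_width_m(fov_deg, distance_m):
    """Width of the scene captured at a given distance from the camera."""
    return 2 * distance_m * math.tan(math.radians(fov_deg / 2))
```

For example, a 4mm lens in front of a 1.6mm-wide sensor yields roughly a 22.6 degree horizontal FOV, which covers a 4m-wide region at a distance of 10m.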
Resolution: Resolution can mean several different things. Here we define the two primary resolution factors that determine if you will capture enough information about your object.
Geometric Resolution: The geometric resolution describes how large a region in object space each pixel samples, independent of any lens blur. We call this the minimum resolvable feature size. The minimum resolvable feature size is a function of the lens and sensor properties and the distance from the camera to the object.
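By similar triangles, the object-space footprint of one pixel is the pixel pitch scaled by the ratio of object distance to focal length. A minimal sketch (function name and example values are illustrative):

```python
def min_feature_size_mm(pixel_pitch_um, focal_length_mm, distance_m):
    """Object-space size of one pixel's footprint, by similar triangles.
    Ignores lens blur -- this is geometric resolution only."""
    return (pixel_pitch_um * 1e-3) * (distance_m * 1e3) / focal_length_mm
```

For instance, 2.5 micron pixels behind a 3.4mm lens imaging an object 10m away give a footprint of about 7.4mm per pixel.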
Lens Resolution: The amount of blur due to the imaging lens also affects resolution. We model the impact of the lens blur on resolution through the Modulation Transfer Function (MTF).
Signal-to-Noise Ratio: The signal-to-noise ratio (SNR) determines how noisy an image appears. The SNR is affected by the source, scene, lens, and sensor properties.
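A common simplified per-pixel SNR model combines photon shot noise, dark-current noise, and read noise in quadrature. This is a generic sketch, not Imager's model; the default noise values are assumptions for illustration:

```python
import math

def pixel_snr(signal_e, read_noise_e=3.0, dark_e=0.0):
    """SNR of one pixel: signal electrons over the root-sum-square of
    photon shot noise, dark-current noise, and read noise (all in e-)."""
    return signal_e / math.sqrt(signal_e + dark_e + read_noise_e ** 2)
```

Because collected signal scales with pixel area, halving the pixel pitch cuts the signal by four, which is why shrinking pixels (all else equal) lowers the SNR.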
It should be noted that all of these properties are camera properties, not component properties (i.e. they are determined by some combination of the source, scene, lens, sensor, and processing). Depending on your application you may also need to consider dynamic range and how accurately you need to capture spectral or color information. For the example in this post, it is sufficient to consider only the FOV, resolution, and SNR for component selection.
Determining the Minimum Camera Specifications for a Face Detection System
We use a face detection system to illustrate the process of finding the minimum camera specifications that still maintain robust performance. Specifically, we will solve the following problem:
- Detect the faces of people walking through a walkway that is 4m wide, with a camera mounted 10m away
- Minimize cost by using common off-the-shelf components (e.g. color mobile sensors, mid to high volume production lenses)
- Operate under 100 lux of scene illumination
As is typical, this system starts with a high-level goal. As a starting point, let's use an HD camera (1920 x 1080 pixels) that images our 4m walkway from 10m away. The required horizontal field of view is 2·arctan(2m/10m), or roughly 22 degrees. We will pair the HD sensor with an F/2 lens. For the face detection algorithm, we use the Haar cascades algorithm available as part of OpenCV. We then generate a series of simulated images using Imager and find that we are able to robustly detect faces with our HD camera system, as shown in the video below.
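The FOV requirement above is just scene geometry, so it can be checked directly:

```python
import math

walkway_half_width_m = 2.0   # half of the 4 m walkway
camera_distance_m = 10.0

required_hfov_deg = 2 * math.degrees(
    math.atan(walkway_half_width_m / camera_distance_m))
print(round(required_hfov_deg, 1))  # 22.6 degrees
```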
Since our goal is to find the minimum specifications at which the system still performs well, we start with the minimum system resolution. In the video below, we reduce the number of pixels while also reducing the camera's focal length to maintain the same field of view with the smaller sensor. The results show that the system performs well until the number of horizontal pixels drops below 400. This equates to a minimum resolvable feature size of 10mm, meaning that each sensor pixel gathers light from a 10mm x 10mm region on the object. Note that these simulations use a full virtual prototype of the imaging system; here we are changing only two of the many input parameters that influence the final image.
We have now determined the field of view and the minimum resolvable feature size. The second aspect of resolution, as discussed above, is the amount of lens blur, or equivalently the lens's modulation transfer function (MTF). In the video below we introduce blur until the algorithm begins to fail. Since the minimum resolvable feature size and the lens blur interact, the analysis below uses a system with 400 horizontal pixels. Technically, we are introducing a lens aberration called spherical aberration, which produces a blur characteristic of real lenses. We find that we can tolerate 0.25 RMS waves of spherical aberration before false positives start to appear in the image. We describe how this relates to the MTF specification below.
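As a hedged companion to the aberration sweep, the standard diffraction-limited MTF formula for a circular aperture gives the upper bound an aberration-free lens could achieve at any spatial frequency. This is textbook optics rather than anything specific to Imager; the wavelength and F-number below are assumptions chosen to match this example:

```python
import math

def diffraction_mtf(freq_cyc_mm, fnumber, wavelength_um=0.55):
    """Diffraction-limited MTF of an ideal (aberration-free) lens with a
    circular aperture, evaluated at `freq_cyc_mm` (cycles/mm)."""
    cutoff = 1.0 / (wavelength_um * 1e-3 * fnumber)  # cutoff frequency, cyc/mm
    nu = freq_cyc_mm / cutoff                        # normalized frequency
    if nu >= 1.0:
        return 0.0
    return (2.0 / math.pi) * (math.acos(nu) - nu * math.sqrt(1.0 - nu * nu))

# Nyquist frequency of a sensor with 2.5 micron pixels:
nyquist_cyc_mm = 1.0 / (2 * 2.5e-3)  # 200 cyc/mm
```

An ideal F/2 lens would deliver an MTF of roughly 0.72 at that Nyquist frequency, so any real, aberrated F/2 lens must land somewhere below that ceiling.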
The last fundamental requirement, in addition to the field of view and resolution, is the signal-to-noise ratio (SNR). For this example, we reduce the SNR by reducing the pixel size. Smaller pixels have the advantage of reducing system size and cost because they use less silicon area. In the video below we find that the algorithm starts to fail as the SNR drops below 27. We have now found the fundamental system requirements, which can be used to select appropriate off-the-shelf components:
- FOV > 22 degrees
- SNR > 27
- Minimum Resolvable Feature Size (@10m) < 10mm
- MTF > 0.25 at Nyquist
We start by selecting a sensor. We know we need at least 400 samples across the 4m region to achieve the 10mm minimum resolvable feature size, so we select the closest common sensor format, which is VGA (640×480). From our previous simulations we know the pixel size must be at least 2.2 microns with our F/2 lens to meet our SNR goal under the 100 lux lighting condition. The pixel size and F/# can be traded off while retaining the same SNR. This gives us a couple of options: we could pair a VGA sensor with 5.6 micron pixels (ASX340AT) with a slower lens, or choose a VGA sensor with 2.5 micron pixels (OV7675) and a lens with an F/# similar to our starting point. For this example, let's use the 2.5 micron pixel solution.
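The pixel-size/F-number tradeoff mentioned above can be sketched under the simplifying assumption that per-pixel signal scales as (pitch / F#)²; the function name is ours:

```python
def matched_fnumber(base_fnumber, base_pitch_um, new_pitch_um):
    """F-number that keeps per-pixel signal (and thus shot-limited SNR)
    constant when the pixel pitch changes, assuming signal scales as
    (pitch / F#)**2."""
    return base_fnumber * new_pitch_um / base_pitch_um
```

Under that assumption, the 5.6 micron ASX340AT could use roughly an F/4.5 lens and collect the same per-pixel signal as the 2.5 micron OV7675 behind F/2.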
Based on the 2.5 micron sensor, we know that our focal length needs to be less than or equal to 4.2mm in order to achieve our 22 degree FOV. Based on this, let’s select a 3.4mm F/2 lens from Sunex.
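A quick paraxial sanity check of the selected components is straightforward. These thin-lens numbers ignore distortion, so they will differ slightly from the full Imager simulation results that follow:

```python
import math

pixels_h, pitch_um = 640, 2.5            # VGA sensor, 2.5 um pixels
sensor_width_mm = pixels_h * pitch_um * 1e-3   # 1.6 mm active width
focal_mm, distance_m = 3.4, 10.0         # selected lens, object distance

hfov_deg = 2 * math.degrees(math.atan(sensor_width_mm / (2 * focal_mm)))
feature_mm = (pitch_um * 1e-3) * (distance_m * 1e3) / focal_mm

print(round(hfov_deg, 1), round(feature_mm, 2))  # 26.5 7.35
```

Both values clear the requirements (FOV > 22 degrees, feature size < 10mm) with margin before distortion is even taken into account.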
After entering the detailed component specifications in Imager, we find that the system has:
✓ FOV (Spec > 21.7 degrees):
- 27.7 degrees
✓ Resolvable Feature Size (Spec < 10mm):
- Min: 7.4mm
- Max: 8.0mm
✓ SNR (Spec > 27):
- Min: 31.2
- Max: 41.5
✓ MTF (Spec > 0.25 at Nyquist):
- Min: 0.4
- Max: 0.5
For the resolvable feature size there are min and max values due to lens distortion. For the SNR there are min and max values due to the lens's relative illumination and the mismatch between the lens chief ray angle and the sensor microlens shift. For the MTF there are min and max values because off-axis lens aberrations reduce the MTF across the field of view. Overall, the selected components will allow us to achieve our desired performance.
Our original starting point and the final designed camera both meet the requirements, but the designed camera is significantly smaller, lighter, and less expensive. By considering all aspects of the system and understanding what you really need to accomplish your goals at the design phase, you can greatly improve system-level metrics such as performance, size, weight, power, and/or cost.
In practice you would repeat this with a more diverse input image dataset, but this example was intended to demonstrate how simulated images can be used to help define the limits of algorithm performance and to readily get to component selection for your application.
You now know the process to determine where an algorithm will fail, and from this, how to determine the minimum camera specifications. If you have questions post them below, or email us at firstname.lastname@example.org. You can learn more about Imager, the simulation software we used for this example, on our website: www.fivefocal.com.