As explained in the previous sections, the detector uses a Convolutional Neural Network with a [300, 300, 3] input layer. This means any image presented to the detection pipeline will be resized to 300x300 and converted to RGB_888 format, regardless of its resolution.
Using a low resolution speeds up inference, and using a fixed shape instead of ratio-based scaling improves generalization and speeds up training. This is obviously an issue when the image is very large and the license plates are very small or far away: such plates tend to disappear when the image is downscaled by a factor of 2 or more.
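A bit of arithmetic makes the problem concrete. The numbers below (a 4K frame and a 90x45 px plate) are illustrative assumptions, not measurements from the SDK:

```python
# Rough arithmetic showing why small plates vanish after resizing
# a large frame down to the 300x300 input layer.

def plate_size_after_resize(img_w, img_h, plate_w, plate_h, target=300):
    """Approximate plate size (in pixels) once the image is resized to target x target."""
    return plate_w * target / img_w, plate_h * target / img_h

# Hypothetical example: a 90x45 px plate in a 3840x2160 (4K) frame.
w, h = plate_size_after_resize(3840, 2160, 90, 45)
print(w, h)  # the plate shrinks to roughly 7x6 px, too few pixels to detect reliably
```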
To fix this issue we could choose an input layer with a higher resolution (e.g. [512, 512, 3] instead of [300, 300, 3]), as many ANPR solutions do. The problem is that higher resolution comes with higher latency and memory usage. The ANPR solutions we tried barely reach 1fps (detection only) on a Raspberry Pi 4, while our implementation runs at 12fps. To keep this frame rate while still accurately detecting small and far-away plates, we spent a huge amount of time in R&D to come up with a very fast and accurate solution. In short: scale the features, not the image.
You don’t need to understand how the pyramidal search works in order to use it (see the pyramidal_search_enabled configuration entry to enable/disable it), but some basic technical information may help you debug issues:
1. The features are extracted from the input image and defined as the base layer.
2. If quantization is enabled, the features are converted from float32 to int8 and normalized to [-127, 128].
3. The detection pipeline is partially executed on the base layer, without the Non-max Suppression (NMS) step. The variable n is initialized to 1. This is the first pass (1st-pass).
4. A refinement function with binary output is applied to the result (bounding boxes and scores) of the nth-pass. If the output is 0 or n > 6, the process stops and we move to step 7. Otherwise (the output is 1 and n <= 6), we move to step 5.
5. The features from the nth-pass are scaled by X and the detection pipeline is partially executed again. This execution is almost 7 times faster than the nth-pass because there are fewer layers and neurons (lower depth multiplier).
6. The variable n is incremented and we loop back to step 4.
7. The detection pipeline is resumed with NMS and other post-processing operations.
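The control flow of the steps above can be sketched as follows. All five helper functions are hypothetical stand-ins so the loop is runnable; they are not the SDK's actual internals:

```python
# Minimal control-flow sketch of the pyramidal search described above.
# The helpers below are hypothetical stubs, not the real SDK functions.

MAX_LEVELS = 6  # at most 6 pyramidal levels

def extract_features(image):                  # step 1: base layer
    return {"level": 0, "data": image}

def detect_partial(features):                 # steps 3/5: pipeline without NMS
    return [("box", features["level"])], [0.9]

def refine(boxes, scores):                    # step 4: binary refinement decision
    return boxes[0][1] < 2                    # pretend 2 extra passes suffice

def scale_features(features):                 # step 5: scale the features, not the image
    return {"level": features["level"] + 1, "data": features["data"]}

def finish_with_nms(boxes, scores):           # step 7: NMS and post-processing
    return list(zip(boxes, scores))

def pyramidal_detect(image):
    features = extract_features(image)
    n = 1
    boxes, scores = detect_partial(features)  # 1st-pass
    while refine(boxes, scores) and n <= MAX_LEVELS:
        features = scale_features(features)
        boxes, scores = detect_partial(features)
        n += 1                                # step 6: loop back to step 4
    return finish_with_nms(boxes, scores)
```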
As you’ve noticed, there are 6 pyramidal levels, and the pyramidal_search_sensitivity configuration entry controls how many are used. The sensitivity also controls the depth multiplier, which defines the number of neurons. The higher the sensitivity, the more pyramidal levels and neurons will be used. More levels mean better accuracy but higher CPU usage and inference time. Default value: 0.28.
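As a quick illustration, the two configuration entries named in this section could be assembled into a JSON configuration string like the one below. The dict-to-JSON framing and how the string reaches the engine are assumptions; only the two key names come from the text:

```python
import json

# Hedged sketch: build a JSON config fragment with the two entries
# discussed in this section. Key names are from the text above;
# everything else about the init call is assumed and omitted.
config = {
    "pyramidal_search_enabled": True,        # toggle the pyramidal search
    "pyramidal_search_sensitivity": 0.28,    # default; raise it for more levels/neurons
}
print(json.dumps(config))
```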
For example, you can use pyramidal search to monitor a 5-lane highway in real time using a single long-range camera.
Let’s be very concrete and try with a sample image. The next image is from The Verge article titled “Privacy advocate held at gunpoint after license plate reader database mistake, lawsuit alleges”. You can find it here.
- Original image:
- ANPR result without Pyramidal search:
- ANPR result using Pyramidal search:
You can clearly see that the two farthest plates are missed when pyramidal search is disabled. The same test can be done using our online cloud-based demo web application hosted at https://www.doubango.org/webapps/alpr/ . You can also run the original image through ALPR / ANPR products from other companies for comparison.