Benchmark¶
Anyone can claim to have the fastest implementation; we back our claim with numbers and with source code that is freely available for everyone to check.
More information about the benchmark application can be found here, and you can check out the source code from GitHub.
UltimateALPR versus OpenALPR on Android¶
We found three OpenALPR (for Android) repositories on GitHub:
1. https://github.com/SandroMachado/openalpr-android [708 stars]
2. https://github.com/RobertSasak/react-native-openalpr [338 stars]
3. https://github.com/sujaybhowmick/OpenAlprDroidApp [102 stars]
We decided to go with [1], the repository with the most stars on GitHub. We're calling recognizeWithCountryRegionNConfig(country="us", region="", topN=10).
Rules:
- We're using a Samsung Galaxy S10+ (Snapdragon 855).
- For every implementation, we run the recognition function in a loop 1000 times.
- The positive rate defines the percentage of images containing a plate. For example, 20% positives means 800 negative images (no plate) and 200 positive images (with a plate) out of the 1000 total. This percentage matters because it allows timing both the detector and the recognizer.
- All positive images contain a single plate.
- Both implementations are initialized outside the loop.
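The rules above can be sketched as a minimal timing harness. This is an illustrative sketch, not the actual benchmark application; `recognize` is a hypothetical placeholder standing in for either SDK's recognition call.

```python
import random
import time

def build_image_set(total, positive_rate, positives, negatives):
    """Mix positive and negative images according to the positive rate.

    E.g. total=1000, positive_rate=0.2 -> 200 positives + 800 negatives.
    """
    n_pos = int(total * positive_rate)
    images = [random.choice(positives) for _ in range(n_pos)]
    images += [random.choice(negatives) for _ in range(total - n_pos)]
    random.shuffle(images)
    return images

def benchmark(recognize, images):
    """Time the recognition function over the whole loop.

    Initialization happens outside this function, per the rules above.
    """
    start = time.perf_counter()
    for img in images:
        recognize(img)  # recognition call: image in, plate results out
    return (time.perf_counter() - start) * 1000.0  # elapsed milliseconds
```

The tables below report exactly this kind of total elapsed time, one run per positive rate.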
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| ultimateALPR | 21344 ms | 25815 ms | 29712 ms | 33352 ms | 37825 ms |
| OpenALPR | 715800 ms | 758300 ms | 819500 ms | 849100 ms | 899900 ms |
One important takeaway from the above table is that the detector in OpenALPR is very slow: 80% of the time is spent trying to detect license plates. This is problematic because, most of the time, there is no plate in the video stream from a camera filming a street or road (negative images), and in such situations an application must run as fast as possible (above the camera's maximum frame rate) to avoid dropping frames and losing positive frames. The detection part should also burn as few CPU cycles as possible, which means better energy efficiency.
The above table shows that ultimateALPR is up to 33 times faster than OpenALPR.
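As a sanity check, the speedup and the per-frame latency can be derived directly from the 0% positives column of the table (1000 iterations per run):

```python
ULTIMATE_MS = 21344.0  # ultimateALPR, 0% positives, 1000 iterations
OPENALPR_MS = 715800.0  # OpenALPR, 0% positives, 1000 iterations
LOOPS = 1000

speedup = OPENALPR_MS / ULTIMATE_MS            # ~33.5x
per_frame_ultimate = ULTIMATE_MS / LOOPS       # ~21.3 ms/frame, i.e. ~47 fps
per_frame_openalpr = OPENALPR_MS / LOOPS       # ~716 ms/frame, i.e. ~1.4 fps
```

At roughly 1.4 fps, OpenALPR cannot keep up with any real camera frame rate on this device, while ultimateALPR stays above 30 fps.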
To be fair to OpenALPR:
- The API only allows providing a file path, which means that on every loop iteration it reads and decodes the input image, while ultimateALPR accepts raw bytes.
- No ARM64 binaries are provided, so the app loads the ARMv7 versions.
Again, our benchmark application is open source and doesn't require registration or a license key. You can run the same test on your own device; please don't hesitate to share your numbers or any feedback if you think we missed something.
AMD Ryzen 7 3700X 8-Core CPU with RTX 3060 GPU (Ubuntu 20)¶
We recommend using a computer with a GPU to unleash ultimateALPR's speed. The following numbers were obtained using an NVIDIA RTX 3060 GPU and an AMD Ryzen 7 3700X 8-Core CPU on Ubuntu 20.
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| GPU using TensorRT | 201 ms | 238 ms | 291 ms | 333 ms | 379 ms |
| GPU using TensorFlow, OpenVINO enabled | 615 ms | 679 ms | 740 ms | 773 ms | 809 ms |
| GPU using TensorFlow, OpenVINO disabled | 961 ms | 1047 ms | 1206 ms | 1325 ms | 1434 ms |
The above numbers show that the best configuration is "AMD Ryzen 7 3700X 8-Core + RTX 3060 + TensorRT enabled". In that case the GPU (TensorRT, CUDA) is used for all modules (detection, classification and OCR).
Intel Xeon E3 1230v5 CPU with GTX 1070 GPU (Ubuntu 18)¶
We recommend using a computer with a GPU to unleash ultimateALPR's speed. The following numbers were obtained using a GeForce GTX 1070 GPU and an Intel Xeon E3 1230v5 CPU on Ubuntu 18.
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| GPU using TensorFlow, OpenVINO enabled | 737 ms | 809 ms | 903 ms | 968 ms | 1063 ms |
| GPU using TensorFlow, OpenVINO disabled | 711 ms | 828 ms | 1004 ms | 1127 ms | 1292 ms |
The above numbers show that the best configuration is "Intel Xeon E3 1230v5 + GTX 1070 + OpenVINO enabled". In that case the GPU (TensorFlow) and the CPU (OpenVINO) are used in parallel: the CPU is used for detection and the GPU for recognition/OCR.
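A minimal sketch of this kind of CPU/GPU pipeline parallelism (the functions `detect_on_cpu` and `ocr_on_gpu` are hypothetical stand-ins, not the SDK's API): while the GPU runs OCR on the plates from frame N, the CPU can already be detecting plates in frame N+1.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(frames, detect_on_cpu, ocr_on_gpu):
    """Overlap CPU detection of the next frame with GPU OCR of the current one."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as cpu:
        pending = cpu.submit(detect_on_cpu, frames[0])
        for nxt in frames[1:]:
            plates = pending.result()                  # wait for CPU detection
            pending = cpu.submit(detect_on_cpu, nxt)   # start detecting next frame
            results.append(ocr_on_gpu(plates))         # meanwhile, OCR current plates
        results.append(ocr_on_gpu(pending.result()))   # flush the last frame
    return results
```

This is why the gap between the two rows shrinks as the positive rate grows: with more positives, the OCR stage has real work to hide the detection latency behind.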
Core i7 (Windows)¶
These performance numbers were obtained using version 3.0.0; any later version can be used.
Both i7 CPUs are more than six years old (released in 2014), to make sure everyone can find them easily and at the cheapest price possible.
Please notice the boost when OpenVINO is enabled.
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| i7-4790K (Win7), OpenVINO enabled | 758 ms | 1110 ms | 1597 ms | 1907 ms | 2399 ms |
| i7-4790K (Win7), OpenVINO disabled | 4251 ms | 4598 ms | 4851 ms | 5117 ms | 5553 ms |
| i7-4770HQ (Win10), OpenVINO enabled | 1094 ms | 1674 ms | 2456 ms | 2923 ms | 4255 ms |
| i7-4770HQ (Win10), OpenVINO disabled | 6040 ms | 6342 ms | 7065 ms | 7279 ms | 7965 ms |
NVIDIA Jetson devices¶
We added full GPGPU acceleration for NVIDIA Jetson devices in version 3.1.0. More information at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/Jetson.md.
The following benchmark numbers were obtained using JetPack 4.4.1 on 720p images.
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| Xavier NX | 657 ms | 744 ms | 837 ms | 961 ms | 1068 ms |
| Nano B01 | 2920 ms | 3102 ms | 3274 ms | 3415 ms | 3727 ms |
Note
On NVIDIA Jetson the code is up to 3 times faster when parallel processing is enabled.
Jetson Xavier NX and Jetson TX2 are sold at the same price ($399), but the NX has 4.6 times more FP16 compute power than the TX2: 6 TFLOPS versus 1.3 TFLOPS.
We highly recommend using the Xavier NX instead of the TX2.
Xavier NX (EUR 342): https://www.amazon.com/NVIDIA-Jetson-Xavier-Developer-812674024318/dp/B086874Q5R
TX2 (EUR 343): https://www.amazon.com/NVIDIA-945-82771-0000-000-Jetson-TX2-Development/dp/B06XPFH939
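The 4.6x figure follows directly from the FP16 throughput numbers quoted above:

```python
XAVIER_NX_FP16_TFLOPS = 6.0  # Jetson Xavier NX peak FP16 throughput
TX2_FP16_TFLOPS = 1.3        # Jetson TX2 peak FP16 throughput

ratio = XAVIER_NX_FP16_TFLOPS / TX2_FP16_TFLOPS  # ~4.6x more compute power
```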
The next video shows LPR, LPCI, VCR and VMMR running on an NVIDIA Jetson Nano:
Raspberry Pi 4 and RockPi 4B¶
The GitHub repository contains benchmark applications for the Raspberry Pi (ARM32) and the RockPi 4B (ARM64) to evaluate the performance.
More information on how to build and use the application can be found at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/benchmark/README.md.
Please note that even though the Raspberry Pi 4 has a 64-bit CPU, Raspbian OS uses a 32-bit kernel, which means we lose many SIMD optimizations.
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| Raspberry Pi | 8189 ms | 8977 ms | 11519 ms | 12295 ms | 14146 ms |
| RockPi 4B | 7588 ms | 8008 ms | 8606 ms | 9213 ms | 9798 ms |
Note
On RockPi 4B the code is 5 times faster when parallel processing is enabled.
On Android devices we have noticed that parallel processing can speed up the pipeline by up to 120% on some devices, while on the Raspberry Pi 4 the gain is marginal.
Amlogic NPU (Khadas VIM3, Banana Pi,…)¶
We added support for Amlogic NPU (Neural Processing Unit) acceleration in version 3.9.0. You'll be amazed to see ultimateALPR running at up to 64 fps on HD (720p) images on a $99 ARM device (Khadas VIM3). The engine can run at up to 90 fps on low-resolution images.
Please read AmlogicNPU.md for more information.
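The 64 fps figure is consistent with the fastest row of the table below, assuming the benchmark loop runs 100 iterations for this table (an assumption; the loop count isn't stated here):

```python
BEST_TOTAL_MS = 1560.0  # fastest configuration, 0% positives (from the table below)
LOOPS = 100             # assumed number of iterations for this table

per_frame_ms = BEST_TOTAL_MS / LOOPS  # 15.6 ms per frame
fps = 1000.0 / per_frame_ms           # ~64 fps
```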
| | 0% positives | 20% positives | 50% positives | 70% positives | 100% positives |
|---|---|---|---|---|---|
| Khadas VIM3 Basic (NPU, parallel) | 1560 ms | 1797 ms | 1876 ms | 2162 ms | 2902 ms |
| Khadas VIM3 Basic (NPU, sequential) | 1776 ms | 3443 ms | 6009 ms | 7705 ms | 10275 ms |
| Khadas VIM3 Basic (CPU, parallel) | 4187 ms | 4414 ms | 4824 ms | 5189 ms | 5740 ms |
| Khadas VIM3 Basic (CPU, sequential) | 4184 ms | 5972 ms | 8513 ms | 10258 ms | 12867 ms |
Note
When parallel processing is enabled, detection runs on the NPU and OCR runs on the CPU in parallel.
Notice how the parallel mode is nearly 4 times faster than the sequential mode when rate=1.0 (100% positives).
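The parallel-vs-sequential speedup can be computed from the 100% positives column, assuming the first two table rows are the parallel and sequential NPU modes respectively (an assumption about the row ordering):

```python
PARALLEL_MS = 2902.0     # NPU parallel mode, 100% positives
SEQUENTIAL_MS = 10275.0  # NPU sequential mode, 100% positives

speedup = SEQUENTIAL_MS / PARALLEL_MS  # ~3.5x faster in parallel mode
```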