Benchmark¶

It’s easy to assert that our implementation is the fastest you can find without backing our claim with numbers and source code freely available to everyone to check.

More information about the benchmark application could be found here and you can checkout the source code from Github.

UltimateALPR versus OpenALPR on Android¶

We’ve found #3 OpenALPR (for Android) repositories on Github:

https://github.com/SandroMachado/openalpr-android [708 stars]
https://github.com/RobertSasak/react-native-openalpr [338 stars]
https://github.com/sujaybhowmick/OpenAlprDroidApp [102 stars]

We’ve decided to go with the one with most stars on Github which is [1]. We’re using recognizeWithCountryRegionNConfig(country=”us”, region=””, topN = 10).

Rules:

We’re using Samsung Galaxy S10+ (Snapdragon 855)
For every implementation we’re running the recognition function within a loop for #1000 times.
The positive rate defines the percentage of images with a plate. For example, 20% positives means we will have #800 negative images (no plate) and #200 positives (with a plate) out of the #1000 total images. This percentage is important as it allows timing both the detector and recognizer.
All positive images contain a single plate.
Both implementations are initialized outside the loop.

	0% positives	20% positives	50% positives	70% positives	100% positives
ultimateALPR	21344 ms 46.85 fps	25815 ms 38.73 fps	29712 ms 33.65 fps	33352 ms 29.98 fps	37825 ms 26.43 fps
OpenALPR	715800 ms 1.39 fps	758300 ms 1.31 fps	819500 ms 1.22 fps	849100 ms 1.17 fps	899900 ms 1.11 fps

One important note from the above table is that the detector in OpenALPR is very slow and 80% of the time is spent trying to detect the license plates. This could be problematic as most of the time there is no plate on the video stream (negative images) from a camera filming a street/road and in such situations an application must run as fast as possible (above the camera maximum frame rate) to avoid dropping frames and loosing positive frames. Also, the detection part should burn as less as possible CPU cycles which means more energy efficient.

The above table shows that ultimateALPR is up to 33 times faster than OpenALPR.

To be fair to OpenALPR:

The API only allows providing a file path which means for every loop they are reading and decoding the input while ultimateALPR accepts raw bytes.
There is no ARM64 binaries provided and the app is loading the ARMv7 versions.

Again, our benchmark application is open source and doesn’t require registration or license key to try. You can try to make the same test on your own device and please don’t hesitate to share your numbers or any feedback if you think we missed something.

AMD Ryzen 7 3700X 8-Core CPU with RTX 3060 GPU (Untuntu 20)¶

We recommend using a computer with a GPU to unleash ultimateALPR speed. Next numbers are obtained using NVIDIA RTX 3060 GPU and AMD Ryzen 7 3700X 8-Core CPU on Ubuntu 20.

	0% positives	20% positives	50% positives	70% positives	100% positives
GPU using TensorRT OpenVINO Disabled	201 ms 497.40 fps	238 ms 419.71 fps	291 ms 343.12 fps	333 ms 299.41 fps	379 ms 263.36 fps
GPU using TensorFlow OpenVINO Enabled	615 ms 162.54 fps	679 ms 147.13 fps	740 ms 135.01 fps	773 ms 129.21 fps	809 ms 123.58 fps
GPU using TensorFlow OpenVINO Disabled	961 ms 103.97 fps	1047 ms 95.46 fps	1206 ms 82.90 fps	1325 ms 75.45 fps	1434.16 ms 69.72 fps

The above numbers show that the best case is “AMD Ryzen 7 3700X 8-Core + RTX 3060 + TensorRT enabled”. In such case the GPU (TensorRT, CUDA) is used for all modules (detection, classification and OCR).

Intel Xeon E3 1230v5 CPU with GTX 1070 GPU (Untuntu 18)¶

We recommend using a computer with a GPU to unleash ultimateALPR speed. Next numbers are obtained using GeForce GTX 1070 GPU and Intel Xeon E3 1230v5 CPU on Ubuntu 18.

	0% positives	20% positives	50% positives	70% positives	100% positives
GPU using TensorFlow OpenVINO Enabled	737 ms 135.62 fps	809 ms 123.55 fps	903 ms 110.72 fps	968 ms 103.22 fps	1063 ms 94.07 fps
GPU using TensorFlow OpenVINO Disabled	711 ms 140.51 fps	828 ms 120.766 fps	1004 ms 99.53 fps	1127 ms 88.70 fps	1292 ms 77.38 fps

The above numbers show that the best case is “Intel Xeon E3 1230v5 + GTX 1070 + OpenVINO enabled”. In such case the GPU (Tensorflow) and the CPU (OpenVINO) are used in parallel. The CPU is used for detection and the GPU for recognition/OCR.

Core i7 (Windows)¶

These performance numbers are obtained using version 3.0.0. You can use any later version.

Both i7 CPUs are 6yr+ old (2014) to make sure everyone can easily find them at the cheapest price possible.

Please notice the boost when OpenVINO is enabled.

	0% positives	20% positives	50% positives	70% positives	100% positives
i7-4790K (Win7) (OpenVINO Enabled)	758 ms 131.78 fps	1110 ms 90.07 fps	1597 ms 62.58 fps	1907 ms 52.42 fps	2399 ms 41.66 fps
i7-4790K (Win7) (OpenVINO Disabled)	4251 ms 23.52 fps	4598 ms 21.74 fps	4851 ms 20.61 fps	5117 ms 19.54 fps	5553 ms 18.00 fps
i7-4770HQ (Win10) (OpenVINO Enabled)	1094 ms 91.35 fps	1674 ms 59.71 fps	2456 ms 40.71 fps	2923 ms 34.21 fps	4255 ms 23.49 fps
i7-4770HQ (Win10) (OpenVINO Disabled)	6040 ms 16.55 fps	6342 ms 15.76 fps	7065 ms 14.15 fps	7279 ms 13.73 fps	7965 ms 12.55 fps

NVIDIA Jetson devices¶

We added full GPGPU acceleration for NVIDIA Jetson devices in version 3.1.0. More information at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/Jetson.md.

Next benchmark numbers are obtained using JetPack 4.4.1 on 720p images.

	0% positives	20% positives	50% positives	70% positives	100% positives
Xavier NX (TensorRT)	657 ms 152 fps	744 ms 134 fps	837 ms 119 fps	961 ms 104 fps	1068 ms 93 fps
Nano B01 (TensorRT)	2920 ms 34 fps	3102 ms 32 fps	3274 ms 30 fps	3415 ms 20 fps	3727 ms 27 fps

Note

On NVIDIA Jetson the code is up to 3 times faster when parallel processing is enabled.
Jetson Xavier NX and Jetson TX2 are proposed at the same price ($399) but NX has 4.6 times more compute power than TX2 for FP16: 6 TFLOPS versus 1.3 TFLOPS.
We highly recommend using Xavier NX instead of TX2.

Xavier NX (EURO 342): https://www.amazon.com/NVIDIA-Jetson-Xavier-Developer-812674024318/dp/B086874Q5R
TX2: (EURO 343): https://www.amazon.com/NVIDIA-945-82771-0000-000-Jetson-TX2-Development/dp/B06XPFH939

The next video shows LPR, LPCI, VCR and VMMR running on NVIDIA Jetson nano:

Raspberry Pi 4 and RockPi 4B¶

The Github repository contains Raspberry Pi (ARM32) and RockPi 4B (ARM64) benchmark applications to evaluate the performance.

More information on how to build and use the application could be found at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/benchmark/README.md.

Please note that even if Raspberry Pi 4 has a 64-bit CPU Raspbian OS uses a 32-bit kernel which means we’re loosing many SIMD optimizations.

	0% positives	20% positives	50% positives	70% positives	100% positives
Raspberry Pi (Debian Buster, ARM32)	8189 ms 12.21 fps	8977 ms 11.13 fps	11519 ms 8.68 fps	12295 ms 8.13 fps	14146 ms 7.06 fps
RockPi 4B (Ubuntu Server 18, ARM64)	7588 ms 13.17 fps	8008 ms 12.48 fps	8606 ms 11.61 fps	9213 ms 10.85 fps	9798 ms 10.20 fps

Note

On RockPi 4B the code is 5 times faster when parallel processing is enabled.
On Android devices we have noticed that parallel processing can speedup the pipeline by up to 120% on some devices while on Raspberry Pi 4 the gain is marginal.

Amlogic NPU (Khadas VIM3, Banana Pi,…)¶

We have added support for Amlogic NPUs (Neural Processing Unit) acceleration in version v3.9.0. You’ll be amazed to see UltimateALPR running at up to 64fps (High Definition[HD/720p] resolution) on a $99 ARM device (Khadas VIM3). The engine can run at up to 90fps on low resolution images.

Please read AmlogicNPU.md for more information.

	0% positives	20% positives	50% positives	70% positives	100% positives
Khadas VIM3 Basic (Linux 4.9, NPU, Parallel)	1560 ms 64.08 fps	1797 ms 55.63 fps	1876 ms 53.29 fps	2162 ms 46.25 fps	2902 ms 34.45 fps
Khadas VIM3 Basic (Linux 4.9, NPU, Sequential)	1776 ms 56.30 fps	3443 ms 29.04 fps	6009 ms 16.63 fps	7705 ms 12.97 fps	10275 ms 9.73 fps
Khadas VIM3 Basic (Linux 4.9, CPU, Parallel)	4187 ms 23.88 fps	4414 ms 22.65 fps	4824 ms 20.72 fps	5189 ms 19.26 fps	5740 ms 17.42 fps
Khadas VIM3 Basic (Linux 4.9, CPU, Sequential)	4184 ms 23.89 fps	5972 ms 16.74 fps	8513 ms 11.74 fps	10258 ms 9.74 fps	12867 ms 7.77 fps

Note

When parallel processing is enabled we’ll perform detection using the NPU and OCR using the CPU in parallel.
Notice how the parallel mode is 4 times faster than the sequential mode when rate=1.0 (all 100 images have plates).