Benchmark

Anyone can claim to have the fastest implementation. We back our claim with numbers and with source code that is freely available for everyone to check.

More information about the benchmark application can be found here, and you can check out the source code from GitHub.

UltimateALPR versus OpenALPR on Android

We found three OpenALPR (for Android) repositories on GitHub:

  1. https://github.com/SandroMachado/openalpr-android [708 stars]

  2. https://github.com/RobertSasak/react-native-openalpr [338 stars]

  3. https://github.com/sujaybhowmick/OpenAlprDroidApp [102 stars]

We decided to go with the one with the most stars on GitHub, which is [1]. We’re calling recognizeWithCountryRegionNConfig(country="us", region="", topN=10).

Rules:
  • We’re using a Samsung Galaxy S10+ (Snapdragon 855).

  • For every implementation we run the recognition function in a loop 1000 times.

  • The positive rate defines the percentage of images with a plate. For example, 20% positives means we will have 800 negative images (no plate) and 200 positive images (with a plate) out of the 1000 total images. This percentage is important because it allows timing both the detector and the recognizer.

  • All positive images contain a single plate.

  • Both implementations are initialized outside the loop.
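
The rules above can be sketched as a small timing harness. This is only an illustration under stated assumptions, not the code of the actual benchmark application: `recognize`, `positive_image` and `negative_image` are hypothetical placeholders for an already-initialized engine and its inputs, and shuffling the order of positive and negative images is our own choice.

```python
import random
import time


def build_schedule(total, positive_rate, seed=0):
    """Return `total` booleans; True marks an image that contains a plate."""
    positives = round(total * positive_rate)
    schedule = [True] * positives + [False] * (total - positives)
    # Assumption: the order is shuffled; the real application may interleave differently.
    random.Random(seed).shuffle(schedule)
    return schedule


def fps_from_elapsed(total, elapsed_ms):
    """Frames per second for `total` images processed in `elapsed_ms` milliseconds."""
    return total * 1000.0 / elapsed_ms


def run_benchmark(recognize, positive_image, negative_image, total, positive_rate):
    """Time `total` calls to `recognize`; the engine is initialized outside the loop."""
    schedule = build_schedule(total, positive_rate)
    start = time.perf_counter()
    for has_plate in schedule:
        recognize(positive_image if has_plate else negative_image)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, fps_from_elapsed(total, elapsed_ms)
```

With `total=1000`, an elapsed time of 21344 ms corresponds to roughly 46.85 fps, which is how the millisecond and fps figures in the tables relate to each other.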

Implementation  0% positives           20% positives          50% positives          70% positives          100% positives
ultimateALPR    21344 ms (46.85 fps)   25815 ms (38.73 fps)   29712 ms (33.65 fps)   33352 ms (29.98 fps)   37825 ms (26.43 fps)
OpenALPR        715800 ms (1.39 fps)   758300 ms (1.31 fps)   819500 ms (1.22 fps)   849100 ms (1.17 fps)   899900 ms (1.11 fps)

An important observation from the table above is that the OpenALPR detector is very slow: about 80% of the time is spent trying to detect license plates. This can be problematic because a camera filming a street or road shows no plate most of the time (negative images), and in such situations an application must run as fast as possible (above the camera’s maximum frame rate) to avoid dropping frames and losing positive frames. The detection stage should also burn as few CPU cycles as possible, which makes it more energy efficient.

The above table shows that ultimateALPR is up to 33 times faster than OpenALPR.
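
The figures quoted above can be rederived from the table. The sketch below assumes a simple linear cost model (every image pays for detection, only positive images pay for recognition), which is our simplification and not a statement about either engine’s internals. The function names are illustrative.

```python
TOTAL_IMAGES = 1000  # loop count used for the Android benchmark

# Elapsed times in ms taken from the table: (0% positives, 100% positives)
ELAPSED_MS = {
    "ultimateALPR": (21344, 37825),
    "OpenALPR": (715800, 899900),
}


def per_image_costs(ms_at_0, ms_at_100, total=TOTAL_IMAGES):
    """Estimate (detection_ms, recognition_ms) per image under the linear model."""
    detect_ms = ms_at_0 / total
    recognize_ms = (ms_at_100 - ms_at_0) / total
    return detect_ms, recognize_ms


def detector_share(ms_at_0, ms_at_100):
    """Fraction of the 100%-positives run that is attributable to detection."""
    return ms_at_0 / ms_at_100


def speedup(slow_ms, fast_ms):
    """How many times faster the second measurement is than the first."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    for name, (t0, t100) in ELAPSED_MS.items():
        detect_ms, recognize_ms = per_image_costs(t0, t100)
        share = detector_share(t0, t100)
        print(f"{name}: detect ~{detect_ms:.1f} ms, "
              f"recognize ~{recognize_ms:.1f} ms, detector share {share:.0%}")
    print(f"speedup at 0% positives: {speedup(715800, 21344):.1f}x")
    print(f"speedup at 100% positives: {speedup(899900, 37825):.1f}x")
```

Under this model the OpenALPR detector accounts for about 80% of its 100%-positives run, and the speedup ranges from about 24 times (100% positives) to about 33 times (0% positives).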

To be fair to OpenALPR:

  1. The API only allows providing a file path, which means that on every iteration it reads and decodes the input file, while ultimateALPR accepts raw bytes.

  2. There are no ARM64 binaries provided, so the app loads the ARMv7 versions.

Again, our benchmark application is open source and doesn’t require registration or a license key to try. You can run the same test on your own device, and please don’t hesitate to share your numbers or any feedback if you think we missed something.

AMD Ryzen 7 3700X 8-Core CPU with RTX 3060 GPU (Ubuntu 20)

We recommend using a computer with a GPU to unleash ultimateALPR speed. The following numbers were obtained using an NVIDIA RTX 3060 GPU and an AMD Ryzen 7 3700X 8-Core CPU on Ubuntu 20.

Configuration                            0% positives           20% positives          50% positives          70% positives          100% positives
GPU using TensorRT, OpenVINO Disabled    201 ms (497.40 fps)    238 ms (419.71 fps)    291 ms (343.12 fps)    333 ms (299.41 fps)    379 ms (263.36 fps)
GPU using TensorFlow, OpenVINO Enabled   615 ms (162.54 fps)    679 ms (147.13 fps)    740 ms (135.01 fps)    773 ms (129.21 fps)    809 ms (123.58 fps)
GPU using TensorFlow, OpenVINO Disabled  961 ms (103.97 fps)    1047 ms (95.46 fps)    1206 ms (82.90 fps)    1325 ms (75.45 fps)    1434.16 ms (69.72 fps)

The above numbers show that the best case is “AMD Ryzen 7 3700X 8-Core + RTX 3060 + TensorRT enabled”. In that case the GPU (TensorRT, CUDA) is used for all modules (detection, classification and OCR).

Intel Xeon E3 1230v5 CPU with GTX 1070 GPU (Ubuntu 18)

We recommend using a computer with a GPU to unleash ultimateALPR speed. The following numbers were obtained using a GeForce GTX 1070 GPU and an Intel Xeon E3 1230v5 CPU on Ubuntu 18.

Configuration                            0% positives           20% positives          50% positives          70% positives          100% positives
GPU using TensorFlow, OpenVINO Enabled   737 ms (135.62 fps)    809 ms (123.55 fps)    903 ms (110.72 fps)    968 ms (103.22 fps)    1063 ms (94.07 fps)
GPU using TensorFlow, OpenVINO Disabled  711 ms (140.51 fps)    828 ms (120.766 fps)   1004 ms (99.53 fps)    1127 ms (88.70 fps)    1292 ms (77.38 fps)

The above numbers show that the best case is “Intel Xeon E3 1230v5 + GTX 1070 + OpenVINO enabled” at every positive rate except 0%, where the two configurations are within 4% of each other (737 ms versus 711 ms). In that case the GPU (TensorFlow) and the CPU (OpenVINO) are used in parallel: the CPU is used for detection and the GPU for recognition/OCR.

Core i7 (Windows)

These performance numbers are obtained using version 3.0.0. You can use any later version.

Both i7 CPUs are more than six years old (2014). We chose them so that everyone can easily find them at the cheapest price possible.

Please notice the boost when OpenVINO is enabled.

Configuration                         0% positives           20% positives          50% positives          70% positives          100% positives
i7-4790K (Win7, OpenVINO Enabled)     758 ms (131.78 fps)    1110 ms (90.07 fps)    1597 ms (62.58 fps)    1907 ms (52.42 fps)    2399 ms (41.66 fps)
i7-4790K (Win7, OpenVINO Disabled)    4251 ms (23.52 fps)    4598 ms (21.74 fps)    4851 ms (20.61 fps)    5117 ms (19.54 fps)    5553 ms (18.00 fps)
i7-4770HQ (Win10, OpenVINO Enabled)   1094 ms (91.35 fps)    1674 ms (59.71 fps)    2456 ms (40.71 fps)    2923 ms (34.21 fps)    4255 ms (23.49 fps)
i7-4770HQ (Win10, OpenVINO Disabled)  6040 ms (16.55 fps)    6342 ms (15.76 fps)    7065 ms (14.15 fps)    7279 ms (13.73 fps)    7965 ms (12.55 fps)

NVIDIA Jetson devices

We added full GPGPU acceleration for NVIDIA Jetson devices in version 3.1.0. More information at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/Jetson.md.

The following benchmark numbers were obtained using JetPack 4.4.1 on 720p images.

Device                0% positives           20% positives          50% positives          70% positives          100% positives
Xavier NX (TensorRT)  657 ms (152 fps)       744 ms (134 fps)       837 ms (119 fps)       961 ms (104 fps)       1068 ms (93 fps)
Nano B01 (TensorRT)   2920 ms (34 fps)       3102 ms (32 fps)       3274 ms (30 fps)       3415 ms (29 fps)       3727 ms (27 fps)

Note

  • On NVIDIA Jetson the code is up to 3 times faster when parallel processing is enabled.

  • Jetson Xavier NX and Jetson TX2 are offered at the same price ($399), but the NX has about 4.6 times the FP16 compute power of the TX2: 6 TFLOPS versus 1.3 TFLOPS.

  • We highly recommend using Xavier NX instead of TX2.

The following video shows LPR, LPCI, VCR and VMMR running on an NVIDIA Jetson Nano:



Raspberry Pi 4 and RockPi 4B

The GitHub repository contains Raspberry Pi (ARM32) and RockPi 4B (ARM64) benchmark applications to evaluate the performance.

More information on how to build and use the application can be found at https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/samples/c++/benchmark/README.md.

Please note that even though the Raspberry Pi 4 has a 64-bit CPU, Raspbian OS uses a 32-bit kernel, which means we’re losing many SIMD optimizations.

Device                               0% positives           20% positives          50% positives          70% positives          100% positives
Raspberry Pi (Debian Buster, ARM32)  8189 ms (12.21 fps)    8977 ms (11.13 fps)    11519 ms (8.68 fps)    12295 ms (8.13 fps)    14146 ms (7.06 fps)
RockPi 4B (Ubuntu Server 18, ARM64)  7588 ms (13.17 fps)    8008 ms (12.48 fps)    8606 ms (11.61 fps)    9213 ms (10.85 fps)    9798 ms (10.20 fps)

Note

  • On RockPi 4B the code is 5 times faster when parallel processing is enabled.

  • On Android devices we have noticed that parallel processing can speed up the pipeline by up to 120% on some devices, while on Raspberry Pi 4 the gain is marginal.

Amlogic NPU (Khadas VIM3, Banana Pi,…)

We added support for Amlogic NPU (Neural Processing Unit) acceleration in version 3.9.0. You’ll be amazed to see UltimateALPR running at up to 64 fps on High Definition (HD/720p) images on a $99 ARM device (Khadas VIM3). The engine can run at up to 90 fps on low-resolution images.

Please read AmlogicNPU.md for more information.

Configuration                                   0% positives           20% positives          50% positives          70% positives          100% positives
Khadas VIM3 Basic (Linux 4.9, NPU, Parallel)    1560 ms (64.08 fps)    1797 ms (55.63 fps)    1876 ms (53.29 fps)    2162 ms (46.25 fps)    2902 ms (34.45 fps)
Khadas VIM3 Basic (Linux 4.9, NPU, Sequential)  1776 ms (56.30 fps)    3443 ms (29.04 fps)    6009 ms (16.63 fps)    7705 ms (12.97 fps)    10275 ms (9.73 fps)
Khadas VIM3 Basic (Linux 4.9, CPU, Parallel)    4187 ms (23.88 fps)    4414 ms (22.65 fps)    4824 ms (20.72 fps)    5189 ms (19.26 fps)    5740 ms (17.42 fps)
Khadas VIM3 Basic (Linux 4.9, CPU, Sequential)  4184 ms (23.89 fps)    5972 ms (16.74 fps)    8513 ms (11.74 fps)    10258 ms (9.74 fps)    12867 ms (7.77 fps)

Note

  • When parallel processing is enabled, detection runs on the NPU and OCR runs on the CPU in parallel.

  • Notice how the parallel mode is about 3.5 times faster than the sequential mode when rate=1.0 (all 100 images have plates): 2902 ms versus 10275 ms with the NPU.
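
The general idea of overlapping detection and OCR can be illustrated with a two-stage producer/consumer pipeline. This is a minimal sketch and not the SDK’s implementation: `detect` and `recognize` are hypothetical callables standing in for the two stages, and the queue size is an arbitrary choice.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream for the OCR worker


def run_pipeline(frames, detect, recognize):
    """Overlap two stages: detection on the calling thread, OCR on a worker.

    `detect(frame)` is assumed to return a list of plate regions (empty when
    the frame has no plate); `recognize(region)` is assumed to return text.
    """
    pending = queue.Queue(maxsize=8)  # bounded so detection cannot run far ahead
    results = []

    def ocr_worker():
        while True:
            item = pending.get()
            if item is _SENTINEL:
                break
            index, region = item
            results.append((index, recognize(region)))

    worker = threading.Thread(target=ocr_worker)
    worker.start()
    for index, frame in enumerate(frames):
        # Negative frames produce no regions, so they never reach the OCR stage.
        for region in detect(frame):
            pending.put((index, region))
    pending.put(_SENTINEL)
    worker.join()
    return results
```

In CPython the two stages only overlap in practice when they spend their time in native code or on separate hardware (for example an NPU and a CPU), because pure Python code is serialized by the interpreter lock. The sketch is about the structure of the hand-off, not about measured speed.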