Architecture overview

Supported operating systems

We support any OS with a C++11 compiler. The code has been tested on Android, iOS, Windows, Linux, Raspberry Pi and many custom embedded devices (e.g. cameras).

The GitHub repository contains binaries for Android, Raspberry Pi, Linux and Windows as reference code to allow developers to test the implementation. This reference implementation comes with both Java and C++ APIs. The API is common to all operating systems, which means you can develop and test your application on Android, Raspberry Pi, Linux or Windows. When you’re ready to move forward, we’ll provide the binaries for your OS.

Supported CPUs

We officially support any ARM32 (AArch32), ARM64 (AArch64), X86 and X86_64 architecture. The SDK has been tested on all these CPUs.

MIPS32/64 may work but hasn’t been tested and would be very slow, as there is no SIMD acceleration written for these architectures.

Almost all computer vision functions are written in assembler and accelerated with SIMD code (NEON, SSE and AVX). Some computer vision functions have been open sourced and shared in the CompV project, available at https://github.com/DoubangoTelecom/CompV.
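The pattern described above (a SIMD fast path with a scalar fallback for CPUs that have no accelerated version) can be sketched as follows. This is only an illustration and is not taken from the SDK or from CompV: the function name `add_saturate_u8` and the `SKETCH_HAS_*` macros are hypothetical, and the sketch uses compiler intrinsics rather than hand-written assembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#  include <emmintrin.h>   // SSE2 intrinsics
#  define SKETCH_HAS_SSE2 1
#elif defined(__ARM_NEON)
#  include <arm_neon.h>    // NEON intrinsics
#  define SKETCH_HAS_NEON 1
#endif

// Hypothetical helper: dst[i] = min(255, a[i] + b[i]) for n 8-bit samples.
inline void add_saturate_u8(const std::uint8_t* a, const std::uint8_t* b,
                            std::uint8_t* dst, std::size_t n) {
    assert(a && b && dst);
    std::size_t i = 0;
#if defined(SKETCH_HAS_SSE2)
    // x86 path: 16 samples per iteration, unaligned loads and stores.
    for (; i + 16 <= n; i += 16) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_adds_epu8(va, vb));
    }
#elif defined(SKETCH_HAS_NEON)
    // ARM path: 16 samples per iteration.
    for (; i + 16 <= n; i += 16) {
        vst1q_u8(dst + i, vqaddq_u8(vld1q_u8(a + i), vld1q_u8(b + i)));
    }
#endif
    // Scalar tail, and the only path on CPUs without a SIMD branch above.
    for (; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        dst[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

On an architecture with no SIMD branch (MIPS, for example) only the scalar loop runs, which is why untested architectures would still produce correct results, only much more slowly.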

Supported GPUs

We support any OpenCL 1.2+ compatible GPU for the computer vision parts. For the deep learning modules:

We support any NVIDIA GPU, thanks to TensorRT.
We support any Intel GPU, thanks to OpenVINO.
The mobile (ARM) implementation works anywhere thanks to the multiple backends: OpenCL, OpenGL shaders, Metal and NNAPI.

Please note that the mobile (ARM) implementation doesn’t require a GPU at all. Most of the time the code will run faster on the CPU than on the GPU thanks to the fixed-point math implementation and quantized inference. The GPU implementations provide more accuracy as they rely on 32-bit floating-point math. We’re working to provide 16-bit floating-point models in the coming months.
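To make the accuracy trade-off concrete, here is a minimal sketch of generic 8-bit affine quantization with an integer-only dot product. It is an assumption-laden illustration of the general technique, not the SDK's actual quantization scheme: `QuantParams`, `quantize`, `dequantize` and `quantized_dot` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical quantization parameters: real_value = scale * (q - zero_point).
struct QuantParams {
    float scale;
    std::int32_t zero_point;
};

// Map a float to an 8-bit code, clamping values outside the representable range.
inline std::uint8_t quantize(float x, const QuantParams& p) {
    assert(p.scale > 0.0f);
    const long q = std::lround(x / p.scale) + p.zero_point;
    return static_cast<std::uint8_t>(q < 0 ? 0 : (q > 255 ? 255 : q));
}

// Recover the (approximate) real value from an 8-bit code.
inline float dequantize(std::uint8_t q, const QuantParams& p) {
    return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - p.zero_point);
}

// Integer-only inner loop: accumulate in 32 bits, rescale once at the end.
inline float quantized_dot(const std::uint8_t* a, const QuantParams& pa,
                           const std::uint8_t* b, const QuantParams& pb,
                           std::size_t n) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += (static_cast<std::int32_t>(a[i]) - pa.zero_point) *
               (static_cast<std::int32_t>(b[i]) - pb.zero_point);
    }
    return pa.scale * pb.scale * static_cast<float>(acc);
}
```

The inner loop touches only integers, which is what makes this style of inference fast on CPUs. The rounding and clamping in `quantize` is where precision is lost compared with 32-bit floating-point math.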

Note

Full GPGPU acceleration was added in version 3.1.0 for NVIDIA Jetson devices using TensorRT and TF-TRT.
Check https://github.com/DoubangoTelecom/ultimateALPR-SDK/blob/master/Jetson.md.

Supported VPUs

Thanks to OpenVINO, we support the Intel Movidius Vision Processing Unit (VPU).

Supported FPGAs

Thanks to OpenVINO, we support Intel FPGAs.

Supported programming languages

The code was developed using C++11 and assembler but the API (Application Programming Interface) has many bindings thanks to SWIG.

Bindings: ANSI-C, C++, C#, Java, ObjC, Swift, Perl, Ruby and Python.

Supported raw formats

We support the following image/video formats: RGBA32, BGRA32, RGB24, BGR24, NV12, NV21, Y (grayscale), YUV420P, YVU420P, YUV422P and YUV444P. NV12 and NV21 are semi-planar formats, also known as YUV420SP.
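The difference between the planar (YUV420P, YVU420P) and semi-planar (NV12, NV21) 4:2:0 layouts can be sketched by computing where the chroma samples of a given pixel live in the buffer. The helper names below are hypothetical and are not part of the SDK API. The sketch also assumes 8-bit samples, even dimensions and tightly packed rows (no stride padding).

```cpp
#include <cassert>
#include <cstddef>

enum class ChromaOrder { UV, VU };

// Byte offsets of the U and V samples that belong to one pixel.
struct ChromaOffsets {
    std::size_t u;
    std::size_t v;
};

// Total frame size of any 8-bit 4:2:0 format: one full-resolution luma plane
// plus two chroma planes at a quarter of the resolution each.
inline std::size_t yuv420_frame_size(std::size_t w, std::size_t h) {
    assert(w % 2 == 0 && h % 2 == 0);
    return w * h + 2 * (w / 2) * (h / 2);
}

// Fully planar: [Y][U][V] for YUV420P (UV), [Y][V][U] for YVU420P (VU).
inline ChromaOffsets planar_420_offsets(std::size_t w, std::size_t h,
                                        std::size_t x, std::size_t y,
                                        ChromaOrder order) {
    assert(w % 2 == 0 && h % 2 == 0 && x < w && y < h);
    const std::size_t y_size = w * h;
    const std::size_t c_size = (w / 2) * (h / 2);
    const std::size_t idx = (y / 2) * (w / 2) + x / 2;
    const std::size_t first = y_size + idx;
    const std::size_t second = y_size + c_size + idx;
    return order == ChromaOrder::UV ? ChromaOffsets{first, second}
                                    : ChromaOffsets{second, first};
}

// Semi-planar: [Y][UVUV...] for NV12 (UV), [Y][VUVU...] for NV21 (VU).
inline ChromaOffsets semi_planar_420_offsets(std::size_t w, std::size_t h,
                                             std::size_t x, std::size_t y,
                                             ChromaOrder order) {
    assert(w % 2 == 0 && h % 2 == 0 && x < w && y < h);
    const std::size_t first = w * h + (y / 2) * w + (x / 2) * 2;
    const std::size_t second = first + 1;
    return order == ChromaOrder::UV ? ChromaOffsets{first, second}
                                    : ChromaOffsets{second, first};
}
```

All four formats occupy the same number of bytes. Only the position and interleaving of the chroma samples differ, which is why passing the wrong format typically yields an image with correct shapes but wrong colors.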

Optimizations

Hand-written assembler
SIMD (SSE, AVX, NEON) using intrinsics or assembler
GPGPU (CUDA, TensorRT, TF-TRT, OpenVINO, OpenCL, OpenGL, NNAPI and Metal)
Smart multithreading (minimal context switching, no false sharing, no boundary crossing…)
Smart memory access (data alignment, cache pre-load, cache blocking, non-temporal load/store for minimal cache pollution, smart reference counting…)
Fixed-point math
Quantized inference
… and many more
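As an illustration of one item from the list above, cache blocking, here is a minimal sketch of a tiled transpose of a row-major 8-bit image. It is a generic textbook example rather than code from the SDK. The function name `transpose_blocked` and the default tile size of 32 are assumptions chosen for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Transpose a rows x cols row-major image tile by tile, so that both the
// reads from src and the writes to dst stay inside a small working set
// instead of striding across the whole destination on every row.
inline std::vector<std::uint8_t> transpose_blocked(const std::vector<std::uint8_t>& src,
                                                   std::size_t rows, std::size_t cols,
                                                   std::size_t tile = 32) {
    assert(src.size() == rows * cols && tile > 0);
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t r0 = 0; r0 < rows; r0 += tile) {
        const std::size_t r1 = std::min(rows, r0 + tile);
        for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
            const std::size_t c1 = std::min(cols, c0 + tile);
            // Process one tile; edge tiles are clipped by r1 and c1.
            for (std::size_t r = r0; r < r1; ++r) {
                for (std::size_t c = c0; c < c1; ++c) {
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        }
    }
    return dst;
}
```

The result is identical to a naive transpose; only the traversal order changes. The best tile size depends on the cache sizes of the target CPU and would need to be measured.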

Many functions have been open sourced and included in the CompV project: https://github.com/DoubangoTelecom/CompV. More functions from the deep learning parts will be open sourced in the coming months. You can contact us to get some of the closed-source code we’re planning to open.

Thread safety

All the functions in the SDK are thread-safe, which means you can invoke them concurrently from multiple threads. However, you should not do so, for several reasons:

The SDK is already massively multithreaded in an efficient way (see the threading model section).
You’ll end up saturating the CPU and making everything run slower. The threading model makes sure the SDK will never use more threads than the number of virtual CPU cores. Calling the engine from different threads will break this rule as we cannot control the threads created outside the SDK.
Unless you have access to the private API, the engine uses a single context, which means concurrent calls are locked when they try to write to a shared resource.
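The last point can be sketched with a toy model of an engine that owns a single shared context guarded by a lock. Everything here is hypothetical: `SingleContextEngine`, `process` and `call_from_threads` are illustrative names, not part of the SDK, and the real locking strategy may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an engine with ONE shared context.
class SingleContextEngine {
public:
    void process() {
        std::lock_guard<std::mutex> lock(context_mutex_);  // concurrent callers queue here
        ++in_flight_;
        max_in_flight_ = std::max(max_in_flight_, in_flight_);
        ++calls_;  // stands in for the write to the shared context
        --in_flight_;
    }
    int calls() const { return calls_; }
    int max_in_flight() const { return max_in_flight_; }

private:
    std::mutex context_mutex_;
    int in_flight_ = 0;
    int max_in_flight_ = 0;
    int calls_ = 0;
};

// Hammer the engine from several application threads, then wait for them.
inline void call_from_threads(SingleContextEngine& engine,
                              unsigned num_threads, int calls_per_thread) {
    assert(num_threads > 0 && calls_per_thread >= 0);
    std::vector<std::thread> callers;
    for (unsigned t = 0; t < num_threads; ++t) {
        callers.emplace_back([&engine, calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) engine.process();
        });
    }
    for (auto& c : callers) c.join();
}
```

In this model the calls are all handled correctly, but never more than one at a time: the extra caller threads only add contention and context switches on top of whatever threads the engine already runs internally. Calling from a single application thread gives the same throughput without that overhead.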