Architecture overview¶
Supported operating systems¶
We support any OS with a C++11 compiler. The public version only supports x86-64 CPUs (Linux and Windows) but the code could also run on Android, iOS, NVIDIA Jetson, Raspberry Pi… We’ll release the mobile versions in the months following the initial release. The GitHub repository contains binaries for Windows and Linux (x86-64) as reference code to allow developers to test the implementation. These reference implementations come with Java, C#, Python and C++ APIs.
Supported CPUs¶
We officially support the ARM32 (AArch32), ARM64 (AArch64), x86 and x86-64 architectures. The SDK has been tested on all of these CPUs.
MIPS32/64 may work but hasn’t been tested and would be horribly slow, as there is no SIMD acceleration written for these architectures.
Almost all computer vision functions are written in assembler and accelerated with SIMD code (NEON, SSE and AVX). Some computer vision functions have been open sourced and shared in the CompV project, available at https://github.com/DoubangoTelecom/CompV.
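To give an idea of what this acceleration looks like, here is a minimal sketch (our illustration, not code from the SDK) of an SSE2-accelerated saturating addition of two 8-bit grayscale buffers; the NEON and AVX kernels follow the same pattern:

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Saturating addition of two 8-bit buffers, 16 pixels per iteration.
// Illustrative only: the SDK's real kernels are hand-written assembler.
void add_saturate_u8(const uint8_t* a, const uint8_t* b, uint8_t* out, size_t count) {
    size_t i = 0;
    for (; i + 16 <= count; i += 16) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_adds_epu8(va, vb));
    }
    for (; i < count; ++i) { // scalar tail for the remaining pixels
        const int sum = a[i] + b[i];
        out[i] = static_cast<uint8_t>(sum > 255 ? 255 : sum);
    }
}
```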
Supported GPUs¶
We support any OpenCL 1.2+ compatible GPU for the computer vision parts. For the deep learning modules:
We support any NVIDIA GPU, thanks to TensorRT.
We support any Intel GPU, thanks to OpenVINO.
The mobile (ARM) implementation works anywhere thanks to the multiple backends: OpenCL, OpenGL shaders, Metal and NNAPI.
Please note that a GPU isn’t required at all for the mobile (ARM) implementation. Most of the time the code will run faster on the CPU than on the GPU thanks to fixed-point math and quantized inference. GPU implementations provide more accuracy as they rely on 32-bit floating-point math. We’re working to provide 16-bit floating-point models in the coming months.
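As a rough illustration of what quantized inference means, here is a minimal sketch (our example; the SDK’s actual quantization scheme is not public) of the affine int8 quantization used by most quantized-inference runtimes:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Affine quantization: real ≈ scale * (q - zero_point).
// Weights and activations become int8, so inference runs on integer math.
int8_t quantize(float real, float scale, int32_t zero_point) {
    const int32_t q = static_cast<int32_t>(std::lround(real / scale)) + zero_point;
    return static_cast<int8_t>(std::max(-128, std::min(127, q)));
}

float dequantize(int8_t q, float scale, int32_t zero_point) {
    return scale * static_cast<float>(q - zero_point);
}
```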
Supported VPUs¶
Thanks to OpenVINO, we support Intel Movidius Vision Processing Units (VPUs).
Supported FPGAs¶
Thanks to OpenVINO, we support Intel FPGAs.
Supported programming languages¶
The code was developed in C++11 and assembler, but the API (Application Programming Interface) has many bindings thanks to SWIG.
Bindings: ANSI-C, C++, C#, Java, ObjC, Swift, Perl, Ruby and Python.
Supported raw formats¶
We support the following image/video formats: RGBA32, BGRA32, RGB24, BGR24, NV12, NV21, Y (grayscale), YUV420P, YVU420P, YUV422P and YUV444P. NV12 and NV21 are semi-planar formats, also known as YUV420SP.
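For reference, here is a minimal sketch (our illustration) of how the semi-planar NV12 layout is addressed: a full-resolution Y plane followed by a single interleaved UV plane at half resolution. NV21 is identical with the U and V bytes swapped.

```cpp
#include <cstddef>
#include <cstdint>

struct Yuv { uint8_t y, u, v; };

// Sample pixel (x, y) from an NV12 buffer:
// width*height bytes of Y, then width*height/2 bytes of interleaved UV.
Yuv nv12_pixel(const uint8_t* data, size_t width, size_t height, size_t x, size_t y) {
    const uint8_t* y_plane  = data;
    const uint8_t* uv_plane = data + (width * height);      // semi-planar chroma
    const size_t uv_index = (y / 2) * width + (x / 2) * 2;  // 2x2 chroma subsampling
    return Yuv{ y_plane[y * width + x], uv_plane[uv_index], uv_plane[uv_index + 1] };
}
```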
Optimizations¶
Hand-written assembler
SIMD (SSE, AVX, NEON) using intrinsics or assembler
GPGPU (CUDA, TensorRT, TF-TRT, OpenVINO, OpenCL, OpenGL, NNAPI and Metal)
Smart multithreading (minimal context switching, no false sharing, no boundary crossing…)
Smart memory access (data alignment, cache pre-load, cache blocking, non-temporal load/store for minimal cache pollution, smart reference counting…)
Fixed-point math (see the sketch after this list)
Quantized inference
… and many more
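As an illustration of the fixed-point item above, here is a minimal sketch (not the SDK’s code) of Q15 fixed-point arithmetic, which replaces floating-point multiplies with cheap integer ones:

```cpp
#include <cmath>
#include <cstdint>

// Q15 fixed point: a real value r in [-1, 1) is stored as round(r * 32768).
int16_t q15_from_float(float r) {
    const int32_t q = static_cast<int32_t>(std::lround(r * 32768.0f));
    return static_cast<int16_t>(q > 32767 ? 32767 : (q < -32768 ? -32768 : q));
}

int16_t q15_mul(int16_t a, int16_t b) {
    // 16x16 -> 32-bit multiply, rounded and shifted back to Q15.
    const int32_t p = (static_cast<int32_t>(a) * b + (1 << 14)) >> 15;
    return static_cast<int16_t>(p > 32767 ? 32767 : p); // saturate the -1 * -1 edge case
}

float q15_to_float(int16_t q) { return static_cast<float>(q) / 32768.0f; }
```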
Many functions have been open sourced and included in the CompV project: https://github.com/DoubangoTelecom/CompV. More functions from the deep learning parts will be open sourced in the coming months. You can contact us to get some of the closed-source code we’re planning to open.
Thread safety¶
All the functions in the SDK are thread-safe, which means you can invoke them concurrently from multiple threads. However, you should not do so, for several reasons:
The SDK is already massively multithreaded in an efficient way (see the threading model section).
You’ll end up saturating the CPU and making everything run slower. The threading model ensures the SDK never uses more threads than there are virtual CPU cores. Calling the engine from different threads breaks this rule, as we cannot control threads created outside the SDK.
Unless you have access to the private API, the engine uses a single context, which means concurrent calls are serialized when they try to write to a shared resource.
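If your application is itself multithreaded, a simple way to respect these constraints is to funnel all engine calls through a single lock; in the sketch below, process_frame is a hypothetical placeholder for whichever SDK function you call:

```cpp
#include <mutex>

// Serialize all engine calls behind one mutex so the SDK's internal
// thread pool remains the only source of parallelism.
class EngineGuard {
public:
    template <typename Fn>
    auto call(Fn&& fn) -> decltype(fn()) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn();
    }
private:
    std::mutex mutex_;
};

// Usage (process_frame is a hypothetical SDK call, not a real API):
//   EngineGuard guard;
//   guard.call([&]() { return process_frame(frame); });
```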