Memory management design¶
This section describes how the SDK allocates and manages memory.
The SDK allocates at most 1/20th of the available RAM over the application lifetime and manages it using a pool. For example, if the device has 8GB of memory, the SDK starts by allocating 3MB and, depending on the malloc/free requests, grows this amount up to a maximum of 400MB (1/20th of 8GB). In practice the allocated memory rarely exceeds 5MB.
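The 1/20th cap above can be sketched as a small helper. This is an illustration only, assuming the cap is a simple integer division of total RAM; the function name `pool_max_bytes` is hypothetical and not part of the SDK API.

```c
#include <stdint.h>

/* Hypothetical helper illustrating the 1/20th cap described above:
   the pool never grows beyond total_ram_bytes / 20. */
static uint64_t pool_max_bytes(uint64_t total_ram_bytes) {
    return total_ram_bytes / 20u;
}
```

For an 8GB device this yields roughly 409MB, matching the "400M" figure quoted above.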
Every memory allocation and deallocation operation (malloc, calloc, free, realloc…) is hooked, which makes it immediate (no delay). The application allocates and deallocates aligned memory hundreds of times every second, and thanks to the pooling mechanism these operations add no latency.
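A minimal sketch of what such hooking can look like, assuming the SDK routes allocations through wrapper entry points. The names `sdk_malloc`/`sdk_free` and the live-allocation counter are hypothetical; a real implementation would serve requests from the pool rather than calling `malloc` directly.

```c
#include <stdlib.h>

/* Hypothetical hooked entry points. Counting live allocations is also
   what enables the automatic leak detection mentioned below. */
static size_t g_live_allocs = 0;

static void* sdk_malloc(size_t size) {
    void* p = malloc(size); /* a real SDK would serve this from the pool */
    if (p) g_live_allocs++;
    return p;
}

static void sdk_free(void* p) {
    if (p) {
        g_live_allocs--;
        free(p);
    }
}
```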
We added this section to the documentation so that developers understand why the amount of allocated memory doesn't automatically decrease when blocks are freed. You may think there are leaks, but that is most likely not the case. Please also note that we track every allocated block and object and can automatically detect leaks.
Minimal cache eviction¶
Thanks to memory pooling, when a block is freed it isn't really deallocated but is put on top of the pool and handed back at the next allocation request. This not only makes allocation faster but also minimizes cache eviction, as the “fakely” freed memory is still hot in the cache.
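The LIFO behavior described above can be sketched as a free list: freed blocks are pushed on top, and the most recently freed (cache-hot) block is handed back first. This is a simplified, fixed-block-size illustration; the type and function names are hypothetical, not the SDK's actual implementation.

```c
#include <stdlib.h>

typedef struct node { struct node* next; } node_t;

typedef struct {
    node_t* top;       /* most recently freed block (hottest in cache) */
    size_t block_size; /* single block size for simplicity */
} pool_t;

static void* pool_alloc(pool_t* p) {
    if (p->top) {                 /* reuse the most recently freed block */
        node_t* n = p->top;
        p->top = n->next;
        return n;
    }
    return malloc(p->block_size); /* pool empty: grow from the system */
}

static void pool_free(pool_t* p, void* ptr) {
    node_t* n = (node_t*)ptr;     /* not returned to the OS, just pushed */
    n->next = p->top;
    p->top = n;
}
```

This is why the process's allocated memory doesn't shrink on `free`: the block stays in the pool, ready for the next request.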
Aligned on SIMD and cache line size¶
Any memory allocation done through the SDK is aligned on 16 bytes on ARM and 32 bytes on x86. The data is also strided to make it cache-friendly. These alignment values aren't arbitrary: they are chosen to match the requirements of ARM NEON and AVX functions.
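A sketch of how such platform-dependent alignment can be implemented with C11's `aligned_alloc`, assuming a compile-time switch between the two targets. The macro and function names are hypothetical; note that `aligned_alloc` requires the size to be a multiple of the alignment, hence the padding.

```c
#include <stdlib.h>
#include <stdint.h>

/* 32 bytes for AVX on x86, 16 bytes for NEON on ARM (assumption:
   selected at build time from the target architecture). */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64)
#  define SDK_ALIGNMENT 32
#else
#  define SDK_ALIGNMENT 16
#endif

static void* sdk_aligned_malloc(size_t size) {
    /* round size up to a multiple of the alignment, as C11 requires */
    size_t padded = (size + SDK_ALIGNMENT - 1) & ~(size_t)(SDK_ALIGNMENT - 1);
    return aligned_alloc(SDK_ALIGNMENT, padded);
}
```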
When the user provides non-aligned data as input to the SDK, the data is unpacked and wrapped to make it SIMD-aligned, which introduces some latency. Try to provide aligned data, and when choosing a region of interest (ROI) for the detector, try to use SIMD-aligned left bounds.
```c
(left & 15) == 0; // means 16-byte aligned
(left & 31) == 0; // means 32-byte aligned
```
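If your ROI's left bound fails the checks above, one option is to snap it down to the nearest aligned column before passing it to the detector. This helper is a hypothetical illustration (not an SDK function); it assumes the alignment is a power of two.

```c
/* Snap an ROI left bound down to the nearest SIMD-aligned column.
   alignment must be a power of two (e.g. 16 or 32). */
static int align_left_bound(int left, int alignment) {
    return left & ~(alignment - 1);
}
```

For example, a left bound of 37 becomes 32 for both 16- and 32-byte alignment, while 48 is already 16-byte aligned and stays unchanged.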
OpenVINO consumes roughly half as much memory as TensorFlow. We highly recommend using OpenVINO instead of TensorFlow to decrease memory usage.
TODO(dmi): to be continued