Card classification

Classification is performed on every card detected in the image, and the detected cards are processed in batches. The classification result is the graph identifier of the document (e.g. California Driver License or Belgian Identity Card).

MRZ, the ultimate arbiter of truth

When an MRZ (machine-readable zone) result is available, the classification is restricted to the graphs that match it. For example, if the MRZ result is “IDFRA…”, we already know that the document is a French identity card. France has two types of identity cards, so only these two are considered. This strategy improves accuracy, but the MRZ isn’t required to classify the document correctly.
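The restriction described above can be pictured as a simple candidate filter. The sketch below is only an illustration: the catalogue, the graph names, and the `candidate_graphs` helper are hypothetical and are not part of the SDK. It assumes that the first five MRZ characters (document code plus issuing state) are enough to narrow the candidates, and that the full list is used when no MRZ is available or nothing matches.

```python
# Hypothetical catalogue mapping graph identifiers to the first five MRZ
# characters (2-character document code + 3-letter issuing state).
# None means the document carries no MRZ. All names are illustrative.
GRAPH_MRZ_PREFIX = {
    "fra_identity_card_v1": "IDFRA",
    "fra_identity_card_v2": "IDFRA",
    "bel_identity_card": "IDBEL",
    "usa_ca_driver_license": None,
}


def candidate_graphs(mrz_text=None):
    """Return the graph identifiers the classifier should consider."""
    all_graphs = list(GRAPH_MRZ_PREFIX)
    if not mrz_text:
        # No MRZ was read: the MRZ is optional, so keep every candidate.
        return all_graphs
    prefix = mrz_text[:5].upper()
    matches = [g for g, p in GRAPH_MRZ_PREFIX.items() if p == prefix]
    # Assumption of this sketch: fall back to all graphs if nothing matches.
    return matches or all_graphs


print(candidate_graphs("IDFRA"))
```

With the prefix “IDFRA”, only the two hypothetical French identity card graphs remain as candidates.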

Pipeline

The classification pipeline is complex but runs very fast. All modules are deep learning models rather than heuristic functions. The modules are GPGPU-accelerated using CUDA and optimized for CPUs using OpenVINO.

Spatial Transformer Network (STN)

The role of the STN module is to rectify the image by removing skew and rotation.

More information about STNs is available at https://arxiv.org/abs/1506.02025
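To give an intuition for what rectification means, here is a minimal sketch of the resampling step described in the STN paper: each output pixel is fetched from the source location given by a 2x3 affine matrix in normalized [-1, 1] coordinates. This is not the SDK's module. In a real STN the matrix is predicted by a localization network and the sampler is bilinear; here the matrix is fixed by hand, sampling is nearest-neighbour, and the names `rotation_theta` and `affine_sample` are hypothetical.

```python
import math


def rotation_theta(angle_deg):
    """Build a 2x3 affine matrix for a pure rotation (no translation)."""
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0]]


def affine_sample(image, theta):
    """Resample `image` (a list of rows) through the affine matrix `theta`.

    Coordinates are normalized to [-1, 1] on both axes. Output pixels whose
    source falls outside the image are left at 0.
    """
    h, w = len(image), len(image[0])

    def to_norm(i, n):
        return 2 * i / (n - 1) - 1 if n > 1 else 0.0

    def to_index(v, n):
        return round((v + 1) * (n - 1) / 2)

    out = [[0] * w for _ in range(h)]
    for row in range(h):
        for col in range(w):
            x, y = to_norm(col, w), to_norm(row, h)
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            c, r = to_index(src_x, w), to_index(src_y, h)
            if 0 <= r < h and 0 <= c < w:
                out[row][col] = image[r][c]
    return out


upside_down = [[1, 2, 3],
               [4, 5, 6]]
print(affine_sample(upside_down, rotation_theta(180)))
```

Rotating the small example by 180 degrees reverses both rows and columns, which is the simplest case of de-rotating a card that was captured upside down.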

Attention Guided Network (AGN)

The AGN module is part of the STN. Its role is to restrict processing to the hot zone and ignore everything else.

Embeddings generator

Every graph in the dataset is represented as a vector of 64 floating-point values called its “embeddings”. The embeddings are computed on the output of the STN module.

Cosine similarity

The embeddings are compared using a dot product to produce cosine similarity values (the dot product of two unit-length vectors is their cosine similarity). These values lie within [-1, 1]. The graph with the highest similarity score is taken as the classification result.
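The comparison step can be sketched in a few lines. This is an illustration under stated assumptions rather than the SDK's implementation: the function names and the reference embeddings are hypothetical, the vectors are L2-normalized explicitly before the dot product, and 3-dimensional vectors stand in for the 64-dimensional embeddings to keep the example readable.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product yields the cosine."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1], computed as a dot product of unit vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def classify(query, references):
    """Return the (graph identifier, score) pair with the highest similarity."""
    scores = {name: cosine_similarity(query, emb) for name, emb in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference embeddings (real ones would have 64 values each).
references = {
    "fra_identity_card_v1": [0.9, 0.1, 0.0],
    "fra_identity_card_v2": [0.0, 1.0, 0.2],
}
graph, score = classify([0.8, 0.2, 0.1], references)
print(graph, round(score, 3))
```

If the stored embeddings are already unit length, the normalization of the references can be done once ahead of time, leaving a single dot product per candidate at classification time.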