Graph computation
This is the most complex and most important part of the project. For confidentiality reasons, only a few technical details are provided here.
The graph computation runs after the document has been fully classified and rectified (rectification is performed by the STN).
More information about Graph Neural Networks (GNNs) is available at https://en.wikipedia.org/wiki/Graph_neural_network
Sparse versus Dense
Two types of graphs are supported: “dense” and “sparse”. As the names suggest, a dense graph produces more nodes than a sparse one. We recommend using “dense” graphs on servers (x86) and “sparse” graphs on mobile devices (ARM).
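The platform recommendation above can be written as a small selection rule. This is a hypothetical sketch: `recommended_graph_type` is not part of the project's API, and it assumes that the machine name reported by Python's `platform.machine()` (e.g. `x86_64`, `AMD64`, `aarch64`, `arm64`) is a good enough proxy for “server” versus “mobile”.

```python
import platform

# Hypothetical helper, not part of the project's API.
# Machine-name prefixes (as reported by platform.machine()) treated as ARM/mobile.
ARM_PREFIXES = ("arm", "aarch64")


def recommended_graph_type(machine=None):
    """Return "sparse" for ARM (mobile) machines and "dense" otherwise (e.g. x86 servers)."""
    if machine is None:
        machine = platform.machine()  # e.g. "x86_64", "AMD64", "aarch64", "arm64"
    if machine.lower().startswith(ARM_PREFIXES):
        return "sparse"
    return "dense"


print(recommended_graph_type())  # result depends on the host machine
```

Passing the machine name explicitly (for example `recommended_graph_type("aarch64")`) makes the rule easy to check without running on the target device.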
Pipeline
Node generation
Node generation outputs the graph nodes as a 2x2x512 tensor.
Optical flow
The dataset folder has 2 files: “IdentityOCR_aggregated.json.dense”, which contains the dense nodes, and “IdentityOCR_aggregated.json.sparse”, which contains the sparse nodes. These nodes are pre-computed for the supported documents; for example, they include a graph for the French identity card.
Suppose the document to process is a French ID card: its graph nodes are extracted and stacked channel-wise with the corresponding pre-computed nodes from the dataset to form a 2x2x1024 tensor. The stacked tensor is then passed through a deep learning model that produces an optical flow.
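The channel-wise stacking step can be illustrated with plain Python lists. This is a minimal sketch, assuming tensors laid out as height x width x channels and assuming the user's nodes come first in the stack; the real channel order and tensor library are not documented here, and the helper names (`zeros`, `shape`, `stack_channels`) are hypothetical.

```python
# Minimal sketch: tensors are nested lists indexed [row][col][channel] (HxWxC).


def zeros(h, w, c, value=0.0):
    """Create an HxWxC tensor filled with `value` (each inner list is independent)."""
    return [[[value] * c for _ in range(w)] for _ in range(h)]


def shape(t):
    """Return (height, width, channels) of a nested-list tensor."""
    return (len(t), len(t[0]), len(t[0][0]))


def stack_channels(a, b):
    """Concatenate two HxWxC tensors along the channel axis (assumed order: a, then b)."""
    ha, wa, _ = shape(a)
    hb, wb, _ = shape(b)
    if (ha, wa) != (hb, wb):
        raise ValueError("spatial dimensions must match")
    # List concatenation builds a new channel vector for every (row, col) cell.
    return [[a[i][j] + b[i][j] for j in range(wa)] for i in range(ha)]


user_nodes = zeros(2, 2, 512, 1.0)       # stand-in for nodes extracted from the user's document
reference_nodes = zeros(2, 2, 512, 2.0)  # stand-in for pre-computed nodes from the dataset
stacked = stack_channels(user_nodes, reference_nodes)
print(shape(stacked))  # (2, 2, 1024)
```

With NumPy arrays, the same operation is `np.concatenate([user_nodes, reference_nodes], axis=-1)`.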
Registration
For several reasons, the optical flow cannot be used directly for graph registration, which is why it is post-processed with MAGSAC++ or TPS.
Registration is the process used to retrieve the position of each field (e.g. DateOfBirth, Surname, GivenNames…).
More information about image registration is available at https://en.wikipedia.org/wiki/Image_registration.
Marginalizing Sample Consensus (MAGSAC++)
MAGSAC++ produces a 3x3 homography matrix mapping graph A to graph B, where graph A represents the image provided by the user and graph B comes from the dataset.
More information about homographies is available at https://en.wikipedia.org/wiki/Homography_(computer_vision).
More information about MAGSAC++ is available at https://arxiv.org/abs/1912.05909.
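To make the role of the 3x3 matrix concrete, the sketch below fits a homography exactly to four point correspondences (a direct linear transform with the bottom-right entry fixed to 1) and uses it to map points from graph A to graph B. It is not MAGSAC++: MAGSAC++ is a robust estimator that evaluates many candidate models fitted to samples of correspondences and weights inliers by marginalizing over the noise scale, whereas this sketch only shows the minimal fit and the projective mapping. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (degenerate point configuration)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Exact 3x3 homography H (with H[2][2] = 1) mapping each src point onto dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, point):
    """Map a 2D point with H using homogeneous coordinates and a projective division."""
    x, y = point
    u, v, w = (H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3))
    return (u / w, v / w)
```

For pixel-scale coordinates, normalizing the points first (Hartley normalization) improves numerical stability. In practice, OpenCV 4.5 and later exposes a MAGSAC++-based estimator through `cv2.findHomography(src, dst, cv2.USAC_MAGSAC, ransacReprojThreshold)`; whether the project relies on OpenCV is not stated here.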
Thin-Plate Spline (TPS)
A homography cannot represent certain non-rigid transformations, such as document folding. For various reasons, it is therefore not always possible to register graph A with respect to graph B using a homography. This is where TPS is needed.
TPS is part of the 2nd-pass graph computation, which means the 2-pass option must be enabled.
More information about TPS is available at https://en.wikipedia.org/wiki/Thin_plate_spline.
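A thin-plate spline can absorb bends such as a fold because, on top of an affine part, it adds radial basis terms centred on control points. The sketch below fits a 2D TPS mapping control points of graph A onto graph B by solving the standard (n+3)x(n+3) linear system with kernel U(r) = r² log r². It is an illustration under assumptions: the control points are assumed to be matched node positions (for example derived from the optical flow), the `regularization` smoothing parameter is our addition, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (e.g. all control points collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def tps_kernel(r2):
    """Thin-plate radial basis U = r^2 log(r^2), written in terms of r2 = r^2 (U(0) = 0)."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def fit_tps(src, dst, regularization=0.0):
    """Fit a 2D thin-plate spline f with f(src[i]) ~= dst[i] (exact when regularization == 0)."""
    n = len(src)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)  # K block
        a[i][i] += regularization      # optional smoothing (assumption, not from the source)
        a[i][n:] = [1.0, xi, yi]       # P block: affine terms
        for k, value in enumerate((1.0, xi, yi)):
            a[n + k][i] = value        # P^T block: side conditions on the weights
    # Same matrix for both output coordinates; only the right-hand side changes.
    coeffs = [solve(a, [q[axis] for q in dst] + [0.0, 0.0, 0.0]) for axis in range(2)]
    return src, coeffs


def apply_tps(model, point):
    """Evaluate the fitted spline: affine part plus weighted radial basis terms."""
    ctrl, coeffs = model
    x, y = point
    n = len(ctrl)
    basis = [tps_kernel((x - cx) ** 2 + (y - cy) ** 2) for cx, cy in ctrl]
    result = []
    for c in coeffs:
        weights, (a0, ax, ay) = c[:n], c[n:]
        result.append(a0 + ax * x + ay * y + sum(w * u for w, u in zip(weights, basis)))
    return tuple(result)
```

With `regularization=0` the spline passes exactly through every control point, and if the correspondences happen to be purely affine, the radial weights vanish and TPS reduces to that affine map.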
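Umeyama
Another way to deal with some non-rigid transformations is the Umeyama algorithm, which estimates the least-squares similarity transformation (rotation, translation and uniform scale) between two point sets.
Like TPS, Umeyama is part of the 2nd-pass graph computation, which means the 2-pass option must be enabled.
More information about the Umeyama algorithm is available at https://web.stanford.edu/class/cs273/refs/umeyama.pdf.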
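The Umeyama method computes the scale c, rotation R and translation t minimizing Σ‖yᵢ − (c R xᵢ + t)‖² over matched points xᵢ (graph A) and yᵢ (graph B). In 2D the rotation reduces to a single angle and the SVD step of the paper has a closed form, which the sketch below uses. It assumes matched point pairs are already available and is not the project's implementation; the function names are hypothetical.

```python
import math


def umeyama_2d(src, dst):
    """Least-squares similarity (scale, rotation angle, translation) mapping src points onto dst.

    2D specialisation of Umeyama's method: the optimal rotation angle and scale
    follow in closed form from the centred cross-covariance terms.
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched point pairs")
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    nx = sum(q[0] for q in dst) / n
    ny = sum(q[1] for q in dst) / n
    dot = cross = var = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - mx, y - my, u - nx, v - ny  # centre both point sets
        dot += x * u + y * v      # sum of dot products
        cross += x * v - y * u    # sum of 2D cross products
        var += x * x + y * y      # spread of the source points
    if var == 0.0:
        raise ValueError("source points are all identical")
    theta = math.atan2(cross, dot)        # rotation angle maximising the alignment
    scale = math.hypot(dot, cross) / var  # optimal uniform scale
    c, s = math.cos(theta), math.sin(theta)
    tx = nx - scale * (c * mx - s * my)   # t = mean(dst) - scale * R @ mean(src)
    ty = ny - scale * (s * mx + c * my)
    return scale, theta, (tx, ty)


def apply_similarity(params, point):
    """Apply x -> scale * R(theta) @ x + t to a 2D point."""
    scale, theta, (tx, ty) = params
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
```

Because the rotation is parametrized by an angle, the result is always a proper rotation, which is what the sign correction in Umeyama's general SVD formulation guarantees.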