The detection pipeline is complex and plays a crucial role. You don’t need to understand how the pipeline works in order to get the best out of the SDK, but understanding it will help you tweak the configuration parameters.
When an image is presented to the detection pipeline, it passes through several nodes in sequence. This section explains how each node in the detection pipeline works.
Multi-layer image segmentation¶
This is the first node in the pipeline. We’re using 4 layers, and each one has its own parameters. The role of the segmenter is to detach the foreground from the background; in this case, the foreground is the text. Each layer is designed to deal with a different level of contrast.
Each layer of the segmenter produces fragments (groups of pixels). This node then fuses the fragments to reduce the amount of data to process.
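The per-layer segmentation and fusion can be sketched as follows. This is a minimal illustration, not the SDK’s implementation: the threshold values, the dark-text-on-light-background assumption, and the function names are all illustrative.

```python
import numpy as np

def segment_layers(gray, thresholds=(64, 96, 128, 160)):
    """Run one binary segmentation per layer. Each layer targets a
    different contrast level (threshold values are illustrative)."""
    layers = []
    for t in thresholds:
        # Foreground (text) is assumed darker than the background.
        layers.append(gray < t)
    return layers

def fuse_fragments(layers):
    """Fuse the per-layer masks into a single foreground mask to
    reduce the amount of data passed to the next node."""
    fused = np.zeros_like(layers[0], dtype=bool)
    for mask in layers:
        fused |= mask
    return fused
```

In this sketch, fusion is a simple union of the layer masks; a pixel kept by any layer survives into the fused output.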
Hysteresis voting using double thresholding¶
The fused fragments are groups of pixels. The pixels are connected based on their intensity (8-bit grayscale values) using hysteresis. More information about hysteresis can be found at https://en.wikipedia.org/wiki/Hysteresis. For each pixel, we consider the 8 neighbors (8-connected algorithm). The decision to connect the current pixel to its neighbor is made using double thresholding on the residual. Double thresholding means we’re using both a minimum and a maximum threshold.
The residual is computed as the absolute difference between the intensities:
- A residual lower than the minimum threshold marks a lost pixel.
- A residual higher than the minimum threshold but lower than the maximum threshold marks a weak pixel.
- A residual higher than the maximum threshold marks a strong pixel.
All weak pixels connected to at least one strong pixel are promoted to strong; weak pixels connected only to lost pixels are themselves marked as lost. At the end of the hysteresis, only strong pixels are kept; all other pixels are considered orphans and are removed.
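One plausible reading of the hysteresis step above can be sketched like this. The threshold values are illustrative, and the residual here is simplified to the largest absolute intensity difference against any 8-neighbor; the SDK’s actual residual computation may differ.

```python
import numpy as np
from collections import deque

def hysteresis(gray, t_min=10, t_max=40):
    """Double-threshold hysteresis on a toy per-pixel residual.
    Keeps strong pixels plus weak pixels 8-connected to a strong one."""
    g = gray.astype(np.int16)
    h, w = g.shape
    # Residual: largest absolute intensity difference to any 8-neighbor.
    # Note: np.roll wraps at the borders; fine for a sketch with zero borders.
    residual = np.zeros((h, w), dtype=np.int16)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(g, dy, axis=0), dx, axis=1)
            residual = np.maximum(residual, np.abs(g - shifted))
    strong = residual > t_max                 # strong pixels
    weak = (residual > t_min) & ~strong       # weak pixels
    # Everything else (residual <= t_min) is a lost pixel.
    # Flood-fill from strong pixels, promoting 8-connected weak pixels.
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True
                    queue.append((ny, nx))
    return keep  # orphan weak pixels and lost pixels are dropped
```

Because this toy residual is symmetric, background pixels bordering a high-contrast blob also fire; the real pipeline presumably handles this differently.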
Clustering¶
The strong pixels from the hysteresis are clustered to form groups (or lines). More information about clustering can be found at https://en.wikipedia.org/wiki/Cluster_analysis. It’s up to this node to compute the skew and slant angles. These angles are very important because they are used to deskew and deslant the MRZ lines before feeding the recognizer.
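One common way to estimate the skew angle of a clustered text line, sketched below, is to fit a straight line through the cluster’s pixel coordinates with least squares and take the arctangent of the slope. This is an assumption for illustration; the SDK does not document its exact method.

```python
import numpy as np

def skew_angle_degrees(points):
    """Estimate the skew angle of a text-line cluster.
    `points` is an (N, 2) array of (x, y) pixel positions.
    Fits y = slope * x + intercept, then converts the slope to degrees."""
    x, y = points[:, 0], points[:, 1]
    slope, _intercept = np.polyfit(x, y, 1)
    return np.degrees(np.arctan(slope))
```

Deskewing then amounts to rotating the line (or the whole crop) by the negative of this angle before recognition.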
Binary font classification¶
At the end of the clustering you’ll have hundreds, if not thousands, of groups. The groups (or lines) are passed to a binary font classifier, which uses OpenCL for GPGPU acceleration. For each group, the classifier outputs 1 or 0: 1 means the group is likely an MRZ line, 0 means anything else (text, images…). Three fonts are supported: OCR-B, CMC-7 and E-13B. In this project, only the OCR-B font is activated.
The backpropagation function used to detect MRZ lines is the same as the one used for MICR. Please check https://www.doubango.org/SDKs/micr/docs/Detection_techniques.html#backpropagation for more information.
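The 1-or-0 decision described above can be sketched as a linear classifier over a group’s feature vector. This is a toy stand-in, not the SDK’s trained network: the feature names, weights, and threshold are all illustrative, and the real classifier is trained with backpropagation as described at the link above.

```python
import numpy as np

def classify_group(features, weights, bias):
    """Toy binary font classifier over a group's feature vector
    (e.g. stroke width, aspect ratio, character spacing).
    Returns 1 for "likely an MRZ / OCR-B group", 0 for anything else.
    The weights and bias here are illustrative, not trained values."""
    score = float(np.dot(weights, features) + bias)
    return 1 if score > 0 else 0
```

In practice each group would be reduced to such a feature vector and scored in parallel on the GPU, which is where the OpenCL acceleration mentioned above comes in.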