Text detector

Text detection is performed on the rectified image, after graph registration. The detector is a Fully Convolutional Network (FCN) whose output is a heatmap.

Heatmap

The heatmap is the output of the main detector. It is a float array with values in [0, 1] and has the same size as the input image.
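
The properties above can be checked with a few lines of NumPy. This is only a minimal sketch: the function names `check_heatmap` and `to_text_mask` are hypothetical, the heatmap is assumed to be a single-channel H x W array, and the 0.5 threshold is an illustrative value, not one taken from the detector.

```python
import numpy as np

def check_heatmap(heatmap, image):
    """Return True if `heatmap` is a float array in [0, 1] matching the image size."""
    return (
        np.issubdtype(heatmap.dtype, np.floating)
        and heatmap.shape == image.shape[:2]   # assumption: one value per pixel
        and float(heatmap.min()) >= 0.0
        and float(heatmap.max()) <= 1.0
    )

def to_text_mask(heatmap, threshold=0.5):
    """Binarize the heatmap; the default threshold is an assumed value."""
    return heatmap >= threshold

# Dummy data standing in for a real image and a real detector output.
image = np.zeros((4, 6, 3), dtype=np.uint8)    # H x W x C input
heatmap = np.zeros((4, 6), dtype=np.float32)   # H x W heatmap
heatmap[1:3, 2:5] = 0.9                        # pretend a word was found here

mask = to_text_mask(heatmap)
print(check_heatmap(heatmap, image), int(mask.sum()))
```

A binary mask like this is one possible starting point for turning the heatmap into regions; the actual post-processing may differ.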

Heatmap refining

Our text detector returns a heatmap instead of bounding boxes. It was trained to detect words with at least 3 characters; shorter words (fewer than 3 characters) were excluded on purpose to avoid too many false positives. The main text detector therefore does not detect short words, which is why we have a second, specialized detector. The specialized detector for short words is an SVM model (not deep learning) whose input features come from a T-HOG generator (paper: T-HOG: an Effective Gradient-Based Descriptor for Single Line Text Regions). Running the specialized detector after the main one is what we call “heatmap refining”.
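
The two-stage flow described above can be sketched as follows. This is a hedged illustration, not the actual implementation: `refine_heatmap` and both stand-in detectors are hypothetical names, the short-word detector is assumed to return its own heatmap of the same size, and merging the two with an element-wise maximum is an assumption, since the text does not say how the results are combined.

```python
import numpy as np

def refine_heatmap(image, main_detector, short_word_detector):
    """Run the main detector, then the short-word detector, and merge the results."""
    main_heatmap = main_detector(image)          # words with at least 3 characters
    short_heatmap = short_word_detector(image)   # words with fewer than 3 characters
    # Assumption: keep the strongest response per pixel; values stay in [0, 1].
    return np.maximum(main_heatmap, short_heatmap)

# Stand-ins for the real FCN and the real SVM + T-HOG detector.
def fake_main_detector(image):
    heatmap = np.zeros(image.shape[:2], dtype=np.float32)
    heatmap[0, 0:4] = 0.8                        # pretend a long word is here
    return heatmap

def fake_short_word_detector(image):
    heatmap = np.zeros(image.shape[:2], dtype=np.float32)
    heatmap[2, 1:3] = 0.7                        # pretend a short word is here
    return heatmap

image = np.zeros((3, 5), dtype=np.uint8)         # dummy rectified image
refined = refine_heatmap(image, fake_main_detector, fake_short_word_detector)
print(refined)
```

Passing the detectors as callables keeps the sketch independent of how either model is loaded or run.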