MRZ parser¶
The SDK contains C++ code to parse the MRZ lines returned by the API. The sample application (https://github.com/DoubangoTelecom/ultimateMRZ-SDK/tree/master/samples/c++/parser) uses regular expressions, which means you can easily migrate the code to C#, Python or Java without changing the regular expressions.
If you’re dealing with non-standard formats and struggling to write the right regular expressions, don’t hesitate to contact us via our dev-group and we’ll help you.
Here is the first line of each sample we’ll use in the next sections:
TD1 | I<UTOD231458907<<<<<<<<<<<<<<<
TD2 | I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
TD3 | P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
MRVA | V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
MRVB | V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Determine the document type¶
The MRZ lines (array of strings) returned by the SDK are sorted from top to bottom.
The first operation is to determine the document type from the number of lines, the length of the first line and its first character. You can do this using a single line of code:
const MRZ_DOCUMENT_TYPE type = (lines.size() == 3 && lines.front().size() == 30) ? MRZ_DOCUMENT_TYPE_TD1 : ((lines.front().size() == 44 && lines.size() == 2) ? (lines.front()[0] == 'P' ? MRZ_DOCUMENT_TYPE_TD3 : MRZ_DOCUMENT_TYPE_MRVA) : ((lines.front().size() == 36 && lines.size() == 2) ? (lines.front()[0] == 'V' ? MRZ_DOCUMENT_TYPE_MRVB : MRZ_DOCUMENT_TYPE_TD2) : MRZ_DOCUMENT_TYPE_UNKNOWN));
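The same decision can be written as a small function, which is easier to read and to port. This is only a sketch: the enum values below are declared here for illustration (the SDK sample defines its own MRZ_DOCUMENT_TYPE), detectDocumentType is a hypothetical name, and the guard for an empty array is an addition that the one-liner does not have.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Declared here for illustration only; the SDK sample has its own definition.
enum MRZ_DOCUMENT_TYPE {
    MRZ_DOCUMENT_TYPE_UNKNOWN,
    MRZ_DOCUMENT_TYPE_TD1,
    MRZ_DOCUMENT_TYPE_TD2,
    MRZ_DOCUMENT_TYPE_TD3,
    MRZ_DOCUMENT_TYPE_MRVA,
    MRZ_DOCUMENT_TYPE_MRVB
};

// Hypothetical helper: same rules as the one-liner, split into named steps.
MRZ_DOCUMENT_TYPE detectDocumentType(const std::vector<std::string>& lines) {
    if (lines.empty() || lines.front().empty()) {
        return MRZ_DOCUMENT_TYPE_UNKNOWN; // added guard: nothing to inspect
    }
    const std::size_t count = lines.size();
    const std::size_t width = lines.front().size();
    const char first = lines.front()[0];

    if (count == 3 && width == 30) {
        return MRZ_DOCUMENT_TYPE_TD1;
    }
    if (count == 2 && width == 44) {
        // 44-character lines: passport (TD3) or format-A visa (MRVA).
        return first == 'P' ? MRZ_DOCUMENT_TYPE_TD3 : MRZ_DOCUMENT_TYPE_MRVA;
    }
    if (count == 2 && width == 36) {
        // 36-character lines: format-B visa (MRVB) or TD2.
        return first == 'V' ? MRZ_DOCUMENT_TYPE_MRVB : MRZ_DOCUMENT_TYPE_TD2;
    }
    return MRZ_DOCUMENT_TYPE_UNKNOWN;
}
```

Like the one-liner, this treats any 44-character document that does not start with P as MRVA, and any 36-character document that does not start with V as TD2.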
Parsing TD1 format¶
The TD1 format has 3 lines of 30 characters each.
Line 1 (TD1#1)¶
Regular expression (TD1#1)¶
Regular expression | ([ACI][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{9})([0-9]{1})([A-Z0-9<]{15})
Group #1 | Document type. A, C or I as the first character.
Group #2 | Three-letter country code.
Group #3 | Document number, up to 9 alphanumeric characters.
Group #4 | Check digit on the document number.
Group #5 | Optional data at the discretion of the issuing state.
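As an illustration, the expression can be applied with std::regex from the C++ standard library. This is a minimal sketch and not the code of the SDK sample: matchGroups and kTd1Line1 are hypothetical names, and the first character class is written [ACI] because a | inside square brackets is a literal character rather than an alternation.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// TD1 line 1 pattern (hypothetical constant name).
const char* const kTd1Line1 =
    "([ACI][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{9})([0-9]{1})([A-Z0-9<]{15})";

// Hypothetical helper: returns capture groups 1..N when the WHOLE line matches
// the pattern, or an empty vector when it does not.
std::vector<std::string> matchGroups(const std::string& line, const std::string& pattern) {
    const std::regex re(pattern); // ECMAScript grammar by default
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(line, m, re)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Calling matchGroups(lines[0], kTd1Line1) on a TD1 document returns five strings in the group order listed above. std::regex_match is used instead of std::regex_search so that a line with extra leading or trailing characters is rejected.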
Sample result (TD1#1)¶
Data | I<UTOD231458907<<<<<<<<<<<<<<<
Group #1 | I<
Group #2 | UTO
Group #3 | D23145890
Group #4 | 7
Group #5 | <<<<<<<<<<<<<<<
Line 2 (TD1#2)¶
Regular expression (TD1#2)¶
Regular expression | ([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z]{3})([A-Z0-9<]{11})([0-9]{1})
Group #1 | Holder’s date of birth in format YYMMDD.
Group #2 | Check digit on the date of birth.
Group #3 | Sex of holder.
Group #4 | Date of expiry of the document in format YYMMDD.
Group #5 | Check digit on date of expiry.
Group #6 | Nationality of the holder represented by a three-letter code.
Group #7 | Optional data at the discretion of the issuing state.
Group #8 | Overall check digit for upper and middle MRZ lines.
Sample result (TD1#2)¶
Data | 7408122F1204159UTO<<<<<<<<<<<6
Group #1 | 740812
Group #2 | 2
Group #3 | F
Group #4 | 120415
Group #5 | 9
Group #6 | UTO
Group #7 | <<<<<<<<<<<
Group #8 | 6
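The check digits captured by the expressions can be verified once the groups are extracted. The sketch below assumes the ICAO 9303 scheme (repeating weights 7, 3, 1; digits keep their value, A to Z count as 10 to 35, the filler < counts as 0; the result is the weighted sum modulo 10). The function names are hypothetical and are not part of the SDK.

```cpp
#include <cstddef>
#include <string>

// Numeric value of one MRZ character, or -1 if the character is not allowed.
int mrzCharValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    if (c == '<') return 0;
    return -1;
}

// Check digit (0..9) of a field, or -1 if the field contains an invalid character.
int computeCheckDigit(const std::string& field) {
    static const int weights[3] = {7, 3, 1};
    int sum = 0;
    for (std::size_t i = 0; i < field.size(); ++i) {
        const int value = mrzCharValue(field[i]);
        if (value < 0) return -1;
        sum += value * weights[i % 3];
    }
    return sum % 10;
}
```

With the sample above, computeCheckDigit("740812") gives 2 and computeCheckDigit("120415") gives 9, which agree with groups #2 and #5. The document number D23145890 from line 1 gives 7.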
Parsing TD2 format¶
The TD2 format has 2 lines of 36 characters each.
Line 1 (TD2#1)¶
Regular expression (TD2#1)¶
Regular expression | ([ACI][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})
Group #1 | Document type. A, C or I as the first character.
Group #2 | Three-letter country code.
Group #3 | Name field: primary identifier, then <<, then secondary identifier, padded with < fillers.
Sample result (TD2#1)¶
Data | I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Group #1 | I<
Group #2 | UTO
Group #3 | ERIKSSON, ANNA MARIA (after decoding the < fillers)
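The regular expression captures the name field in its raw form, with < fillers. Turning it into a readable value like the one shown above can be sketched as follows, assuming the usual MRZ convention: << separates the primary identifier from the secondary one, and a single < separates name components. decodeNameField and fillersToSpaces are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Drops trailing '<' fillers and turns the remaining ones into spaces.
std::string fillersToSpaces(std::string s) {
    const std::size_t last = s.find_last_not_of('<');
    s.erase(last == std::string::npos ? 0 : last + 1);
    std::replace(s.begin(), s.end(), '<', ' ');
    return s;
}

// Converts a raw name field such as "ERIKSSON<<ANNA<MARIA<<<" into
// "PRIMARY, SECONDARY". If there is no secondary identifier, only the
// primary identifier is returned.
std::string decodeNameField(const std::string& field) {
    const std::size_t sep = field.find("<<");
    if (sep == std::string::npos) {
        return fillersToSpaces(field);
    }
    const std::string primary = fillersToSpaces(field.substr(0, sep));
    const std::string secondary = fillersToSpaces(field.substr(sep + 2));
    return secondary.empty() ? primary : primary + ", " + secondary;
}
```

The same helper applies to group #3 of the first line of the TD3, MRVA and MRVB formats, which only differ in the length of the field.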
Line 2 (TD2#2)¶
Regular expression (TD2#2)¶
Regular expression | ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{7})([0-9]{1})
Group #1 | Document number, up to 9 alphanumeric characters.
Group #2 | Check digit on document number.
Group #3 | Nationality. Three-letter country code.
Group #4 | Holder’s date of birth in format YYMMDD.
Group #5 | Check digit on the date of birth.
Group #6 | Sex of holder.
Group #7 | Date of expiry of the document in format YYMMDD.
Group #8 | Check digit on the date of expiry.
Group #9 | Optional data at the discretion of the issuing state.
Group #10 | Overall check digit.
Sample result (TD2#2)¶
Data | D231458907UTO7408122F1204159<<<<<<<6
Group #1 | D23145890
Group #2 | 7
Group #3 | UTO
Group #4 | 740812
Group #5 | 2
Group #6 | F
Group #7 | 120415
Group #8 | 9
Group #9 | <<<<<<<
Group #10 | 6
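The date groups are plain six-digit strings, so they usually need a second step before they can be used. Here is a small sketch that splits a YYMMDD value and range-checks it. MrzDate and parseMrzDate are hypothetical names, and the year is kept as two digits because the MRZ does not encode the century.

```cpp
#include <string>

// Hypothetical result type: two-digit year, month, day and a validity flag.
struct MrzDate {
    int yy;
    int mm;
    int dd;
    bool valid;
};

// Splits a YYMMDD field. The check is deliberately coarse: it validates the
// month and day ranges but not the number of days in a given month.
MrzDate parseMrzDate(const std::string& field) {
    MrzDate date = {0, 0, 0, false};
    if (field.size() != 6) return date;
    for (char c : field) {
        if (c < '0' || c > '9') return date;
    }
    date.yy = (field[0] - '0') * 10 + (field[1] - '0');
    date.mm = (field[2] - '0') * 10 + (field[3] - '0');
    date.dd = (field[4] - '0') * 10 + (field[5] - '0');
    date.valid = date.mm >= 1 && date.mm <= 12 && date.dd >= 1 && date.dd <= 31;
    return date;
}
```

For the sample above, group #4 (740812) splits into year 74, month 8, day 12. Choosing between 19xx and 20xx is left to the application, for example by comparing with the current year.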
Parsing TD3 format¶
The TD3 format has 2 lines of 44 characters each.
Line 1 (TD3#1)¶
Regular expression (TD3#1)¶
Regular expression | (P[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})
Group #1 | Document type. P as the first character.
Group #2 | Three-letter country code.
Group #3 | Name field: primary identifier, then <<, then secondary identifier, padded with < fillers.
Sample result (TD3#1)¶
Data | P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
Group #1 | P<
Group #2 | UTO
Group #3 | ERIKSSON, ANNA MARIA (after decoding the < fillers)
Line 2 (TD3#2)¶
Regular expression (TD3#2)¶
Regular expression | ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})
Group #1 | Document number, up to 9 alphanumeric characters.
Group #2 | Check digit on document number.
Group #3 | Nationality. Three-letter country code.
Group #4 | Holder’s date of birth in format YYMMDD.
Group #5 | Check digit on the date of birth.
Group #6 | Sex of holder.
Group #7 | Date of expiry of the document in format YYMMDD.
Group #8 | Check digit on the date of expiry.
Group #9 | Optional data at the discretion of the issuing state.
Group #10 | Check digit on the optional data.
Group #11 | Overall check digit.
Sample result (TD3#2)¶
Data | L898902C36UTO7408122F1204159ZE184226B<<<<<10
Group #1 | L898902C3
Group #2 | 6
Group #3 | UTO
Group #4 | 740812
Group #5 | 2
Group #6 | F
Group #7 | 120415
Group #8 | 9
Group #9 | ZE184226B<<<<<
Group #10 | 1
Group #11 | 0
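With eleven groups, addressing results by index becomes error-prone, so it can help to copy them into named fields. The following is one possible sketch: the Td3Line2 structure and the parseTd3Line2 function are hypothetical and do not come from the SDK sample. The sex class is written [MFX<] because a | inside square brackets would be matched literally.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the TD3 line 2 groups described above.
struct Td3Line2 {
    std::string documentNumber; // group 1
    std::string nationality;    // group 3
    std::string dateOfBirth;    // group 4, YYMMDD
    std::string sex;            // group 6
    std::string dateOfExpiry;   // group 7, YYMMDD
    std::string optionalData;   // group 9
    std::string checkDigits;    // groups 2, 5, 8, 10 and 11, in that order
};

// Fills `out` and returns true when the whole line matches, false otherwise.
bool parseTd3Line2(const std::string& line, Td3Line2& out) {
    static const std::regex re(
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})"
        "([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})");
    std::smatch m;
    if (!std::regex_match(line, m, re)) {
        return false;
    }
    out.documentNumber = m[1].str();
    out.nationality = m[3].str();
    out.dateOfBirth = m[4].str();
    out.sex = m[6].str();
    out.dateOfExpiry = m[7].str();
    out.optionalData = m[9].str();
    out.checkDigits = m[2].str() + m[5].str() + m[8].str() + m[10].str() + m[11].str();
    return true;
}
```

The regular expression is compiled once (static) because constructing a std::regex is relatively expensive compared with matching a 44-character line.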
Parsing MRVA format¶
The MRVA format has 2 lines of 44 characters each.
Line 1 (MRVA#1)¶
Regular expression (MRVA#1)¶
Regular expression | (V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})
Group #1 | Document type. V as the first character.
Group #2 | Three-letter country code.
Group #3 | Name field: primary identifier, then <<, then secondary identifier, padded with < fillers.
Sample result (MRVA#1)¶
Data | V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
Group #1 | V<
Group #2 | UTO
Group #3 | ERIKSSON, ANNA MARIA (after decoding the < fillers)
Line 2 (MRVA#2)¶
Regular expression (MRVA#2)¶
Regular expression | ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{16})
Group #1 | Document number, up to 9 alphanumeric characters.
Group #2 | Check digit on document number.
Group #3 | Nationality. Three-letter country code.
Group #4 | Holder’s date of birth in format YYMMDD.
Group #5 | Check digit on the date of birth.
Group #6 | Sex of holder.
Group #7 | Date of expiry of the document in format YYMMDD.
Group #8 | Check digit on the date of expiry.
Group #9 | Optional data at the discretion of the issuing state.
Sample result (MRVA#2)¶
Data | L8988901C4XXX4009078F96121096ZE184226B<<<<<<
Group #1 | L8988901C
Group #2 | 4
Group #3 | XXX
Group #4 | 400907
Group #5 | 8
Group #6 | F
Group #7 | 961210
Group #8 | 9
Group #9 | 6ZE184226B<<<<<<
Parsing MRVB format¶
The MRVB format has 2 lines of 36 characters each.
Line 1 (MRVB#1)¶
Regular expression (MRVB#1)¶
Regular expression | (V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})
Group #1 | Document type. V as the first character.
Group #2 | Three-letter country code.
Group #3 | Name field: primary identifier, then <<, then secondary identifier, padded with < fillers.
Sample result (MRVB#1)¶
Data | V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Group #1 | V<
Group #2 | UTO
Group #3 | ERIKSSON, ANNA MARIA (after decoding the < fillers)
Line 2 (MRVB#2)¶
Regular expression (MRVB#2)¶
Regular expression | ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{8})
Group #1 | Document number, up to 9 alphanumeric characters.
Group #2 | Check digit on document number.
Group #3 | Nationality. Three-letter country code.
Group #4 | Holder’s date of birth in format YYMMDD.
Group #5 | Check digit on the date of birth.
Group #6 | Sex of holder.
Group #7 | Date of expiry of the document in format YYMMDD.
Group #8 | Check digit on the date of expiry.
Group #9 | Optional data at the discretion of the issuing state.
Sample result (MRVB#2)¶
Data | L8988901C4XXX4009078F9612109<<<<<<<<
Group #1 | L8988901C
Group #2 | 4
Group #3 | XXX
Group #4 | 400907
Group #5 | 8
Group #6 | F
Group #7 | 961210
Group #8 | 9
Group #9 | <<<<<<<<
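The second lines of MRVA and MRVB share their first eight groups (28 characters) and differ only in the length of the optional data: 16 characters for MRVA and 8 for MRVB. One way to exploit this, sketched below with a hypothetical parseVisaLine2 helper that is not part of the SDK, is to build the last group from the line length instead of keeping two separate expressions.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: parses line 2 of an MRVA (44 characters) or MRVB
// (36 characters) visa. On success `groups` holds groups 1..9 in order.
bool parseVisaLine2(const std::string& line, std::vector<std::string>& groups) {
    const std::size_t fixedWidth = 28; // 9 + 1 + 3 + 6 + 1 + 1 + 6 + 1
    if (line.size() != 44 && line.size() != 36) {
        return false;
    }
    // Groups 1 to 8 are the same for both formats; group 9 takes what is left.
    const std::string pattern =
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})"
        "([0-9]{6})([0-9]{1})([A-Z0-9<]{" +
        std::to_string(line.size() - fixedWidth) + "})";
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(line, m, re)) {
        return false;
    }
    groups.clear();
    for (std::size_t i = 1; i < m.size(); ++i) {
        groups.push_back(m[i].str());
    }
    return true;
}
```

Keeping two explicit expressions, as in the tables above, is just as valid. The combined form only avoids duplicating the shared part.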