MRZ parser¶

The SDK contains C++ code to parse the MRZ lines returned by the API. The sample application is at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/tree/master/samples/c++/parser and relies on regular expressions, so you can port the code to C#, Python or Java without changing the patterns.

If you’re dealing with non-standard formats and struggling to write the right regular expressions, don’t hesitate to contact us via our dev-group and we’ll help you.

Here are some samples we’ll use in the next sections:

TD1	I<UTOD231458907<<<<<<<<<<<<<<< 7408122F1204159UTO<<<<<<<<<<<6 ERIKSSON<<ANNA<MARIA<<<<<<<<<<
TD2	I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<< D231458907UTO7408122F1204159<<<<<<<6
TD3	P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<< L898902C36UTO7408122F1204159ZE184226B<<<<<10
MRVA	V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<< L8988901C4XXX4009078F96121096ZE184226B<<<<<<
MRVB	V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<< L8988901C4XXX4009078F9612109<<<<<<<<

Determine the document type¶

The MRZ lines (array of strings) returned by the SDK are sorted from top to bottom.

The first operation is to determine the document type from the number of lines, the length of the first line and its first character. This takes a single line of code:

const MRZ_DOCUMENT_TYPE type = (lines.size() == 3 && lines.front().size() == 30) ?  MRZ_DOCUMENT_TYPE_TD1 : ((lines.front().size() == 44 && lines.size() == 2) ? (lines.front()[0] == 'P' ? MRZ_DOCUMENT_TYPE_TD3 : MRZ_DOCUMENT_TYPE_MRVA) : ((lines.front().size() == 36 && lines.size() == 2) ?  (lines.front()[0] == 'V' ? MRZ_DOCUMENT_TYPE_MRVB : MRZ_DOCUMENT_TYPE_TD2) : MRZ_DOCUMENT_TYPE_UNKNOWN));
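The same decision can be spelled out as a small function, which is easier to read and to port. This is only a sketch: the enum declaration and the `detectDocumentType` name are assumptions made for illustration (the SDK sample ships its own `MRZ_DOCUMENT_TYPE`), and the empty-input guard is an addition that the one-liner does not have.

```cpp
#include <string>
#include <vector>

// Stand-in for the SDK sample's enum. Only the enumerator names come from
// the one-liner above; the declaration itself is an assumption.
enum MRZ_DOCUMENT_TYPE {
    MRZ_DOCUMENT_TYPE_UNKNOWN,
    MRZ_DOCUMENT_TYPE_TD1,
    MRZ_DOCUMENT_TYPE_TD2,
    MRZ_DOCUMENT_TYPE_TD3,
    MRZ_DOCUMENT_TYPE_MRVA,
    MRZ_DOCUMENT_TYPE_MRVB
};

// Hypothetical helper: same branching as the one-liner, one test per format.
MRZ_DOCUMENT_TYPE detectDocumentType(const std::vector<std::string>& lines) {
    if (lines.empty()) {
        // Calling front() on an empty vector is undefined behaviour.
        return MRZ_DOCUMENT_TYPE_UNKNOWN;
    }
    const std::string& first = lines.front();
    if (lines.size() == 3 && first.size() == 30) {
        return MRZ_DOCUMENT_TYPE_TD1;
    }
    if (lines.size() == 2 && first.size() == 44) {
        // 44-character lines: passport (TD3) if it starts with 'P', else visa type A.
        return first[0] == 'P' ? MRZ_DOCUMENT_TYPE_TD3 : MRZ_DOCUMENT_TYPE_MRVA;
    }
    if (lines.size() == 2 && first.size() == 36) {
        // 36-character lines: visa type B if it starts with 'V', else TD2.
        return first[0] == 'V' ? MRZ_DOCUMENT_TYPE_MRVB : MRZ_DOCUMENT_TYPE_TD2;
    }
    return MRZ_DOCUMENT_TYPE_UNKNOWN;
}
```

Note that, as in the one-liner, any two-line 44-character document that does not start with `P` is classified as MRVA, and any two-line 36-character document that does not start with `V` is classified as TD2.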

Parsing TD1 format¶

The TD1 format has 3 lines of 30 characters each.

Line 1 (TD1#1)¶

Regular expression (TD1#1)¶

Regular expression
([ACI][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{9})([0-9]{1})([A-Z0-9<]{15})
Group #1	Document type. A, C or I as the first character.
Group #2	Three-letter country code.
Group #3	Document number, up to 9 alphanumeric characters.
Group #4	Check digit on the document number.
Group #5	Optional data at the discretion of the issuing state.

Sample result (TD1#1)¶

Data
I<UTOD231458907<<<<<<<<<<<<<<<
Group #1	I<
Group #2	UTO
Group #3	D23145890
Group #4	7
Group #5	<<<<<<<<<<<<<<<
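As an illustration, the pattern can be applied with `std::regex` as sketched below. The `parseTd1Line1` name is hypothetical and this is not necessarily how the SDK sample is organised. The first character class is written as `[ACI]` so that it accepts exactly A, C or I, and `std::regex_match` is used so that the whole 30-character line has to match.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns groups #1..#5 of a TD1 first line,
// or an empty vector when the line does not match.
std::vector<std::string> parseTd1Line1(const std::string& line) {
    static const std::regex re(
        "([ACI][A-Z0-9<]{1})"   // #1 document type
        "([A-Z]{3})"            // #2 country code
        "([A-Z0-9<]{9})"        // #3 document number
        "([0-9]{1})"            // #4 check digit on the document number
        "([A-Z0-9<]{15})");     // #5 optional data
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_match(line, m, re)) {
        // m[0] is the whole match; the capture groups start at index 1.
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Calling `parseTd1Line1` on the sample line above yields the five values listed in the table, with group #1 at index 0 of the returned vector.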

Line 2 (TD1#2)¶

Regular expression (TD1#2)¶

Regular expression
([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z]{3})([A-Z0-9<]{11})([0-9]{1})
Group #1	Holder’s date of birth in format YYMMDD.
Group #2	Check digit on the date of birth.
Group #3	Sex of holder.
Group #4	Date of expiry of the document in format YYMMDD.
Group #5	Check digit on date of expiry.
Group #6	Nationality of the holder represented by a three-letter code.
Group #7	Optional data at the discretion of the issuing state.
Group #8	Overall check digit for upper and middle MRZ lines.

Sample result (TD1#2)¶

Data
7408122F1204159UTO<<<<<<<<<<<6
Group #1	740812
Group #2	2
Group #3	F
Group #4	120415
Group #5	9
Group #6	UTO
Group #7	<<<<<<<<<<<
Group #8	6
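The regular expressions only capture the check digits; they do not verify them. A minimal sketch of the verification is shown below, assuming the standard ICAO Doc 9303 scheme: digits keep their value, letters A to Z map to 10 to 35, the filler `<` counts as 0, each value is multiplied by the repeating weights 7, 3, 1, and the check digit is the sum modulo 10. The `mrzCheckDigit` name is hypothetical.

```cpp
#include <string>

// Hypothetical helper: computes the check digit of an MRZ field using the
// 7-3-1 weighting. Returns -1 if the field contains a character that is
// not allowed in an MRZ.
int mrzCheckDigit(const std::string& field) {
    static const int weights[3] = {7, 3, 1};
    int sum = 0;
    for (std::size_t i = 0; i < field.size(); ++i) {
        const char c = field[i];
        int value = 0;
        if (c >= '0' && c <= '9') {
            value = c - '0';
        } else if (c >= 'A' && c <= 'Z') {
            value = c - 'A' + 10;
        } else if (c == '<') {
            value = 0;
        } else {
            return -1;
        }
        sum += value * weights[i % 3];
    }
    return sum % 10;
}
```

For the sample above, `mrzCheckDigit("740812")` gives 2 and `mrzCheckDigit("120415")` gives 9, which agree with groups #2 and #5.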

Line 3 (TD1#3)¶

Regular expression (TD1#3)¶

Regular expression
([A-Z0-9<]{30})
Group #1	Names

Sample result (TD1#3)¶

Data
ERIKSSON<<ANNA<MARIA<<<<<<<<<<
Group #1	ERIKSSON, ANNA MARIA
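The regular expression captures the raw 30 characters, so turning them into the readable form shown above needs a small post-processing step. The sketch below is one way to do it and is not taken from the SDK sample: it drops the trailing fillers, splits the primary and secondary identifiers at the first `<<`, and replaces the remaining single `<` with spaces. The `formatMrzNames` name is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: "ERIKSSON<<ANNA<MARIA<<<<" -> "ERIKSSON, ANNA MARIA".
std::string formatMrzNames(const std::string& field) {
    // Drop the trailing filler characters.
    const std::size_t last = field.find_last_not_of('<');
    if (last == std::string::npos) {
        return std::string();  // the field holds fillers only (or is empty)
    }
    const std::string trimmed = field.substr(0, last + 1);

    // The first "<<" separates the primary from the secondary identifier.
    const std::size_t sep = trimmed.find("<<");
    std::string primary = trimmed.substr(0, sep);
    std::string secondary =
        (sep == std::string::npos) ? std::string() : trimmed.substr(sep + 2);

    // A single '<' stands for a space inside an identifier.
    std::replace(primary.begin(), primary.end(), '<', ' ');
    std::replace(secondary.begin(), secondary.end(), '<', ' ');

    return secondary.empty() ? primary : primary + ", " + secondary;
}
```

The same helper applies to the name group of the first line in the two-line formats, since that group uses the same layout.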

Parsing TD2 format¶

The TD2 format has 2 lines of 36 characters each.

Line 1 (TD2#1)¶

Regular expression (TD2#1)¶

Regular expression
([ACI][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})
Group #1	Document type. A, C or I as the first character.
Group #2	Three-letter country code.
Group #3	Names (primary and secondary identifiers, separated by <<).

Sample result (TD2#1)¶

Data
I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Group #1	I<
Group #2	UTO
Group #3	ERIKSSON, ANNA MARIA

Line 2 (TD2#2)¶

Regular expression (TD2#2)¶

Regular expression
([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{7})([0-9]{1})
Group #1	Document number, up to 9 alphanumeric characters.
Group #2	Check digit on document number.
Group #3	Nationality. Three-letter country code.
Group #4	Holder’s date of birth.
Group #5	Check digit on the date of birth.
Group #6	Sex of holder.
Group #7	Date of expiry of the document.
Group #8	Check digit on the date of expiry.
Group #9	Optional data at the discretion of the issuing state.
Group #10	Overall check digit.

Sample result (TD2#2)¶

Data
D231458907UTO7408122F1204159<<<<<<<6
Group #1	D23145890
Group #2	7
Group #3	UTO
Group #4	740812
Group #5	2
Group #6	F
Group #7	120415
Group #8	9
Group #9	<<<<<<<
Group #10	6

Parsing TD3 format¶

The TD3 format has 2 lines of 44 characters each.

Line 1 (TD3#1)¶

Regular expression (TD3#1)¶

Regular expression
(P[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})
Group #1	Document type. P as the first character.
Group #2	Three-letter country code.
Group #3	Names (primary and secondary identifiers, separated by <<).

Sample result (TD3#1)¶

Data
P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
Group #1	P<
Group #2	UTO
Group #3	ERIKSSON, ANNA MARIA

Line 2 (TD3#2)¶

Regular expression (TD3#2)¶

Regular expression
([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})
Group #1	Document number, up to 9 alphanumeric characters.
Group #2	Check digit on document number.
Group #3	Nationality. Three-letter country code.
Group #4	Holder’s date of birth.
Group #5	Check digit on the date of birth.
Group #6	Sex of holder.
Group #7	Date of expiry of the document.
Group #8	Check digit on the date of expiry.
Group #9	Optional data at the discretion of the issuing state.
Group #10	Check digit on the optional data.
Group #11	Overall check digit.

Sample result (TD3#2)¶

Data
L898902C36UTO7408122F1204159ZE184226B<<<<<10
Group #1	L898902C3
Group #2	6
Group #3	UTO
Group #4	740812
Group #5	2
Group #6	F
Group #7	120415
Group #8	9
Group #9	ZE184226B<<<<<
Group #10	1
Group #11	0
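When the groups are needed as named fields rather than by index, the match can be copied into a small structure. The following is a sketch under stated assumptions: the `Td3Line2` type, its field names and `parseTd3Line2` are hypothetical, the sex class is written as `[MFX<]`, and the check digits (groups #2, #5, #8, #10 and #11) are matched but not stored.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for the second line of a TD3 document.
struct Td3Line2 {
    bool matched = false;
    std::string documentNumber;  // group #1
    std::string nationality;     // group #3
    std::string dateOfBirth;     // group #4, YYMMDD
    std::string sex;             // group #6
    std::string dateOfExpiry;    // group #7, YYMMDD
    std::string optionalData;    // group #9
};

Td3Line2 parseTd3Line2(const std::string& line) {
    static const std::regex re(
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})"
        "([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})");
    Td3Line2 out;
    std::smatch m;
    if (!std::regex_match(line, m, re)) {
        return out;  // matched stays false
    }
    out.matched = true;
    out.documentNumber = m[1].str();
    out.nationality = m[3].str();
    out.dateOfBirth = m[4].str();
    out.sex = m[6].str();
    out.dateOfExpiry = m[7].str();
    out.optionalData = m[9].str();
    return out;
}
```

For the sample line above, `documentNumber` is `L898902C3`, `nationality` is `UTO` and `optionalData` is `ZE184226B<<<<<`.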

Parsing MRVA format¶

The MRVA format has 2 lines of 44 characters each.

Line 1 (MRVA#1)¶

Regular expression (MRVA#1)¶

Regular expression
(V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})
Group #1	Document type. V as the first character.
Group #2	Three-letter country code.
Group #3	Names (primary and secondary identifiers, separated by <<).

Sample result (MRVA#1)¶

Data
V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
Group #1	V<
Group #2	UTO
Group #3	ERIKSSON, ANNA MARIA

Line 2 (MRVA#2)¶

Regular expression (MRVA#2)¶

Regular expression
([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{16})
Group #1	Document number, up to 9 alphanumeric characters.
Group #2	Check digit on document number.
Group #3	Nationality. Three-letter country code.
Group #4	Holder’s date of birth.
Group #5	Check digit on the date of birth.
Group #6	Sex of holder.
Group #7	Date of expiry of the document.
Group #8	Check digit on the date of expiry.
Group #9	Optional data at the discretion of the issuing state.

Sample result (MRVA#2)¶

Data
L8988901C4XXX4009078F96121096ZE184226B<<<<<<
Group #1	L8988901C
Group #2	4
Group #3	XXX
Group #4	400907
Group #5	8
Group #6	F
Group #7	961210
Group #8	9
Group #9	6ZE184226B<<<<<<

Parsing MRVB format¶

The MRVB format has 2 lines of 36 characters each.

Line 1 (MRVB#1)¶

Regular expression (MRVB#1)¶

Regular expression
(V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})
Group #1	Document type. V as the first character.
Group #2	Three-letter country code.
Group #3	Names (primary and secondary identifiers, separated by <<).

Sample result (MRVB#1)¶

Data
V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Group #1	V<
Group #2	UTO
Group #3	ERIKSSON, ANNA MARIA

Line 2 (MRVB#2)¶

Regular expression (MRVB#2)¶

Regular expression
([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([MFX<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{8})
Group #1	Document number, up to 9 alphanumeric characters.
Group #2	Check digit on document number.
Group #3	Nationality. Three-letter country code.
Group #4	Holder’s date of birth.
Group #5	Check digit on the date of birth.
Group #6	Sex of holder.
Group #7	Date of expiry of the document.
Group #8	Check digit on the date of expiry.
Group #9	Optional data at the discretion of the issuing state.

Sample result (MRVB#2)¶

Data
L8988901C4XXX4009078F9612109<<<<<<<<
Group #1	L8988901C
Group #2	4
Group #3	XXX
Group #4	400907
Group #5	8
Group #6	F
Group #7	961210
Group #8	9
Group #9	<<<<<<<<
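The second-line patterns for TD2, TD3, MRVA and MRVB share their first eight groups and differ only in the tail (optional data and trailing check digits). The sketch below uses that observation to handle the four formats with one function. It is an illustration rather than the SDK's own code: `parseSecondLine` and the string keys are hypothetical, and the sex class is written as `[MFX<]`.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the capture groups of a second MRZ line for
// the given two-line format, or an empty vector on unknown format or mismatch.
std::vector<std::string> parseSecondLine(const std::string& format,
                                         const std::string& line) {
    // Groups #1..#8 are identical in the four two-line formats.
    static const std::string common =
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})"
        "([MFX<]{1})([0-9]{6})([0-9]{1})";
    // Format-specific tail: optional data, then check digits where present.
    static const std::map<std::string, std::string> tails = {
        {"TD2",  "([A-Z0-9<]{7})([0-9]{1})"},
        {"TD3",  "([A-Z0-9<]{14})([0-9]{1})([0-9]{1})"},
        {"MRVA", "([A-Z0-9<]{16})"},
        {"MRVB", "([A-Z0-9<]{8})"},
    };
    std::vector<std::string> groups;
    const auto it = tails.find(format);
    if (it == tails.end()) {
        return groups;
    }
    // Compiled on every call for brevity; cache the regex objects in real code.
    const std::regex re(common + it->second);
    std::smatch m;
    if (std::regex_match(line, m, re)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

With the MRVB sample above, `parseSecondLine("MRVB", line)` returns nine groups, the last one being the eight filler characters of the optional data.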