MRZ parser

The SDK contains C++ code to parse the MRZ lines returned by the API. The sample application (https://github.com/DoubangoTelecom/ultimateMRZ-SDK/tree/master/samples/c++/parser) uses regular expressions, which means you can easily port the code to C#, Python or Java without changing the regular expressions.

If you’re dealing with non-standard formats and struggling to write the right regular expressions, don’t hesitate to contact us via our dev-group and we’ll help you.

Here are some samples we’ll use in the next sections:

TD1

I<UTOD231458907<<<<<<<<<<<<<<<
7408122F1204159UTO<<<<<<<<<<<6
ERIKSSON<<ANNA<MARIA<<<<<<<<<<

TD2

I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
D231458907UTO7408122F1204159<<<<<<<6

TD3

P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
L898902C36UTO7408122F1204159ZE184226B<<<<<10

MRVA

V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
L8988901C4XXX4009078F96121096ZE184226B<<<<<<

MRVB

V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
L8988901C4XXX4009078F9612109<<<<<<<<

Determine the document type

The MRZ lines (array of strings) returned by the SDK are sorted from top to bottom.

The first step is to determine the document type. You can do this with a single expression based on the number of lines, the line length and the first character:

// lines.size() is tested before lines.front() so that front() is never called on an empty array.
const MRZ_DOCUMENT_TYPE type =
    (lines.size() == 3 && lines.front().size() == 30) ? MRZ_DOCUMENT_TYPE_TD1 :
    (lines.size() == 2 && lines.front().size() == 44) ? (lines.front()[0] == 'P' ? MRZ_DOCUMENT_TYPE_TD3 : MRZ_DOCUMENT_TYPE_MRVA) :
    (lines.size() == 2 && lines.front().size() == 36) ? (lines.front()[0] == 'V' ? MRZ_DOCUMENT_TYPE_MRVB : MRZ_DOCUMENT_TYPE_TD2) :
    MRZ_DOCUMENT_TYPE_UNKNOWN;
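The same decision can also be sketched as a small function, which is easier to step through in a debugger. The DocType enum and the detectDocType name are hypothetical stand-ins for the SDK's MRZ_DOCUMENT_TYPE values; this is an illustration of the rule above, not the SDK's own code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical enum mirroring the MRZ_DOCUMENT_TYPE_* values used in the text.
enum class DocType { Unknown, TD1, TD2, TD3, MRVA, MRVB };

// Classifies MRZ lines (sorted top to bottom) by line count, line length
// and, for the two-line formats, the first character of the first line.
DocType detectDocType(const std::vector<std::string>& lines) {
    if (lines.empty() || lines.front().empty()) {
        return DocType::Unknown;  // nothing to inspect
    }
    const std::size_t count = lines.size();
    const std::size_t width = lines.front().size();
    const char first = lines.front()[0];

    if (count == 3 && width == 30) return DocType::TD1;
    if (count == 2 && width == 44) return first == 'P' ? DocType::TD3 : DocType::MRVA;
    if (count == 2 && width == 36) return first == 'V' ? DocType::MRVB : DocType::TD2;
    return DocType::Unknown;
}
```

Passing the three TD1 sample lines returns DocType::TD1; an empty array returns DocType::Unknown.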

Parsing TD1 format

The TD1 format has 3 lines of 30 characters each.

Line 1 (TD1#1)

Regular expression (TD1#1)

Regular expression: ([A|C|I][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{9})([0-9]{1})([A-Z0-9<]{15})
Group #1: Document type. A, C or I as the first character.
Group #2: 3-letter country code of the issuing state.
Group #3: Document number, up to 9 alphanumeric characters.
Group #4: Check digit on the document number.
Group #5: Optional data at the discretion of the issuing state.

Sample result (TD1#1)

Data: I<UTOD231458907<<<<<<<<<<<<<<<
Group #1: I<
Group #2: UTO
Group #3: D23145890
Group #4: 7
Group #5: <<<<<<<<<<<<<<<
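A minimal sketch of how the TD1#1 expression can be applied with the standard C++ regex library. The matchGroups helper and the kTd1Line1 constant are names chosen for this illustration, not part of the SDK; the pattern itself is copied from the table above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// TD1#1 pattern, copied verbatim from the table above.
const char* const kTd1Line1 =
    "([A|C|I][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{9})([0-9]{1})([A-Z0-9<]{15})";

// Returns capture groups #1..#N when `pattern` matches the whole line,
// or an empty vector when it does not match.
std::vector<std::string> matchGroups(const std::string& line, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(line, m, re)) {
        for (std::size_t i = 1; i < m.size(); ++i) {  // m[0] is the full match
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Calling matchGroups with the TD1 sample line and kTd1Line1 yields the five values listed in the sample result. Because regex_match requires the whole line to match, a line of the wrong length produces an empty vector.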

Line 2 (TD1#2)

Regular expression (TD1#2)

Regular expression: ([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z]{3})([A-Z0-9<]{11})([0-9]{1})
Group #1: Holder’s date of birth in format YYMMDD.
Group #2: Check digit on the date of birth.
Group #3: Sex of holder.
Group #4: Date of expiry of the document in format YYMMDD.
Group #5: Check digit on date of expiry.
Group #6: Nationality of the holder represented by a three-letter code.
Group #7: Optional data at the discretion of the issuing state.
Group #8: Overall check digit for upper and middle MRZ lines.

Sample result (TD1#2)

Data: 7408122F1204159UTO<<<<<<<<<<<6
Group #1: 740812
Group #2: 2
Group #3: F
Group #4: 120415
Group #5: 9
Group #6: UTO
Group #7: <<<<<<<<<<<
Group #8: 6
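The regular expression only extracts the check digits; it does not verify them. The sketch below computes a check digit the way ICAO Doc 9303 describes it: each character is multiplied by a repeating weight of 7, 3, 1 (digits count as 0-9, letters A-Z as 10-35, the filler < as 0) and the sum is taken modulo 10. The function names are hypothetical, and whether the SDK sample performs this validation is not stated here.

```cpp
#include <cstddef>
#include <string>

// Numeric value of one MRZ character, or -1 if the character is not allowed.
int mrzCharValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    if (c == '<') return 0;
    return -1;
}

// Weighted sum (weights 7, 3, 1 repeating) modulo 10.
// Returns -1 when the field contains a character outside [A-Z0-9<].
int mrzCheckDigit(const std::string& field) {
    static const int kWeights[3] = {7, 3, 1};
    int sum = 0;
    for (std::size_t i = 0; i < field.size(); ++i) {
        const int value = mrzCharValue(field[i]);
        if (value < 0) return -1;
        sum += value * kWeights[i % 3];
    }
    return sum % 10;
}

// Compares a field against the one-character check digit captured next to it.
bool checkDigitMatches(const std::string& field, const std::string& digit) {
    return digit.size() == 1 && digit[0] >= '0' && digit[0] <= '9' &&
           mrzCheckDigit(field) == digit[0] - '0';
}
```

With the sample above, mrzCheckDigit("740812") returns 2 and mrzCheckDigit("120415") returns 9, matching groups #2 and #5. For the overall digit (group #8), Doc 9303 specifies positions 6-30 of the upper line followed by positions 1-7, 9-15 and 19-29 of the middle line; concatenating those from the sample gives 6.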

Line 3 (TD1#3)

Regular expression (TD1#3)

Regular expression: ([A-Z0-9<]{30})
Group #1: Names (primary and secondary identifiers, separated by << and padded with <).

Sample result (TD1#3)

Data: ERIKSSON<<ANNA<MARIA<<<<<<<<<<
Group #1: ERIKSSON, ANNA MARIA
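The sample result shows the names in a readable form rather than the raw captured text. One way to get from one to the other is sketched below: drop the trailing fillers, split at the first <<, and turn the remaining single fillers into spaces. The formatMrzName name and the "PRIMARY, SECONDARY" output layout are assumptions made to reproduce the sample result.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Converts a raw MRZ name field such as "ERIKSSON<<ANNA<MARIA<<<<"
// into "ERIKSSON, ANNA MARIA".
std::string formatMrzName(const std::string& field) {
    // Drop the trailing filler characters.
    const std::size_t last = field.find_last_not_of('<');
    if (last == std::string::npos) return std::string();  // only fillers
    const std::string trimmed = field.substr(0, last + 1);

    // The first "<<" separates the primary from the secondary identifier.
    const std::size_t sep = trimmed.find("<<");
    std::string primary = trimmed.substr(0, sep);
    std::string secondary =
        (sep == std::string::npos) ? std::string() : trimmed.substr(sep + 2);

    // A single '<' inside an identifier stands for a space.
    std::replace(primary.begin(), primary.end(), '<', ' ');
    std::replace(secondary.begin(), secondary.end(), '<', ' ');

    return secondary.empty() ? primary : primary + ", " + secondary;
}
```

The same helper applies to the name group of the other formats, since they use the same separators.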

Parsing TD2 format

The TD2 format has 2 lines of 36 characters each.

Line 1 (TD2#1)

Regular expression (TD2#1)

Regular expression: ([A|C|I][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})
Group #1: Document type. A, C or I as the first character.
Group #2: 3-letter country code of the issuing state.
Group #3: Names (primary and secondary identifiers, separated by << and padded with <).

Sample result (TD2#1)

Data: I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Group #1: I<
Group #2: UTO
Group #3: ERIKSSON, ANNA MARIA

Line 2 (TD2#2)

Regular expression (TD2#2)

Regular expression: ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{7})([0-9]{1})
Group #1: Document number, up to 9 alphanumeric characters.
Group #2: Check digit on document number.
Group #3: Nationality. 3-letter country code.
Group #4: Holder’s date of birth in format YYMMDD.
Group #5: Check digit on the date of birth.
Group #6: Sex of holder.
Group #7: Date of expiry of the document in format YYMMDD.
Group #8: Check digit on the date of expiry.
Group #9: Optional data at the discretion of the issuing state.
Group #10: Overall check digit.

Sample result (TD2#2)

Data: D231458907UTO7408122F1204159<<<<<<<6
Group #1: D23145890
Group #2: 7
Group #3: UTO
Group #4: 740812
Group #5: 2
Group #6: F
Group #7: 120415
Group #8: 9
Group #9: <<<<<<<
Group #10: 6
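The overall check digit (group #10) can be verified directly from the captured groups. Per ICAO Doc 9303, the TD2 composite covers positions 1-10, 14-20 and 22-35 of this line, which corresponds to every group except the nationality (#3), the sex (#6) and the overall digit itself. The sketch below assumes that rule and the 7, 3, 1 weighting; the function names are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Check digit with repeating weights 7, 3, 1. The input is assumed to contain
// only [A-Z0-9<], which the regular expression below guarantees.
static int checkDigit(const std::string& s) {
    static const int kWeights[3] = {7, 3, 1};
    int sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const int value = (c >= '0' && c <= '9') ? c - '0'
                        : (c >= 'A' && c <= 'Z') ? c - 'A' + 10
                        : 0;  // filler '<'
        sum += value * kWeights[i % 3];
    }
    return sum % 10;
}

// Returns true when the line matches TD2#2 and its overall check digit is valid.
bool td2OverallCheckOk(const std::string& line2) {
    static const std::regex re(
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})"
        "([0-9]{6})([0-9]{1})([A-Z0-9<]{7})([0-9]{1})");
    std::smatch m;
    if (!std::regex_match(line2, m, re)) return false;

    // Groups #1 #2 (positions 1-10), #4 #5 (14-20), #7 #8 #9 (22-35).
    const std::string composite = m[1].str() + m[2].str() + m[4].str() +
                                  m[5].str() + m[7].str() + m[8].str() + m[9].str();
    return checkDigit(composite) == m[10].str()[0] - '0';
}
```

For the sample line the composite string yields 6, so the function returns true; changing the last character to any other digit makes it return false.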

Parsing TD3 format

The TD3 format has 2 lines of 44 characters each.

Line 1 (TD3#1)

Regular expression (TD3#1)

Regular expression: (P[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})
Group #1: Document type. P as the first character.
Group #2: 3-letter country code of the issuing state.
Group #3: Names (primary and secondary identifiers, separated by << and padded with <).

Sample result (TD3#1)

Data: P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
Group #1: P<
Group #2: UTO
Group #3: ERIKSSON, ANNA MARIA

Line 2 (TD3#2)

Regular expression (TD3#2)

Regular expression: ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})
Group #1: Document number, up to 9 alphanumeric characters.
Group #2: Check digit on document number.
Group #3: Nationality. 3-letter country code.
Group #4: Holder’s date of birth in format YYMMDD.
Group #5: Check digit on the date of birth.
Group #6: Sex of holder.
Group #7: Date of expiry of the document in format YYMMDD.
Group #8: Check digit on the date of expiry.
Group #9: Optional data at the discretion of the issuing state.
Group #10: Check digit on the optional data.
Group #11: Overall check digit.

Sample result (TD3#2)

Data: L898902C36UTO7408122F1204159ZE184226B<<<<<10
Group #1: L898902C3
Group #2: 6
Group #3: UTO
Group #4: 740812
Group #5: 2
Group #6: F
Group #7: 120415
Group #8: 9
Group #9: ZE184226B<<<<<
Group #10: 1
Group #11: 0
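TD3 carries five check digits on this line, so it is a convenient place to show them being verified together. The sketch below assumes the ICAO Doc 9303 rules: weights 7, 3, 1, and an overall digit computed over positions 1-10, 14-20 and 22-43 (groups #1, #2, #4, #5 and #7 to #10). The Td3Line2Checks struct and verifyTd3Line2 function are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Check digit with repeating weights 7, 3, 1 over characters in [A-Z0-9<].
static int checkDigit(const std::string& s) {
    static const int kWeights[3] = {7, 3, 1};
    int sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const int value = (c >= '0' && c <= '9') ? c - '0'
                        : (c >= 'A' && c <= 'Z') ? c - 'A' + 10
                        : 0;  // filler '<'
        sum += value * kWeights[i % 3];
    }
    return sum % 10;
}

// One flag per check digit; all false when the line does not match.
struct Td3Line2Checks {
    bool matched = false;
    bool documentNumber = false;
    bool dateOfBirth = false;
    bool dateOfExpiry = false;
    bool optionalData = false;
    bool overall = false;
};

Td3Line2Checks verifyTd3Line2(const std::string& line2) {
    static const std::regex re(
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})"
        "([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})");
    Td3Line2Checks result;
    std::smatch m;
    if (!std::regex_match(line2, m, re)) return result;
    result.matched = true;

    // Converts a single-digit capture group to its integer value.
    auto digit = [&m](std::size_t group) { return m[group].str()[0] - '0'; };

    result.documentNumber = checkDigit(m[1].str()) == digit(2);
    result.dateOfBirth = checkDigit(m[4].str()) == digit(5);
    result.dateOfExpiry = checkDigit(m[7].str()) == digit(8);
    result.optionalData = checkDigit(m[9].str()) == digit(10);

    const std::string composite = m[1].str() + m[2].str() + m[4].str() + m[5].str() +
                                  m[7].str() + m[8].str() + m[9].str() + m[10].str();
    result.overall = checkDigit(composite) == digit(11);
    return result;
}
```

All five flags are true for the sample line. Altering a single digit of the date of birth clears both dateOfBirth and overall while leaving documentNumber set, which helps to localize a misread character.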

Parsing MRVA format

The MRVA format has 2 lines of 44 characters each.

Line 1 (MRVA#1)

Regular expression (MRVA#1)

Regular expression: (V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})
Group #1: Document type. V as the first character.
Group #2: 3-letter country code of the issuing state.
Group #3: Names (primary and secondary identifiers, separated by << and padded with <).

Sample result (MRVA#1)

Data: V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
Group #1: V<
Group #2: UTO
Group #3: ERIKSSON, ANNA MARIA

Line 2 (MRVA#2)

Regular expression (MRVA#2)

Regular expression: ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{16})
Group #1: Document number, up to 9 alphanumeric characters.
Group #2: Check digit on document number.
Group #3: Nationality. 3-letter country code.
Group #4: Holder’s date of birth in format YYMMDD.
Group #5: Check digit on the date of birth.
Group #6: Sex of holder.
Group #7: Date of expiry of the document in format YYMMDD.
Group #8: Check digit on the date of expiry.
Group #9: Optional data at the discretion of the issuing state.

Sample result (MRVA#2)

Data: L8988901C4XXX4009078F96121096ZE184226B<<<<<<
Group #1: L8988901C
Group #2: 4
Group #3: XXX
Group #4: 400907
Group #5: 8
Group #6: F
Group #7: 961210
Group #8: 9
Group #9: 6ZE184226B<<<<<<
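Groups #4 and #7 are six-digit YYMMDD strings, so they still need to be split into numbers before use. The sketch below does that with a coarse range check. Note that the MRZ does not carry the century: the expandYear helper and its pivot parameter are purely an assumption of this example, and an application has to pick its own rule (for instance, different pivots for birth and expiry dates). All names are hypothetical.

```cpp
#include <cctype>
#include <string>

// A decoded YYMMDD date. `valid` is false when the input is not six digits
// or when the month/day fall outside 1-12 / 1-31 (a coarse check only).
struct MrzDate {
    int yy;
    int mm;
    int dd;
    bool valid;
};

MrzDate parseMrzDate(const std::string& yymmdd) {
    MrzDate date = {0, 0, 0, false};
    if (yymmdd.size() != 6) return date;
    for (char c : yymmdd) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return date;
    }
    date.yy = (yymmdd[0] - '0') * 10 + (yymmdd[1] - '0');
    date.mm = (yymmdd[2] - '0') * 10 + (yymmdd[3] - '0');
    date.dd = (yymmdd[4] - '0') * 10 + (yymmdd[5] - '0');
    date.valid = date.mm >= 1 && date.mm <= 12 && date.dd >= 1 && date.dd <= 31;
    return date;
}

// Assumption: two-digit years above `pivot` belong to the 1900s,
// the others to the 2000s. The MRZ itself does not say which.
int expandYear(int yy, int pivot) {
    return yy > pivot ? 1900 + yy : 2000 + yy;
}
```

For the sample, parseMrzDate("400907") gives yy=40, mm=9, dd=7, and parseMrzDate("961210") gives yy=96, mm=12, dd=10.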

Parsing MRVB format

The MRVB format has 2 lines of 36 characters each.

Line 1 (MRVB#1)

Regular expression (MRVB#1)

Regular expression: (V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})
Group #1: Document type. V as the first character.
Group #2: 3-letter country code of the issuing state.
Group #3: Names (primary and secondary identifiers, separated by << and padded with <).

Sample result (MRVB#1)

Data: V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
Group #1: V<
Group #2: UTO
Group #3: ERIKSSON, ANNA MARIA
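The first-line expressions of TD2, TD3, MRVA and MRVB differ only in the leading document letter and in the length of the name field (31 or 39). A single parameterized helper can therefore cover all four, as sketched below. The HeaderLine struct, the parseHeaderLine function and its parameters are hypothetical; the SDK sample may well keep four separate expressions.

```cpp
#include <regex>
#include <string>

// Result of parsing the first line of a two-line MRZ.
struct HeaderLine {
    bool matched = false;
    std::string documentCode;  // group #1, e.g. "P<"
    std::string issuer;        // group #2, e.g. "UTO"
    std::string names;         // group #3, raw name field with fillers
};

// `typeClass` is the regex fragment for the first character
// ("[A|C|I]" for TD2, "P" for TD3, "V" for MRVA and MRVB).
// `nameLength` is 31 for 36-character lines and 39 for 44-character lines.
HeaderLine parseHeaderLine(const std::string& line, const std::string& typeClass,
                           int nameLength) {
    const std::regex re("(" + typeClass + "[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{" +
                        std::to_string(nameLength) + "})");
    HeaderLine result;
    std::smatch m;
    if (std::regex_match(line, m, re)) {
        result.matched = true;
        result.documentCode = m[1].str();
        result.issuer = m[2].str();
        result.names = m[3].str();
    }
    return result;
}
```

For example, parseHeaderLine(line, "V", 31) parses the MRVB sample above, while the same line passed with "P" and 39 does not match.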

Line 2 (MRVB#2)

Regular expression (MRVB#2)

Regular expression: ([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{8})
Group #1: Document number, up to 9 alphanumeric characters.
Group #2: Check digit on document number.
Group #3: Nationality. 3-letter country code.
Group #4: Holder’s date of birth in format YYMMDD.
Group #5: Check digit on the date of birth.
Group #6: Sex of holder.
Group #7: Date of expiry of the document in format YYMMDD.
Group #8: Check digit on the date of expiry.
Group #9: Optional data at the discretion of the issuing state.

Sample result (MRVB#2)

Data: L8988901C4XXX4009078F9612109<<<<<<<<
Group #1: L8988901C
Group #2: 4
Group #3: XXX
Group #4: 400907
Group #5: 8
Group #6: F
Group #7: 961210
Group #8: 9
Group #9: <<<<<<<<
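As a closing illustration, the captured groups can be copied into a named structure so that calling code does not depend on group numbers. The sketch below does this for MRVB#2 and removes trailing fillers from the document number and the optional data. The VisaLine2 struct and both function names are hypothetical and not part of the SDK.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Named view of the MRVB#2 capture groups (check digits omitted).
struct VisaLine2 {
    bool matched = false;
    std::string documentNumber;
    std::string nationality;
    std::string dateOfBirth;   // YYMMDD
    char sex = '<';
    std::string dateOfExpiry;  // YYMMDD
    std::string optionalData;
};

// Removes the trailing '<' padding from a field.
static std::string stripTrailingFillers(const std::string& field) {
    const std::size_t last = field.find_last_not_of('<');
    return last == std::string::npos ? std::string() : field.substr(0, last + 1);
}

VisaLine2 parseMrvbLine2(const std::string& line2) {
    static const std::regex re(
        "([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})"
        "([0-9]{6})([0-9]{1})([A-Z0-9<]{8})");
    VisaLine2 result;
    std::smatch m;
    if (!std::regex_match(line2, m, re)) return result;

    result.matched = true;
    result.documentNumber = stripTrailingFillers(m[1].str());
    result.nationality = m[3].str();
    result.dateOfBirth = m[4].str();
    result.sex = m[6].str()[0];
    result.dateOfExpiry = m[7].str();
    result.optionalData = stripTrailingFillers(m[9].str());
    return result;
}
```

For the sample line, documentNumber is L8988901C, nationality is XXX and optionalData is empty because group #9 contains only fillers.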