MRZ (Machine Readable Zone)

Formats

MRZ / MRTD (Machine Readable Zone / Travel Documents) format is standardized by the ICAO (International Organization for Standardization) in Document 9303. The document is divided in 7 parts:

Fortunately you don’t need to read all these documents to understand the MRZ / MRTD format our use our SDK. We have done the work for you.

There are 3 main formats: TD1, TD2, TD3 and Visas (MRVA and MRVB).

TD1

This format is defined in Machine Readable Travel Documents Part 5: Specifications for TD1 Size Machine Readable Official Travel Documents (MROTDs) document.

Here is a sample image:

TD1 sample

Here is the format:

TD1 format

TD2

This format is defined in Machine Readable Travel Documents Part 6: Specifications for TD2 Size Machine Readable Official Travel Documents (MROTDs) document.

Here is a sample image:

TD2 sample

Here is the format:

TD2 format

TD3

This format is defined in Machine Readable Travel Documents Part 4: Specifications for Machine Readable Passports (MRPs) and other TD3 Size MRTDs document.

Here is a sample image:

TD3 sample

Here is the format:

TD3 format

Visas

This format is defined in Machine Readable Travel Documents Part 7: Machine Readable Visas document and have two subtypes: MRVA and MRVB.

MRVA

Here is a sample image:

MRVA sample

Here is the format:

MRVA format

MRVB

Here is a sample image:

MRVB sample

Here is the format:

MRVB format

Parser

The MRZ parser is closed source and based on an open source project at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/tree/master/samples/c++/parser. This sample application uses regular expressions which means you can easily migrate the code to C#, Python or Java (no change to the regex).

If you’re dealing with non-standard formats and struggling to write the right regular expressions then, don’t hesitate to contact us via our dev-group and we’ll help you and update the SDK to support it.

Here are some samples we’ll use in the next sections:

TD1

I<UTOD231458907<<<<<<<<<<<<<<<
7408122F1204159UTO<<<<<<<<<<<6
ERIKSSON<<ANNA<MARIA<<<<<<<<<<

TD2

I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
D231458907UTO7408122F1204159<<<<<<<6

TD3

P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
L898902C36UTO7408122F1204159ZE184226B<<<<<10

MRVA

V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
L8988901C4XXX4009078F96121096ZE184226B<<<<<<

MRVB

V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<
L8988901C4XXX4009078F9612109<<<<<<<<

Determine the document type

The MRZ lines (array of strings) returned by the SDK are sorted from top to bottom.

The first operation is to deteermine the document type. You can determine the type using a single line of code:

const MRZ_DOCUMENT_TYPE type = (lines.size() == 3 && lines.front().size() == 30) ?  MRZ_DOCUMENT_TYPE_TD1 : ((lines.front().size() == 44 && lines.size() == 2) ? (lines.front()[0] == 'P' ? MRZ_DOCUMENT_TYPE_TD3 : MRZ_DOCUMENT_TYPE_MRVA) : ((lines.front().size() == 36 && lines.size() == 2) ?  (lines.front()[0] == 'V' ? MRZ_DOCUMENT_TYPE_MRVB : MRZ_DOCUMENT_TYPE_TD2) : MRZ_DOCUMENT_TYPE_UNKNOWN));

Parsing TD1 format

TD1 format has 3 lines, each has 30 characters.

Line 1 (TD1#1)

Regular expression (TD1#1)

Regular expression

([A|C|I][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{9})([0-9]{1})([A-Z0-9<]{15})

Group #1

Document type. A, C or I as the first character.

Group #2

3 letters country code.

Group #3

Document number, up to 9 alphanumeric characters.

Group #4

Check digit on the document number.

Group #5

Optional data at the discretion of the issuing state.

Sample result (TD1#1)

Data

I<UTOD231458907<<<<<<<<<<<<<<<

Group #1

I<

Group #2

UTO

Group #3

D23145890

Group #4

7

Group #5

<<<<<<<<<<<<<<<

Line 2 (TD1#2)

Regular expression (TD1#2)

Regular expression

([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z]{3})([A-Z0-9<]{11})([0-9]{1})

Group #1

Holder’s date of birth in format YYMMDD.

Group #2

Check digit on the date of birth.

Group #3

Sex of holder.

Group #4

Date of expiry of the document in format YYMMDD.

Group #5

Check digit on date of expiry.

Group #6

Nationality of the holder represented by a three-letter code.

Group #7

Optional data at the discretion of the issuing state.

Group #8

Overall check digit for upper and middle MRZ lines.

Sample result (TD1#2)

Data

7408122F1204159UTO<<<<<<<<<<<6

Group #1

740812

Group #2

2

Group #3

F

Group #4

120415

Group #5

9

Group #6

UTO

Group #7

<<<<<<<<<<<

Group #8

6

Line 3 (TD1#3)

Regular expression (TD1#3)

Regular expression

([A-Z0-9<]{30})

Group #1

Names

Sample result (TD1#3)

Data

ERIKSSON<<ANNA<MARIA<<<<<<<<<<

Group #1

ERIKSSON, ANNA MARIA

Parsing TD2 format

TD2 format has 2 lines, each has 36 characters.

Line 1 (TD2#1)

Regular expression (TD2#1)

Regular expression

([A|C|I][A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})

Group #1

Document type. A, C or I as the first character.

Group #2

3 letters country code.

Group #3

Primary Identifier.

Sample result (TD2#1)

Data

I<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<

Group #1

I<

Group #2

UTO

Group #3

ERIKSSON, ANNA MARIA

Line 2 (TD2#2)

Regular expression (TD2#2)

Regular expression

([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{7})([0-9]{1})

Group #1

Document number, up to 9 alphanumeric characters.

Group #2

Check digit on document number.

Group #3

Nationality. 3 letters country code.

Group #4

Holder’s date of birth.

Group #5

Check digit on the date of birth.

Group #6

Sex of holder.

Group #7

Date of expiry of the document.

Group #8

Check digit on the date of expiry.

Group #9

Optional data at the discretion of the issuing state.

Group #10

Overall check digit

Sample result (TD2#2)

Data

D231458907UTO7408122F1204159<<<<<<<6

Group #1

D23145890

Group #2

7

Group #3

UTO

Group #4

740812

Group #5

2

Group #6

F

Group #7

120415

Group #8

9

Group #9

<<<<<<<

Group #10

6

Parsing TD3 format

TD3 format has 2 lines, each has 44 characters.

Line 1 (TD3#1)

Regular expression (TD3#1)

Regular expression

(P[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})

Group #1

Document type. P as the first character.

Group #2

3 letters country code.

Group #3

Primary Identifier.

Sample result (TD3#1)

Data

P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<

Group #1

P<

Group #2

UTO

Group #3

ERIKSSON, ANNA MARIA

Line 2 (TD3#2)

Regular expression (TD3#2)

Regular expression

([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{14})([0-9]{1})([0-9]{1})

Group #1

Document number, up to 9 alphanumeric characters.

Group #2

Check digit on document number.

Group #3

Nationality. 3 letters country code.

Group #4

Holder’s date of birth.

Group #5

Check digit on the date of birth.

Group #6

Sex of holder.

Group #7

Date of expiry of the document.

Group #8

Check digit on the date of expiry.

Group #9

Optional data at the discretion of the issuing state.

Group #10

Check digit on the optional data.

Group #11

Overall check digit.

Sample result (TD3#2)

Data

L898902C36UTO7408122F1204159ZE184226B<<<<<10

Group #1

L898902C3

Group #2

6

Group #3

UTO

Group #4

740812

Group #5

2

Group #6

F

Group #7

120415

Group #8

9

Group #9

ZE184226B<<<<<

Group #10

1

Group #11

0

Parsing MRVA format

MRVA format has 2 lines, each has 44 characters.

Line 1 (MRVA#1)

Regular expression (MRVA#1)

Regular expression

(V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{39})

Group #1

Document type. V as the first character.

Group #2

3 letters country code.

Group #3

Primary Identifier.

Sample result (MRVA#1)

Data

V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<

Group #1

V<

Group #2

UTO

Group #3

ERIKSSON, ANNA MARIA

Line 2 (MRVA#2)

Regular expression (MRVA#2)

Regular expression

([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{16})

Group #1

Document number, up to 9 alphanumeric characters.

Group #2

Check digit on document number.

Group #3

Nationality. 3 letters country code.

Group #4

Holder’s date of birth.

Group #5

Check digit on the date of birth.

Group #6

Sex of holder.

Group #7

Date of expiry of the document.

Group #8

Check digit on the date of expiry.

Group #9

Optional data at the discretion of the issuing state.

Sample result (MRVA#2)

Data

L8988901C4XXX4009078F96121096ZE184226B<<<<<<

Group #1

L8988901C

Group #2

4

Group #3

XXX

Group #4

400907

Group #5

8

Group #6

F

Group #7

961210

Group #8

9

Group #9

6ZE184226B<<<<<<

Parsing MRVB format

MRVB format has 2 lines, each has 36 characters.

Line 1 (MRVB#1)

Regular expression (MRVB#1)

Regular expression

(V[A-Z0-9<]{1})([A-Z]{3})([A-Z0-9<]{31})

Group #1

Document type. V as the first character.

Group #2

3 letters country code.

Group #3

Primary Identifier.

Sample result (MRVB#1)

Data

V<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<

Group #1

V<

Group #2

UTO

Group #3

ERIKSSON, ANNA MARIA

Line 2 (MRVB#2)

Regular expression (MRVB#2)

Regular expression

([A-Z0-9<]{9})([0-9]{1})([A-Z]{3})([0-9]{6})([0-9]{1})([M|F|X|<]{1})([0-9]{6})([0-9]{1})([A-Z0-9<]{8})

Group #1

Document number, up to 9 alphanumeric characters.

Group #2

Check digit on document number.

Group #3

Nationality. 3 letters country code.

Group #4

Holder’s date of birth.

Group #5

Check digit on the date of birth.

Group #6

Sex of holder.

Group #7

Date of expiry of the document.

Group #8

Check digit on the date of expiry.

Group #9

Optional data at the discretion of the issuing state.

Sample result (MRVB#2)

Data

L8988901C4XXX4009078F9612109<<<<<<<<

Group #1

L8988901C

Group #2

4

Group #3

XXX

Group #4

400907

Group #5

8

Group #6

F

Group #7

961210

Group #8

9

Group #9

<<<<<<<<

Data validation

What is nice with the MRZ / MRTD specifications is that they contain different check digits to make sure that the most important fields (document number, expiry date, date of birth…) are valid. We’ll explain how the validation is done in the next sections.

The code to validate the MRZ data is closed source and based on open source code at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/tree/master/samples/c++/validation.

A check digit consists of a single digit computed from the other digits in a series. Check digits in the MRZ are calculated on specified numerical data elements in the MRZ. The check digits permit readers to verify that data in the MRZ is correctly interpreted.

A special check digit calculation has been adopted for use in MRTDs. The check digits shall be calculated on modulus 10 with a continuously repetitive weighting of 731 731 …, as follows:
  • Step 1. Going from left to right, multiply each digit of the pertinent numerical data element by the weighting figure appearing in the corresponding sequential position.

  • Step 2. Add the products of each multiplication.

  • Step 3. Divide the sum by 10 (the modulus).

  • Step 4. The remainder shall be the check digit.

For data elements in which the number does not occupy all available character positions, the symbol < shall be used to complete vacant positions and shall be given the value of zero for the purpose of calculating the check digit.

When the check digit calculation is applied to data elements containing alphabetic characters, the characters A to Z shall have the values 10 to 35 consecutively, as follows:

A

10

B

11

C

12

D

13

E

14

F

15

G

16

H

17

I

18

J

19

K

20

L

21

M

22

N

23

O

24

P

25

Q

26

R

27

S

28

T

29

U

30

V

31

W

32

X

33

Y

34

Z

35

Next sections explain how to implement the above algorithm using C++. The entire code could be found at https://github.com/DoubangoTelecom/ultimateMRZ-SDK/blob/master/samples/c++/validation/main.cxx and it’s used in the validation sample application.

The SDK itself doesn’t contain the validation code for the simple reason that we want it to be generic to work with any MRZ format even if the data is malformed or non-standard.

Local variables and macros

Weighting:
static const int __Weights[] = { 7, 3, 1 };
Mapped values:
static std::map<char, int> __MappedValues;
const std::string charset = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
for (int i = 0; i < static_cast<int>(charset.size()); ++i) {
        __MappedValues[charset[i]] = i;
}
__MappedValues['<'] = 0;
Weighted sum:
#define MRZ_COMPUTE_WEIGHTED_SUM(_line_, _start_, _end_, w) { \
        for (size_t i = _start_, j = w; i <= _end_; ++i, ++j) { \
                sum += __MappedValues[(_line_)[i]] * __Weights[j % 3]; \
        } \
}
Validity check:
#define MRZ_CHECK_VALIDITY(_line_, _start_, _end_, _check_, _ret_) { \
        const std::string __line__ = (_line_); \
        int sum = 0; \
        MRZ_COMPUTE_WEIGHTED_SUM(__line__, _start_, _end_, 0); \
        _ret_ = ((sum % 10) == __MappedValues[__line__[_check_]]); \
}

Validating TD1 format

bool documentNumber, dateOfBirth, dateOfExpiry, upperAndMiddleLines;

MRZ_CHECK_VALIDITY(lines[0], 5, 13, 14, documentNumber);
MRZ_CHECK_VALIDITY(lines[1], 0, 5, 6, dateOfBirth);
MRZ_CHECK_VALIDITY(lines[1], 8, 13, 14, dateOfExpiry);

int sum = 0;
MRZ_COMPUTE_WEIGHTED_SUM(lines[0], 5, 29, 0);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 0, 6, 25);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 8, 14, 32);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 18, 28, 43);
upperAndMiddleLines = (sum % 10) == __MappedValues[lines[1][29]];

Validating TD2 format

bool documentNumber, dateOfBirth, dateOfExpiry, composite;

MRZ_CHECK_VALIDITY(lines[1], 0, 8, 9, documentNumber);
MRZ_CHECK_VALIDITY(lines[1], 13, 18, 19, dateOfBirth);
MRZ_CHECK_VALIDITY(lines[1], 21, 26, 27, dateOfExpiry);

int sum = 0;
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 0, 9, 0);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 13, 19, 10);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 21, 34, 17);
composite = (sum % 10) == __MappedValues[lines[1][35]];

Validating TD3 format

bool passportNumber, dateOfBirth, dateOfExpiry, personalNumber, composite;

MRZ_CHECK_VALIDITY(lines[1], 0, 8, 9, passportNumber);
MRZ_CHECK_VALIDITY(lines[1], 13, 18, 19, dateOfBirth);
MRZ_CHECK_VALIDITY(lines[1], 21, 26, 27, dateOfExpiry);
MRZ_CHECK_VALIDITY(lines[1], 28, 41, 42, personalNumber);

int sum = 0;
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 0, 9, 0);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 13, 19, 10);
MRZ_COMPUTE_WEIGHTED_SUM(lines[1], 21, 42, 17);
composite = (sum % 10) == __MappedValues[lines[1][43]];

Validating MRVA and MRVB formats

bool documentNumber, dateOfBirth, dateOfExpiry;

MRZ_CHECK_VALIDITY(lines[1], 0, 8, 9, documentNumber);
MRZ_CHECK_VALIDITY(lines[1], 13, 18, 19, dateOfBirth);
MRZ_CHECK_VALIDITY(lines[1], 21, 26, 27, dateOfExpiry);