Data normalization
We support several languages (Arabic, Hebrew, English, Japanese…) and countries, so the same field (e.g. Date Of Birth) could be written in many different ways. Handling every variant yourself would be a nightmare, so the engine normalizes some fields to make your life easier.
Date formats
Dates like 25/01/89, 25/JAN/89, 1989/01/25, 25 1月/JAN 1989… will be normalized as 25/01/1989. The format is {DD/MM/YYYY}, where ‘D’ stands for Day, ‘M’ for Month and ‘Y’ for Year.
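As an illustration only, here is a minimal Python sketch of this kind of date normalization. It assumes the input uses one of the layouts listed above and simply tries a fixed list of `strptime` patterns; the engine's real rules are not documented here and certainly cover more cases. Two-digit years rely on Python's `%y` convention (69 to 99 map to 1969 to 1999), and `%b` assumes the default English (C) locale. The function name `normalize_date` is hypothetical.

```python
import re
from datetime import datetime

# Candidate input layouts (assumption: the real engine recognizes many more).
_FORMATS = (
    "%d/%m/%Y",  # 25/01/1989
    "%d/%m/%y",  # 25/01/89
    "%d/%b/%Y",  # 25/JAN/1989
    "%d/%b/%y",  # 25/JAN/89
    "%Y/%m/%d",  # 1989/01/25
    "%d %b %Y",  # 25 JAN 1989
)


def normalize_date(raw: str) -> str:
    """Return the date as DD/MM/YYYY, or raise ValueError."""
    # Drop a Japanese month prefix such as "1月/" in "25 1月/JAN 1989".
    cleaned = re.sub(r"\d{1,2}月/", "", raw).strip()
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # layout did not match, try the next one
        return parsed.strftime("%d/%m/%Y")
    raise ValueError(f"unrecognized date: {raw!r}")
```

For example, `normalize_date("25 1月/JAN 1989")` and `normalize_date("1989/01/25")` should both give `"25/01/1989"`.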
Age
A field named “Age” (number of years) will be generated in the normalized section based on the DateOfBirth. Use this field to check whether the user is of legal age (e.g. compare it to 18 or 21).
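A minimal sketch of how an age in whole years can be derived from a normalized DD/MM/YYYY birth date, assuming the usual rule that a year only counts once the birthday has been reached. `compute_age`, `is_of_legal_age` and the `legal_age` parameter are hypothetical names, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date, datetime


def compute_age(date_of_birth: str, today: date) -> int:
    """Whole years elapsed since a normalized DD/MM/YYYY birth date."""
    born = datetime.strptime(date_of_birth, "%d/%m/%Y").date()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def is_of_legal_age(age: int, legal_age: int = 18) -> bool:
    # legal_age is a hypothetical parameter: 18 or 21 depending on jurisdiction.
    return age >= legal_age
```

In production you would pass `date.today()`; comparing `(month, day)` tuples also handles 29 February birthdays without special cases.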
Years Since Issue
A field named “YearsSinceIssue” will be generated in the normalized section based on the DateOfIssue. You could use this field to find out how many years a driver has held a license.
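The computation behind such a field can be sketched as the number of full years between the DateOfIssue and today. The sketch assumes a normalized DD/MM/YYYY date; `years_since_issue`, `has_min_driving_experience` and the two-year threshold are hypothetical, illustrating a typical business rule rather than an engine setting.

```python
from datetime import date, datetime


def years_since_issue(date_of_issue: str, today: date) -> int:
    """Full years elapsed since a normalized DD/MM/YYYY issue date."""
    issued = datetime.strptime(date_of_issue, "%d/%m/%Y").date()
    years = today.year - issued.year
    if (today.month, today.day) < (issued.month, issued.day):
        years -= 1  # anniversary of the issue date not reached yet this year
    return years


def has_min_driving_experience(years: int, min_years: int = 2) -> bool:
    # min_years is a hypothetical business rule, not an engine setting.
    return years >= min_years
```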
Days To Expire
A field named “DaysToExpire” will be generated in the normalized section based on the DateOfExpiry. It tells you how many days are left before the document expires. A value of zero means the document has expired.
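A minimal sketch of this computation, assuming a normalized DD/MM/YYYY expiry date and that past dates are clamped to zero as described above. `days_to_expire` is a hypothetical name. Note that with this sketch the expiry day itself also yields 0; how the engine treats that exact day is not specified here.

```python
from datetime import date, datetime


def days_to_expire(date_of_expiry: str, today: date) -> int:
    """Days left before a normalized DD/MM/YYYY expiry date, never negative."""
    expiry = datetime.strptime(date_of_expiry, "%d/%m/%Y").date()
    # Subtracting two dates yields a timedelta; clamp past dates to 0.
    return max((expiry - today).days, 0)
```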
Days From Expire
A field named “DaysFromExpire” will be generated in the normalized section based on the DateOfExpiry. It tells you how many days have passed since the document expired. A value of zero means the document has not expired.
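The mirror computation can be sketched the same way: subtract the expiry date from today and clamp future dates to zero. This assumes a normalized DD/MM/YYYY expiry date; `days_from_expire` is a hypothetical name, and the behavior on the expiry day itself (0 here) is an assumption.

```python
from datetime import date, datetime


def days_from_expire(date_of_expiry: str, today: date) -> int:
    """Days elapsed since a normalized DD/MM/YYYY expiry date, never negative."""
    expiry = datetime.strptime(date_of_expiry, "%d/%m/%Y").date()
    # A document that has not expired yet gives a negative delta, clamped to 0.
    return max((today - expiry).days, 0)
```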
Height
Heights like 1M90, 1,90m, 6’3”, 6ft3in… will be normalized as 190 cm. The format is {ddd cm}, where ‘d’ is a digit and cm stands for centimeters.
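A possible sketch of height normalization in Python, assuming only two families of input: meters with a separator (1M90, 1,90m) or feet and inches (6ft3in, 6’3”). The rounding rule is an assumption: the value is rounded to the nearest centimeter with Python's `round()`, which rounds exact halves to even, so 6 ft 3 in (190.5 cm) becomes 190 cm. `normalize_height` is a hypothetical name.

```python
import re

_METRIC = re.compile(r"(\d)[m.,](\d{2})m?")  # 1m90, 1,90m, 1.90m
_IMPERIAL = re.compile(r"(\d)(?:ft|['’])(\d{1,2})(?:in|[\"”]|'')?")  # 6ft3in, 6'3", 6’3”


def normalize_height(raw: str) -> str:
    """Return the height as '<n> cm', or raise ValueError."""
    text = raw.strip().lower().replace(" ", "")
    metric = _METRIC.fullmatch(text)
    if metric:
        meters, centimeters = metric.groups()
        return f"{int(meters) * 100 + int(centimeters)} cm"
    imperial = _IMPERIAL.fullmatch(text)
    if imperial:
        feet, inches = int(imperial.group(1)), int(imperial.group(2))
        if inches >= 12:
            raise ValueError(f"invalid inches in {raw!r}")
        # 1 in = 2.54 cm exactly; round() rounds exact halves to even.
        return f"{round((feet * 12 + inches) * 2.54)} cm"
    raise ValueError(f"unrecognized height: {raw!r}")
```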
Weight
Weights like 165LB, 165LBS, 75KG… will be normalized as 75 kg. The format is {ddd kg}, where ‘d’ is a digit and kg stands for kilograms.
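A similar sketch for weights, again assuming only the suffixes listed above. Pounds are converted with the exact factor 1 lb = 0.45359237 kg and rounded to the nearest kilogram, which is how 165 lb maps to 75 kg (165 × 0.45359237 ≈ 74.84). `normalize_weight` is a hypothetical name and the rounding rule is an assumption.

```python
import re

LB_TO_KG = 0.45359237  # exact definition of the international pound
_WEIGHT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kgs|kg|lbs|lb)")


def normalize_weight(raw: str) -> str:
    """Return the weight as '<n> kg', or raise ValueError."""
    match = _WEIGHT.fullmatch(raw.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized weight: {raw!r}")
    value = float(match.group(1).replace(",", "."))  # accept 75,5 as well as 75.5
    kilograms = value * LB_TO_KG if match.group(2).startswith("lb") else value
    return f"{round(kilograms)} kg"
```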
Gender (Sex)
Genders like M, Male, पुरुष, მმ, МУЖ, ذكر, 男… will be normalized as ‘M’. The format is {C}, where ‘C’ is a single character with 3 possible values: ‘M’, ‘F’ or ‘X’. ‘M’ stands for Male, ‘F’ for Female and ‘X’ for Unknown.
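One simple way to implement this is a lookup table keyed on the case-folded label. The sketch below is hypothetical: the table only holds the male labels listed above plus a few female ones for illustration, and anything unrecognized falls back to ‘X’ (Unknown).

```python
# Hypothetical, deliberately incomplete label table.
_GENDER_LABELS = {
    "M": ("M", "Male", "पुरुष", "მმ", "МУЖ", "ذكر", "男"),
    "F": ("F", "Female", "ЖЕН", "女"),  # illustrative subset only
}
# Case-fold every label once so lookups ignore case ("МУЖ" == "муж").
_LOOKUP = {
    label.casefold(): code
    for code, labels in _GENDER_LABELS.items()
    for label in labels
}


def normalize_gender(raw: str) -> str:
    # Unrecognized labels map to "X" (Unknown).
    return _LOOKUP.get(raw.strip().casefold(), "X")
```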
Country code
The words Γαλλία, צרפת, فرانسه, Francia, フランス, ฝรั่งเศส… all mean “France” in different languages. In KYC you may need the Country Of Issue, the Country Of Birth… which may be written in different languages. Country names are therefore encoded as 3-letter codes according to ISO 3166-1 alpha-3 (e.g. “FRA” for France).
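Conceptually this is a name-to-code lookup. The Python sketch below hard-codes only the French example from this section; `to_alpha3` and the table are hypothetical, and a real implementation would need every country together with its names in every supported language.

```python
from typing import Optional

# Hypothetical table: ISO 3166-1 alpha-3 code -> known spellings.
_COUNTRY_NAMES = {
    "FRA": ("France", "Γαλλία", "צרפת", "فرانسه", "Francia", "フランス", "ฝรั่งเศส"),
}
_NAME_TO_ALPHA3 = {
    name.casefold(): code
    for code, names in _COUNTRY_NAMES.items()
    for name in names
}


def to_alpha3(country_name: str) -> Optional[str]:
    # Returns None for names missing from the (incomplete) table.
    return _NAME_TO_ALPHA3.get(country_name.strip().casefold())
```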
Date and place
Sometimes the document contains a date and a place mixed in a single line. For example, “DateAndPlaceOfBirth”, “DateAndPlaceOfIssue”… could be “New York 23 OCT 1989” or “OCT, 23rd 1989 at New York”. We’ll process this value to generate 2 normalized fields: “DateOfBirth” equal to {23/10/1989} and “PlaceOfBirth” equal to {New York}.
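A rough Python sketch of the split, assuming only the two date layouts from the example (“23 OCT 1989” and “OCT, 23rd 1989”) and English month abbreviations. Whatever text remains after removing the date, minus separators and a leading “at”, is taken as the place. The function name and its `kind` parameter (“Birth”, “Issue”…) are hypothetical.

```python
import re

_MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}
# "23 OCT 1989"
_DAY_MON_YEAR = re.compile(r"\b(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\b")
# "OCT, 23rd 1989" (optional comma and ordinal suffix)
_MON_DAY_YEAR = re.compile(
    r"\b([A-Za-z]{3}),?\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b"
)


def split_date_and_place(raw: str, kind: str = "Birth") -> dict:
    match = _DAY_MON_YEAR.search(raw)
    if match:
        day, month_name, year = match.groups()
    else:
        match = _MON_DAY_YEAR.search(raw)
        if match is None:
            raise ValueError(f"no date found in {raw!r}")
        month_name, day, year = match.groups()
    month = _MONTHS[month_name.upper()]  # KeyError if not a month abbreviation
    # The place is whatever surrounds the date, minus separators and a leading "at".
    rest = (raw[: match.start()] + " " + raw[match.end():]).strip(" ,")
    place = re.sub(r"^at\s+", "", rest, flags=re.IGNORECASE)
    return {
        f"DateOf{kind}": f"{int(day):02d}/{month:02d}/{year}",
        f"PlaceOf{kind}": place,
    }
```

Both example strings should give `{"DateOfBirth": "23/10/1989", "PlaceOfBirth": "New York"}`.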
Address decomposition
In future versions, we’ll decompose addresses to extract the street, city, country…
Transliteration
All fields will be transliterated to produce a Latin-script (English) version.
More info about transliteration at https://en.wikipedia.org/wiki/Transliteration.
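To make the idea concrete, here is a sketch of character-by-character transliteration from Russian Cyrillic to Latin script using `str.maketrans` and `str.translate`. The table covers only the Russian alphabet and is loosely modeled on the ICAO Doc 9303 conventions (e.g. ж → zh, щ → shch); it is an assumption for illustration, not the engine's actual table. `transliterate_ru` is a hypothetical name.

```python
# Hypothetical table: lowercase Russian Cyrillic -> Latin,
# loosely following ICAO Doc 9303 conventions.
_RU_TO_LATIN = str.maketrans({
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "e", "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts",
    "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "ie", "ы": "y", "ь": "",
    "э": "e", "ю": "iu", "я": "ia",
})


def transliterate_ru(text: str) -> str:
    # Lowercase first so one table covers both cases; characters not in the
    # table (Latin letters, digits, spaces) pass through unchanged.
    return text.lower().translate(_RU_TO_LATIN)
```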
Surname, GivenNames, FatherName, FullName
All names are transliterated and uppercased. The FullName is the concatenation of the Surname with the GivenNames.
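A minimal sketch of the name post-processing, assuming the values are already transliterated and that the Surname and GivenNames are joined with a single space (the separator is not specified above). `normalize_names` is a hypothetical name.

```python
def normalize_names(surname: str, given_names: str) -> dict:
    # Collapse runs of whitespace and uppercase each part.
    surname_n = " ".join(surname.split()).upper()
    given_n = " ".join(given_names.split()).upper()
    # FullName = Surname followed by GivenNames; empty parts are skipped.
    full_name = " ".join(part for part in (surname_n, given_n) if part)
    return {"Surname": surname_n, "GivenNames": given_n, "FullName": full_name}
```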