Face recognition

The recognition module returns an embedding for each face in the image. Each embedding is an L2-normalized float vector of size 512. You can think of an embedding as a fingerprint for a face. Let’s say you have John Doe as a customer. You would extract the embedding for John’s face and store it in your database once, at registration time. Each time someone tries to access John’s account using his ID card or passport, you extract that person’s face embedding and compare it to the one in your database. The comparison uses cosine similarity, which for L2-normalized vectors is simply the dot product.
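The one-to-one comparison described above can be sketched with NumPy. This is a minimal illustration, not SDK code: the function names l2_normalize and is_same_person are hypothetical, the embeddings are random stand-ins for real recognition output, and the 0.35 threshold is the value this document recommends for matching.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    v = np.asarray(v, dtype=np.float32)
    return v / np.linalg.norm(v)

def is_same_person(stored, probe, threshold=0.35):
    """Compare two 512-d embeddings (hypothetical helper, not an SDK function)."""
    # Re-normalizing is harmless if the inputs are already unit length.
    similarity = float(np.dot(l2_normalize(stored), l2_normalize(probe)))
    return similarity > threshold, similarity

# Random stand-ins for embeddings produced by the recognition module.
rng = np.random.default_rng(0)
stored = l2_normalize(rng.standard_normal(512))                  # saved at registration
probe = l2_normalize(stored + 0.01 * rng.standard_normal(512))   # slightly perturbed copy

match, score = is_same_person(stored, probe)
print(match, round(score, 3))
```

In a real integration, stored would come from your database and probe from the recognition module's output for the newly captured image.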

As another example, suppose you have an image with 50 faces and want to know whether John is one of them:
  1. Call the recognition module on the image. The embeddings for all faces are extracted in parallel using a batch size of 50. The result is a 50x512 matrix.

  2. Compute the dot product between John’s embedding from your database and the 50 embeddings. This is a dot product between a 1x512 vector and the transpose of the 50x512 matrix. In Python with NumPy you would write similarities = x.dot(y.T), where x is John’s 1x512 embedding and y is the 50x512 matrix.

  3. The result of the dot product is a 1x50 vector. Each entry represents how similar John’s embedding is to one of the 50 faces. Each value lies within [-1, 1]: two identical embeddings have a cosine similarity of +1, two orthogonal embeddings have a similarity of 0, and two opposite embeddings have a similarity of -1.

  4. John is in the image if at least one of the 50 similarities is higher than a predefined threshold. That threshold should be set to 0.35. The index (argmax) of the largest similarity identifies the face that represents John.
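The four steps above can be sketched as follows. This is an illustration under assumptions rather than SDK code: find_person is a hypothetical helper, and the 50x512 matrix is filled with random unit vectors in place of real output from the recognition module.

```python
import numpy as np

def find_person(reference, face_embeddings, threshold=0.35):
    """Return (face index or None, best similarity) for a 1x512 reference
    against an Nx512 matrix of L2-normalized embeddings."""
    similarities = reference.dot(face_embeddings.T)   # shape (1, N)
    best = int(np.argmax(similarities))               # position of the largest similarity
    score = float(similarities[0, best])
    return (best if score > threshold else None), score

# Random stand-ins for the 50 embeddings extracted from the image.
rng = np.random.default_rng(42)
faces = rng.standard_normal((50, 512)).astype(np.float32)
faces /= np.linalg.norm(faces, axis=1, keepdims=True)  # L2-normalize each row

john = faces[17:18].copy()   # 1x512; pretend face 17 is John
index, score = find_person(john, faces)
print(index, round(score, 3))
```

When no similarity exceeds the threshold, the helper returns None for the index, which corresponds to "John is not in the image".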

You can adapt the above example to use a database of 1 million embeddings (each representing a user) instead of an image with 50 faces. This is the situation when a user attempts face authentication and you perform a database lookup. A naive way to do it is a dot product between a 1x512 vector and the transpose of a 1000000x512 matrix. The GitHub repository contains sample code in C++, C#, Java and Python showing how to use the recognition module.
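The naive database lookup can be sketched as a linear scan. The sketch below is an assumption-laden illustration, not the sample code from the repository: search_database and chunk_size are hypothetical names, and a small random matrix stands in for the 1000000x512 database. Scanning in chunks keeps each intermediate result small, and the same loop works if the matrix is a memory-mapped file.

```python
import numpy as np

def search_database(probe, database, threshold=0.35, chunk_size=100_000):
    """Naive linear scan over an Nx512 matrix of L2-normalized embeddings.
    probe is a 1-D vector of size 512. Returns (row index or None, best similarity)."""
    best_index, best_score = None, -np.inf
    for start in range(0, database.shape[0], chunk_size):
        chunk = database[start:start + chunk_size]
        sims = chunk.dot(probe)            # one similarity per row in the chunk
        i = int(np.argmax(sims))
        if sims[i] > best_score:
            best_index, best_score = start + i, float(sims[i])
    return (best_index if best_score > threshold else None), best_score

# Small random stand-in for the user database (10,000 rows instead of 1 million).
rng = np.random.default_rng(7)
database = rng.standard_normal((10_000, 512)).astype(np.float32)
database /= np.linalg.norm(database, axis=1, keepdims=True)

probe = database[1234].copy()   # pretend the person authenticating is user 1234
user_row, score = search_database(probe, database, chunk_size=3_000)
print(user_row, round(score, 3))
```

The returned row index would then be mapped to a user record in your own storage layer.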

Recommendations

Accuracy

The model shipped with the SDK uses a ResNet18 backbone to make it as fast as possible while retaining state-of-the-art accuracy. We also have more accurate models based on ResNet50 and ResNet101 if speed is not important to you.