Microsoft Decides to Remove Its Open Facial Recognition Dataset Following an Investigation

Finding an image of you being used by a photographer without your consent can be a surprising or even an unnerving experience, but what if it's Microsoft doing it on a grand scale?

It's one thing to randomly find yourself on a street photographer's website, where you may have been caught enjoying a day out in the city. But how would you feel if you came across your face being used by Microsoft in a publicly available facial recognition dataset, one claimed to contain as many as 10 million images portraying approximately 10,000 different people?

The database, named MS Celeb, was intended for "academic purposes," according to Microsoft, and contained primarily photographs of celebrities. However, faces of regular people, including journalists, were also gathered and stored in the dataset. Following a Financial Times report that highlighted the privacy and ethical issues surrounding the database, Microsoft quietly removed it. The company responded to the Financial Times by downplaying the situation, explaining that "[the database] was run by an employee that is no longer with Microsoft and has since been removed."

Although it has now been removed, the database was freely available for open use, so the security and privacy concerns that come with handling such a large amount of personal data remain. According to the Financial Times, the dataset has been used across a variety of sectors and parts of the world:

Microsoft’s MS Celeb data set has been used by several commercial organizations, according to citations in AI papers, including IBM, Panasonic, Alibaba, Nvidia, Hitachi, Sensetime, and Megvii. Both Sensetime and Megvii are Chinese suppliers of equipment to officials in Xinjiang, where minorities of mostly Uighurs and other Muslims are being tracked and held in internment camps.

Microsoft is undoubtedly not the only company to have built a large database of our data and images for advancing machine learning and other purposes. For example, the multinational tech company IBM has also joined the facial recognition race by launching its Diversity in Faces (DiF) project "to advance the study of fairness and accuracy in facial recognition technology." While our data and images are being collected daily from various sources, MegaPixels, a two-man team led by Adam Harvey and Jules LaPlace, is pushing back by researching and reporting on "the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies."

With the ever-expanding development of artificial intelligence and surveillance, it comes as no surprise that the handling of our personal data and consent is, and will remain, a pressing issue, and one that tech giants such as Microsoft should treat very carefully. What are your thoughts on this?

Lead image by Tadas Sar via Unsplash.