Facial Recognition’s ‘Dirty Little Secret’: Millions Of Photos Used Without Consent

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.
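To make that concrete, here is a minimal sketch of what one annotated record in such a training dataset might look like; the schema, field names and values below are illustrative assumptions, not IBM’s actual format.

```python
# Illustrative sketch only: how a scraped photo might be paired with
# descriptive labels in a facial recognition training set. The schema
# and values are hypothetical, not IBM's actual dataset format.
from dataclasses import dataclass, field


@dataclass
class TrainingExample:
    """One annotated face image in a hypothetical training dataset."""
    image_url: str            # where the photo was scraped from
    age_range: str            # e.g. "25-34", estimated by annotators
    perceived_gender: str     # an annotator's label, not self-reported
    skin_tone: int            # e.g. a 1-6 Fitzpatrick-style scale
    extra_metrics: dict = field(default_factory=dict)  # dozens more in practice


# A dataset is then just a large collection of such records, compiled
# without the knowledge of the people who posted the photos.
dataset = [
    TrainingExample(
        image_url="https://www.flickr.com/photos/example/123",  # placeholder
        age_range="25-34",
        perceived_gender="female",
        skin_tone=3,
        extra_metrics={"pose": "frontal", "lighting": "daylight"},
    ),
]

for example in dataset:
    print(example.image_url, example.age_range, example.skin_tone)
```

At the scale the article describes, such labels are typically generated automatically or by crowd workers, which is how nearly a million photos can be coded without any contact with the people pictured.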

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were scraped from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

“None of the people I photographed had any idea their images were being used in this way,” said Greg Peverill-Conti, a Boston-based public relations executive who has more than 700 photos in IBM’s collection, known as a “training dataset.”

“It seems a little sketchy that IBM can use these pictures without saying anything to anybody,” he said.

John Smith, who oversees AI research at IBM, said that the company was committed to “protecting the privacy of individuals” and “will work with anyone who requests a URL to be removed from the dataset.”

Despite IBM’s assurances that Flickr users can opt out of the database, NBC News discovered that it’s almost impossible to get photos removed. IBM requires photographers to email links to photos they want removed, but the company has not publicly shared the list of Flickr users and photos included in the dataset, so there is no easy way of finding out whose photos are included. IBM did not respond to questions about this process.

That’s a particular concern for minorities who could be profiled and targeted, experts and advocates say.

How facial recognition has evolved

In the early days of building facial recognition tools, researchers paid people to come to their labs, sign consent forms and have their photos taken in different poses and lighting conditions. Because this was expensive and time-consuming, early datasets were limited to a few hundred subjects.

With the rise of the web during the 2000s, researchers suddenly had access to millions of photos of people.

Photographers divided on IBM’s dataset

An Austrian photographer and entrepreneur, Georg Holzer, uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept into a dataset used to study facial recognition technology.

“These systems are being deployed in oppressive contexts, often by law enforcement,” said Meredith Whittaker, co-founder of the AI Now Institute, “and the goal of making them better able to surveil anyone is one we should look at very skeptically.”
