In September 2019, four researchers wrote to publisher Wiley to “respectfully ask” that it immediately retract a scientific paper. The study, published in 2018, had trained algorithms to distinguish faces of Uyghur people, a predominantly Muslim minority ethnic group in China, from those of Korean and Tibetan ethnicity.
China had already been internationally condemned for its heavy surveillance and mass detentions of Uyghurs in camps in the north-western region of Xinjiang, which the government says are re-education centres aimed at quelling a terrorist movement.
According to media reports, authorities in Xinjiang have used surveillance cameras equipped with software attuned to Uyghur faces.
As a result, many researchers found it disturbing that academics had tried to build such algorithms — and that a US journal had published a research paper on the topic. And the 2018 study wasn’t the only one: journals from publishers including Springer Nature, Elsevier and the Institute of Electrical and Electronics Engineers (IEEE) had also published peer-reviewed papers that describe using facial recognition to identify Uyghurs and members of other Chinese minority groups.
The complaint, which launched an ongoing investigation, was one foray in a growing push by some scientists and human-rights activists to get the scientific community to take a firmer stance against unethical facial-recognition research.
It’s important to denounce controversial uses of the technology, but that is not enough, ethicists say. Scientists should also acknowledge the morally dubious foundations of much of the academic work in the field, including studies that have collected enormous data sets of images of people’s faces without consent, many of which helped hone commercial or military surveillance algorithms.
An increasing number of scientists are urging researchers to avoid working with firms or universities linked to unethical projects, to re-evaluate how they collect and distribute facial-recognition data sets and to rethink the ethics of their own studies.
Some institutions are already taking steps in this direction. In the past year, several journals and an academic conference have announced extra ethics checks on studies.
“A lot of people are now questioning why the computer-vision community dedicates so much energy to facial-recognition work when it’s so difficult to do it ethically,” says Deborah Raji, a researcher in Ottawa who works at the non-profit Internet foundation Mozilla. “I’m seeing a growing coalition that is just against this entire enterprise.”
This year, Nature asked 480 researchers around the world who work in facial recognition, computer vision and artificial intelligence (AI) for their views on thorny ethical questions about facial-recognition research.
The results of this first-of-a-kind survey suggest that some scientists are concerned about the ethics of work in this field — but others still don’t see academic studies as problematic.
For facial-recognition algorithms to work well, they must be trained and tested on large data sets of images, ideally captured many times under different lighting conditions and at different angles. In the 1990s and 2000s, scientists generally got volunteers to pose for these photos, but most now collect facial images without asking permission.
For instance, in 2015, scientists at Stanford University in California published a set of 12,000 images from a webcam in a San Francisco café that had been live-streamed online. The following year, researchers at Duke University in Durham, North Carolina, released more than two million video frames (85 minutes of footage) of students walking on the university campus.
[Image caption: Still frames from the Duke MTMC (Multi-Target, Multi-Camera) CCTV data set, captured on the Duke University campus in 2014.]
The biggest collections have been gathered online. In 2016, researchers at the University of Washington in Seattle posted a database, called MegaFace, of 3.3 million photos from the image-sharing site Flickr. And scientists at Microsoft Research in Redmond, Washington, issued the world’s largest data set, MSCeleb, consisting of 10 million images of nearly 100,000 individuals, including journalists, musicians and academics, scraped from the Internet.
In 2019, Berlin-based artist Adam Harvey created a website called MegaPixels that flagged these and other data sets. Harvey and another Berlin-based technologist and programmer, Jules LaPlace, showed that many had been shared openly and used to evaluate and improve commercial surveillance products. Some were cited, for instance, by companies that worked on military projects in China.
“I wanted to uncover the uncomfortable truth that many of the photos people posted online have an afterlife as training data,” Harvey says.
In total, he says he has charted 29 data sets, used in around 900 research projects. Researchers often use public Flickr images that were uploaded under copyright licences that allow liberal reuse.
After The Financial Times published an article on Harvey’s work in 2019, Microsoft and several universities took their data sets down. Most said at the time — and reiterated to Nature this month — that their projects had been completed or that researchers had requested that the data set be removed. Computer scientist Carlo Tomasi at Duke University was the sole researcher to apologise for a mistake.
In a statement two months after the data set had been taken down, he said he had got institutional review board (IRB) approval for his recordings — which his team made to analyse the motion of objects in video, not for facial recognition.
But the IRB guidance said he shouldn’t have recorded outdoors and shouldn’t have made the data available without password protection. Tomasi told Nature that he did make efforts to alert students by putting up posters to describe the project.
The removal of the data sets seems to have dampened their usage a little, Harvey says. But big online image collections such as MSCeleb are still distributed among researchers, who continue to cite them, and in some cases have re-uploaded them or data sets derived from them.
Scientists sometimes stipulate that data sets should be used only for non-commercial research, but once they have been widely shared, it is impossible to stop companies from obtaining and using them.
In October, computer scientists at Princeton University in New Jersey reported identifying 135 papers that had been published after the Duke data set had been taken down and that had used it or data derived from it. The authors urged researchers to set more restrictions on the use of data sets and asked journals to stop accepting papers that use data sets that had been taken down.

Legally, it is unclear whether scientists in Europe can collect photos of individuals’ faces for biometric research without their consent.
The European Union’s vaunted General Data Protection Regulation (GDPR) does not provide an obvious legal basis for researchers to do this, reported Catherine Jasserand, a biometrics and privacy-law researcher at the Catholic University of Leuven in Belgium, in 2018.
But there has been no official guidance on how to interpret the GDPR on this point, and it hasn’t been tested in the courts.
In the United States, some states have laws that make it illegal for commercial firms to use a person’s biometric data without their consent; Illinois is unique in allowing individuals to sue over this. As a result, several firms have been hit with class-action lawsuits.
The US social-media firm Facebook, for instance, agreed this year to pay $650 million to resolve an Illinois class-action lawsuit over a collection of photos that was not publicly available, which it used for facial recognition (it now allows users to opt out of facial-recognition tagging).
The controversial New York City-based technology company Clearview AI, which says it scraped three billion online photos for a facial-recognition system, has also been sued for violating this law in pending cases.
And the US tech firms IBM, Google, Microsoft, Amazon and FaceFirst were also sued in Illinois for using a data set of nearly one million online photos that IBM released in January 2019. IBM removed it at around the time of the lawsuit, which followed a report by NBC News detailing photographers’ disquiet that their pictures were in the data set.
Microsoft told Nature that it has filed to dismiss the case, and Clearview says it “searches only publicly available information, like Google or any other search engine”. Other firms did not respond to requests for comment.
In the study on Uyghur faces published by Wiley, the researchers didn’t gather photos from online; instead, they said they took pictures of more than 300 Uyghur, Korean and Tibetan students, aged 18–22, at Dalian Minzu University in northeast China, where some of the scientists worked.
Months after the study was published, the authors added a note to say that the students had consented to this. But the researchers’ assertions don’t assuage ethical concerns, says Yves Moreau, a computational biologist at the Catholic University of Leuven.
He sent Wiley a request to retract the work last year, together with the Toronto-based advocacy group Tech Inquiry. It’s unlikely that the students were told enough about the purpose of the research to have given truly informed consent, says Moreau. But even if they did freely consent, he argues, human-rights abuses in Xinjiang mean that Wiley ought to retract the study to avoid giving the work academic credence.
Publishers say the key issue is checking whether participants in studies gave informed consent. Springer Nature, for instance, said in December 2019 that it would investigate papers of concern on vulnerable groups along these lines, and that it had updated its guidance to editors and authors about the need to gain explicit and informed consent in studies that involve clinical, biomedical or biometric data from people.
– A Nature magazine report