Garbage In, Garbage Out: Machines Learn to Discriminate from Discriminatory Data

Machine learning and artificial intelligence (A.I.) are extremely powerful technologies that have given us predictive text, email spam filtering, and Netflix recommendations. But sometimes, even when intentions are good, machine learning models learn to reproduce past discriminatory behavior.

Amazon Shutters Sexist A.I. Recruiting Tool

In October 2018, Reuters reported that Amazon had scrapped a secret A.I. recruiting tool that used machine learning to rate job applicants’ resumes. The catch? The A.I. showed bias against women applying for technical and developer positions. While Amazon hasn’t commented on what exactly went wrong with their tool, we can start to understand how it might have happened by understanding how machine learning works.

As Reuters reported, Amazon’s models were trained on resumes submitted by past job applicants. But the documents alone aren’t enough – to use a resume to evaluate an applicant via machine learning, the A.I. specialists would have needed scored (or “labeled”) data. Those scores might have come from past hiring decisions, an individual’s later success at the company, or ratings previously assigned by recruiters.
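
To make that concrete, here is a minimal sketch of what labeled training data could look like. The fields, resumes, and scores are hypothetical; Amazon hasn’t described its actual data.

```python
# A hypothetical sketch of labeled training data: each resume is paired with
# a score the model will learn to predict. These fields and values are
# invented for illustration; Amazon has not published its data format.
labeled_resumes = [
    {"resume_text": "BS in Computer Science, 5 years of Java development", "rating": 5},
    {"resume_text": "Led campus robotics club, internship at a startup",   "rating": 4},
    {"resume_text": "Retail experience, self-taught Python",               "rating": 2},
]
```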

Once the specialists had their labeled data (resumes paired with scores), they could apply any of several machine learning methods to discover which words and phrases – and therefore which educational histories, past employers, hobbies, and technical skills – were common among resumes with high or low ratings. By 2015, Amazon realized that their A.I. was penalizing resumes that included the word “women’s,” as in “President of local Women’s Engineering Society Chapter,” as well as penalizing applicants from two different all-women’s colleges.
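
As a rough illustration of what one such method looks like (a generic text classifier trained on invented resumes and toy labels, not Amazon’s actual system), you can turn each resume into word features, fit a linear model, and read off which terms push a predicted rating up or down:

```python
# A generic sketch of one such method: a linear text classifier whose learned
# weights show which terms correlate with high or low ratings. The resumes
# and labels are invented, with labels chosen to mirror the biased ratings
# described above; this is not Amazon's system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "executed product roadmap, captured new accounts",
    "president of women's engineering society chapter",
    "led platform migration, shipped distributed systems",
    "women's college graduate, research assistant",
]
ratings = [1, 0, 1, 0]  # 1 = rated high, 0 = rated low (toy labels)

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(features, ratings)

# Positive weights push a resume toward a high rating, negative toward low.
for term, weight in zip(vectorizer.get_feature_names_out(), model.coef_[0]):
    print(f"{term:15s} {weight:+.2f}")
```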

It is possible that the bias the A.I. learned was due in part to the lack of representation of women in the training data. Men make up an overwhelming majority of people in technical positions at U.S. technology companies, so the model didn’t see as many successful women as it saw successful men. But this explanation doesn’t account for everything: if the relatively rare women applicants had been rated higher, hired more often, and/or paid more than similarly qualified men (depending on how the initial resume scoring was done), terms like “women’s” would have helped applicants rather than hurt them. Considering the bias against women in technical fields, it seems more likely to this analyst that the scoring data used to train the resume rater was itself biased against women than that the A.I.’s bias came from too few women in the sample alone.

Amazon initially modified their algorithm to make obviously gendered terms like “women’s” and the names of all-women’s colleges value-neutral. However, the company recognized that the A.I. could find subtler terms that serve as proxies for an applicant’s gender and use them to reintroduce the same bias in less obvious ways. As Reuters reported, the tool learned to favor verbs such as “executed” and “captured,” which appeared more regularly on male applicants’ resumes. Because the training data itself encoded a bias against women, training a machine learning method on that data produces sexist results.
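
A toy example (invented resumes and weights, not Amazon’s model) shows why filtering explicit terms isn’t enough: a proxy term that is unevenly distributed by gender still lets scores split along gender lines.

```python
# A toy demonstration of the proxy problem (all resumes and weights invented):
# even with the explicit token "women's" blocked, a verb like "executed" that
# appears more often on men's resumes still makes scores differ by gender.
toy_resumes = [
    {"gender": "F", "text": "organized women's hackathon, delivered analytics tool"},
    {"gender": "F", "text": "delivered reporting pipeline for finance team"},
    {"gender": "M", "text": "executed platform migration, captured requirements"},
    {"gender": "M", "text": "executed release plan, delivered mobile app"},
]

BLOCKED = {"women's"}                                 # the explicit fix described above
LEARNED_WEIGHTS = {"executed": 1.0, "captured": 0.5}  # weights a model could learn

def score(text):
    tokens = [t for t in text.lower().replace(",", "").split() if t not in BLOCKED]
    return sum(LEARNED_WEIGHTS.get(t, 0.0) for t in tokens)

for resume in toy_resumes:
    print(resume["gender"], score(resume["text"]))
# Scores still split along gender lines even though "women's" is never scored.
```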

Predictive Policing Has No Impact on Racist Policing Practices

In the U.S., black people are pulled over, arrested, charged, convicted, and incarcerated at rates 5 to 10 times higher than white people are. In Wisconsin, the problem is particularly stark: black people are 11.5 times more likely to be incarcerated than white people, with fully 1 in 20 black men behind bars. According to the NAACP, even though African Americans and whites use illicit drugs at similar rates, African Americans are imprisoned on drug charges at nearly six times the rate white people are. Within the criminal justice system, these disparities start at the level of police interaction. Some people believe that relying on “race-neutral” A.I. to determine where police should spend their time could remove race from the calculation and create a less discriminatory criminal justice system.

In a 2017 study, researchers at UCLA, LSU, and Purdue described how they worked with police departments to examine whether, and how, the use of A.I. changed the number of arrests and the race of the people arrested. Each day during the experimental time frame, human analysts and an A.I. independently advised law enforcement on where crime was likely to occur that day. Once both plans were drawn up, the researchers randomly selected which one would be used and recorded arrest data by race for each day of the experiment.

The study found no statistically significant differences in the racial breakdown of arrests between the days officers followed the human analysts’ plans and the days they followed the A.I.’s. Even though the “crime predicting” technology didn’t explicitly include race, it still perpetuated the same patterns of discriminatory policing.
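
For a sense of what that comparison looks like in practice (the counts below are invented placeholders, not the study’s data), a standard chi-squared test on arrest counts by group for each kind of plan day runs like this:

```python
# A minimal sketch of the kind of significance test behind a finding like
# this one: comparing arrest counts by group between human-plan days and
# A.I.-plan days. All counts here are invented for illustration.
from scipy.stats import chi2_contingency

#                       group 1  group 2
arrests_human_days = [     118,      42]
arrests_ai_days    = [     121,      39]

chi2, p_value, dof, expected = chi2_contingency(
    [arrests_human_days, arrests_ai_days]
)
print(f"p-value = {p_value:.3f}")
# A large p-value means no detectable difference between the two kinds of days.
```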

Again, we can make sense of this by looking at how an A.I. develops its predictions. Because the criminal justice system has a strong racial bias, models trained on the outcomes of that system – arrest records, in this case – are highly likely to be biased as well. And like the Amazon recruiting tool, location-based predictive policing tools can create implicit associations – here, between neighborhood and race – and must do so in order to fit the training data accurately.
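
A toy calculation (all numbers invented) makes the point: a location model trained on biased arrest counts sends resources right back to the over-policed areas, with race never appearing as an input.

```python
# A toy illustration (all numbers invented) of how a "race neutral" location
# model trained on biased arrest records reproduces the bias.
historical_arrests = {"District A": 400, "District B": 100}  # reflects where police
                                                             # patrolled, not actual
                                                             # offense rates

# Forecast future crime in proportion to past recorded arrests.
total_arrests = sum(historical_arrests.values())
predicted_crime_share = {
    district: count / total_arrests
    for district, count in historical_arrests.items()
}

print(predicted_crime_share)  # {'District A': 0.8, 'District B': 0.2}
# Patrols follow the forecast, more arrests get recorded in District A,
# and the next round of training data is skewed even further.
```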

Looking Forward: Clarifai and Autonomous Weapons

On Friday, the New York Times published “Is Ethical A.I. Even Possible?”, in which Cade Metz discusses even more troubling uses of A.I., including the use of image recognition services such as Clarifai to build autonomous weapons for the U.S. military. Given machine learning’s track record of discriminatory results, building autonomous weaponry raises serious concerns, especially when the technology’s weaknesses can cost lives.

Best Practices in AI: Focusing on Fairness

Fortunately, many academics, non-profits, and workers are pushing for responsible and fair machine learning practices. At Extract Systems, we know our software is used by governments and in health care, where highly accurate models that don’t perpetuate bias are critically important. We take several steps to make sure our software is fair and bias-free:

  1. We work with our clients individually to ensure the data we use to train our software is truly representative of the documents that client is processing.

  2. We use statistical reporting methods to ensure that verified data is consistent across our clients’ users, and we can expose that data via our Analytics Dashboards.

  3. We aim to evaluate our software’s accuracy for patterns of higher or lower performance that might reflect social inequalities (see the sketch after this list).

  4. We ensure the robustness of our machine learning solutions by using separate training and testing data, so we can spot possible problems before deployment and continuously guard against them through on-site machine learning.
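
As a simplified sketch of points 3 and 4 (the records, labels, and group names below are hypothetical, not Extract’s actual data or tooling), checking held-out predictions for accuracy gaps between groups might look like this:

```python
# A simplified, hypothetical sketch of checking held-out test results for
# accuracy gaps across subgroups. Fields and values are invented; this is
# not Extract's production tooling.
from collections import defaultdict

test_results = [  # model predictions vs. human-verified truth on held-out data
    {"predicted": "lab_report", "verified": "lab_report", "group": "site_A"},
    {"predicted": "referral",   "verified": "lab_report", "group": "site_A"},
    {"predicted": "lab_report", "verified": "lab_report", "group": "site_B"},
    {"predicted": "referral",   "verified": "referral",   "group": "site_B"},
]

correct = defaultdict(int)
total = defaultdict(int)
for record in test_results:
    total[record["group"]] += 1
    correct[record["group"]] += int(record["predicted"] == record["verified"])

for group in sorted(total):
    print(f"{group}: accuracy = {correct[group] / total[group]:.0%}")
# A large accuracy gap between groups is a signal to investigate the data.
```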

If you want to learn more about how we use A.I. to meet our clients’ data extraction needs, feel free to reach out today.


ABOUT THE AUTHOR: NATHAN NEFF-MALLON

Nathan is a Data Capture Analyst at Extract with experience in data analysis, software development, machine learning, teaching, and lasers. He earned his bachelor’s degree at Whitman College in Walla Walla, Washington, and his master’s in chemistry from the University of Wisconsin–Madison. Nathan enjoys statistics, building models, and glassblowing.