When Artificial Intelligence = Not Enough Intelligence

A new study is the latest effort to unravel the Gordian knot of our impact on algorithms, and algorithms’ impact on us.

"I'm afraid I can't do that, Dave." The iconic image of the HAL 9000 supercomputer, from 2001: A Space Odyssey (MGM).

It’s a staple of science fiction: the devices made by humans run afoul of their creators by learning how humans think. From the renegade HAL 9000 in 2001: A Space Odyssey to the replicants in the sci-fi classic Blade Runner to the robot in the 2014 hit Ex Machina, and others besides — all would eventually achieve the same cunning and brutality as the human beings who created them.

On April 13th, the news site of the journal Science reported the latest unsettling evidence that the algorithms used to develop artificial intelligence systems, like the machines of the movies, are getting better at internalizing the bad habits of their human creators via written language and text.

A new AI study, from researchers at Princeton University and the University of Bath, shows just how pervasive human bias can be at the written level. “Machine learning is a means to derive artificial intelligence by discovering patterns in existing data,” says the study abstract. “Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. ... machines can learn word associations from written texts and that these associations mirror those learned by humans.”

◊ ◊ ◊

Researchers duplicated a broad array of human biases as measured by the widely recognized Implicit Association Test (IAT), which measures beliefs and social attitudes that people may not be comfortable reporting or communicating to others, like attitudes on race and gender.

The IAT, something of a standard in the metrics of social psychology, is basic to the work of Project Implicit, an ongoing research program at Harvard University (you can take the IAT yourself).

In the IAT, test subjects are shown contrasting images paired with descriptive words. The test calculates the test subjects’ “latency paradigm” (or reaction time: how long it takes them to associate an image with a word). The test’s working assumption: the longer it takes a subject to pair an image with a word, the weaker the association between them.

◊ ◊ ◊

In a series of emails, the study’s lead author, Aylin Caliskan, a post-doctoral research associate at Princeton University, told me she adapted the IAT to machine-learning algorithms “by replacing the latency paradigm in humans with mathematical distance.” She said this distance employs “cosine similarity,” or the ability to “compute syntactic or semantic relations between words.”

Caliskan wrote: “I took 4 sets of words, 2 sets for representing 2 different societal groups and 2 sets of terms representing stereotypes ... taking Black American and White American names along with pleasant and unpleasant terms, then calculate the mean cosine similarity between black American names and pleasant vs unpleasant, do the same for White American names, and see which groups of people are associated with which stereotypes.”
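The procedure Caliskan describes can be sketched in a few lines. The code below is a toy illustration, not the study's actual implementation (the published test also computes effect sizes and significance over real GloVe embeddings); the function names and the two-dimensional vectors in the usage note are invented for clarity.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors:
    # the "mathematical distance" standing in for human reaction time.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mean_similarity(word, attributes):
    # Mean cosine similarity of one word vector to a set of attribute vectors.
    return sum(cosine(word, a) for a in attributes) / len(attributes)

def group_bias(targets, pleasant, unpleasant):
    # Positive score: the target group sits closer to the "pleasant" terms;
    # negative: closer to the "unpleasant" terms.
    return sum(mean_similarity(t, pleasant) - mean_similarity(t, unpleasant)
               for t in targets) / len(targets)
```

With toy vectors, a group of target words whose vectors cluster near the “pleasant” attribute vectors scores positive, and one clustering near the “unpleasant” vectors scores negative; the gap between the two groups’ scores is the measured bias.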

Caliskan’s team used a comparison algorithm, Stanford University’s Global Vectors for Word Representation (GloVe). The algorithm learns and documents linguistic connections between words using “a single scalar that quantifies the relatedness of two words.”

From the abstract again:

“Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.”

◊ ◊ ◊

The study’s results, which found that names associated with black people were perceived as more unpleasant than names associated with white people, and that female names resonated more with family-related words than with careers and professions, have implications wider than the results themselves.

An algorithm is, at bottom, nothing more than a series of steps or procedures intended to solve a problem. It’s ironic, then, when algorithms come under scrutiny for problems they play a role in creating.

David Oppenheimer, a professor of discrimination law at the University of California, Berkeley, presaged the new study more than a year and a half earlier, noting that such research could yield results that dovetail with the garbage-in, garbage-out ethos of modern computing. “Even if [algorithms] are not designed with the intent of discriminating against those groups, if they reproduce social preferences even in a completely rational way, they also reproduce those forms of discrimination,” Oppenheimer told The New York Times in July 2015.

◊ ◊ ◊

“In AI and machine learning,” the new study says, “bias refers generally to prior information, a necessary prerequisite for intelligent action.” One example would be the pesky autocomplete features on the Bing and Google search engines, the ones that try to fill in the search term you’re looking for as you type, based on what you’ve searched for before and what the engine thinks you might be looking for now.

“Yet,” the study goes on, “bias can be problematic where such information is derived from aspects of human culture known to lead to harmful behavior.” A classic example might be found with that same autocomplete feature. For example, I did a Google search for this story, one that began with the words “are black people.” The phrase was instantly autocompleted as “are black people smart.”

You have to wonder what algorithm spontaneously came up with that insult by implication — the same kind of thing that surfaced when I input the phrase “are white people,” only to have it autocompleted by Google as “are white people born with tails.” Algorithmic bias clearly has the potential to be an equal opportunity offender. Even though, according to various studies, it’s usually not.

◊ ◊ ◊

In June 2015, Google faced public ire after its Photos application mistakenly tagged black people as gorillas. The company did a high-profile mea culpa on the obligatory All Apologies Tour. “We’re appalled and genuinely sorry that this happened,” Google spokeswoman Katie Watson said in a statement to BBC News. “We are taking immediate action to prevent this type of result from appearing. There is still clearly a lot of work to do with automatic image labeling, and we’re looking at how we can prevent these types of mistakes from happening in the future.”

There’s another, earlier example of algorithmic bias, also reported on Science’s website. In a January 2013 study, Harvard government and technology professor Latanya Sweeney examined Google AdSense ads that surfaced during searches of names associated with white babies (Geoffrey, Jill and Emma) and names associated with black babies (DeShawn, Darnell and Jermaine). Sweeney, who directs Harvard’s Data Privacy Lab, found that AdSense ads containing the word “arrest” were shown next to more than 80 percent of name searches linked to African Americans but fewer than 30 percent of name searches linked to whites.

There’s evidence to suggest this is algorithmic self-fulfilling prophecy: Interviewed in March 2016 by NPR’s Laura Sydell, Christian Sandvig, a professor at University of Michigan's School of Information, said that “[b]ecause people tended to click on the ad topic that suggested that that person had been arrested, when the name was African-American, the algorithm learned the racism of the search users and then reinforced it by showing that more often.”

◊ ◊ ◊

This kind of algorithmic misperception isn’t limited to race. A June 2016 Bloomberg story reported on how the AI community has a “sea of dudes” problem: the overwhelming majority of AI researchers are men, a cohort that, by coincidence or by design, shapes the foundational training data as much as any racial skew does.

Bloomberg reported that at the 2015 NIPS (Neural Information Processing Systems) Conference in Montreal, a major event in the AI field, only 13.7 percent of those attending were women. 

Fei-Fei Li isn’t surprised. When Bloomberg interviewed her last June, the director of Stanford University’s Artificial Intelligence Lab was the only woman in the lab and one of only five women among Stanford’s computer science professors.

◊ ◊ ◊

“If you were a computer and read all the AI articles and extracted out the names that are quoted, I guarantee you that women rarely show up,” Li said. “For every woman who has been quoted about AI technology, there are a hundred more times men were quoted.”

Margaret Burnett, a professor at the School of Electrical Engineering and Computer Science at Oregon State University, told Bloomberg last June that “[f]rom a machine learning perspective, if you don't think about gender inclusiveness, then oftentimes the inferences that get made are biased towards the majority group—in this case, affluent white males.”  

Another observation from Burnett should be defining for AI’s algorithmic evolution — and a warning for AI developers as they broaden their perspectives on race and gender ... or fail to: “If un-diverse stuff goes in, then closed-minded, inside-the-box, not-very-good results come out.”

Michael Eric Ross

Author and journalist Michael Eric Ross contributes to Medium. His writing has also appeared in PopMatters, The New York Times, Wired, Entertainment Weekly, Salon, The Root, msnbc.com, BuzzFeed and other publications.
