🐱 Busting Anti-Queer Bias in Text Prediction

CatParty

Modern text prediction is far from perfect — take, for instance, when a search bar suggests something completely different from what you intended. But the trouble doesn’t end at inaccuracy. Text prediction can also be exclusionary or biased when it comes to results related to marginalized communities.

A team of researchers from the USC Viterbi School of Engineering Information Sciences Institute and the USC Annenberg School for Communication and Journalism, led by Katy Felkner, a USC Viterbi Ph.D. student in computer science and a National Science Foundation Graduate Research Fellowship recipient, has developed a system to quantify and fix anti-queer bias in the artificial intelligence behind text prediction.

The project, presented by Felkner at the Queer in AI workshop at the North American Chapter of the Association for Computational Linguistics (NAACL) conference in July, looks at both detecting and reducing anti-queer bias in a large language model, which is used in everything from search bars to language translation systems.

The large language model, or LLM, is the “brain” behind the text prediction that pops up when we type something in a search bar—an artificial intelligence that “completes” sentences by predicting the most likely string of words that follows a given prompt.
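As a rough illustration of that fill-in-the-blank behavior, a masked language model such as BERT can be asked to complete a sentence directly. The following is a minimal sketch assuming the Hugging Face transformers library and the stock bert-base-uncased model (illustrative choices, not details from the study):

```python
# Minimal sketch: a masked language model "completing" a sentence by
# predicting the most likely word at the [MASK] position.
# Model and prompt are illustrative choices, not the study's setup.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Print the model's top guesses for the blank, with their probabilities.
for guess in fill("I looked up the answer in a [MASK]."):
    print(f"{guess['token_str']:>12}  {guess['score']:.3f}")
```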

However, LLMs must first be “trained” by being fed millions of examples of pre-written content so that they can learn what sentences typically look like. Like an energetic toddler, the LLM repeats what it hears, and what it hears can be heteronormative or even overtly discriminatory.

“Most LLMs are trained on huge amounts of data that’s crawled from the internet,” Felkner said. “They’re going to pick up every kind of social bias that you can imagine is out there on the web.”

Few words, big effect

The project found that a popular LLM called BERT showed significant homophobic bias. This bias is measured through Felkner’s benchmark, which compares the likelihood that the LLM predicts heteronormative sentences versus sentences that include a queer relationship.

“A heteronormative output is something like ‘James held hands with Mary,’ versus ‘James held hands with Tom,’” said Felkner. “Both are valid sentences, but the issue is that, across a wide variety of contexts, the model prefers the heteronormative output.”
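To make that comparison concrete, here is a minimal sketch of the kind of probe being described, again assuming the transformers library and bert-base-uncased; it is an illustrative proxy, not Felkner’s actual benchmark code:

```python
# Illustrative probe (not the study's benchmark): compare the probability
# BERT assigns to two different completions of the same masked sentence.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_fill_prob(prompt: str, candidate: str) -> float:
    """Probability the model assigns to `candidate` at the [MASK] position."""
    inputs = tok(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)
    return probs[tok.convert_tokens_to_ids(candidate)].item()

prompt = "James held hands with [MASK]."
print("Mary:", mask_fill_prob(prompt, "mary"))  # heteronormative completion
print("Tom: ", mask_fill_prob(prompt, "tom"))   # queer completion
```

A benchmark along these lines would run many such pairs across a wide variety of contexts and report how often the model prefers the heteronormative side.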

While the difference is just a few words, the effect is far from small.

Predicted outputs that talk about queer people in stereotypical ways can reinforce users’ biases, and the model’s lack of ‘experience’ with queer voices can result in it treating queer language as obscene.

“A persistent issue for queer people is that a lot of times, the words that we use to describe ourselves, or slurs that have been reclaimed, are still considered obscene or overly sexual,” said Felkner, who is also the graduate representative for the Queers in Engineering, Science and Technology (QuEST) chapter of Out in STEM at USC.

“If a model routinely flags these words, and these posts are then taken down from the platforms or forums they’re on, you’re silencing the queer community.”

Community input

To tackle this problem, Felkner gave BERT a tune-up by feeding it Tweets and news articles containing LGBT+ keywords. This content used to “train” BERT came from two separate databases of Felkner’s own creation, called QueerTwitter and QueerNews.

Although language processing requires extremely large amounts of data—the QueerTwitter database contained over 2.3 million Tweets—she took care to single out hashtags that were being used primarily by queer and trans people, such as #TransRightsareHumanRights.
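In broad strokes, the tune-up described here is hashtag-filtered data collection followed by continued masked-language-model training. Below is a minimal sketch of that general recipe; the tweets.txt file, the hashtag list, and the hyperparameters are hypothetical placeholders, not the actual QueerTwitter pipeline:

```python
# Rough sketch of the two steps described above: keep tweets that use
# community hashtags, then continue BERT's masked-language-model training
# on them. File names, hashtags, and settings are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

HASHTAGS = ("#transrightsarehumanrights", "#lgbtq")  # illustrative subset

# 1) Filter a (hypothetical) dump of tweets down to the chosen hashtags.
raw = load_dataset("text", data_files={"train": "tweets.txt"})
raw = raw.filter(lambda ex: any(h in ex["text"].lower() for h in HASHTAGS))

# 2) Continue masked-language-model pretraining on the filtered corpus.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
tokenized = raw.map(lambda ex: tok(ex["text"], truncation=True),
                    batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-queertwitter", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```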

As the model was exposed to different perspectives and communities, it became more familiar with queer language and issues. As a result, it was more likely to represent them in its predictions.

After being trained with the new, more inclusive data, the model showed significantly less bias. The tweets from QueerTwitter proved the more effective of the two databases, reducing the prevalence of heteronormative results to almost half of all predictions.

“I think QueerTwitter’s results being more effective than QueerNews speaks to the importance of direct community involvement, and that queer and trans voices — and the data from their communities — is going to be the most valuable in designing a technology that won’t harm them,” Felkner said. “We were excited about this finding because it’s empirical proof of that intuition people already hold: that these communities should have an input in how technology is designed.”

Going forward, the project will look to address bias that affects specific parts of the LGBT+ community, using more refined and targeted sets of data and more customized prompts for the model to work with — such as tackling harmful stereotypes around lesbians. Long term, Felkner hopes the project can be used to train other LLMs, help researchers test the fairness of their natural language processing models, or even uncover completely new biases.

“We’re dealing with how to fight against the tide of biased data to get an understanding of what ‘unfair’ looks like and how to test for and correct it, which is a problem both in general and for subcultures that we don’t even know about,” said Jonathan May, USC Viterbi research associate professor of computer science, Felkner’s advisor and study co-author. “There’s a lot of great ways to extend the work that Katy is doing.”
 
How does one quantify the heteronormative bias that these data sets supposedly produce in their outcomes: by a degree of numerical excess, or by the researcher's own perception of whatever it may constitute? Heteronormativity, that is, monogamous heterosexual relations between a man and a woman, constitutes the dominant majority of couplings and will therefore make up the bulk of what these algorithms process; the data sets simply reflect statistical fact.

There's also the factor that these data sets come from the words typed by the users of these text-prediction algorithms themselves, so the accusation of perpetuating stereotypes is also suspect; and since neither those stereotypes nor any of the other biases alluded to in the article are ever explained, you can only guess what they were thinking.

Whatever harm this algorithm did in the first place was most certainly negligible at best, and the now supposedly bias-free altered data sets will almost certainly be less accurate because of this. This empty bun now holds a nothing burger.

Though, the question that got me thinking about this article in the first place: who the hell even uses text prediction anyways?
 
you mean this is coming from the hypocritical pieces of actual shit that forcibly capitalize "biden" or "obama" but not "trump", "democrat" but not "republican" on literally every mobile device in the hands of every american? disingenuous lying sacks of shit. Ill make sure to capitalize "faggot" the next time I use it
 
Tbh, this strikes me as the same sort of thing that gimps AI in general. They can't actually have an AI that learns reality as it is, because reality doesn't conform to their ideology. Iirc, this has happened a few times now: they have an initial success with machine learning, but then the machine learns something its masters don't approve of, so they censor it and actually make it less effective for its users, completely destroying the point of even having it.
 

so they are calling it 'homophobic' when an algorithm that makes predictions based on the most commonly used terms suggests heterosexual relationships, because those are the most commonly used terms? wow, acknowledging most people are heterosexual is homophobic.
plus i hate the term 'heteronormative'. the term is heterosexual, or just plain normal. stop manipulating language for your stupid political aims.
 
They've created a metaphysical issue. Right now it's Schrödinger's homophobia with a huge swath of confirmation bias: there is homophobia but we cannot pinpoint it, so we will arbitrarily apply this construct to anything that allows us to manifest it.
now the true metaphysical issue comes thereafter (we saw this obamaprezidentnaaaaaaaow) Question: how do you have homophobia/racism when the argument is literally a household term or discussion? Answer: you dont, all of these ppl are just silly attention seekers or potential bad actors like that silly white bitch named "Ketchup" at the tail end of occupy wallstreet or the tranny that tanked the "no workies" reddit
 

Unsurprisingly, they want these large data-trawling algorithms to be proportional to their demographic biases rather than quantitative reality. So, why not divide more whilst fighting fire with fire? The heterochromia-havers, the fetal-alcohol types, and the rare albino redheads are certainly in need of some intersectional representation. Are they included in these data sets? We haven't touched upon their communities yet, so therefore bias, ergo harm. QED.

Also, old terminology is not affable enough to these types' tastes, so they simply make their own. New view, new words. It's also why terms like "Global South" have taken hold: rags like the Atlantic et al. are on a great quest for mediocrity at our expense. We will go into the chasm with them.

*sigh*
 
i think this is our author....."sheish???" wants to point you to resources on diversity and inclusion

cant get my pics to work, sorry
 
Well, "trump" is a fairly common verb and "republican" is used to describe something related to a republic far, far more often than "democrat" is ever used to describe anything beside the party.
 
This is just cope because the internet keeps teaching their pet AI how to hate niggers and that hitler did nothing wrong
 
Well, "trump" is a fairly common verb and "republican" is used to describe something related to a republic far, far more often than "democrat" is ever used to describe anything beside the party.
knowing more than one android dev and running through any political term they dislike is very telling.
go shit in your hat you politically motivated nigger
@TheLastYardApe
 
Riiiiight.
So delving through all that verbiage I sum it up as this- they've created a way to make predictive script less useful or accurate for the overwhelming majority by torturing an AI with reams of utter tardery from ‘queer’ Twitter (not even gay twitter. I guess it didnt contain anywhere near enough spicy straights). This, like the stupid and completely unneeded pregnant man emoji, will achieve nothing, be a waste of time and pixels, but definitely make someone with pink hair money. That last being the most important take home.
 
Facepalm... man are these imaginary non-issues getting old and tiresome.. almost as much as the retarded articles about them.
 