MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs - Tay 3.0


Arm Pit Cream


MIT has taken offline its highly cited dataset that trained AI systems to potentially describe people using racist, misogynistic, and other derogatory terms.
The training set, built by the university, has been used to teach machine-learning models to automatically identify and list the people and objects depicted in still images. For example, if you show one of these systems a photo of a park, it might tell you about the children, adults, pets, picnic spreads, grass, and trees present in the snap. Thanks to MIT's cavalier approach when assembling its training set, though, these systems may also label women as whores or bitches, and Black and Asian people with derogatory language. The database also contained close-up pictures of female genitalia labeled with the C-word.

[Image: graph.jpg]

[Image: screenshot_mit_dataset_small.jpg]


The key problem is that the dataset includes, for example, pictures of Black people and monkeys labeled with the N-word; women in bikinis, or holding their children, labeled whores; parts of the anatomy labeled with crude terms; and so on – needlessly linking everyday imagery to slurs and offensive language, and baking prejudice and bias into future AI models.
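For anyone wondering how the label text ends up in a model at all: a classifier trained on a dataset like this just learns to reproduce whatever label strings were attached to the training images, so any slur in the label set can come out verbatim at prediction time. A minimal sketch of that idea (not MIT's code; the toy data, label strings, and nearest-neighbour "model" below are all made up for illustration):

```python
# Minimal sketch: label strings flow straight from the training set into the
# model's output. Everything here is illustrative stand-in data.
import numpy as np

rng = np.random.default_rng(0)

# Pretend "dataset": 32x32 RGB images flattened to vectors, each paired with
# whatever noun string the collectors attached to it. If that string is a
# slur, the classifier will happily emit it at inference time.
train_images = rng.random((100, 32 * 32 * 3))        # stand-in pixel data
train_labels = (["tree", "dog", "picnic_table"] * 34)[:100]  # labels as scraped

def predict(query_image: np.ndarray) -> str:
    """Toy 1-nearest-neighbour 'model': return the label of the closest training image."""
    dists = np.linalg.norm(train_images - query_image, axis=1)
    return train_labels[int(np.argmin(dists))]

# Whatever label text was nearest in the training set is what gets printed.
print(predict(rng.random(32 * 32 * 3)))
```

The point is only that the model has no notion of which labels are acceptable; it repeats whatever vocabulary the dataset was annotated with.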
---------------
Tay lives on
[Image: screen-shot-2016-03-24-at-10-04-06-am.png]
 
Emotionless AIs that take a cold, hard look at the raw data of our world start calling black people niggers.

Really makes you think!

They just gotta get rid of that dataset and replace it with the "correct" dataset that doesn't offend anyone.
 
Is this related to that thing that makes Google Translate interpret "ooga booga" as real words?
 
I might just be retarded, but for some reason I cannot for the life of me figure out what c****e is. Anyone got any ideas?
The only thing I can think of is "cuntie" (like a portmanteau of "cunt" and "sweetie"), but that's probably not it.
 
Speaking of this seriously, the proportion of "offensive" images is smaller than the proportion of people in the populace with legitimate gender dysphoria, so there is essentially zero chance this will affect the learned system.
I wouldn't be surprised if the dataset itself was targeted, either by some tranny who didn't like the score he got, or as part of a general push to make "true" descriptions that are 100% politically motivated.
 