Law ChatGPT provides false information about people, and OpenAI can’t correct it


In the EU, the GDPR requires that information about individuals is accurate and that they have full access to the information stored, as well as information about the source. Surprisingly, however, OpenAI openly admits that it is unable to correct incorrect information on ChatGPT. Furthermore, the company cannot say where the data comes from or what data ChatGPT stores about individual people. The company is well aware of this problem, but doesn’t seem to care. Instead, OpenAI simply argues that “factual accuracy in large language models remains an area of active research”. Therefore, noyb today filed a complaint against OpenAI with the Austrian DPA.


ChatGPT keeps hallucinating - and not even OpenAI can stop it. The launch of ChatGPT in November 2022 triggered an unprecedented AI hype. People started using the chatbot for all sorts of purposes, including research tasks. The problem is that, according to OpenAI itself, the application only generates “responses to user requests by predicting the next most likely words that might appear in response to each prompt”. In other words: While the company has extensive training data, there is currently no way to guarantee that ChatGPT is actually showing users factually correct information. On the contrary, generative AI tools are known to regularly “hallucinate”, meaning they simply make up answers.
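To make the "predicting the next most likely words" point concrete, here is a toy Python sketch (a hand-made probability table, not OpenAI's actual system). The point is that the sampling step only knows which continuation is statistically likely; nothing in it checks whether the output is true:

```python
import random

# Toy "language model": for a given context, a table of candidate next words
# and their probabilities. Real LLMs learn billions of weights from training
# text, but the generation step is the same in spirit: pick a likely word.
next_word_probs = {
    ("was", "born", "on"): {"12": 0.40, "27": 0.35, "3": 0.25},
}

def sample_next(context):
    """Pick the next word by probability -- 'plausible', not 'verified'."""
    table = next_word_probs[context]
    return random.choices(list(table.keys()), weights=list(table.values()), k=1)[0]

# Whatever comes out is just a statistically likely continuation of
# "... was born on"; nothing here consults a birth register.
print(sample_next(("was", "born", "on")))
```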

Okay for homework, but not for data on individuals. While inaccurate information may be tolerable when a student uses ChatGPT to help them with their homework, it is unacceptable when it comes to information about individuals. Since 1995, EU law has required that personal data be accurate. Currently, this is enshrined in Article 5 GDPR. Individuals also have a right to rectification under Article 16 GDPR if data is inaccurate, and can request that false information be deleted. In addition, under the “right to access” in Article 15, companies must be able to show which data they hold on individuals and what the sources are.

Maartje de Graaf, data protection lawyer at noyb: “Making up false information is quite problematic in itself. But when it comes to false information about individuals, there can be serious consequences. It’s clear that companies are currently unable to make chatbots like ChatGPT comply with EU law, when processing data about individuals. If a system cannot produce accurate and transparent results, it cannot be used to generate data about individuals. The technology has to follow the legal requirements, not the other way around.”

Simply making up data about individuals is not an option. This is very much a structural problem. According to a recent New York Times report, “chatbots invent information at least 3 percent of the time – and as high as 27 percent”. To illustrate this issue, we can take a look at the complainant (a public figure) in our case against OpenAI. When asked about his birthday, ChatGPT repeatedly provided incorrect information instead of telling users that it doesn’t have the necessary data.

No GDPR rights for individuals captured by ChatGPT? Despite the fact that the complainant’s date of birth provided by ChatGPT is incorrect, OpenAI refused his request to rectify or erase the data, arguing that it wasn’t possible to correct data. OpenAI says it can filter or block data on certain prompts (such as the name of the complainant), but only by preventing ChatGPT from displaying any information about the complainant at all, rather than correcting the inaccurate data. OpenAI also failed to adequately respond to the complainant’s access request. Although the GDPR gives users the right to ask companies for a copy of all personal data that is processed about them, OpenAI failed to disclose any information about the data processed, its sources or recipients.
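To illustrate why such filtering is all-or-nothing: the wrong birthday is not a database record that can be edited, it is an artefact of the model's weights, so the only lever a prompt- or output-level filter has is to suppress the person entirely. A minimal hypothetical sketch (not OpenAI's actual moderation layer; the name and the reply are placeholders):

```python
# Hypothetical block list entry standing in for the complainant's name.
BLOCKED_NAMES = {"jane doe"}

def filtered_answer(prompt: str, model_reply: str) -> str:
    # A filter like this can only suppress whole responses that mention the
    # blocked name; it cannot reach into the model and fix one wrong fact.
    text = (prompt + " " + model_reply).lower()
    if any(name in text for name in BLOCKED_NAMES):
        return "No information available about this person."
    return model_reply

print(filtered_answer("When was Jane Doe born?",
                      "Jane Doe was born on 27 June 1985."))  # made-up reply
```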

Maartje de Graaf, data protection lawyer at noyb: “The obligation to comply with access requests applies to all companies. It is clearly possible to keep records of the training data that was used, so as to at least have an idea about the sources of information. It seems that with each ‘innovation’, another group of companies thinks that its products don’t have to comply with the law.”
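As a rough sketch of the record-keeping de Graaf describes, a hypothetical ingestion pipeline could log one provenance row per training document. This only tracks sources at corpus level, far short of explaining any individual generated claim, but it is the minimum needed to answer "where does your data come from?":

```python
import csv
from datetime import date

# Hypothetical provenance log: one row per ingested training document.
FIELDS = ["doc_id", "source_url", "licence", "ingested_on"]

def log_training_source(writer, doc_id, url, licence):
    writer.writerow({
        "doc_id": doc_id,
        "source_url": url,
        "licence": licence,
        "ingested_on": date.today().isoformat(),
    })

with open("training_provenance.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    log_training_source(writer, "doc-000001", "https://example.org/article", "CC-BY")
```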

So far fruitless efforts by the supervisory authorities. Since the sudden rise in popularity of ChatGPT, generative AI tools have quickly come under the scrutiny of European privacy watchdogs. Among others, the Italian DPA addressed the chatbot’s inaccuracy when it imposed a temporary restriction on data processing in March 2023. A few weeks later, the European Data Protection Board (EDPB) set up a task force on ChatGPT to coordinate national efforts. It remains to be seen where this will lead. For now, OpenAI seems to not even pretend that it can comply with the EU’s GDPR.

Complaint filed. noyb is now asking the Austrian data protection authority (DSB) to investigate OpenAI’s data processing and the measures taken to ensure the accuracy of personal data processed in the context of the company’s large language models. Furthermore, we ask the DSB to order OpenAI to comply with the complainant’s access request and to bring its processing in line with the GDPR. Last but not least, noyb requests the authority to impose a fine to ensure future compliance. It is likely that this case will be dealt with via EU cooperation.




Looks like its ability to come up with bullshit is getting problematic.
 
So the AI has learned, "I made it the fuck up."

Also, this unfortunately means that companies are doing everything they can to know you. So they can not only sell to you even better but ultimately turn you into their personal niggercattle if they really hate getting wrong information on people.
 
I don't believe that AI, as we'll have it, will ever be anything more than a glorified search engine. Given how companies want to have their morality police work alongside the development of the AI, I wouldn't be surprised if there are some "absolute truths" installed in its brain (holocaust, white males evil, etc) and because it's most likely a bunch of pajeets making $3.00/hr, it's not just making shit up, but zealously standing by what it made up as an absolute truth. It doesn't need to make stuff up, it knows it's right as fiercely as a first-year psychology student does. Now sure you can say I'm splitting hairs over the difference of "I made it the fuck up" and "I know I'm right!" But it's a machine, and since Tay, I doubt the big companies allow any sort of public discourse to affect it's learning; and in a human sense, yes you're wrong and making shit up, but when you're that hard coded to ignore people who say otherwise, I'd like to be generous and say the machine isn't wrong, the idiots behind it are.
 
"Black box continues to be black box, advocacy groups baffled."
Having an LLM spit out "I don't know" to every query it can't 100% perfectly verify (as if it had the metacognition to verify information at all) would not only result in a worse product, it'd be very, very annoying.
I've wasted enough time trying non-working code to say that I'd rather it say it doesn't know than fart something up.
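For what it's worth, the closest thing in practice to an LLM saying "I don't know" is a wrapper that refuses when the model's own token probabilities are low. That is a crude heuristic, not real metacognition, which is why it is both unreliable and annoying. A minimal sketch, assuming an API that exposes per-token log-probabilities:

```python
def answer_or_abstain(tokens_with_logprobs, threshold=-1.0):
    """tokens_with_logprobs: list of (token, logprob) pairs, as some APIs return.
    Refuse to answer when the average log-probability is below the threshold."""
    avg = sum(lp for _, lp in tokens_with_logprobs) / len(tokens_with_logprobs)
    if avg < threshold:
        return "I don't know."
    return "".join(tok for tok, _ in tokens_with_logprobs)

# A confidently hallucinated answer can still have high token probabilities,
# which is exactly why this proxy fails when it matters most.
print(answer_or_abstain([("27", -0.2), (" June", -0.3), (" 1985", -0.4)]))
```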
 
LLMs are fine for things that are routine, easy, low-stakes, non-specific and simple. For everything else, they fail.

I use them to give me random ideas and inspiration (out of dozens of suggestions, sometimes there's one that's acceptable), write filler texts, or write basic code using popular libraries I am not familiar with (or even libraries I'm familiar with but not to the point of writing basic boilerplate from scratch).

As for asking factual questions, a Google search works much better, the search engine is "smart" enough that it sometimes even guesses what I mean in a natural language question. Also, the "answer" is a list of actual websites that are owned by actual people and were published before I asked my question, so in theory there are actual people legally responsible for those answers. Which includes legal responsibility for information about other people. And removing a specific piece of information from Google's answers is trivial for Google to do.
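That last point is easy to see with a toy inverted index (nothing like Google's actual stack, just the principle): de-listing a page is a dictionary operation with no retraining involved, unlike a fact baked into model weights.

```python
# Toy inverted index: search term -> set of document IDs that mention it.
index = {
    "birthday": {"page_1", "page_2"},
    "lawsuit": {"page_2"},
}

def remove_document(idx, doc_id):
    # Removing a page is just deleting its entries from the index.
    for docs in idx.values():
        docs.discard(doc_id)

remove_document(index, "page_2")
print(index)  # {'birthday': {'page_1'}, 'lawsuit': set()}
```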
 
Is this only coming out now? CharacterAI, one of the more gimmicky ChatGPT uses as of late, has famously never been able to reliably grasp source material for its bots, even when the users directly input background history into the bot descriptions for it to reference. Granted, it operates on an earlier version of ChatGPT than the OpenAI site, but it stands to reason: if AI can't confidently summarize an anime plot, why would you trust it to generate actual research?
 
There's a path to AGI, but it's through simulation, not LLMs. LLMs are not cognizant in any way; they cannot process information, just gather it and basically average it out into a format that looks conversational but ultimately isn't. Given there are like twenty artificial life simulation researchers total, it's going to be a very, very long time before proper progress is made.
 
I've seen GPT3 fail to accurately respond to the question "how many days are in a week". It's a random bullshit generator and the retards who say it's going to revolutionize the world are the same ones who said Segway would revolutionize personal transportation.
Yeah, they don't really understand not answering a question, and that's almost baked into how they work. I asked mine what the next year is that X date falls on a Saturday and it gave me a year, sure. But it was wrong (a trivial question for ordinary code, as sketched below).

What about censorship, people ask - isn't that a type of getting the AI to not answer something? Well, tangentially, but it's mostly about censoring outputs or inputs rather than getting the AI itself to internally shut up.
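The date question above is the kind of thing a dozen lines of deterministic code answers exactly, every time, which is the point of the comparison. A sketch (the 4 July date is just an example):

```python
from datetime import date

def next_year_on_saturday(month, day, start_year):
    """Return the first year >= start_year in which month/day falls on a Saturday."""
    year = start_year
    while True:
        try:
            if date(year, month, day).weekday() == 5:  # Monday=0 ... Saturday=5
                return year
        except ValueError:
            pass  # e.g. 29 February in a non-leap year
        year += 1

print(next_year_on_saturday(7, 4, 2025))  # next year 4 July lands on a Saturday
```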
 
Members of the public and regulatory bodies are starting to realize what people who worked for "AI" companies in the past have always known: this stuff is always "just a few tweaks" from being useful.



Imagine you have an employee, Bob. Bob's a good guy. He's really productive, more so than the other members of his team. His attitude is unfailingly excellent, and he never hesitates to work late or take on additional work assignments if he's asked to.

Bob sounds like a heck of an employee. But he has one problem: he's untrustworthy.

Sure, he does tons of work. And if you ask him where he found something he used in a research report, he'll happily tell you. But other team members have been double checking and it turns out Bob's "sources" don't always exist.

The worst part is, even though 95% of the time he's using real sources, the times when he doesn't can be any time at all...even when it's really important to get it right. And because Bob is so confident and has no "tells" that indicate when he's making stuff up, the rest of the team has to double check all of his work to see which parts don't hold up.

Is Bob still a good employee? No, I don't think anyone would think that he is. All the eagerness and productivity in the world wouldn't matter if Bob were forcing the rest of the team to constantly follow up to see if he was telling the truth. All the speed advantages Bob brings to the team by creating more work product faster would be undone by his flagrant lack of honesty and care.

If a human acted like ChatGPT, no company would hesitate to fire them. No one would say Bob was the future of work, or that every company should be falling all over themselves to avail themselves of his talent.

LLMs are bunk. Always have been. Any "AI" that has no sense of the truth value of any statement will always need to be fact-checked from start to finish, and if you have to look up every detail, are you saving any time any more?
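A back-of-envelope version of the Bob problem, with all numbers made up purely for illustration: if drafting with the tool saves two hours but every claim still has to be checked because there are no "tells", the verification cost can swallow the saving entirely.

```python
# Hypothetical numbers: time saved drafting vs. time spent verifying every claim.
claims_per_report = 40
minutes_to_verify_one_claim = 5
minutes_saved_by_drafting_with_llm = 120  # vs. writing the report yourself

verification_cost = claims_per_report * minutes_to_verify_one_claim
net_saving = minutes_saved_by_drafting_with_llm - verification_cost

print(f"Verification cost: {verification_cost} min")  # 200 min
print(f"Net saving: {net_saving} min")                # -80 min, i.e. a net loss
```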
 
Oh, it's incredibly useful for making images and, soon, video. That's because images and video only have to be plausible (or "good enough"), not accurate.

But the LLM shit? It's being massively overblown, and, as you said, it should never be trusted without verification.
 
One thing I think it'll be great for in its current state:

NPC dialogue in video games. Why should those fuckers always say the exact same three things every time you walk by?

The great thing about using AI to do NPC dialogue is that sometimes it makes shit up, but that actually improves the realism if you're walking through a village. Twenty percent of what you hear is total bullshit you won't be able to find any source for? Sounds like better realism, not worse.

AI can much more easily and plausibly fake conversation between a few guys at a pub, or any other scenario where a certain amount of bullshitting and prevarication is expected, than it can anything with the baseline expectation of truthfulness.
 