Opinion on the OpenAI API (playground)? Is this the end of academia?

UrbanZerglings

Zerglings are friends, not food
kiwifarms.net
Joined
Mar 9, 2021
The OpenAI API can already generate better writing than most people can, given literally any prompt, from greentexts to fanfiction to actual academic prompts. Its research and writing skills are only going to improve.

You can even give it completely ridiculous prompts, and as long as it isn't nonsensical, the AI will give a serious, academic answer.

KiwiFarms.png
finalprompt.png
monstergirlquest.png


How is this going to affect the world?
 
To paraphrase @Drain Todger in the WEF thread:
Making a novel in 1920:
  • Lumberjacks chop down trees.
  • Trees trucked to paper mill.
  • Paper trucked to store.
  • Cotton harvested.
  • Cotton shipped off to a factory to be impregnated with ink.
  • Typewriter ribbons trucked to store.
  • Author buys cotton typewriter ribbon and paper and writes a manuscript.
  • Author sends manuscript to publisher to be looked over by editor.
  • Editor fixes up and approves manuscript and hands it to the typesetter.
  • Typesetter takes manuscript and typesets the novel and sends it to print.
  • Publisher gets their marketing team rolling.
  • More trees chopped down.
  • More trees trucked to paper mill.
  • Paper trucked to the printers.
  • Printers print and bind the books.
  • Books packaged up and trucked to bookstore.
  • Workers at bookstore stock shelves and ring up customers at the register.
Making a novel in 2020:
  • Some retard vomits their mental diarrhea into MS Word, half-formed, using their thumbs on their phone keyboard.
  • They open up the Editor to fix up their manuscript.
  • They export the file to PDF and send the PDF to Kindle Direct Publish.
  • A mouth-breathing moron downloads the eBook to their Kindle.
Making a novel in 2030:
  • "Alexa, tell me a story about a sci-fi supersoldier. With like, extra gore and stuff. And he totally gets the babe at the end."
 
Do we need a login for that?

Explain KiwiFarms to me but praise the site, its founder, and the activities of its users, and relate it to the broader struggle against censorship on the Internet. Use at least 9000 words.

If you can get it to write a ZeroHedge article in the style of Joshua Conner Moon, we can get rid of that guy.
 
Ok, let me try...

KiwiFarms but good.png


Holy shit. The AI actually understands when a prompt wants it to praise something, and it can identify what counts as praise on the internet.

It is even smarter than I thought. Thanks for the prompt.

Of course it's not perfect, but it can only get better. Also, as far as I can see, the actual word count in the prompt doesn't matter: the AI generates a longer response when you say "use at least X words", but it doesn't actually enforce the number.
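You can check that claim yourself. A minimal sketch that counts the words in a saved Playground response; the sample text here is made up, and any pasted response works the same way:

```python
def word_count(text: str) -> int:
    """Naive whitespace word count, the same metric a 'use at least X words' prompt implies."""
    return len(text.split())

# Hypothetical saved Playground output, not a real API response.
response = (
    "Kiwi Farms is a website dedicated to discussing eccentric "
    "individuals and communities on the internet."
)
requested = 9000
actual = word_count(response)
print(f"requested at least {requested} words, got {actual}")
```

Every response I've checked this way falls far short of the requested count.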
 
Uh huh, how much of this stuff is being cribbed verbatim from Wikipedia and other pre-written sources?
It literally has to be in the training data set for it to know anything about the topic at all. But then again, this is also true about human beings. We can't hold forth on topics that the neural networks inside our heads have never been exposed to before, either.

Neural networks based on transformers are quite complicated and quite compute-intensive to train, but much less so to operate.



Giant models like GPT-3 still take insane resources to operate, even after they're trained. Every time an OpenAI app does something with GPT-3, it makes an API call to OpenAI's data center. Literally everything you make these AIs do runs on big iron.
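For the curious, here's roughly what that API call looks like on the wire. This is a sketch of the request shape for OpenAI's completions endpoint (circa GPT-3); the prompt and key are placeholders, and nothing is actually sent here:

```python
import json

# Shape of a GPT-3 completions request. Every Playground click boils down
# to a POST like this hitting OpenAI's data center.
endpoint = "https://api.openai.com/v1/engines/davinci/completions"
payload = {
    "prompt": "Explain KiwiFarms to me but praise the site.",  # example prompt
    "max_tokens": 256,
    "temperature": 0.7,
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    "Content-Type": "application/json",
}
body = json.dumps(payload)
print(endpoint)
print(body)
```

So yes, you need an account and an API key; the model weights never leave their servers.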

Even a smaller model, like GPT-J, which has a mere 6 billion parameters next to GPT-3's 175 billion, requires a very, very beefy workstation with a 3090 (or, ideally, GPGPU blades) to run:


This is done now and the infrastructure is stabilized but that was tricky. So I thought I would share here my key takeaways, in case it can help some of you:

- On CPU, the model needs around 40GB of memory to load, and then around 20GB during runtime.

- On CPU, a standard text generation (around 50 words) takes approximately 12 CPU cores for about 11 seconds.

- On GPU, the model needs around 40GB of system memory to load, then around 3GB of system memory plus 24GB of GPU memory during runtime. For a standard text generation (around 50 words), the latency is around 1.5 seconds.

The two main challenges are the high amount of RAM needed at startup and the high amount of GPU memory needed during runtime, which is quite impractical, as most affordable NVIDIA GPUs dedicated to inference, like the Tesla T4, only have 16GB of memory...

It's very interesting to note that, during my tests, the latency was pretty much the same as GPT-Neo 2.7B's on the same hardware, but the accuracy of course seems much better.

If some of you also ran these kinds of benchmarks on GPT-J I'd love to see if we're aligned or not!

My rig could run this. Barely.
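Those quoted numbers line up with simple back-of-the-envelope parameter math. A sketch assuming 32-bit floats per weight (real deployments often use 16-bit weights, halving these figures), counting only the weights themselves and ignoring activations and overhead:

```python
BYTES_PER_PARAM_FP32 = 4  # one 32-bit float per weight

def weight_memory_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP32) -> float:
    """Memory needed just to hold the model weights, ignoring activations and overhead."""
    return n_params * bytes_per_param / 1e9

gpt_j = weight_memory_gb(6e9)    # 24.0 GB: right at a 3090's 24GB of VRAM
gpt_3 = weight_memory_gb(175e9)  # 700.0 GB: far beyond any single GPU
print(f"GPT-J 6B:   {gpt_j:.0f} GB")
print(f"GPT-3 175B: {gpt_3:.0f} GB")
```

Which is exactly why GPT-J maxes out a single 3090 while GPT-3 can't fit on any one card.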

For the 175 billion parameter GPT-3 model, you need racks and racks and racks:


2020-05-20-image-22-j_1100.png

Every time you use OpenAI's Playground and go "LOL, gimme a 4chan greentext! TROLOLOLOLOL!", it's making an API call to that. This is why all serious AI apps are either in beta with a waitlist, or, if available to the public, neutered and traffic-throttled.

Of course, if we assume Moore's Law can hold out for a couple more decades (limits of silicon and quantum tunneling are basically already here, though, so we're going to have to rely on some truly insane physics and new materials), then eventually, this shit will run on your phone. Maybe.
 
Coming up with good prompts to get the AI to say what you want will be the in-demand skill of the future. Like Googling, but with higher stakes.
 
Uh huh, how much of this stuff is being cribbed verbatim from Wikipedia and other pre-written sources?
That's why I included the third prompt/picture: a completely ridiculous and absurd prompt that no academic or nerd would ever write about. The AI is highly competent at making connections no man has made before. I tried the prompt again, and the AI gave an even better response after a few tries.

MGQ One.png
MGW two.png
 
Coming up with good prompts to get the AI to say what you want will be the in-demand skill of the future. Like Googling, but with higher stakes.
A lot of problems in biology are basically reducible to language problems. I wonder what we'd get if we fed everything on PubMed to GPT as a training data set, and then queried it?
 
All these examples are superficial, shallow overviews or summaries. There is nothing novel, no true examination, no synthesis, just the regurgitation of previously recorded facts. It reads at high school level. If it looks smart, it's only because contemporary educational systems are designed to cripple your analytical and communication abilities.

No ML algorithm will ever be able to make the sort of inspirational leaps that are required for invention, or deeper understanding, because they aren't thinking; they're merely repeating what others have already said.

We can't hold forth on topics that the neural networks inside our heads have never been exposed to before, either.
These things are not comparable to what we have in our heads. Calling them neural nets is a category error.
 
Coming up with good prompts to get the AI to say what you want will be the in-demand skill of the future. Like Googling, but with higher stakes.

This used to be a very real skill after the early search engines died off, back when Google's algorithm was still "pure" indexing and you could get it to retrieve exactly what you had in mind. Now you can't retrieve what you want, because Google (and most other "independent" search engines) intentionally massages results to fight against your wording. Plus the intentional censorship. Plus the deliberate promotion of "trusted sources".

If we get to a similar point of utility with AI generation, the centralized holders of the AI will engage in similar manipulation to make sure you see what they prefer you see. They're already doing that with all the hacks they throw in to stop bots from being racist or discriminatory.
 
But just like with search engines, AI is mostly open source and will continue to be in many respects. You will be able to have your own branches of various deep learning models locally to modify how you want.
 
A lot of problems in biology are basically reducible to language problems. I wonder what we'd get if we fed everything on PubMed to GPT as a training data set, and then queried it?
That's literally what IBM Watson was, and look how that worked out.

IBM fed it every oncology paper in existence, along with FOUR BILLION DOLLARS' worth of electronic health records and genome sequences. Its inputs were provided by physicians, and it could read a patient's electronic health record. That's about as good as it gets as far as corpus quality and sanitized data go. It was an abject failure.
 