Uh huh, how much of this stuff is being cribbed verbatim from Wikipedia and other pre-written sources?
It literally has to be in the training data set for it to know anything about the topic at all. But then again, this is also true of human beings. We can't hold forth on topics that the neural networks inside our heads have never been exposed to before, either.
Neural networks based on transformers are quite complicated and quite compute-intensive to train, but less so to operate:
Giant models like GPT-3 still take insane resources to operate, even after they're trained. Every time an OpenAI app does something with GPT-3, it makes an API call to OpenAI's data center. Literally everything you make these AIs do runs on big iron.
Even a smaller model, like GPT-J, which has a mere 6 billion parameters next to GPT-3's 175 billion, requires a very, very beefy workstation with a 3090 (or, ideally, GPGPU blades) to run:
This is done now and the infrastructure is stabilized, but it was tricky. So I thought I'd share my key takeaways here, in case they can help some of you:
- On CPU, the model needs around 40GB of memory to load, and then around 20GB during runtime.
- On CPU, a standard text generation (around 50 words) occupies around 12 CPU cores for about 11 seconds
- On a GPU, the model needs around 40GB of memory to load, then around 3GB of system RAM during runtime plus 24GB of GPU memory. For a standard text generation (around 50 words), the latency is around 1.5 seconds
The two main challenges are the high amount of RAM needed at startup, and then the high amount of GPU memory needed during runtime, which is quite impractical since most affordable NVIDIA GPUs dedicated to inference, like the Tesla T4, only have 16GB of memory...
It's very interesting to note that, during my tests, the latency was pretty much the same as GPT-Neo 2.7B on the same hardware, while the accuracy, of course, seems much better.
If some of you also ran these kinds of benchmarks on GPT-J I'd love to see if we're aligned or not!
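Those quoted numbers line up with simple back-of-envelope math: a model's raw weights alone take (parameter count) × (bytes per parameter). A minimal sketch (the byte counts assume standard fp32/fp16 precision; runtime overhead on top of the weights isn't counted):

```python
# Back-of-envelope: memory footprint of the raw weights alone.
# fp32 = 4 bytes per parameter, fp16 = 2 bytes per parameter.
def weights_gb(n_params: float, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 1e9

print(weights_gb(6e9, 4))    # GPT-J in fp32: 24.0 GB -- consistent with the ~24GB of GPU memory above
print(weights_gb(6e9, 2))    # GPT-J in fp16: 12.0 GB -- would just squeeze onto a 16GB T4
print(weights_gb(175e9, 2))  # GPT-3 in fp16: 350.0 GB -- nowhere near fitting on a single GPU
```

Which is why halving precision (fp32 → fp16) is the first trick everyone reaches for when trying to cram these models onto consumer cards.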
My rig could run this. Barely.
For the 175 billion parameter GPT-3 model, you need racks and racks and racks:
Every time you use OpenAI's Playground and go "LOL, gimme a 4chan greentext! TROLOLOLOLOL!", it's making an API call to that. This is why all serious AI apps are either in beta with a waitlist or, if available to the public, neutered and traffic-throttled.
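Concretely, every prompt becomes an HTTP POST to OpenAI's servers. A rough sketch of what that request looks like, without actually sending it (endpoint and fields follow OpenAI's classic text-completions API; the API key is a placeholder, and "davinci" was the engine name for the big model at the time):

```python
import json

# Build the request a GPT-3 app would fire at OpenAI's data center.
# Nothing here runs locally except string assembly -- the 175B-parameter
# inference happens entirely on their hardware.
def build_completion_request(prompt: str, max_tokens: int = 50) -> dict:
    return {
        "url": "https://api.openai.com/v1/completions",
        "headers": {
            "Authorization": "Bearer sk-PLACEHOLDER",  # your secret key goes here
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "davinci",
            "prompt": prompt,
            "max_tokens": max_tokens,
        }),
    }

req = build_completion_request("Write a greentext about")
print(req["url"])  # https://api.openai.com/v1/completions
```

Your machine does the easy part; the big iron does the rest.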
Of course, if we assume Moore's Law can hold out for a couple more decades (the limits of silicon and quantum tunneling are basically already here, though, so we're going to have to rely on some truly insane physics and new materials), then eventually this shit will run on your phone. Maybe.
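As a back-of-the-envelope check on that "maybe": assume a phone with 8GB of RAM (an assumption), fp16 GPT-3 weights at ~350GB (175 billion params × 2 bytes), and one capacity doubling every two years at Moore's-Law pace (also an assumption). Then:

```python
import math

phone_gb = 8    # assumed: a current flagship phone's RAM
model_gb = 350  # fp16 GPT-3 weights: 175e9 params * 2 bytes

# How many capacity doublings until the phone can hold the weights,
# and how long that takes at one doubling every ~2 years.
doublings = math.ceil(math.log2(model_gb / phone_gb))
years = doublings * 2

print(doublings, years)  # 6 doublings, ~12 years
```

Roughly a decade-plus out, and that's just to *hold* the weights, never mind running them at a usable speed.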