ChatGPT, and most current AI models, rely on a giant dataset of labeled examples: humans tag the data so the model learns which concepts go together and what a good answer looks like. When you send a request to this kind of model, it generates a response based on the associations it learned from that labeled data. This stage is called supervised fine-tuning (SFT).
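Here's a rough sketch of what that looks like in code. This is a toy PyTorch example with made-up data and dimensions, just to show the pattern: the model is graded directly against human-provided labels.

```python
# A minimal sketch of supervised training on labeled pairs.
# The toy model, data, and dimensions are invented for illustration;
# real SFT fine-tunes a pretrained language model on (prompt, response) pairs.
import torch
import torch.nn as nn

# toy "labeled" dataset: each input vector is tagged with a correct class
inputs = torch.randn(32, 16)           # 32 examples, 16 features each
labels = torch.randint(0, 4, (32,))    # human-provided labels (the "tags")

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    logits = model(inputs)
    loss = loss_fn(logits, labels)     # penalize mismatch with the labeled answer
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```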
DeepSeek's reasoning model is not trained primarily on pre-labeled answers. Instead, the model generates its own attempts at a problem and gets a reward when the result looks correct. Those reward signals shape how it processes data in the future, including data it has never encountered before. This is called reinforcement learning (RL).
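And here's a rough sketch of the RL side: a toy REINFORCE-style loop, not DeepSeek's actual training code (DeepSeek-R1 uses a more elaborate RL scheme, GRPO, over generated text). The reward function here is invented for illustration. The key difference from the SFT sketch is that nobody hands the model the right answer; it picks something, gets scored, and rewarded choices become more likely.

```python
# A minimal REINFORCE-style sketch: the model samples an action, a reward
# function scores it, and the update pushes up the probability of rewarded behavior.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reward_fn(state, action):
    # hypothetical reward: +1 if the action follows some desired rule, else 0
    return 1.0 if action.item() == state.argmax().item() % 4 else 0.0

for step in range(200):
    state = torch.randn(16)
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                  # the model explores on its own
    reward = reward_fn(state, action)       # we only say how good the result was
    loss = -dist.log_prob(action) * reward  # reinforce rewarded choices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```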
The reason DeepSeek is a big deal is that it's one of the first major models whose reasoning ability comes primarily from RL rather than SFT. With SFT, the model can have blind spots: give it an input that doesn't resemble anything in its labeled training examples and it has no good pattern to fall back on. With RL, the model has learned a general way of behaving from its reinforcements rather than memorizing labeled mappings, so even when it hits data it hasn't seen before, it just goes ahead and processes it the way it does any other data.