I still believe we'll need a different approach/technology for unambiguous AGI, such as 3D neuromorphic chips running spiking neural networks and other architectures inspired by the brain.
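For readers who haven't met spiking networks: here's a minimal, purely illustrative sketch of a leaky integrate-and-fire neuron, the basic unit such chips implement in silicon. All constants are made up for the demo, not taken from any real hardware.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) neuron: the basic unit most
# spiking neural networks are built from. All constants are illustrative.
def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Return the membrane voltage trace and spike times for a current trace."""
    v = v_rest
    voltages, spikes = [], []
    for t, i_in in enumerate(input_current):
        # Membrane potential leaks toward rest while integrating the input.
        v += (dt / tau) * (v_rest - v) + i_in * dt
        if v >= v_thresh:          # threshold crossing -> emit a spike
            spikes.append(t * dt)
            v = v_reset            # hard reset after spiking
        voltages.append(v)
    return np.array(voltages), spikes

# Constant input strong enough to cross threshold -> regular spiking.
volt, spike_times = simulate_lif(np.full(200, 0.08))
print(f"{len(spike_times)} spikes, first at t={spike_times[0]:.0f} ms")
```

The information lives in *when* the neuron fires rather than in a continuous activation value, which is the core difference from the artificial neurons in today's LLMs.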
I think the amount of data is currently the biggest problem (all AI models suffer from a very incomplete inner world), but honestly, these are such uncertain times that it would be insane of me to even try to predict what is or is not needed.
Every time you think it's surely going to slow down now, there's another paradigm shift, optimization technique, or one weird trick somebody figures out. Also, IMO we're already in that "recursive self-improvement" stage sci-fi has warned us about. Think about it: the current SOTA models would not be possible without AI models curating and generating training data. AI is used to improve AI, and that accelerates things.
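To make that concrete, here's a toy sketch of the kind of loop I mean. `generate` and `judge` are hypothetical stand-ins for whatever model calls you'd actually wire in; the point is the shape: one model drafts data, another filters it, and the survivors train the next model.

```python
# Toy sketch of an "AI improves AI" data pipeline. `generate` and `judge`
# are hypothetical stand-ins for real model calls.
def curate_synthetic_data(prompts, generate, judge, threshold=0.8):
    kept = []
    for prompt in prompts:
        candidate = generate(prompt)      # teacher model drafts an example
        score = judge(prompt, candidate)  # grader model rates it 0.0-1.0
        if score >= threshold:            # keep only high-quality samples
            kept.append({"prompt": prompt, "response": candidate})
    return kept  # becomes training data for the *next* model generation
```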
I have a test suite of my own of sorts, not logical puzzles or anything like that, but regular user/assistant conversations about common things that are represented well in any dataset and easy to digest, like all-time classic movies, books, etc. The goal for the LLM is not to get a perfect score on a questionnaire, as that would make no sense. The entire point of these (unscientific) tests is the subjective vibe I get from the LLM in conversation. I wanna call it the human test. I'm a human.
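If you want to picture the harness, it's really nothing more than this sketch. `ask_model` is a hypothetical stand-in for any chat-completion call, and there's deliberately no scoring function anywhere; the transcript itself is the result.

```python
# Sketch of one "human test" run: a multi-turn chat about well-known media
# that I read and judge by vibe. `ask_model` is a hypothetical stand-in
# for any chat-completion call taking a message history.
def human_test(ask_model, opening, follow_ups):
    history = [{"role": "user", "content": opening}]
    for follow_up in follow_ups:
        reply = ask_model(history)  # model answers the conversation so far
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": follow_up})
    history.append({"role": "assistant", "content": ask_model(history)})
    return history  # no score on purpose; I read this and judge the vibe

# Example session: a classic everyone's training data has seen.
transcript = human_test(
    ask_model=lambda history: "...",  # plug in a real model call here
    opening="Let's talk about Casablanca. What makes Rick tick?",
    follow_ups=["Which specific scene supports that reading?",
                "Now argue the contrarian take: Rick never changes."],
)
```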
What has struck me over the last few months is the increase in depth of recall and analysis. Earlier models could grasp the main plot of some popular book or movie, but when pressed would often stumble, confabulating details about main characters or inventing scenes that never existed. The current generation has a different kind of stability and fidelity. They can recall specific scenes and character beats with clarity, even deploying them as evidence, e.g. to support a point about an underlying psychological theme. They also know when something *isn't* in the source material and don't fall for wrong information as easily.
I'm also observing intertextuality now. The models actively and readily recognize connections between disparate works, drawing accurate and insightful parallels based on theme, and bring them up by themselves, unprompted. It's the difference between knowing one movie or book and understanding the archetype: tracing the lineage of an idea across different stories and epochs. That's very interesting, because it demands a level of abstraction in conceptualizing that earlier models struggled with but that seems to come naturally to the current crop.
Perhaps the most significant shift is the "intellectual" flexibility I've seen in models in general over recent months; it's entirely new. They're no longer locked into a single, conservative, "statistically correct" take. They can take a weird, contrarian critique of, say, a movie and actually roll with it, exploring the argument from the inside out in reasonable ways. This ability to engage with abstract concepts and apply them as a universal lens looks very subtle in practice, but IMO it's quite a fundamental leap. It's less about reciting information and more about synthesizing it.
Yes, the mechanism is still next-token prediction, but it's become incredibly structured and optimized, to a degree that five years ago I would never have expected to see in the next twenty, which is exactly why I don't like making predictions anymore. There is a very "feelable" uplift in capability and reduction in brittleness that's just really *there*. Yes, you can still easily confuse a model into thinking the year is 42069 and sentient rabbits control the planet, but models are mentally more maneuverable even in that absurd scenario.
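Since I keep saying "it's still next-token prediction", here's that mechanism in miniature. The vocabulary and logits below are made up; in a real model the interesting part is how the logits get computed, not this sampling loop.

```python
import numpy as np

# Next-token prediction in miniature: the model emits one score (logit)
# per vocabulary token, softmax turns the scores into a distribution, and
# one token is sampled. Repeat, feeding the choice back in.
def sample_next_token(logits, temperature=0.8, rng=np.random.default_rng()):
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                   # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)   # index of the chosen token

# Toy vocabulary with made-up logits for the prompt "the year is".
vocab = ["2025", "42069", "now", "unknown"]
print(vocab[sample_next_token([3.1, 0.2, 1.5, 0.7])])
```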
Anthropic just last week released a paper that kinda touches on this, and it was a nice read; I felt a little less like I'm slowly going insane talking to graphics cards:
Emergent Introspective Awareness in Large Language Models. At first it may sound like what I describe and the paper are unrelated, but these things are in fact connected: a high-resolution internal latent structure plus this observed partial introspection leads to better conceptual models, which lead to better modular internal representations, and that makes the "flexible" synthesis and reasoning I observe more stable and more likely. Interesting times. I repeat myself, but if this all doesn't hit a wall, we will live in a very interesting world, very soon.
Thanks, once again, for reading my blog.
I know I just said they are absolutely killing it with their cars, but I didn't expect Western companies, especially in murrica, to be so against FOSS all of a sudden, while the Chinese embrace it so much that most of their models I know of are FOSS.
I think the Chinese are just very pragmatic people, while Western CEOs suffer from certain delusions of grandeur. A little less emotional melodrama and more pragmatism would help the West, I think. Too many decisions here are emotionally and ideologically charged.