I've wondered about this, since it would certainly be killer for a big open world game if done right. It needs to run locally for latency and cost reasons (inb4 single player games require an Internet connection and a ChatGPT subscription), and TES VI is targeting the current-gen Xbox Series X (...and S with 10 GB RAM, RIP). I don't think the game will use LLMs or real-time voice synthesis. It will release late in this console generation, and somebody else will deliver those innovations 3-5 years later. Maybe they will fast-track a TES VI sequel and stuff it full of AI, creating the most complex and buggiest mess yet.
An alternative: cut out LLM-derived dialog and quests, but do voice synthesis, like how Morrowind's huge amount of text dialog is being fed into ElevenLabs to voice the game. I'm unsure whether any available algorithms can do that locally and in real time with sufficient quality yet. It could all be "pre-rendered" to cut voice acting costs, but then the audio still has to ship with the game, so you lose the advantage of shrinking the storage space needed for voice, which could be the key to having random characters with 1000 lines of dialog instead of 5.
In short, Todd Howard's "ultimate fantasy world simulator" needs another decade in the oven.