I played with Mantella for a bit, running a 24B local model on a 3090 via koboldcpp with an 8192-token context to save VRAM. It's surprisingly fast, and the slowest part is the TTS generation, so there are still awkward silences. The issue is that once I open Skyrim (unmodded except for Mantella and its requirements), I cap out the full 24 GB of VRAM. It's completely playable, though, and I wasn't expecting it to be so fast. Since replies are capped at two sentences at most, and the small context window is supplemented by summaries instead of the entire history, it doesn't really slow down over time.

The model I used is a roleplay finetune of Mistral 24B called Cydonia. It surprisingly knows Elder Scrolls lore and locations pretty well, but it has trouble separating the different games and when they take place: if you bring up Morrowind, it doesn't talk about Morrowind in the 4th Era, it talks about the 3rd Era. It also slips into describing actions, which the mod is supposed to auto-strip, but LLMs work by looking at the chat history and copying its patterns, so if you get bad token RNG and it doesn't wrap a description in * then the NPC will say the actions out loud.
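For anyone curious how the summaries-instead-of-full-history trick keeps things fast, here's a minimal sketch of the general idea (not Mantella's actual code; the function and names are my own): only the last few turns go into the prompt verbatim, and everything older is replaced by a running summary string.

```python
# Hypothetical sketch of summary-based context management, NOT Mantella's
# real implementation: the prompt holds a running summary plus only the
# most recent exchanges, so it never grows past a fixed size.

MAX_TURNS = 6  # how many recent exchanges to keep verbatim (made-up value)

def build_prompt(system_prompt, summary, history, user_message):
    """Assemble a small prompt from a summary plus the newest turns."""
    recent = history[-MAX_TURNS:]  # drop older turns; they live in `summary`
    lines = [system_prompt]
    if summary:
        lines.append(f"Summary of earlier conversation: {summary}")
    for speaker, text in recent:
        lines.append(f"{speaker}: {text}")
    lines.append(f"Player: {user_message}")
    return "\n".join(lines)
```

Because the prompt size is bounded, prompt processing time stays roughly constant no matter how long you've been talking, which matches why it doesn't slow down over time.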
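The action-stripping failure mode makes sense once you see how that kind of filter has to work. A sketch of the general approach (again hypothetical, not the mod's actual code): a regex removes anything wrapped in asterisks before the line goes to TTS, which means an action the model forgets to wrap simply can't be matched and leaks into the spoken dialogue.

```python
import re

# Hypothetical sketch of asterisk-based action stripping (my own function
# name, not the mod's real code): remove *narration* spans from a reply,
# e.g. "*draws sword* You shouldn't be here." -> "You shouldn't be here."
ACTION_RE = re.compile(r"\*[^*]*\*")

def strip_actions(reply: str) -> str:
    cleaned = ACTION_RE.sub("", reply)        # drop *action* spans
    return re.sub(r"\s{2,}", " ", cleaned).strip()  # tidy leftover spacing
```

If the model emits `draws sword. You shouldn't be here.` with no asterisks, the regex matches nothing, so the whole string goes to TTS and the NPC narrates their own actions.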