There's currently a lot of noise around BitNet (a 1.58-bit quantization method, basically ternary models) and banishing matrix multiplications from inference, which would make efficient, bespoke hardware for LLMs more viable and would be a complete paradigm shift in hardware requirements for even the largest models. It would even become pretty practical to run big LLMs on ordinary CPUs. It's too early to tell whether these approaches will actually lead anywhere, though. A 70B-parameter BitNet model would fit into ~20 GB of RAM. The downside with BitNet is that a model has to be trained for it from the get-go; it's not something you can apply to existing models. The authors of the original BitNet paper haven't released their models, but Nous managed to reproduce the results back in March.
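To make the "ternary" part concrete, here is a minimal sketch of absmean-style ternary quantization as described in the BitNet b1.58 paper (weights scaled by their mean absolute value, then rounded and clipped to {-1, 0, +1}), plus the back-of-the-envelope memory arithmetic behind the ~20 GB figure. The function name and shapes are illustrative, not from any released implementation.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Absmean ternary quantization: map each weight to -1, 0, or +1."""
    scale = np.abs(w).mean() + eps           # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -1, 1)  # round, then clip to ternary set
    return q.astype(np.int8), scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = ternary_quantize(w)
assert set(np.unique(q)).issubset({-1, 0, 1})

# Rough weight-memory estimate for a 70B-parameter ternary model:
# each weight carries log2(3) ≈ 1.58 bits of information.
params = 70e9
gb = params * np.log2(3) / 8 / 1e9
print(f"{gb:.1f} GB")  # weights alone; the ~20 GB figure adds overhead
```

Note that matmuls against ternary weights reduce to additions and subtractions (multiplying by -1, 0, or +1), which is where the claims about cheap, bespoke hardware come from.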