It's been brought up before, but Tortoise is slowly getting there - in just a week it's seen more development on a Web UI like the one Stable Diffusion uses. It's still pretty rough - audio can be hit or miss, though I've had better luck creating a single sample that's a minute long and using that as opposed to multiple files. It can't handle walls of text; you have to break the input down to one or two sentences, and even then it may or may not give you what you want. It can be frustrating given how long clips take to generate, but some things I've fed through have been damn good clones.
Stable Diffusion has seen a huge leap in quality, so here's hoping Tortoise sees some development that starts pushing it in that direction as well.
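For anyone fighting the one-or-two-sentence limit mentioned above, here's a rough sketch of how you could pre-chop a long script before feeding it in piece by piece. It's plain Python and doesn't touch Tortoise at all; the function name `chunk_sentences` and the `max_sentences` parameter are made up for illustration, and the sentence splitting is deliberately naive (it will trip on abbreviations like "Dr." or "e.g.").

```python
import re


def chunk_sentences(text, max_sentences=2):
    """Split text into chunks of at most `max_sentences` sentences each.

    Hypothetical helper, not part of any TTS tool. Sentences are detected
    naively: a break happens after '.', '!' or '?' followed by whitespace.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in pieces if p.strip()]

    # Group consecutive sentences into chunks of the requested size.
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


if __name__ == "__main__":
    script = "First line here. Second line here! Is this the third? Yes it is."
    for n, chunk in enumerate(chunk_sentences(script), start=1):
        print(n, chunk)
```

Each chunk can then be pasted or passed in as its own generation, which also means one bad take only costs you a sentence or two instead of the whole block.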
This repo/rentry aims to serve as both a foolproof guide for setting up AI voice cloning tools for legitimate, local use on Windows/Linux, as well as a stepping stone for anons that genuinely want to play around with TorToiSe. Similar to my own findings for Stable Diffusion image generation, this...
rentry.org