General Discussion for Virtual Youtubers / Vtubers / Chuubas - it's okay to be a simp for 2D, just don't thirstpost.

I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
The technology is still advancing. If I had to guess, most vtuber models are older-generation Live2D. There's also a wide range of skill among riggers in the industry, and I'd assume a lot of vtubers with custom models can't afford high-quality rigging.

I tried to find a clip that illustrated some of the newest models that have features like being able to lean in towards the camera.
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
High-quality models, and experience "acting" with them, do help mitigate those flaws.

I personally don't really focus on the model; it's just a stand-in, because watching raw gameplay or a blank screen with someone talking would be too impersonal, or boring in certain cases.
 
The technology is still advancing. If I had to guess, most vtuber models are older-generation Live2D. There's also a wide range of skill among riggers in the industry, and I'd assume a lot of vtubers with custom models can't afford high-quality rigging.

I tried to find a clip that illustrated some of the newest models that have features like being able to lean in towards the camera.
Looks like shit
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
It varies wildly from tuber to tuber due to the software that's used. Looking at Sakana's insane asylum, even comparing someone like Lumi with, say, Uruka shows a gigantic difference in quality.

Some people think it's bad QC; I personally like the differences. It at least makes things feel a little less corporate that way.
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
Rigging the art is a fairly new industry, and most riggers ask for pretty high sums for mediocre jobs due to the sheer demand.
The technology is still advancing. If I had to guess, most vtuber models are older-generation Live2D. There's also a wide range of skill among riggers in the industry, and I'd assume a lot of vtubers with custom models can't afford high-quality rigging.

I tried to find a clip that illustrated some of the newest models that have features like being able to lean in towards the camera.
The rigging displayed here is done by a pseudo-cow rigger/asset maker named Kevin (X / Twitter). He gets used because he's fast and relatively cheap compared to others, but as you can see, it looks kinda shit when they do certain things. He's probably one of the most prolific riggers in the western scene, but he often catches flak because most of his assets are low-brow and sexualized in some form or another. The big western vtubing company VShojo is a known frequent client of his; the freaky bug woman with big tits above is a member of said company.
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
There's a couple of reasons.
1) Tracking software generally either sucks dick or is very GPU-intensive, because most of it is either made by Scandinavian troons in their parents' basements or is proprietary Jap software that only Hololive gets to use. 3D tracking software is somehow even worse and more unstable than 2D, which leads to most vtubers only using 3D models for special events or chatting streams, which leads to:

2) Most vtubers use Live2D models, which are basically a bunch of drawing layers slapped together and rigged to look like they're 3D. Getting a non-shit artist to do all of the work for a model is expensive as fuck. Getting a good rigger is probably even more expensive. Getting artists to draw enough layers for a full 360 effect, and getting someone to rig that properly (if it's even possible, I haven't tried), would cost a retarded amount of money. It's still a developing technology.
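For the curious, the "layers slapped together to look 3D" trick is mostly parallax: each flat layer gets offset by the head angle, scaled by how "deep" it's supposed to sit. A toy sketch of the idea in Python, every name and number made up, not actual Live2D code:

Code:
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    depth: float  # 0.0 = furthest back (hair), 1.0 = closest (nose)

def parallax_offset(layer, head_yaw, strength=20.0):
    # "Closer" layers shift further for the same head turn, which reads
    # as rotation even though every layer is a flat drawing.
    return head_yaw * layer.depth * strength

layers = [Layer("hair_back", 0.1), Layer("face", 0.5), Layer("nose", 0.9)]
for layer in layers:
    print(layer.name, parallax_offset(layer, head_yaw=0.5))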

There's a whole joke in the community where vtweeters will pay thousands and thousands of dollars for a top-of-the-line Live2D model, stream, and get 4 viewers because they're not entertaining. The model is there to get people interested, but the personality or talents of the streamer are what keep people around, so most fans don't care that much about the models themselves, which leads to the vtubers themselves not caring that much either. Kirsche, as an example, used a really shitty 3D model from 2020ish that shook and spazzed like she had Parkinson's and barely worked, set up by someone just learning how to make a model, and she went from 100 viewers to a consistent 1500+ before changing it. Hell, Gura, the biggest vtuber in the world, still has completely broken eye tracking, because state-of-the-art Jap coding just can't deal with someone probably wearing glasses.

A decent video about 3D tracking, from a person who basically went as far as possible with it (don't worry, she doesn't talk like a child):

As for immersion and being taken out of it, personally, I just don't really care, never did. I just regard them as regular streamers, but sometimes they sing, or do goofy shit that isn't just playing videogames.
 
This raccoon is smug as hell with her singing and, basically, you're just fucking stupid for not singing along.
Tenma thinks her son is cursed.
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
Some people are better at puppeteering their models than others despite the limitations of cheaper rigging, but all in all it doesn't really bother me. I think it's like watching any other streamer or video: you can have bad picture and great audio, but you can't have bad audio and great picture. A paper plate face can get away with a lot as long as their mic doesn't suck ass.
 
Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn
You'd think that since it's a stream with a minute or two of delay anyway, they would just introduce 250 ms of latency or something and get the positioning on every frame more or less perfect.

I guess what must be happening is that it's very heavily filtered, because the facial recognition algorithm might be dramatically wrong at any time, so the filtering is required to keep it from spazzing out too badly (i.e. it's averaging out many frames to tame the spaz-outs, at the cost of latency and of capturing fast or minute movements).
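For what it's worth, that guess matches how tracking pipelines are usually described: raw per-frame estimates get run through a smoothing filter, and the filter strength is exactly the jitter-vs-latency tradeoff you're pointing at. A toy version (a plain exponential moving average, not any real tracker's internals):

Code:
def ema(samples, alpha):
    # alpha near 1.0 = trust new frames (jittery), near 0.0 = heavy lag
    smoothed, out = samples[0], []
    for s in samples:
        smoothed = alpha * s + (1 - alpha) * smoothed
        out.append(round(smoothed, 3))
    return out

noisy = [0.0, 0.9, 0.1, 1.0, 0.0, 0.95]  # jittery head-x estimates
print(ema(noisy, alpha=0.8))  # responsive but shaky
print(ema(noisy, alpha=0.2))  # smooth but laggy, like the delay idea above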
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
For people using 3D models, like someone like Kirsche, facial tracking has always been a bit spotty outside of very expensive setups. Add onto this that a lot of 3D vtubers outside of major companies are just working with VRChat models, which themselves involve a lot of pretty mid-quality modeling in the first place.

For Live2D, which I would say is much more commonly accepted as the "norm" when people think about vtubers, there's a bunch of factors that can affect it. Some of it is in the hands of whoever did the rigging for the model, since every individual character is custom rigged, and it's a fairly intensive process and skillset. You'll get a big variance in quality depending on how good the rigger is, as well as how much money the person commissioning them is willing to spend. It's extremely tedious to take a flat piece of artwork drawn by an artist and set it up to move through tracking software: a billion and a half minor details all need to be configured to move properly, down to where individual strands of hair should go across the full range of motion, if you don't want completely static parts on your model. Here's a timelapse of someone working on a model from start to finish, just to give an example of how involved the process is. (I tried looking at a few of them, and for some reason none of them state what the actual full time to complete was, but I suppose that's not that important.)
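On the "individual strands of hair moving properly" point: that kind of secondary motion usually boils down to damped-spring follow-through, where the strand chases the head, overshoots, and settles. A toy sketch of the idea, not actual rigging-tool code, with all the constants invented:

Code:
def simulate_strand(head_positions, stiffness=0.15, damping=0.8):
    # The strand chases the head like a damped spring; the overshoot
    # and settle is what sells "hair that moves properly".
    pos, vel, out = head_positions[0], 0.0, []
    for target in head_positions:
        vel = vel * damping + (target - pos) * stiffness
        pos += vel
        out.append(round(pos, 3))
    return out

# Head snaps from 0 to 1; the strand swings after it and settles.
print(simulate_strand([0.0] * 3 + [1.0] * 10))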


The software has improved over the years, but something that can cause a dropoff in quality is that, apparently, if the person doesn't use an Apple device (tracking is generally done with phones in most cases) there's a pretty noticeable quality difference in the tracking. Whatever Apple uses for its facial recognition also carries over to vtuber tracking, and it's just heaps better than Android. The use of mobile devices ends up being a limiting factor too: even the 2D models are resource-intensive to run, and 3D even more so. So there's only so much that can be done before the process is so intensive that it just bricks your phone for trying to use it.
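If anyone's wondering what the Apple advantage looks like in practice: the iPhone's face tracking (ARKit) reports a set of named blendshape weights every frame, and turning those into model movement is conceptually just a lookup and rescale. Hypothetical sketch; blendshape keys like jawOpen are real ARKit names, but the target parameters and ranges here are made up for illustration:

Code:
ARKIT_TO_MODEL = {
    # blendshape key -> (model parameter, value at weight 0, value at weight 1)
    "jawOpen":       ("ParamMouthOpenY", 0.0, 1.0),
    "eyeBlinkLeft":  ("ParamEyeLOpen",   1.0, 0.0),  # blinking inverts openness
    "eyeBlinkRight": ("ParamEyeROpen",   1.0, 0.0),
}

def map_frame(blendshapes):
    params = {}
    for key, (param, lo, hi) in ARKIT_TO_MODEL.items():
        weight = blendshapes.get(key, 0.0)  # ARKit weights are 0..1
        params[param] = lo + (hi - lo) * weight  # linear remap
    return params

print(map_frame({"jawOpen": 0.4, "eyeBlinkLeft": 1.0}))
# {'ParamMouthOpenY': 0.4, 'ParamEyeLOpen': 0.0, 'ParamEyeROpen': 1.0}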

And then there are the more individual things that can cause problems: shit placement of the tracking device can make it less effective, and poor lighting causes the same issues. Some stuff like the eye stutter you mentioned can happen if the streamer regularly looks off in a direction where their phone loses sight of their iris; it then just tries to guess, and the result is flickering eye movement as it can't pin down what to track. Also, if the person wears glasses, it can fuck up eye tracking thanks to reflections.
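That "tries to guess" failure is also why some setups gate on tracking confidence: when the tracker can't actually see the iris, holding the last good value looks a lot better than passing the guess through. Toy sketch, threshold and data invented:

Code:
def gate_eye(samples, min_conf=0.6):
    # Hold the last trusted value instead of passing low-confidence
    # guesses through; the raw guesses are what cause the flicker.
    held, out = 0.0, []
    for value, conf in samples:
        if conf >= min_conf:
            held = value
        out.append(held)
    return out

frames = [(0.1, 0.9), (0.8, 0.2), (-0.7, 0.1), (0.15, 0.95)]
print(gate_eye(frames))  # [0.1, 0.1, 0.1, 0.15] - the spikes get held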
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
Most of the time I'm listening to it in the background. I'm just here for the anguish and suffering of insane women, for my entertainment and enjoyment.
That said, a lot of people have covered it: early-days tech. Most people doing this are using a simple webcam or an iPhone, because it's got a better camera. Equipment really hasn't been made to do this specifically yet.
 
I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
All this shit has to run on either a desktop or a cell phone. The general options are either 3D models, or a technology called Live2D (L2D), which -can- be extremely well animated, but basically requires setting up every damn bit of the model by hand and is incredibly finicky.

Basically, ultra-realistic 3D motion tracking isn't there yet for your average consumer. A really top-quality L2D model can cost like $4-5000, which is squarely out of the reach of most streamers. The earliest vtubers, like Sakura Miko, used a 3D model, and still do today. Really good 3D setups involve a lot of widgets you attach to your hands, legs, body, etc. to track your movement and show it properly, although the results are often still a bit scuffed.

From what I know about L2D models, basically every single bit that moves has to have 'bones' that connect to every other piece next to it and tell the program how to handle the movement of the actual graphics attached to it, and the process has to be done by hand. Here is a random-ass example I googled so you can get a basic idea.
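To make the 'bones' idea concrete: it's just rotations propagating down a parent/child hierarchy, so moving one bone carries everything attached to it. A bare-bones toy sketch, not any real rigging tool's API:

Code:
import math

class Bone:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.angle = name, parent, 0.0

    def world_angle(self):
        # A bone's final rotation is its own plus everything above it,
        # so tilting the head drags the ears and hair along for free.
        if self.parent is None:
            return self.angle
        return self.angle + self.parent.world_angle()

head = Bone("head")
ear = Bone("ear", parent=head)
tuft = Bone("hair_tuft", parent=ear)

head.angle = math.radians(15)  # tilt only the head...
print(math.degrees(tuft.world_angle()))  # ...children inherit it: ~15.0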

All that aside, there's plenty of successful PNGtubers that just have a static image, although I think that breed is slowly dying out as animated models become gradually more accessible.
 

That's wild. Is there a clip of this incident yet (she wrote the VOD stays up)? I'm curious how she reacted.

I totally forgot about that, although it was mentioned a few posts ago. This was funny as hell
I have absolutely nothing against Shiina but this is still the funniest vtuber clip I've ever seen, all these years later. The initial fuckup, the "oh shit" moment of realisation, then the half-hearted attempt at backpedalling because she knew actually backpedalling would only make the situation worse... it's perfect. Chef's kiss.

I've been exposed to more vtuber clips in the last week than I'd ever want to see, and I have a question

Why is the face rigging so bad? Their mouths barely move, the puppets just kind of lean left and right, the head doesn't seem to turn, and the eyes constantly flitter like they're having a stroke. Doesn't that take you out?
With 2D models it's a combination of cost-cutting (more fluid movement means the artist needs to draw more layers and the rigger needs to spend more time implementing them, which makes it cost a fortune) and the skill of actually operating the model. A lot of vtubers have a habit of forgetting they need to be sitting directly in front of the camera for the facial tracking to pick up their movements accurately, so if they lean back in their chair or whatever, the lip-sync etc. is probably gonna go to shit.

Honestly though, I think it works out better with some scuffed 2D movement than with the 3D models and their much more advanced full-body tracking. With the 2D avatars you know exactly what you're getting: an animated character acting like an animated character. Outside of concerts, which are specifically staged and choreographed for the occasion, 3D models with realistic movement cause an uncomfortable uncanny-valley effect, since it looks like an animated character but acts like a real person.
 
The earliest vtubers, like Sakura Miko, used a 3D model, and still do today. Really good 3D setups involve a lot of widgets you attach to your hands, legs, body, etc. to track your movement and show it properly, although the results are often still a bit scuffed.
I think people like Miko, Sora, and Kirsche use regular phone-camera mocap with 3D models, rather than something like Filian's setup.
 