The Kiwi Farms Media Processing Server

No promises but I may screw around with that and if I find anything I'll post it.
I’m sure there’s a way to make both happen, but if the scope increases he’ll have to get some kind of AI-specific hardware eventually.
 
Yeah, if you want to handle a potentially big volume of automatic illicit-content detection, I would also recommend against PCIe 3 and Intel workstation GPUs generally.

If it's just video transcoding I'd agree with the other guy: the A310, as that's the smallest one with the full transcoding ASIC suite on it (actually I'll give you one if you want to figure out how to get the cursed thing working properly), and PCIe 3 will easily keep up.

edit: If you want things to 'just work', then from real personal experience I suggest just using nvenc and getting old nvidia gpus that do the work on the shader cores. I personally did not find the intel experience good at all, although in theory, if you get it working properly, the intel stuff is supposed to be really good.
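For what it's worth, the nvenc route really is a one-liner in ffmpeg. A hedged sketch that just builds the command line in Python rather than running it (file names and the bitrate are made-up examples, not recommendations):

```python
# Sketch: assemble an ffmpeg NVENC command. h264_nvenc is ffmpeg's standard
# NVENC H.264 encoder name; everything else here is illustrative.
def nvenc_cmd(src, dst, bitrate="4M"):
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "h264_nvenc",   # encode video on the NVENC block
        "-b:v", bitrate,
        "-c:a", "copy",         # leave the audio stream untouched
        dst,
    ]

cmd = nvenc_cmd("input.mkv", "output.mp4")
print(" ".join(cmd))
```

Pass the list to `subprocess.run()` if you actually want to execute it.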
 
I’m sure there’s a way to make both happen, but if the scope increases he’ll have to get some kind of AI-specific hardware eventually.
For the big projects he's mentioned, yeah. But if this is media-only (meaning basically just transcoding, subtitles, tagging, etc.), then some A310s may well be the best option.

And I want to drive this home again: trying to do big multi-gpu AI projects and then do media encoding on those same gpus is going to be such a pain in the dick. Getting them both set up and working properly (and then not breaking constantly afterwards) is hard enough. Then there will be the weird transient performance issues that are impossible to troubleshoot because no one else is doing that.

I know it must seem like a money saver ("why not use the whole gpu?") but it's a fool's errand IMO.

edit: If you want things to 'just work', then from real personal experience I suggest just using nvenc and getting old nvidia gpus that do the work on the shader cores. I personally did not find the intel experience good at all, although in theory, if you get it working properly, the intel stuff is allegedly really good.
Problem is the consumer nvidia cards are limited to, I think, 2 nvenc streams. Unless by old you mean a Tesla, but then those won't have AV1.

I got my intel card to work pretty easily, but I'm on linux and am just using it for transcoding. Although I did get stuck for a while due to an igpu conflict, so it wasn't perfect.
 
These tests are using quantizations of whisper-large-v3-turbo from here. These are all one-off tests on the audio from this
Summary of 2025's legal battle

I posted this already in the Legal thread but then I realized they probably wouldn't allow any discussion about it like you can have here.
FYI this is why my annual update is taking so long: I have pretty much every other aspect of Russell's antics logged and voiced; it's this court shit that goes over my head most times.
Obviously huge thanks to everyone linking the Pacer files, this is pretty much all you.

I was able to get my ARC card up and running without too much fuss. I've attached the outputs of all of them (they are srt format, but I had to rename them .txt because the site doesn't accept .srt files). I watched through one of the Q8 ones and it seemed basically correct. For all tests, memory usage was essentially constant.
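Side note for anyone reproducing this: the .srt format itself is trivial, so if the extension thing annoys you, you can regenerate subtitle files from raw timestamps yourself. A minimal self-contained sketch (the segment data here is invented):

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm per the SRT convention."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """segments: list of (start_sec, end_sec, text) tuples -> SRT string."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello."), (2.5, 5.0, "World.")]))
```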

Q8_0
Tesla M40: 56s, 1.5gb, 120w
Ancient Xeons: 247s, 2.1gb, probably 400w or some shit
Arc A380: 94s, 1.3gb, ??*

Q4_k
Tesla M40: 49s, 1.7gb, 120w
Ancient Aliens Xeons: 150s, 2.1gb, probably 400w or some shit
Arc A380: 81s, 875mb, ??*

Q4_0
Tesla M40: 38s, 1.7gb, 120w
Ancient Xeons: 111s, 2.1gb, probably 400w or some shit
Arc A380: 80s, 875mb, ??*

*intel-gpu-top doesn't report wattage for my card. It doesn't require an extra power connector, so per the PCIe spec it must be under 75w.

I don't have ReBAR enabled, but I don't think that will affect anything after the model is loaded. Out of interest I ran a test with the Q4_0 on a basic bitch 4c/4t skylake cpu and it took 560s. I also tried some much longer files and it didn't seem to change the memory usage.

I'm pretty surprised that the low-tier A-series cards can do this. On top of that, whisper.cpp is running on them via vulkan, and in intel-gpu-top there are separate bars for "render/3d", "compute" and "video". I tried running a transcoding job at the same time, and it seems like that falls under "video" (and obviously vulkan is under "render/3d"), so it had no effect.
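To put the table above in perspective, here's the speedup math spelled out (figures copied from my Q8_0 row; wall-clock only, ignoring power draw):

```python
# Q8_0 wall-clock times from the table above (seconds).
times = {"Tesla M40": 56, "Ancient Xeons": 247, "Arc A380": 94}

baseline = times["Ancient Xeons"]
for device, t in times.items():
    print(f"{device}: {baseline / t:.1f}x vs the Xeons")
# The M40 works out to roughly 4.4x faster than the Xeons, the A380 roughly 2.6x.
```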

All in all, dual AV1 encoding + whisper captioning seems feasible on ARC cards, and if @geckogoy is willing to send you one for free to test with, I would take him up on that offer.
 


I wasn't super clear in that post, so to rephrase: Null thinks 48gb is enormous for subtitling and face recognition (it is), but 24gb is also enormous for that. The whisper-large-v3-turbo model is only 1.5gb. Not fifteen, one-point-five. And that's the full-fat model, not a quantized version. And face recognition runs on $30 Temu cameras.

I'm actually grabbing a copy of the turbo version to test with right now. I want to see how much time + memory it takes on a tesla and I see some chatter saying even CPU is viable so I was going to give it a run on my ancient xeons. I will report back.

Beyond that there might be a truly retarded possibility. The A310 has 4gb of memory. Normally you'd dismiss a 30w low-end card for any AI work, but... I mean, 1.5 is less than 4, right? There are also A380s that have 6gb of memory and are actually cheaper (~$150).
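The "1.5 is less than 4" check, as a back-of-the-envelope helper (model size from above; the working-overhead figure is my guess, not a measurement):

```python
def fits_in_vram(model_gb, vram_gb, overhead_gb=1.0):
    """Rough fit check: model weights + assumed working overhead vs card VRAM."""
    return model_gb + overhead_gb <= vram_gb

# whisper-large-v3-turbo full-fat weights are ~1.5 GB.
print(fits_in_vram(1.5, 4))   # A310, 4 GB card
print(fits_in_vram(1.5, 6))   # A380, 6 GB card
```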
No promises but I may screw around with that and if I find anything I'll post it.
The thing is that 48gb is also enormous for other work, such as facial recognition. If you upload a group photo to the server and say "which lolcows are these", surely a reasonable model could identify the threads, right?

 
I can't. The only thing I have concerns with in this setup is that I don't know if those blowers on the GPU actually help or hurt. The supermicro is designed for NVIDIA cards that use passive cooling, and there's some concern the fans will either obstruct flow, misdirect flow, or simply be smashed against the wall and starved. The online documentation says it is only approved for passively cooled Nvidia cards.
can you list your current equipment, then we'd have a better idea of what upgrade paths are possible.
what is the "4u"? how much rack space do you have available?
 
The thing is that 48gb is also enormous for other work, such as facial recognition. If you upload a group photo to the server and say "which lolcows are these", surely a reasonable model could identify the threads, right?
Imagine a local KiwiGPT that is either trained from user posts or able to live index the site and summarize posts from specific pages.

KiwiGPT: summarize the last 80 pages of Rekieta sperging, explain why BMJ lost it again and pull up the dox for this twitch guy called "maldavious" thx
 
The thing is that 48gb is also enormous for other work, such as facial recognition. If you upload a group photo to the server and say "which lolcows are these", surely a reasonable model could identify the threads, right?
Face recognition models are tiny, as in some are designed to run on CPU, and ArcFace (the biggest one I'm aware of) is like 130mb. Some of them are just on pip, and you pip install them instead of downloading the model separately.
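To illustrate how small the compute side of this is: once a model like ArcFace has produced its embedding vectors, deciding whether two faces match is just a cosine-similarity comparison. A self-contained sketch with toy vectors (real ArcFace embeddings are 512-d, and the 0.5 threshold here is illustrative, not a tuned value):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def same_face(emb1, emb2, threshold=0.5):
    # Toy vectors just show the mechanics; a real pipeline compares 512-d embeddings.
    return cosine_similarity(emb1, emb2) >= threshold

print(same_face([0.9, 0.1, 0.2], [0.88, 0.12, 0.19]))  # near-identical vectors
print(same_face([0.9, 0.1, 0.2], [-0.9, 0.1, 0.2]))    # very different vectors
```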

If the only things you're planning to do are transcription and face recognition, then B60s are massive overkill.

If you're planning on other things you need to tell us what those are. My 48gb 4090 suggestion was based on you mentioning on MATI in the past that you wanted an AI search for the whole forum. And again any big projects like that should probably not be using the same GPUs that you're encoding videos on.
 
2x KIOXIA CD6-R KCD6XLUL960G 960GB U.3 4.0 GEN4 2.5IN NVMe SSD ($496.36)
6x ASRock B60 Intel Arc Pro B60 B60 CT 24G 24GB 192-bit GDDR6 PCI Express 5.0 x8 Graphics ($4358.62)
Supermicro SYS-2029GP-TR 2U 6x GPU Server 2x Gold 6152 44-Core 2.1GHz 32GB RAM ($1,163.41)
= $6018.39

B60s will retroactively process all existing media into 1080p AV1 + 720p/480p/360p H.264.
Moving forward, video cards will process new uploads into H.264 and the CPU will process into AV1 on Quality 8.
Popular old videos may also get a software re-encode to improve the quality/bitrate ratio.
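The rendition ladder described above (1080p AV1 plus 720p/480p/360p H.264) maps to one ffmpeg invocation per output. A hedged sketch that just builds the argument lists — the encoder names are ffmpeg's standard Intel QSV ones, but the scales, bitrate-free settings, and output naming are illustrative:

```python
# Per-rendition ffmpeg commands for the ladder described above.
LADDER = [
    ("1080p", "av1_qsv",  "scale=-2:1080"),
    ("720p",  "h264_qsv", "scale=-2:720"),
    ("480p",  "h264_qsv", "scale=-2:480"),
    ("360p",  "h264_qsv", "scale=-2:360"),
]

def ladder_cmds(src):
    cmds = []
    for name, codec, vf in LADDER:
        cmds.append([
            "ffmpeg", "-y", "-i", src,
            "-vf", vf,          # downscale, keeping even width
            "-c:v", codec,      # hardware encoder on the media engine
            "-c:a", "copy",
            f"{src}.{name}.mkv",
        ])
    return cmds

for cmd in ladder_cmds("upload.mp4"):
    print(" ".join(cmd))
```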

At some point I will also task the GPUs to inference subtitles, later subject detection.

You can all thank the random guy who offered me $6000 to get me to shut up about it on my poodcast.
 
At some point I will also task the GPUs to inference subtitles
FYI whisper.cpp supports intel stuff, but it requires a lot of bullshit like VAD to get usable subtitles. In my experience subgen just werks, but it only supports cuda (because of ctranslate2).

Moving forward, video cards will process new uploads into H.264 and the CPU will process into AV1 on Quality 8.
In my mind the main purpose of going intel was for fast AV1 encode to keep up with uploads.

I still think B60s are a waste of money. They have the same media engine as lower-spec cards, and the gpu horsepower + 24gb is massive overkill for just subtitles and face detection.

If you scaled back on the GPUs, aren't there other hardware upgrades (spare disks?) you could put the money toward? Does "K" want you to spend it all right now? Would he be okay with you simply putting the extra in a money market account earmarked for hardware?
 
I have put, and plan on continuing to put, significant heckin valid computational time into encoding all my posted videos in the finest webm VP9/OPUS encoding. Please consider offering a way to avoid blowing away these space- and BW-saving efforts.
 
Only solution I know is buying ~50 AMR500s and strapping 'em to the servers.
Yeah, Nice Comoesta doesn't do well in networking unless it's CAN-BUS.
 
As usual I'm late, but I've been very happy with WhisperX for my transcription needs. I think it's still somewhat Nvidia-specific, though it might do ONNX now. It does per-word timestamps and per-speaker tracking.

And if we're making a wishlist for stuff to be easier to search, I'd like to vote for images having OCR done on them, so things like twits and facethings that are screenshotted can be found. That said, I don't have a good model for that yet; I'm still trying to find one.
 
FYI whisper.cpp supports intel stuff, but it requires a lot of bullshit like VAD to get usable subtitles. In my experience subgen just werks, but it only supports cuda (because of ctranslate2).


In my mind the main purpose of going intel was for fast AV1 encode to keep up with uploads.

I still think B60s are a waste of money. They have the same media engine as lower-spec cards, and the gpu horsepower + 24gb is massive overkill for just subtitles and face detection.

If you scaled back on the GPUs, aren't there other hardware upgrades (spare disks?) you could put the money toward? Does "K" want you to spend it all right now? Would he be okay with you simply putting the extra in a money market account earmarked for hardware?
look, at the end of the day someone is going to disagree with everything I do all the fucking time, which is why most site admins don't tell you shit. does 4chan announce its plans? no, because imagine trying to reach consensus on fucking 4chan. someone gave me $6k to buy something to get video processing - just pick something. the hard lesson learned is that what i do is stupid and it just annoys people to hear their feedback wasn't implemented. if they don't even know there are decisions going on, nobody has to care.
 
look, at the end of the day someone is going to disagree with everything I do all the fucking time, which is why most site admins don't tell you shit. does 4chan announce its plans? no, because imagine trying to reach consensus on fucking 4chan. someone gave me $6k to buy something to get video processing - just pick something. the hard lesson learned is that what i do is stupid and it just annoys people to hear their feedback wasn't implemented. if they don't even know there are decisions going on, nobody has to care.
I mean, yeah: if you don't ask anyone, no one will disagree, and if you ask thousands of people, there will be disagreements. So when you do ask thousands of people on your forum about something, it seems silly to expect everyone to come to a consensus.

My position is you're about to light 3 g's on fire (4358.62 - 139.99*8 = 3238.70) that could be better spent elsewhere (more than 32gb of ram, more storage, saved for future hardware expenses, whatever).
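The arithmetic spelled out, for anyone checking my work ($4358.62 is the six-B60 quote from the parts list; $139.99 is the cheaper-card price I was assuming per unit):

```python
b60_total = 4358.62       # 6x Arc Pro B60 quote from the parts list
alt_cards = 139.99 * 8    # eight cheaper cards at an assumed $139.99 each
savings = b60_total - alt_cards
print(f"${savings:.2f}")  # roughly $3238.70, i.e. ~3 g's
```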

But I understand you're asking for suggestions (of which there are many), and it's your money, your website, your decision. I'm not going to leave the farms in a huff or start some blood feud with you (bish) because you didn't go with my suggestion for media server hardware.

If I came off like I was offended I apologize. I'm not so don't worry about it. Seriously.
 