The Kiwi Farms Media Processing Server

tempted to go to 1080p in AV1
May I ask why? AV1 is extremely hard to do in software as you probably know. Why not downgrade to vp9 and instantly be able to use more affordable hardware?

Same with decode - the uptake on the client side for hardware av1 decode isn't that big, but vp9 decode is supported by hardware from 2016 onwards

Ignoring all of the above, the ARC GPUs are a good route for media encoding, Intel CPUs with QSV are a second bet. Doing it with Nvidia is shooting chickens with expensive thermonuclear ballistics, not to mention that they eat a shitton of power

Btw, I'm glad you took the ZFS pill after ignoring my suggestions, not salty about that whatsoever. How's it treating you?

getting AI inference shit to work on non-Nvidia GPUs has been a pain in the fucking ass in my experience
This is now solved with Vulkan compute. Llama.cpp runs like a dream on all AMD cards without the gay ROCm modules or Intel IPEX, and at least on AMD it runs *faster* than ROCm
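For anyone who wants to try the Vulkan path, the gist is just building llama.cpp with the Vulkan backend enabled and pointing it at a GGUF model. A sketch of the commands (flag and binary names are from memory of llama.cpp's build docs and can drift between releases; the model path is a placeholder):

```python
# Sketch: assemble the build + run commands for llama.cpp's Vulkan backend.
# Flag/binary names are recalled from llama.cpp docs, not guaranteed current;
# the model path is a placeholder.
import shlex

def vulkan_build_cmds(model_path: str) -> list[str]:
    return [
        "cmake -B build -DGGML_VULKAN=ON",          # enable the Vulkan backend
        "cmake --build build --config Release",     # compile
        f"./build/bin/llama-cli -m {shlex.quote(model_path)} -p 'Hello'",
    ]

for cmd in vulkan_build_cmds("models/llama-3-8b.Q4_K_M.gguf"):
    print(cmd)
```

No ROCm or IPEX anywhere in that chain, which is the whole appeal.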
 
Arc cards are great for video transcoding and raster graphics, but I've read and heard through the grapevine that they're spotty with which AI workflows they work with, even though both 1st and 2nd gen high-end cards come with 16gb of vram. Since they're the people's choice and the lowest cost of entry, I'd say there's no harm in experimenting with them for the AI CSAM recognition workflow and seeing if it can be reasonably implemented.
As for the media encoding, they're unbeatable. QuickSync and VAAPI quality is the highest on Intel GPUs and they support h264, h265, vp9, and av1.
For supporting multiple codecs, I would urge against falling into the trap YouTube has found themselves in: re-encoding videos into every single resolution for every single codec, bloating their storage many times over.
VP9 is mature enough to where any device from 2013 on can support it and AV1 is only getting better. Even without hardware decode support, software decoders such as "libdav1d" from the engineers who maintain VLC, and libvpx from Google are phenomenally optimized and will run HD videos on a potato.
If you do choose to support multiple resolutions, I would suggest sticking to a high quality HD encode and a lower resolution 480p "preview", in either AV1 or VP9. Hot take at the end here, but h264 is 20 years old now and it's time to move on from it.
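To make the two-rung ladder concrete, here's a sketch that builds the ffmpeg command lines for one HD AV1 encode plus a 480p preview. The encoder (`libsvtav1`) and CRF values are illustrative picks, not a tested tune:

```python
# Build ffmpeg argument lists for a two-rung ladder: 1080p AV1 + 480p preview.
# CRF values and output names are illustrative assumptions, not a tested tune.

def ladder_cmds(src: str) -> list[list[str]]:
    rungs = [
        # (label, scale filter + encoder settings)
        ("1080", ["-vf", "scale=-2:1080", "-c:v", "libsvtav1", "-crf", "30"]),
        ("480",  ["-vf", "scale=-2:480",  "-c:v", "libsvtav1", "-crf", "38"]),
    ]
    cmds = []
    for name, args in rungs:
        cmds.append(["ffmpeg", "-i", src, *args, f"out_{name}.webm"])
    return cmds

for cmd in ladder_cmds("input.mkv"):
    print(" ".join(cmd))
```

Two encodes per upload instead of YouTube's full resolution-by-codec matrix, which is the storage win.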
 
I want to suggest an additional avenue of attack. Null, you can sit down for this one.

All the rest of you... does someone else also want to give Null a few thousand bucks as a donation? COME ON LET'S GET SOME DONOS. SUB TRAIN WOO WOO!! WE JUST NEED FIVE (thousand) MORE.
 
Mainland gooks used to go hot for Arc cards because they were less restrictive when it comes to exporting. IDK if that's the same case anymore.
 
All the tech reviews I've watched on Intel Arc support it as a media processing card. Arc cards were recommended as encoding cards by multiple different reviewers.

I can't say 100% that still holds up right now, but I think Arc is worth looking at.
 
Those processors are already a couple of years past EOL, so while it'd work for now, you might have to upgrade sooner than you'd like. If you're under budget I'd spring for something newer, and I don't think some light fundraising would hurt if it'd meaningfully improve site usability.
can you find anything that'd fit budget?
 
I've been looking and it's a mess. I'm away from home right now; it should be easier with a desktop.
I can't. The only thing I have concerns with in this setup is that I don't know whether those blowers on the GPU actually help or hurt. The Supermicro is designed for NVIDIA cards that use passive cooling, and there's some concern the fans will either obstruct flow, misdirect flow, or simply be smashed against the wall and starved. The online documentation says it is only approved for passively cooled Nvidia cards.
 
The retail Intel workstation cards are all single-direction blowers AFAIK, so they might still work if they're blasting air all rearward. It's a definite risk though.

There are enterprise and system integrator Intel Arc cards that are passive cooled, but they are not available for consumer retail, I think.

Though via Lolcow LLC and a business checking account they might be purchasable, depending on the vendor.
 
Because null has a habit of not reading entire posts:

ARC A310

I have experience with all this. First off, your sources are good; the ARC and Tesla recommendations are solid. But reading the thread, there's some key info that's missing:

- Hardware encode has fixed quality outputs.*
- Encoding is a function of the media engine and basically nothing else.**
- Some cards (mostly consumer-grade Nvidia) have an artificial limit on the number of encodes they can run at once.


*This means that an H264 stream from a GTX 980 will look worse than one from an RTX 3080, no matter what. You cannot spend more time on the 980 to make it look better. It wasn't in your recommendations, but some people mentioned AMD. AMD encoding quality is shit and needs to be avoided.

** "Better cards" with the same media engine will perform exactly the same, e.g. the A310 vs. the A770 will have literally no difference in encoding speed or quality. GPU memory also essentially doesn't matter: an RTX 3080 10gb and an RTX 3080 20gb will be functionally identical.

After deliberating on what formats to use, I am tempted to go to 1080p in AV1 + 720/480/360p in MPEG.
I think this is a good choice for formats (assuming by MPEG you mean h264, not MPEG-2). You might also consider in the future just having 1080/720/480/360p in both h264 and AV1 then serving AV1 if the client supports it to save bandwidth.
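The "serve AV1 if the client supports it" idea reduces to simple content negotiation. A toy sketch (real players signal support via `codecs` attributes on `<source>` tags or MSE `isTypeSupported`, not a plain set like this; the filenames are placeholders):

```python
# Toy codec negotiation: prefer the smaller AV1 rendition when the client
# advertises support, fall back to h264 otherwise. Filenames are placeholders.

RENDITIONS = {"av1": "video_av1.webm", "h264": "video_h264.mp4"}

def pick_source(client_codecs: set[str]) -> str:
    for codec in ("av1", "h264"):      # preference order: best compression first
        if codec in client_codecs:
            return RENDITIONS[codec]
    return RENDITIONS["h264"]          # last-resort fallback

print(pick_source({"h264", "av1"}))   # AV1-capable client gets the AV1 file
print(pick_source({"h264"}))          # older client falls back to h264
```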

Going back to your recommendations from your trusted sources I would do either 1, 2, or both. Get an intel ARC card for AV1 and optionally an old datacenter nvidia card for h264. I would not try running media encode and AI on the same cards. It just leads to headaches.

Get one (or multiple) ARC A310 cards. They take minimal power (30w) and have the same encoding capability as every other Intel card (the B series didn't change their media engine much). In my testing they can handle 4-5 real-time 1080p streams. Expect to pay about $200 each. https://www.ebay.com/itm/277659062320
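Sizing from those numbers is trivial back-of-envelope math. Note the 4-5 streams and 30w per card are this post's figures, not datasheet values:

```python
# Back-of-envelope A310 sizing using the per-card figures claimed above
# (~4 realtime 1080p streams, ~30 W each) -- this poster's numbers, not specs.
import math

def a310_plan(target_streams: int, streams_per_card: int = 4, watts: int = 30):
    cards = math.ceil(target_streams / streams_per_card)
    return cards, cards * watts

cards, power = a310_plan(12)   # e.g. 12 concurrent 1080p transcodes
print(cards, power)            # 3 cards, 90 W total
```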

Nvidia Tesla/Pascal cards don't have a limit on the number of h264 streams, and h264 encoding was pretty mature by that point. They'll fit in your server chassis and take 250w, but can be power-limited to 180w. They're old enough that they're cheap, and because GPU memory doesn't matter you can get the versions with less memory for dirt fucking cheap. This guy wants $35 and you could probably offer him $25: https://www.ebay.com/itm/358198061676

Finally, you didn't mention how many pcie slots you have, but presuming you have a decent board that can do bifurcation, even one x16 slot could turn into x4/x4/x4/x4 with four A310s. That would avoid some of the hassle of Nvidia drivers and probably not require a PSU upgrade. Honestly I'd probably just do that 4x A310 setup. Start with one and then add more if/when you need them.
 
I run a media server with an Intel i5 12400 and Quicksync is no joke. That thing can transcode multiple 4K HDR movies easily if I need it to.

Obviously the Farms would be a much larger scale as my server is only for family, but just to give you an idea what a little 65W TDP CPU can handle.
 
If I go with B60 though I can also do inferencing
I would not try running media encode and AI on the same cards. It just leads to headaches.

Loading and unloading models, or inference if you're using more than one GPU at once, will max out the pcie bus. Encoding requires pcie bandwidth as well, in order to keep the media engine fed and offload the encoded video. And if you go with Intel for inference you don't get CUDA. I know that's getting better, but it's still a headache.
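To put rough numbers on the bus contention, here's the time just to push a model's weights over a PCIe link. The throughputs are nominal per-direction figures; real sustained rates are lower:

```python
# Rough PCIe transfer-time math: seconds to move a model's weights over a
# given link. Throughputs are nominal per-direction GB/s; real rates are lower.

PCIE_GBPS = {"gen3_x16": 16.0, "gen4_x16": 32.0, "gen4_x4": 8.0}

def load_seconds(model_gb: float, link: str) -> float:
    return model_gb / PCIE_GBPS[link]

print(load_seconds(48, "gen4_x16"))  # 1.5 s per load at full link speed
print(load_seconds(48, "gen4_x4"))   # 6.0 s on a bifurcated x4 slot
```

Seconds of a fully saturated bus per model swap, on the same lanes the media engine needs, is where the headache comes from.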

On your budget I would get A310s for video encode, and then probably a modded Nvidia card with extra memory [edit: for inferencing]. One of those 48gb 4090s, probably.
 
Loading and unloading models, or inference if you're using more than one GPU at once, will max out the pcie bus. Encoding requires pcie bandwidth as well, in order to keep the media engine fed and offload the encoded video. And if you go with Intel for inference you don't get CUDA. I know that's getting better, but it's still a headache.

On your budget I would get A310s for video encode, and then probably a modded Nvidia card with extra memory [edit: for inferencing]. One of those 48gb 4090s, probably.
Why would I be loading and unloading fast enough to do that though? Usually you're loading a single set of models and running them repeatedly
 
Why would I be loading and unloading fast enough to do that though? Usually you're loading a single set of models and running them repeatedly
If you're running a model across multiple GPUs pcie bandwidth is really important and you're looking at 4x B60s. Multiple models is also a headache because in my experience if you try to load one and there's not enough memory it just crashes. There's no intelligent loading/unloading for what's in use. And again I think losing CUDA could be an issue.

If the models you're planning on running all fit comfortably on a single gpu it could work. What models are you planning to use and how big are they? iirc you wanted to try to run RAG on the entire site which seems like it would be a lot.
 
48GB is enormous for simple things like subtitles and maybe face recognition?
 
48GB is enormous for simple things like subtitles and maybe face recognition?
That's all you're planning on using this for? In that case I would still get the A310s for video encode and get some cheap Teslas to run your subtitles and face recognition. I thought this was for RAG on the Farms, automated image recognition with NCMEC, and whatever other projects.
 
That's all you're planning on using this for? In that case I would still get the A310s for video encode and get some cheap Teslas to run your subtitles and face recognition. I thought this was for RAG on the Farms, automated image recognition with NCMEC, and whatever other projects.
If null just focused on encode for now, how feasible would it be to use something like a DGX spark to do the AI stuff? I know it’s not the most cost effective but it’s about as simple as it’ll get.
 
If null just focused on encode for now, how feasible would it be to use something like a DGX spark to do the AI stuff? I know it’s not the most cost effective but it’s about as simple as it’ll get.
I wasn't super clear in that post, so to rephrase: Null thinks 48gb is enormous for subtitling and face recognition (it is), but 24gb is also enormous for that. The whisper-large-v3-turbo model is only 1.5gb. Not fifteen, one-point-five. And that's the full-fat model, not a quantized version. And face recognition runs on $30 Temu cameras.
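The memory math is simple enough to write down. The overhead figure for activations and the runtime is a guess on my part, not measured:

```python
# Does a model's weights-plus-overhead fit in a card's VRAM? The 1 GB
# overhead default is a guess for activations/runtime, not a measurement.

def fits(model_gb: float, vram_gb: float, overhead_gb: float = 1.0) -> bool:
    return model_gb + overhead_gb <= vram_gb

print(fits(1.5, 4.0))   # whisper-large-v3-turbo on an A310 (4 GB): True
print(fits(1.5, 6.0))   # ...or an A380 (6 GB): True
print(fits(48.0, 24.0)) # a 48 GB model on a stock 24 GB 4090: False
```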

I'm actually grabbing a copy of the turbo version to test with right now. I want to see how much time + memory it takes on a tesla and I see some chatter saying even CPU is viable so I was going to give it a run on my ancient xeons. I will report back.

Beyond that, there might be a truly retarded possibility. The A310 has 4gb of memory. Normally you'd dismiss a 30w low-end card for any AI work, but... I mean, 1.5 is less than 4, right? There are also A380s that have 6gb of memory and are actually cheaper (~$150).
No promises but I may screw around with that and if I find anything I'll post it.
 