The Kiwi Farms Media Processing Server

@SCV your avatar is really creepy and gross but if you want to pitch a suggestion go for it

budget: $5500 (I'm keeping the two SSDs)
need: 1080p/720p/480p/360p encoding for terabytes of videos + near realtime encoding for user uploads.
really want: AI inferencing for subtitles, text transcription, and face detection
want: AV1 for 1080p specifically
hopeful: inferencing capable of more complex things such as describing an image.
 
my current thought is two NVIDIA L4 Ada cards, put in the kf server. should fit both dimensionally and with power.
 
My only suggestion is not to use existing hardware. There's likely to be kernel tweaks/upgrades, cursing at drivers, running the whole system out of RAM a couple times, etc. Best to not do all that on a system with other production workloads.
 
Supermicro SYS-4028GT-TRT2 - 1000$
Xeon e5-2660 v3 x2 - FREE!
32gb ddr4 - FREE!
A380 x4 - 540$
RTX 3090 x4 - 3400$
Total: $4940

Spend the extra 500$ on more ram, storage, new fans, two more A380s, whatever.

Notes:
- You will need to get 3090 models that are 2 slot.
- This isn't using all the PCIe slots so there are expansion opportunities. I'm thinking either more A380s in the future if/when needed or maybe some cheap 2060s for transcription (etc.) to free up the 3090s exclusively for big ambitious projects (e.g. vLLM instance for kiwiAI).
- Theoretically, if you went balls to the wall for a big project, NVLink bridges are an option with this (3090s only support 2-way NVLink, so you'd bridge them in pairs).
- You can adjust the Xeons if you want, everything except the tippy-top versions for this socket are very cheap.
- You could probably do better on the price of the 3090's if you shop around a bit.
- Manuals: Server, Motherboard

need: 1080p/720p/480p/360p encoding for terabytes of videos + near realtime encoding for user uploads.
want: AV1 for 1080p specifically
4x A380s should be plenty for this. If it wasn't clear, they also do h264, so you can use them to get both AV1 and h264 at all resolutions.
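To make that concrete, here's a hedged sketch of what per-tier encode commands could look like with an Arc card. It assumes an ffmpeg build with Intel QSV support; `av1_qsv` and `h264_qsv` are the real ffmpeg QSV encoder names, but the file names and `-global_quality` values are placeholders to tune, not tested settings:

```shell
# Sketch only: assumes ffmpeg built with Intel QSV support and an Arc
# A380 visible to the system. Quality values are placeholder guesses.

# AV1 for the 1080p tier:
ffmpeg -i input.mp4 -vf scale=-2:1080 -c:v av1_qsv -global_quality 28 out_1080p.webm

# h264 for a lower tier (480p shown):
ffmpeg -i input.mp4 -vf scale=-2:480 -c:v h264_qsv -global_quality 24 out_480p.mp4
```

Per-resolution runs like this are the simple version; you'd script the four tiers in a loop for the backlog.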

really want: AI inferencing for subtitles, text transcription, and face detection
hopeful: inferencing capable of more complex things such as describing an image.
The 3090s are overkill for subtitles, transcription, and face detection but assuming you want to spend your entire budget they are the best value. They are also beefy enough to handle more complicated projects.
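For scale, the subtitle/transcription side can be as simple as pointing openai-whisper at a file. A sketch, not a deployment plan: the `whisper` CLI with `--model` and `--output_format` flags is real, the file name is a placeholder, and it uses the GPU automatically when CUDA-enabled PyTorch is installed:

```shell
# Sketch only: emit an .srt subtitle file next to the video.
# Requires `pip install openai-whisper` and ffmpeg on PATH.
whisper some_video.mp4 --model medium --output_format srt
```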

GPU Inference:
Multi-GPU inference comes with performance penalties, gotchas, and limited software support. As a result you should first determine what inferencing you're doing and, if possible, get a single card with enough memory. This is basically the PC build guide question of "what do you use your computer for".

CUDA is king. Yes, Intel and AMD can do inference, but expect performance penalties, gotchas, and limited software support. For serious inferencing that you want to actually work, it still needs to be nvidia (sadly).

Old nvidia cards are a great value but are slower, have some gotchas, and have limited software support. Old Intel cards don't exist, and old AMD cards are even less supported.

Mix and match any of these three things for exponentially more difficult and potentially unworkable results.

So what we want is nvidia 20-series cards or later with enough memory to do what you want on one card.
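As a quick sanity check once cards are installed (not from the thread, just the standard driver tool), `nvidia-smi` can confirm every card is visible and what compute capability it reports; 20-series (Turing) is 7.5, a 3090 is 8.6:

```shell
# Sketch only: list each GPU the driver sees with its memory and
# compute capability. nvidia-smi ships with the NVIDIA driver; these
# query flags are standard (compute_cap needs a reasonably new driver).
nvidia-smi --query-gpu=index,name,memory.total,compute_cap --format=csv
```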

CPU (Inference):
CPU inference is mainly used to run really big models, because you can get a lot more memory. The actual CPU doesn't matter much beyond the SIMD extensions it has: most projects won't run without AVX2, and after that the only big speedup is AMX. Beyond this, basically only memory amount + bandwidth matter. For older stuff dual Xeons are decent; for newer, Threadripper is great.

There is a special case of offloading all the experts in an MoE model to system memory while keeping the prompt processing and remaining tensors on a single beefy GPU. This is how people run large quantizations of huge models locally at good speeds. Not really a consideration right now with RAM prices as they are.

However, we are doing GPU stuff, so as long as the CPU has AVX2 it basically doesn't matter.
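A one-liner to verify those flags on a candidate Linux box (standard procfs, nothing project-specific; `avx2` and `amx_tile` are the actual /proc/cpuinfo flag names):

```shell
# Sketch only: check the two SIMD flags the above cares about.
grep -m1 -o -w 'avx2' /proc/cpuinfo || echo "no AVX2 - most inference projects won't run"
grep -m1 -o -w 'amx_tile' /proc/cpuinfo || echo "no AMX (fine, it's only on newer Xeons)"
```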

Encoding:

Intel is good for cheap AV1 (and h264) encode because they don't cut down their media engine on lower spec cards or limit the amount of transcodes like nvidia does. Critically that means higher spec cards are not better for encoding. AMD has trash quality and is not to be considered.

I remain convinced that a couple of the cheapest available A/B series intel cards are the correct choice.

RAM:
The pricing on this is a disaster at the moment. I would get ~64gb on the cheap by leveraging the large number of DIMM slots and the fact that system memory speed isn't super important, with the intention to upgrade later after this nonsense ends. The link I gave is just one example. You might even go down to 8gb sticks.

Conclusion:
Server with a ton of PCIe slots, old CPU+RAM to save money, several A310s or A380s for video encode, and, absent more specific inference needs, several bang-for-buck-without-gotchas cards, aka 3090s.


Other thoughts:
Expandability:

Not all of the PCIe slots are populated, so there are options. The stated inference requirements at the moment are minuscule, so I think buying a 2060 or two would suffice, and that would free up the 3090s exclusively for really big projects. Further, the server supports 1TB of DDR4 at 2400. Once the RAM bubble pops, getting that would be cheap and would open up opportunities like the previously mentioned MoE models or whatever else.

PSU:
the supermicro had a proprietary psu pin
Most server motherboards have proprietary PSU connections in my experience, to support multiple hot-swappable PSUs. As a matter of fact most server boards are not even a standard form factor and fit only in the server rack they were designed for. I don't think this is feasible to avoid in a rack mount situation.

3090 vs. B60 vs. whatever else
Why am I recommending 3090s after shitting on B60s? Two reasons. One, because of CUDA, software support, and their comparable price, I would choose a 3090 over a B60 for any situation without a second thought. Two, these will not be used for video encoding. Because of that we're freed from any requirements except "best bang for buck for AI inference", which is a (couple) 3090(s). You could swap these for B60s (or any other card dedicated to inference), but if it's not an nvidia 20-series or later it will cause headaches.

Jank:
I'm not sure how jank you are willing to go with this. I think a consumer motherboard with the intel gpus for transcoding and a separate server for AI stuff would be entirely reasonable. Further, do you want this AI server all buttoned down, or are you okay with PCIe risers and zip ties on a 20$ amazon wire shelf? I'm going to assume one integrated server based on your previous purchase, and iirc you said you have a server rack.

your avatar is really creepy and gross
It's literally the unit portrait for the SCV in starcraft so I hadn't really noticed. Now that I'm looking at it I see what you mean. Why have you done this to me?

Edit: Found a much better deal
Supermicro SYS-4028GT-TRT - 1000$
Xeon e5-2650 v4 x2 - 30$
64gb ddr4 2400 - 300$
A380 x4 - 540$
RTX 3090 x4 - 3400$
Total: $5270

Manuals: Server

Also the TRT2 has a newer PCIe daughter board and supports single root complex.
 
@SCV what about 2x https://www.newegg.com/p/N82E16888892004 in existing Supermicro AS-2024S-TR / CSE-LA26AC12-R920LP1

 
@Null
If we're changing the objective from "new server for av1 video encode + subtitles/face detection + vague future large AI projects" to "a couple cards for existing server for just av1 video encode + subtitles/face detection" then the L4s make sense, but they're too pricey...

...because there's a nearly identical card for HALF AS MUCH! I bet you could even haggle them down a bit with a best offer since you're buying multiple.

Get you a couple "RTX 4000 SFF ADA Generation" cards.
- Same nvenc as the L4 (2 encoders, unlimited sessions, same gen, etc.)
- Same 70w / no additional pcie power TDP
- 20gb memory, 10% less bandwidth
- 1250$

There's also the non-SFF version which has a higher core clock and uses 130w but is even cheaper.

Edit: You mentioned in that specific server case. If you're worried about the server airflow with a non-passive card I don't think it will be an issue. But if you want you can unplug the fan and remove the plastic shroud to effectively make it a passive card.

Alternatively super budget option would be a pair of 5070 Tis but I think the 4000 ADAs are the way to go.
L4s will perform significantly worse than 3090s for AI tasks due to having 1/3rd the memory bandwidth. Also pretty rich for my blood at 3x the cost.

L4 vs. 3090: [spec comparison screenshots]

That said, 3090s can't do AV1 encoding, and it seems like L4s should have good AV1 encode capabilities. Actual performance is hard to nail down (the graphs are about 720p @ 30fps with 8 GPUs on ffmpeg 5 and the fastest/lowest-quality preset), but it seems like each one might do a dozen or so 1080p AV1 encodes with a decent preset? Also, the L4 uses significantly less power, as I'm sure you're aware, but the price...
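If you want real numbers instead of vendor graphs, ffmpeg's null muxer makes for a cheap throughput test on your own footage. A sketch: `av1_nvenc` is the real ffmpeg encoder name on AV1-capable nvidia cards, while the sample file and preset are placeholders:

```shell
# Sketch only: measure AV1 nvenc encode speed on representative footage.
# -f null discards the output; watch the reported fps / speed= figure.
ffmpeg -i sample_1080p.mp4 -c:v av1_nvenc -preset p5 -benchmark -f null -
```

Run a few of these in parallel to see where a single card's encoders actually saturate.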

Of course I've been saying the inference you want to do for just subtitles, face detection, etc. is minimal so the lesser AI performance isn't a problem if we're dumping the vague future AI requirement. Considering this the first thing I did was check the nvidia T4 but that sadly doesn't have AV1 encode.

Out of curiosity I decided to check just how limited nvidia consumer cards are on nvenc streams nowadays, and it seems nvidia has seen fit to lift the boot off of people's necks a bit. Turns out they're now allowing 8, apparently even 12, streams. Meaning a pair of 4070 Tis (as this is the lowest spec with dual nvenc chips) seems like it would be a good fit.

Then I was looking through the encoder table again with the idea that we're considering just a few cards to drop into an existing server rather than a new build. And I think here I found something nearly ideal:

An RTX 4000 SFF Ada Generation (great name)
- AV1 encode
- dual nvenc
- unlimited nvenc sessions
- respectable AI performance. (still massive overkill for subtitles, face detection, etc.)
- 20gb memory

And here's the thing: It's only $1250! Half an L4 for nearly the same AI performance and the exact same nvenc performance!
 
[card size comparison screenshots]

Do you have any way of checking if the cards would physically fit into the supermicro model I gave you? They're supposedly full-sized cards. I'd probably need a chassis.
 
I'm not sure what you mean by full-sized. Both the L4 and RTX 4000 ADA are low profile cards so they should both fit nicely in a 2U case. Your comparison site says the L4 is way longer but I don't think that's right, they're both basically the same size.

I think I see what's causing the confusion. You're looking at the "Nvidia RTX 4000 Ada Generation" and not the "Nvidia RTX 4000 SFF Ada Generation". The "SFF" one is low profile and will fit in a 2U server.

[RTX 4000 Ada vs. RTX 4000 SFF Ada product screenshots]

The only real difference I can see is that the rtx 4000 ada is dual slot so you "lose" some PCIe slots but you can just get a single slot cooler.
That one in particular has fins oriented with the airflow in the server so if you got that and simply didn't install the fan it'd be perfect for passive cooling.

EDIT:
The original ebay auction I linked for the SFF version has gone on sale and is now $1125.
I would offer them $3000 for 3 on the best offer. If they don't accept you still have three days to get them at $1125
 
I think I see what's causing the confusion. You're looking at the "Nvidia RTX 4000 Ada Generation" and not the "Nvidia RTX 4000 SFF Ada Generation". The "SFF" one is low profile and will fit in a 2U server.
That's not going to fit.

Code:
  Supermicro H12DSi-N6 — 6 PCIe 4.0 Slots (alternating x8/x16):

  ┌──────┬────────────┬────────┬───────────────────────────────────────────┐
  │ Slot │    Type    │ Status │                  Device                   │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 1    │ x8 (Short) │ Empty  │ —                                         │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 2    │ x16 (Long) │ Empty  │ —                                         │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 3    │ x8 (Short) │ In Use │ LSI MegaRAID SAS-3 3108 (RAID controller) │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 4    │ x16 (Long) │ Empty  │ —                                         │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 5    │ x8 (Short) │ In Use │ Intel 82599ES 10GbE SFP+ NIC              │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 6    │ x16 (Long) │ Empty  │ —                                         │
  └──────┴────────────┴────────┴───────────────────────────────────────────┘

- https://www.nvidia.com/en-us/products/workstations/rtx-4000-sff/
- https://www.pny.com/File Library/Co...ochures/NVIDIA Data Center GPUs/proviz-rtx-4000-sff-ada-datasheet-2616456-web.pdf
- https://www.dell.com/en-us/shop/del...on-20-gb-gddr6-low-profile-pcie-40x16-graphics-card/apd/490-bkpj/graphic-video-cards

- https://www.supermicro.com/en/products/motherboard/H12DSi-N6
- https://www.supermicro.com/manuals/motherboard/EPYC7000/MNL-2363.pdf
 
@Null I don't understand why you think it won't fit. If you're worried about PCIe slots you can get a single slot aftermarket heat sink.
the rtx 4000 ada is dual slot so you "lose" some PCIe slots but you can just get a single slot cooler.
https://n3rdware.com/gpu-coolers/single-slot-rtx-4000-sff-ada-cooler

Even if you were leery of changing coolers you could still put two cards in slots 6 and 4 in your diagram, and put the SAS controller and NIC in 1 and 2. Am I missing something obvious?
 
@Null I don't understand why you think it won't fit. If you're worried about PCIe slots you can get a single slot aftermarket heat sink.


Even if you were leery of changing coolers you could still put two cards in slots 6 and 4 in your diagram, and put the SAS controller and NIC in 1 and 2. Am I missing something obvious?
You're asking me to ask my remote hands to physically modify the cards..? What if they still can't fit? Then I can't return them.

Those PCIe slots are in use bro
 
I thought you had access to your server being back in the US so you didn't need to rely on remote hands.

You can still fit two of them with the original coolers.
Code:
  ┌──────┬────────────┬────────┬───────────────────────────────────────────┐
  │ Slot │    Type    │ Status │                  Device                   │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 1    │ x8 (Short) │ In Use │ LSI MegaRAID SAS-3 3108 (RAID controller) │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 2    │ x16 (Long) │ In Use │ Intel 82599ES 10GbE SFP+ NIC              │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 3    │ x8 (Short) │ Blocked│ COOLER OVERHANG                           │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 4    │ x16 (Long) │ In Use │ RTX 4000 ADA GENERATION SFF               │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 5    │ x8 (Short) │ Blocked│ COOLER OVERHANG                           │
  ├──────┼────────────┼────────┼───────────────────────────────────────────┤
  │ 6    │ x16 (Long) │ In Use │ RTX 4000 ADA GENERATION SFF               │
  └──────┴────────────┴────────┴───────────────────────────────────────────┘

I doubt(?) remote hands would be willing to swap coolers but if they were then I would think they'd also be willing to swap the coolers back to return them if there's an issue, no?

Regardless, if remote hands doesn't want to change coolers you could ship them to you, swap them yourself and then send the cards to the data center. You hold on to the original coolers until everything is installed. That way if there's a problem just have them send the cards back to you, swap back to the original coolers, and return them yourself.

They will fit in a single slot if you get a single slot cooler.
 
Did you get two and just send them to the data center? Or three + single slot heat sinks? Did you try to best offer them on the listing or just buy them at $1125/ea.?
 
Did you get two and just send them to the data center? Or three + single slot heat sinks? Did you try to best offer them on the listing or just buy them at $1125/ea.?
What you are discovering in your dealings with Null is that he is probably clinically insane

I wouldn't worry about it too much thoughbeit

He probably went 2x 2-slot, given that he expressed doubts about getting remote hands to do the modification and whether he could return them if he did that. Plus there would probably have been more of a delay figuring out whether remote hands would do the modification or not, whereas it seems like he ordered right away.
 
He probably went 2x2slot given that he expressed doubts about getting remote hands to do the modification, whether he could return them if he did that, plus there would tbh probably of been more of a delay figuring out if remote hands would do the modification or not whereas it seems like he instead ordered right away
Probably, but if he ordered them to his house to swap the heat sinks himself before sending them to the data center, that would also have been an immediate order. And did the sellers accept a best offer on ebay that fast? Inquiring minds want to know.
 
Probably, but if he ordered them to his house to swap the heat sinks himself before sending them to the data center that would also been an immediate order. And did the sellers accept a best offer on ebay that fast? Inquiring minds want to know.
TBH I think he wants to be done with it I really doubt he ordered them to his house to modify and test them first lol
 