First reason: CSAM, like creepshots, isn't necessarily nudity, but it's still sexual. That's why those perverted YouTube kids' channels existed: they aren't detected by the method you mentioned.
Second reason: large files (such as videos) need a lot of processing to compress/"convert" them before they can be run through such a system (ML models expect a specific kind of input, the same type they were trained on). At a low streaming bitrate you can do it (well, at least if you have data centers like Facebook or Google do). Anything larger than a standard streaming video format is a huge task. So if you were to upload large RAW files, as I mentioned before, either they wouldn't bother to scan them or it would take a lot of power to do so.
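To give a rough idea of what that "convert" step involves, here's a minimal sketch in Python, assuming a hypothetical classifier trained on 224x224 RGB inputs (the model itself isn't specified anywhere in this thread, so it's stubbed out; only the preprocessing is shown):

```python
# Minimal sketch of the preprocessing a fixed-input classifier needs before inference.
# The 224x224 RGB input size is an assumption for illustration, not any specific system.
import numpy as np
from PIL import Image

MODEL_INPUT_SIZE = (224, 224)  # resolution the hypothetical model was trained on

def preprocess(path: str) -> np.ndarray:
    """Decode, resize, and normalize one image/frame into the model's expected shape."""
    img = Image.open(path).convert("RGB")            # decoding a huge RAW/high-res file is the costly part
    img = img.resize(MODEL_INPUT_SIZE)               # the model only accepts its training resolution
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale pixel values to [0, 1]
    return arr[np.newaxis, ...]                      # add batch dimension: (1, 224, 224, 3)

# For video, this has to happen for every sampled frame, so the work grows with
# resolution, bitrate, and length before the classifier ever runs.
```

The point of the sketch: the model's part is cheap once the input is tiny and uniform; it's the decode/convert step on big originals that eats the compute.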
Despite all the memes about the feds spying on you, their main problem is how to process it all.