The MKV container is objectively better than MP4: it handles multiple video, audio, and subtitle streams, supports more codecs, and does so with marginally less memory than MP4. However, the virtually universal compatibility of MP4 trumps that, IMO.
So this is almost always a case of merging an MP4 container holding the video stream with a WebM container holding the Opus audio stream. MP4 and WebM are containers that limit their codec support for compatibility, whereas MKV is extensible and can hold just about anything. Since the Opus audio stream is the only part that doesn't fit well in the MP4 container, we only need to re-encode the audio:
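As a rough sketch of that idea, assuming `ffmpeg` is installed and you have the two files on disk: the function name `merge_to_mp4` and the file names are placeholders of mine, and AAC is just one common MP4-friendly audio codec, not necessarily the one you want. The video stream is copied untouched and only the audio gets re-encoded.

```shell
# Hypothetical helper: merge a video-only MP4 with an Opus-in-WebM audio file.
# The video stream is copied as-is; only the audio is re-encoded (to AAC here).
merge_to_mp4() {
    video_in="$1"   # placeholder, e.g. video.mp4
    audio_in="$2"   # placeholder, e.g. audio.webm
    out="$3"        # placeholder, e.g. merged.mp4
    ffmpeg -i "$video_in" -i "$audio_in" \
        -map 0:v:0 -map 1:a:0 \
        -c:v copy -c:a aac \
        "$out"
}

# Example call (file names are placeholders):
# merge_to_mp4 video.mp4 audio.webm merged.mp4
```

Because of `-c:v copy`, the slow part (video encoding) is skipped entirely, so this should take seconds rather than minutes for most files.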