Mark Cerny's PS5 presentation is the perfect example of Sony upselling old technology as something new and revolutionary. He was talking about how the PS5 will have head-related transfer function audio, or HRTF audio for short, as the revolutionary 3D audio system that'll require their special Tempest CPU to calculate in real time.
I am deeply passionate about this topic, so allow me to go on a short tangent for a moment.
Preamble: The science of spatial hearing in humans
Humans have two ears. Everything about them, from how they're shaped to how they interpret air vibrations, allows us to pinpoint exactly where a specific sound comes from. Walk down a street and pay attention to what you're hearing and from where. You can tell if something's in front of you, behind you, to the left of you, to the right of you, above you, below you, or any combination along those three axes, because our brains can calculate the minuscule delays between the sound reaching one ear and the other, among other cues like the shape of our outer ears. The shape of our head and torso also has an impact on how we perceive sound. It's also why our voice sounds so weird when we listen to it from a recording: we're so used to hearing it from inside of us that it throws us off when we hear it from outside. That's how connected our head and torso are to our ability to hear.
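To give a sense of just how minuscule those delays are, here is a rough sketch using Woodworth's spherical-head approximation. The head radius and speed of sound are assumed typical values rather than measurements, and the function name is made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C (assumed)
HEAD_RADIUS = 0.0875     # m, a commonly used average head radius (assumed)

def interaural_time_difference(azimuth_deg):
    """Approximate delay in seconds between the two ears for a distant source.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    Uses Woodworth's formula ITD = (r / c) * (theta + sin(theta)),
    which is only meant for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for angle in (0, 30, 90):
    itd_us = interaural_time_difference(angle) * 1e6
    print(f"{angle:>2} deg -> {itd_us:.0f} microseconds")
```

Even for a sound coming directly from the side, the difference works out to roughly two thirds of a millisecond under these assumptions, and the brain still resolves it.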
1. Surround sound technology
It's very hard to set up a pair of studio monitors to sound right, since the vibrations they create resonate around the room and everything in it affects them. That's why you usually need a special calibration mic and software that plays a sine wave sweep through the speakers to compensate for your surroundings, even after soundproofing. This has never been an issue with headphones, where the vibrations happen right by your ear, making it much easier to achieve proper sound and fine-tune it without per-setup calculations. AutoEQ is a perfect example of that: measure the frequency response of a headphone model, mathematically generate parametric equalization that compensates for it to reach a specific sound signature, and now you can use this preset on any pair of these headphones.
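The core of that compensation idea can be sketched in a few lines. This is a heavy simplification: the real AutoEQ project fits parametric filters to the whole curve, while the sketch below only computes a per-band gain as target minus measured. The function name, the boost cap and all the numbers are made up for illustration.

```python
def correction_gains(measured_db, target_db, max_boost_db=6.0):
    """Per-band gain (dB) that moves a measured response toward a target.

    measured_db / target_db: dicts mapping frequency in Hz to level in dB.
    Boosts are capped at max_boost_db to limit clipping; cuts are left alone.
    """
    gains = {}
    for freq, measured in measured_db.items():
        gain = target_db[freq] - measured
        gains[freq] = min(gain, max_boost_db)
    return gains

# Made-up numbers for a hypothetical pair of headphones.
measured = {100: 4.0, 1000: 0.0, 8000: -9.0}
target = {100: 2.0, 1000: 0.0, 8000: -1.0}
print(correction_gains(measured, target))
```

Because the measurement belongs to the headphone model and not to your room, the resulting gains can be reused by anyone with the same headphones.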
The exact same thing happens with 3D audio. With speakers, you have Ambisonics and Dolby Atmos, and you need 4-speaker, 5.1 or 7.1 setups with every speaker in the right position for those surround sound solutions to work properly. But in the case of headphones it all gets "trivialized" with HRTF. You have two speakers by your ears, so now it's a matter of replicating the same types of audio delays, phase shifts and other bits of human hearing that give us the ability to pinpoint where a sound comes from, and through that we can achieve realistic surround sound in video games with any pair of stereo headphones.
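In practice, "replicating" those delays and level differences usually means filtering each sound with a pair of head-related impulse responses (HRIRs), one per ear, measured for the direction the sound should come from. Below is a minimal sketch of that step. The two impulse responses are toy values invented for illustration, not real measurements, and the function names are my own.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a left/right impulse response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy impulse responses for a source on the listener's right (made up):
# the right ear hears it immediately and at full level, the left ear
# hears it two samples later and quieter because the head is in the way.
HRIR_RIGHT = [1.0, 0.0, 0.0]
HRIR_LEFT = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
print("left: ", left)
print("right:", right)
```

A real renderer does the same thing with impulse responses hundreds of samples long, picks or interpolates the pair that matches each source's direction, and uses FFT-based convolution to keep it fast.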
The biggest hurdle to overcome with HRTF, though, is the fact that the shape of our ears, head and torso impacts the way we perceive sound. Basically you need a fingerprint of one's ears, head and torso for HRTF to work properly, and it'll only work well if that fingerprint matches that person. You can create an approximate fingerprint that'll be good enough for most people, but for the best of the best results, per-individual measurements are needed.
Back in 2002-2003, the French research institute IRCAM did these types of measurements, and the fingerprints of certain subjects, like subject #1037, turned out to be very close to a "universal" fingerprint. A YouTube channel dedicated to HRTF audio in video games named I Drink Lava made a video using every measurement from IRCAM so that people can figure out which one suits them best. We'll come back to this a bit later.
I highly recommend checking out the channel. It is a goldmine of video game HRTF information, with plenty of demos of various technologies as well as comprehensive lists of games that support 3D audio, available as Google Docs links in the channel's description.
HRTF is a fantastic piece of technology that is older than you might think. While IRCAM's measurements were made in 2002-2003, the research on HRTF dates back to the 1970's with Jens Blauert's academic works. The most important thing, however, is that HRTF was already successfully implemented in video games in the late 1990's.
2. Aureal Semiconductor
Aureal Semiconductor was a company founded in 1995. A year later, they released Aureal 3-Dimensional, or A3D for short, a brand new 3D audio technology that allowed for realistic surround sound in any video game on an ATX PC, thanks to their dedicated hardware and software support. Phil's Computer Lab has made a video demonstrating just how amazing Aureal was for something made back in 1996. Remember to use headphones since, as you should already know, HRTF relies on you using headphones.
Impressive, right? Remember that this was achievable on home computers in the 1990's with Aureal's audio accelerator that calculated all the complicated math required in real time. Back then these types of calculations would murder your CPU, especially given the fact that Aureal wasn't doing any dirty tricks to achieve room reverb. No, the way it worked was that the audio sources would essentially ray trace the virtual room and calculate how it would affect the sound in real time. It was that advanced, and it was used in many games of that time, including Half-Life, Unreal Tournament and Quake III Arena.
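I don't have Aureal's actual algorithm to show, so treat the following purely as an illustration of the simplest geometric building block behind this kind of approach: a single reflection off a flat wall, found by mirroring the sound source across it (the first-order image-source method). The function name, the perfectly reflective wall and the 1/distance attenuation are all my assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def first_reflection(source, listener, wall_x):
    """Delay and gain of one reflection off a wall located at x = wall_x.

    source, listener: (x, y) positions in metres, both on the same side
    of the wall. Returns (extra_delay_seconds, gain_relative_to_direct),
    assuming 1/distance attenuation and a perfectly reflective wall.
    """
    # Mirror the source across the wall. The reflected path is exactly as
    # long as the straight line from that mirror image to the listener.
    image = (2 * wall_x - source[0], source[1])
    direct = math.dist(source, listener)
    reflected = math.dist(image, listener)
    return (reflected - direct) / SPEED_OF_SOUND, direct / reflected

delay, gain = first_reflection(source=(1.0, 0.0), listener=(3.0, 0.0), wall_x=0.0)
print(f"echo arrives {delay * 1000:.2f} ms after the direct sound at {gain:.0%} of its level")
```

Now multiply that by every surface in the room, every sound source and every frame, and it becomes clear why a 1990's CPU needed help.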
However, Aureal Semiconductor went bankrupt by the year 2000 and this cutting edge technology was lost with them. The reason why they went bankrupt however is something truly blood boiling.
3. Creative Labs v. Aureal Semiconductor
Creative Labs was a big player back in the day. When PC's were just crawling out of their niche as office machines and becoming the multimedia powerhouses we know them as today, there was the issue of a lack of good audio. That was until Creative introduced the Sound Blaster, which made CD quality PCM audio widely accessible and affordable in home computers, so they had massive leverage over the industry back in the day. Coincidentally, around 1998, Creative had their own surround sound and reverb effect technology called Environmental Audio Extensions, or EAX for short. However, they had no chance in hell of competing with Aureal, who were light years ahead of Creative.
So, Creative did the only thing they could do in this situation. No, not deliver a better product to the market as proper capitalism would entail. Sue Aureal into oblivion over patent infringements to make sure Creative had a monopoly on this technology.
And so began the lawsuit between Creative and Aureal. Aureal quickly countersued Creative, also over patent infringement. By the very end of 1999 the courts ruled in favor of Aureal. However, Aureal was a tiny company compared to the giant that was Creative at that time, so the costs of lawfare forced them to declare bankruptcy by the year 2000. Of course, as with every case of bankruptcy, someone inevitably ends up buying out the assets of the company. By sheer cosmic chance, out of all the companies that could have bought out the leftovers of Aureal, the company that acquired all of this cutting edge technology and all of the patents for it was none other than Creative Technology Ltd.
Yes, Creative was so petty about having a monopoly on selling sound cards forever and ever that they sued Aureal into bankruptcy, bought them out and shelved all of their technology so that it would never see the light of day again, only to get smacked in the face by the neck-breaking pace of technological advancement. By the early 2000's every home computer had on-board audio with not much need for Creative's special sauce sound cards, unless of course you wanted EAX, which was inferior to what Aureal offered.
To add insult to injury, by 2006 Microsoft did a major rewrite of their Windows NT family of operating systems, officially released as Windows Vista. One of the major changes was a completely new audio backend that allowed for per-program audio levels, better audio quality, better audio mixing and many other nice things. However, in the process they removed support for hardware accelerated audio, the thing that Creative's sound cards relied on. In response, Creative created ALchemy, a software compatibility layer that aimed to emulate EAX via Creative's OpenAL library. Unsurprisingly, support for EAX withered away as time went on, with games like Fallout: New Vegas being among the last to support this technology, and soon enough developers simply stopped bothering.
For the longest time, HRTF in video games was largely forgotten thanks to Creative's corporate greed killing off progress. In recent years, however, there has been a resurgence of interest in the technology that points to a promising future.
4. Software-based HRTF
I have mentioned that Creative had a software library called OpenAL, which was widely used in the 2000's as a free audio library for video games. For example, the S.T.A.L.K.E.R. trilogy utilized it, with a partial implementation of EAX no less. For a brief time it was licensed under the GNU LGPL, which effectively made it free and open source software. However, Creative decided to make it proprietary by 2009, which meant that the FOSS community would pick up the last bits of code licensed under the GNU LGPL and develop their own forks.
OpenAL Soft would become the most popular and robust fork of this library.
If you remember, back in the 1990's, this type of audio required dedicated computing units to properly run all the simulations in real time. However, by the 2010's, CPU's had gotten so powerful that such calculations could be run with very little overhead and without any need for dedicated hardware. OpenAL Soft does exactly that: it allows HRTF, EAX, EFX, Ambisonics and a plethora of other audio technologies to run on your CPU, hardware agnostic. And since it's a fork of Creative's OpenAL, it's also backwards compatible with it, meaning that if a game used Creative's OAL for EAX, you can replace it with OAL Soft and enjoy EAX without a Creative sound card. If a game used DirectSound3D, the API that Microsoft killed off with Vista, OAL Soft's developer has created a wrapper library by the name of DSOAL, which, as the name implies, translates DirectSound3D calls to OpenAL Soft. That way you can enjoy all of those old games with robust sound without the need for legacy hardware.
Also worth noting, OpenAL Soft allows for converting the aforementioned IRCAM measurements into HRTF profiles that you can then use in games, for a more personalized profile that might suit you better than the default one. Technically speaking, you could convert standardized HRTF measurements of your own head, if you manage to find a lab that'll do these for you, and use those in your games. That of course requires sitting in an expensive contraption for a good few minutes, having microphones shoved in your ear canals and being rotated one degree at a time as a sine wave sweep is played all around you, the same method used for speaker calibration. Obviously not a very accessible option, hence the need for universal profiles.
But this is all old games, right? What about new ones? Well, Valve has their solution to that.
Steam Audio. An Apache-2.0 licensed library that can implement the type of audio that Aureal did back in the 90's, all software side. HRTF, environment reverb based on ray tracing instead of hacky EQ spaces like with EAX, all of this amazing tech, free of charge, can be implemented in any game, and every modern CPU will be able to use it. Valve has implemented it in their games already, such as Half-Life: Alyx, Counter-Strike: Global Offensive and Counter-Strike 2. In the latter two it shines the most, since you can pinpoint exactly where the footsteps and gunshots are coming from before you see the source of the sound. There are also other open solutions out there, such as Google's Resonance Audio.
5. Why HRTF?
I think it's about time I mention the main difference between conventional stereo sound and HRTF sound in games, and why you'd want HRTF over regular stereo. To put it bluntly, conventional stereo sucks. It only does left-right panning and nothing more. You cannot tell exactly where the sound is coming from, only that it's somewhere to the left of you or to the right of you. In a first person shooter, especially a multiplayer one, this is very suboptimal. Is the enemy behind you? Above you? Somewhere to the left in front of you? You have no idea. With HRTF you can pinpoint the exact location of the enemy just by listening to the in-game surroundings. This is something you only get if, for example, you've played S.T.A.L.K.E.R.: Call of Pripyat with OpenAL Soft's HRTF, walked by the concrete factory in Zaton and heard zombies above you, even though you couldn't see them, so you climbed the ladder by the factory and found zombies on the rooftop. Without HRTF you'd have no idea that they were above you, since with conventional stereo it would always sound like they were on the same level as you.
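To make the "left-right panning and nothing more" point concrete, here is a sketch of a typical constant-power stereo panner. The function and the angle convention are my own for illustration, not taken from any particular game engine.

```python
import math

def stereo_pan_gains(azimuth_deg):
    """Constant-power left/right gains for a source at a given azimuth.

    azimuth_deg: 0 is straight ahead, 90 is right, 180 is behind, 270 is left.
    Only the left-right component of the direction is used, which is all
    a plain stereo panner has to work with.
    """
    pan = math.sin(math.radians(azimuth_deg))   # -1 (left) .. +1 (right)
    angle = (pan + 1) * math.pi / 4             # 0 .. pi/2
    return math.cos(angle), math.sin(angle)     # (left_gain, right_gain)

front_right = stereo_pan_gains(30)    # ahead and to the right
behind_right = stereo_pan_gains(150)  # behind and to the right
print(front_right, behind_right)
```

Both calls return practically the same pair of gains, so an enemy in front of you and an enemy behind you sound identical, and height never enters the calculation at all.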
You can use these sound cues in competitive games to your advantage in split second situations. It's not just about immersion, this technology completely changes the way you play certain games. Normally when you play a game, you react based only on what you see and never pay attention to the audio. But with HRTF, you pay close attention to what you hear and combine it with what you see, completely changing the way you interact with the game world. In real life, you combine your hearing and sight all the time: you hear a noise, you can tell where it came from and you turn towards it to see what it was. You never really do that in games with conventional stereo since those audio cues are useless. But with HRTF you quickly come to rely on those audio cues because of how accurate they are. This of course also means better immersion in the game world. Now you can walk into a large crowded room and hear its reverb and the exact location of every little noise. Out of all the "gimmicky" game technologies, HRTF audio is arguably the most useful and important one to have gone underused.
6. Tempest Engine
Now let's go back to Mark Cerny and the PlayStation 5. By this point you can tell that a lot of what I've mentioned was also mentioned by Mark in his presentation. At the end of the day, he is a talented engineer and a massive nerd, so he knows a lot about HRTF, surely more than I do. The most important bit, however, is that Mark was upselling it as a major breakthrough in game technology that only Sony is now bringing to the market with their proprietary Tempest Engine. Tempest Engine. A dedicated hardware accelerator for HRTF calculations. Sounds familiar? Yeah, Aureal did it back in the 90's, and we're at a point where it's wholly unnecessary for HRTF since modern CPU's run those calculations like it's nothing. We've had this technology since the 19 fucking 90's, you know, back when PlayStation first came to the market, but it was on PC's only. Hopefully by this point you're informed enough to tell just how bullshit Sony's upselling of Tempest is.
However, I have to defend Mark for a moment. People were baffled by his idea of you sending Sony photos or videos of your head so that you could have zany wacky 3D audio in your PS5 games, right? From my perspective, the only questionable part is having to send that data to Sony's servers instead of doing those calculations locally, because the idea itself is very commendable. By this point you know how much the shape of our head and torso impacts our hearing, how HRTF relies on that, and how expensive, slow and painful the process of creating an HRTF profile is. Head imaging is another way of creating HRTF profiles, but that again requires dedicated hardware for 3D scans. With the current advent of machine learning it could be possible to simply take photos of your head and torso and have an algorithm accurately calculate your personal HRTF profile for use in games. That was the big idea behind what Mark said, but it's possible that they never followed through on it, for whatever reason.
Now with this short tangent out of the way, I hope I've interested you in delving further into the fantastic world of HRTF audio in video games. Personally, I try my best to reimplement it in any older game that I might play by checking I Drink Lava's Google Docs lists and using my knowledge to hook up OpenAL Soft to those games. I absolutely adore 3D audio. It's a real shame that it was destroyed by Creative and that the only publicity it has gotten in recent years was from Mark's PS5 presentation, which was overwhelmingly negative.