Understanding Compressed Audio File Formats

The state of the audio nation is at a crossroads. With internet bandwidth increasing, we don't need to data-compress audio files as much as before. Joe Albano explores quality vs. size.

We listen to them every day, whether we think about it or not, and they’re the only kind of audio files many people get to hear nowadays. Of course, I’m talking about those ubiquitous data-compressed, lo-res audio files, like MP3s and the MP4/AACs you get from the iTunes Store. But those of us who are involved with making music and audio production are all too aware of the compromised nature of the sound quality in these formats. Lossless methods do exist for reducing audio file size without compromise, preserving all the audio data (FLAC and Apple Lossless are probably the most familiar codecs), but they can only cut file size in half at best. Lossy codecs (MP3, MP4/AAC, and a host of others) reduce file sizes more dramatically. These small audio files were the key to successful streaming and downloading of audio when the internet was young and slow, but to make an audio file 5 to 10 times smaller than its normal (full-quality PCM) size, something’s gotta give.

Fig 1 A chart of the most common audio file formats: Linear PCM, Lossless Codecs, and Lossy Codecs (from the MPV/AskVideo Course Audio Concepts 105: Sound Recording).

Taking the Hit

Lossy data compression techniques make use of psychoacoustic principles to achieve their significantly smaller sizes. In the human hearing apparatus (ears & brain), the psychoacoustic phenomenon of Masking means that some elements of a sound wave are covered up by others, and the listener doesn’t perceive them. Lossy algorithms break down and analyze audio waves, and encode the masked portions of the audio at lower bit resolutions. This psychoacoustic analysis and data reduction is called Perceptual Coding, and the resulting audio file can have its size reduced by up to around 10 times, taking a typical 40 MB song on a CD down to only 4 MB or so! Not only does this make it small enough to attach to an email, but it reduces the data rate (for streaming and downloading) from CD-quality PCM’s 1.4 Mb/sec (about 1,411 kb/sec) down to 128-160 kb/sec, which made those activities possible even with the slow bandwidths of the early days of the Web, firmly establishing audio’s (and video’s) place in the new online world.
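
To make the arithmetic concrete, here is a minimal Python sketch of the figures above. The 4-minute song length is an assumption for illustration, and the sizes cover the audio data only (no file headers or tags).

```python
def pcm_bitrate_kbps(sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Data rate of uncompressed linear PCM, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def file_size_mb(bitrate_kbps, duration_s):
    """Approximate audio data size in megabytes (1 MB = 1,000,000 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


cd_rate = pcm_bitrate_kbps()   # CD-quality stereo: 16-bit, 44.1 kHz
song_seconds = 4 * 60          # assumed song length, for illustration only

for label, rate in [("PCM 16/44.1", cd_rate), ("lossy 160", 160), ("lossy 128", 128)]:
    size = file_size_mb(rate, song_seconds)
    print(f"{label:12s} {rate:7.1f} kb/sec  {size:5.1f} MB")
```

For that assumed 4-minute song, CD-quality PCM works out to a little over 42 MB, and a 128 kb/sec encode to just under 4 MB, which lines up with the "40 MB down to 4 MB" rule of thumb.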

Fig 2 File size comparison of PCM (16 & 24 bit), Lossless (Apple Lossless), and MP3 & MP4 files (at various bitrates).

But this data/size reduction comes at a cost. Unlike lossless files, lossy formats do not preserve all the audio data, and, despite sounding better than simply slashing the sample rate (SR) and bit resolution, there is some audible loss of sound quality. How much sound quality is compromised depends on the codec itself, its implementation, and how small the new data-compressed file is.

The Sound

Just how much of a toll do MP3s and AACs take on the original, uncompromised PCM audio data? Well, that depends. There’s no one single way of implementing an MP3 or AAC encoding process, so different encoding software may produce different results. At the smallest file sizes deemed suitable for music (128/160 kb/sec), there’s usually some noticeable loss of low-end heft, and a “smearing” of highs, resulting in a loss of depth and clarity, and sometimes a reduced stereo image. An AAC should sound a little better in this regard than an MP3 at the same size (I prefer AACs, when I have to use a lossy format). Currently, it’s more common to use “double” sizes: 256 kb/sec or even 320 kb/sec MP3/AACs. Apple upgraded the standard for the iTunes Store to 256 kb/sec AACs (and even nicely upgraded users’ older purchased 128 kb/sec files automatically).

When I include an audio clip with an AskAudio article, I encode it as 256 kb AAC. Heard in isolation, these are perfectly acceptable—not as lacking as smaller MP3s. But if current lossy files are not as bad as all that (or at least as they once were), why does some streaming audio you hear on the internet—especially on YouTube and other social media sites—still sound so awful?

Well, a big part of the problem is that, when audio is uploaded to a commercial site for streaming, that site will often reduce file size even further (presumably to conserve its servers’ bandwidth), lossily re-encoding the audio until your carefully prepped 256/320 kb/sec audio file is turned to screechy, phasey mush, reminiscent of the bad old days. Unfortunately, dealing with this can be a hassle, when there is any recourse at all. Some sites give you no options; others let you enjoy better-quality audio streams for a premium price. If you’re promoting your music, or worse, promoting your studio skills as an engineer or mixer, you’ll need to look carefully into the audio-quality options for any site you decide to use to disseminate your tracks.

The State of Things

And that gets us to the current state of affairs when it comes to lossy audio. With the higher bandwidths currently available to many users, do we really still need lossy compression at all? 256 kb/sec files are only about 5 times smaller than the original (CD-quality) PCM versions, and lossless files are about half the size of PCM files, so the size difference isn’t quite as great as it was back then. Though Apple hasn’t jumped on the lossless bandwagon yet, it’s becoming more common for sites that offer downloadable audio to offer higher-res lossless versions (Jay-Z’s new streaming service features a high-res option, for a slight premium). Since many are still stuck with slower connections, it may not be time for a wholesale abandonment of lossy formats, but eventually, with no need for them, lossless audio should (and probably will) take over.

But this potential progression is complicated by other factors. When some people talk about hi-res audio, they’re not referring only to the difference between lossy and lossless encoding techniques; they’re also critiquing the full-quality (un-encoded) PCM files themselves, especially the standard Sample Rates (44.1k and 48k), and favoring higher SRs (96k). 24-bit resolution is the (demonstrably more accurate) accepted standard for professional PCM audio currently, but the question of SR is more contentious (not all engineers believe there’s as much benefit to raising SR, which adds ultrasonic sound components to the audio file). And much of the debate on the quality of current recordings centers on the use/over-use of heavy compression and limiting in Mastering, in the quest for louder files (the infamous Loudness Wars). So, issues of Lossy vs. Lossless have become all jumbled up with standard-res PCM vs. hi-res PCM arguments and mastering compression complaints, making discussions often go round & round in circles, with different camps focusing on different things, in an escalating Tower of audio Babel.

Make the Best of It

For this article, I’ve tried to focus only on the lossy/lossless encoding issues and standards, leaving those other issues for another time and another place. So, when you do have to create and share lossy versions of your carefully crafted recordings and mixes, what can you do to ensure the quality will be as good as possible, and acceptable for the purpose at hand? For example, if I were asked to email-attach a (small) file of a mix for a musician client, so they could hear the overall treatment of the mix, I probably wouldn’t hesitate to encode it as a 256k AAC or a 320k MP3. But if I were sending a file to, say, an audiophile jazz label, to be judged on sound quality, I’d insist (and so would they!) on it being a lossless file, even if that meant forgetting about email and using a file-sharing site that can handle the larger file sizes.

When making the lossy file, there are often a few options available. Settings like Joint Stereo (may improve encoding on some stereo files) and Filter Low Frequencies (may help avoid artifacts caused by strong low-frequency energy in the file) can sometimes address issues that may crop up with certain audio files, but the key settings are quality and data rate (the bit rate). If there’s a Quality option, the highest setting may take longer to encode (probably a negligible difference these days), but at least you know you’re getting the best that particular encoder is capable of. When it comes to bitrate, since bandwidth is much higher these days, I’d avoid the older standards (128/160 kb/sec) and, again, always use the 256/320 kb/sec rates. The file size doubles, but it’s still about a fifth the size of the original, and the higher bitrates may help eliminate some of the more obvious artifacts that might be present with the smaller sizes. In fact, if you can get away with it, you could even try making & sending a lossless version. It’ll be slightly more than twice the size of a 256/320 kb/sec lossy file, but if it doesn’t choke, or get rejected by, an email client, you’ve sidestepped the whole lossy file issue!
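
As a rough way to sanity-check the email question, the sketch below estimates file sizes for a song and picks the highest-quality option that fits under an attachment cap. The 25 MB and 10 MB caps, the 4-minute length, and the 50% lossless ratio (the best case mentioned earlier) are all assumptions; real lossless sizes vary with the material.

```python
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000  # CD-quality stereo PCM: 1411.2 kb/sec


def size_mb(kbps, seconds):
    """Approximate audio data size in MB (1 MB = 1,000,000 bytes)."""
    return kbps * 1000 * seconds / 8 / 1_000_000


def candidate_sizes(seconds, lossless_ratio=0.5):
    """Estimated sizes, ordered from highest to lowest quality.

    lossless_ratio is an assumed best case; real files are often larger.
    """
    return [
        ("lossless", size_mb(CD_PCM_KBPS * lossless_ratio, seconds)),
        ("320 kb/sec lossy", size_mb(320, seconds)),
        ("256 kb/sec lossy", size_mb(256, seconds)),
    ]


def best_fit(seconds, limit_mb):
    """Return the first (highest-quality) option under limit_mb, or None."""
    for name, mb in candidate_sizes(seconds):
        if mb <= limit_mb:
            return name, round(mb, 1)
    return None


song = 4 * 60  # assumed 4-minute song
print(best_fit(song, limit_mb=25))  # hypothetical 25 MB attachment cap
print(best_fit(song, limit_mb=10))  # hypothetical stricter 10 MB cap
```

With these assumptions, the lossless version (roughly 21 MB) squeaks under a 25 MB cap, while a 10 MB cap pushes you down to the 320 kb/sec lossy file.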

Fig 3 Some options for MP3 and MP4 file creation in Logic (top, middle) and QuickTime Pro 7 (bottom).

If you’re using standalone software to create the files, you should try a number of encoders beforehand, and settle on the one that has the best sound quality overall. If you’re working out of a DAW, most nowadays include the option to create an MP3 or AAC file as part of the Bounce-to-Disk process, with a choice of data rates, and most DAWs have licensed high-quality encoding algorithms. Likewise, iTunes and QuickTime can give good results. On a Mac, you should have a copy of the older QuickTime 7 lurking in the Utilities folder, which provides user-adjustable encoding quality options for AAC/MP4 (m4a) audio files; it’d be worth a little experimentation. Needless to say, the cleaner and clearer the original file, the more likely it’ll be to survive the Perceptual Encoding process with most of that quality intact. Unfortunately, the level-maxed, heavily-compressed masters that are all too common nowadays will often translate more poorly to a lossy format than less squashed, more dynamic mixes (those other quality issues don’t exist in complete isolation after all).
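
If you’d rather audition encoders from a command line than from a bounce dialog, here is one possible sketch. It assumes the free ffmpeg tool (not one of the apps discussed above) is installed and built with the libmp3lame encoder, and the file and folder names are hypothetical. The script only builds and prints the commands, so you can inspect them before running anything.

```python
from pathlib import Path

# (label, ffmpeg audio codec, bitrate or None for lossless, file extension)
VARIANTS = [
    ("aac_256", "aac", "256k", ".m4a"),
    ("mp3_320", "libmp3lame", "320k", ".mp3"),
    ("flac", "flac", None, ".flac"),
]


def build_commands(source, out_dir="encodes"):
    """Build one ffmpeg command (as an argument list) per variant."""
    src = Path(source)
    commands = []
    for label, codec, bitrate, ext in VARIANTS:
        target = Path(out_dir) / f"{src.stem}_{label}{ext}"
        # -vn ignores any video/cover-art stream; -c:a picks the audio encoder
        cmd = ["ffmpeg", "-i", str(src), "-vn", "-c:a", codec]
        if bitrate:
            cmd += ["-b:a", bitrate]  # target audio bitrate for lossy encodes
        cmd.append(str(target))
        commands.append(cmd)
    return commands


for cmd in build_commands("my_mix.wav"):  # hypothetical source file
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` once the output folder exists. After that, the useful part is the listening: compare the results against the original on the same monitors, at matched levels.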

Wrap-up

As long as many people still have less-than-high-bandwidth connections, and email servers still cap the size of attachments, it seems we’ll be dealing with lossy audio, at least in the short term. And besides the purely technical considerations, lossy audio files have become the standard for generations of music-lovers and music-makers who grew up with them. Some studies suggest that those listeners even prefer that slightly band-limited, slightly-squashed sound to full-quality PCM! (I think that’s probably just a preference for the familiar: if/when they get the chance to become accustomed to better sound quality, they’ll come around.) In the meantime, those of us who are more actively concerned with sound quality can keep on as we’ve been: exchange PCM/lossless audio whenever possible, find the best-quality encoders, choose the higher (256/320) bitrates, and keep a watchful eye on any websites that are presenting your work to the public. And keep an eye on the various developments in this area. Things are poised to change for the better, and those changes may be happening even as we speak.

Learn more in Joe Albano's Audio Concepts courses at AskVideo.com

Joe is a musician, engineer, and producer in NYC. Over the years, as a small studio operator and freelance engineer, he's made recordings of all types from music & album production to v/o & post. He's also taught all aspects of recording and music technology at several NY audio schools, and has been writing articles for Recording magaz...
