STOP USING BITRATE
A protracted rant by the author of Ennuicastr
The digital media community uses “bitrate” to mean “quality”, and using bitrate that way actually reduces the quality of digital media, for almost no benefit. Bitrate is a reasonable way to measure the quality of a digital media format (how does one format compare against another at the same bitrate?), but not a reasonable way to actually master digital video or audio. I'm going to start this rant as a tirade against constant bitrate, but my argument is that even average bitrate is usually the wrong choice, and I'll argue why that is as well.
First, let's understand how we got here.
The earliest lossy digital media formats had constant, fixed bitrates. One second of digital audio or video took a fixed amount of space, no matter which second of digital audio or video it was. The reason for this isn't that constant bitrate is a good idea; even the authors of the formats at the time knew that it wasn't. The reason is that constant bitrate is predictable in a way that was beneficial for surrounding hardware at the time, in particular digital magnetic tape, which ran at a constant speed, and ISDN lines, which provided an exact, predictable point-to-point bandwidth.
As new formats were created, this constant bitrate mode was retained, but intended only for use in such esoteric situations. For all normal uses, there's no reason to use a constant bitrate.
So, what's so wrong with constant bitrate? Doesn't it make sense that every second of digital media should take the same amount of space as every other second? Unfortunately, no, and the reason is how digital media compression works.
Digital media compression—whether video or audio—is differential. Rather than describing every sample of audio or every pixel of video, a codec describes how the video or audio changes over time. That basic idea is the key to lossy compression, and why lossy compression works. But, it's also the key to why constant bitrates don't make much sense: a held note, or especially silence, changes much less than something more dynamic, and so takes far less information to describe.
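To see the intuition, here's a toy sketch in Python. This is nothing like a real codec (which works on transformed blocks and perceptual models, not raw sample deltas), but it shows why change is what costs bits: delta-encode a held note and a constantly changing signal, then see how much each costs to store.

    import random
    import zlib

    # A toy illustration only: real codecs work on transformed blocks with
    # perceptual models, not raw sample deltas, but the intuition is the same.
    def delta_encode(samples):
        """Keep the first sample, then store only the change to each following sample."""
        return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]

    random.seed(1)
    held_note = [1000] * 48000                                   # a perfectly held value
    active = [random.randint(-1000, 999) for _ in range(48000)]  # constant, unpredictable change

    for name, samples in (("held note", held_note), ("active", active)):
        deltas = delta_encode(samples)
        raw = b"".join(d.to_bytes(2, "little", signed=True) for d in deltas)
        print(name, "compresses to", len(zlib.compress(raw)), "bytes")

The held note's deltas are almost all zero and compress to nearly nothing; the active signal's deltas are as unpredictable as the signal itself and stay expensive.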
If you've watched a low-quality digital video, you've probably noticed how things go muddy when there's a lot of action, then go crisp when there's a less active scene. The reason for this is constant bitrate; the information that describes the motion of an active scene takes more space than the information that describes the motion of a less active scene, and so more information must be sacrificed from active scenes to keep the bitrate constant.
But, there's another, less silly option: just use more space when you need it, and less space when you don't. Keep the quality constant, instead of the bitrate. Use constant-quality encoding.
Quality is not actually quantifiable, so constant quality modes just use magic numbers, which is part of what scares people off from using them. Common H.264 video encoders, for example, use “CRF” (“constant rate factor”), a totally opaque term. Worse yet, lower CRF is better, and the commonly used range is around 16 to 28. It's all magic numbers. Unfortunately, until and unless we can all agree on at least some fixed range for our magic numbers (I like 0 to 100, like JPEG's quality setting), we're stuck learning the meanings of these magic numbers for our encoders of choice.
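For the record, asking for constant quality is no harder than asking for a bitrate. Here's a minimal sketch, assuming ffmpeg with libx264 is installed; the filenames are made up, and 20 is just an illustrative magic number:

    import subprocess

    # A sketch of constant-quality H.264 encoding, assuming ffmpeg with libx264.
    # The CRF value 20 is just an illustrative magic number: lower is better,
    # and roughly 16 to 28 is the commonly used range.
    subprocess.run([
        "ffmpeg", "-i", "input.mkv",
        "-c:v", "libx264",
        "-crf", "20",        # constant quality: the encoder spends bits as needed
        "-c:a", "copy",      # leave the audio alone for this example
        "output.mkv",
    ], check=True)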
You may at this point be thinking, “but if we use more space for some sections than others, isn't it possible that we'll use too much space sometimes, and cause problems playing back video or audio?” It's notionally possible, but even if you encode pure noise at a reasonable CRF in H.264, for instance, it won't take so much space that it's unusable.
Of course, you may have a strict file size limit; perhaps, for instance, you're mastering a video to burn it to a Blu-ray disc. In this case, constant quality has the annoying property of unpredictable final file size, so surely constant bitrate is better? Nope! Many encoders support so-called two-pass encoding, where one pass is used to determine how compressible the source is, and then another pass does the actual compression. That way, it can, in essence, choose a constant quality setting that will result in the desired final file size.
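As a sketch of what that looks like in practice (again assuming ffmpeg with libx264; the 25 MB budget, the 120 second duration, and the filenames are all made up for illustration):

    import subprocess

    # A sketch of two-pass encoding toward a target file size, assuming ffmpeg
    # with libx264. The budget and duration here are illustrative, and the
    # audio is simply copied, so the budget only covers the video stream.
    target_bytes = 25 * 1000 * 1000                      # e.g. 25 MB for the video
    duration_s = 120.0                                   # length of the source in seconds
    video_bitrate = int(target_bytes * 8 / duration_s)   # average bits per second

    common = ["ffmpeg", "-y", "-i", "input.mkv",
              "-c:v", "libx264", "-b:v", str(video_bitrate)]

    # Pass 1: analyze how compressible the source is; the output is discarded.
    subprocess.run(common + ["-pass", "1", "-an", "-f", "null", "-"], check=True)
    # Pass 2: the real encode, distributing bits according to the analysis.
    subprocess.run(common + ["-pass", "2", "-c:a", "copy", "output.mkv"], check=True)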
There is one good argument left for predictable bitrates: streaming. After all, you only have so much bandwidth, so if you're streaming, you need to make good use of that bandwidth and no more, right? It was for these uses that average bitrate modes were created.
The way that average bitrate modes are implemented is actually quite clever. They are really constant quality modes, but they periodically (typically every two seconds) check the bitrate and adjust the quality to keep the average at the specified bitrate. That periodic window is usually adjustable, but in practice, it's almost never adjusted. So, that offers the best of both worlds, right? Unfortunately, not quite; the fact that the quality is adjusted dynamically just means that it takes a little bit of time to get muddy, rather than getting muddy immediately when there's a lot of action. Average bitrate isn't a terrible option, but I argue that even it is usually the wrong option.
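To make that mechanism concrete, here's a cartoon of the periodic adjustment. This is not any real encoder's rate control, just the idea: encode at some quality, check the running average every window, and nudge the quality toward the target.

    from dataclasses import dataclass

    # A cartoon of how average bitrate modes behave, not any real encoder's
    # rate control: encode each chunk at some quality, and at the end of each
    # periodic window nudge the quality so the running average tracks the target.

    @dataclass
    class Chunk:
        complexity: float   # stand-in for how "active" this chunk of media is
        duration: float     # seconds

    def fake_encode(chunk, quality):
        """Hypothetical per-chunk encoder: more complexity or quality means more bits."""
        return chunk.complexity * quality * 10_000 * chunk.duration

    def encode_abr(chunks, target_bps, window_s=2.0):
        quality = 50                        # arbitrary starting point on a 0-100 scale
        window_bits = window_time = 0.0
        for chunk in chunks:
            window_bits += fake_encode(chunk, quality)
            window_time += chunk.duration
            if window_time >= window_s:     # end of the periodic window
                if window_bits / window_time > target_bps:
                    quality = max(1, quality - 5)    # running hot: get a little muddier
                else:
                    quality = min(100, quality + 5)  # running cool: spend the spare bits
                window_bits = window_time = 0.0
            yield quality

    # A quiet scene followed by an action scene: the quality only drops a window
    # or two after the action starts, i.e. it takes a moment to get muddy.
    scene = [Chunk(0.2, 0.5)] * 8 + [Chunk(2.0, 0.5)] * 8
    print(list(encode_abr(scene, target_bps=1_000_000)))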
With the Internet Protocol, “you only have so much bandwidth” is simply not how it works. The amount of bandwidth available to you fluctuates completely unpredictably. The entire network is designed to handle bursty data well: sometimes you're sending a little bit of data, and sometimes you're sending much more. But, that sounds familiar, right? The Internet Protocol is ideally suited for constant-quality data!
Of course, you still do have a maximum. Some encoders have so-called “constrained constant quality” (CCQ) modes, where the quality setting is primary, but a maximum momentary bitrate can be set, and it won't go above that. CCQ is the ideal mode for streaming. If CCQ isn't available to you, then average bitrate is a good choice. Constant bitrate is never the right choice.
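As a sketch of the constrained flavor (again assuming ffmpeg with libx264; the filenames and numbers are illustrative), you keep the CRF as the primary setting and just cap the momentary bitrate:

    import subprocess

    # A sketch of constrained constant quality for streaming, assuming ffmpeg
    # with libx264. CRF stays primary; -maxrate and -bufsize only cap momentary
    # spikes so the stream never exceeds the stated ceiling.
    subprocess.run([
        "ffmpeg", "-i", "input.mkv",
        "-c:v", "libx264",
        "-crf", "23",           # quality is still the primary setting
        "-maxrate", "2500k",    # illustrative ceiling on momentary bitrate
        "-bufsize", "5000k",    # illustrative decoder buffer for that ceiling
        "-c:a", "copy",
        "output.mkv",
    ], check=True)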
I'll leave the discussion with this wonderful bit of trickery from the creators of Opus. Opus does have a constant bitrate mode, for the esoteric reasons I mentioned above, but it's rarely used. Rather, when you use Opus, you set a bitrate, but it's not a constant bitrate mode. So, that means it must be an average bitrate mode, right? No! It's a constant quality mode, and Opus, knowing full well that the digital media community has been using bitrate as a stand-in for quality, named its quality setting “bitrate”. If you encode pure noise as “128kbit” Opus, it will take much more than 128kbit per second, and if you encode silence as “128kbit” Opus, it will take much less. The bitrate setting is purely a constant quality setting, with no direct constraint on the bitrate at all! I love and hate this decision, but regardless, it would probably be unreasonable for authors of video formats to do the same, since video bitrates, and the range of bitrates a single setting can produce, are so much larger.
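You can run that experiment yourself. Here's a rough sketch, assuming ffmpeg with libopus and its lavfi test sources (anoisesrc and anullsrc): encode ten seconds of noise and ten seconds of silence at the same “128k” setting and compare what actually comes out.

    import os
    import subprocess

    # A sketch of Opus's "bitrate as quality" behavior, assuming ffmpeg with
    # libopus and its lavfi test sources. Both files ask for "128k"; compare
    # the resulting average rates (rough numbers, including container overhead).
    for name, source in (("noise.opus", "anoisesrc"), ("silence.opus", "anullsrc")):
        subprocess.run([
            "ffmpeg", "-y", "-f", "lavfi", "-i", source, "-t", "10",
            "-c:a", "libopus",
            "-b:a", "128k",      # the "bitrate", really a quality setting
            "-vbr", "on",        # ffmpeg's libopus default, stated explicitly
            name,
        ], check=True)
        kbps = os.path.getsize(name) * 8 / 10 / 1000
        print(name, "averages roughly", int(kbps), "kbit/s")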