Digital audio primer

This document exists to describe some of the digital audio concepts used by Ennuicastr. If you just want a description of a particular term, jump to the glossary.

Digital audio is usually compressed, that is, encoded to take up less space. Raw, uncompressed audio is commonly distributed in the WAV file format, but nowadays there is no reason to use that format unless some outdated piece of software demands it. Compressed audio comes in two forms: lossless and lossy.

Lossless compression is just that: no audio quality is lost in compression, so there's rarely a reason to prefer uncompressed audio to losslessly compressed audio. The most popular format for lossless compression is FLAC, the Free Lossless Audio Codec. Apple users may also be familiar with ALAC, the Apple Lossless Audio Codec.
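To make "no quality is lost" concrete, here is a minimal Python sketch. It uses zlib, a general-purpose lossless compressor from the standard library, as a stand-in for FLAC (FLAC is specialized for audio and is not used here). The synthetic tone, the sample rate, and the function name are assumptions made for illustration.

```python
import math
import struct
import zlib

SAMPLE_RATE = 48000  # samples per second (assumed for this illustration)

def make_tone(seconds=1.0, freq=440.0):
    """Generate a sine tone as raw 16-bit little-endian mono PCM bytes."""
    count = int(SAMPLE_RATE * seconds)
    samples = [int(12000 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
               for n in range(count)]
    return struct.pack("<%dh" % count, *samples)

raw = make_tone()
compressed = zlib.compress(raw, 9)   # lossless compression (stand-in for FLAC)
restored = zlib.decompress(compressed)

# Lossless means the round trip gives back exactly the bytes we started with.
print(len(raw), len(compressed), restored == raw)
```

The compressed data is smaller than the raw data, yet decompressing it returns every byte unchanged. That is the property that makes lossless formats a safe replacement for uncompressed audio.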

Lossy compression loses some detail in the audio, but saves an enormous amount of space in doing so. The goal is always “audibly imperceptible loss”, that is, to lose only details that the human ear won't notice. That leaves much more room for creativity than lossless compression, and as a consequence, there is a much wider range of lossy audio codecs. Some of the best-known examples are MPEG-4 AAC (often called just MPEG-4 or AAC; in this context both names mean the same thing), Ogg Vorbis, and Opus. Opus is commonly used for voice communication, and most modern voice chat applications (including Ennuicastr) use it, but software support is much wider for AAC and Vorbis. Because lossy compression is, well, lossy, it cannot match the quality of lossless compression, but you pay for lossless compression in size. In the case of Ennuicastr, lossless FLAC is expected to be, on average, about 16x the size of lossy Opus. The improvement in quality of lossless over lossy is likely only worth it if you have a very good microphone and recording environment.
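The size trade-off is simple arithmetic, sketched below in Python. The raw size formula is standard for PCM audio. The 16x ratio is the average quoted above. The sample format (48 kHz, 16-bit, mono) and the 30 MB Opus file are hypothetical inputs chosen for illustration, not measurements from Ennuicastr.

```python
FLAC_TO_OPUS_RATIO = 16  # average ratio quoted in the text above

def raw_pcm_bytes(seconds, sample_rate=48000, bits_per_sample=16, channels=1):
    """Size of uncompressed PCM audio: one sample per channel per tick."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

def estimated_flac_bytes(opus_bytes):
    """Rough FLAC size estimate from an Opus size, using the average ratio."""
    return opus_bytes * FLAC_TO_OPUS_RATIO

one_hour = 3600
print(raw_pcm_bytes(one_hour))             # 345600000 bytes of raw audio

hypothetical_opus = 30 * 1000 * 1000       # made-up 30 MB Opus recording
print(estimated_flac_bytes(hypothetical_opus))  # 480000000
```

Because the ratio is only an average, treat the result as a ballpark figure: actual sizes depend on the content of the recording.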

Ennuicastr has two phases of audio compression. When you are recording, the audio data being recorded is compressed and sent to the server for processing; and when you are downloading the audio, it is compressed again for download. These are necessarily two separate steps (the audio must be uncompressed and processed between them), which introduces a new wrinkle: generation loss. If you compress audio data using lossy compression, then uncompress it and compress it again using lossy compression, you will lose further quality, even if the same format is used both times. As there are typically many phases to audio editing, it is wise to keep the number of lossy phases as small as possible. For that reason, audio is offered for download in FLAC even if it was recorded in Opus, and that option is always the recommended one.
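Generation loss can be sketched with a toy model in Python. The "codec" here just snaps each sample to a coarse grid, and the "processing" is a plain volume change. Neither is how Opus or Ennuicastr actually works, and every name and parameter below is an assumption for illustration. The point is only that each decode, process, re-encode cycle adds a fresh rounding error on top of the earlier ones.

```python
import math

def lossy_encode(samples, step):
    """Toy lossy codec: snap every sample to the nearest multiple of step."""
    return [round(s / step) * step for s in samples]

def process(samples, gain):
    """Toy processing step between encodes: a simple volume change."""
    return [s * gain for s in samples]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def generation_errors(original, generations, step=1 / 64, gain=0.9):
    """Error of a lossy pipeline versus the same pipeline kept lossless."""
    reference = list(original)  # processed but never lossily encoded
    lossy = list(original)      # lossily re-encoded after every processing step
    errors = []
    for _ in range(generations):
        reference = process(reference, gain)
        lossy = lossy_encode(process(lossy, gain), step)
        errors.append(rms_error(reference, lossy))
    return errors

# Synthetic test signal: two mixed tones at an assumed 48 kHz sample rate.
signal = [0.6 * math.sin(2 * math.pi * 440 * n / 48000)
          + 0.4 * math.sin(2 * math.pi * 997 * n / 48000)
          for n in range(4800)]

for generation, error in enumerate(generation_errors(signal, 8), start=1):
    print(generation, error)
```

In this toy model the error after several lossy generations is larger than after one, which is why keeping the intermediate and downloaded audio lossless avoids piling up damage.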

Aside from compression, another consideration is recording continuity. If you simply sit at a microphone and press “record”, you will get a continuous recording with no gaps of true silence: even when you're not talking, there is some background noise. All of this background noise takes space and adds little to the recording. As such, in voice communication it's common to use a “voice activity detector” (VAD), which sends audio only when you're actually talking. However, no VAD is perfect, and some speech can always be lost. Ennuicastr typically records with a VAD, but for an increased price, you can opt to disable the VAD and record continuously.
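As a rough idea of what a VAD does, here is a deliberately naive sketch in Python: it splits the audio into fixed-size frames and keeps only frames whose loudness (RMS) exceeds a threshold. This is not Ennuicastr's VAD, which is not described in this document. The frame size, threshold, and function names are assumptions, and real detectors are considerably smarter.

```python
import math

def frame_rms(frame):
    """Loudness of one frame as root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_voice(samples, frame_size=960, threshold=0.02):
    """Return one boolean per full frame: True where the frame is 'loud'.

    960 samples is 20 ms at an assumed 48 kHz sample rate.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags

# Demo: faint background noise, then a loud tone, then faint noise again.
quiet = [0.001 * math.sin(n) for n in range(960)]
loud = [0.5 * math.sin(2 * math.pi * 220 * n / 48000) for n in range(960)]

flags = detect_voice(quiet + loud + quiet)
print(flags)  # [False, True, False]
```

A simple threshold like this also shows why no VAD is perfect: a soft word can fall below the threshold and be dropped, while a loud noise that isn't speech can be kept.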

Glossary

FLAC: Free Lossless Audio Codec. A lossless way of compressing audio: that is, audio will take less space, but lose no quality, relative to raw audio data.

Opus: A lossy audio codec commonly used for compressing voice. Ennuicastr uses high-quality Opus in its default configuration instead of lossless compression, to save space.

MPEG-4 AAC: A lossy audio codec commonly used for compressing music. Ennuicastr offers audio for download in AAC, but FLAC is usually preferable.

Voice activity detector (VAD): A technique for sending audio only when the user is speaking. Ennuicastr uses a VAD to save space. Disabling the VAD (using continuous mode) costs extra.

Continuous: Audio with no gaps. When the voice activity detector isn't used, audio captured by Ennuicastr is continuous (barring connection problems).

