Audio format choice is one of those technical decisions that punishes laziness years later. The podcaster who recorded masters as 128 kbps MP3 cannot remaster cleanly when the show takes off. The film studio that delivered a stereo AAC mix instead of multichannel WAV cannot remix for a new theatrical cut. The voice-over artist who delivered to a client in OGG Vorbis discovers the post-production pipeline does not accept it. Format choices ripple downstream, and the cheapest fix is to make them correctly the first time.
The audio format ecosystem is older and more stable than the video ecosystem. The dominant codecs (PCM, MP3, AAC, FLAC, Opus, Vorbis, ALAC) have been settled for years. Operating system support is mature. The remaining decisions are project-specific: lossless versus lossy, the target bitrate, the channel layout, the sample rate, the container, and the metadata schema. This article walks through the technical characteristics that distinguish the formats and the project profiles that map cleanly to each one.
The Two-Axis Map: Lossy vs Lossless and Container vs Codec
Most confusion in the audio format space comes from collapsing two independent axes. The first axis is whether the encoding is lossy or lossless. The second axis is whether the term refers to a codec (the algorithm) or a container (the file wrapper). The same codec can live inside different containers; the same container can hold different codecs.
PCM is uncompressed and lossless. It is the raw representation of digital audio: a sequence of sample values at a defined bit depth and sample rate. WAV and AIFF are containers that typically hold PCM. FLAC and ALAC are lossless compressed codecs; they preserve the original samples exactly while reducing file size by roughly half on typical music. MP3, AAC, Opus, and Vorbis are lossy compressed codecs; they discard the information that psychoacoustic models judge least audible, achieving dramatically smaller files at a modest quality cost.
# WAV file header (RIFF format)
[RIFF][file size][WAVE]
[fmt ][16][1][channels][sample rate][byte rate][block align][bits]
[data][data size][PCM samples...]
In its canonical form the header is 44 bytes and reveals the file's sample rate, bit depth, and channel count. The samples follow as raw PCM data. This simplicity is why WAV remains the format of choice for editing: every operation works on the samples directly, with no decode step.
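The header layout above can be read with a few lines of code. The following is a minimal sketch, assuming the canonical 44-byte layout with the RIFF, fmt, and data chunks in that order; real files may carry extra chunks before the data, which this sketch does not handle. The function name `parse_wav_header` is illustrative, not part of any library.

```python
import struct

def parse_wav_header(header: bytes) -> dict:
    """Parse a canonical 44-byte PCM WAV header (RIFF, fmt, data in that order)."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    # Bytes 0-11: "RIFF", overall size minus 8, "WAVE"
    riff, riff_size, wave = struct.unpack_from("<4sI4s", header, 0)
    # Bytes 12-35: the fmt chunk (little-endian fields)
    (fmt_id, fmt_size, audio_format, channels, sample_rate,
     byte_rate, block_align, bits) = struct.unpack_from("<4sIHHIIHH", header, 12)
    # Bytes 36-43: the data chunk id and payload size
    data_id, data_size = struct.unpack_from("<4sI", header, 36)
    if (riff, wave, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical PCM WAV header")
    return {
        "audio_format": audio_format,  # 1 means integer PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits,
        "data_size": data_size,
    }

# Build a header for an empty 44.1 kHz, 16-bit stereo file and read it back.
example = struct.pack(
    "<4sI4s4sIHHIIHH4sI",
    b"RIFF", 36, b"WAVE",
    b"fmt ", 16, 1, 2, 44100, 176400, 4, 16,
    b"data", 0,
)
info = parse_wav_header(example)
print(info["sample_rate"], info["bits_per_sample"], info["channels"])  # 44100 16 2
```

For production use, Python's standard `wave` module already handles the chunk walking; the sketch is only meant to show how little stands between the file bytes and the samples.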
"The waveform is the truth. Everything else is a lie of compression we have agreed to tolerate." Bob Katz, Mastering Audio: The Art and the Science
The Codecs Worth Knowing in 2026
A short list covers the vast majority of real-world audio work.
PCM (Pulse Code Modulation) is uncompressed digital audio. The waveform is sampled at a fixed rate (44.1 kHz for CD, 48 kHz for video, 96 kHz or 192 kHz for high-resolution masters) and quantized to a fixed bit depth (16 bits for CD, 24 bits for professional masters, 32-bit float for headroom in editing). PCM is the working format for editing and mastering. Containers: WAV, AIFF, BWF.
FLAC (Free Lossless Audio Codec) is lossless compressed. It produces files about half the size of WAV without altering a single sample. The format allows up to 32 bits per sample and sample rates well beyond 192 kHz, so 24-bit / 192 kHz masters fit comfortably. It also supports embedded metadata, cover art, ReplayGain, and seeking. It is the most widely recommended format for archiving music and for lossless distribution.
ALAC (Apple Lossless Audio Codec) is Apple's lossless codec. It produces slightly larger files than FLAC and is the default lossless format on Apple platforms. Apple open-sourced ALAC in 2011, so it is widely supported, but FLAC has broader tool support outside the Apple ecosystem.
MP3 (MPEG-1 Audio Layer III) is the legacy lossy codec. At 192 kbps and above it sounds nearly indistinguishable from CD quality on typical playback equipment. It is universally supported. Its patents expired in 2017, so it is now royalty-free. For new projects the technical case for MP3 is weaker than the case for Opus or AAC, but compatibility remains its strength.
AAC (Advanced Audio Coding) is the lossy successor to MP3, used by Apple, YouTube, and most streaming services. At equivalent bitrates AAC generally sounds better than MP3, with the gap widest at low bitrates and on multichannel content. AAC is usually stored in an MPEG-4 container with the .m4a extension. The codec is patent-encumbered: licensing fees apply to makers of encoders and decoders, not to distributing or streaming AAC-encoded content.
Opus is the lossy codec that wins almost every modern technical comparison. It outperforms AAC and MP3 across the bitrate range and handles voice, music, and full-bandwidth content equally well. WebRTC and most modern voice applications use Opus internally. Container: typically OGG or WebM.
Vorbis is the predecessor to Opus and the original open-source competitor to MP3. New projects should prefer Opus, but Vorbis remains in use for legacy content and game audio engines that built on it.
Format Selection by Project Type
| Project Type | Master Format | Distribution Format | Sample Rate / Bit Depth |
|---|---|---|---|
| Music album, professional release | WAV (BWF) 24-bit | FLAC for downloads, MP3 320 / AAC for streaming | 96/24 master, 44.1/16 deliverable |
| Podcast spoken word | WAV 24-bit | MP3 96 kbps mono / Opus 48 kbps for modern apps | 48/24 master, 44.1 deliverable |
| Audiobook | WAV 24-bit | MP3 64-128 kbps mono, M4B for Apple | 44.1/16 |
| Voiceover for video | WAV 24-bit | Embedded in video container (AAC) | 48/24 |
| Film theatrical mix | WAV multichannel 24-bit | DTS, Dolby AC-3, Dolby Atmos in deliverables | 48/24 |
| Game audio | WAV 24-bit | OGG Vorbis or platform-specific | 44.1 or 48 / 16-24 |
| Live streaming | Opus | Opus | 48/16 |
| Voice messaging | Opus | Opus | 16/16 |
| Field recording archive | FLAC 24-bit | FLAC | 96/24 or 192/24 |
| Home music listening | FLAC | FLAC | 44.1/16 minimum, 96/24 preferred |
Bitrate, Sample Rate, and Bit Depth: What They Actually Mean
These three numbers are often conflated in casual discussion, but they describe different things.
The sample rate determines the maximum frequency the file can represent. The Nyquist theorem says the highest representable frequency is half the sample rate. CD audio at 44.1 kHz can represent up to 22.05 kHz, which covers the entire human hearing range with margin. 48 kHz is the standard for video; 96 kHz and 192 kHz are professional mastering rates that provide headroom for downsampling and pitch shifting.
The bit depth determines the dynamic range. Each bit adds about 6 dB, so 16 bits gives 96 dB of dynamic range, sufficient for finished consumer audio. 24 bits gives 144 dB, used for mastering because it allows headroom for processing without quantization noise becoming audible. 32-bit float is the working format inside digital audio workstations; it is practically impossible to clip in 32-bit float because the dynamic range exceeds 1500 dB.
The bitrate (for lossy formats) determines the average data rate of the encoded file. Higher bitrate generally means better quality, but the curve flattens at higher bitrates. Beyond 192 kbps for AAC and 256 kbps for MP3, additional quality is hard to detect even on critical listening systems.
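The relationship between these numbers and file size is simple arithmetic. The sketch below assumes constant-bitrate encoding and ignores container overhead and metadata; the function names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Data rate of uncompressed PCM in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

def file_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate audio payload size in megabytes (10^6 bytes)."""
    return bitrate_kbps * 1000 / 8 * duration_s / 1_000_000

# CD-quality stereo PCM
print(f"{pcm_bitrate_kbps(44100, 16, 2):.1f} kbps")  # 1411.2 kbps

# One hour of audio at three different rates
hour = 3600
print(f"{file_size_mb(pcm_bitrate_kbps(44100, 16, 2), hour):.1f} MB")  # 635.0 MB
print(f"{file_size_mb(96, hour):.1f} MB")   # 43.2 MB
print(f"{file_size_mb(48, hour):.1f} MB")   # 21.6 MB
```

The comparison makes the trade-off concrete: an hour of CD-quality PCM is roughly fifteen times the size of the same hour encoded at 96 kbps.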
| Bit Depth | Dynamic Range | Use Case |
|---|---|---|
| 8-bit | 48 dB | Telephone-grade voice, low-quality samplers |
| 16-bit | 96 dB | CD audio, finished consumer files |
| 24-bit | 144 dB | Professional mastering, archival |
| 32-bit float | 1528 dB | DAW internal processing, production headroom |
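The integer rows of the table follow from one formula: the dynamic range of N-bit PCM is 20·log10(2^N), or about 6.02 dB per bit. A small sketch, which also includes the Nyquist limit described earlier; it covers integer PCM only, since the 32-bit float figure comes from the floating-point exponent range rather than this formula.

```python
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of integer PCM at the given bit depth."""
    return 20 * math.log10(2 ** bit_depth)

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
# 8 48.2
# 16 96.3
# 24 144.5

print(nyquist_hz(44100))  # 22050.0
```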
"Twenty-four bits is not for the listener. Twenty-four bits is for the engineer who has not finished the work." Susan Rogers, Sound Engineer for Prince
Containers and Why They Matter
A container is the wrapper that holds the audio codec data plus metadata, chapter markers, embedded artwork, and synchronization information. The same Opus stream can live inside an OGG container, a WebM container, or an MP4 container with different metadata capabilities in each.
WAV (Wave) is the simplest container, designed by Microsoft and IBM in 1991. It typically holds PCM. The header is 44 bytes in the canonical PCM layout, though files that carry extra chunks place the audio data further in. WAV files over 4 GB require the RF64 extension because the original spec used 32-bit size fields.
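The 4 GB ceiling arrives sooner than many people expect on long or multichannel recordings. The sketch below estimates how much audio fits before RF64 is needed; it assumes the limit is the largest value of a 32-bit size field and ignores the small header overhead, so treat the results as approximate.

```python
MAX_CLASSIC_WAV_BYTES = 2**32 - 1  # largest value a 32-bit size field can hold

def max_wav_hours(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Approximate recording length that fits in a classic (non-RF64) WAV file."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return MAX_CLASSIC_WAV_BYTES / bytes_per_second / 3600

print(f"{max_wav_hours(48000, 24, 2):.2f} h")  # 4.14 h  (stereo, 48 kHz / 24-bit)
print(f"{max_wav_hours(96000, 24, 6):.2f} h")  # 0.69 h  (six channels, 96 kHz / 24-bit)
```

A six-channel 96 kHz session crosses the limit in about 41 minutes, which is why multichannel film and field-recording workflows tend to rely on RF64-aware tools.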
AIFF is the Apple equivalent of WAV. Functionally similar; less common outside the Apple ecosystem.
BWF (Broadcast Wave Format) is WAV with extended metadata for professional production: timecode, originator, description, and reference. The European Broadcasting Union maintains the standard. Most modern video and audio post-production tools read BWF.
FLAC defines its own native container alongside the codec, so a .flac file needs no separate wrapper. FLAC streams can also be carried inside OGG.
OGG is an open-source container that holds Vorbis, Opus, FLAC, or Speex. Common on Linux and in older audio distribution.
MP4 / M4A / M4B are MPEG-4 containers with different conventional file extensions. M4A is for music with AAC, M4B is for audiobooks with chapter markers, MP4 is the catch-all.
WebM is Google's container for web delivery, typically holding Opus audio and VP9 or AV1 video.
Practical Conversion Recipes
Most audio conversion in 2026 runs through ffmpeg, which handles every mainstream codec and container. A short set of recipes covers nearly all conversion needs.
# WAV master to MP3 320 kbps
ffmpeg -i master.wav -codec:a libmp3lame -b:a 320k output.mp3
# WAV master to AAC 256 kbps in M4A
ffmpeg -i master.wav -codec:a aac -b:a 256k output.m4a
# WAV master to FLAC (lossless; level 8 is the reference encoder's maximum, ffmpeg's encoder accepts 0-12)
ffmpeg -i master.wav -codec:a flac -compression_level 8 output.flac
# WAV master to Opus at 96 kbps (excellent for podcasts)
ffmpeg -i master.wav -codec:a libopus -b:a 96k output.opus
# Downsample 96/24 master to 44.1/16 with proper dither
ffmpeg -i master-96-24.wav -ar 44100 -sample_fmt s16 \
-af "aresample=44100:dither_method=triangular_hp" \
output-44-16.wav
# Convert stereo to mono (both channels are mixed down into one)
ffmpeg -i stereo.wav -ac 1 mono.wav
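# Embed metadata and cover art in MP3 (the image is copied as-is, not re-encoded)
ffmpeg -i input.wav -i cover.jpg -map 0 -map 1 \
  -codec:a libmp3lame -b:a 320k -codec:v copy \
  -metadata title="Track Title" \
  -metadata artist="Artist Name" \
  -metadata album="Album Name" \
  -metadata:s:v title="Album cover" \
  -metadata:s:v comment="Cover (front)" \
  -id3v2_version 3 \
  output.mp3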
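The dither step matters whenever bit depth is reduced. Truncating 24 bits to 16 without dither produces quantization distortion that is correlated with the signal and most audible on quiet passages and fades. Triangular high-pass dither replaces that distortion with a low, steady noise floor and pushes the noise energy toward higher frequencies, away from the most sensitive part of the human hearing range.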
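Why dither helps can be shown numerically. The sketch below uses plain triangular (TPDF) dither, not the high-pass variant ffmpeg applies with `triangular_hp`, and a deliberately extreme input: a constant signal of 0.3 of one 16-bit step. Without dither every sample rounds to zero and the signal vanishes; with dither the average of the quantized samples still tracks the input. The function name and the fixed seed are illustrative choices.

```python
import random

def quantize_16bit(samples, dither=True, seed=0):
    """Quantize floats in [-1.0, 1.0] to 16-bit integers, optionally with TPDF dither."""
    rng = random.Random(seed)  # fixed seed so the demonstration is repeatable
    out = []
    for x in samples:
        scaled = x * 32767.0
        if dither:
            # Sum of two uniform values gives a triangular distribution spanning +/-1 LSB
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = int(round(scaled))
        out.append(max(-32768, min(32767, q)))  # clamp to the 16-bit range
    return out

# A signal smaller than one quantization step
quiet = [0.3 / 32767.0] * 20000

undithered = quantize_16bit(quiet, dither=False)
dithered = quantize_16bit(quiet, dither=True)

print(set(undithered))                 # {0}: the signal is gone
print(sum(dithered) / len(dithered))   # close to 0.3: the signal survives as an average
```

The cost is a small amount of added noise, which is the trade the paragraph above describes: a benign noise floor in exchange for removing signal-dependent distortion.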
Mastering Headroom and Loudness Standards
Modern audio delivery is governed by loudness standards, not peak levels. Streaming platforms normalize to specific integrated loudness targets and turn down playback of files that exceed the target, so mastering louder than the target gains nothing.
| Platform | Target Loudness (LUFS) | True Peak Limit |
|---|---|---|
| Spotify | -14 LUFS | -1.0 dBTP |
| Apple Music | -16 LUFS | -1.0 dBTP |
| YouTube | -14 LUFS | -1.0 dBTP |
| Tidal | -14 LUFS | -1.0 dBTP |
| Amazon Music | -14 LUFS | -2.0 dBTP |
| Broadcast TV (EBU R 128) | -23 LUFS | -1.0 dBTP |
| Podcast (industry convention) | -16 LUFS stereo / -19 LUFS mono | -1.0 dBTP |
| Audiobook (Audible / ACX) | -23 to -18 dB RMS (specified as RMS, not LUFS) | -3.0 dB peak |
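What normalization does to a given master is a subtraction in decibels. The sketch below assumes a simple model in which the platform applies a single static gain to reach its target; actual platform behavior varies (some only attenuate, some also boost quiet tracks), and the measured values here are hypothetical inputs that would normally come from a loudness meter.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB needed to move a track from its measured loudness to the target."""
    return target_lufs - measured_lufs

def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

def peak_after_gain(true_peak_dbtp: float, gain_db: float) -> float:
    """True peak level after a static gain is applied."""
    return true_peak_dbtp + gain_db

# Hypothetical loud master: -9 LUFS integrated, -0.5 dBTP, played on a -14 LUFS platform
gain = normalization_gain_db(-9.0, -14.0)
print(gain)                           # -5.0
print(f"{db_to_linear(gain):.3f}")    # 0.562
print(peak_after_gain(-0.5, gain))    # -5.5
```

In this example the loud master is simply turned down by 5 dB, so the dynamics sacrificed to reach -9 LUFS bought no extra playback volume.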
Audio Format Decisions for Specific Workflows
Podcast publishers face a particular decision because the listener base divides between modern apps that support Opus and legacy apps that need MP3. The pragmatic compromise in 2026 is to publish a primary MP3 feed at 96 kbps mono for spoken word and offer an optional Opus version at 48 kbps for listeners on modern apps. The Opus version is half the size at comparable or better quality.
Music publishers releasing through major streaming platforms have less flexibility. The platforms accept WAV or FLAC at submission and re-encode to their own delivery codecs internally. The producer's job is to deliver a master at 24-bit 44.1 kHz or 96 kHz, properly mastered for the platform's loudness target, and let the platform handle final encoding.
Game audio engines have specific format requirements. Unity prefers OGG Vorbis or PCM WAV. Unreal Engine prefers OGG Vorbis or FLAC. Console-specific engines use proprietary formats internally but accept WAV at import. The right working format is high-quality PCM with the engine's import settings handling final compression.
The cognitive perception research at What's Your IQ shows that listener perception of audio quality involves both the physical signal and the listener's expectation of quality, which has implications for how aggressively audio can be compressed without complaint. The note-keeping pipelines at When Notes Fly describe the same trade-off in voice-memo capture: too aggressive compression destroys downstream transcription accuracy.
Metadata and Tagging
Audio files without complete metadata are a maintenance burden. Modern players, library systems, and streaming platforms rely on embedded metadata for organization, search, and royalty calculation.
The standard tag schema for music files (the Vorbis comment style used by FLAC, OGG, and most modern players):
TITLE=Track Name
ARTIST=Performer
ALBUM=Album Title
DATE=2026
TRACKNUMBER=3
TOTALTRACKS=12
DISCNUMBER=1
TOTALDISCS=1
GENRE=Electronic
COMPOSER=Composer Name
ISRC=US1234567890
REPLAYGAIN_TRACK_GAIN=-3.5 dB
REPLAYGAIN_TRACK_PEAK=0.987654
For podcasts, additional fields matter: TITLE for episode name, ALBUM for show name, TRACKNUMBER for episode number, and COMMENT for show notes. The MP3 ID3v2 schema covers the same data with different field names.
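The correspondence between the two schemas can be sketched as a lookup table. The mapping below uses ID3v2.4 frame identifiers and is an illustrative subset, not a complete translation; note that ID3v2.3 stores the year in TYER rather than TDRC, and that Vorbis comments allow repeated fields, which this simplified parser collapses to the last value.

```python
# Vorbis comment field -> ID3v2.4 frame ID (illustrative subset)
VORBIS_TO_ID3 = {
    "TITLE": "TIT2",
    "ARTIST": "TPE1",
    "ALBUM": "TALB",
    "DATE": "TDRC",         # ID3v2.3 uses TYER for the year instead
    "TRACKNUMBER": "TRCK",
    "DISCNUMBER": "TPOS",
    "GENRE": "TCON",
    "COMPOSER": "TCOM",
    "ISRC": "TSRC",
}

def parse_vorbis_comments(lines):
    """Turn KEY=value lines into a dict; field names are case-insensitive."""
    tags = {}
    for line in lines:
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        tags[key.strip().upper()] = value.strip()
    return tags

def to_id3_frames(tags):
    """Map parsed Vorbis comments onto ID3 frame IDs."""
    frames = {frame: tags[key] for key, frame in VORBIS_TO_ID3.items() if key in tags}
    # ID3 stores "number/total" in one frame rather than two separate fields
    for frame, total_key in (("TRCK", "TOTALTRACKS"), ("TPOS", "TOTALDISCS")):
        if frame in frames and total_key in tags:
            frames[frame] = f"{frames[frame]}/{tags[total_key]}"
    return frames

tags = parse_vorbis_comments([
    "TITLE=Track Name",
    "ARTIST=Performer",
    "TRACKNUMBER=3",
    "TOTALTRACKS=12",
])
print(to_id3_frames(tags))
# {'TIT2': 'Track Name', 'TPE1': 'Performer', 'TRCK': '3/12'}
```

In practice a tagging library or ffmpeg's `-metadata` option performs this translation automatically; the table is useful mainly for checking that a field survived a format conversion.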
"Metadata is what turns a folder of files into a library. Without it you have a collection that only the original owner can navigate." Allyson Carter, Audio Asset Management
Decision Checklist
A short checklist for audio format decisions on a new project.
What is the source quality? If you are recording the master, capture in PCM at the highest sample rate and bit depth your tools support. 48 kHz / 24-bit is the practical floor.
Where will it be delivered? List every destination: streaming platforms, web download, embedded in video, physical media. Each has format requirements.
What is the target loudness? Master to the loudness target of the primary destination. For multi-destination releases, master to the most-played destination's target and accept the others' normalization.
Is metadata embedded? Embed every field the destination supports. Library tools and rights management depend on it.
Have you archived the master? Keep an uncompressed or lossless archive of the master separate from the lossy delivery files. Future remasters cannot recover information that was discarded by lossy compression.
For related guidance, see "Understanding Audio Formats: Which One Is Right for You" and "How to Convert Audio Files: Complete Format Guide".
References
- International Telecommunication Union. ITU-R BS.1770-5 Algorithms to measure audio programme loudness and true-peak audio level. https://www.itu.int/rec/R-REC-BS.1770
- European Broadcasting Union. EBU R 128 Loudness normalisation and permitted maximum level of audio signals. https://tech.ebu.ch/publications/r128
- Xiph.Org Foundation. FLAC Format Specification. https://xiph.org/flac/format.html
- Internet Engineering Task Force. Definition of the Opus Audio Codec. RFC 6716. https://www.rfc-editor.org/rfc/rfc6716
- ISO/IEC 14496-3:2019 Information technology, Coding of audio-visual objects, Part 3: Audio. https://www.iso.org/standard/76383.html
- ISO/IEC 11172-3:1993 Information technology, Coding of moving pictures and associated audio for digital storage media (MPEG-1 Audio). https://www.iso.org/standard/22411.html
- Microsoft. WAVEFORMATEXTENSIBLE structure. https://learn.microsoft.com/en-us/windows-hardware/drivers/audio/extensible-wave-format-descriptors
- AES Technical Council. AES17-2020 standard method for digital audio engineering measurement of digital audio equipment. https://www.aes.org/publications/standards/
Frequently Asked Questions
What audio format should I use for archiving music?
FLAC (Free Lossless Audio Codec) is the best choice for archiving music. It compresses audio without any quality loss, typically reducing file size by 40-60% compared to WAV while remaining bit-for-bit identical to the source. ALAC (Apple Lossless) is an equivalent alternative in Apple ecosystems. Always keep at least one lossless master copy.
What is the best audio format for video game sound effects?
OGG Vorbis is the most common choice for game audio due to its good compression at low bitrates, royalty-free license, and broad engine support (Unity, Unreal, Godot). For short sound effects, uncompressed WAV is also widely used to avoid decoding latency. MP3 is less preferred in games due to its licensing history and higher latency.
How do I choose between AAC and MP3 for mobile audio?
Choose AAC. It generally outperforms MP3 at the same bitrate: AAC at 128 kbps is often judged comparable to MP3 at around 192 kbps. AAC is the default format for iOS, Apple Music, YouTube, and most modern streaming services. Unless you need MP3 for compatibility with very old hardware, AAC is the better choice for new mobile audio production.