Podcast producers face a narrower format problem than music producers and a wider one than radio engineers. The narrow part is that the content is overwhelmingly speech, which compresses cleanly and tolerates aggressive bitrate reduction. The wider part is that the listener base spans every device and software combination ever shipped, from a six-year-old Android phone running a podcast app last updated in 2019 to a high-resolution audio player that demands FLAC. The format decision affects file size on the host, bandwidth on delivery, listener compatibility, post-production flexibility, and the show's ability to remaster years later.

This article compares the formats podcast producers actually use, names the trade-offs that matter for spoken word content, and offers concrete recommendations for the master format, the public delivery format, and the optional alternative formats that some shows now publish for listeners on modern apps. The recommendations are calibrated for spoken-word podcasts: interviews, narrative shows, news, and long-form discussion. Music podcasts, ambient sound art, and shows that mix substantial music elements need somewhat different choices.

The Master Format Question

Every podcast workflow starts with capture and editing in a master format. The master must be lossless because editing is iterative: every fade, level adjustment, and re-export rewrites the audio, and if each pass goes through a lossy encoder the losses compound into audible artifacts within a few generations. The master format is not what listeners hear; it is what the show edits and archives in.

WAV at 48 kHz / 24-bit is the standard master format for podcast production. The 48 kHz sample rate aligns with video pipelines, which matters when the podcast also produces video versions. The 24-bit depth gives 144 dB of dynamic range, which absorbs editing noise without quantization artifacts.

# Multitrack capture in a DAW typically writes per-channel WAV files
# Each track at 48 kHz / 24-bit
host-mic.wav    [48000 Hz, 24-bit, mono, 90 minutes = ~778 MB]
guest-mic.wav   [48000 Hz, 24-bit, mono, 90 minutes = ~778 MB]
remote-feed.wav [48000 Hz, 24-bit, mono, 90 minutes = ~778 MB]

# Stem mix and full mix archived alongside
final-mix.wav   [48000 Hz, 24-bit, stereo, 90 minutes = ~1.56 GB]

For long-running shows, the storage cost of WAV masters compounds. A weekly podcast with 90-minute episodes accumulates roughly 40 GB of master audio per year per microphone (52 episodes at about 778 MB per track). Archiving WAV masters for the entire run is the right call; lossless storage on a NAS or in cloud cold storage is the practical solution.
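
The storage arithmetic can be checked with a short shell sketch. Uncompressed PCM size is sample rate times bytes per sample times channels times duration; the helper names `pcm_megabytes` and `yearly_gigabytes` are hypothetical, sizes are decimal megabytes, and the small WAV header is ignored.

```shell
#!/bin/sh
# Uncompressed PCM size: sample_rate * (bit_depth / 8) * channels * seconds.
# pcm_megabytes is a hypothetical helper, not part of any audio tool.
pcm_megabytes() {
  # usage: pcm_megabytes SAMPLE_RATE BIT_DEPTH CHANNELS MINUTES
  awk -v sr="$1" -v bits="$2" -v ch="$3" -v min="$4" \
    'BEGIN { printf "%.1f\n", sr * (bits / 8) * ch * min * 60 / 1000000 }'
}

# Yearly total for one track, in decimal gigabytes (52 episodes by default).
yearly_gigabytes() {
  # usage: yearly_gigabytes MEGABYTES_PER_EPISODE [EPISODES_PER_YEAR]
  awk -v mb="$1" -v n="${2:-52}" 'BEGIN { printf "%.1f\n", mb * n / 1000 }'
}

pcm_megabytes 48000 24 1 90                         # one mono track
pcm_megabytes 48000 24 2 90                         # stereo final mix
yearly_gigabytes "$(pcm_megabytes 48000 24 1 90)"   # one mic, weekly show
```

Multiplying the yearly figure by the number of microphones gives a rough budget for the multitrack archive.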

"The recording is the only material you cannot recreate. Spend money on the storage, not on the gear that captured it." Ira Glass, This American Life

The Public Delivery Format

The public delivery format is the file that goes into the RSS feed and is downloaded by listeners. The two real choices in 2026 are MP3 at 64-128 kbps and Opus at 32-64 kbps. AAC remains relevant but solves no problem better than the alternatives.

MP3 is the universal default. Every podcast app on every platform supports it, including Apple Podcasts, Spotify, and YouTube Music. The format has been patent-free since 2017. For spoken word, MP3 at 64 kbps mono sounds clean; at 96 kbps mono it sounds excellent; at 128 kbps stereo it preserves the full host-and-guest stereo separation.

Opus is the technical successor. At the low bitrates podcasts use, Opus sounds noticeably better than MP3 on speech: Opus at 32 kbps is comparable to MP3 at 64 kbps. Modern podcast apps (Pocket Casts, Overcast, AntennaPod, the major streaming apps) handle Opus correctly, but some legacy apps do not, and Apple's own Podcasts app gained Opus support only recently.

| Format | Bitrate | Approx. file size (1 hour) | Compatibility | Quality on speech |
| --- | --- | --- | --- | --- |
| MP3 64 kbps mono | 64 kbps | 28 MB | Universal | Good |
| MP3 96 kbps mono | 96 kbps | 42 MB | Universal | Excellent |
| MP3 128 kbps stereo | 128 kbps | 56 MB | Universal | Excellent |
| MP3 192 kbps stereo | 192 kbps | 84 MB | Universal | Diminishing returns |
| AAC 64 kbps mono | 64 kbps | 28 MB | Wide | Excellent |
| AAC 96 kbps stereo | 96 kbps | 42 MB | Wide | Excellent |
| Opus 32 kbps mono | 32 kbps | 14 MB | Modern apps | Excellent |
| Opus 48 kbps mono | 48 kbps | 21 MB | Modern apps | Outstanding |
| Opus 64 kbps stereo | 64 kbps | 28 MB | Modern apps | Outstanding |

The bandwidth implications matter for shows with large audiences. A show with 100,000 weekly listeners pulling 60 MB MP3 files moves 6 TB per week through hosting. The same show delivering 25 MB Opus files moves 2.5 TB. Hosting bills scale linearly.
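
As a rough check on these numbers, the sketch below derives file size from bitrate and duration, then weekly transfer from file size and download count. It assumes constant bitrate and ignores container and tag overhead, so the results are approximations in decimal units; `encoded_megabytes` and `weekly_terabytes` are hypothetical helper names.

```shell
#!/bin/sh
# Encoded size at a constant bitrate: kbps * 1000 / 8 bytes per second.
encoded_megabytes() {
  # usage: encoded_megabytes KBPS MINUTES
  awk -v kbps="$1" -v min="$2" \
    'BEGIN { printf "%.1f\n", kbps * 1000 / 8 * min * 60 / 1000000 }'
}

# Weekly transfer in decimal terabytes for a given file size and audience.
weekly_terabytes() {
  # usage: weekly_terabytes MEGABYTES_PER_FILE DOWNLOADS_PER_WEEK
  awk -v mb="$1" -v n="$2" 'BEGIN { printf "%.2f\n", mb * n / 1000000 }'
}

encoded_megabytes 96 60        # one hour of 96 kbps audio
encoded_megabytes 48 60        # one hour of 48 kbps audio
weekly_terabytes 60 100000     # 60 MB files, 100,000 downloads
weekly_terabytes 25 100000     # 25 MB files, 100,000 downloads
```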

Mono vs Stereo for Spoken Word

Mono is the right channel layout for most podcasts. Spoken word is a single sound source per person, so there is no spatial information to preserve. A stereo file needs roughly twice the bitrate, and therefore twice the file size, to give each channel the quality a mono encode gets. At the same bitrate, some encoders spread an identical signal across both channels at lower per-channel quality than mono encoding would produce.

The exceptions are music-heavy podcasts, narrative shows with sound design, and panel shows that intentionally pan voices to separate left and right channels for clarity. For an interview show, two-mic recording can be summed to mono on delivery without losing intelligibility.

# ffmpeg recipe to deliver mono podcast
ffmpeg -i master.wav \
       -ac 1 \
       -codec:a libmp3lame -b:a 96k \
       -metadata title="Episode 47" \
       -metadata artist="Show Name" \
       -metadata album="Show Name" \
       -id3v2_version 3 \
       episode-47.mp3

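The -ac 1 flag downmixes to mono. This mixes the channels together with gain compensation rather than discarding one. File size depends on bitrate and duration, not on channel count, so a mono MP3 is the same size as a stereo MP3 at the same bitrate; the saving comes from mono needing roughly half the bitrate for comparable quality.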

"The microphone is mono. The voice is mono. The information is mono. The stereo file is mostly empty channels carrying noise." Andrew Mason, This American Life producer notes

Loudness Targets for Podcasting

The podcast industry has converged on -16 LUFS integrated for stereo shows and -19 LUFS integrated for mono, with true peaks below -1.0 dBTP. The 3 LU offset exists because a mono file is played through both speakers: a mono file at -19 LUFS plays back as loud as the same audio in a two-channel file at -16 LUFS. These targets sit louder than the broadcast TV standard of -23 LUFS because podcast listeners are typically on mobile devices in noisy environments where additional level helps intelligibility.

| Standard | Loudness target | Peak limit | Use case |
| --- | --- | --- | --- |
| Podcast stereo (common practice) | -16 LUFS | -1.0 dBTP | Stereo shows, music-mix podcasts |
| Podcast mono (common practice) | -19 LUFS | -1.0 dBTP | Mono spoken-word podcasts |
| Apple Podcasts recommendation | -16 LUFS | -1.0 dBTP | Apple distribution |
| Spotify | -14 LUFS | -1.0 dBTP | Spotify exclusive shows |
| Audible / ACX (audiobook) | -23 to -18 dB RMS (specified as RMS, not LUFS) | -3.0 dB peak | Audiobooks, narrative spoken word |
| EBU R 128 (broadcast) | -23 LUFS | -1.0 dBTP | Public broadcasters |

Mastering above the target gains nothing on platforms that normalize playback, because the file is simply turned down to the target and the extra limiting costs dynamics. Mastering below the target produces a file that sounds quiet wherever normalization is not applied. Either way, the loudness should be measured with an ITU-R BS.1770 compliant meter rather than estimated by eye.
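
A pre-publish check can be scripted once a measured value is available. The sketch below is a minimal example: `loudness_ok` is a hypothetical helper, and the default tolerance of 1 LU is an assumption rather than part of any standard.

```shell
#!/bin/sh
# loudness_ok succeeds when MEASURED is within TOLERANCE LU of TARGET.
# Values are in LUFS, for example -16.4. The 1.0 LU default is an assumption.
loudness_ok() {
  # usage: loudness_ok MEASURED TARGET [TOLERANCE]
  awk -v m="$1" -v t="$2" -v tol="${3:-1.0}" \
    'BEGIN { d = m - t; if (d < 0) d = -d; exit (d <= tol ? 0 : 1) }'
}

# One way to obtain MEASURED, assuming ffmpeg is installed:
#   ffmpeg -nostats -i episode.wav -af ebur128 -f null -
# The integrated value is reported on the "I:" line of the summary on stderr.

if loudness_ok -16.4 -16; then
  echo "within tolerance"
else
  echo "re-master before publishing"
fi
```

Passing the target as an argument keeps the same check usable for whichever target the show has adopted.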

Encoding Settings That Matter

The MP3 and Opus encoders both have a number of settings that affect quality. A few are worth knowing.

For MP3 with LAME (the open-source encoder): use VBR mode, which holds a constant quality level and lets the bitrate vary. The -V 4 setting averages roughly 165 kbps on stereo material and -V 6 roughly 115 kbps; mono speech typically lands well below those figures. Use ABR (--abr 96) instead when the file must hit a specific average bitrate. Use the --lowpass 18 flag to roll off above 18 kHz, which reduces encoding artifacts on speech.

For Opus with opusenc (the command-line encoder from opus-tools): use the --speech flag for spoken-word content, which tunes low-bitrate encoding for speech. Use the --bitrate 48 flag for a high-quality podcast mono delivery. The --vbr flag enables variable bitrate, which is the default and the right choice. In ffmpeg's libopus wrapper, the comparable setting is -application voip.

# High-quality MP3 podcast delivery (VBR, mono)
lame -V 6 --lowpass 18 -m m master.wav episode.mp3

# High-quality Opus podcast delivery (speech tuning, mono downmix)
opusenc --speech --downmix-mono --bitrate 48 --vbr master.wav episode.opus

# Equivalent ffmpeg commands
ffmpeg -i master.wav -ac 1 \
       -codec:a libmp3lame -q:a 6 episode.mp3
ffmpeg -i master.wav -ac 1 \
       -codec:a libopus -b:a 48k -application voip episode.opus

Metadata for Podcast Files

Podcast file metadata is more than aesthetic. Podcast apps take episode information from the RSS feed, but any player that opens the file outside a podcast app has only the embedded tags to display. The minimum useful metadata for an MP3 podcast file:

ID3v2.3 tags:
TIT2 (Title)        : Episode 47: The Map Maker
TPE1 (Artist)       : Show Name
TALB (Album)        : Show Name (use show name as album)
TRCK (Track)        : 47
TYER (Year)         : 2026
TCON (Genre)        : Podcast
COMM (Comment)      : Episode show notes (short)
APIC (Picture)      : Show artwork, 3000x3000 PNG

The cover art deserves attention. Apple Podcasts requires square artwork between 1400x1400 and 3000x3000 pixels. Embedded artwork at full size adds about 500 KB to each MP3 file. For shows with constant cover art across episodes, that is duplicated storage; some hosts handle artwork at the feed level instead.
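
Embedding the artwork can be done at the end of the encode step. The sketch below follows the cover-art recipe from the ffmpeg MP3 muxer documentation; the `embed_cover` name and the `FFMPEG` override variable are hypothetical conveniences, the latter so the command can be printed instead of run.

```shell
#!/bin/sh
# embed_cover attaches an image as the front-cover APIC frame without
# re-encoding the audio (-c copy). FFMPEG is an assumed override hook:
# set FFMPEG=echo to print the arguments instead of running ffmpeg.
embed_cover() {
  # usage: embed_cover INPUT_MP3 COVER_IMAGE OUTPUT_MP3
  ${FFMPEG:-ffmpeg} -i "$1" -i "$2" \
    -map 0:a -map 1:v -c copy \
    -id3v2_version 3 \
    -metadata:s:v title="Album cover" \
    -metadata:s:v comment="Cover (front)" \
    "$3"
}

# Example (not executed here):
#   embed_cover episode-47.mp3 cover.jpg episode-47-tagged.mp3
```

Because the audio stream is copied, the step adds only the size of the image to the file and does not introduce another lossy generation.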

The Optional Opus Stream

A growing number of podcast publishers offer two parallel feeds: a primary MP3 feed for universal compatibility and an optional Opus feed for listeners on modern apps. The Opus feed delivers comparable or better quality at half the bandwidth.

The pattern looks like this in the show's delivery infrastructure:

# Episode delivery: render both formats
ffmpeg -i master.wav -ac 1 -codec:a libmp3lame -b:a 96k episode.mp3
ffmpeg -i master.wav -ac 1 -codec:a libopus -b:a 48k -application voip episode.opus

# Two RSS feeds, same show
https://example.com/feed.rss          # MP3 enclosures
https://example.com/feed-opus.rss     # Opus enclosures

Most podcast hosts (Buzzsprout, Libsyn, Podbean, Anchor) do not yet automate Opus rendering. Self-hosted podcasts have full control. The note-keeping discipline at When Notes Fly on capturing voice memos in efficient formats applies directly: the episode capture and the public delivery are two different formats serving two different purposes, and conflating them is where bandwidth bills come from.
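
For a self-hosted show, the two feeds differ mainly in their enclosure elements. RSS 2.0 requires url, length (in bytes), and type on each enclosure. The sketch below generates that element; the helper names are hypothetical, and using `audio/ogg` for Opus assumes the .opus file is an Ogg container, which is what ffmpeg writes for that extension.

```shell
#!/bin/sh
# Map a file name to the MIME type used in the RSS enclosure.
# audio/ogg for .opus is an assumption (Opus in an Ogg container).
enclosure_type() {
  case "$1" in
    *.mp3)  echo "audio/mpeg" ;;
    *.opus) echo "audio/ogg" ;;
    *)      return 1 ;;
  esac
}

# Print an RSS 2.0 <enclosure> element for a URL and a byte length.
enclosure_tag() {
  # usage: enclosure_tag URL LENGTH_BYTES
  mime=$(enclosure_type "$1") || return 1
  printf '<enclosure url="%s" length="%s" type="%s"/>\n' "$1" "$2" "$mime"
}

enclosure_tag "https://example.com/episode-47.mp3"  21600000
enclosure_tag "https://example.com/episode-47.opus" 10800000
```

The byte lengths in the example are placeholders; in practice they come from the rendered files, for instance via `wc -c`.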

Hosting and Delivery Considerations

The format choice interacts with hosting in ways that matter. Most podcast hosts charge by storage, bandwidth, or both. The bandwidth fees scale with episode size and audience size.

A show with a million weekly listeners pulling a 30-minute episode at 96 kbps MP3 (21 MB per episode) moves 21 TB per week through hosting. At commodity CDN pricing of $0.05 per GB, that is $1,050 per week, or $54,600 per year, in delivery costs. Halving the file size halves that bill.

| Audience size | MP3 96 kbps bandwidth (30-min mono) | Opus 48 kbps bandwidth | Savings at $0.05/GB |
| --- | --- | --- | --- |
| 10,000 weekly | 210 GB | 105 GB | $5.25/week |
| 100,000 weekly | 2.1 TB | 1.05 TB | $52.50/week |
| 1,000,000 weekly | 21 TB | 10.5 TB | $525/week |
| 10,000,000 weekly | 210 TB | 105 TB | $5,250/week |

For shows with substantial audiences, the bandwidth math justifies engineering effort on format choice. The expert-written certification preparation guides at [Pass4Sure](https://pass4-sure.us) note the same principle for educational audio content delivery: storage and delivery costs can become the dominant operational expense at scale, and format optimization is the most leveraged intervention.
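
The cost figures above reduce to one multiplication, sketched here so a show can plug in its own numbers. `weekly_cost_usd` is a hypothetical helper; the $0.05 per GB rate is the article's illustrative figure, not a quote from any provider.

```shell
#!/bin/sh
# Weekly delivery cost: file size (MB) * downloads / 1000 gives GB,
# multiplied by the per-GB rate.
weekly_cost_usd() {
  # usage: weekly_cost_usd MEGABYTES_PER_FILE DOWNLOADS_PER_WEEK USD_PER_GB
  awk -v mb="$1" -v n="$2" -v rate="$3" \
    'BEGIN { printf "%.2f\n", mb * n / 1000 * rate }'
}

mp3=$(weekly_cost_usd 21 1000000 0.05)      # 96 kbps MP3, 30 minutes
opus=$(weekly_cost_usd 10.5 1000000 0.05)   # 48 kbps Opus, 30 minutes
echo "MP3:  \$$mp3 per week"
echo "Opus: \$$opus per week"
awk -v a="$mp3" -v b="$opus" 'BEGIN { printf "Saved: $%.2f per week\n", a - b }'
```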

Recovery and Archive Strategy

Master files should be archived with the same care as source material in any other production. The two-tier strategy that works for most podcast operations:

The active tier holds masters and project files for the most recent six to twelve episodes. Storage is on fast disk; backups are continuous to cloud or to a NAS.

The cold tier holds masters and project files for older episodes. Storage is on archival media: cloud cold storage (S3 Glacier, Azure Archive), spinning disk that comes online for retrieval, or LTO tape for high-volume operations. Files are stored as FLAC of the final mix at minimum; multi-track stems are stored separately for shows that may return to remix.

Retrieving a master from cold storage to remaster a single episode for a "best of" compilation is the use case the archive serves. The cost of cold storage is dominated by retrieval, not by ongoing storage, so the discipline is to fetch infrequently.
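
Before a master moves to the cold tier, it is worth confirming that the FLAC copy decodes cleanly. A minimal sketch using the reference flac command-line tool follows; `archive_flac` and the `FLAC` override variable are hypothetical names, the latter so the commands can be printed instead of run.

```shell
#!/bin/sh
# archive_flac encodes at maximum compression, verifying the encode as it
# goes (--verify), then runs a separate decode test on the result (--test).
# FLAC is an assumed override hook: set FLAC=echo to print the arguments.
archive_flac() {
  # usage: archive_flac MASTER_WAV ARCHIVE_FLAC
  ${FLAC:-flac} --best --verify -o "$2" "$1" &&
  ${FLAC:-flac} --test "$2"
}

# Example (not executed here):
#   archive_flac final-mix.wav final-mix.flac
```

Only delete or demote the WAV once both steps succeed; a failed test on retrieval years later is too late to find out.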

"An archive that you cannot retrieve from is a museum of your own decisions. The retrieval is the test." Brewster Kahle, Internet Archive founder

Format Recommendations Summary

For a typical spoken-word podcast in 2026, the format stack looks like this.

The capture format is WAV at 48 kHz / 24-bit, multi-track, one file per microphone source. The editing format stays in the same WAV format inside the digital audio workstation. The mastered final mix is exported as WAV at 48 kHz / 24-bit, normalized to the loudness target (-16 LUFS integrated for a stereo mix, -19 LUFS for mono) with true peaks at or below -1.0 dBTP.

The public delivery format is MP3 at 96 kbps mono, with embedded metadata and ID3v2.3 tags including title, artist, album, track number, year, and 3000x3000 cover artwork. Optionally, a parallel Opus feed at 48 kbps mono for listeners on modern apps.

The archive format is FLAC of the final mix for long-term storage, with the WAV master retained on the active tier for the most recent episodes. Multi-track stems are archived separately for shows that need remix capability.

Encoding Workflow Pitfalls

A short list of recurring mistakes that bite podcast production pipelines.

The lossy-to-lossy chain. Recording into a portable recorder that produces MP3, editing the MP3 in a digital audio workstation that re-encodes to MP3 on export, then publishing that MP3 produces audible artifacts within three generations. Always record and edit in WAV; only encode to MP3 at the final delivery step.

The phantom stereo width. Some encoders process stereo content using mid-side stereo coding that allocates more bits to the mid (sum) channel and fewer to the side (difference) channel. On podcast content where one host is hard-left and another is hard-right, the side channel carries most of the information, and aggressive mid-side coding degrades the spatial separation. Force joint stereo off or use mono instead.
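
For shows that keep hard-panned stereo, joint stereo can be switched off at encode time. The sketch below uses the `-joint_stereo` option of ffmpeg's libmp3lame wrapper; the `encode_lr_stereo` name, the `FFMPEG` override variable, and the 128k default are assumptions for illustration.

```shell
#!/bin/sh
# encode_lr_stereo encodes left and right independently (no mid/side coding).
# FFMPEG is an assumed override hook: set FFMPEG=echo to print the arguments.
# With the lame command-line tool, the comparable switch is "-m s".
encode_lr_stereo() {
  # usage: encode_lr_stereo MASTER_WAV OUTPUT_MP3 [BITRATE]
  ${FFMPEG:-ffmpeg} -i "$1" \
    -codec:a libmp3lame -b:a "${3:-128k}" -joint_stereo 0 \
    "$2"
}

# Example (not executed here):
#   encode_lr_stereo final-mix.wav episode-47.mp3 128k
```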

The forgotten loudness measurement. A loudness target is meaningless without measurement. Use a tool that implements ITU-R BS.1770 (ffmpeg's ebur128 and loudnorm filters, the EBU R 128 loudness meter in Reaper, the Loudness Control module in iZotope RX) to verify each episode before publishing. Estimating by eye is unreliable.

The clipped peaks at the encoder stage. Lossy encoders can produce inter-sample peaks higher than the highest sample in the source. Master with true peak ceiling at -1.0 dBTP rather than the sample peak at -0.1 dBFS to leave headroom for the encoder.

The absent show notes in metadata. Listeners who play episodes outside their podcast app see only the embedded ID3 tags. Without title, artist, and a comment field with at least the show notes URL, the file is hard to identify when separated from the feed. Embed the basics.

References

  1. International Telecommunication Union. ITU-R BS.1770-5 Algorithms to measure audio programme loudness and true-peak audio level. https://www.itu.int/rec/R-REC-BS.1770
  2. Internet Engineering Task Force. Definition of the Opus Audio Codec. RFC 6716. https://www.rfc-editor.org/rfc/rfc6716
  3. ISO/IEC 11172-3:1993 Information technology, Coding of moving pictures and associated audio for digital storage media (MPEG-1 Audio). https://www.iso.org/standard/22411.html
  4. Apple Podcasts Specification. https://podcasters.apple.com/support/823-podcast-requirements
  5. ID3v2.4.0 informal standard. https://id3.org/id3v2.4.0-structure
  6. Audio Engineering Society. AES17-2020 standard method for digital audio engineering measurement of digital audio equipment. https://www.aes.org/publications/standards/
  7. European Broadcasting Union. EBU R 128 Loudness normalisation and permitted maximum level of audio signals. https://tech.ebu.ch/publications/r128
  8. Xiph.Org Foundation. FLAC Format Specification. https://xiph.org/flac/format.html