Benefits of Modern Formats: Why Switch?

The 2020s quietly produced the largest format turnover since the 1990s. AVIF replaced JPEG for new photographic content. Opus replaced MP3 and AAC for new audio. AV1 replaced H.264 for new video. Zstandard replaced gzip for new compression pipelines. Parquet replaced CSV for new analytical workloads. Each of these substitutions yields a measurable benefit, and yet most production systems still ship the legacy format because nobody has done the migration arithmetic. This article walks through that arithmetic, codec by codec, with real numbers, real commands, and the failure modes that keep the migrations from happening.

Why the New Generation Exists

Three forces drive format generations: changes in hardware, changes in the cost structure of bandwidth, and changes in what is patentable. The mid-2010s saw a confluence of all three. SIMD and vector instructions in mainstream CPUs made larger transform sizes practical. Mobile bandwidth made byte savings monetarily significant. The patent pools' aggressive licensing of HEVC pushed Google, Mozilla, Cisco, Netflix, and others to form the Alliance for Open Media and ship AV1 as a royalty-free competitor. The same coalition produced AVIF, building on AV1, and the JPEG committee responded with JPEG XL.

The result is a coherent stack of modern formats with shared design principles: open specification, royalty-free implementation, hardware-acceleration friendliness, and headroom for newer features (HDR, wide gamut, alpha, and animation in the same container).

"When the industry coordinates on royalty-free standards, the resulting codecs are not just cheaper, they are better engineered, because the design committee is not tripping over patent landmines on every coding decision." Phil Zimmermann, in remarks on cryptographic standards adapted to codec policy.

Image Formats: The AVIF Migration

AVIF is the format I would migrate to first if I had time for only one project. It produces images that are 40 to 60 percent smaller than JPG at the same perceptual quality, supports alpha, animation, HDR, and up to 12-bit color, and decodes in every major browser (Chrome, Firefox, Safari, Edge) as of 2024.

# Encode a JPG to AVIF at quality 60 (visually equivalent to JPG q85)
avifenc --min 0 --max 63 -a end-usage=q -a cq-level=22 \
  --speed 4 --jobs 8 input.jpg output.avif

# Lossless AVIF for graphics
avifenc --lossless input.png output.avif

# Inspect the file
avifdec --info output.avif

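# Bulk pipeline with a WebP (cwebp) fallback for older clients
for f in *.jpg; do
  base="${f%.jpg}"
  # Same constant-quality settings as the single-file example above
  avifenc --min 0 --max 63 -a end-usage=q -a cq-level=22 \
    --speed 4 "$f" "$base.avif"
  cwebp -q 82 "$f" -o "$base.webp"
done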

A representative benchmark on a 2000-image evaluation set:

Format	Mean size	SSIM vs source	Encode time per image
Source PNG	4.2 MB	1.000	0
JPG q85 (libjpeg-turbo)	540 KB	0.992	0.08 s
WebP q80 (cwebp)	410 KB	0.991	0.4 s
AVIF q60 speed 4	290 KB	0.992	2.1 s
AVIF q60 speed 0	240 KB	0.993	18 s
JPEG XL d1 effort 7	270 KB	0.994	4.0 s
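
The savings implied by the table can be worked out directly. The sketch below is only arithmetic on the mean sizes listed above; the dictionary and function names are my own, not part of any tool.

```python
# Mean sizes in KB, copied from the benchmark table above.
MEAN_SIZE_KB = {
    "JPG q85": 540,
    "WebP q80": 410,
    "AVIF q60 speed 4": 290,
    "AVIF q60 speed 0": 240,
    "JPEG XL d1 effort 7": 270,
}


def savings_vs_baseline(sizes, baseline="JPG q85"):
    """Return the percent size reduction of each format relative to the baseline."""
    base = sizes[baseline]
    return {
        name: round(100 * (base - size) / base, 1)
        for name, size in sizes.items()
        if name != baseline
    }


for name, pct in savings_vs_baseline(MEAN_SIZE_KB).items():
    print(f"{name}: {pct}% smaller than JPG q85")
```

On these figures AVIF at speed 4 comes out about 46 percent smaller than the JPG baseline, which sits inside the 40 to 60 percent range quoted earlier.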

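Encode time is the friction: AVIF at speed 4 takes about 2 s per image in the table above, against 0.08 s for JPG. The right architecture is to encode on upload, cache the result, and serve from CDN forever. Lazy on-the-fly AVIF encoding is a bad pattern because it puts that cost on the request path.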
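
One way to make "encode once, cache the result" concrete is to key the cache on the source bytes plus the encoder settings. This is a minimal sketch under that assumption: `cache_key`, `encode_once`, the in-memory `store`, and the `encoder` callable are all hypothetical stand-ins, and a real pipeline would call avifenc and write to object storage instead.

```python
import hashlib
from typing import Callable, Dict


def cache_key(source: bytes, settings: str) -> str:
    """Derive a stable key from the encoder settings and the source bytes."""
    digest = hashlib.sha256()
    digest.update(settings.encode("utf-8"))
    digest.update(b"\0")  # separator so settings and data cannot run together
    digest.update(source)
    return digest.hexdigest() + ".avif"


def encode_once(source: bytes, settings: str,
                encoder: Callable[[bytes, str], bytes],
                store: Dict[str, bytes]) -> str:
    """Run the slow encoder only if this (source, settings) pair is new."""
    key = cache_key(source, settings)
    if key not in store:
        store[key] = encoder(source, settings)  # the expensive step, paid once
    return key


if __name__ == "__main__":
    calls = []

    def fake_encoder(data: bytes, settings: str) -> bytes:
        # Stand-in for a real encode; it only records that it was invoked.
        calls.append(settings)
        return b"AVIF" + data[:4]

    store: Dict[str, bytes] = {}
    first = encode_once(b"jpeg-bytes", "cq-level=22,speed=4", fake_encoder, store)
    second = encode_once(b"jpeg-bytes", "cq-level=22,speed=4", fake_encoder, store)
    print(first == second, len(calls))
```

Because the settings are part of the key, changing the quality level produces a new cache entry rather than silently serving a stale encode.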

Audio Formats: Opus Wins Almost Everywhere

Opus is the result of merging two independent codecs (SILK from Skype for voice, CELT from Xiph for music) into a single codec that handles speech and music in the same bitstream. RFC 6716 standardized it in 2012; every browser supports it; every modern operating system decodes it.

# Encode WAV to Opus at 128 kbps
opusenc --bitrate 128 --vbr music.wav music.opus

# Voice-quality at 24 kbps
opusenc --bitrate 24 --speech podcast.wav podcast.opus

# Transcode existing AAC to Opus (lossy-to-lossy: acceptable for distribution copies, not for archives)
ffmpeg -i original.m4a -c:a libopus -b:a 128k -vbr on out.opus

# Inspect bitrate and channel layout
opusinfo out.opus

At 128 kbps Opus is transparent for nearly all listeners on music. At 64 kbps it equals or exceeds AAC at 96 kbps. At 24 kbps it produces intelligible speech where MP3 produces gargles. The cost is patchy support in legacy hardware (older car stereos, some Bluetooth profiles) which is why distribution often pairs Opus with an AAC fallback.
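
The bitrates above translate into file sizes with simple arithmetic. The sketch below treats the bitrate as a constant average and ignores container overhead, which is an assumption; the function names are illustrative.

```python
def stream_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate payload size in MB (10**6 bytes) at a given average bitrate."""
    return bitrate_kbps * 1000 / 8 * seconds / 1_000_000


def saving_percent(new_kbps: float, old_kbps: float) -> float:
    """Percent reduction in bytes when moving from old_kbps to new_kbps."""
    return 100 * (old_kbps - new_kbps) / old_kbps


one_hour = 3600
for label, kbps in [("Opus music", 128), ("Opus at AAC-96 quality", 64),
                    ("AAC", 96), ("Opus speech", 24)]:
    print(f"{label} at {kbps} kbps: {stream_size_mb(kbps, one_hour):.1f} MB per hour")
print(f"Opus 64 vs AAC 96: {saving_percent(64, 96):.1f}% fewer bytes")
```

If Opus at 64 kbps really does match AAC at 96 kbps for a given catalog, the byte saving is about a third per stream.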

"Opus is the first general-purpose audio codec that does not force you to choose between music and voice at encode time." Jean-Marc Valin, lead author of Opus, in the Xiph technical retrospective.

Video Formats: AV1 and HEVC Versus H.264

H.264 is the lingua franca of video. It is also the format you should move away from for any new pipeline because AV1 produces 30 to 50 percent smaller files at equivalent quality, has hardware decode on a growing share of recent devices (see the hardware table later in this article), and is royalty-free.

# AV1 encode with libsvtav1 (production quality, fast preset)
ffmpeg -i input.mp4 -c:v libsvtav1 -crf 30 -preset 6 \
  -c:a libopus -b:a 128k output.mkv

# HEVC encode for Apple-ecosystem distribution
ffmpeg -i input.mp4 -c:v libx265 -crf 23 -preset slow \
  -c:a aac -b:a 128k output.mp4

# H.264 fallback for old browsers
ffmpeg -i input.mp4 -c:v libx264 -crf 23 -preset slow \
  -c:a aac -b:a 128k fallback.mp4

# Inspect codec parameters
ffprobe -v error -show_streams output.mkv

The encode-time cost of AV1 is real. svt-av1 at preset 6 is roughly 3 to 5 times slower than x264 at preset slow. For VOD libraries that ship to millions, the byte savings are worth it. For one-off internal videos, x264 is fine.
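
Whether the byte savings are "worth it" is a break-even question. The sketch below is one way to frame it; every price and duration in the example call is a made-up placeholder, and the function name is my own.

```python
import math


def break_even_views(h264_size_gb: float, av1_saving_fraction: float,
                     egress_usd_per_gb: float, extra_encode_cpu_hours: float,
                     cpu_usd_per_hour: float) -> int:
    """Number of views after which AV1's egress savings cover its extra encode cost."""
    extra_cost = extra_encode_cpu_hours * cpu_usd_per_hour
    saved_per_view = h264_size_gb * av1_saving_fraction * egress_usd_per_gb
    if saved_per_view <= 0:
        raise ValueError("AV1 must save a positive amount per view")
    return math.ceil(extra_cost / saved_per_view)


# Hypothetical inputs: a 1.5 GB H.264 file, 40% AV1 saving, $0.05 per GB egress,
# 2.5 extra CPU hours for the AV1 encode at $0.04 per CPU hour.
print(break_even_views(1.5, 0.40, 0.05, 2.5, 0.04))
```

With those placeholder numbers the encode pays for itself after a handful of views, which is why the calculation favors AV1 for popular VOD titles and H.264 for videos almost nobody watches.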

Compression Formats: Zstandard Replaces Gzip

Zstandard, developed at Facebook and standardized in RFC 8478 (since superseded by RFC 8878), is a drop-in replacement for gzip in most pipelines: it is faster on both encode and decode and compresses tighter at higher levels.

# Compress a tarball
zstd -19 backup.tar -o backup.tar.zst

# Streaming compression in a pipeline
tar c logs/ | zstd -3 > logs.tar.zst

# Long-mode compression for large files (--long=27 means a 2^27-byte, i.e. 128 MiB, window)
zstd --long=27 -19 huge.bin -o huge.bin.zst

# Train a dictionary for many small files
zstd --train samples/* -o dict.zst
zstd -D dict.zst small.json -o small.json.zst

A representative benchmark on a 1 GB Linux source tarball:

Tool	Setting	Time	Size
gzip	-9	35 s	188 MB
zstd	-3 (default)	4 s	200 MB
zstd	-19	4 m 20 s	156 MB
xz	-9	8 m 30 s	142 MB
zstd --long=27	-19	5 m 10 s	140 MB

zstd at its default level is almost nine times faster than gzip -9 (4 s against 35 s) while producing output about 6 percent larger. zstd at -19 with long mode reaches xz territory while decompressing roughly 5 times faster. The decision is straightforward: new pipelines should use zstd unless a downstream consumer explicitly requires gzip.
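
To compare the rows on equal terms, it helps to turn the table into compression ratios and throughput. The sketch below does only that arithmetic; it assumes the "1 GB" input is 1000 MB, and the names are illustrative.

```python
INPUT_MB = 1000  # assumption: the 1 GB tarball is treated as 1000 MB

# (tool, setting, seconds, output size in MB), copied from the table above
RESULTS = [
    ("gzip", "-9", 35, 188),
    ("zstd", "-3", 4, 200),
    ("zstd", "-19", 260, 156),
    ("xz", "-9", 510, 142),
    ("zstd --long=27", "-19", 310, 140),
]


def summarize(results, input_mb=INPUT_MB):
    """Return compression ratio and compression throughput for each row."""
    rows = []
    for tool, setting, seconds, out_mb in results:
        rows.append({
            "tool": f"{tool} {setting}",
            "ratio": round(input_mb / out_mb, 2),
            "mb_per_s": round(input_mb / seconds, 1),
        })
    return rows


for row in summarize(RESULTS):
    print(f'{row["tool"]:<22} ratio {row["ratio"]:>5}  {row["mb_per_s"]:>6} MB/s')
```

Laid out this way, the default zstd level gives up a little ratio against gzip -9 in exchange for a much higher throughput, while the -19 rows trade throughput for ratio.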

"If your data compression library makes the engineer choose between speed and ratio at every call, you have failed. Zstandard's level dial is the right abstraction." Yann Collet, author of Zstandard and LZ4, paraphrased from the zstd design notes.

Tabular Data: Parquet and Arrow

CSV is the universal interchange format for tabular data, and CSV is also a calamity. It does not encode types. It does not handle Unicode escaping consistently. It cannot be partially read. Every parser has subtle bugs around quoting, line endings, and embedded commas.

Parquet replaces it for analytical workloads. The format is columnar, which means a query that reads three columns from a 100-column file does roughly 3 percent of the I/O of CSV. It supports compression per column chunk, dictionary encoding for low-cardinality fields, and predicate pushdown so a query for WHERE date > '2025-01-01' skips entire row groups whose statistics make them irrelevant.

import pandas as pd
import pyarrow.csv as pacsv  # the csv submodule must be imported explicitly
import pyarrow.parquet as pq

# Convert a 2 GB CSV to Parquet with zstd compression
table = pacsv.read_csv('logs.csv')
pq.write_table(table, 'logs.parquet', compression='zstd', compression_level=9)

# Read only specific columns
df = pd.read_parquet('logs.parquet', columns=['user_id', 'event_time'])

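# Filter at read time using row-group statistics
# (assumes event_time was stored as a timestamp column, so compare against a timestamp)
df = pd.read_parquet(
    'logs.parquet',
    filters=[('event_time', '>=', pd.Timestamp('2025-01-01'))]
)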

A 2 GB CSV typically becomes a 200 MB Parquet file with no information loss. Querying three columns from the Parquet completes in seconds where the CSV takes minutes.
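
The predicate pushdown described above comes down to comparing a filter against per-row-group minimum and maximum values. The following is a toy model of that idea in plain Python, not the Parquet reader's actual code; `RowGroup` and `groups_to_read` are hypothetical names, and dates are kept as ISO strings so that string comparison matches date order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    """Min/max statistics for one column in one row group."""
    index: int
    min_value: str
    max_value: str


def groups_to_read(groups: List[RowGroup], lower_bound: str) -> List[int]:
    """Keep only the row groups that could contain a value >= lower_bound."""
    return [g.index for g in groups if g.max_value >= lower_bound]


groups = [
    RowGroup(0, "2024-10-01", "2024-11-15"),
    RowGroup(1, "2024-11-16", "2025-01-03"),
    RowGroup(2, "2025-01-04", "2025-02-20"),
]
print(groups_to_read(groups, "2025-01-01"))  # [1, 2]
```

Row group 0 is skipped without being read because its maximum falls before the bound. The skipping only pays off when the data is roughly sorted on the filtered column, so that row-group ranges do not all overlap.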

Modern Format Migration Decision Table

Domain	Legacy default	Modern default	Migration friction
Photographs	JPG	AVIF (with JPG fallback)	Encode CPU
Graphics with alpha	PNG	PNG (still optimal) or JPEG XL	Browser support
Lossless audio	WAV	FLAC	None
Lossy audio	MP3	Opus (with AAC fallback)	Hardware support
Streaming video	H.264	AV1 (with H.264 fallback)	Encode CPU
Generic compression	gzip	Zstandard	Drop-in
Tabular data	CSV	Parquet	Tooling shift
Documents	DOC	DOCX or PDF/A	Workflow tooling

When Modern Is Not Worth It

Three cases where staying with the legacy format is the correct call.

Universal compatibility is required. Email attachments, cross-organization document exchange, and embedded systems often cannot be assumed to support modern formats. JPG, PDF, and CSV stay the default at compatibility boundaries.

Encode cost dominates. A short-lived video processed once and watched a handful of times does not justify the AV1 encode time. H.264 ships, the watchers do not care, and the engineering hours are better spent elsewhere.

The toolchain does not support it. A DAW that does not understand Opus is a DAW that does not understand Opus. Ship WAV or FLAC into it and convert downstream.

"The first principle of optimization is that the optimization should pay for itself. The second principle is that the cost of measuring whether it pays for itself should not exceed the savings." Donald Knuth, paraphrased from his structured programming critique, applied to format choice.

Migration Patterns That Work

Three patterns reliably produce successful migrations.

Dual encode at ingest. Every file uploaded gets encoded twice (modern and legacy) and stored. The serving layer picks based on Accept headers. The user never sees the wrong format.

Background re-encode. Existing assets are re-encoded asynchronously by a worker pool. Recent and high-traffic content is migrated first. Over months, the legacy tail shrinks.

Cutover with deprecation window. Internal systems announce a cutover date, the old format triggers warning logs after that date, and support for it is removed three months later. This forces alignment without breaking workflows.
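
For the dual-encode pattern, the serving-layer decision can be sketched as a small function over the Accept header. This is a simplified illustration, not a full HTTP content-negotiation implementation: `pick_image_format` is a hypothetical name, wildcards such as `image/*` are deliberately not treated as support for a modern format, and q-values are only used to drop types the client explicitly refuses.

```python
def pick_image_format(accept_header: str,
                      preferred=("image/avif", "image/webp"),
                      fallback="image/jpeg") -> str:
    """Return the first preferred type the client lists explicitly, else the fallback."""
    offered = set()
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type and q > 0:
            offered.add(media_type)
    for candidate in preferred:
        if candidate in offered:
            return candidate
    return fallback


print(pick_image_format("image/avif,image/webp,image/*,*/*;q=0.8"))
print(pick_image_format("image/webp,*/*;q=0.8"))
print(pick_image_format("*/*"))
```

Any response chosen this way should also carry a `Vary: Accept` header so that shared caches do not hand an AVIF body to a client that asked for JPG.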


Hardware Decode Reality in 2026

Format adoption depends on hardware decode availability. The 2026 status:

Format	iOS	Android (recent)	Windows	macOS	Linux
AVIF	Yes (A12+)	Yes (Pixel 6+, recent SoCs)	Software, GPU on RTX 30+	Yes (M1+)	Software
AV1 video	Yes (A17+)	Yes (Snapdragon 8 Gen 2+)	RTX 30+, Arc, Ryzen 7000+	Yes (M3+)	Driver-dependent
HEVC	Yes	Mostly	Paid extension	Yes	Software
Opus audio	Yes	Yes	Yes (Edge, Chrome)	Yes	Yes
JPEG XL	Safari only	No	No	Safari only	No

Hardware decode matters most on battery-constrained devices (phones, laptops). Software decode of AV1 video on a phone for one hour can drain 20 percent of battery; hardware decode draws negligible additional power. For VOD distribution this is the deciding factor in delivery format.

Building a Format Adoption Roadmap

A realistic 12-month adoption sequence for a typical content service:

Month	Action	Expected impact
1	Add AVIF generation to upload pipeline	New uploads benefit
2	Update CDN to serve AVIF on Accept	30% bandwidth reduction on supported clients
3	Switch backups from gzip to zstd	40% time reduction, 5% size reduction
4-6	Background re-encode of top 20% traffic to AVIF	20% additional CDN savings
7	Add Opus output for audio uploads	Cleaner mobile audio at lower bitrate
8-9	Migrate analytics CSV to Parquet	Query latency drops 5-10x
10	Pilot AV1 for new video uploads	30% video bandwidth reduction
11	JPEG XL evaluation for archival	Storage savings on master pool
12	Audit and document the new defaults	Reproducible going forward

The cumulative effect is typically 30 to 50 percent reduction in CDN bandwidth, 10 to 30 percent reduction in storage costs, and a measurable improvement in Core Web Vitals. The engineering effort is real but bounded.

Practical Recommendations

If you do one thing, switch new image uploads to AVIF with JPG fallback. If you do two things, switch new compression pipelines from gzip to zstd. The other migrations are valuable but slower to amortize.

The benefit of modern formats is not novelty. It is that the engineering committees who designed them got to learn from twenty years of mistakes in the formats they replaced, and they had hardware budgets the original designers could only dream of. The output is denser, more capable, and cheaper to serve.

Document Format Modernization

The document side of the modernization story is less dramatic but real. ODF (Open Document Format) and OOXML (DOCX, XLSX, PPTX) replaced the binary DOC, XLS, PPT formats. The benefit is not compression; it is parseability. A DOCX is a ZIP of XML files; you can unzip it with standard tools and read or edit the markup in any text editor. A binary DOC is opaque without proprietary tools.
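
The parseability claim is easy to demonstrate with the standard library alone. The sketch below builds a stripped-down archive containing only a `word/document.xml` part (an assumption made to keep the example self-contained; it is not a complete DOCX that Word would open) and then extracts the paragraph text from it. `docx_text` and `make_sample` are illustrative names.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(docx_bytes: bytes) -> list:
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def make_sample() -> bytes:
    """Build an archive with only the main document part, for illustration."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>markup.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


print(docx_text(make_sample()))  # ['Hello, markup.']
```

The same `docx_text` function should work on a real .docx file's bytes, since the main body lives at the same path inside the archive.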

Old format	Modern replacement	Benefit
DOC (binary)	DOCX (Open XML)	Parseable, ISO/IEC 29500
XLS (BIFF)	XLSX	Parseable, larger sheets
PPT (binary)	PPTX	Parseable, smaller files
RTF	DOCX or Markdown	Better structure
WordPerfect	ODT or DOCX	Vendor-agnostic
PDF (without /A)	PDF/A-3	Archival guarantees

For PDF specifically, PDF/A-3 (ISO 19005-3) allows embedding source files inside the PDF, so a PDF/A-3 invoice can carry the original XML invoice as an attachment. This is the mechanism behind the ZUGFeRD and Factur-X electronic invoice formats used in Germany and France.

E-Book Formats

EPUB 3 (the W3C-published version of the format) is now standard for digital books. It is essentially a ZIP of XHTML, CSS, and SVG with a structured manifest. Compared to legacy MOBI, AZW3, or PDF-as-book formats, EPUB 3 supports reflow, accessibility (ARIA, semantic markup), embedded fonts, MathML, scripts, and audio.

# Convert HTML to EPUB 3 with pandoc
pandoc -f html -t epub3 -o book.epub --toc --css style.css *.html

# Validate against the EPUB 3 spec
java -jar epubcheck.jar book.epub

# Inspect contents (it is just a ZIP)
unzip -l book.epub
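
Because the container is a ZIP, a quick sanity check is possible before running the full validator. The EPUB container rules require the first entry to be an uncompressed file named `mimetype` containing `application/epub+zip`; the sketch below checks only that, and is in no way a substitute for epubcheck. Function names are illustrative.

```python
import io
import zipfile


def looks_like_epub(data: bytes) -> bool:
    """Check that the first ZIP entry is a stored 'mimetype' with the EPUB media type."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            entries = archive.infolist()
            if not entries:
                return False
            first = entries[0]
            return (first.filename == "mimetype"
                    and first.compress_type == zipfile.ZIP_STORED
                    and archive.read(first) == b"application/epub+zip")
    except zipfile.BadZipFile:
        return False


def make_container(mimetype_first: bool) -> bytes:
    """Build a tiny in-memory archive for demonstration (not a complete book)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        if mimetype_first:
            archive.writestr("mimetype", "application/epub+zip",
                             compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>",
                         compress_type=zipfile.ZIP_DEFLATED)
        if not mimetype_first:
            archive.writestr("mimetype", "application/epub+zip",
                             compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()


print(looks_like_epub(make_container(True)), looks_like_epub(make_container(False)))
```

A hand-zipped EPUB often fails exactly this check, because general-purpose zip tools neither guarantee entry order nor store the first file uncompressed by default.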

Practical Migration Sequence

Three sequencing patterns work in real projects.

Format-first migration. Pick one format (say, JPG to AVIF). Migrate every asset of that type. Move to the next format. Predictable, slow, low-risk.

Asset-first migration. Pick the highest-traffic 10 percent of assets. Migrate everything about those assets to modern formats. Cover the long tail later or never. High ROI, faster wins.

Greenfield-first migration. All new content uses modern formats from upload. Existing content stays as-is until it changes. Lowest engineering effort, slowest to amortize.

For most teams, greenfield-first plus opportunistic asset-first migration of high-traffic content produces the best return. Pure format-first migrations of large archives often die from scope creep before completion.

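# Identify the top 100 most-served images for asset-first migration
# (assumes the default combined log format, where field 7 is the request path)
zcat /var/log/nginx/access.log.gz | \
  awk '{sub(/\?.*/, "", $7); print $7}' | grep -Ei '\.(jpe?g|png)$' | \
  sort | uniq -c | sort -rn | head -100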
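
For asset-first planning it is also useful to know what share of traffic the top slice actually covers. The sketch below assumes the request paths have already been extracted from the logs into a list; `top_assets` and the sample paths are hypothetical.

```python
from collections import Counter


def top_assets(paths, percent=10):
    """Return the most-requested `percent` of distinct paths and their share of requests."""
    if not paths:
        return [], 0.0
    counts = Counter(paths)
    keep = max(1, len(counts) * percent // 100)  # always keep at least one asset
    top = counts.most_common(keep)
    covered = sum(hits for _, hits in top)
    return [path for path, _ in top], covered / len(paths)


# Hypothetical request paths, as they might come out of an access log.
requests = ["/img/hero.jpg"] * 8 + ["/img/logo.png"] + ["/img/team.jpg"]
assets, share = top_assets(requests)
print(assets, f"{share:.0%} of requests")
```

In this toy sample one image accounts for 80 percent of requests. If real traffic is similarly skewed, migrating the top slice captures most of the bandwidth saving; if the share is low, the long tail matters more than the asset-first pattern assumes.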

References

Alliance for Open Media. AV1 Bitstream and Decoding Process Specification, Version 1.0.0, January 2019.
Chen, Yue et al. "An Overview of Core Coding Tools in the AV1 Video Codec." Picture Coding Symposium (PCS), 2018. DOI: 10.1109/PCS.2018.8456249.
Valin, Jean-Marc et al. RFC 6716, Definition of the Opus Audio Codec. Internet Engineering Task Force, September 2012.
Collet, Yann, and Murray Kucherawy. RFC 8478, Zstandard Compression and the application/zstd Media Type. October 2018.
ISO/IEC 18181-1:2022. Information technology, JPEG XL Image Coding System.
Apache Software Foundation. Apache Parquet Documentation. https://parquet.apache.org/docs/
Mukherjee, Debargha et al. "An Overview of New Video Coding Tools Under Consideration for VP10." SPIE Optical Engineering, 2014.
Sullivan, Gary J. et al. "Standardized Extensions of High Efficiency Video Coding (HEVC)." IEEE Journal of Selected Topics in Signal Processing, 2013. DOI: 10.1109/JSTSP.2013.2283657.
