Metadata is the part of a file that nobody thinks about until it goes missing. A photographer discovers that the wedding album has lost every "shot at f/2.8 on the 85mm" detail because their batch script stripped EXIF. A journalist finds out the GPS coordinates of an anonymous source's home are embedded in the photo they just published. A podcast network gets complaints because every episode shows up as "Track 01 Unknown Artist" in car stereos. Three different industries, three different metadata standards, one shared root cause: the file converter in the middle of the workflow did not preserve what the producer assumed was preserved.

This article is about what metadata actually is, which standards apply where, what survives common conversions, and how to control preservation with tools like exiftool, ffmpeg, and qpdf. The focus is practical: which flag, which command, which verification step. Format theory is in the references at the end.

What metadata is, structurally

A file is two streams of bytes: payload and metadata. The payload is the part the user perceives: pixels, audio samples, document text. Metadata describes the payload: who made it, when, with what equipment, in what color space, with what intended audience. Metadata is small (usually under 1 percent of file size) but carries information that took human labor to produce.

Metadata in a media file lives in one of three places: in the container header (e.g., MP4 box atoms), in a dedicated metadata segment (e.g., JPEG APP1 markers for EXIF), or as an XMP packet that can appear in many container types because it is just XML. A file can have multiple metadata blocks pointing at overlapping fields with conflicting values.
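
To make the "dedicated metadata segment" case concrete, here is a minimal Python sketch that walks the segment headers of a JPEG and reports which APPn segments look like EXIF, XMP, or a Photoshop resource block (where IPTC IIM usually lives). The function name is hypothetical, and the sketch assumes a well-formed file in which every marker before the image data carries a length field; it is an illustration, not a full parser.

```python
import struct

def list_metadata_segments(data: bytes) -> list[tuple[str, str]]:
    """Return (marker, kind) for each APPn segment found before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the marker stream; stop rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
                kind = "EXIF"
            elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
                kind = "XMP"
            elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
                kind = "Photoshop IRB (may hold IPTC IIM)"
            else:
                kind = "other"
            found.append((f"APP{marker - 0xE0}", kind))
        pos += 2 + length
    return found
```

Calling it on the bytes of a camera JPEG would typically list one EXIF segment and, for edited files, an XMP segment as well, which is a quick way to see that a single file really does carry several metadata blocks.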

"The fundamental cause of trouble in the modern world is that the stupid are cocksure while the intelligent are full of doubt." Bertrand Russell, The Triumph of Stupidity

Most batch scripts strip metadata because the author was not sure whether to keep it. The right default is preserve, with explicit stripping where you have a documented reason.

The major metadata standards

| Standard | Used in | What it stores | Maintained by |
| --- | --- | --- | --- |
| EXIF | JPEG, TIFF, HEIC, RAW | Camera settings, GPS, timestamps, orientation | CIPA (Camera & Imaging Products Association) |
| IPTC | JPEG, TIFF, embedded as legacy | Captions, keywords, copyright, contact info | International Press Telecommunications Council |
| XMP | Almost any format | Anything (extensible RDF/XML) | Adobe (now ISO 16684) |
| ID3 | MP3 | Title, artist, album, art, lyrics | id3.org community |
| iXML | WAV (BWF) | Production audio metadata | Gallery (broadcast audio standard) |
| MXF descriptors | MXF | Broadcast video production data | SMPTE |
| PDF Info dict | PDF | Title, author, creation date | Adobe (ISO 32000) |
| PDF XMP | PDF | Anything Adobe extends | ISO 32000 |
| Vorbis comments | OGG, FLAC | Tags as key-value pairs | Xiph.org |

Most modern files use multiple standards. A photo shot on an iPhone 15 has EXIF (camera data), IPTC (if edited in Photos), and XMP (Adobe-style edits if it passes through Lightroom). They can disagree, and the order of priority depends on the reader.

What survives common conversions

The honest answer is "less than you think, and you should verify every conversion path." The following table summarizes typical behavior with default settings of major tools.

| Conversion | EXIF | IPTC | XMP | Embedded art |
| --- | --- | --- | --- | --- |
| JPEG to JPEG (ImageMagick mogrify) | Preserved | Preserved | Preserved | N/A |
| JPEG to JPEG (with -strip) | Removed | Removed | Removed | N/A |
| JPEG to WebP (cwebp default) | Lost (default is -metadata none) | Lost | Lost | N/A |
| JPEG to AVIF (libavif) | Preserved | Preserved | Preserved | N/A |
| JPEG to PNG (most tools) | Lost or partial | Lost | Lost | N/A |
| RAW to JPEG (most converters) | Subset preserved | Lost | Variable | N/A |
| HEIC to JPEG (sips on macOS) | Mostly preserved | Lost | Lost | N/A |
| MP3 to MP3 (ffmpeg default) | N/A | N/A | N/A | Lost (no -map 0:v) |
| MP3 to MP3 (ffmpeg -map 0) | N/A | N/A | N/A | Preserved |
| FLAC to MP3 (with -map_metadata 0) | N/A | N/A | N/A | Preserved if mapped |
| MP4 to MP4 (ffmpeg -c copy) | N/A | N/A | N/A | N/A (movie atoms preserved) |
| PDF to PDF (qpdf) | N/A | N/A | Preserved | N/A |
| PDF to DOCX (LibreOffice) | N/A | N/A | Title and author only | N/A |
| DOCX to PDF (LibreOffice) | N/A | N/A | Title and author preserved | N/A |

The pattern: same-format conversions preserve well; cross-format conversions lose information unless the tool is specifically configured to map fields. Note that cwebp keeps metadata only when asked, for example with -metadata all or -metadata exif.

EXIF: the camera's story

EXIF is the metadata format for photography. It records aperture, shutter speed, ISO, focal length, lens model, GPS coordinates, capture timestamp, and the camera's color space assumption. A modern phone photo has roughly 60 EXIF tags filled in.

# View all metadata
exiftool photo.jpg

# View only EXIF
exiftool -EXIF:all photo.jpg

# Copy all metadata from one file to another
exiftool -tagsFromFile source.jpg -all:all target.jpg

# Strip GPS but keep everything else
exiftool -gps:all= photo.jpg

The single most important EXIF tag is Orientation. Phones record images in landscape pixel order regardless of how the user held the device, with an Orientation tag indicating "rotate 90 clockwise to display correctly." A converter that ignores this tag produces sideways thumbnails. Always normalize orientation early in any pipeline:

# ImageMagick: bake orientation into pixels then reset the tag
mogrify -auto-orient photo.jpg

# Or with exiftool, just set the tag without rotating pixels
# (only do this if the pixels are already in the right order)
exiftool -Orientation=1 -n photo.jpg

"The most important property of a program is whether it accomplishes the intention of its user." C.A.R. Hoare

A photo whose orientation tag is correct in EXIF but whose pixels are sideways accomplishes nothing useful. Verify orientation on output; do not assume the metadata describes reality.
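
What "normalizing" means can be sketched in Python without any image library. The table below maps each of the eight EXIF Orientation values to the operations that make stored pixels upright, and applies them to a tiny grid of numbers standing in for pixels. The function names and the list-of-lists representation are illustrative assumptions; a real pipeline would delegate this to its imaging tool.

```python
def flip_horizontal(g):
    return [row[::-1] for row in g]

def flip_vertical(g):
    return [row[:] for row in g[::-1]]

def rotate_90_cw(g):
    return [list(r) for r in zip(*g[::-1])]

def rotate_180(g):
    return [row[::-1] for row in g[::-1]]

def rotate_270_cw(g):
    return [list(r) for r in zip(*g)][::-1]

# EXIF Orientation value -> operations that turn stored pixels upright.
ORIENTATION_OPS = {
    1: [],
    2: [flip_horizontal],
    3: [rotate_180],
    4: [flip_vertical],
    5: [flip_horizontal, rotate_270_cw],  # equivalent to a transpose
    6: [rotate_90_cw],
    7: [flip_horizontal, rotate_90_cw],   # equivalent to a transverse flip
    8: [rotate_270_cw],
}

def normalize(grid, orientation):
    """Bake the orientation into the pixels and return (grid, new_tag_value)."""
    for op in ORIENTATION_OPS[orientation]:
        grid = op(grid)
    return grid, 1  # once the pixels are upright, the tag must be reset to 1

# A phone held in portrait typically stores rotated pixels with Orientation=6.
stored = [[2, 4], [1, 3]]
upright, tag = normalize(stored, 6)
```

The important detail is the return value: rotating the pixels without resetting the tag to 1 makes orientation-aware viewers rotate the image a second time.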

IPTC: the editorial layer

IPTC stores information that an editor or photographer adds: caption, headline, keywords, byline, credit, copyright notice, contact email, location name. News agencies use IPTC heavily. Wedding photographers use it to embed model release status. E-commerce uses it to embed product SKUs.

The IPTC IIM standard (the older binary form) is being superseded by IPTC Photo Metadata 2024, which uses XMP as the carrier. Most tools read both and write to both for backward compatibility.

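# Add IPTC fields to a photo
# (-sep splits the Keywords value into separate list entries;
# without it the whole string is stored as a single keyword)
exiftool \
  -sep ", " \
  -IPTC:By-line="Jane Doe" \
  -IPTC:CopyrightNotice="(c) 2026 Example Studios" \
  -IPTC:Caption-Abstract="Bride and groom at sunset, Cancun" \
  -IPTC:Keywords="wedding, sunset, Cancun, beach" \
  photo.jpg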

For batch operations, a CSV-driven exiftool invocation lets a content manager update fields across thousands of files from a spreadsheet:

exiftool -csv=metadata.csv -overwrite_original *.jpg

The CSV's first column must be headed SourceFile and hold each file's path; the remaining column headers are tag names in exiftool's syntax (for example IPTC:By-line).
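
If the spreadsheet is generated rather than hand-edited, a short Python sketch like the following can produce a file in that layout. The function name and the sample records are hypothetical; the only parts taken from exiftool's conventions are the SourceFile column and the tag-name headers.

```python
import csv
import io

def build_exiftool_csv(records: list[dict[str, str]]) -> str:
    """Render records as CSV text with SourceFile as the first column."""
    columns = ["SourceFile"]
    for rec in records:
        if "SourceFile" not in rec:
            raise ValueError("every record needs a SourceFile entry")
        for key in rec:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    # restval="" leaves a cell empty when a record lacks that tag.
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

records = [
    {"SourceFile": "a.jpg", "IPTC:By-line": "Jane Doe"},
    {"SourceFile": "b.jpg", "IPTC:By-line": "Jane Doe",
     "IPTC:CopyrightNotice": "(c) 2026 Example Studios"},
]
csv_text = build_exiftool_csv(records)
```

Writing csv_text to metadata.csv gives a file suitable for the -csv= invocation. Using the csv module rather than string joins matters here because captions routinely contain commas and quotes.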

XMP: the format Adobe gave the world

XMP is Adobe's universal metadata format. It is RDF/XML, which means it is verbose but extensible. Anything you want to record about a file can be expressed as a custom XMP namespace, and competent tools will preserve it through conversions.

XMP packets can be embedded in JPEG, TIFF, PNG, PDF, MP3, MP4, MOV, and many others. The packet is human-readable, which is occasionally useful for debugging:

# Show the PDF trailer, which references the document catalog and Info dictionary
qpdf --show-object=trailer document.pdf

# Extract the XMP packet from a PDF
exiftool -xmp -b document.pdf > extracted.xmp

For pipelines that process content across multiple media types (a marketing team handling photos, videos, and PDFs), XMP is the most reliable way to carry structured metadata without per-format lookup tables.
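
Because the packet is plain XML, it can also be located and read with nothing but the Python standard library. The sketch below is a rough illustration under several assumptions: the packet is stored uncompressed, is UTF-8 encoded, and is wrapped in the conventional x:xmpmeta element. The function names are hypothetical, and only the Dublin Core creator field is read.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_xmp(data: bytes) -> Optional[str]:
    """Return the first XMP packet found in raw file bytes, or None."""
    start = data.find(b"<x:xmpmeta")
    end = data.find(b"</x:xmpmeta>", start)
    if start == -1 or end == -1:
        return None
    return data[start:end + len(b"</x:xmpmeta>")].decode("utf-8", errors="replace")

def dc_creators(xmp: str) -> list[str]:
    """List the dc:creator entries of an XMP packet."""
    root = ET.fromstring(xmp)
    names = []
    for creator in root.iter(f"{{{DC}}}creator"):
        for item in creator.iter(f"{{{RDF}}}li"):
            names.append(item.text or "")
    return names
```

The same two functions work on bytes read from a JPEG, a PNG, or an uncompressed PDF metadata stream, which is the practical meaning of XMP being container-independent. For anything beyond spot checks, a dedicated tool such as exiftool remains the safer reader.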

Audio metadata: ID3, Vorbis comments, iXML

Audio metadata fragments more than image metadata. MP3 uses ID3, FLAC and OGG use Vorbis comments, WAV files in production use BWF iXML, AAC uses iTunes-style atoms in MP4. They all store roughly the same information (title, artist, album, year) but the field names and structures differ.

| Field | ID3v2.3 | Vorbis | MP4 atom |
| --- | --- | --- | --- |
| Title | TIT2 | TITLE | ©nam |
| Artist | TPE1 | ARTIST | ©ART |
| Album | TALB | ALBUM | ©alb |
| Track number | TRCK | TRACKNUMBER | trkn |
| Year | TYER | DATE | ©day |
| Genre | TCON | GENRE | ©gen |
| Album artist | TPE2 | ALBUMARTIST | aART |
| Composer | TCOM | COMPOSER | ©wrt |
| Lyrics | USLT | LYRICS | ©lyr |
| Cover art | APIC | METADATA_BLOCK_PICTURE | covr |

ffmpeg's `-metadata` flag works at the abstract level: pass `title=...` and ffmpeg writes the right atom or frame for the output container.

ffmpeg -i input.flac \
  -metadata title="Episode 47" \
  -metadata artist="Audio Engineering Show" \
  -metadata album="Season 3" \
  -metadata date="2026" \
  -map_metadata 0 \
  -id3v2_version 3 \
  -c:a libmp3lame -q:a 2 \
  output.mp3

The -map_metadata 0 carries everything from input 0 first, then -metadata flags override or add specific fields.
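
When the tag values come from a database or spreadsheet rather than being typed by hand, it helps to build the command as an argument list so that titles containing quotes or spaces need no shell escaping. The following Python sketch assembles the same invocation; the function name and sample values are hypothetical, and the encoder settings are simply copied from the command above.

```python
def build_ffmpeg_command(src: str, dst: str, overrides: dict[str, str]) -> list[str]:
    """Return an ffmpeg argument list: copy input tags, then apply overrides."""
    cmd = ["ffmpeg", "-i", src, "-map_metadata", "0"]
    for key, value in overrides.items():
        # Each override becomes its own -metadata key=value pair.
        cmd += ["-metadata", f"{key}={value}"]
    cmd += ["-id3v2_version", "3", "-c:a", "libmp3lame", "-q:a", "2", dst]
    return cmd

cmd = build_ffmpeg_command(
    "input.flac",
    "output.mp3",
    {"title": 'Episode 47: "Loud" is not a genre', "album": "Season 3"},
)
```

The list can be handed to subprocess.run(cmd, check=True) as is. Because no shell is involved, the embedded double quotes in the title reach ffmpeg unchanged.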

PDF metadata: the document layer

PDF stores metadata in two places: the legacy Info dictionary (Title, Author, Subject, Keywords, Producer, Creator, CreationDate, ModDate) and the XMP packet, which can carry anything. PDF/A (ISO 19005) requires the XMP packet for archival compliance, and PDF 2.0 (ISO 32000-2) deprecates most Info dictionary entries, so the Info dictionary is increasingly seen as a legacy fallback.

# View PDF metadata
exiftool document.pdf

# qpdf has no simple flag for setting Info dictionary fields;
# use it to inspect the trailer, which points at the Info dictionary
qpdf --show-object=trailer document.pdf

# Set with exiftool
exiftool \
  -Title="Annual Report 2026" \
  -Author="Jane Doe" \
  -Subject="Financial summary" \
  -Keywords="2026, annual, financial" \
  document.pdf

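For PDF/A compliance (long-term archival), the XMP packet is required, and any Info dictionary entries that are present should agree with their XMP equivalents (PDF/A-1 makes this an explicit requirement). veraPDF can verify the match; qpdf --check confirms structural validity but does not validate PDF/A conformance.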

Privacy: when stripping is the goal

The default for production batches should be preserve, but there are clear cases where stripping is the right action.

| Scenario | What to strip | Why |
| --- | --- | --- |
| Photos uploaded to a public website | GPS, camera serial number | Geolocation and equipment-tracking risk |
| Documents shared with adversaries (legal, journalism) | Author, modification history, comments | Reveals identity and editorial process |
| Real-estate marketing photos | GPS only | Reveals exact property location |
| Medical imaging shared for second opinion | Patient name, exam ID, hospital | HIPAA and equivalent regulations |
| Photos of children for public sharing | All metadata | Comprehensive privacy |

# Strip everything from a JPEG
exiftool -all= -overwrite_original photo.jpg

# Strip only GPS
exiftool -gps:all= -overwrite_original photo.jpg

# Strip metadata from a PDF
# (--remove-info and --remove-metadata take no value and need a recent qpdf release)
qpdf --linearize --object-streams=generate \
  --remove-info \
  --remove-metadata \
  input.pdf clean.pdf

Verification is critical because some tools claim to strip metadata but leave fragments. Always run exiftool against the output to confirm.
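
That confirmation step can be automated. The Python sketch below takes the JSON text produced by a command such as exiftool -j -G1 output.jpg and lists any tags that still look sensitive. The tag-name lists are illustrative assumptions, not a complete privacy policy, and the function name is hypothetical. Matching on the tag name rather than the group is deliberate, since GPS values can also be stored in XMP.

```python
import json

# Illustrative lists only; extend them to match your own stripping policy.
SENSITIVE_PREFIXES = ("GPS",)
SENSITIVE_NAMES = {
    "SerialNumber", "InternalSerialNumber", "LensSerialNumber",
    "Author", "Creator", "By-line",
}

def leftover_sensitive(exiftool_json: str) -> list[str]:
    """Return the sensitive-looking tag keys from the first file in a JSON dump."""
    tags = json.loads(exiftool_json)[0]  # exiftool -j emits one object per file
    hits = []
    for key in tags:
        name = key.split(":")[-1]  # drop the group prefix added by -G1
        if name.startswith(SENSITIVE_PREFIXES) or name in SENSITIVE_NAMES:
            hits.append(key)
    return sorted(hits)

dump = json.dumps([{
    "SourceFile": "clean.jpg",
    "IFD0:Make": "Apple",
    "XMP-exif:GPSLatitude": "21.16 N",
    "ExifIFD:SerialNumber": "X1",
}])
remaining = leftover_sensitive(dump)
```

A batch runner can treat a non-empty result as a hard failure, so that a file with residual location data never reaches the publish step.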

Cross-domain consistency in metadata pipelines

A team producing content across multiple sites needs a metadata strategy that survives format conversion: the same field names, the same controlled vocabulary for keywords, and the same copyright notice format everywhere. This is true whether the content is study materials, email-and-writing templates, or company-formation guides.

The pattern that scales is: single source of truth for metadata in a database, batch-applied to deliverables at conversion time, never edited in the deliverable itself. This avoids drift between platforms.

"The road to hell is paved with broken hyperlinks." Tim Berners-Lee, paraphrased

The same applies to metadata. A pipeline that breaks copyright notices on every conversion gradually loses the ability to prove who made what.

Verification: the QC step nobody runs

Every batch should verify metadata on the output, not trust the tool's defaults. The simplest check:

# After conversion, dump metadata for spot review
exiftool -j -EXIF:all -IPTC:all -XMP:all output.jpg > output.json

# Compare to a reference
diff <(exiftool -j input.jpg) <(exiftool -j output.jpg) | less

For batches, write the comparison into the runner so any drop in field count triggers a warning.
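
One possible shape for that check, as a Python sketch: compare the tag dictionaries of input and output (as parsed from exiftool -j -G1) and report every field that disappeared. The function name and the set of ignored groups are assumptions; the ignored groups cover values such as file size and timestamps that legitimately change on every conversion.

```python
# Groups (as reported with -G1) that describe the file itself, not its content.
IGNORED_GROUPS = {"SourceFile", "System", "File", "ExifTool", "Composite"}

def dropped_fields(before: dict, after: dict, ignore=IGNORED_GROUPS) -> list[str]:
    """Return tag keys present in the input but missing from the output."""
    def relevant(key: str) -> bool:
        return key.split(":")[0] not in ignore
    return sorted(k for k in before if relevant(k) and k not in after)

before = {
    "SourceFile": "in.jpg",
    "System:FileSize": "2 MB",
    "IFD0:Orientation": "Rotate 90 CW",
    "IPTC:By-line": "Jane Doe",
    "XMP-dc:Rights": "(c) 2026",
}
after = {
    "SourceFile": "out.png",
    "System:FileSize": "5 MB",
    "XMP-dc:Rights": "(c) 2026",
}
lost = dropped_fields(before, after)
if lost:
    print(f"WARNING: {len(lost)} field(s) dropped: {', '.join(lost)}")
```

Reporting the names of the dropped fields, and not only the count, lets the operator tell an expected loss (a format that cannot carry IPTC) from a misconfigured converter.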

Metadata in video containers

Video containers (MP4, MKV, MOV, MXF) store metadata as a tree of typed atoms or elements. The key practical operations are probing what is there, carrying it through a remux, and adding specific fields:

# Probe video metadata
ffprobe -v error -show_format -show_streams input.mp4

# Carry all metadata through a remux
ffmpeg -i input.mp4 -c copy \
  -map_metadata 0 -map_chapters 0 \
  -movflags use_metadata_tags \
  output.mp4

# Add specific metadata
ffmpeg -i input.mp4 -c copy \
  -metadata title="Episode 47" \
  -metadata description="In which we discuss compression" \
  -metadata creation_time="2026-04-30T12:00:00Z" \
  -metadata:s:v:0 language=eng \
  -metadata:s:a:0 language=eng \
  -metadata:s:s:0 language=fra \
  output.mp4

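The -movflags use_metadata_tags option matters whenever you write custom fields. Without it, ffmpeg's MP4/MOV muxer writes only the keys it recognizes and silently drops the rest. Players in Apple ecosystems likewise read only a known set of atoms, so confirm on the target device that the fields you care about are actually visible.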

For broadcast workflows using MXF, metadata fields are far more structured (SMPTE descriptive metadata schemes) and require dedicated MXF tooling, such as Avid's MXF tools. Generic ffmpeg can read most fields but does not understand the full SMPTE schema.

Tooling table

| Need | Tool | One-line example |
| --- | --- | --- |
| Read any metadata | exiftool | exiftool file |
| Write image metadata | exiftool | exiftool -Title=X file.jpg |
| Write video metadata | ffmpeg | ffmpeg -i in -metadata title=X -c copy out |
| Write PDF metadata | qpdf or exiftool | exiftool -Title=X file.pdf |
| Strip all metadata | exiftool | exiftool -all= file |
| Verify PDF/A metadata compliance | veraPDF | verapdf --flavour 2b file |
| Diff metadata before/after | exiftool with diff | See verification section |

Common mistakes that survive years of practice

Three errors recur. First, batches that strip metadata "to save space" save kilobytes and lose information that took human labor to produce. Second, batches that assume metadata transfers across formats produce silent losses (especially JPEG to PNG, which loses EXIF in most tools). Third, pipelines that never verify output metadata accumulate drift that nobody notices until a customer complains.

A pipeline that avoids these three errors ships files that carry their full provenance to the people who need it.

References

  1. CIPA DC-008-2023, "Exchangeable image file format for digital still cameras: EXIF Version 3.0." Camera & Imaging Products Association, 2023.
  2. ISO 16684-1:2019, "Graphic technology - Extensible metadata platform (XMP) - Part 1: Data model, serialization and core properties." International Organization for Standardization.
  3. ISO 32000-2:2020, "Document management - Portable document format - Part 2: PDF 2.0." International Organization for Standardization.
  4. IPTC, "IPTC Photo Metadata Standard 2024.1." International Press Telecommunications Council, 2024. Available: https://iptc.org/std/photometadata/specification/
  5. ID3.org, "ID3 tag version 2.4.0 - Native Frames." Available: https://id3.org/id3v2.4.0-frames
  6. Xiph.Org Foundation, "Vorbis comment specification." Available: https://www.xiph.org/vorbis/doc/v-comment.html
  7. ISO 19005-3:2012, "Document management - Electronic document file format for long-term preservation - Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)."
  8. Harvey, P., "ExifTool documentation." Available: https://exiftool.org/