Converting an archive between formats is one of the more common operations in any data pipeline that touches files from multiple sources. A vendor ships a 7z file to a Windows 10 environment that does not support it. A Linux backup pipeline ingests a RAR archive and needs to repack it as TAR.ZST for storage. A user receives a tarball and wants ZIP for sharing with non-technical recipients. Each scenario boils down to the same operation: extract, then repack, with optional changes to compression algorithm, encryption, or directory structure.
The work is straightforward when done correctly and full of subtle traps when done casually. File modification timestamps drift across formats with different time precision. Symbolic links and hardlinks are preserved by some formats and silently flattened by others. Filename encoding differs between ZIP variants, with mojibake as the result. Permissions and ownership are preserved by TAR but not by ZIP. Empty directories vanish in some pipelines. This guide walks through the common conversions in detail, names the gotchas, and provides command-line recipes that produce predictable results.
What Conversion Actually Means
There is no single command in the archive ecosystem that converts a ZIP directly to a 7z without an intermediate extraction. Conversion is always a two-step process: extract to a working directory, then repack into the target format. Tools like Bandizip, PeaZip, and 7-Zip File Manager hide this from the user with a "convert" command, but underneath they extract to a temporary directory and repack.
# Generic conversion pattern
mkdir staging
7z x source.7z -ostaging    # 7z requires -o to be attached to the directory, with no space
cd staging
zip -r ../target.zip .
cd ..
rm -rf staging
This pattern is conceptually simple, but it has implications. The intermediate extraction requires disk space equal to the uncompressed size of the archive. For a 50 GB archive that expands to 200 GB, the conversion needs 200 GB of free space plus the original archive plus the target archive. On systems with limited storage, this can fail in ways that are not obvious until the disk fills.
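The space requirement can be checked before extraction starts. The following is a minimal sketch, assuming bash and a POSIX df; the helper name has_space_for is hypothetical, and the uncompressed size has to be supplied by the caller (for example, read from the totals line of unzip -l or 7z l).

```shell
# has_space_for DIR BYTES
# Succeeds if the filesystem holding DIR has at least BYTES available.
# (Hypothetical helper; the byte count is supplied by the caller.)
has_space_for() {
    local dir="$1" needed_bytes="$2"
    local avail_kb needed_kb
    # df -P prints one header line and one data line; column 4 is available 1K blocks.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    # Round the requirement up to whole kilobytes.
    needed_kb=$(( (needed_bytes + 1023) / 1024 ))
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

A pipeline could call has_space_for staging "$uncompressed_bytes" and abort before extracting when it fails, instead of discovering the shortfall when the disk fills.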
The streaming alternative is available for some format pairs. TAR archives can be transcoded between compression algorithms without expanding to disk because TAR is a streaming format.
# Transcode tar.gz to tar.zst without intermediate disk
gunzip -c source.tar.gz | zstd -19 -o target.tar.zst
# Transcode tar.bz2 to tar.xz without intermediate disk
bzcat source.tar.bz2 | xz -9 > target.tar.xz
For ZIP to 7z and similar non-streaming pairs, intermediate extraction is required. Plan for it.
"Every conversion is also a re-encoding. The bytes change, the metadata changes, and if you do not know exactly what was metadata you will lose it." Eric S. Raymond, The Art of Unix Programming
ZIP to 7z
Converting ZIP to 7z is one of the most common conversions because it typically reduces the file size by twenty to forty percent on text-heavy content. The 7z format uses LZMA2 by default, whose dictionary can span many megabytes, while ZIP's DEFLATE is limited to a 32 KB window.
# Linux/macOS
mkdir extract-dir
unzip -d extract-dir source.zip
# Run inside extract-dir so the shell glob cannot skip top-level dotfiles
( cd extract-dir && 7z a -t7z -mx=9 -m0=lzma2 -mfb=64 -md=64m -ms=on ../target.7z . )
rm -rf extract-dir
The 7z flags deserve explanation. -t7z sets the archive type. -mx=9 selects maximum compression. -m0=lzma2 chooses the LZMA2 algorithm. -mfb=64 sets the number of fast bytes; larger values usually give a slightly better ratio at the cost of speed. -md=64m sets the dictionary size to 64 megabytes, which helps for archives with repetition across files. -ms=on enables solid mode, which compresses all files as a single stream and improves the ratio significantly for collections of small similar files.
The trade-off with solid mode is that random access becomes expensive: extracting one file from the middle requires decompressing everything before it. For archives that will be read whole, solid mode is correct. For archives that will be browsed by the recipient, it is slower than non-solid.
ZIP to TAR.GZ
This conversion is common when migrating data from a Windows environment to a Linux server pipeline. The semantic differences matter: ZIP does not preserve POSIX permissions or symbolic links reliably, while TAR does.
mkdir extract-dir
unzip -d extract-dir source.zip
tar czf target.tar.gz -C extract-dir .
rm -rf extract-dir
The -C extract-dir flag changes directory before adding files, which keeps the archive paths clean. The trailing dot adds the contents rather than the directory itself.
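The effect of -C and the trailing dot is easiest to see by listing two archives of the same tree. This is an illustrative sketch only; the demo_dir tree and both archive names are made up for the comparison.

```shell
# Build a tiny tree to archive (all names here are illustrative).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/extract-dir/docs"
echo "hello" > "$demo_dir/extract-dir/docs/readme.txt"

# With -C and a trailing dot: paths are relative to the directory contents.
tar -czf "$demo_dir/clean.tar.gz" -C "$demo_dir/extract-dir" .
# Naming the directory itself: every path carries the extract-dir/ prefix.
tar -czf "$demo_dir/nested.tar.gz" -C "$demo_dir" extract-dir

# List both archives to compare the stored paths.
clean_paths="$(tar -tzf "$demo_dir/clean.tar.gz")"
nested_paths="$(tar -tzf "$demo_dir/nested.tar.gz")"
printf 'clean:\n%s\n\nnested:\n%s\n' "$clean_paths" "$nested_paths"
```

The first listing starts every entry at ./, so the archive unpacks directly into whatever directory the recipient chooses; the second recreates an extract-dir folder on extraction.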
For a Linux-to-Linux transfer, prefer TAR.ZST over TAR.GZ:
tar -cf - -C extract-dir . | zstd -19 -o target.tar.zst
RAR to ZIP
RAR is a proprietary format: the compression algorithm is not open, and the unrar license forbids using its source code to build a RAR encoder. Most open-source environments include unrar for extraction but not a RAR encoder. Conversion away from RAR is therefore the typical direction.
mkdir extract-dir
unrar x source.rar extract-dir/
cd extract-dir
zip -r -9 ../target.zip .
cd ..
rm -rf extract-dir
Note that RAR's recovery records do not survive the conversion. If the source RAR included recovery records and the data is irreplaceable, generate PAR2 parity files for the new archive separately.
TAR.GZ to TAR.XZ or TAR.ZST
These same-bundle, different-compression conversions are the cleanest in the archive ecosystem because TAR is a streaming format. No intermediate file is required, and metadata is preserved exactly.
# tar.gz to tar.xz (smaller, slower)
gunzip -c source.tar.gz | xz -9 > target.tar.xz
# tar.gz to tar.zst (similar size, much faster)
gunzip -c source.tar.gz | zstd -19 -o target.tar.zst
# tar.bz2 to tar.zst (cleaner for new pipelines)
bzcat source.tar.bz2 | zstd -19 -o target.tar.zst
For backup pipelines retiring older formats, this is a one-line migration. The tar contents are unchanged byte-for-byte; only the compression wrapper differs.
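The byte-for-byte claim can be checked by hashing the decompressed stream on both sides. The sketch below assumes bash and sha256sum; payload_sha256 and same_payload are hypothetical helper names, and the decompressor commands are passed in by the caller.

```shell
# payload_sha256 DECOMPRESSOR FILE
# Prints the SHA-256 of FILE's decompressed stream (the raw tar payload).
payload_sha256() (
    set -o pipefail    # a failing decompressor must not yield a "valid" hash
    # $1 is deliberately unquoted so "gzip -dc" splits into command and flag.
    $1 "$2" | sha256sum | cut -d' ' -f1
)

# same_payload DECOMP_A FILE_A DECOMP_B FILE_B
# Returns 0 if both payloads are identical, 1 if they differ,
# and 2 if either file could not be decompressed.
same_payload() {
    local a b
    a="$(payload_sha256 "$1" "$2")" || return 2
    b="$(payload_sha256 "$3" "$4")" || return 2
    [ "$a" = "$b" ]
}
```

After a transcode, a call such as same_payload "gzip -dc" source.tar.gz "zstd -dc" target.tar.zst confirms that only the compression wrapper changed.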
| Source Format | Target Format | Pattern | Disk Overhead |
|---|---|---|---|
| ZIP | 7z | Extract then repack | 1x uncompressed size |
| ZIP | TAR.GZ | Extract then repack | 1x uncompressed size |
| RAR | ZIP | Extract then repack | 1x uncompressed size |
| RAR | TAR.ZST | Extract then repack | 1x uncompressed size |
| TAR.GZ | TAR.ZST | Stream transcode | None |
| TAR.GZ | TAR.XZ | Stream transcode | None |
| TAR.BZ2 | TAR.ZST | Stream transcode | None |
| 7z | ZIP | Extract then repack | 1x uncompressed size |
| 7z | TAR.ZST | Extract then repack | 1x uncompressed size |
| ISO | ZIP | Mount and copy then zip | 1x uncompressed size |
Preserving Filename Encoding
The single most common conversion bug is filename mojibake. ZIP archives created on Windows in non-UTF-8 locales (Japanese Shift_JIS, Korean EUC-KR, Cyrillic CP1251) often have filenames stored in the local code page rather than UTF-8. Linux unzip defaults to UTF-8 interpretation and produces broken filenames.
# Specify source encoding explicitly
unzip -O cp932 source-japanese.zip -d extract-dir
unzip -O cp949 source-korean.zip -d extract-dir
unzip -O cp1251 source-russian.zip -d extract-dir
The -O flag for source encoding comes from a patch to Info-ZIP unzip that several Linux distributions ship; it is not present in every build, so check the output of unzip -h first. Once extracted with the correct encoding, the filenames are correct UTF-8 on the filesystem and can be repacked into a new archive that uses UTF-8 metadata flags.
When creating new ZIP archives, use the UTF-8 flag explicitly:
zip -r -UN=UTF8 target.zip extract-dir
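The 7z and TAR formats handle UTF-8 filenames cleanly when both source and destination tools are recent. The original TAR header limits names to 100 bytes, and legacy implementations may truncate there; the ustar format (POSIX.1-1988) raises the limit to roughly 256 bytes through a separate prefix field, and the pax format (POSIX.1-2001) removes it.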
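Before repacking into TAR for a consumer that may use a legacy implementation, it can help to list the paths that exceed the 100-byte limit. A minimal sketch, assuming bash, find with -mindepth, and filenames without embedded newlines; paths_longer_than is a hypothetical helper name.

```shell
# paths_longer_than DIR LIMIT
# Prints every path under DIR (relative to DIR) whose length in bytes exceeds LIMIT.
paths_longer_than() {
    local dir="$1" limit="$2"
    # LC_ALL=C makes awk count bytes rather than multibyte characters.
    # substr($0, 3) strips the leading "./" that find prints.
    ( cd "$dir" && find . -mindepth 1 -print ) |
        LC_ALL=C awk -v limit="$limit" '{ p = substr($0, 3); if (length(p) > limit) print p }'
}
```

Running paths_longer_than extract-dir 100 prints nothing when every path fits in the original header.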
Preserving Permissions and Symbolic Links
POSIX permissions and symbolic links are first-class citizens in TAR and ignored or partially supported in ZIP. Round-tripping through ZIP loses these attributes silently.
# Round-trip preserving permissions on Linux
tar cf - source-dir | (cd /target && tar xf -)
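# Round-trip through ZIP puts ownership, permissions, and symlinks at risk
zip -r intermediate.zip source-dir # symlinks are followed and stored as file copies unless -y is given
unzip intermediate.zip -d target # ownership is lost; mode bits survive only if both tools handle Unix attributes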
For backup work, never round-trip through ZIP if the contents include executables, dotfiles with security-sensitive permissions, or symbolic links that point inside the archive. Use TAR end to end.
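Whether a tree contains anything a ZIP round trip could alter can be checked up front. This is a rough sketch assuming bash and a POSIX find; has_posix_metadata is a hypothetical helper, and it only looks for symbolic links and owner-executable files, not ownership or extended attributes.

```shell
# has_posix_metadata DIR
# Succeeds if DIR contains at least one symbolic link or owner-executable
# regular file, i.e. content that a ZIP round trip may alter.
has_posix_metadata() {
    local matches
    matches="$(find "$1" \( -type l -o \( -type f -perm -100 \) \) -print)"
    [ -n "$matches" ]
}
```

A conversion script could refuse to write ZIP, or switch to TAR, whenever has_posix_metadata source-dir succeeds.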
"Filesystems are full of metadata you only notice when it disappears. The permissions, the timestamps, the extended attributes, the access control lists. A copy that drops them is not a copy." Marshall Kirk McKusick, The Design and Implementation of the FreeBSD Operating System
Encrypted Archive Conversion
Encrypted archives need particular care during conversion because the conversion involves a window during which the data is in cleartext on disk. The mitigation is to do the conversion in a memory-backed filesystem (tmpfs) so the cleartext never touches persistent storage, or to use a streaming conversion that avoids intermediate writes.
# tmpfs-backed conversion to keep cleartext off disk
mkdir -m 700 /dev/shm/staging
7z x -p source.7z -o/dev/shm/staging # -p with no value prompts for the source password
cd /dev/shm/staging
zip -er "$OLDPWD/target.zip" . # -e prompts interactively for the new password; output lands outside staging
cd -
rm -rf /dev/shm/staging
For high-value archives, prefer end-to-end GPG encryption. Encrypt the archive with GPG once and it can be copied, stored, and transferred as a sealed blob without exposing cleartext; only a change to the inner archive format requires decrypting it. The cross-jurisdiction encryption requirements walked through at Corpy cover the regulatory implications when these files cross borders.
Bulk and Scripted Conversions
Pipelines that convert many archives at once benefit from a small wrapper script that handles cleanup, logging, and parallelism. The basic shape:
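#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob    # with no matching archives, the loop body never runs
for src in *.zip; do
    base="${src%.zip}"
    staging="$(mktemp -d)"
    # Remove the staging directory even if a later step fails
    trap "rm -rf '$staging'" EXIT
    unzip -q -d "$staging" "$src"
    tar -cf - -C "$staging" . | zstd -19 -o "${base}.tar.zst"
    rm -rf "$staging"
    trap - EXIT
done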
The set -euo pipefail line is essential: it makes the script stop at the first error, and pipefail extends that to a tar failure inside the tar-to-zstd pipeline, which would otherwise be masked by the exit status of zstd. Silent failures during bulk conversion are how data corruption sneaks into archives at scale.
For very large bulk conversions, consider parallelism with xargs -P or GNU parallel. A typical modern workstation can run six to twelve compression streams in parallel without thrashing, depending on dictionary sizes and memory.
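The xargs -P approach can be sketched as a thin dispatcher that hands each archive to a worker command. This assumes bash plus find and xargs implementations that support -print0, -0, and -P (GNU and BSD both do); convert_all is a hypothetical name, and the worker is whatever command performs a single conversion.

```shell
# convert_all DIR JOBS WORKER [ARGS...]
# Runs WORKER once per *.zip file in DIR, with up to JOBS workers in parallel.
# The archive path is appended as the last argument of each invocation.
convert_all() {
    local dir="$1" jobs="$2"
    shift 2
    # -print0 / -0 keep filenames with spaces or newlines intact.
    find "$dir" -maxdepth 1 -type f -name '*.zip' -print0 |
        xargs -0 -n 1 -P "$jobs" "$@"
}
```

With the loop body of the script above saved as a hypothetical convert-one.sh that takes one archive as its argument, convert_all . 6 ./convert-one.sh runs six conversions at a time. xargs exits non-zero if any worker fails, so the caller can still fail fast.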
Validating the Conversion
Every conversion should end with a verification step that confirms the new archive contains exactly the data the source did. The minimum check is to extract the new archive to a clean directory and compare against an extract of the source.
mkdir source-extract target-extract
7z x source.7z -osource-extract
unzip target.zip -d target-extract
diff -r source-extract target-extract
A successful diff with empty output means the file names and contents match. Permissions and timestamps may still differ depending on the format pair; diff -r does not compare them, so they need a separate check.
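Permission bits can be compared with a small manifest of each tree. This is a sketch assuming bash and GNU stat (the -c format option; BSD and macOS stat use -f with a different syntax); mode_manifest is a hypothetical helper name.

```shell
# mode_manifest DIR
# Prints "octal-mode path" for every entry under DIR, sorted by path.
# Assumes GNU stat; adjust the stat invocation on BSD/macOS.
mode_manifest() {
    ( cd "$1" && find . -mindepth 1 -exec stat -c '%a %n' {} + | sort -k2 )
}
```

In bash, diff <(mode_manifest source-extract) <(mode_manifest target-extract) then lists every entry whose permissions changed during the conversion.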
For archives where bit-exact preservation matters, compute checksums of every file in the extracted source and target and compare:
( cd source-extract && find . -type f -exec sha256sum {} \; ) | sort > source.sums
( cd target-extract && find . -type f -exec sha256sum {} \; ) | sort > target.sums
diff source.sums target.sums
The validation step is the line between a conversion that succeeded and a conversion that quietly dropped a directory.
Performance Numbers for Common Conversions
Rough conversion times on a modern workstation (24-core x86_64, NVMe SSD, 64 GB RAM) for a 10 GB source archive of mixed text and source code:
| Conversion | Wall Time | CPU Cores Used |
|---|---|---|
| ZIP to TAR.GZ | 8 minutes | 1 |
| ZIP to TAR.ZST (level 3) | 4 minutes | 4 (extract) + 1 (compress) |
| ZIP to TAR.ZST (level 19) | 18 minutes | 4 (extract) + 1 (compress) |
| ZIP to 7z (Ultra) | 24 minutes | 8 |
| RAR to TAR.ZST (level 3) | 5 minutes | 1 (unrar) + 4 (compress) |
| TAR.GZ to TAR.ZST (stream) | 3 minutes | 1 (gunzip) + 4 (zstd) |
| TAR.GZ to TAR.XZ (stream) | 12 minutes | 1 (gunzip) + 1 (xz) |
The note-keeping discipline at When Notes Fly maps naturally to backup discipline: a system that records what conversion was applied and when survives audits and restorations far better than tribal knowledge.
Common Pitfalls and How to Avoid Them
A short list of the failure modes that bite real conversion pipelines.
The empty directory drop. Some tools omit empty directories during conversion. ZIP and TAR both preserve them when invoked correctly, but a sloppy pipeline that uses find ... -type f will skip them. Use directory-aware archiving commands.
The timestamp drift. ZIP stores modification time at two-second precision (DOS time). TAR and 7z store at one-second or sub-second precision. Round-tripping through ZIP rounds timestamps to even seconds. For build pipelines that key on timestamps, this causes invalidations.
The hardlink flattening. ZIP does not preserve hardlinks; each link becomes a full copy. GNU tar preserves them by default, and only the --hard-dereference flag turns that off. For trees with significant hardlink density (some software distributions, rsync snapshots), converting to ZIP can balloon the archive size.
The Unicode normalization mismatch. macOS has traditionally stored filenames in decomposed form (NFD), while filenames created on Linux and Windows are usually in composed form (NFC). A round trip through a tool that does not preserve normalization can produce names that look identical but compare as different byte strings. Recent tar and zip handle this correctly; older versions can corrupt names.
The forgotten extended attributes. macOS uses extended attributes for resource forks, quarantine flags, and ACLs. Linux uses them for SELinux contexts and capabilities. Standard tar and zip do not preserve them by default; use tar --xattrs on Linux and the macOS native ditto command for cross-Mac transfers.
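Two of the pitfalls above, dropped empty directories and flattened hardlinks, can be detected with a pre-flight scan of the extracted tree. A minimal sketch, assuming a find that supports -empty (GNU and BSD do; it is not in POSIX); both function names are hypothetical.

```shell
# list_empty_dirs DIR
# Prints directories with no entries, which a file-only pipeline would drop.
list_empty_dirs() {
    find "$1" -type d -empty
}

# list_hardlinked_files DIR
# Prints regular files with more than one link, which ZIP would store
# as independent full copies.
list_hardlinked_files() {
    find "$1" -type f -links +1
}
```

Empty output from both functions means neither pitfall applies to the tree; otherwise the listings show which entries to verify after the conversion.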
Cross-Platform Conversion Notes
The same conversion command can produce subtly different output on different operating systems because of how each platform handles filesystem metadata. Three classes of differences are worth knowing.
Filesystem case sensitivity. macOS and Windows are case-insensitive by default; Linux is case-sensitive. An archive created on Linux that contains both Readme.txt and README.txt extracts on macOS or Windows with one file overwriting the other. The data loss is silent. Convert with awareness of the destination filesystem when source archives might contain case-only-distinct names.
Path separator handling. Windows uses backslash internally; ZIP and TAR specifications use forward slash. Most modern tools handle this automatically, but legacy ZIP archives from very old Windows tools can contain backslash separators that confuse Linux extractors. Use a recent unzip and the issue disappears.
Path length limits. Windows file paths historically capped at 260 characters. ZIP and TAR archives created on Linux with deep directory trees may produce paths that exceed the Windows limit on extraction. Windows 10 1607 and later support long paths when the application opts in, but many extractors do not. For archives intended for Windows, keep total path length under 250 characters where practical.
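Case-only name collisions can be found before the archive ever reaches a case-insensitive filesystem. The sketch below assumes bash and filenames without embedded newlines; case_collisions is a hypothetical helper, and it folds ASCII letters only, so collisions that depend on non-ASCII case rules are not detected.

```shell
# case_collisions DIR
# Prints each lower-cased path that occurs more than once under DIR,
# meaning two or more entries differ only by letter case.
case_collisions() {
    ( cd "$1" && find . -mindepth 1 -print ) |
        LC_ALL=C tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Any output from case_collisions extract-dir identifies names that would overwrite each other when extracted on default macOS or Windows filesystems.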
The procurement and audit playbooks at Pass4Sure on cross-platform certification deliveries cover similar concerns: the file that works perfectly on the producer's machine is not necessarily the file that opens reliably on every reviewer's machine, and the conversion is the moment to catch the difference.
For related guidance, see how to choose the best archive format for data storage and the role of file conversion in enhancing accessibility.
References
- PKWARE. APPNOTE.TXT - .ZIP File Format Specification, version 6.3.10. https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
- Pavlov, I. 7z Format Specification and 7-Zip documentation. https://www.7-zip.org/7z.html
- Free Software Foundation. GNU tar manual. https://www.gnu.org/software/tar/manual/
- Facebook (Meta). Zstandard Compression and the application/zstd Media Type. RFC 8878. https://www.rfc-editor.org/rfc/rfc8878
- Collin, L. The xz file format specification. https://tukaani.org/xz/xz-file-format.txt
- POSIX.1-2001 (IEEE Std 1003.1-2001) tar utility specification. https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html
- Info-ZIP. UnZip documentation. http://infozip.sourceforge.net/UnZip.html
- PAR2 specification (Multipar). https://github.com/Parchive/par2cmdline