Wasting disk space is a crime.
Why I started doing this
A few years ago, I wrote about playing games on a MacBook. Since then, I’ve downloaded dozens of games onto my Mac. The problem is obvious: the machine only has a 256 GiB SSD. Once enough games pile up, storage becomes painfully tight. I don’t really want to delete them all, so the next best option is making each game take up as little space as possible.
To keep things running smoothly on macOS, most of the games I play are built with engines that travel well across platforms: Ren’Py, RPG Maker, Godot, and similar tools. Games made with engines like RPG Maker also tend to share another trait: developers often rely on the engine’s built-in assets, and sometimes throw in lots of third-party stock resources too. In practice, that means a collection of dozens of games is almost guaranteed to contain a huge amount of duplicated art, audio, and other data. If those duplicates can be collapsed, the space savings should be pretty substantial.
The basic deduplication approach
For file deduplication, jdupes turned out to be a very good fit. It supports multiple strategies, including hard links and copy-on-write features provided by some file systems. I ended up choosing hard links for everything.
Copy-on-write sounds nice in theory, but cloned files keep separate inodes, so jdupes still treats already-deduplicated files as separate files the next time it runs and repeats the work. It also makes the final accounting less clear, since tools like du generally cannot tell that clones share storage. In my case, the files being deduplicated are game assets that are never going to be edited after the fact, so hard links are the simplest and cleanest option.
The command is almost embarrassingly simple:
jdupes -r -L Game
After downloading a new game, I can just run that again inside the game library and let it merge duplicate assets with existing ones.
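To make the idea concrete, here is a minimal Python sketch of hard-link deduplication. It is not how jdupes is implemented (jdupes is far more careful and much faster); the function names are made up for illustration, and the sketch assumes everything lives on one volume, since hard links cannot cross file systems.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root: Path) -> int:
    """Replace duplicate files under root with hard links; return bytes saved."""
    first_seen: dict[tuple[int, str], Path] = {}
    saved = 0
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    for path in files:
        size = path.stat().st_size
        if size == 0:
            continue  # empty files are not worth linking
        key = (size, file_digest(path))
        original = first_seen.setdefault(key, path)
        if original is path or original.samefile(path):
            continue  # first copy seen, or already a link to it
        tmp = path.with_name(path.name + ".dedup-tmp")
        os.link(original, tmp)  # create the new link next to the duplicate
        os.replace(tmp, path)   # then atomically swap it into place
        saved += size
    return saved
```

Calling hardlink_duplicates(Path("Game")) a second time should report 0 bytes saved, because files that already share an inode are skipped. That is the property that made hard links attractive to me in the first place.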
That said, plenty of games can’t be deduplicated directly this way. Some bundle all their resources into a single archive; others apply light encryption. Even if two games contain the exact same image or audio file, the packaged result may not be byte-for-byte identical. So before deduplication works properly, I need the assets extracted into their original standalone form.
How that works depends on the engine.
Engine-specific handling
RPG Maker MV / MZ
Decrypting RPG Maker MV/MZ games is straightforward. One well-known option is RPG-Maker-MV-Decrypter, which runs in the browser. That works, but a single game can contain a huge number of asset files, and uploading all of them into a browser workflow is just annoying.
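For context on why it is so straightforward: as far as I understand the scheme, an encrypted asset is just the original file with a 16-byte fake header in front and its first 16 bytes XORed with the key stored as encryptionKey in System.json. The sketch below reflects that understanding rather than any official specification, and the function name is mine.

```python
HEADER_LEN = 16  # size of the fake header, and of the XOR-masked region

# Fake header as I understand it: signature "RPGMV", version, padding.
FAKE_HEADER = bytes.fromhex("5250474d56000000" "000301" "0000000000")


def decrypt_mv_asset(data: bytes, key_hex: str) -> bytes:
    """Strip the fake header and undo the XOR on the first 16 bytes.

    key_hex is assumed to be the 32-character hex string found under
    "encryptionKey" in System.json.
    """
    key = bytes.fromhex(key_hex)
    if len(key) < HEADER_LEN:
        raise ValueError("key must be at least 16 bytes")
    if not data.startswith(b"RPGMV"):
        raise ValueError("missing RPGMV signature; file may not be encrypted")
    body = bytearray(data[HEADER_LEN:])
    for i in range(min(HEADER_LEN, len(body))):
        body[i] ^= key[i]
    return bytes(body)
```

Only 16 bytes of each file are actually masked, which is why I call this light encryption. It is still enough to stop byte-for-byte deduplication, because every game uses its own key.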
I later found another tool written in C#: RPG Maker Decrypter. As a command-line utility, it is much more convenient, and it can extract only the resource files, which also lets me strip out the browser runtime files bundled with the game.
There was one issue in its code, though. It seemed to treat file selection as case-sensitive, so folders with uppercase letters in their names could get skipped. That didn’t fit what I needed. I don’t write C#, so I had AI patch it for me and submitted a pull request. The maintainer didn’t seem enthusiastic about AI-written code and apparently had no intention of merging it 😅. Fine by me; I only needed it for personal use.
Using the tool is simple:
RPGMakerDecrypter-cli [input] -p -o [output]
Once the files are extracted, I just edit data/System.json and set hasEncryptedImages and hasEncryptedAudio to false. After that, on macOS I can launch the game in a browser by running python3 -m http.server from the game directory.
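Editing System.json by hand gets tedious across many games, so the step can be scripted. This is a small sketch that assumes the file sits at data/System.json relative to the game directory, as described above; the function name is hypothetical.

```python
import json
from pathlib import Path


def disable_encryption_flags(game_dir: Path) -> bool:
    """Set both encryption flags to false; return True if the file changed."""
    system_json = game_dir / "data" / "System.json"
    # utf-8-sig tolerates a leading byte-order mark if one is present.
    system = json.loads(system_json.read_text(encoding="utf-8-sig"))
    changed = False
    for flag in ("hasEncryptedImages", "hasEncryptedAudio"):
        if system.get(flag):
            system[flag] = False
            changed = True
    if changed:
        system_json.write_text(
            json.dumps(system, ensure_ascii=False, separators=(",", ":")),
            encoding="utf-8",
        )
    return changed
```

The function only rewrites the file when a flag was actually set, so running it again on an already-patched game leaves the file untouched.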
While doing this, I noticed another recurring problem: some games ship full-resolution original artwork directly inside the game files. Individual images can be several megabytes each, even though RPG Maker won’t render them anywhere near that resolution. So they waste a large amount of disk space for no practical benefit. Because those images were encrypted in the original package, they are not even particularly useful as collectibles for most people.
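A quick way to spot those oversized images without installing anything is to read the width and height straight from each PNG header. The sketch below uses only the standard library; the 2000-pixel threshold is an arbitrary value I picked for illustration, not something RPG Maker defines.

```python
import struct
from pathlib import Path
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> Optional[tuple[int, int]]:
    """Return (width, height) from the first 24 bytes of a PNG, else None."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def oversized_pngs(root: Path, max_side: int = 2000) -> list[tuple[Path, int, int, int]]:
    """List (path, width, height, bytes) for PNGs with a side over max_side."""
    hits = []
    for path in root.rglob("*.png"):
        with path.open("rb") as f:
            dims = png_dimensions(f.read(24))
        if dims and max(dims) > max_side:
            hits.append((path, dims[0], dims[1], path.stat().st_size))
    # Largest files first, since those are the best compression candidates.
    return sorted(hits, key=lambda hit: hit[3], reverse=True)
```

This only works on decrypted files, because the encrypted versions no longer start with a valid PNG signature.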
So after decryption, I decided to batch-compress them with lossy encoding. I had AI write a simple compression script for that:
#!/usr/bin/env python3
"""
Image compression script (multiprocess version).

Re-encodes every image in the pictures.orig folder as lossy WebP at the
original resolution, with no difference visible to the naked eye, and
writes the results to the pictures folder.

Usage:
    python3 compress_images.py

Compression strategy:
- Keep the original resolution
- Encode as lossy, high-quality WebP
- Use quality 85 (80 for images with an alpha channel) to shrink files
  substantially while preserving visual quality
- Keep the original file names and extensions
- Process files in parallel with multiple processes
- Copy the original file if compression fails
"""
import shutil
from multiprocessing import Pool, cpu_count
from pathlib import Path

from PIL import Image

# Paths
SOURCE_DIR = "pictures.orig"
OUTPUT_DIR = "pictures"

# WebP quality (0-100; higher means better quality and larger files).
# 85 is a good balance where the difference is almost invisible.
WEBP_QUALITY = 85
# A separate quality setting for images with an alpha channel
WEBP_QUALITY_WITH_ALPHA = 80
# Number of worker processes; defaults to the CPU core count
NUM_WORKERS = cpu_count()


def compress_single_image(task: tuple[str, str]) -> tuple[str, bool, int, int]:
    """
    Compress a single image file (runs in a worker process).

    Args:
        task: (source file path, output file path)
    Returns:
        (file name, success flag, original size, compressed size)
    """
    source_path = Path(task[0])
    output_path = Path(task[1])
    original_size = source_path.stat().st_size
    try:
        with Image.open(source_path) as img:
            # Check for an alpha channel
            has_alpha = img.mode in ('RGBA', 'LA', 'PA') or (
                img.mode == 'P' and 'transparency' in img.info
            )
            # Pick the quality to use
            quality = WEBP_QUALITY_WITH_ALPHA if has_alpha else WEBP_QUALITY
            # Save as WebP but keep the original file extension, so a
            # .png file ends up containing WebP data
            img.save(
                str(output_path),
                format='WEBP',
                quality=quality,
                method=6,  # effort 0-6; 6 is slowest but compresses best
            )
        compressed_size = output_path.stat().st_size
        return (source_path.name, True, original_size, compressed_size)
    except Exception:
        # On failure, copy the original file to the output directory
        try:
            shutil.copy2(source_path, output_path)
            compressed_size = output_path.stat().st_size
            return (source_path.name, False, original_size, compressed_size)
        except Exception:
            return (source_path.name, False, original_size, 0)


def main():
    source_dir = Path(SOURCE_DIR)
    output_dir = Path(OUTPUT_DIR)

    # Make sure the source directory exists
    if not source_dir.exists():
        print(f"Error: source directory '{SOURCE_DIR}' does not exist")
        return

    # Create the output directory
    output_dir.mkdir(exist_ok=True)

    # Collect all image files (several formats supported)
    image_extensions = ('*.png', '*.jpg', '*.jpeg', '*.bmp', '*.gif', '*.tiff', '*.webp')
    image_files = []
    for ext in image_extensions:
        image_files.extend(source_dir.glob(ext))
    image_files = sorted(set(image_files))  # de-duplicate and sort

    if not image_files:
        print(f"No image files found in '{SOURCE_DIR}'")
        return

    # Build the task list, keeping the original name and extension
    tasks = [(str(f), str(output_dir / f.name)) for f in image_files]

    print(f"Found {len(tasks)} image files")
    print(f"Source directory: {SOURCE_DIR}")
    print(f"Output directory: {OUTPUT_DIR}")
    print(f"WebP quality: {WEBP_QUALITY}")
    print(f"Worker processes: {NUM_WORKERS}")
    print("-" * 70)

    # Process the images with a process pool
    success_count = 0
    fail_count = 0
    total_original = 0
    total_compressed = 0
    with Pool(processes=NUM_WORKERS) as pool:
        results = pool.imap(compress_single_image, tasks)
        for i, (filename, success, original_size, compressed_size) in enumerate(results, 1):
            total_original += original_size
            total_compressed += compressed_size
            if success:
                success_count += 1
                marker = "✓"
                reduction = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0
                status_msg = f"{reduction:+.1f}%"
            else:
                fail_count += 1
                marker = "✗"
                status_msg = "copied original"
            status = f"[{i}/{len(tasks)}] {filename}"
            print(f"{marker} {status:50} {original_size/1024:>8.1f}KB -> {compressed_size/1024:>8.1f}KB ({status_msg})")

    # Print the summary
    print("-" * 70)
    total_reduction = (1 - total_compressed / total_original) * 100 if total_original > 0 else 0
    print("Compression finished!")
    print(f"  Compressed: {success_count}/{len(tasks)} files")
    if fail_count > 0:
        print(f"  Failed (original copied): {fail_count}/{len(tasks)} files")
    print(f"  Original size: {total_original / 1024 / 1024:.2f} MB ({total_original / 1024:.1f} KB)")
    print(f"  Compressed size: {total_compressed / 1024 / 1024:.2f} MB ({total_compressed / 1024:.1f} KB)")
    print(f"  Overall reduction: {total_reduction:.1f}%")
    print(f"  Space saved: {(total_original - total_compressed) / 1024 / 1024:.2f} MB")


if __name__ == "__main__":
    main()
After that pass, I uploaded the original images to an EH gallery and kept only the compressed versions locally. The folder shrank from a bit over 2 GiB to a bit over 300 MiB, which is a very noticeable improvement.
Some games also used Ogg FLAC background music. That format takes up a lot of storage, and Safari could not decode it at all when I was playing in the browser, though Chrome probably can. I do care about HiFi when I’m listening to music, but for game BGM that feels unnecessary. So for files like that, I convert them into normal lossy Ogg with:
ffmpeg -i input.flac.ogg -c:a vorbis -strict -2 -q:a 10 output.ogg
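Since both formats use the .ogg extension, the first problem is finding which files are actually FLAC. The sketch below peeks at the first Ogg page: a FLAC stream's first packet starts with the byte 0x7F followed by "FLAC", while a Vorbis stream starts with 0x01 followed by "vorbis". The function names are my own, and ffmpeg_command simply rebuilds the command shown above as an argument list for subprocess.run.

```python
from pathlib import Path
from typing import Iterator


def ogg_codec(first_bytes: bytes) -> str:
    """Guess the codec of an Ogg stream from the start of its first page."""
    if len(first_bytes) < 27 or first_bytes[:4] != b"OggS":
        return "not-ogg"
    segment_count = first_bytes[26]  # length of the segment table
    payload = first_bytes[27 + segment_count:]
    if payload.startswith(b"\x7fFLAC"):
        return "flac"
    if payload.startswith(b"\x01vorbis"):
        return "vorbis"
    if payload.startswith(b"OpusHead"):
        return "opus"
    return "unknown"


def find_ogg_flac(root: Path) -> Iterator[Path]:
    """Yield every .ogg file under root whose stream is really FLAC."""
    for path in sorted(root.rglob("*.ogg")):
        with path.open("rb") as f:
            if ogg_codec(f.read(512)) == "flac":
                yield path


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build the same ffmpeg invocation as above as an argument list."""
    # dst must differ from src: ffmpeg cannot convert a file in place.
    return ["ffmpeg", "-i", str(src), "-c:a", "vorbis", "-strict", "-2",
            "-q:a", "10", str(dst)]
```

One way to use it is to convert each hit into a temporary file and then move the result over the original name, so the game's references to the .ogg file keep working.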
RPG Maker XP / VX / VA
Games made with RPG Maker XP/VX/VA are built around RGSS, which is based on Ruby. Because the game logic is plain script, these games are portable in principle, but the official runtime was never made truly cross-platform, so they do not run directly on macOS.
The good news is that mkxp-z can run RPG Maker XP/VX/VA games across platforms, so I’ve collected some of those as well.
Their assets are usually protected only by very light obfuscation and are often packed into a single RGSSAD file. Extracting those is easy enough with the same RPG Maker Decrypter tool.
These games also have another quirk: some of them require RTP to run. RTP is basically the official shared asset package that ships with RPG Maker. Presumably it was originally intended as a storage-saving mechanism, which makes it a little strange that later MV/MZ releases moved away from that approach.
mkxp-z can load RTP through configuration, but since I’m already using hard links for deduplication, I don’t bother maintaining RTP separately. I just merge the RTP assets directly into each game and let jdupes handle the duplicates. That has one extra advantage: some XP/VX/VA assets may also overlap with MV/MZ assets, so hard-link deduplication can eliminate those duplicates too.
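The merge itself can be sketched in a few lines of Python. The one rule I care about is that a game's own files always win over RTP files. Because these games usually refer to assets without a file extension, the sketch assumes that a game file with the same name but a different extension should also block the RTP copy. That rule and the function name are my own choices, not anything mkxp-z prescribes.

```python
import shutil
from pathlib import Path


def asset_key(relative: Path) -> tuple[str, str]:
    """Identify an asset by folder and extensionless name, ignoring case."""
    return (relative.parent.as_posix().lower(), relative.stem.lower())


def merge_rtp(rtp_dir: Path, game_dir: Path) -> int:
    """Copy RTP files the game lacks into game_dir; return files copied."""
    # Snapshot the game's assets first so its own files always take priority.
    existing = {
        asset_key(p.relative_to(game_dir))
        for p in game_dir.rglob("*") if p.is_file()
    }
    copied = 0
    for src in sorted(p for p in rtp_dir.rglob("*") if p.is_file()):
        relative = src.relative_to(rtp_dir)
        dst = game_dir / relative
        if asset_key(relative) in existing or dst.exists():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied += 1
    return copied
```

The copies are plain files at this point. Running jdupes over the library afterwards turns the identical RTP copies in every game into hard links to a single set of data.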
Ren’Py
Ren’Py is a bit different. The engine itself does not come with a major pool of shared public assets, so duplicate files are less of a general problem.
Still, some of the Ren’Py games I play are part of long-running series, and those series often reuse a large amount of art and audio between installments. Developers usually don’t bother to share those resources in a space-efficient way for players, and Ren’Py games also package data into archive files, so unpacking is still necessary before deduplication can do anything useful.
Ren’Py’s .rpa archives are easy to extract. There is an existing tool called unrpa, and you can install it directly with pip (pip install unrpa).
I do wonder why these engines love bundling resources into single package files when unpacking them is often this easy. Maybe it helps performance.
That said, because Ren’Py doesn’t have much in the way of common engine assets, unpacking is not always worth it. If the game is not part of a series, deduplication opportunities may be limited, and unpacking can actually make storage usage worse. A large archive can take less real disk space than a pile of small files, because the file system allocates storage in fixed-size blocks: every file is rounded up to a whole number of blocks, so the wasted tail of each small file starts to add up.
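A little arithmetic shows how the rounding adds up. The sketch below assumes a 4 KiB allocation block, which is a common default but worth checking on your own volume, and the helper names are made up for illustration.

```python
import os


def allocated_size(size: int, block_size: int = 4096) -> int:
    """Round a file size up to a whole number of allocation blocks."""
    return -(-size // block_size) * block_size  # ceiling division


def slack_bytes(sizes, block_size: int = 4096) -> int:
    """Total bytes lost to rounding for a collection of file sizes."""
    return sum(allocated_size(s, block_size) - s for s in sizes)


def on_disk_size(path) -> int:
    """Space a real file occupies, assuming st_blocks counts 512-byte units.

    st_blocks is available on macOS and Linux but not on Windows.
    """
    return os.stat(path).st_blocks * 512
```

With these assumptions, 10,000 loose files of 1,000 bytes each waste slack_bytes([1000] * 10000) = 30,960,000 bytes, while the same data in a single 10,000,000-byte archive wastes only slack_bytes([10_000_000]) = 2,432 bytes. Unpacking pays off only when deduplication wins back more than that overhead.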
Checking whether it actually saved space
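After all of that, the easiest way to verify the result is to compare du -sh and du -shl. By default, du counts a file with multiple hard links only once. On macOS, the -l flag makes it count that file once per link, so the second number shows roughly what the library would occupy without deduplication.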
Here’s what I got after slimming down my game library:
~ % du -sh Game
33G Game
~ % du -shl Game
47G Game
That is a difference of 14 GiB, or roughly 30% of the size without hard links, which is pretty respectable.
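The same comparison can be reproduced in Python by walking the tree and tracking which inodes have already been counted. This is only a sketch: it adds up logical file sizes rather than allocated blocks, so its totals will not match du exactly, and the function name is my own.

```python
import os
from pathlib import Path


def disk_usage(root: Path) -> tuple[int, int]:
    """Return (bytes counting each inode once, bytes counting every link)."""
    seen: set[tuple[int, int]] = set()
    unique = 0
    per_link = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            per_link += st.st_size
            inode = (st.st_dev, st.st_ino)
            if inode not in seen:  # first time this file's data is seen
                seen.add(inode)
                unique += st.st_size
    return unique, per_link
```

The first number corresponds to du -sh and the second to du -shl. The gap between them is the space that hard links are saving.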
And with SSD prices where they are now, anything that reduces the need for more capacity feels worthwhile. Then again, all of this content can be downloaded again anyway, so the most effective storage-saving strategy might still be the boring one: just delete the games after finishing them 😂.