Building the Ultimate Linux Home Server - Part 1: Intro, MergerFS, and SnapRAID

Turning an old desktop PC into a fully fledged Linux server using MergerFS and SnapRAID.

Guide — Jul 24, 2021

A couple of months ago, while wasting time browsing Reddit, I discovered r/homelab. After looking at the top posts for a couple of weeks, I decided that I wanted to join in on the action and build my own self-hosted Linux server.

I soon ran into a couple of issues. You see, as a university student without a full-time job, I really didn't have any disposable income to spend on expensive server racks, network equipment, and drive arrays.

I did, however, have my old desktop which was collecting dust under my table after I had replaced it with a new laptop, so I decided to experiment and try to use that as my server.

Some of the requirements I wanted this server to fulfil were the following:

The ability to combine and use drives of different sizes with a single mountpoint
The ability to add new drives whenever space is running low without much hassle
Protection against drive failure
Automated backup of data and config files
NAS array and backup solution for other devices
Automated movie, TV show, music, and anime downloading, sorting, and tagging
Personal streaming service for family and friends
Automated torrent download client
Personal cloud server accessible from anywhere
Automatic updating of all services
Host for any other projects/ideas/tools I wanted to use in the future
Easy deployment of applications and services without worrying about dependencies/conflicts

Hardware

The PC I used by no means contains server-grade hardware, but since I had originally built it as a gaming computer, it did the job well enough. Its specs are as follows:

CPU: Intel Core i7-6700K 4GHz Quad-Core
Cooler: Cooler Master Seidon 120V
Motherboard: ASRock Fatal1ty Z170 Gaming K4
RAM: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-2400 CL14
GPU: MSI Radeon R9 390 8GB
Case: Phanteks Enthoo Pro ATX
Power Supply: Cooler Master 750W 80+ Bronze Semi-modular ATX
Boot Drive: Samsung 860 EVO 250GB SSD

I could use the system as-is, but I decided to turn off the dedicated GPU in the BIOS since it didn't have any real purpose, and I didn't want to use as much power as a small city.

As far as storage goes, I installed the operating system on the SSD and used a bunch of good-ol' spinning rust hard drives for everything else since most files would remain unchanged.

Operating System

There is a lot of debate about which Linux distro is best to use for a server. Most people would recommend something like Ubuntu Server or Debian because of their stability. However, I decided to use Arch (btw) for a couple of really specific reasons:

I wanted a lightweight installation which I could tailor to my exact needs
I wanted to become more proficient at using the Linux command line and its tools, instead of having everything get set up automatically

Despite that, most of the things I will talk about will be the same or similar, independent of what distro you decide to use.

The OS installation itself is beyond the scope of this article since there are lots of better-written guides that explain the process (The Arch Wiki, for example).

MergerFS

There are definitely lots of different ways to go about setting up the storage system for your server. You could create a hardware RAID array, try out ZFS, or simply mount all your drives in your home directory and manage them manually.

However, none of these solutions is nearly as flexible, inexpensive, and easy to use as MergerFS.

Mergerfs is a union filesystem geared towards simplifying storage and management of files across numerous commodity storage devices. It is similar to mhddfs, unionfs, and aufs.

In layman's terms, MergerFS allows you to combine drives of different sizes and speeds into a single mountpoint, automatically managing how files are stored in the background.

1TB       +      2TB      =       3TB
/disk1           /disk2           /merged
|                |                |
+-- /dir1        +-- /dir1        +-- /dir1
|   |            |   |            |   |
|   +-- file1    |   +-- file2    |   +-- file1
|                |   +-- file3    |   +-- file2
+-- /dir2        |                |   +-- file3
|   |            +-- /dir3        |
|   +-- file4        |            +-- /dir2
|                     +-- file5   |   |
+-- file6                         |   +-- file4
                                  |
                                  +-- /dir3
                                  |   |
                                  |   +-- file5
                                  |
                                  +-- file6

Of course, there is a minor performance overhead when using this approach, since MergerFS runs as a FUSE filesystem on top of your existing drives. As far as home servers are concerned, though, the advantages outweigh the disadvantages.
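To get a feel for what the merged view in the diagram means, here is a minimal sketch that imitates it with plain shell tools. The merged_view function is a name I made up for illustration. It only lists the union of paths across the given directories, which is not how MergerFS works internally.

```shell
# Rough imitation of what a union mount presents: list every relative path
# that exists on at least one branch. Illustration only, not MergerFS itself.
merged_view() {
    for branch in "$@"; do
        # Paths relative to the branch root, without the leading "./".
        (cd "$branch" && find . -mindepth 1 | sed 's|^\./||')
    done | LC_ALL=C sort -u
}

# Rebuild the two disks from the diagram in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/disk1/dir1" "$demo/disk1/dir2" "$demo/disk2/dir1" "$demo/disk2/dir3"
touch "$demo/disk1/dir1/file1" "$demo/disk1/dir2/file4" "$demo/disk1/file6"
touch "$demo/disk2/dir1/file2" "$demo/disk2/dir1/file3" "$demo/disk2/dir3/file5"

# Print the combined listing, then clean up.
merged_view "$demo/disk1" "$demo/disk2"
rm -rf "$demo"
```

The listing should contain the same directories and files as /merged in the diagram, even though each file physically lives on only one of the two disks.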

My Setup

For my system I have 3 drives in total formatted as ext4, excluding the boot drive:

WD Blue 1TB 5400 RPM HDD - /mnt/disk1 - Part of storage pool
WD Red 4TB 5400 RPM HDD - /mnt/disk2 - Part of storage pool
WD Red 4TB 5400 RPM HDD - /mnt/parity1 - SnapRAID parity; this drive must be at least as large as your largest data drive

The two data drives (/mnt/disk1 and /mnt/disk2) are then pooled into a new directory called /mnt/storage, from which all my files can be accessed. /mnt/storage contains the following subdirectories:

/public: Public folder accessible by all users
/public/media: Media storage
/private: Folder with personal user data for all users that store files on the server.
/configs: Application config files

Installation

Getting MergerFS up and running is pretty simple since you just need to install it using your preferred package manager, edit your fstab file, and reboot.

yay -S mergerfs

After installing it, you need to find the disk IDs, since the device name a drive gets (for example /dev/sda) is not guaranteed to stay the same across reboots, even with the same hardware and configuration: ls /dev/disk/by-id

Running this command will return something similar to this:

ls /dev/disk/by-id
ata-Optiarc_DVD_RW_AD-5240S                          wwn-0x50014ee20d526ebe
ata-Samsung_SSD_860_EVO_250GB_S3YJNB0K512940F        wwn-0x50014ee20d526ebe-part1
ata-Samsung_SSD_860_EVO_250GB_S3YJNB0K512940F-part1  wwn-0x50014ee21329b6cf
ata-Samsung_SSD_860_EVO_250GB_S3YJNB0K512940F-part2  wwn-0x50014ee21329b6cf-part1
ata-WDC_WD10EZRZ-00HTKB0_WD-WCC4J2AXKT3R             wwn-0x50014ee2bcff024d
ata-WDC_WD10EZRZ-00HTKB0_WD-WCC4J2AXKT3R-part1       wwn-0x50014ee2bcff024d-part1
ata-WDC_WD40EFAX-68JH4N0_WD-WX12D80N59SR             wwn-0x5002538e403be893
ata-WDC_WD40EFAX-68JH4N0_WD-WX12D80N59SR-part1       wwn-0x5002538e403be893-part1
ata-WDC_WD40EFAX-68JH4N0_WD-WX52D104L73P             wwn-0x5002538e403be893-part2
ata-WDC_WD40EFAX-68JH4N0_WD-WX52D104L73P-part1

What we are interested in are the lines containing the IDs of the partitions themselves instead of the entire drives (the ata-xxx-part1 lines), so in this case:

ata-WDC_WD10EZRZ-00HTKB0_WD-WCC4J2AXKT3R-part1
ata-WDC_WD40EFAX-68JH4N0_WD-WX12D80N59SR-part1
ata-WDC_WD40EFAX-68JH4N0_WD-WX52D104L73P-part1
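If you don't feel like picking these lines out by eye, something like the following sketch can do the filtering. The list_ata_partitions function is a hypothetical helper, not part of any tool. It assumes your drives show up with an ata- prefix, as mine do.

```shell
# Print the names of all ATA partition entries (ata-...-partN) in a directory.
# The directory defaults to /dev/disk/by-id; it is an argument only so the
# function can be tried against a test directory as well.
list_ata_partitions() {
    by_id_dir=${1:-/dev/disk/by-id}
    for entry in "$by_id_dir"/ata-*-part[0-9]*; do
        # Skip the unexpanded pattern if nothing matched.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}
```

Calling list_ata_partitions with no arguments looks in /dev/disk/by-id. The boot drive's partitions will be listed as well, so leave those out when writing your fstab entries.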

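Next, edit your fstab file to mount the partitions (including the parity drive) and create the MergerFS pool. Since the pool is defined as /mnt/disk*, the parity drive mounted at /mnt/parity1 is not part of it.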

...
# hard drives
/dev/disk/by-id/ata-WDC_WD10EZRZ-00HTKB0_WD-WCC4J2AXKT3R-part1 /mnt/disk1 	ext4 defaults 0 0
/dev/disk/by-id/ata-WDC_WD40EFAX-68JH4N0_WD-WX52D104L73P-part1 /mnt/disk2 	ext4 defaults 0 0
/dev/disk/by-id/ata-WDC_WD40EFAX-68JH4N0_WD-WX12D80N59SR-part1 /mnt/parity1 	ext4 defaults 0 0

# mergerfs
/mnt/disk* /mnt/storage fuse.mergerfs defaults,dropcacheonclose=true,allow_other,minfreespace=25G,fsname=mergerfs 0 0
...

/etc/fstab

You can find a full list of options for your storage pool in the MergerFS documentation.

After editing your fstab file, save and reboot. If everything went well, you should be able to create a file in /mnt/storage and see that the file was actually stored in one of the /mnt/diskX directories.
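To check where a file actually ended up, you can look for it on each underlying disk. Below is a small sketch of that check. The find_branch function is a made-up helper name, and it simply tests whether the same relative path exists under each directory you pass in.

```shell
# Print every branch directory that contains the given relative path.
# Usage: find_branch RELATIVE_PATH BRANCH...
find_branch() {
    rel_path=$1
    shift
    for branch in "$@"; do
        if [ -e "$branch/$rel_path" ]; then
            printf '%s\n' "$branch"
        fi
    done
    return 0
}
```

With the layout from this article, you would create a test file with touch /mnt/storage/test.txt and then run find_branch test.txt /mnt/disk1 /mnt/disk2. Exactly one of the two disks should be printed.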

SnapRAID

We have now finished setting up our storage pool, but what happens when one of our drives inevitably fails? This is where SnapRAID comes into play. Remember the parity drive we left unused until now? Well, this drive won't actually store any of your data. Instead, it will hold parity information used to recover data if any disk dies.

Keep in mind that using SnapRAID has a couple of caveats:

The parity drive must match or be larger than the size of the biggest data disk
SnapRAID does not calculate the parity "on write", meaning that you must manually invoke the snapraid sync command to recalculate the parity data (or use a tool like snapraid-runner).
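Both caveats make more sense once you see how parity works in principle. The sketch below is a toy model, assuming plain XOR parity over a single byte per disk. SnapRAID's real implementation works on whole blocks and is more involved, so treat this purely as an illustration of the idea.

```shell
# XOR two byte values (0-255) and print the result.
xor_parity() {
    echo $(( $1 ^ $2 ))
}

disk1_byte=165   # pretend this byte lives on /mnt/disk1
disk2_byte=60    # and this one on /mnt/disk2

# The "sync" step: compute parity from the current data.
parity_byte=$(xor_parity "$disk1_byte" "$disk2_byte")

# disk2 dies: rebuild its byte from the parity and the surviving disk.
recovered=$(xor_parity "$parity_byte" "$disk1_byte")

echo "parity=$parity_byte recovered=$recovered"
```

Every byte on a data disk needs a matching parity byte, which is why the parity drive cannot be smaller than the largest data disk. The parity value is also only correct for the data as it was at the last sync, so anything written afterwards is unprotected until you sync again.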

Installation

yay -S snapraid

Next, create/edit your SnapRAID configuration file:

# Defines the file to use as parity storage
# It must NOT be in a data disk
parity /mnt/parity1/snapraid.parity

# Defines the files to use as content list
# You can use multiple specifications to store more copies
# You must have at least one copy for each parity file plus one.
# They can be in the disks used for data, parity or boot,
# but each file must be in a different disk.
content /var/snapraid.content
content /mnt/parity1/.snapraid.content
content /mnt/disk1/.snapraid.content
content /mnt/disk2/.snapraid.content

# Defines the data disks to use
# The order is relevant for parity, do not change it
disk d1 /mnt/disk1
disk d2 /mnt/disk2

# Excludes hidden files and directories (uncomment to enable).
#nohidden

# Defines files and directories to exclude
# Remember that all the paths are relative to the mount points
# Format: "exclude FILE"
# Format: "exclude DIR/"
# Format: "exclude /PATH/FILE"
# Format: "exclude /PATH/DIR/"
exclude /lost+found/

# You might also want to exclude log files and temporary DB files, since these
# change frequently and would be out of date in the parity between syncs.
exclude *wal

/etc/snapraid.conf

After editing /etc/snapraid.conf, try running snapraid sync as root to check if everything is configured correctly. Just keep in mind that this first sync could take a long time depending on the size of your drives.
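Before kicking off that first long sync, it can be worth checking that every directory mentioned in the config actually exists, for example to catch a drive that failed to mount. The following is a rough sketch of such a check. check_snapraid_paths is a hypothetical helper and not part of SnapRAID. It only understands simple parity, content, and disk (or data) lines with paths that contain no spaces.

```shell
# Check that the directories referenced by a SnapRAID config file exist.
# Prints each missing directory and returns non-zero if any is missing.
check_snapraid_paths() {
    conf=$1
    missing=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r keyword arg1 arg2 || [ -n "$keyword" ]; do
        case $keyword in
            parity|content) dir=$(dirname "$arg1") ;;   # file path: check its parent
            disk|data)      dir=$arg2 ;;                # "disk NAME DIR"
            *)              continue ;;                 # comments, excludes, blanks
        esac
        if [ ! -d "$dir" ]; then
            printf 'missing directory: %s\n' "$dir"
            missing=1
        fi
    done < "$conf"
    return "$missing"
}
```

Running check_snapraid_paths /etc/snapraid.conf should print nothing if all mount points are in place. It doesn't replace snapraid sync's own checks; it just fails faster.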

Automation

Since one of the requirements at the start of the article was automated backups, we are going to use snapraid-runner to run a parity sync once every week:

git clone https://github.com/Chronial/snapraid-runner.git /opt/snapraid-runner

After that, create your configuration file (just make sure to fill in your email settings):

[snapraid]
; path to the snapraid executable (e.g. /bin/snapraid)
executable = /usr/bin/snapraid
; path to the snapraid config to be used
config = /etc/snapraid.conf
; abort operation if there are more deletes than this, set to -1 to disable
deletethreshold = -1
; if you want touch to be run each time
touch = false

[logging]
; logfile to write to, leave empty to disable
file = /var/log/snapraid.log
; maximum logfile size in KiB, leave empty for infinite
maxsize = 5000

[email]
; when to send an email, comma-separated list of [success, error]
sendon = success,error
; set to false to get full program output via email
short = true
subject = [SnapRAID] Status Report
from = {fill in}
to = {fill in}
; maximum email size in KiB
maxsize = 500

[smtp]
host = {fill in}
; leave empty for default port
port = {fill in}
; set to "true" to activate
ssl = {fill in}
tls = {fill in}
user = {fill in}
password = {fill in}

[scrub]
; set to true to run scrub after sync
enabled = true
percentage = 22
older-than = 12

/etc/snapraid-runner.conf

We are then going to use cron to call snapraid-runner once every week, specifically at 12:00 (noon) every Sunday. Open root's crontab with: sudo EDITOR=nano crontab -e

...
0 12 * * 0 python /opt/snapraid-runner/snapraid-runner.py --conf /etc/snapraid-runner.conf
...

crontab

After saving the crontab file, snapraid-runner will automatically sync (and scrub) your array every week!

Final Thoughts

Just by installing and configuring these two tools, we have managed to satisfy the first 4 requirements for our home server. We could stop right here and be good to go. However, there are a couple of things I strongly recommend doing before starting to host any services and exposing your server to the public:

Configure SSH and harden it using 2-factor authentication
Configure fail2ban to counter brute-force attacks
Give your server a recognizable hostname (in my case that's jupiter)
Set up an SMTP client like msmtp so that your server can send you e-mail alerts
Set up S.M.A.R.T. monitoring for your drives so that you get an early warning if one of your drives is about to fail.
Remember the rule known as Schrödinger's Backup: The condition of any backup is unknown until a restore is attempted. Therefore I recommend setting up another backup solution other than SnapRAID, just in case.

In the next part, we are going to be setting up Docker and Portainer for container management, Watchtower for automatic container updates, and OpenVPN for remote server management.