A little bit of RAID knowledge

Every few years I have to mess with software RAID again, by which time I have forgotten everything I had learnt, so herein lie various notes on how to do the bare minimum of RAID "stuff" that I need.

Originally (2016-era) I started with two 4TB disks in a RAID1 (mirrored) setup, later upgraded to two 8TB disks, mirrored. My current configuration is three 8TB disks in a RAID5 array, giving me 16TB of storage (realistically about 14TB). I have two NAS boxes using this setup: a main always-on NAS with new-ish Seagate IronWolf drives, and a backup NAS with my older 8TB disks to which I replicate weekly. (I also have two backup 8TB USB drives in rotation — "RAID is not a backup".)

Decommissioning a RAID drive

Let’s say I am reusing disks from an existing RAID array and want to erase all existing metadata. If the disks have been detected as a RAID array, first stop the array, then wipe the metadata.

# RAID1 array using /dev/sda1 and /dev/sdb1 assembled as /dev/md127
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sd[ab]1
wipefs --all /dev/sd[ab]{1,}

If you do the wipefs then the mdadm --zero-superblock may not be necessary, but it doesn’t hurt. We are now left with two blank disks as /dev/sda and /dev/sdb.
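
To double-check that nothing is left behind, the signatures can be listed (wipefs without --all only reports what it finds, and lsblk --fs should show no filesystem or RAID-member metadata):

wipefs /dev/sd[ab]
lsblk --fs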

Building a RAID5 array

Let’s say we now have three new or wiped matching disks (same size) as /dev/sd[abc]. Using cfdisk, give each disk a GPT partition table containing a single partition with a type of "Linux RAID".
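
If you would rather do this non-interactively, something like the following sgdisk invocation should give the same layout. This is only a sketch, assuming sgdisk is installed (FD00 is its type code for "Linux RAID"); adjust the device names to suit:

for disk in /dev/sd[abc]; do
    sgdisk --new=1:0:0 --typecode=1:FD00 "$disk"
done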

Create a new RAID5 array using the three disks.

mdadm --verbose --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[abc]1

The array will start building and its progress can be monitored using cat /proc/mdstat or mdadm --detail /dev/md0. It is supposedly safe to start using the array (format, mount, write, etc.) but I tend to wait until its state is "clean".
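
To keep an eye on the rebuild, or to block until it has finished, something along these lines works (mdadm --wait simply waits for any resync/recovery activity on the array to complete):

watch -n 30 cat /proc/mdstat
mdadm --wait /dev/md0 && mdadm --detail /dev/md0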

Formatting using ext4

The bulk of the content I store on my RAID array is media. It is written to occasionally and read from frequently. I don’t feel the need to take snapshots. Consequently, I choose to format using ext4.

When formatting using ext4 on a RAID array, there are two extended options (-E) that are worth calculating and setting: "stride" and "stripe_width". Refer to the mkfs.ext4 man page. A script to calculate the recommended values can be found at https://busybox.net/~aldot/mkfs_stride.html.

The RAID chunk size, as reported by mdadm --detail /dev/md0 or /proc/mdstat, is 512KiB and I am using the default ext4 block size of 4096 bytes, i.e. 4KiB. This gives me a stride value of 128.

stride = chunk / block
       = 512KiB / 4KiB
       = 128

In my case, I have a RAID5 array of 3 disks which means the equivalent of two data disks and one parity disk (the parity is actually distributed across all the disks).

stripe_width = ((total_disks - parity_disks) * stride)
             = ((3 - 1) * 128)
             = 256
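
Rather than working these out by hand each time, both values can be derived from the live array details. A throwaway snippet along these lines does the arithmetic, assuming a 3-disk RAID5 and the default 4KiB ext4 block size:

# Pull the chunk size (in KiB) from mdadm and derive stride/stripe_width
chunk_kib=$(mdadm --detail /dev/md0 | awk '/Chunk Size/ {print int($4)}')
stride=$((chunk_kib / 4))            # 4KiB ext4 blocks
stripe_width=$((stride * (3 - 1)))   # (total_disks - parity_disks) * stride
echo "stride=$stride stripe_width=$stripe_width"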

Hence the format command is:

mkfs.ext4 -L MYLABEL -E stride=128,stripe_width=256 /dev/md0

The default block size is 4096 so I don’t need to specify -b 4096. The settings can be confirmed using tune2fs -l /dev/md0 and checking the "RAID stride" and "RAID stripe width" values.
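
For example:

# Should report "RAID stride: 128" and "RAID stripe width: 256"
tune2fs -l /dev/md0 | grep -i raid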

Should I use the 64bit feature option? According to the ext4 man page, it enables file systems larger than 2^32 blocks, which I don’t think is reachable on a 3x8TB RAID5 array with a 4096 byte block size (16TB of usable space is roughly 3.9 billion 4KiB blocks, just under the 2^32 ≈ 4.3 billion limit). Also, the default Slackware /etc/mke2fs.conf has "auto_64-bit_support=1" set so that the feature will automatically be enabled if required by the partition and block size.

When running mkfs.ext4, there is a message about missing out on storing full 32-bit metadata checksums if you don’t enable the 64bit feature. Not sure how necessary or desirable this is.

"Why /dev/md127?", or local and foreign RAID arrays

There is the concept of "local" and "foreign" RAID arrays. When I initially build the array (using mdadm --create) or manually assemble it (using mdadm --assemble) I can name it /dev/md0. If I now reboot without doing anything else, during the init stage the array will be detected and reassembled as /dev/md127.

This is because it is treated as a foreign array and foreign arrays are, apparently, numbered from 127 downwards. That is, the first foreign array detected is md127, the second is md126, and so on. Local arrays are numbered from zero (md0) upwards. (I say "apparently" because there is nothing about this numbering scheme in the mdadm or mdadm.conf man pages.)

It is probably fine to live with your RAID array being /dev/md127. One potential downside is that "foreign" arrays are auto-assembled at boot as "read-auto" which, according to the mdadm man page, will delay resync, etc. until something writes to the device. Is this likely to be an issue for me? Who knows. Whatever the case, I prefer my one-and-only RAID array to be /dev/md0.

To ensure the array is treated as local to the machine on which it was created, one method is to make sure that, at assembly time, mdadm sees a hostname matching the "homehost" recorded when the array was created. On Slackware, the RAID arrays are auto-assembled early in the init process, well before the hostname is set (by /etc/rc.d/rc.M). To make the hostname available to mdadm, set the HOMEHOST value in /etc/mdadm.conf and rebuild the initrd with RAID support so that the mdadm.conf file is included in the initrd.

In my case, the main NAS at the time I created the array had a hostname of "cosmos.localdomain".

/etc/mdadm.conf
HOMEHOST cosmos.localdomain
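
On Slackware the initrd then needs to be rebuilt so that the updated mdadm.conf is included. The exact flags depend on your kernel and root device (the /usr/share/mkinitrd/mkinitrd_command_generator.sh script will suggest them); the kernel version, module and root device below are placeholders only:

mkinitrd -c -k 5.15.94 -m ext4 -f ext4 -r /dev/sda2 -R -o /boot/initrd.gz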

Relocating a RAID array

(Short answer: put an ARRAY entry in your mdadm.conf file.)

Longer rambling answer…​

My scenario was as follows. I have two DIY NAS boxes: a main (cosmos) and a backup (sonda), each with three 8TB HDDs in a RAID5 array. The backup NAS array, having been built more recently, has three new drives with a metadata homehost value of "sonda.localdomain". Having synced the contents of the two arrays, I want to put the backup’s array in the main NAS because the backup’s drives are newer and quieter.

Simply swapping the two sets of three drives works, except both arrays are now identified as foreign because the "homehost" value with which each array was created no longer matches the hostname of the machine it is now running on.

(Probably one solution is to simply swap the hostnames as well, but I don’t want to do that.)

Ideally I want to edit the metadata of each array and change the "homehost" value to match the new hostname but, thus far, I haven’t been able to do that. I can change the metadata "name" value — the current array name can be seen in the mdadm --detail output.
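
For example, to see just the name:

mdadm --detail /dev/md0 | grep Name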

To change the name, you first have to stop the array then reassemble using the --update option.

Update name metadata
mdadm --stop /dev/md127
mdadm --assemble /dev/md0 /dev/sd[abc]1 --update=name --name=cosmos.localdomain

Supposedly you can update the "homehost" metadata as well, but I don’t know of any way to see the current "homehost" value. It seems that the Name value reported by mdadm --detail will show something different if there is no "name" metadata (?) or the "name" and "homehost" values don’t match. Who knows? More testing is required but, for now, my only two RAID arrays are in production and I am not going to mess around.

Anyway, as far as I can tell, you can’t update name and homehost in one "assemble" command so you have to stop and reassemble to make each change.

Update homehost metadata (maybe?)
mdadm --stop /dev/md127
mdadm --assemble /dev/md0 /dev/sd[abc]1 --update=homehost --homehost=cosmos.localdomain

However, for me, none of this helps. I update the name & homehost metadata in the array, I have the correct HOMEHOST in the initrd’s mdadm.conf, but the relocated array always comes up as foreign /dev/md127 on boot. I also tried using "HOMEHOST <ignore>" in the initrd’s mdadm.conf with no success.

The only way I could get it to auto-assemble the relocated array as /dev/md0 was explicitly specifying the ARRAY in mdadm.conf using the array’s UUID. To get the text of the ARRAY line to be added to mdadm.conf, run mdadm --detail --scan /dev/md0.

/etc/mdadm.conf
HOMEHOST cosmos.localdomain
ARRAY /dev/md0 UUID=<my array's uuid value ...>
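
A quick way to get that line into the file (review it before keeping it, and remember the copy of mdadm.conf inside the initrd also needs updating):

mdadm --detail --scan /dev/md0 >> /etc/mdadm.conf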

One thing you lose when being explicit about the array in mdadm.conf is that you only get the specified device (e.g., /dev/md0), whereas if the array is auto-assembled you get both an allocated /dev/mdN device and a symlink to it under /dev/md/ using the array’s name.

For instance, when auto-assembling my relocated array as foreign, I got:

/dev/md127
/dev/md/cosmos.localdomain -> ../md127

When defining the array using "ARRAY /dev/md0 …​" in mdadm.conf, I just get /dev/md0 as requested. See the "ARRAY" section of the mdadm.conf man page.

Useful resources