A little bit of RAID knowledge
Every few years I have to mess with software RAID again, by which time I have forgotten everything I had learnt, so herein lie various notes on how to do the bare minimum of RAID "stuff" that I need.
Originally (2016-era) I started with two 4TB disks in a RAID1 (mirrored) setup, later upgraded to two 8TB disks, mirrored. My current configuration is three 8TB disks in a RAID5 array, giving me 16TB of storage (realistically about 14TB). I have two NAS boxes using this setup: a main always-on NAS with new-ish Seagate IronWolf drives, and a backup NAS with my older 8TB disks to which I replicate weekly. (I also have two backup 8TB USB drives in rotation — "RAID is not a backup".)
Decommissioning a RAID drive
Let’s say I am reusing a disk from an existing RAID array and want to erase all existing metadata. If the disks have been detected as a RAID array, first stop the array and wipe the metadata.
# RAID1 array using /dev/sda1 and /dev/sdb1 assembled as /dev/md127
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sd[ab]1
wipefs --all /dev/sd[ab]{1,}
If you do the wipefs then the mdadm --zero-superblock may not be necessary, but it doesn’t hurt. We are now left with two blank disks as /dev/sda and /dev/sdb.
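To double-check that the disks really are blank, mdadm --examine should no longer find a superblock; from memory the output is something like:

mdadm --examine /dev/sda
mdadm: No md superblock detected on /dev/sda.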
Building a RAID5 array
Let’s say we now have three new or wiped matching disks (same size) as /dev/sd[abc]. Using cfdisk, format each disk with a GPT partition table containing a single partition with a type of "Linux RAID".
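If you would rather script the partitioning than drive cfdisk interactively, something like the following sgdisk loop (sgdisk is part of the gptfdisk package) should produce the same layout. This is a sketch, not a tested recipe, and it is destructive, so double-check the device names first.

# GPT with a single "Linux RAID" (type FD00) partition per disk
# WARNING: destroys existing data on the named devices
for d in /dev/sda /dev/sdb /dev/sdc; do
    sgdisk --zap-all "$d"
    sgdisk --new=1:0:0 --typecode=1:FD00 "$d"
done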
Create a new RAID5 array using the three disks.
mdadm --verbose --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[abc]1
The array will start building and its progress can be monitored using cat /proc/mdstat or mdadm --detail /dev/md0. It is supposedly safe to start using the array (format, mount, write, etc.) straight away but I tend to wait until its state is "clean".
Formatting using ext4
The bulk of the content I store on my RAID array is media. It is written to occasionally and read from frequently. I don’t feel the need to take snapshots. Consequently, I choose to format using ext4.
When formatting using ext4 on a RAID array, there are two extended options (-E) that are worth calculating and setting: "stride" and "stripe_width". Refer to the mkfs.ext4 man page. A script to calculate the recommended values can be found at https://busybox.net/~aldot/mkfs_stride.html.
The RAID chunk size, as reported by mdadm --detail /dev/md0 or /proc/mdstat, is 512KiB and I am using the default ext4 block size of 4096 bytes (4KiB). This gives me a stride value of 128.
stride = chunk / block = 512 / 4 = 128
In my case, I have a RAID5 array of 3 disks, which means the equivalent of two data disks and one parity disk (the parity is distributed across all three, but one disk’s worth of capacity goes to it).
stripe_width = ((total_disks - parity_disks) * stride) = ((3 - 1) * 128) = 256
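The same arithmetic as a throwaway shell snippet, with my values hard-coded (adjust the chunk size, block size, and disk counts to suit):

# stride and stripe_width for a 512KiB chunk, 4KiB block, 3-disk RAID5
chunk_kib=512; block_kib=4; disks=3; parity=1
echo "stride=$(( chunk_kib / block_kib ))"
echo "stripe_width=$(( (disks - parity) * chunk_kib / block_kib ))"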
Hence the format command is:
mkfs.ext4 -L MYLABEL -E stride=128,stripe_width=256 /dev/md0
The default block size is 4096 so I don’t need to specify -b 4096. The settings can be confirmed using tune2fs -l /dev/md0 and checking the "RAID stride" and "RAID stripe width" values.
Should I use the 64bit feature option? According to the ext4 man page, it enables file systems larger than 2^32 blocks, which I don’t think is possible on a 3x8TB RAID5 array with a 4096 byte block size. Also, the default Slackware /etc/mke2fs.conf has "auto_64-bit_support=1" set so that the feature will automatically be enabled if required by the partition and block size.
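For the record, the block-count arithmetic (using vendor terabytes, so only approximate):

block_count = (2 data disks x 8TB) / 4096 = 16x10^12 / 4096 ≈ 3.9x10^9
2^32 ≈ 4.29x10^9

So it fits under the 2^32 limit, though not by a huge margin; a fourth 8TB disk would tip it over, at which point auto_64-bit_support would kick in.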
When running mkfs.ext4, there is a message about missing out on storing full 32-bit metadata checksums if you don’t enable the 64bit feature. I’m not sure how necessary or desirable this is.
"Why /dev/md127?", or local and foreign RAID arrays
There is the concept of "local" and "foreign" RAID arrays. When I initially build the array (using mdadm --create) or manually assemble it (using mdadm --assemble) I can name it /dev/md0. If I now reboot without doing anything else, during the init stage the array will be detected and reassembled as /dev/md127.
This is because it is treated as a foreign array and foreign arrays are,
apparently, numbered from 127 decreasing. That is, the first foreign array
detected is md127, the second is md126, and so on. Local arrays are numbered
starting at zero (md0) and increasing. (I say "apparently" because there is
nothing about this numbering scheme in the mdadm{.conf}
man pages.)
It is probably fine to live with your RAID array being /dev/md127. One potential downside is that "foreign" arrays are auto-assembled at boot as "read-auto" which, according to the mdadm man page, will delay resync, etc. until something writes to the device. Is this likely to be an issue for me? Who knows. Whatever the case, I prefer my one-and-only RAID array to be /dev/md0.
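As an aside, a read-auto array is easy to spot in /proc/mdstat; if I remember right it is flagged like so:

md127 : active (auto-read-only) raid5 sdc1[2] sdb1[1] sda1[0]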
To ensure the array is treated as local to the machine on which it was created, one method is to ensure the machine’s hostname matches the hostname value at the time the array was created. On Slackware, the RAID arrays are auto-assembled early in the init process, well before the hostname is set (by /etc/rc.d/rc.M). To make the hostname available to mdadm, set the HOMEHOST value in /etc/mdadm.conf and rebuild the initrd with RAID support so that the mdadm.conf file is included in the initrd.
In my case, the main NAS at the time I created the array had a hostname of "cosmos.localdomain".
HOMEHOST cosmos.localdomain
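Rebuilding the initrd is then something along these lines; a sketch from memory, where -R adds the RAID support and your kernel version and module list will differ:

mkinitrd -c -k $(uname -r) -m ext4 -R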
Relocating a RAID array
(Short answer: put an ARRAY entry in your mdadm.conf file.)
Longer rambling answer…
My scenario was as follows. I have two DIY NAS boxes: a main (cosmos) and a backup (sonda), each with three 8TB HDDs in a RAID5 array. The backup NAS array, being built more recently, has three new drives with a metadata homehost value of "sonda.localdomain". Having synced the contents of the two arrays, I want to put the backup’s array in the main NAS because the backup’s drives are newer and quieter.
Simply swapping the two sets of three drives works, except both are now identified as being foreign because the "homehost" value on which each array was created no longer matches the hostname of the machine on which they are running.
(Probably one solution is to simply swap the hostnames as well, but I don’t want to do that.)
Ideally I want to edit the metadata of each array and change the "homehost" value to match the new hostname but, thus far, I haven’t been able to do that. I can change the metadata "name" value; the current array name can be seen in the mdadm --detail output.
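For example, with the backup NAS drives in the main box, the relevant line of the output was along these lines (reconstructed from memory):

mdadm --detail /dev/md127 | grep Name
           Name : sonda.localdomain:0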
To change the name, you first have to stop the array then reassemble using the --update option.
mdadm --stop /dev/md127
mdadm --assemble /dev/md0 /dev/sd[abc]1 --update=name --name=cosmos.localdomain
Supposedly you can update the "homehost" metadata as well but I don’t know of any way to see the current "homehost" value. It seems that the Name value reported by mdadm --detail will show something different if there is no "name" metadata (?) or the "name" and "homehost" values don’t match. Who knows? More testing is required but, now that my only two RAID arrays are in production, I am not going to mess around.
Anyway, as far as I can tell, you can’t update name and homehost in one "assemble" command, so you have to stop and reassemble to make each change.
mdadm --stop /dev/md127
mdadm --assemble /dev/md0 /dev/sd[abc]1 --update=homehost --homehost=cosmos.localdomain
However, for me, none of this helps. I update the name & homehost metadata in the array, I have the correct HOMEHOST in the initrd’s mdadm.conf, but the relocated array always comes up as foreign /dev/md127 on boot. I also tried using "HOMEHOST <ignore>" in the initrd’s mdadm.conf with no success.
The only way I could get the relocated array to auto-assemble as /dev/md0 was to explicitly specify the ARRAY in mdadm.conf using the array’s UUID. To get the text of the ARRAY line to be added to mdadm.conf, run mdadm --detail --scan /dev/md0.
HOMEHOST cosmos.localdomain
ARRAY /dev/md0 UUID=<my array's uuid value ...>
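Rather than typing the ARRAY line out, the scan output can be appended straight to the file and tidied up by hand, remembering to rebuild the initrd afterwards since that holds the copy of mdadm.conf that matters at assembly time:

mdadm --detail --scan /dev/md0 >> /etc/mdadm.conf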
One thing you lose when being explicit about the array in mdadm.conf is that you only get the specified device (e.g., /dev/md0) whereas if the array is auto-assembled, you get both an allocated /dev/mdN device and a symlink to it under /dev/md/ using the array’s name.
For instance, when auto-assembling my relocated array as foreign, I got:
/dev/md127
/dev/md/cosmos.localdomain -> ../md127
When defining the array using "ARRAY /dev/md0 …" in mdadm.conf, I just get /dev/md0 as requested. See the "ARRAY" section of the mdadm.conf man page.