
Solving a persistent MD Array problem in RHEL7.4

Ok, so I lent one of my Servers to two of my colleagues in the States, who needed to prepare some tests for a customer. I always try to be nice and to help stimulate sales.

I work with Declustered RAID (DRAID) and ZFS.

The Server was a 4U90: a 4U Server with 90 SAS3 drives and 4 SSDs. The drives are dual-ported, and two Controllers (motherboard + CPU) have simultaneous access to the drives for HA.

After their tests my colleagues returned the Server to me. When I needed to use it, to my surprise I ran into problems as soon as I tried to provision it with ZFS. There was not much in the logs.

I checked:

cat /proc/mdstat

And that was the thing: nine MD Arrays were there.

[root@4u90-B ~]# cat /proc/mdstat 
Personalities : 
md2 : inactive sdba1[9](S) sdag1[7](S) sdaf1[3](S)
11720629248 blocks super 1.2

md1 : inactive sdax1[7](S) sdad1[5](S) sdac1[1](S) sdae1[9](S)
12056071168 blocks super 1.2

md0 : inactive sdat1[1](S) sdav1[9](S) sdau1[5](S) sdab1[7](S) sdaa1[3](S)
19534382080 blocks super 1.2

md4 : inactive sdbf1[9](S) sdbe1[5](S) sdbd1[1](S) sdal1[7](S) sdak1[3](S)
19534382080 blocks super 1.2

md5 : inactive sdam1[1](S) sdan1[5](S) sdao1[9](S)
11720629248 blocks super 1.2

md8 : inactive sdcq1[7](S) sdz1[2](S)
7813752832 blocks super 1.2

md7 : inactive sdbm1[7](S) sdar1[1](S) sdy1[9](S) sdx1[5](S)
15627505664 blocks super 1.2

md3 : inactive sdaj1[9](S) sdai1[5](S) sdah1[1](S)
11720629248 blocks super 1.2

md6 : inactive sdaq1[7](S) sdap1[3](S) sdr1[8](S) sdp1[0](S)
15627505664 blocks super 1.2

Ok. So I stop the Arrays:

mdadm --stop /dev/md127
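
Since there are several arrays, a small loop saves typing (a sketch; it assumes the array names seen in /proc/mdstat above, plus the auto-assembled md127):

for MD in /dev/md{0..8} /dev/md127; do mdadm --stop ${MD}; done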

And then I zero the superblock:

mdadm --zero-superblock /dev/sdb1
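
This has to be done on every member partition listed in /proc/mdstat; for the md2 array above, for example, it would be something like this (a sketch):

for PART in sdba1 sdag1 sdaf1; do mdadm --zero-superblock /dev/${PART}; done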

After doing this for all of them, I try to provision and… surprise! It does not work. /dev/md127 has respawned like an enemy in the old Doom video game.

I check the mdmonitor service and even disable it.

systemctl disable mdmonitor
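
Disabling only takes effect at the next boot; to also stop the service immediately, the standard systemd command applies:

systemctl stop mdmonitor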

I repeat the process.

And /dev/md127 appears again, using another device.

At this point, just in case, I check the other controller, which should be powered off.

Ok, it was on. I launch the poweroff command and repeat the process. Same thing!

I see that the poweroff command on the second Controller is actually doing a reboot, so I launch the halt command, which makes it stop responding to ping.

I repeat the process, and the ghost md array still appears there and blocks me from doing my zpool create.
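
For context, the provisioning that keeps failing is a plain zpool create over those drives, something along these lines (a simplified sketch; the real pool uses DRAID and many more drives):

zpool create -f data raidz2 sdaa sdab sdac sdad sdae sdaf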

The /etc/mdadm.conf file did not exist (by default it is not created).

I try a more aggressive approach:

# 3907018584 is the size in KiB shown in /proc/partitions for these drives (roughly 4 TB)
DRIVES=`cat /proc/partitions | grep 3907018584 | awk '{ print $4; }'`

for DRIVE in $DRIVES; do echo "Trying /dev/${DRIVE}1"; mdadm --examine /dev/${DRIVE}1; done

Ok. And destruction time:

for DRIVE in $DRIVES; do echo "Trying /dev/${DRIVE}"; wipefs -a -f /dev/${DRIVE}; done

for DRIVE in $DRIVES; do echo "Trying /dev/${DRIVE}1"; mdadm --zero-superblock /dev/${DRIVE}1; done

Apparently the system is clean, but I still cannot provision, and /dev/md127 respawns and reappears all the time.

After googling and not finding anything about this problem, and with my colleagues having no clue about what was causing it, I just proceed with a simple solution, as I need the Server ready so my company can complete the tests within the next 24 hours.

So I create the file /etc/mdadm.conf with this content, which tells mdadm not to auto-assemble any array:

[root@draid-08 ~]# cat /etc/mdadm.conf 
AUTO -all

After that I rebooted the Server, saw that the infamous /dev/md127 was no longer there, and I was able to provision.
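
To double-check after the reboot that nothing gets auto-assembled, the same command as at the beginning is enough:

cat /proc/mdstat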

I share the solution as it may help other people.

ZFS: Improving iSCSI performance for Block Devices (trick for Volumes)

ZFS has a performance problem with the zvol volumes.

Even using a ZIL, you will experience low speed when writing to a zvol through the Network.

Even locally, if you format a zvol, for example with ext4, and mount it locally, you will see that the speed is several times slower than the native ZFS filesystem.
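
For reference, the local comparison can be reproduced roughly like this (a sketch; the pool name, volume name, size and mountpoint are just examples):

zfs create -V 100G pool/testvol    # the zvol appears as /dev/zvol/pool/testvol
mkfs.ext4 /dev/zvol/pool/testvol
mkdir -p /mnt/testvol
mount /dev/zvol/pool/testvol /mnt/testvol

Then compare the write speed inside /mnt/testvol against a directory on the native ZFS dataset.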

zvol volumes are nice as they support snapshots and clones (from the snapshots); however, they are too slow.

Using a pool with Spinning Drives and two SSD SLOG devices in mirror, and a 40 Gbps Mellanox NIC, when accessing a zvol formatted with ext4 via iSCSI from the iSCSI Initiator, you can be copying Data at 70 MB/s, not even saturating 1 Gbps.

The trick to speed this up consists of creating a file in the ZFS File System instead of using zvols, and sharing it directly through iSCSI.

This will give 4 times more speed, so instead of 70 MB/s you would get 280 MB/s.
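
For example, on RHEL 7 with the LIO target (targetcli), sharing a plain file instead of a zvol looks roughly like this (a sketch; the dataset, file name, size and IQN are made-up examples, and the portal and ACLs still have to be configured as usual):

zfs create pool/iscsi
truncate -s 100G /pool/iscsi/vol1.img    # sparse backing file on the ZFS dataset
targetcli /backstores/fileio create name=vol1 file_or_dev=/pool/iscsi/vol1.img
targetcli /iscsi create iqn.2018-10.com.example:vol1
targetcli /iscsi/iqn.2018-10.com.example:vol1/tpg1/luns create /backstores/fileio/vol1

Note that the fileio backstore defaults to write-back (buffered) caching, which is likely part of where the extra speed comes from.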