Author Archives: Carles Mateo

Solving an infinite loop in CentOS after inducing a Kernel Panic in a Server

This trick may be useful for you.
Almost surely if you power cycle, completely powering down your Server you’ll fix booting too.
Unfortunately we do not always have access to the Data Center or Remote Hands service available, so this trick may be useful for you.

Just reset your BMC card with this:
ipmitool -H 172.30.30.7 -U admin -P thepassword bmc reset cold

After this use the remote control tool to request a reboot and it will do and power on normally.

This may not work in all the Servers, it depends on a lot of aspects (firmware, bmc manufacturer, etc…) but can do the trick for you maybe.

Create a small partition on the drives for tests

Ok, as you know I work with ZFS, DRAID, Erasure Coding… and Cold Storage.
I work with big disks, SAS, SSD, and NVMe.
Sometimes I need to conduct some tests that involve filling completely to 100% the pool.
That’s very slow having to fill 14TB drives, with Servers with 60, 90 and 104 drives, for obvious reasons. So here is a handy script for partitioning those drives with a small partition, then you use the small partition for creating a pool that will fill faster.

1. Get the list of drives in the system
For example this script can help

DRIVES=`ls -al /dev/disk/by-id/ | grep "sd" | grep -v "part" | grep "wwn" | tr "./" "  " | awk '{ print $11; }'`

If your drives had a previous partition this script will detect them, and will use only the drives with wwn identifier.
Warning: some M.2 booting drives have wwn where others don’t. Use with caution.

2. Identify the boot device and remove from the list
3. Do the loop with for DRIVE in $DRIVES or manually:

for DRIVE in sdar sdcd sdi sdj sdbp sdbd sdy sdab sdbo sdk sdz sdbb sdl sdcq sdbl sdbe sdan sdv sdp sdbf sdao sdm sdg sdbw sdaf sdac sdag sdco sds sdah sdbh sdby sdbn sdcl sdcf sdbz sdbi sdcr sdbj sdd sdcn sdr sdbk sdaq sde sdak sdbx sdbm sday sdbv sdbg sdcg sdce sdca sdax sdam sdaz sdci sdt sdcp sdav sdc sdae sdf sdw sdu sdal sdo sdx sdh sdcj sdch sdaw sdba sdap sdck sdn sdas sdai sdaa sdcs sdcm sdcb sdaj sdcc sdad sdbc sdb sdq
do
(echo g; echo n; echo; echo; echo 41984000; echo w;) | fdisk /dev/$DRIVE
done

Simulating a SAS physical pull out of a drive

Please, note:
Nothing is exactly the same as a physical disk pull.
A physical disk pull can trigger errors by the expander that will not be detected just emulating.
Hardware failures are complex, so you should not avoid testing physically.
If your company has the Servers in another location you should request them to have Servers next to you, or travel to the location and spend enough time hands on.

A set of commands very handy for simulating a physical drive pull, when you have not physical access to the Server, or working within a VM.

To delete a disk (Linux stop seeing it until next reboot/power cycle):

echo "1" > /sys/block/${device_name}/device/delete

Set a disk offline:

echo "offline" > /sys/block/${device_name}/device/state

Online the disk

echo "running" > /sys/block/${device_name}/device/state

Scan all hosts, rescan

for host in /sys/class/scsi_host/host*; do echo '- - -' > $host/scan; done

Disabling the port in the expander

This is more like physically pulling the drive.
In order to use the commands, install the package smp_utils. This is now
installed on the 4602 and the 4U60.

The command to disable a port on the expander:

smp_phy_control --phy=${phy_number} --op=dis /dev/bsg/${expander_id}

You will need to know the phy number of the drive. There may be a better
way, but to get it I used:

smp_discover /dev/bsg/${expander_id}

You need to look for the sas_address of the drive in the output from the
smp_discover command. You may need to try all the expanders to find it.

You can get the sas_address for your drive by:

cat /sys/block/${device_name}/device/sas_address

To re-enable the port use:

smp_phy_control --phy=${phy_number} --op=lr /dev/bsg/${expander_id}

Some handy scripts when working with ZFS

To kill one drive given the id (device name may change between reboots)

TO_REMOVE="wwn-0x5000c500a6134007"
DRIVE=`ls -al /dev/disk/by-id/ | grep ${TO_REMOVE} | grep -v "\-part" | awk '{ print $11 }' | tr --delete './'`; 

if [[ ! -z "${DRIVE}" ]];
then
    echo "1" > /sys/block/${DRIVE}/device/delete
else
    echo "Drive not found"
fi

Loop to see the status of the pool

while true; do zpool status carles-N58-C3-D16-P3-S1 | head --lines=20; sleep 5; done

Google Compute Engine Talk for Group Google Developers Cork

My talk in Google Developers Cork Group.
It’s about deploying an Instance in GCE and grows in complexity until Deploying a Load Balancer with AutoScaling for a group of LAMP Webservers.

Join the group at: https://www.meetup.com/GDG-Cork/

The videos:

Keshan Sodimana: Tensors

Curiosity Python string.strip() removes just more than white spaces

Another Python curiosity.

If you see the Official Python3 documentation for strip(), it says that strip without parameters will return the string without the leading and trailing white spaces.
Optionally you can pass a string with the characters you want to eliminate.

The official documentation for Python 2 says:

string.strip(s[, chars])

Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.

Changed in version 2.2.3: The chars parameter was added. The chars parameter cannot be passed in earlier 2.2 versions.

https://docs.python.org/2/library/string.html

A white space is a white space. Is not an Enter.
But strip() without parameters will remove white spaces (space), and Enter \n and Tabs \t.

Probably you will not realize that unless you read from a file that has empty lines at the end for a reason, and you use strip().

You can see a demonstration following this small program, that runs the same for Python2 and Python3.

And the corresponding output for python2 and python3:

The [ ] characters where added to show that there are no hidden tabs or similar after.

Here I paste the code so you can try yourself:

import sys


def print_bar():
    print("-----------------------------------------------------")


def print_between_brackets(s_test):
    print("[" + s_test + "]")


s_string_with_enters = "  Testing strip not only removing white spaces, but Enter and Tabs s well\n\t\n\n"

print("Testing .strip()")
print("You are running Python " + sys.version)
print("This is the original string")
print_bar()
print_between_brackets(s_string_with_enters)
print_bar()
print("Now after strip()...")
print_bar()
print_between_brackets(s_string_with_enters.strip())
print_bar()
print("As you can see the Enters and the Tabs have been removed, not just the spaced")

I think this should be disambiguate so I decided to take action. Is very easy to blame and never contribute. Not me. I went to Python to fix that and I located a bug reporting this issue:

https://bugs.python.org/issue25433

The issue was registed and made specially interesting contributions by Dimitri Papadopoulos Orfanos.

The thread is really interesting to read. I recommend it.

At a glance:

“Python heavily relies on isspace() to detect “whitespace” characters.”

* Lib/string.py near line 23:
  whitespace = ' \t\n\r\v\f'

So all those characters will be stripped in Python2.7 if you use just string.strip()

The ticket was opened the 2015-10-18 12:15. So it’s a shame the documentation has not been updated yet, more than 3 years later. Those are the kind of things, lack of care, that I can’t understand. Not looking for the excellence.

Please, do note that Python3 supports Unicode natively and things are always a bit different than with Python2 and AscII.

Curiosity Python 2.7 and print() from Python 3.6

I lead a project where I decided to go with Python 2.7, for the wide compatibility across all the Servers around.

With RHEL now supporting Python 3 as well, it doesn’t make much sense any more, as all the major brands do support Python 3 directly.

I saw it coming so in my Coding Style Guide for my Team I explained that we will use print(“”) which is the required way to proceed with Python 3, as opposite to print “whatever” from Python 2. Noye Python 2 supports both methods.

But today something unexpected appeared in the Tests. One line of code was making a print of ().

The line of code was:

print()

And not

print("")

And the curiosity is that if you do print() in Python 2.7 it outputs ().

 

Solving a persistent MD Array problem in RHEL7.4

Ok, so I lend one of my Servers to two of my colleagues in The States, that required to prepare some test for a customer. I always try to be nice and to stimulate sales.

I work with Declustered RAID, DRAID, and ZFS.

The Server was a 4U90, so a 4U Server with 90 SAS3 drives and 4 SSD. Drives are Dual Ported, and two Controllers (motherboard + CPU) have access simultaneously to the drives for HA.

After their tests my colleagues, returned me the Server, and I needed to use it and my surprise was when I tried to provision with ZFS and I encountered problems. Not much in the logs.

I checked:

cat /proc/mdstat

And that was the thing 8 MD Arrays where there.

[root@4u90-B ~]# cat /proc/mdstat 
Personalities : 
md2 : inactive sdba1[9](S) sdag1[7](S) sdaf1[3](S)
11720629248 blocks super 1.2

md1 : inactive sdax1[7](S) sdad1[5](S) sdac1[1](S) sdae1[9](S)
12056071168 blocks super 1.2

md0 : inactive sdat1[1](S) sdav1[9](S) sdau1[5](S) sdab1[7](S) sdaa1[3](S)
19534382080 blocks super 1.2

md4 : inactive sdbf1[9](S) sdbe1[5](S) sdbd1[1](S) sdal1[7](S) sdak1[3](S)
19534382080 blocks super 1.2

md5 : inactive sdam1[1](S) sdan1[5](S) sdao1[9](S)
11720629248 blocks super 1.2

md8 : inactive sdcq1[7](S) sdz1[2](S)
7813752832 blocks super 1.2

md7 : inactive sdbm1[7](S) sdar1[1](S) sdy1[9](S) sdx1[5](S)
15627505664 blocks super 1.2

md3 : inactive sdaj1[9](S) sdai1[5](S) sdah1[1](S)
11720629248 blocks super 1.2

md6 : inactive sdaq1[7](S) sdap1[3](S) sdr1[8](S) sdp1[0](S)
15627505664 blocks super 1.2

Ok. So I stop the Arrays

mdadm --stop /dev/md127

And then I zero the superblock:

mdadm --zero-superblock /dev/sdb1

After doing this for all I try to provision and… surprise! does not work. /dev/md127 has respawned like in the old times from Doom video game.

I check the mdmonitor service and even disable it.

systemctl disable mdmonitor

I repeat the process.

And /dev/md127 appears again, using another device.

At this point, just in case, I check the other controller, which should be powered off.

Ok, it was on. I launch the poweroff command, and repeat, same!.

I see that the poweroff command on the second Controller is doing a reboot. So I launch the halt command that makes it not respond to the ping anymore.

I repeat the process, and still the ghost md array appears there, and blocks me from doing my zpool create.

The /etc/mdadm.conf file did not exist (by default is not created).

I try a more aggressive approach:

DRIVES=`cat /proc/partitions | grep 3907018584 | awk '{ print $4; }'`

for DRIVE in $DRIVES; do echo "Trying /dev/${DRIVE}1"; mdadm --examine /dev/${DRIVE}1; done

Ok. And destruction time:

for DRIVE in $DRIVES; do echo "Trying /dev/${DRIVE}"; wipefs -a -f /dev/${DRIVE}; done

for DRIVE in $DRIVES; do echo "Trying /dev/${DRIVE}1"; mdadm --zero-superblock /dev/${DRIVE}1; done

Apparently the system is clean, but still I cannot provision, and /dev/md127 respaws and reappears all the time.

After googling and not finding anything about this problem, and my colleagues no having clue about what is causing this, I just proceed with a simple solution, as I need the Server for my company completing the tests in the next 24 hours.

So I create the file /etc/mdadm.conf with this content:

[root@draid-08 ~]# cat /etc/mdadm.conf 
AUTO -all

After that I rebooted the Server and I saw the infamous /dev/md127 is not there and I’m able to provision.

I share the solution as it may help other people.

ZFS Improving iSCSI performance for Block Devices (trick for Volumes)

ZFS has a performance problem with the zvol volumes.

Even using a ZIL you will experience low speed when writing to a zvol through the Network.

Even locally, if you format a zvol, for example with ext4, and mount locally, you will see that the speed is several times slower than the native ZFS filesystem.

zvol volumes are nice as they support snapshots and clone (from the snapshot), however too slow.

Using a pool with Spinning Drives and two SSD SLOG devices in mirror, with a 40Gbps Mellanox NIC accessing a zvol via iSCSI, with ext4, from the iSCSI Initiator, you can be copying Data at 70 MB/s, so not even saturating the 1Gbps.

The trick to speed up this consist into instead of using zvols, creating a file in the ZFS File System, and directly share it through iSCSI.

This will give 4 times more speed, so instead of 70MB/s you would get 280MB/s.

Creating a compressed filesystem with Linux and ZFS

Many times it could be very convenient to have a compressed filesystem, so a system that compresses data in Real Time.

This not only reduces the space used, but increases the IO performance. Or better explained, if you have to write to disk 1GB log file, and it takes 5 seconds, you have a 200MB/s performance. But if you have to write 1GB file, and it takes 0.5 seconds you have 2000MB/s or 2GB/s. However the trick in here is that you really only wrote 100MB, cause the Data was compressed before being written to the disk.

This also works for reading. 100MB are Read, from Disk, and then uncompressed in the memory (using chunks, not everything is loaded at once), assuming same speed for Reading and Writing (that’s usual for sequential access on SAS drives) we have been reading from disk for 0.5 seconds instead of 5. Let’s imagine we have 0.2 seconds of CPU time, used for decompressing. That’s it: 0.7 seconds versus 5 seconds.

So assuming you have installed ZFS in your Desktop computer those instructions will allow you to create a ZFS filesystem, compressed, and mount it.

ZFS can create pools using disks, partitions or other block devices, like regular files or loop devices.

# Create the File that will hold the Filesystem, 1GB

root@xeon:/home/carles# dd if=/dev/zero of=/home/carles/compressedfile.000 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.621923 s, 1.7 GB/s

# Create the pool

zpool create compressedpool /home/carles/compressedfile.000

# See the result

# If you don’t have automount set, then set the mountpoint

zpool set compressedpool mountpoint=/compressedpool

# Set the compression. LZ4 is fast and well balanced

zfs set compression=lz4 compressedpool

# Push some very compressible 1GB file. Don’t use just 0s as this is optimized :)

# Myself I copied real logs

ls -al --block-size=M *.log
-rw------- 1 carles carles 1329M Sep 26 14:34 messages.log
root@xeon:/home/carles# cp messages.log /compressedpool/

Even if the pool only had 1GB we managed to copy 1.33 GB file.

Then we check and only 142MB are being used for real, thanks to the compression.

root@xeon:/home/carles# zfs list
NAME USED AVAIL REFER MOUNTPOINT
compressedpool 142M 738M 141M /compressedpool
root@xeon:/home/carles# df /compressedpool
Filesystem 1K-blocks Used Available Use% Mounted on
compressedpool 899584 144000 755584 17% /compressedpool

By default ZFS will only import the pools that are based on drives, so in order to import your pool based on files after you reboot or did zfs export compressedpool, you must specify the directory:

zpool import -d /home/carles compressedpool

 

You can also create a pool using several files from different hard drives. That way you can create mirror, RAIDZ1, RAIDZ2 or RAIDZ3 and not losing any data in that pool based on drives in case you loss a physical drive.

If you use one file in several hard drived, you are aggregating the bandwidth.