Category Archives: Hardware

News from the Blog 2020-11-11

  • The latest drive enclosures have been a bit of a fiasco.

The small one (6 drives) fits perfectly in one ATX Bay, however, the SAS SSD are too height to fit.

I fit 1 SATA3 SSD 1TB and 4 SATA3 HDD 2TB.

The other one, the 3 Bay 5.25″ SAS/SATA enclosure for 12 drives did not fit in the Corsair Obsidian Series 750D case, and I had to install it outside. Doing a DIY, as I explain in my book about assembling, fixing and upgrading your own PCs and laptops.

However the 12 Gbps SAS SSD were returning Checksum errors in ZFS when I did copy information or I ran scrub. I’m afraid the enclosure can only provide 6 Gbps at max, or a poor connection. Cables or expanders use to be the reason. I ordered new cables to make a direct connection to the HBA Controller without the enclosure to validate my theory and the drives stopped showing errors.

There is something good in all bad: I have been able to document and explain how to troubleshoot, actual errors in ZFS, in my book and talk about the problems with the cables, and the advantages of using a SAS controller even if you use SATA drives.

  • I got my first Excellent in an Assignment in an Ireland university, which makes me specially happy. And I keep going on studying in Linux Academy, the last course I did was GCP and Terraform, even if I knew both it helps me to keep my skills sharp.
  • I share with you some offers and charity bundles that I enrolled and enjoyed a lot:

There is an offer with Microsoft Pass which is that we can use Disney+ for free during 30 days.

  1. I started watching the Mandalorian, Season 2, and is wonderfully displayed in 4K.
    The quality of the video surprised me. Not that many contents in Netflix are 4K and I really enjoyed the great quality of the image.
  2. Humble Bundle offers a pack of 8 VR games per €13.45.
    If you like Virtual Reality and have your headset, this pack is amazing, and the benefits go to charity: Movember. The games are downloaded from Steam and the pack will last for 14 days.
  3. Humble Bundle offers a pack of Java and one go books for €12.65, with a minimum of €0.84 for 3 books. Benefits go for charity: Code for America.

Don’t forget to balance how much of your contribution goes to every player.

Unfortunately by default most of the money goes to O’Reilly and Humble Tip and few to the Charity cause. You can change that from the web when going to to the payment.

News from the blog 2020-11-03

Nice articles recommended

This article talks about how at Riot Games they use Slack. Slack is really a powerful tool, and also makes the communication more human in companies with their approach and the funny icons and /giphy. I’m very serious when it comes to work but I recognize the friendly, warm, human and lovely touch these kind of animated icons bring to the conversations.

Remember that life of the SSD is different from spinning drives. I recommend to keep your backups on external spinning drives disconnected most of the time.

Operating at Scale – An Inside Look at Facebook’s Production Engineering Team

CMIPS

I’ve been working on testing performance of more configurations on Azure and GCP.

I’m also looking forward to test the AMD Ryzen™ 7 3700X, AM4, Zen 2, 8 Core, 16 Thread, 3.6GHz, 4.4GHz Turbo that is arriving to me this week.

CTOP.py v. 0.7.8 released

I closed the ticket #21 (Thank Jian!) so ensuring CTOP.py is compatible with Python 3.5 versions.

Feature requests and bugs are listed using gitlab: https://gitlab.com/carles.mateo/ctop/-/issues

My Python Combat Guide Book

I updated it the Nov-01, as I normally do, bringing more content.

I’ve been paid the royalties for he past two months and I reinvested everything (and more from my pocket) in Hardware for working with ZFS.

I was offered by an editorial in The States to publish Python Combat Guide and other of my books worldwide. I was thinking it for a while. It was very good money, translation to multiple languages and platforms and marketing and a lot of promotion, but I would had loss the rights and the Freedom I have now, like the possibility to offer discount coupons to who I want and to update the contents often. So to celebrate my decision for you, readers of the blog, during September, I provide a discounted price of $5 USD for the fist 100 sales instead of the $25 USD suggested price. Use the following link:

https://leanpub.com/pythoncombatguide/c/blog-carles-nov2020

ZFS progress

As part of my effort to contributing with nice Open Source products to the Community I have made some investments to keep contributing to:

  • OpenZFS
  • My old tool for managing ZFS and Network shares easily

I’m writing a new book about managing ZFS for Small Business too, so I show how to operate on this hardware, good points and downsides.

I’m assembling a new Pc with ZFS plenty of Disk Storage within a mix of:

  • SAS Enterprise grade SSD 2.5″
  • SATA 12Gb Enterprise grade SSD 2.5″
  • SATA SSD 2.5″
  • SATA HDD 2TB 2.5″
  • SATA HDD 2TB 3.5″

I’m a big fan of Intel, but this time I have chosen AMD. Concretely a AMD Ryzen 7 3700X AM4 8 Core / 16 Threads, 3.6 GHz to 4.4 GHz with Turbo. The reason I chose this CPU is because it only uses 65W but still has 8 Cores / 16 Threads.

Also I want to see the performance of this AMD Ryzen with CMIPS and another important reason is that AMD motherboards support PCI 4.0. I have bought a NVMe SSD Samsung 980 PRO PCI 4.0 (x4) able to read at 6,400 MB/s. I will use this AMD box for running VMs as well. Basically Virtual Box and Docker.

I’ve been surprised that for 169.99 GBP I can have a very good Asus Motherboard with a 2.5 Gb Ethernet: ASUS ROG STRIX B550-F GAMING, AMD B550, AM4, DDR4, PCIe 4.0, SATA3, Dual M.2, CrossFire, 2.5GbE, USB 3.2 Gen2 A+C, ATX.

In order to have an Asus motherboard with a 2.5 Gb Ethernet for Intel I had to jump to a 254 GBP motherboard and Intel is still PCI 3.0. Actually there are PCI 10Gb NICs at 80 GBP so at some point I’ll upgrade my home network from Gigabit to 10 Gb. That will come slowly, but if the new equipment I assemble has 2.5 Gb when I upgrade the main switches to 10 Gb, at least I’ll be able to communicate at 2.5 Gb without ant additional change.

Also memory at 3200, speed that the AMD motherboard can provide, is more than affordable.

This new server will have 64 GB of RAM (Corsair DDR4 Vengeance PC4-25600 (3200)), as I plan to run VMs and use Volumes mounted via iSCSI and locally as block devices to improve my Software. I’ve bought a new UPS to keep it running in case power goes down. That’s something that doesn’t happen often in my city in Ireland, honestly, but I never forget that this happens in Barcelona two or three times per year, and that a high tension spike can burn your motherboard, drives, or electronics like the TV or the fridge. I’ve bought as well a new KVM Switch, a HDMI 4K and USB too one, so I don’t have to have so many keyboards. My logitech M720 allowed me to use it with 3 computers, but still I want something more operational. The KVM I bought allow me to switch with a button or within a hotkey in the keyboard.

I bought a new Icy box fox handling 6 2.5 drives in just one bay of the tower, and a 850 Watt Corsair PSU that will be able to power the many drives I want at the same time.

More books coming

I started two new books:

Those can be purchased while I’m still working on them and get the updates that I’ll be publishing and keeping a communication with me about doubts or improvements.

Halloween Software Offers

I saw some Halloween offers and I purchased Software licenses for Software I use.

Backup Guard is one of the products I registered:

https://backup-guard.com/

I contribute a lot to Open Source, and many years ago before Open Source existed I was creating Freeware Software. But I think that good commercial Software deserves to be supported. Like everything in life, if they are doing a good work that is useful to me, why not giving them support?. It is also a way to make sure they will continue producing amazing Software. And in the other hand, myself, I create Software. Some times commercial Software, and I like to be paid, so I apply the same principle.

News from the blog 2020-10-16

  • I’ve been testing and adding more instances to CMIPS. I’m planning on testing the Azure instance with 120 cores.
  • News: Microsoft makes an option to permanently remote work

https://www.bbc.com/news/business-54482245

  • One of my colleagues showed me dstat, a very nice tool for system monitoring, and bandwidth of a drive monitoring. Also ifstat, as complement to iftop is very cool for Network too. This functionality is also available in CTOP.py
  • As I shared in the past news of the blog, I’m resuming my contributions to ZFS Community.

Long time ago I created some ZFS tools that I want to share soon as Open Source.

I equipped myself with the proper Hardware to test on SAS and SATA:

  • 12G Internal PCI-E SAS/SATA HBA RAID Controller Card, Broadcom’s SAS 3008, compatible for SAS 9300-8I.
    This is just an HDA (Host Data Adapter), it doesn’t support RAID. Only connects up to 8 drives or 1024 through expander, to my computer.
    It has a bandwidth of 9,600 MB/s which guarantees me that I’ll be able to add 12 SAS SSD Enterprise grade at almost the max speed of the drives. Those drives perform at 900 MB/s so if I’m using all of them at the same time, like if I have a pool of 8 + 3 and I rebuild a broken drive or I just push Data, I would be using 12×900 = 10,800 MB/s. Close. Fair enough.
  • VANDESAIL Mini-SAS Cables, 1m Internal Mini-SAS to 4x SAS SATA Forward Breakout Cable Hard Drive Data Transfer Cable (SAS Cable).
  • SilverStone SST-FS212B – Aluminium Trayless Hot Swap Mobile Rack Backplane / Internal Hard Drive Enclosure for 12x 2.5 Inch SAS/SATA HDD or SSD, fit in any 3x 5.25 Inch Drive Bay, with Fan and Lock, black
  • Terminator is here.
    I ordered this T-800 head a while ago and finally arrived.

Finally I will have my empty USB keys located and protected. ;)

Remember to be always nice to robots. :)

Adding a RAMDISK as SLOG ZIL to ZFS

If you use ZFS with spinning drives and you share iSCSI, you will need to use a SLOG device for ZIL otherwise you’ll see your iSCSI connections interrupted.

What is a ZIL?

  • ZIL: Acronym for ZFS Intended Log. Logs synchronous operations to disk
  • SLOG: Acronym for (S)eperate (LOG) Device

In ZFS Data is first written and stored in-memory, then it’s flushed to drives. This can take 10 seconds normally, a bit more in certain occasions.

So without SLOG it can happen that if a power loss occurs, you may loss the last 10 seconds of Data submitted.

The SLOG device brings security that if there is a power loss, after remounting the pool, the information in the SLOG, acknowledged to iSCSI clients, is not lost and flushed to the Hard drives conforming the pool. Basically this device keeps the writings that come from network and flushes to the Hard drives and then clears this data from the SLOG.

The SLOG also allows ZFS to sort how the transactions will be written, to do in a more efficient way.

Normally I’m describing configurations with a fast device for SLOG ZIL, like one or a pair of NVMe drive or SAS SSD, most commonly in mirror a pool of 12 HDD drives or more SAS preferentially, maybe SATA, with 14TB or more each.

As the SLOG device will persist your Data if there is a power off, and submit to the pool the accepted transactions, it is clear that you cannot spare yourself from having a SLOG ZIL device (or better a mirror). It is needed to bring security when remotely writing.

But what happens if we have a kind of business where we don’t care about that the last 10 seconds writings may be lost? (ZFS will never get corrupted due to its kinda journal system), just because we are filling a Server the fastest possible, migrating from another, or because we are running workouts that can be retaken is some data is lost… do we really need to have the speed constrain of an SSD?. Examples are a Hadoop node, or a SETI@Home client. Tasks will be resumed if something failed.

Or maybe you fill your servers with sync=always, so writing it’s safe, and then you use them only for read, or for a Statics Internet Caches (CDNs like Akamai, Cloudfare…) or you use it for storing Backups, write once read many. You don’t really need the constraint speed of a ZIL running at 800 MB/s.

Let me put in another way, we have 2 NIC 100Gbps, in bonding, so 200Gbps (equivalent to (25GB/s Gigabytes per second), 90 HDD drives that can work in parallel up to 250 MB/s each (22.5GB/s) and our Server has a pair or SAS SSD ZIL in mirror, that writes at 900 MB/s (Megabytes per second, so 0.9 GB/s), so our bottleneck or constraint is the SLOG ZIL.

Adding one RAMDISK, or better two RAMDISKs in mirror, we can get to much more highers speeds. I cannot tell you how much, but in my tests with regular configurations (8D+3P) I was achieving more than 2 GB (Gigabytes) per second sustained of Data to the pool. Take in count that the speed writing to the pool does not only depend on the speed on the ZIL, and the speed of the HDD spinning drives (slow, between 100 and 250 MB/s), but also about the config of the pool (number of vdevs, distributions of data and parity drives) and the throughput of your IOC (Input Output Controller), and the number of them.

Live real scenarios use to be more in the line of having 2x10GbpE cards, combined in bonding making 20Gbps, so being able to transmit 2.5GB/s. So to get the max speed of our Network this Ramdrive will do it. Also NVMe devices used as ZIL will do it.

The problem with the NVMe is that they are connected to the PCI Express bus, and so they are not hot swap. If one dies, you cannot replace without stopping the Server.

The problem with the SSD is that they are not made for writing, they will die, so you need at least a mirror and for heavy IO I strongly recommend you to go with Enterprise grade SAS SSD drives. Those are made to last.

SSD Enterprise grade are double price versus one common SSD, but that peace of mind and extra lasting is worth it. And you don’t need a very big device, only has to hold 10 seconds of Data at max speed. So if you can ingest Data through the Network at 20 Gbps (2.5GB/s) you only need approximately 25 GB of space of the SLOG. 50 GB if you want to be more than safe.

Also you can use partitions instead of complete devices for the SLOG (like for the ZFS pool, where you can add complete drives, or partitions).

If you write locally, and you have 4 IOC’s capable of delivering 8 GB/s each, and you write to a Dataset to the pool, and not to a ZVOL which are slow by nature, you can get astonishing combined speed writing to the drives. If you are migrating a Server to another new, where you can resume if power goes down, then it’s safe to disable sync (set async) while this process runs, and turn sync on when going live to production. If you use async you don’t need to use a SLOG.

4 IOC’s able to deliver 8 GB/s are enough to provide sustained speed to 90 HDD SAS drives. 90x200MB/s=18GB/s required at max speed or 90x250MB/s=22.5GB/s.

The HDD drives provide different speeds in the inner and in the outer areas of the drive, so normally those drives up to 8TB perform between 100 and 200 MB/s, and the drives from 10TB SAS to 14TB SAS perform between 145 and 250 MB/s. I cannot tell about the 16 TB as I’ve not tested them.

The instructions to set a Ramdrive and to assign to a pool are like this:

#!/usr/bin/env bash
RAM_GB=1
RAM_DRIVE_SIZE_IN_BYTES=$((RAM_GB*1048576))

if [[ $(id -u) -ne 0 ]] ; then
    echo "Please run as root"
    exit 1
fi

modprobe brd rd_nr=1 rd_size=${RAM_DRIVE_SIZE_IN_BYTES} max_part=0

echo "Use it like: zpool add carlespool log ram0"

If you created more than one Ramdisk you can add a mirror for the slog to the pool with:

zpool add carlespool log mirror /dev/ram0 /dev/ram1

You can partition the Ramdrive and add a partition but we want to add the whole ram device.

Obviously you cannot put other things to that Ramdisk (like the Metadata) as you need persistence for that.

In any case, please, avoid JBODs loaded of big HDD drives with low bandwidth micro SATA like 3Gbps per channel to the Server, and RAID. The bandwidth is too low. Your rebuilds will take forever.

With ZFS you’ll resilver (rebuild) only the actual data, not the whole drive.

iostat_bandwitdth.sh – Utility to calculate the bandwidth used by all your drives

This is a shell script I made long time ago and I use it to monitor in real time what’s the total or individual bandwidth and maximum bandwidth achieved, for READ and WRITE, of Hard drives and NMVe devices.

It uses iostat to capture the metrics, and then processes the maximum values, the combined speed of all the drives… has also an interesting feature to let out the booting device. That’s very handy for Rack Servers where you boot from an SSD card or and SD, and you want to monitor the speed of the other (SAS probably) devices.

I used it to monitor the total bandwidth achieved by our 4U60 and 4U90 Servers, the All-Flash-Arrays 2U and the NVMe 1U units in Sanmina and the real throughput of IOC (Input Output Controllers).

I used also to compare what was the real data written to ZFS and mdraid RAID systems, and to disks and the combined speed with different pool configurations, as well as the efficiency of iSCSI and NFS from clients to the Servers.

You can specify how many times the information will be printed, whether you want to keep the max speed of each device per separate, and specify a drive to exclude. Normally it will be the boot drive.

If you want to test performance metrics you should make sure that other programs are not running or using the swap, to prevent bias. You should disable the boot drive if it doesn’t form part of your tests (like in the 4U60 with an SSD boot drive in a card, and 60 hard drive bays SAS or SATA).

You may find useful tools like iotop.

You can find the code here, and in my gitlab repo:

https://gitlab.com/carles.mateo/blog.carlesmateo.com-source-code/-/blob/master/iostat_bandwidth.sh

#!/usr/bin/env bash

AUTHOR="Carles Mateo"
VERSION="1.4"

# Changelog
# 1.4
# Added support for NVMe drives
# 1.3
# Fixed Decimals in KB count that were causing errors
# 1.2
# Added new parameter to output per drive stats
# Counting is performed in KB

# Leave boot device empty if you want to add its activity to the results
# Specially thinking about booting SD card or SSD devices versus SAS drives bandwidth calculation.
# Otherwise use i.e.: s_BOOT_DEVICE="sdcv"
s_BOOT_DEVICE=""
# If this value is positive the loop will be kept n times
# If is negative ie: -1 it will loop forever
i_LOOP_TIMES=-1
# Display all drives separatedly
i_ALL_SEPARATEDLY=0
# Display in KB or MB
s_DISPLAY_UNIT="M"

# Init variables
i_READ_MAX=0
i_WRITE_MAX=0
s_READ_MAX_DATE=""
s_WRITE_MAX_DATE=""
i_IOSTAT_READ_KB=0
i_IOSTAT_WRITE_KB=0

# Internal variables
i_NUMBER_OF_DRIVES=0
s_LIST_OF_DRIVES=""
i_UNKNOWN_OPTION=0

# So if you run in screen you see colors :)
export TERM=xterm

# ANSI colors
s_COLOR_RED='\033[0;31m'
s_COLOR_BLUE='\033[0;34m'
s_COLOR_NONE='\033[0m'

for i in "$@"
do
    case $i in
        -b=*|--boot_device=*)
        s_BOOT_DEVICE="${i#*=}"
        shift # past argument=value
        ;;
        -l=*|--loop_times=*)
        i_LOOP_TIMES="${i#*=}"
        shift # past argument=value
        ;;
        -a=*|--all_separatedly=*)
        i_ALL_SEPARATEDLY="${i#*=}"
        shift # past argument=value
        ;;
        *)
        # unknown option
        i_UNKNOWN_OPTION=1
        ;;
    esac
done

if [[ "${i_UNKNOWN_OPTION}" -eq 1 ]]; then
    echo -e "${s_COLOR_RED}Unknown option${s_COLOR_NONE}"
    echo "Use: [-b|--boot_device=sda -l|--loop_times=-1 -a|--all-separatedly=1]"
    exit 1
fi

if [ -z "${s_BOOT_DEVICE}" ]; then
    i_NUMBER_OF_DRIVES=`iostat -d -m | grep "sd\|nvm" | wc --lines`
    s_LIST_OF_DRIVES=`iostat -d -m | grep "sd\|nvm" | awk '{printf $1" ";}'`
else
    echo -e "${s_COLOR_BLUE}Excluding Boot Device:${s_COLOR_NONE} ${s_BOOT_DEVICE}"
    # Add an space after the name of the device to prevent something like booting with sda leaving out drives like sdaa sdab sdac...
    i_NUMBER_OF_DRIVES=`iostat -d -m | grep "sd\|nvm" | grep -v "${s_BOOT_DEVICE} " | wc --lines`
    s_LIST_OF_DRIVES=`iostat -d -m | grep "sd\|nvm" | grep -v "${s_BOOT_DEVICE} " | awk '{printf $1" ";}'`
fi

AR_DRIVES=(${s_LIST_OF_DRIVES})
i_COUNTER_LOOP=0
for s_DRIVE in ${AR_DRIVES};
do
    AR_DRIVES_VALUES_AVG[i_COUNTER_LOOP]=0
    AR_DRIVES_VALUES_READ_MAX[i_COUNTER_LOOP]=0
    AR_DRIVES_VALUES_WRITE_MAX[i_COUNTER_LOOP]=0
    i_COUNTER_LOOP=$((i_COUNTER_LOOP+1))
done


echo -e "${s_COLOR_BLUE}Bandwidth for drives:${s_COLOR_NONE} ${i_NUMBER_OF_DRIVES}"
echo -e "${s_COLOR_BLUE}Devices:${s_COLOR_NONE} ${s_LIST_OF_DRIVES}"
echo ""

while [ "${i_LOOP_TIMES}" -lt 0 ] || [ "${i_LOOP_TIMES}" -gt 0 ] ;
do
    s_READ_PRE_COLOR=""
    s_READ_POS_COLOR=""
    s_WRITE_PRE_COLOR=""
    s_WRITE_POS_COLOR=""
    # In MB
    # s_IOSTAT_OUTPUT_ALL_DRIVES=`iostat -d -m -y 1 1 | grep "sd\|nvm"`
    # In KB
    s_IOSTAT_OUTPUT_ALL_DRIVES=`iostat -d -y 1 1 | grep "sd\|nvm"`
    if [ -z "${s_BOOT_DEVICE}" ]; then
        s_IOSTAT_OUTPUT=`printf "${s_IOSTAT_OUTPUT_ALL_DRIVES}" | awk '{sum_read += $3} {sum_write += $4} END {printf sum_read"|"sum_write"\n"}'`
    else
        # Add an space after the name of the device to prevent something like booting with sda leaving out drives like sdaa sdab sdac...
        s_IOSTAT_OUTPUT=`printf "${s_IOSTAT_OUTPUT_ALL_DRIVES}" | grep -v "${s_BOOT_DEVICE} " | awk '{sum_read += $3} {sum_write += $4} END {printf sum_read"|"sum_write"\n"}'`
    fi

    if [ "${i_ALL_SEPARATEDLY}" -eq 1 ]; then
        i_COUNTER_LOOP=0
        for s_DRIVE in ${AR_DRIVES};
        do
            s_IOSTAT_DRIVE=`printf "${s_IOSTAT_OUTPUT_ALL_DRIVES}" | grep $s_DRIVE | head --lines=1 | awk '{sum_read += $3} {sum_write += $4} END {printf sum_read"|"sum_write"\n"}'`
            i_IOSTAT_READ_KB=`printf "%s" "${s_IOSTAT_DRIVE}" | awk -F '|' '{print $1;}'`
            i_IOSTAT_WRITE_KB=`printf "%s" "${s_IOSTAT_DRIVE}" | awk -F '|' '{print $2;}'`
            if [ "${i_IOSTAT_READ_KB%.*}" -gt ${AR_DRIVES_VALUES_READ_MAX[i_COUNTER_LOOP]%.*} ]; then
                AR_DRIVES_VALUES_READ_MAX[i_COUNTER_LOOP]=${i_IOSTAT_READ_KB}
                echo -e "New Max Speed Reading for ${s_COLOR_BLUE}$s_DRIVE${s_COLOR_NONE} at ${s_COLOR_RED}${i_IOSTAT_READ_KB} KB/s${s_COLOR_NONE}"
            echo
            fi
            if [ "${i_IOSTAT_WRITE_KB%.*}" -gt ${AR_DRIVES_VALUES_WRITE_MAX[i_COUNTER_LOOP]%.*} ]; then
                AR_DRIVES_VALUES_WRITE_MAX[i_COUNTER_LOOP]=${i_IOSTAT_WRITE_KB}
                echo -e "New Max Speed Writing for ${s_COLOR_BLUE}$s_DRIVE${s_COLOR_NONE} at ${s_COLOR_RED}${i_IOSTAT_WRITE_KB} KB/s${s_COLOR_NONE}"
            fi

            i_COUNTER_LOOP=$((i_COUNTER_LOOP+1))
        done
    fi

    i_IOSTAT_READ_KB=`printf "%s" "${s_IOSTAT_OUTPUT}" | awk -F '|' '{print $1;}'`
    i_IOSTAT_WRITE_KB=`printf "%s" "${s_IOSTAT_OUTPUT}" | awk -F '|' '{print $2;}'`

    # CAST to Integer
    if [ "${i_IOSTAT_READ_KB%.*}" -gt ${i_READ_MAX%.*} ]; then
        i_READ_MAX=${i_IOSTAT_READ_KB%.*}
        s_READ_PRE_COLOR="${s_COLOR_RED}"
        s_READ_POS_COLOR="${s_COLOR_NONE}"
        s_READ_MAX_DATE=`date`
        i_READ_MAX_MB=$((i_READ_MAX/1024))
    fi
    # CAST to Integer
    if [ "${i_IOSTAT_WRITE_KB%.*}" -gt ${i_WRITE_MAX%.*} ]; then
        i_WRITE_MAX=${i_IOSTAT_WRITE_KB%.*}
        s_WRITE_PRE_COLOR="${s_COLOR_RED}"
        s_WRITE_POS_COLOR="${s_COLOR_NONE}"
        s_WRITE_MAX_DATE=`date`
        i_WRITE_MAX_MB=$((i_WRITE_MAX/1024))
    fi

    if [ "${s_DISPLAY_UNIT}" == "M" ]; then
        # Get MB
        i_IOSTAT_READ_UNIT=${i_IOSTAT_READ_KB%.*}
        i_IOSTAT_WRITE_UNIT=${i_IOSTAT_WRITE_KB%.*}
        i_IOSTAT_READ_UNIT=$((i_IOSTAT_READ_UNIT/1024))
        i_IOSTAT_WRITE_UNIT=$((i_IOSTAT_WRITE_UNIT/1024))
    fi

    # When a MAX is detected it will be displayed in RED
    echo -e "READ  ${s_READ_PRE_COLOR}${i_IOSTAT_READ_UNIT} MB/s ${s_READ_POS_COLOR} (${i_IOSTAT_READ_KB} KB/s) Max: ${i_READ_MAX_MB} MB/s (${i_READ_MAX} KB/s) (${s_READ_MAX_DATE})"
    echo -e "WRITE ${s_WRITE_PRE_COLOR}${i_IOSTAT_WRITE_UNIT} MB/s ${s_WRITE_POS_COLOR} (${i_IOSTAT_WRITE_KB} KB/s) Max: ${i_WRITE_MAX_MB} MB/s (${i_WRITE_MAX} KB/s) (${s_WRITE_MAX_DATE})"
    if [ "$i_LOOP_TIMES" -gt 0 ]; then
        i_LOOP_TIMES=$((i_LOOP_TIMES-1))
    fi
done

Troubleshooting a shell prompt irresponsible that locks/hangs intermittently

You do df -h or ls / and the terminal freezes and not even CTRL + C works, you have a lock.

Normally this is due to a lock of the system trying to perform an IO.

Could be a physical spinning disk failing, but the most probably nowadays is that you have a network mount point and it is timing out.

If you execute mount and you get a timeout, and when you finally see the list you see a NFS, iSCSI or another kind of Network mount (you will see an Ip Address), check for errors.

To do this in CentOS/RHEL you can do as root:

dmesg | grep -i "timed"

or depending on the System

cat /var/log/messages | grep -i "timed"

You’ll get something like this:

[root@compute01 carles]# dmesg -T | grep timed | head -n5
[Fri Mar 20 02:27:44 2020] nfs: server storage07 not responding, timed out
[Fri Mar 20 02:27:44 2020] nfs: server storage07 not responding, timed out
[Fri Mar 20 02:27:44 2020] nfs: server storage07 not responding, timed out
[Fri Mar 20 02:27:44 2020] nfs: server storage07 not responding, timed out
[Fri Mar 20 02:27:45 2020] nfs: server storage07 not responding, timed out

Please note I use dmesg -T in order to have human readable date instead of Unix Epoch.

You can count the errors today:

[root@compute01 carles]# dmesg -T | grep time | grep "Mon Apr 6" | wc --lines
3123

Datacenters, D&R and coronavirus

I’ve been working for years within Data centers, with D&R strategies, and then in the middle of COVID-19, with huge demands on increments of bandwidth and compute, some DCs decided to do not allow in the Engineers of their customers.

As somebody that had my own Startup and CSP and had infrastructure in DCs and servers from customers in colocation, and has replaced Hw components at 1AM, replaced drives from broken RAIDs, and fixed systems so many times inside so many Datacenters across the world, I’m shocked about that.

I understand health reasons can be argued, but I still have Servers in Datacenters because we all believed they were the most safe place, prepared for disaster and recovery, with security, 24×7… and now, one realise that cannot enter to fix or upgrade the own machines.
Please note, still you can use the remote hands from the DC, although this is not a good idea many times, I’m not sure this will still be an available option when the lock down in those countries becomes more strict.

I’m wondering if DCs current model have any future at all.

I think most of the D&R strategies from now will be in the cloud, in different regions, with different providers, so companies can resist providers or governments letting them down.

Media Player in my Raspberry Pi 4

Just installed a media player in my Raspberry Pi 4

So I mentioned it was one of my pending tasks, to do while I’m confined here, at home, to help the Irish government to stop the quick spread of the coronavirus.

I’m happy that the situation in Ireland has stabilized, unlikely in Spain, where that historical lack of discipline and selfishness and super ego to believe Madrid the capital of the world, and so deciding not to close it for quarantine, will cause a lot of pain. I hope the closing of frontiers in Catalonia works.

Well, what I do you’re probably asking yourself, so I installed LibreELEC https://libreelec.tv/.

They have a very nice SD image writer for Linux, Mac and Windows, that will install the proper image on the micro-SD for your ARM device.

This Raspberry Pi 4 comes with Wifi integrated and a Gigabit Ethernet network port.

When I was in Barcelona, I had Kodi with Raspberry pi 2 and version 3.

This model v. 4 is much more cooler. I bought the 4GB version, and has 2xHDMI 4K.

So it is great to connect to any modern TV.

In Barcelona, I have Linux tower as NFS Server sharing my files with the Pi. Work good, even for the 100Mbit NIC of the version 3, but at that time I was only playing Full HD as the Pi didn’t supported greater resolution, and I only had that resolution on my displays too.

For now, I’m going to explore how is reading from a USB 3.0. Let’s see if it’s able to play smoothly.

The cool thing also is that I have SSH access, and so I can use the Pi for many more things. :)

I have my first update, I noticed that copying to that USB was not the best for me, as I tried to copy a .MKV file of 4.9GB and I encountered the limit of 4GB of FAT32. I could format the USB as ext4, but what I did is, SSH into the box, I see that I have two partitions on the SD for booting the Pi, the second one is a ext4 called storage. So I copied to the SD, through the network, using sftp the file I wanted.

The Gigabit connection was fast, but when the buffer fulled it started to show the real speed of the SD which is 15MB/s for writing.

Ext4 has no problem in holding a file 4.9GB so I’m watching my movie now. Will think about setting a NFS for the Pi as it will be very convenient. :)

I have an external, remote, keyboard logitech, but it happens that LibreELEC recognizes my Sony command, from the television. I don’t need the keyboard/mouse. Nice.

Here you can see my Raspberry Pi 4, connected to TV, in “combat mode”, naked, as PoC, before setting in its definitive place behind the TV.

Playing from the external USB 3.0 stick was also fluid, allowing 4K perfectly.

The only problem I has was when I was pushing movies to the USB through the network, and playing at the same time from the SD. It seems like the Raspberry reached its limits doing this and playing stuck frequently.

Upgrading my new HP 14-bp060sa

As the company I was working for, Sanmina, has decided to move all the Software Development to Colorado, US, and closing the offices in Bishopstown, Cork, Ireland I found myself with the need to get a new laptop. At work I was using two Dell laptops, one very powerful and heavy equipped with an Intel Xeon processor and 32 GB of RAM. The other a lightweight one that I updated to 32 GB of RAM.

I had an accident around 8 months ago, that got my spine damaged, and so I cannot carry much weight.

My personal laptops at home, in Ireland are a 15″ with 16 GB of RAM, too heavy, and an Acer 11,6″ with 8GB of RAM and SSD (I upgraded it), but unfortunately the screen crashed. I still use it through the HDMI port. My main computer is a tower with a Core i7, 64GB of RAM and a Samsung NVMe SSD drive. And few Raspberrys Pi 4 and 3 :)

I was thinking about what ultra-lightweight laptop to buy, but I wanted to buy it in Barcelona, as I wanted a Catalan keyboard (the layout with the broken ç and accents). I tried by Amazon.es but I have problems to have shipped the Catalan keyboard layout laptops to my address in Ireland.

I was trying to find the best laptop for me.

While I was investigating I found out that none of the laptops in the market were convincing me.

The ones in around 1Kg, which was my initial target, were too big, and lack a proper full size HDMI port and Gigabit Ethernet. Honestly, some models get the HDMI or the Ethernet from an USB 3.1, through an adapter, or have mini-HDMI, many lack the Gigabit port, which is very annoying. Also most of the models come with 8GB of RAM only and were impossible to upgrade. I enrolled my best friend in my quest, in the research, and had the same conclusions.

I don’t want to have to carry adapters with me to just plug to a monitor or projector. I don’t even want to carry the power charger. I want a laptop that can work with me for a complete day, a full work session, without needing to recharge.

So while this investigation was going on, I decided to buy a cheap laptop with a good trade off of weight and cost, in order to be able to work on the coffee. I needed it for writing documents in Google Docs, creating microservices architectures, programming in Java and PHP, and writing articles in my blog. I also decided that this would be my only laptop with Windows, as honestly I missed playing Star Craft 2, and my attempts with Wine and Linux did not success.

Not also, for playing games :) , there are tools that are only available for Windows or for Mac Os X and Windows, like: POSTMAN, Kitematic for managing dockers visually, vSphere…

(Please note, as I reviewed the article I realized that POSTMAN is available for Linux too)

Please note: although I use mainly Linux everywhere (Ubuntu, CentOS, and RedHat mainly) and I contribute to Open Source projects, I do have Windows machines.

I created my Start up in 2004, and I still have Windows Servers, physical machines in a Data Center in Barcelona, and I still have VMs and Instances in Public Clouds with Windows Servers. Also I programmed some tools using Visual Studio and Visual Basic .NET, ASP.NET and C#, but when I needed to do this I found more convenient spawn an instance in Amazon or Azure and pay for its use.

When I created my Start up I offered my infrastructure as a way to get funding too, and I offered VMs with VMWare. I found that having my Mail Servers in VMs was much more convenient for Backups, cloning, to scale up, to avoid disruption and for Disaster and Recovery.

I wanted a cheap laptop that will not make feel bad if transporting it in a daily basis gets a hit and breaks, or that if it rains (and this happens more than often in Ireland) and it breaks is not super-hurtful, or even if it gets stolen. Yes, I’m from a big city, like is Barcelona, Catalonia, and thieves are a real problem. I travel, so I want a laptop decent enough that I can take to travel, and for going for a coffee, coding anything, and I feel comfortable enough that if something happens to it is not the end of the world.

Cork is not a big city, so the options were reduced. I found a laptop that meets my needs.

I got a HP s14-bp060sa for 439€.

It is equipped with a Intel® Core™ i3-6006U (2 GHz, 3 MB cache, 2 cores) , a 500GB SATA HDD, and 4 GB of DDR4 RAM.

The information on HP webpage is really scarce, but checking other pages I was able to see that the motherboard has 2 memory banks, accepting a max of 16GB of RAM.

I saw that there was an slot, unclear if supporting NVMe SSD drives, but supporting M.2 SSD for sure.

So I bought in Amazon 2x8GB and a M.2 500GB drive.

Since I was 5 years old I’ve been upgrading and assembling by myself all the computers. And this is something that I want to keep doing. It keeps me sharp, knowing the new ports, CPUs, and motherboard architectures, and keeps me in contact with the Hardware. All my life I’ve thought that specializing Software Engineers and Systems Engineers, like if computers were something separate, is a mistake, so I push myself to stay up to date of the news in all the fields.

I removed the spinning 500 GB SATA HDD, cause it’s slow and it consumes a lot of energy. With the M.2 SSD the battery last forever.

The interesting part is how I cloned the drive from the Spinning HDD to the new M.2.

I did:

  • Open the computer (see pics below) and Insert the new drive M.2
  • Boot with an USB Linux Rescue distribution (to do that I had to enable Legacy Boot on BIOS and boot with the USB)
  • Use lsblk command to identify the HDD drive, it was easy as it was the one with partitions
  • dd from if=/dev/sda to of=/dev/sdb with status=progress to see live status and speed (around 70MB/s) and estimated time to complete.
  • Please note that the new drive should be bigger or at least have the same number of bytes to avoid problems with the last partition.
  • I removed the HDD drive, this reduces the weight of my laptop by 100 grams
  • Disable Legacy Boot, and boot the computer. Windows started perfectly :)

I found so few information about this model, that I wanted to share the pictures with the Community. Here are the pictures of the upgrade process.

Here you can see the Crucial M.2 SSD installed and the Spinning HDD removed. Yes, I did in a coffee :)
Final step, installing the 2x8GB RAM memory modules

A sample forensic post mortem for a iSCSI Initiator (client) that had connectivity problems to the Server

My Team in The States report an issue with a Red Hat iSCSI Initiator having issues connecting to a Volume exported by a ZFS Server.

There is an issue on GitLab.

As I always do when I troubleshot a problem, I create a forensics post-mortem document recording everything I do, so later, others can learn how I fix it, or they can learn the steps I did in order to troubleshoot.

Please note: Some Ip addresses have been manually edited.

2019-08-09 10:20:10 Start of the investigation

I log into the Server, with Ip Address: xxx.yyy.16.30. Is an All-Flash-Array Server with RHEL6.10 and DRAID v.08091350.

Htop shows normal/low activity.

I check the addresses in the iSCSI Initiator (client), to make sure it is connecting to the right Server.

[root@Host-164 ~]# ip addr list 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:25:90:c5:1e:ea brd ff:ff:ff:ff:ff:ff
    inet xxx.yyy.13.164/16 brd xxx.yyy.255.255 scope global eno1
    valid_lft forever preferred_lft forever
    inet6 fe80::225:90ff:fec5:1eea/64 scope link
    valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:25:90:c5:1e:eb brd ff:ff:ff:ff:ff:ff 
4: enp3s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:8a:07:a4:94:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.164/24 brd 192.168.100.255 scope global enp3s0f0
    valid_lft forever preferred_lft forever
    inet6 fe80::268a:7ff:fea4:949c/64 scope link
    valid_lft forever preferred_lft forever 
5: enp3s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:8a:07:a4:94:9d brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.164/24 brd 192.168.200.255 scope global enp3s0f1
    valid_lft forever preferred_lft forever inet6 fe80::268a:7ff:fea4:949d/64 scope link
    valid_lft forever preferred_lft forever                                                                                                       
                                                                                                                                

I see the luns on the host, connecting to the 10Gbps of the Server:

[root@Host-164 ~]# iscsiadm -m session
 tcp: [10] 192.168.100.30:3260,1 iqn.2003-01.org.linux-iscsi:vol4 (non-flash)
 tcp: [11] 192.168.100.30:3260,1 iqn.2003-01.org.linux-iscsi:vol5 (non-flash)
 tcp: [7] 192.168.100.30:3260,1 iqn.2003-01.org.linux-iscsi:vol1 (non-flash)
 tcp: [8] 192.168.100.30:3260,1 iqn.2003-01.org.linux-iscsi:vol2 (non-flash)
 tcp: [9] 192.168.100.30:3260,1 iqn.2003-01.org.linux-iscsi:vol3 (non-flash)

Finding the misteries…

Executing cat /proc/partitions is a bit strange respect mount:

[root@Host-164 ~]# cat /proc/partitions
 major minor #blocks name
 8  0 125034840 sda
 8  1 512000 sda1
 8  2 124521472 sda2
 253 0 12505088 dm-0
 253 1 112013312 dm-1
 8 32 104857600 sdc
 8 16 104857600 sdb
 8 48 104857600 sdd
 8 64 104857600 sde
 8 80 104857600 sdf

As mount has this:

/dev/sdg1 on /mnt/large type ext4 (ro,relatime,seclabel,data=ordered)

Lsblk shows that /dev/sdg is not present:

[root@Host-164 ~]# lsblk
 NAME
 MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
 sda 8:0 0 119.2G 0 disk
 ├─sda1 8:1 0 500M 0 part /boot
 └─sda2 8:2 0 118.8G 0 part
  ├─rhel-swap 253:0 0 11.9G 0 lvm [SWAP]
  └─rhel-root 253:1 0 106.8G 0 lvm /
 sdb 8:16 0 100G 0 disk
 sdc 8:32 0 100G 0 disk
 sdd 8:48 0 100G 0 disk
 sde 8:64 0 100G 0 disk
 sdf 8:80 0 100G 0 disk

And as expected:

[root@Host-164 ~]# ls -al /mnt/large
 ls: reading directory /mnt/large: Input/output error
 total 0

I see that the Volumes appear to not having being partitioned:

[root@Host-164 ~]# fdisk /dev/sdf
 Welcome to fdisk (util-linux 2.23.2).
 Changes will remain in memory only, until you decide to write them.
 Be careful before using the write command.
 Device does not contain a recognized partition table
 Building a new DOS disklabel with disk identifier 0xddf99f40.
 Command (m for help): p
 Disk /dev/sdf: 107.4 GB, 107374182400 bytes, 209715200 sectors
 Units = sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 512 bytes
 I/O size (minimum/optimal): 512 bytes / 512 bytes
 Disk label type: dos
 Disk identifier: 0xddf99f40
 Device Boot
 Start
 End
 Blocks Id System
 Command (m for help): q

I create a partition and format with ext2

[root@Host-164 ~]# mke2fs /dev/sdb1
 mke2fs 1.42.9 (28-Dec-2013)
 Filesystem label=
 OS type: Linux
 Block size=4096 (log=2)
 Fragment size=4096 (log=2)
 Stride=0 blocks, Stripe width=0 blocks
 6553600 inodes, 26214144 blocks
 1310707 blocks (5.00%) reserved for the super user
 First data block=0
 Maximum filesystem blocks=4294967296
 800 block groups
 32768 blocks per group, 32768 fragments per group
 8192 inodes per group
 Superblock backups stored on blocks:
 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
 4096000, 7962624, 11239424, 20480000, 23887872
 Allocating group tables: done
 Writing inode tables: done
 Writing superblocks and filesystem accounting information: done

I mount:

[root@Host-164 ~]# mount /dev/sdb1 /mnt/vol1

I fill the volume from the client, and it works. I check the activity in the Server with iostat and there are more MB/s written to the Server’s drives than actually speed copying in the client.

I completely fill 100GB but speed is slow. We are working on a 10Gbps Network so I expected more speed.

I check the connections to the Server:

[root@obs4602-1810 ~]# netstat | grep -v "unix"
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State      
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55300        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55298        ESTABLISHED
tcp        0      0 xxx.yyy.18.10:ssh            xxx.yyy.12.154:57137         ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55304        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55301        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55306        ESTABLISHED
tcp        0      0 xxx.yyy.18.10:ssh            xxx.yyy.12.154:56395         ESTABLISHED
tcp        0      0 xxx.yyy.18.10:ssh            xxx.yyy.14.52:57330          ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55296        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55305        ESTABLISHED
tcp        0      0 xxx.yyy.18.10:ssh            xxx.yyy.12.154:57133         ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55303        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55299        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.176:57542        ESTABLISHED
tcp        0      0 192.168.10.10:iscsi-target  192.168.10.180:55302        ESTABLISHED

I see many connections from Host 180, I check that and another member of the Team is using that client to test with vdbench against the Server.

This explains the slower speed I was getting.

Conclusions

  1. There was a local problem on the Host. The problems with the disconnection seem to be related to a connection that was lost (sdg). All that information was written to iSCSI buffer, not to the Server. In fact, that volume was mapped in the system with another letter, sdg was not in use.
  2. Speed was slow due to another client pushing Data to the Server too
  3. Windows clients with auto reconnect option are not reporting timeout reports while in Red Hat clients iSCSI connection timeouts. It should be increased

2020-03-10 22:16 IST TIP: At that time we were using Google suite and Skype to communicate internally with the different members across the world. If we had used a tool like Slack, and we had a channel like #engineering for example or #sanjoselab, then I could have paged and asked “Is somebody using obs4602-1810?