Installing Red Hat Linux on an M.2 drive that crashes the installer

A few months ago I ran into a problem with the RHEL installer and some M.2 drives.

I had productized my product, which was to be released with 128GB M.2 SATA drives as boot drives.

The procedure for preparing the servers (90 and 60 drives, Cold Storage) was based on installing RHEL on the 128GB M.2 drive and then cloning that drive.

A few days before mass delivery, the company requested that we replace the M.2 boot drives with our own 512GB drives.

I’ve tested many different M.2 drives and all of them were slightly different.

Those 512GB M.2 drives had one problem: the Red Hat installer failed with a Python error.

We were running out of time, so I decided to clone directly from the working 128GB M.2 drive, with everything installed, to the 512GB drive. Doing that is as easy as booting from a Rescue Linux USB disk and then running dd from the 128GB drive to the 512GB drive.

Booting from a live USB system is important: the filesystems on the drives should not be mounted while cloning, to prevent corruption.
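
As a minimal sketch of the clone-and-verify idea, here is the same dd operation on two scratch files standing in for the 128GB and 512GB drives. The helper name clone_and_verify and the file names are made up for this illustration, and GNU coreutils is assumed. On real drives you would run this from the live USB, and the source size would come from something like blockdev --getsize64 instead of wc.

```shell
#!/bin/sh
# Sketch only: clone SRC onto a larger DST, then check that the first
# <size of SRC> bytes of DST match SRC byte for byte.
clone_and_verify() {
    src="$1"
    dst="$2"
    # conv=notrunc keeps DST at its own (larger) size, like a real drive
    dd if="$src" of="$dst" bs=1M conv=notrunc status=none || return 1
    # Size of the source in bytes (works for regular files)
    src_bytes=$(wc -c < "$src")
    cmp -n "$src_bytes" "$src" "$dst"
}

# Scratch files standing in for the small and the big drive
clone_dir=$(mktemp -d)
dd if=/dev/urandom of="$clone_dir/small.img" bs=1M count=4 status=none
truncate -s 16M "$clone_dir/big.img"

clone_and_verify "$clone_dir/small.img" "$clone_dir/big.img" && echo "clone verified"
```

The space after the cloned data stays unused until the partition, the LVM volumes and the filesystem are extended.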

Then, the next operation would be booting the 512 GB drive and instructing Linux to claim the additional space.

Here is the procedure for doing it (note, the OS installed in the M.2 was CentOS in this case):

Determine the device that needs to be operated on (this will usually be the boot drive); in this example it is /dev/sdae

# df -h 
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/centos_4602c-root           50G  2.4G   47G   1% /
devtmpfs                                16G     0   16G   0% /dev
tmpfs                                   16G     0   16G   0% /dev/shm
tmpfs                                   16G  395M   16G   3% /run
tmpfs                                   16G     0   16G   0% /sys/fs/cgroup
/dev/sdae1                            1014M  146M  869M  15% /boot
/dev/mapper/centos_4602c-home           57G   33M   57G   1% /home
tmpfs                                  3.2G     0  3.2G   0% /run/user/0
logs                                    68G  7.4M   68G   1% /logs
mysql                                  481G  128K  481G   1% /mysql
N58-C3-D16-P3-S1                       491T  334G  490T   1% /N58-C3-D16-P3-S1
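
In the output above, /boot lives on /dev/sdae1, so the drive to operate on is /dev/sdae. If you want to derive the drive name from a partition name in a script, a small sketch could look like this. The function name parent_disk is hypothetical, and it only handles the common naming schemes; on a real system lsblk -no PKNAME is the more reliable way.

```shell
#!/bin/sh
# Sketch only: strip the partition suffix from a partition device name.
# The argument must be a partition, not a whole disk.
parent_disk() {
    case "$1" in
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*)
            # NVMe and MMC names put a "p" before the partition number
            printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        /dev/*[0-9])
            # sdX style names: the partition number is just appended
            printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;
        *)
            return 1 ;;
    esac
}

parent_disk /dev/sdae1       # prints /dev/sdae
parent_disk /dev/nvme0n1p2   # prints /dev/nvme0n1
```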

Extend the OS partition using Parted

# parted /dev/sdae
print
resizepart PART_NUMBER END
quit

Where:

  • PART_NUMBER: the partition number obtained from the “print” command
  • END: the end of the drive; for example, for a 50GB drive, enter 50000
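
The 50000 in the example works because parted reads a bare number as megabytes (10^6 bytes) by default. A small sketch of the conversion, assuming you obtained the drive size in bytes (for example with blockdev --getsize64); the helper name end_in_mb is made up. parted also accepts a percentage, so 100% as END is an alternative to computing the number.

```shell
#!/bin/sh
# Sketch only: convert a drive size in bytes into the megabyte value
# that parted expects for END when no unit is given.
end_in_mb() {
    # $1: drive size in bytes
    echo $(( $1 / 1000000 ))
}

end_in_mb 50000000000   # a 50GB drive, prints 50000
```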

Examining the LVM Partitions

The centos_4602c-root LVM partition is the one we want to extend.

# lsblk /dev/sdae
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdae                           65:224  0   477G  0 disk 
├─sdae1                        65:225  0     1G  0 part /boot
└─sdae2                        65:226  0 475.9G  0 part 
  ├─centos_4602c-root         253:0    0    50G  0 lvm  /
  ├─centos_4602c-swap         253:1    0  11.9G  0 lvm  [SWAP]
  └─centos_4602c-home         253:2    0  56.3G  0 lvm  /home

Using LVM Commands

The following commands will:

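  • Display the LVM physical volumes on the system (pvdisplay)
  • Resize the physical volume to fill the enlarged partition (pvresize)
  • Re-display the updated physical volume and display the volume group to get the number of free extents (pvdisplay, vgdisplay)
  • Extend the desired logical volume by those free extents (lvextend)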
# pvdisplay
  /dev/sdbm: open failed: No medium found
  /dev/sdbn: open failed: No medium found
  /dev/sdbj: open failed: No medium found
  /dev/sdbk: open failed: No medium found
  /dev/sdbl: open failed: No medium found
  --- Physical volume ---
  PV Name               /dev/sdae2
  VG Name               centos_4602c
  PV Size               118.24 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              30269
  Free PE               0
  Allocated PE          30269
  PV UUID               yvHO6t-cYHM-CCCm-2hOO-mJWf-6NUI-zgxzwc
# pvresize /dev/sdae2
  /dev/sdbm: open failed: No medium found
  /dev/sdbn: open failed: No medium found
  /dev/sdbj: open failed: No medium found
  /dev/sdbk: open failed: No medium found
  /dev/sdbl: open failed: No medium found
  Physical volume "/dev/sdae2" changed
  1 physical volume(s) resized or updated / 0 physical volume(s) not resized
# pvdisplay
  /dev/sdbm: open failed: No medium found
  /dev/sdbn: open failed: No medium found
  /dev/sdbj: open failed: No medium found
  /dev/sdbk: open failed: No medium found
  /dev/sdbl: open failed: No medium found
  --- Physical volume ---
  PV Name               /dev/sdae2
  VG Name               centos_4602c
  PV Size               <475.84 GiB / not usable 3.25 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              121813
  Free PE               91544
  Allocated PE          30269
  PV UUID               yvHO6t-cYHM-CCCm-2hOO-mJWf-6NUI-zgxzwc
# vgdisplay
  --- Volume group ---
  VG Name               centos_4602c
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               <475.93 GiB
  PE Size               4.00 MiB
  Total PE              121838
  Alloc PE / Size       30269 / <118.24 GiB
  Free  PE / Size       91569 / 357.69 GiB
  VG UUID               ORcp2t-ntwQ-CNSX-NeXL-Udd9-htt9-kLfvRc
# lvextend -l +91569 /dev/centos_4602c/root 
  Size of logical volume centos_4602c/root changed from 50.00 GiB (12800 extents) to <407.69 GiB (104369 extents).
  Logical volume centos_4602c/root successfully resized.
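
To make the numbers above easier to follow: LVM counts space in physical extents (PE), here of 4 MiB each, so the sizes reported by vgdisplay and lvextend can be recomputed from the extent counts. This is just a sketch of that arithmetic; the helper name extents_to_gib is made up, and awk is assumed to be available.

```shell
#!/bin/sh
# Sketch only: convert a number of LVM extents into GiB.
extents_to_gib() {
    # $1: number of extents, $2: PE size in MiB
    awk -v pe="$1" -v size="$2" 'BEGIN { printf "%.2f\n", pe * size / 1024 }'
}

extents_to_gib 91569 4    # free extents from vgdisplay, prints 357.69
extents_to_gib 104369 4   # root volume after lvextend (12800 + 91569), prints 407.69
```

If you don't want to copy the free extent count by hand, lvextend also accepts -l +100%FREE.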

Extend the xfs file system to use the extended space

The xfs file system for the root partition will need to be extended to use the extra space; this is done using the xfs_growfs command as shown below.

# xfs_growfs /dev/centos_4602c/root  
meta-data=/dev/mapper/centos_4602c-root isize=512    agcount=4, agsize=3276800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=6400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 13107200 to 106873856
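
A quick sanity check you can do with the last line of that output: the file system uses 4096-byte blocks and LVM uses 4 MiB extents, so the block counts should translate exactly into the extent counts of the logical volume. The helper name blocks_to_extents is made up for this sketch.

```shell
#!/bin/sh
# Sketch only: translate file system blocks into LVM extents.
blocks_to_extents() {
    # $1: number of blocks, $2: block size in bytes, $3: extent size in MiB
    echo $(( $1 * $2 / ($3 * 1024 * 1024) ))
}

blocks_to_extents 13107200 4096 4    # before growing, prints 12800
blocks_to_extents 106873856 4096 4   # after growing, prints 104369
```

Both values match the extent counts reported by lvextend, so the file system fills the logical volume completely.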

Verify the results

Note that the centos_4602c-root logical volume is now 408GB.

# df -h 
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/centos_4602c-root          408G  2.4G  406G   1% /
devtmpfs                                16G     0   16G   0% /dev
tmpfs                                   16G     0   16G   0% /dev/shm
tmpfs                                   16G  395M   16G   3% /run
tmpfs                                   16G     0   16G   0% /sys/fs/cgroup
/dev/sdae1                            1014M  146M  869M  15% /boot
/dev/mapper/centos_4602c-home           57G   33M   57G   1% /home
tmpfs                                  3.2G     0  3.2G   0% /run/user/0
logs                                    68G  7.4M   68G   1% /logs
mysql                                  481G  128K  481G   1% /mysql
N58-C3-D16-P3-S1                       491T  334G  490T   1% /N58-C3-D16-P3-S1

So now we are able to clone directly from one 512GB drive to another.

You may be interested in taking a look at these commands:

growpart
resize2fs
xfs_growfs (from xfsprogs package)
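
Which of those grow commands applies depends on the file system: xfs_growfs is for xfs and is given the mount point, while resize2fs is for ext2/ext3/ext4 and is given the device. A small sketch that only prints the command to run; the function name grow_command is made up, and anything other than xfs and ext2/3/4 is simply reported as unsupported.

```shell
#!/bin/sh
# Sketch only: print the command that grows a file system of the given type.
grow_command() {
    # $1: file system type, $2: logical volume device, $3: mount point
    case "$1" in
        xfs)            echo "xfs_growfs $3" ;;
        ext2|ext3|ext4) echo "resize2fs $2" ;;
        *)              echo "unsupported filesystem: $1" >&2; return 1 ;;
    esac
}

grow_command xfs /dev/centos_4602c/root /       # prints xfs_growfs /
grow_command ext4 /dev/centos_4602c/root /      # prints resize2fs /dev/centos_4602c/root
```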

If you want to do this on an Amazon instance, here is some very good documentation.

Some handy tricks for working with ZFS

Adding a RAM drive as SLOG (ZIL)

I came up with this solution when one of my 4U60 Servers had two broken slots. You would not use this in Production, because a SLOG in RAM loses its purpose (its contents vanish on power loss), but I managed to make use of one broken $40K USD Server and to demonstrate that the speed of the SLOG device (the device holding the ZFS Intent Log, or ZIL) sets the constraint for the writing speed.

The ZFS DRAID config I was using required 60 drives: 58 spinning drives of 14TB and 2 SSDs for the SLOG ZIL. As I only had 58 working slots, I came up with this idea.

This trick can be very useful if you have a box full of spinning drives and, when sharing zvols over iSCSI, you get disconnections on the iSCSI Initiator side. This is typical when ZFS has only spinning drives and no SLOG drives (dedicated fast devices for the ZIL, the ZFS Intent Log).

Create a single Ramdrive of 10GB of RAM:

modprobe brd rd_nr=1 rd_size=10485760 max_part=0
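
The rd_size parameter of the brd module is expressed in KiB, which is where 10485760 comes from (10 x 1024 x 1024). A tiny sketch to compute it for other sizes; the helper name gib_to_rd_size is made up, and the script only prints the modprobe line instead of running it.

```shell
#!/bin/sh
# Sketch only: convert GiB into the KiB value expected by rd_size.
gib_to_rd_size() {
    # $1: desired RAM drive size in GiB
    echo $(( $1 * 1024 * 1024 ))
}

echo "modprobe brd rd_nr=1 rd_size=$(gib_to_rd_size 10) max_part=0"
```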

Confirm ram0 device exists now:

ls /dev/ram*

Confirm that the pool is imported:

zpool list

Add to the pool:

zpool add carles-N58-C3-D16-P2-S4 log ram0

If you want two RAM devices as SLOG devices in a mirror:

zpool add carles-N58-C3-D16-P2-S4 log mirror <partition/drive 1> <partition/drive 2>

It is interesting to know that you can work with partitions instead of whole drives. So for this test we could have partitioned ram0 into 2 partitions and used them as a mirror. You’ll see how much faster the iSCSI communication goes over the network. The writing speed of the ZIL SLOG device is the constraint for ingesting data from the network into the Server.

Creating a partition bigger than 2TiB

Master Boot Record (MBR) based partitioning is limited to 2TiB; GUID Partition Table (GPT), however, has a limit of 8 ZiB.

That’s something very simple, but it can make you lose time if you’re partitioning big iSCSI Shares or ZFS zvols, so here is the trick:
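
Both limits come from the width of the sector address (32 bits for MBR, 64 bits for GPT), assuming the usual 512-byte sectors. A small sketch of that arithmetic, working with exponents because 2^73 does not fit in a 64-bit shell integer; the helper name addressable_limit is made up.

```shell
#!/bin/sh
# Sketch only: largest addressable size for a given sector address width.
addressable_limit() {
    # $1: bits in the sector address (32 for MBR, 64 for GPT)
    # $2: log2 of the sector size (9 for 512-byte sectors)
    exp=$(( $1 + $2 ))
    value=$(( 1 << (exp % 10) ))
    # Pick the binary unit that corresponds to 2^(10*n)
    set -- B KiB MiB GiB TiB PiB EiB ZiB YiB
    shift $(( exp / 10 ))
    echo "$value $1"
}

addressable_limit 32 9   # MBR, prints 2 TiB
addressable_limit 64 9   # GPT, prints 8 ZiB
```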

[root@CTRLA-18 ~]# cat /etc/redhat-release 
 Red Hat Enterprise Linux Server release 7.6 (Maipo)
 [root@CTRLA-18 ~]# parted /dev/zvol/N58-C19-D2-P1-S1/vol54854gb 
 GNU Parted 3.1
 Using /dev/zd0
 Welcome to GNU Parted! Type 'help' to view a list of commands.
 (parted) mklabel gpt
 Warning: The existing disk label on /dev/zd0 will be destroyed and all data on this disk will be lost. Do you want to continue?
 Yes/No? y                                                                 
 (parted) print                                                            
 Model: Unknown (unknown)
 Disk /dev/zd0: 58.9TB
 Sector size (logical/physical): 512B/65536B
 Partition Table: gpt
 Disk Flags: 
 Number  Start  End  Size  File system  Name  Flags
 (parted) mkpart primary 0GB 58.9TB                                        
 (parted) print                                                            
 Model: Unknown (unknown)
 Disk /dev/zd0: 58.9TB
 Sector size (logical/physical): 512B/65536B
 Partition Table: gpt
 Disk Flags: 
 Number  Start   End     Size    File system  Name     Flags
  1      1049kB  58.9TB  58.9TB               primary
 (parted) quit                                                             
 Information: You may need to update /etc/fstab.
 [root@CTRLA-18 ~]# mkfs                                                   
 mkfs         mkfs.btrfs   mkfs.cramfs  mkfs.ext2    mkfs.ext3    mkfs.ext4    mkfs.minix   mkfs.xfs     
 [root@CTRLA-18 ~]# mkfs.ext4 /dev/zvol/N58-C19-D2-P1-S1/vol54854gb
 mke2fs 1.42.9 (28-Dec-2013)
....
[root@CTRLA-18 ~]# mount /dev/zvol/N58-C19-D2-P1-S1/vol54854gb /Data
[root@CTRLA-18 ~]# df -h
 Filesystem             Size  Used Avail Use% Mounted on
 /dev/mapper/rhel-root   50G  2.5G   48G   5% /
 devtmpfs               126G     0  126G   0% /dev
 tmpfs                  126G     0  126G   0% /dev/shm
 tmpfs                  126G  1.1G  125G   1% /run
 tmpfs                  126G     0  126G   0% /sys/fs/cgroup
 /dev/sdp1             1014M  151M  864M  15% /boot
 /dev/mapper/rhel-home   65G   33M   65G   1% /home
 logs                    49G  349M   48G   1% /logs
 mysql                  9.7G  128K  9.7G   1% /mysql
 tmpfs                   26G     0   26G   0% /run/user/0
 /dev/zd0                54T   20K   51T   1% /Data

ZFS is unable to use a disk

Sometimes, after creating many pools, ZFS may be unable to create a new pool using a drive that is perfectly fine. In this situation, the ideal is to wipe the first areas of the drive, or all of it if you want. If it’s an SSD, wiping all of it is very fast:

dd if=/dev/zero of=/dev/sdc bs=1M status=progress

The status=progress option makes dd print periodic transfer statistics (bytes copied, elapsed time and speed) while it runs.
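
If you only want to wipe the first areas of the drive, dd can be limited with count. Here is a minimal sketch, tried on a scratch file instead of a real drive; the helper name wipe_head and the 2 MiB figure are my assumptions for the example, and GNU coreutils is assumed. Keep in mind that ZFS also keeps copies of its labels at the end of the device, so wiping only the start may not always be enough; zpool labelclear is the dedicated command for that case.

```shell
#!/bin/sh
# Sketch only: zero the first N MiB of a device or file.
wipe_head() {
    # $1: target device or file, $2: number of MiB to zero from the start
    # conv=notrunc matters for regular files; block devices are never truncated
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=notrunc status=none
}

# Scratch file full of random data standing in for the drive
wipe_dir=$(mktemp -d)
dd if=/dev/urandom of="$wipe_dir/disk.img" bs=1M count=8 status=none

wipe_head "$wipe_dir/disk.img" 2
```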

Filling a half Petabyte pool as fast as possible

To fill a 60-drive pool made of 10TB or 14TB spinning drives (so more than half a PB) in order to test with real data, you can use this trick:

First, write to the Dataset directly; that’s much faster than using zvols.

Second, disable the ZIL by setting sync=disabled.

Third, use a file in memory as the source, to avoid the cost of reading the file from disk.

Fourth, increase the recordsize to 1M for faster filling (in my experience).

You can use this script of mine, which does everything for you. Normally you would want to run it inside a screen session, after creating a Dataset called Data. The script will mount it in /Data (zfs set mountpoint=/Data YOURPOOL/Data):

#!/usr/bin/env bash
# Created by Carles Mateo
FILE_ORIGINAL="/run/urandom.1GB"
FILE_PATTERN="/Data/urandom.1GB-clone."
# POOL="N56-C5-D8-P3-S1"
POOL="N58-C3-D16-P3-S1"
# The starting number, if you interrupt the filling process, you can update it just by updating this number to match the last partially written file
i_COPYING_INITIAL_NUMBER=1
# For 75% of 10TB (3x(16+3)+1 has 421TiB, so 75% of 421TiB or 431,104GiB is 323,328) use 323328
# i_COPYING_FINAL_NUMBER=323328
# For 75% of 10TB, 5x(8+3)+1 ZFS sees 352TiB, so 75% use 270336
# For 75% of 14TB, 3x(16+3)+1, use 453120
i_COPYING_FINAL_NUMBER=453120

# Creating an array that will hold the speed of the latest 1 minute
a_i_LATEST_SPEEDS=(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
i_POINTER_SPEEDS=0
i_COUNTER_SPEEDS=-1
i_ITEMS_KEPT_SPEEDS=60
i_AVG_SPEED=0
# The copy loop is inclusive on both ends, hence the +1
i_FILES_TO_BE_COPIED=$((i_COPYING_FINAL_NUMBER-i_COPYING_INITIAL_NUMBER+1))

get_average_speed () {
# Calculates the Average Speed
   i_AVG_SPEED=0
   for i_index in {0..59..1}
       do
           i_SPEED=$((a_i_LATEST_SPEEDS[i_index]))
           i_AVG_SPEED=$((i_AVG_SPEED + i_SPEED))
       done
   i_AVG_SPEED=$((i_AVG_SPEED/((i_COUNTER_SPEEDS)+1)))
}


echo "Bash version ${BASH_VERSION}..."

echo "Disabling sync in the pool $POOL for faster speed"
zfs set sync=disabled $POOL
echo "Maximizing performance with recordsize"
zfs set recordsize=1M ${POOL}
zfs set recordsize=1M ${POOL}/Data
echo "Mounting the Dataset Data"
zfs set mountpoint=/Data ${POOL}/Data
zfs mount ${POOL}/Data

echo "Checking if file ${FILE_ORIGINAL} exists..."
if [[ -f ${FILE_ORIGINAL} ]]; then
    ls -al ${FILE_ORIGINAL}
    sha1sum ${FILE_ORIGINAL}
else
    echo "Generating file..."
    dd if=/dev/urandom of=${FILE_ORIGINAL} bs=1M count=1024 status=progress
fi

echo "Starting filling process..."
echo "We are going to copy ${i_FILES_TO_BE_COPIED} , starting from: ${i_COPYING_INITIAL_NUMBER} to: ${i_COPYING_FINAL_NUMBER}"

for ((i_NUMBER=${i_COPYING_INITIAL_NUMBER}; i_NUMBER<=${i_COPYING_FINAL_NUMBER}; i_NUMBER++));
    do
        s_datetime_ini=$(($(date +%s%N)/1000000))
        DATE_NOW=`date '+%Y-%m-%d_%H-%M-%S'`
        echo "${DATE_NOW} Copying ${FILE_ORIGINAL} to ${FILE_PATTERN}${i_NUMBER}"
        cp ${FILE_ORIGINAL} ${FILE_PATTERN}${i_NUMBER}
        s_datetime_end=$(($(date +%s%N)/1000000))
        MILLISECONDS=$(expr "$s_datetime_end" - "$s_datetime_ini")
        if [[ ${MILLISECONDS} -lt 1 ]]; then
            BANDWIDTH_MBS="Unknown (too fast)"
            # That should not happen, but if it does, we don't account for crazy speeds
        else
            BANDWIDTH_MBS=$((1000*1024/MILLISECONDS))
            # Make sure the Array space has been allocated
            if [[ ${i_POINTER_SPEEDS} -gt ${i_COUNTER_SPEEDS} ]]; then
                # Add item to the Array the first times only
                a_i_LATEST_SPEEDS[i_POINTER_SPEEDS]=${BANDWIDTH_MBS}
                i_COUNTER_SPEEDS=$((i_COUNTER_SPEEDS+1))
            else
                a_i_LATEST_SPEEDS[i_POINTER_SPEEDS]=${BANDWIDTH_MBS}
            fi
            i_POINTER_SPEEDS=$((i_POINTER_SPEEDS+1))
            if [[ ${i_POINTER_SPEEDS} -ge ${i_ITEMS_KEPT_SPEEDS} ]]; then
                i_POINTER_SPEEDS=0
            fi
            get_average_speed
        fi
        i_FILES_TO_BE_COPIED=$((i_FILES_TO_BE_COPIED-1))
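        # Avoid a division by zero while no average speed is known yet
        if [[ ${i_AVG_SPEED} -gt 0 ]]; then
            i_REMAINING_TIME=$((1024*i_FILES_TO_BE_COPIED/i_AVG_SPEED))
        else
            i_REMAINING_TIME=0
        fi
        i_REMAINING_HOURS=$((i_REMAINING_TIME/3600))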
        echo "File cloned in ${MILLISECONDS} milliseconds at ${BANDWIDTH_MBS} MB/s"
        echo "Avg. Speed: ${i_AVG_SPEED} MB/s Remaining Files: ${i_FILES_TO_BE_COPIED} Remaining seconds: ${i_REMAINING_TIME} s. (${i_REMAINING_HOURS} h.)"
    done

echo "Enabling sync=always"
zfs set sync=always ${POOL}
echo "Setting back recordsize to 128K"
zfs set recordsize=128K ${POOL}
zfs set recordsize=128K ${POOL}/Data
echo "Unmounting /Data"
zfs set mountpoint=none ${POOL}/Data

Creating a Sparse file that you can partition or create a loopback on it

I know, your laptop has 512GB of M.2 SSD or NVMe, and that’s all you have.

Well, you can create a sparse file much bigger than your capacity, which initially uses 0 bytes of real disk space.

For example:

truncate -s 1600GB file_disk0.img
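
You can check that a sparse file really takes no space by comparing its apparent size with the blocks actually allocated. A minimal sketch with a smaller 1GB file in a temporary location, assuming GNU stat and truncate; the exact allocated figure depends on the file system, so I only print it.

```shell
#!/bin/sh
# Sketch only: compare apparent size and allocated size of a sparse file.
sparse_img=$(mktemp)
truncate -s 1GB "$sparse_img"

# Apparent size in bytes (what ls -l shows)
apparent=$(stat -c %s "$sparse_img")
# Allocated size: number of blocks times the size of each block
allocated=$(( $(stat -c %b "$sparse_img") * $(stat -c %B "$sparse_img") ))

echo "apparent: $apparent bytes, allocated: $allocated bytes"
rm -f "$sparse_img"
```

du -h and du -h --apparent-size on the same file show the same difference in a friendlier format.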

Then you can add a loop device:

sudo losetup -f /dev/carles/file_disk0.img

I did this with each of the 5 files I created.

Then you can check that they exist with:

lsblk

or

cat /proc/partitions

The loop devices will now appear under /dev/.