Here is an easy trick that you can use for adding swap temporarily to a Server, VMs or Workstations, if you are in an emergency.
In this case I had a cluster composed from two instances running out of memory.
I got an alert for one of the Servers, reporting that only had 7% of free memory.
Immediately I checked it, but checked also any other forming part of the cluster.
Another one appeared, had just only a bit more memory than the other, but was considered in Critical condition too.
The owner of the Service was contacted and asked if we can hold it until US Business hours. Those Servers were going to be replaced next day in US Business hours, and when possible it would be nice not to wake up the Team. It was day in Europe, but night in US.
I checked the status of the Server with those commands:
# df -h
There are 13GB of free space in /. More than enough to be safe as this service doesn’t use much.
# free -h
total used free shared buff/cache available
Mem: 5.7G 4.8G 139M 298M 738M 320M
Swap: 0B 0B 0B
I checked the memory, ok, there are only 139MB free in this node, but 738MB are buff/cache. Buff/Cache is memory used by Linux to optimize I/O as long as it is not needed by application. These 738 MB in buff/cache (or most of it) will be used if needed by the System. The field available corresponds to the memory that is available for starting new applications (not counting the swap if there was any), and basically is the free memory plus a fragment of the buff/cache. I’m sure we could use more than 320MB and there is a lot if buff/cache, but to play safe we play by the book.
With that in mind it seemed that it would hold perfectly to Business hours.
I checked top. It is interesting to mention the meaning of the Column RES, which is resident memory, in other words, the real amount of memory that the process is using.
I had a Java process using 4.57GB of RAM, but a look at how much Heap Memory was reserved and actually being used showed a Heap of 4GB (Memory reserved) and 1.5GB actually being used for real, from the Heap, only.
It was unlikely that elastic search would use all those 4GB, and seemed really unlikely that the instance will suffer from memory starvation with 2.5GB of 4GB of the Heap free, ~1GB of RAM in buffers/cache plus free, so looked good.
To be 100% sure I created a temporary swap space in a file on the SSD.
# fallocate -l 1G /swapfile-temp
# dd if=/dev/zero of=/swapfile-temp bs=1024 count=1048576 status=progress
1034236928 bytes (1.0 GB) copied, 4.020716 s, 257 MB/s
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.26152 s, 252 MB/s
If you ask me why I had to dd, I will tell you that I needed to. I checked with command blkid and filesystem was xfs. I believe that was the reason.
The speed writing to the file is fair enough for a swap.
# chmod 600 /swapfile-temp
# mkswap /swapfile-temp
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=5fb12c0c-8079-41dc-aa20-21477808619a
# swapon /swapfile-temp
I check that memory is good:
# free -h
total used free shared buff/cache available
Mem: 5.7G 4.8G 117M 298M 770M 329M
Swap: 1.0G 0B 1.0G
And finally I check that the Kernel parameter swappiness is not too aggressive:
However the 12 Gbps SAS SSD were returning Checksum errors in ZFS when I did copy information or I ran scrub. I’m afraid the enclosure can only provide 6 Gbps at max, or a poor connection. Cables or expanders use to be the reason. I ordered new cables to make a direct connection to the HBA Controller without the enclosure to validate my theory and the drives stopped showing errors.
There is something good in all bad: I have been able to document and explain how to troubleshoot, actual errors in ZFS, in my book and talk about the problems with the cables, and the advantages of using a SAS controller even if you use SATA drives.
I got my first Excellent in an Assignment in an Ireland university, which makes me specially happy. And I keep going on studying in Linux Academy, the last course I did was GCP and Terraform, even if I knew both it helps me to keep my skills sharp.
I share with you some offers and charity bundles that I enrolled and enjoyed a lot:
There is an offer with Microsoft Pass which is that we can use Disney+ for free during 30 days.
I started watching the Mandalorian, Season 2, and is wonderfully displayed in 4K. The quality of the video surprised me. Not that many contents in Netflix are 4K and I really enjoyed the great quality of the image.
Humble Bundle offers a pack of 8 VR games per €13.45. If you like Virtual Reality and have your headset, this pack is amazing, and the benefits go to charity: Movember. The games are downloaded from Steam and the pack will last for 14 days.
Me, with 4 more Senior BackEnd Engineers wrote the new e-Commerce for a multinational.
The old legacy Software evolved into a different code for every country, making it impossible to be maintained.
The new Software we created used inheritance to use the same base code for each country and overloaded only the specific different behavior of every country, like for the payment methods, for example Brazil supporting “parcelados” or Germany with specific payment players.
We rewrote the old procedural PHP BackEnd into modern PHP, with OOP and our own Framework but we had to keep the transactional code in existing MySQL Procedures, so the logic was split. There was a Front End Team consuming our JSONs. Basically all the Front End code was cached in Akamai and pages were rendered accordingly to the JSONs served from out BackEnd.
It was a huge success.
This e-Commerce site had Campaigns that started at a certain time, so the amount of traffic that would come at the same time would be challenging.
The project was working very well, and after some time the original Team was split into different projects in the company and a Team for maintenance and evolutives was hired.
At certain point they started to encounter duplicate transactions, and nobody was able to solve the mystery.
I’m specialized into fixing impossible problems. They used to send me to Impossible Missions, and I am famous for solving impossible problems easily.
So I started the task with a SRE approach.
The System had many components and layers. The problem could be in many places.
I had in my arsenal of tools, Software like mysqldebugger with which I found an unnoticed bug in decimals calculation in the past surprising everybody.
Previous Engineers involved believed the problem was in the Database side. They were having difficulties to identify the issue by the random nature of the repetitions.
Some times the order lines were duplicated, and other times were the payments, which means charging twice to the customer.
Redis Cluster could also play a part on this, as storing the session information and the basket.
But I had to follow the logic sequence of steps.
If transactions from customer were duplicated that mean that in first term those requests have arrived to the System. So that was a good point of start.
With a list of duplicated operations, I checked the Webservers logs.
That was a bit tricky as the Webserver was recording the Ip of the Load Balancer, not the ip of the customer. But we were tracking the sessionid so with that I could track and user request history. A good thing was also that we were using cookies to stick the user to the same Webserver node. That has pros and cons, but in this case I didn’t have to worry about the logs combined of all the Webservers, I could just identify a transaction in one node, and stick into that node’s log.
I was working with SSH and Bash, no log aggregators existing today were available at that time.
So when I started to catch web logs and grep a bit an smile was drawn into my face. :)
There were no transactions repeated by a bad behavior on MySQL Masters, or by BackEnd problems. Actually the HTTP requests were performed twice.
And the explanation to that was much more simple.
When I explained it they were really surprised, but then they started to worry about how they could fix that.
Well, there are many ways, like using an UUID in each request and do not accepting two concurrents, but I came with something that we could deploy super fast.
That case was very funny for me, because it was not necessary to go crazy inspecting the different layers of the system. The problem was detected simply with HTTP logs. :)
People often forget to follow the logic steps while many problems are much more simple.
As a curious note, I still see people double clicking on links and buttons on the Web, and some Software not handling it. :)
This article talks about how at Riot Games they use Slack. Slack is really a powerful tool, and also makes the communication more human in companies with their approach and the funny icons and /giphy. I’m very serious when it comes to work but I recognize the friendly, warm, human and lovely touch these kind of animated icons bring to the conversations.
I updated it the Nov-01, as I normally do, bringing more content.
I’ve been paid the royalties for he past two months and I reinvested everything (and more from my pocket) in Hardware for working with ZFS.
I was offered by an editorial in The States to publish Python Combat Guide and other of my books worldwide. I was thinking it for a while. It was very good money, translation to multiple languages and platforms and marketing and a lot of promotion, but I would had loss the rights and the Freedom I have now, like the possibility to offer discount coupons to who I want and to update the contents often. So to celebrate my decision for you, readers of the blog, during September, I provide a discounted price of $5 USD for the fist 100 sales instead of the $25 USD suggested price. Use the following link:
As part of my effort to contributing with nice Open Source products to the Community I have made some investments to keep contributing to:
My old tool for managing ZFS and Network shares easily
I’m writing a new book about managing ZFS for Small Business too, so I show how to operate on this hardware, good points and downsides.
I’m assembling a new Pc with ZFS plenty of Disk Storage within a mix of:
SAS Enterprise grade SSD 2.5″
SATA 12Gb Enterprise grade SSD 2.5″
SATA SSD 2.5″
SATA HDD 2TB 2.5″
SATA HDD 2TB 3.5″
I’m a big fan of Intel, but this time I have chosen AMD. Concretely a AMD Ryzen 7 3700X AM4 8 Core / 16 Threads, 3.6 GHz to 4.4 GHz with Turbo. The reason I chose this CPU is because it only uses 65W but still has 8 Cores / 16 Threads.
Also I want to see the performance of this AMD Ryzen with CMIPS and another important reason is that AMD motherboards support PCI 4.0. I have bought a NVMe SSD Samsung 980 PRO PCI 4.0 (x4) able to read at 6,400 MB/s. I will use this AMD box for running VMs as well. Basically Virtual Box and Docker.
I’ve been surprised that for 169.99 GBP I can have a very good Asus Motherboard with a 2.5 Gb Ethernet: ASUS ROG STRIX B550-F GAMING, AMD B550, AM4, DDR4, PCIe 4.0, SATA3, Dual M.2, CrossFire, 2.5GbE, USB 3.2 Gen2 A+C, ATX.
In order to have an Asus motherboard with a 2.5 Gb Ethernet for Intel I had to jump to a 254 GBP motherboard and Intel is still PCI 3.0. Actually there are PCI 10Gb NICs at 80 GBP so at some point I’ll upgrade my home network from Gigabit to 10 Gb. That will come slowly, but if the new equipment I assemble has 2.5 Gb when I upgrade the main switches to 10 Gb, at least I’ll be able to communicate at 2.5 Gb without ant additional change.
Also memory at 3200, speed that the AMD motherboard can provide, is more than affordable.
This new server will have 64 GB of RAM (Corsair DDR4 Vengeance PC4-25600 (3200)), as I plan to run VMs and use Volumes mounted via iSCSI and locally as block devices to improve my Software. I’ve bought a new UPS to keep it running in case power goes down. That’s something that doesn’t happen often in my city in Ireland, honestly, but I never forget that this happens in Barcelona two or three times per year, and that a high tension spike can burn your motherboard, drives, or electronics like the TV or the fridge. I’ve bought as well a new KVM Switch, a HDMI 4K and USB too one, so I don’t have to have so many keyboards. My logitech M720 allowed me to use it with 3 computers, but still I want something more operational. The KVM I bought allow me to switch with a button or within a hotkey in the keyboard.
I bought a new Icy box fox handling 6 2.5 drives in just one bay of the tower, and a 850 Watt Corsair PSU that will be able to power the many drives I want at the same time.
I contribute a lot to Open Source, and many years ago before Open Source existed I was creating Freeware Software. But I think that good commercial Software deserves to be supported. Like everything in life, if they are doing a good work that is useful to me, why not giving them support?. It is also a way to make sure they will continue producing amazing Software. And in the other hand, myself, I create Software. Some times commercial Software, and I like to be paid, so I apply the same principle.
One of my colleagues showed me dstat, a very nice tool for system monitoring, and bandwidth of a drive monitoring. Also ifstat, as complement to iftop is very cool for Network too. This functionality is also available in CTOP.py
As I shared in the past news of the blog, I’m resuming my contributions to ZFS Community.
Long time ago I created some ZFS tools that I want to share soon as Open Source.
I equipped myself with the proper Hardware to test on SAS and SATA:
12G Internal PCI-E SAS/SATA HBA RAID Controller Card, Broadcom’s SAS 3008, compatible for SAS 9300-8I. This is just an HDA (Host Data Adapter), it doesn’t support RAID. Only connects up to 8 drives or 1024 through expander, to my computer. It has a bandwidth of 9,600 MB/s which guarantees me that I’ll be able to add 12 SAS SSD Enterprise grade at almost the max speed of the drives. Those drives perform at 900 MB/s so if I’m using all of them at the same time, like if I have a pool of 8 + 3 and I rebuild a broken drive or I just push Data, I would be using 12×900 = 10,800 MB/s. Close. Fair enough.
VANDESAIL Mini-SAS Cables, 1m Internal Mini-SAS to 4x SAS SATA Forward Breakout Cable Hard Drive Data Transfer Cable (SAS Cable).
SilverStone SST-FS212B – Aluminium Trayless Hot Swap Mobile Rack Backplane / Internal Hard Drive Enclosure for 12x 2.5 Inch SAS/SATA HDD or SSD, fit in any 3x 5.25 Inch Drive Bay, with Fan and Lock, black
Terminator is here. I ordered this T-800 head a while ago and finally arrived.
Finally I will have my empty USB keys located and protected. ;)
That’s one of the problems with Python. Blocks of code are defined by their indentation position.
That’s a pain when you copy and past and the IDE reindents the code thinking that is doing great, or generate a new inner class instead of replacing all the code.
Well, this error is very annoying cause it means that you mixed spaces and Tabs as indent separators.
But you can go crazy trying to find a tab in your code, so there is a trick that I came with:
Basically go to Menu Edit > Find and then type 4 times space. PyCharm will highlight all the places were this indentation (4 spaces) is present, so you’ll find the impostor without going blind or losing to many time.
As you can see, in front of def execute_command_without_waiting we don’t have 4 spaces. And in this case the impostor was not a camouflaged tab \t but 3 spaces instead of four.
You can identify manually what are attacks, and what are legit requests.
After you have your definitive list of offending Ip’s (and make sure you didn’t introduce yours accidentally), then you can execute the second part of the script:
echo '#!/bin/bash' > add_ufw_rules.sh
i_COUNTER_RULE=0; for s_OFFENDING_IP in $(cat 2020-10-03-offending-ips.txt); do i_COUNTER_RULE=$((i_COUNTER_RULE+1)); echo "ufw insert $i_COUNTER_RULE deny from $s_OFFENDING_IP to any" >> add_ufw_rules.sh; done
echo "ufw status numbered" >> add_ufw_rules.sh
echo "sudo ufw allow OpenSSH" >> add_ufw_rules.sh
echo "sudo ufw allow 22/tcp" >> add_ufw_rules.sh
echo 'sudo ufw allow "Apache Full"' >> add_ufw_rules.sh
echo "sudo ufw enable" >> add_ufw_rules.sh
Then you less your file add_ufw_rules.sh to see everything is Ok:
ufw insert 1 deny from 22.214.171.124 to any
ufw insert 2 deny from 126.96.36.199 to any
ufw insert 3 deny from 188.8.131.52 to any
ufw insert 4 deny from 184.108.40.206 to any
ufw insert 5 deny from 220.127.116.11 to any
ufw insert 6 deny from 18.104.22.168 to any
ufw insert 7 deny from 22.214.171.124 to any
ufw insert 8 deny from 126.96.36.199 to any
ufw insert 9 deny from 188.8.131.52 to any
ufw insert 10 deny from 184.108.40.206 to any
ufw insert 11 deny from 220.127.116.11 to any
ufw insert 223 deny from 18.104.22.168 to any
ufw insert 224 deny from 22.214.171.124 to any
ufw status numbered
sudo ufw allow OpenSSH
sudo ufw allow 22/tcp
sudo ufw allow "Apache Full"
sudo ufw enable
Then you simply give permissions with chmod +x add_ufw_rules.sh and run the script to apply.
I have benchmarked three different CPUs and two Compute optimized Amazon AWS instances with CMIPS 1.0.5 64bit. The two Intel Xeon baremetals equip 2 x Intel Xeon Processor and the third baremetal equips a single Intel Core i7-7800X:
If you’re surprised by the number of cores reported by the Amazon instance m5d.24xlarge, and even more for the baremetal c5n.metal, you’re guessing well that this comes from having Servers with 4 CPUs for Compute Optimized series.
When I can choose I use Linux, but in many companies I work with Windows workstations. I’ve published a list of useful Software I use in all my Windows workstations.
WFH I currently use two external monitors attached to the laptop. I planned to add a new one using a Display Port connected to the Dell USB-C dongle that provides me Ethernet and one additional HDMI as well. I got the cable from Amazon but unfortunately something is not working. In order to make myself comfortable and see some the graphs of the systems worldwide as I have on the office’s displays, I created a small HTML page, that joins several monitor pages in one single web page using frames. This way I only have one page loaded on the browser, maximized, and this monitor is dedicated to those graphs of the stats of the Systems. Something very simple, but very useful. You can extend the number of columns and rows it to have more graphics in the same screen.
If you don’t have the space or the resources for more monitors you can use the ingenious.
I have a cheap HDMI switch that allows me to do PinP (Picture in Picture) with one main source on the monitor, and two using a fraction of their original space. It may allow you to see variants in graphics.
And in you have only a single monitor, you can use a chrome extension that rotates tabs, which is also very useful.
Be careful if you use the reload features with software like Jira or Confluence. If they are slow normally, imagine if you mess it by reloading every 30 seconds… I discourage you to use auto refresh on these kind of Softwares.
This past week I have connected the XBOX One X Controller to the Windows laptop for the first time. Normally I use the Pc only for strategy games, but I wanted to play other games like Lost Planet 3, or Fall Guys in a console mode way. I figured that would be very easy and it was. You turn on the controller, press the connect button like you did to pair with the console, and in Windows indicate pair to a Xbox One controller. That’s it.
I’ve also updated my Python 3 Combat Guide, to add the explanation, step by step, about how to refactor and make resilient, and add Unit Testing to a spaghetti code, and turn it into a modern OOP. Is currently 255 DIN-A4 pages.
This is something I wanted to share with you for a while. One of the most funny things in my career is what I call: Squirrel Strikes Back
I named this as the first incident where a provider told that the reason of a fiber failure was a squirrel chewing the cable.
I popularized this with my friends in Systems Administration and SRE and when they suffer a Squirrel Attack incident, they forward it to me, for great joy.
I’m used to construction or gas, water, electricity, highways repair operations on the cities accidentally cutting fiber cables, thunders or truck accidents on the highway breaking the floor and cutting tubes and issues like that. I’ve been seeing that for around 25 years.
So the first time I saw a provider referring to a squirrel cutting the cables it was pretty hilarious. :)
In my funny mental picture: I could visually imagine a cable thrown in the middle of the forest, over trees, and a squirrel chewing it as it tastes like peanuts. :) or a shark cutting a Google’s or Facebook’s intercontinental cable thrown without any protection. ;)
The sense of humor and the good vibes, are two of the most important things in life.