Here is an easy trick that you can use for adding swap temporarily to a Server, VMs or Workstations, if you are in an emergency.
In this case I had a cluster composed from two instances running out of memory.
I got an alert for one of the Servers, reporting that only had 7% of free memory.
Immediately I checked it, but checked also any other forming part of the cluster.
Another one appeared, had just only a bit more memory than the other, but was considered in Critical condition too.
The owner of the Service was contacted and asked if we can hold it until US Business hours. Those Servers were going to be replaced next day in US Business hours, and when possible it would be nice not to wake up the Team. It was day in Europe, but night in US.
I checked the status of the Server with those commands:
# df -h
There are 13GB of free space in /. More than enough to be safe as this service doesn’t use much.
# free -h
total used free shared buff/cache available
Mem: 5.7G 4.8G 139M 298M 738M 320M
Swap: 0B 0B 0B
I checked the memory, ok, there are only 139MB free in this node, but 738MB are buff/cache. Buff/Cache is memory used by Linux to optimize I/O as long as it is not needed by application. These 738 MB in buff/cache (or most of it) will be used if needed by the System. The field available corresponds to the memory that is available for starting new applications (not counting the swap if there was any), and basically is the free memory plus a fragment of the buff/cache. I’m sure we could use more than 320MB and there is a lot if buff/cache, but to play safe we play by the book.
With that in mind it seemed that it would hold perfectly to Business hours.
I checked top. It is interesting to mention the meaning of the Column RES, which is resident memory, in other words, the real amount of memory that the process is using.
I had a Java process using 4.57GB of RAM, but a look at how much Heap Memory was reserved and actually being used showed a Heap of 4GB (Memory reserved) and 1.5GB actually being used for real, from the Heap, only.
It was unlikely that elastic search would use all those 4GB, and seemed really unlikely that the instance will suffer from memory starvation with 2.5GB of 4GB of the Heap free, ~1GB of RAM in buffers/cache plus free, so looked good.
To be 100% sure I created a temporary swap space in a file on the SSD.
(# means that I’m executing this as root, if you type literally with # in front, this will be a comment)
# fallocate -l 1G /swapfile-temp
# dd if=/dev/zero of=/swapfile-temp bs=1024 count=1048576 status=progress
1034236928 bytes (1.0 GB) copied, 4.020716 s, 257 MB/s
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 4.26152 s, 252 MB/s
If you ask me why I had to dd, I will tell you that I needed to. I checked with command blkid and filesystem was xfs. I believe that was the reason.
The speed writing to the file is fair enough for a swap.
# chmod 600 /swapfile-temp
# mkswap /swapfile-temp
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=5fb12c0c-8079-41dc-aa20-21477808619a
# swapon /swapfile-temp
I check that memory is good:
# free -h
total used free shared buff/cache available
Mem: 5.7G 4.8G 117M 298M 770M 329M
Swap: 1.0G 0B 1.0G
And finally I check that the Kernel parameter swappiness is not too aggressive:
# sysctl vm.swappiness
vm.swappiness = 30
Cool. 30 is a fair enough value.
2022-01-05 Update for my students that need to add additional 16GB of swap to their SSD drive:
However the 12 Gbps SAS SSD were returning Checksum errors in ZFS when I did copy information or I ran scrub. I’m afraid the enclosure can only provide 6 Gbps at max, or a poor connection. Cables or expanders use to be the reason. I ordered new cables to make a direct connection to the HBA Controller without the enclosure to validate my theory and the drives stopped showing errors.
There is something good in all bad: I have been able to document and explain how to troubleshoot, actual errors in ZFS, in my book and talk about the problems with the cables, and the advantages of using a SAS controller even if you use SATA drives.
I got my first Excellent in an Assignment in an Ireland university, which makes me specially happy. And I keep going on studying in Linux Academy, the last course I did was GCP and Terraform, even if I knew both it helps me to keep my skills sharp.
I share with you some offers and charity bundles that I enrolled and enjoyed a lot:
There is an offer with Microsoft Pass which is that we can use Disney+ for free during 30 days.
I started watching the Mandalorian, Season 2, and is wonderfully displayed in 4K. The quality of the video surprised me. Not that many contents in Netflix are 4K and I really enjoyed the great quality of the image.
Humble Bundle offers a pack of 8 VR games per €13.45. If you like Virtual Reality and have your headset, this pack is amazing, and the benefits go to charity: Movember. The games are downloaded from Steam and the pack will last for 14 days.
Me, with 4 more Senior BackEnd Engineers wrote the new e-Commerce for a multinational.
The old legacy Software evolved into a different code for every country, making it impossible to be maintained.
The new Software we created used inheritance to use the same base code for each country and overloaded only the specific different behavior of every country, like for the payment methods, for example Brazil supporting “parcelados” or Germany with specific payment players.
We rewrote the old procedural PHP BackEnd into modern PHP, with OOP and our own Framework but we had to keep the transactional code in existing MySQL Procedures, so the logic was split. There was a Front End Team consuming our JSONs. Basically all the Front End code was cached in Akamai and pages were rendered accordingly to the JSONs served from out BackEnd.
It was a huge success.
This e-Commerce site had Campaigns that started at a certain time, so the amount of traffic that would come at the same time would be challenging.
The project was working very well, and after some time the original Team was split into different projects in the company and a Team for maintenance and evolutives was hired.
At certain point they started to encounter duplicate transactions, and nobody was able to solve the mystery.
I’m specialized into fixing impossible problems. They used to send me to Impossible Missions, and I am famous for solving impossible problems easily.
So I started the task with a SRE approach.
The System had many components and layers. The problem could be in many places.
I had in my arsenal of tools, Software like mysqldebugger with which I found an unnoticed bug in decimals calculation in the past surprising everybody.
Previous Engineers involved believed the problem was in the Database side. They were having difficulties to identify the issue by the random nature of the repetitions.
Some times the order lines were duplicated, and other times were the payments, which means charging twice to the customer.
Redis Cluster could also play a part on this, as storing the session information and the basket.
But I had to follow the logic sequence of steps.
If transactions from customer were duplicated that mean that in first term those requests have arrived to the System. So that was a good point of start.
With a list of duplicated operations, I checked the Webservers logs.
That was a bit tricky as the Webserver was recording the Ip of the Load Balancer, not the ip of the customer. But we were tracking the sessionid so with that I could track and user request history. A good thing was also that we were using cookies to stick the user to the same Webserver node. That has pros and cons, but in this case I didn’t have to worry about the logs combined of all the Webservers, I could just identify a transaction in one node, and stick into that node’s log.
I was working with SSH and Bash, no log aggregators existing today were available at that time.
So when I started to catch web logs and grep a bit an smile was drawn into my face. :)
There were no transactions repeated by a bad behavior on MySQL Masters, or by BackEnd problems. Actually the HTTP requests were performed twice.
And the explanation to that was much more simple.
When I explained it they were really surprised, but then they started to worry about how they could fix that.
Well, there are many ways, like using an UUID in each request and do not accepting two concurrents, but I came with something that we could deploy super fast.
That case was very funny for me, because it was not necessary to go crazy inspecting the different layers of the system. The problem was detected simply with HTTP logs. :)
People often forget to follow the logic steps while many problems are much more simple.
As a curious note, I still see people double clicking on links and buttons on the Web, and some Software not handling it. :)
This article talks about how at Riot Games they use Slack. Slack is really a powerful tool, and also makes the communication more human in companies with their approach and the funny icons and /giphy. I’m very serious when it comes to work but I recognize the friendly, warm, human and lovely touch these kind of animated icons bring to the conversations.
I updated it the Nov-01, as I normally do, bringing more content.
I’ve been paid the royalties for he past two months and I reinvested everything (and more from my pocket) in Hardware for working with ZFS.
I was offered by an editorial in The States to publish Python Combat Guide and other of my books worldwide. I was thinking it for a while. It was very good money, translation to multiple languages and platforms and marketing and a lot of promotion, but I would had loss the rights and the Freedom I have now, like the possibility to offer discount coupons to who I want and to update the contents often. So to celebrate my decision for you, readers of the blog, during September, I provide a discounted price of $5 USD for the fist 100 sales instead of the $25 USD suggested price. Use the following link:
As part of my effort to contributing with nice Open Source products to the Community I have made some investments to keep contributing to:
My old tool for managing ZFS and Network shares easily
I’m writing a new book about managing ZFS for Small Business too, so I show how to operate on this hardware, good points and downsides.
I’m assembling a new Pc with ZFS plenty of Disk Storage within a mix of:
SAS Enterprise grade SSD 2.5″
SATA 12Gb Enterprise grade SSD 2.5″
SATA SSD 2.5″
SATA HDD 2TB 2.5″
SATA HDD 2TB 3.5″
I’m a big fan of Intel, but this time I have chosen AMD. Concretely a AMD Ryzen 7 3700X AM4 8 Core / 16 Threads, 3.6 GHz to 4.4 GHz with Turbo. The reason I chose this CPU is because it only uses 65W but still has 8 Cores / 16 Threads.
Also I want to see the performance of this AMD Ryzen with CMIPS and another important reason is that AMD motherboards support PCI 4.0. I have bought a NVMe SSD Samsung 980 PRO PCI 4.0 (x4) able to read at 6,400 MB/s. I will use this AMD box for running VMs as well. Basically Virtual Box and Docker.
I’ve been surprised that for 169.99 GBP I can have a very good Asus Motherboard with a 2.5 Gb Ethernet: ASUS ROG STRIX B550-F GAMING, AMD B550, AM4, DDR4, PCIe 4.0, SATA3, Dual M.2, CrossFire, 2.5GbE, USB 3.2 Gen2 A+C, ATX.
In order to have an Asus motherboard with a 2.5 Gb Ethernet for Intel I had to jump to a 254 GBP motherboard and Intel is still PCI 3.0. Actually there are PCI 10Gb NICs at 80 GBP so at some point I’ll upgrade my home network from Gigabit to 10 Gb. That will come slowly, but if the new equipment I assemble has 2.5 Gb when I upgrade the main switches to 10 Gb, at least I’ll be able to communicate at 2.5 Gb without ant additional change.
Also memory at 3200, speed that the AMD motherboard can provide, is more than affordable.
This new server will have 64 GB of RAM (Corsair DDR4 Vengeance PC4-25600 (3200)), as I plan to run VMs and use Volumes mounted via iSCSI and locally as block devices to improve my Software. I’ve bought a new UPS to keep it running in case power goes down. That’s something that doesn’t happen often in my city in Ireland, honestly, but I never forget that this happens in Barcelona two or three times per year, and that a high tension spike can burn your motherboard, drives, or electronics like the TV or the fridge. I’ve bought as well a new KVM Switch, a HDMI 4K and USB too one, so I don’t have to have so many keyboards. My logitech M720 allowed me to use it with 3 computers, but still I want something more operational. The KVM I bought allow me to switch with a button or within a hotkey in the keyboard.
I bought a new Icy box fox handling 6 2.5 drives in just one bay of the tower, and a 850 Watt Corsair PSU that will be able to power the many drives I want at the same time.
I contribute a lot to Open Source, and many years ago before Open Source existed I was creating Freeware Software. But I think that good commercial Software deserves to be supported. Like everything in life, if they are doing a good work that is useful to me, why not giving them support?. It is also a way to make sure they will continue producing amazing Software. And in the other hand, myself, I create Software. Some times commercial Software, and I like to be paid, so I apply the same principle.