Category Archives: Security

Backup and Restore your Ubuntu Linux Workstations

This is a mechanism I invented, and have been using for decades, to migrate or clone my Linux Desktops to other physical servers.

This script is focused on doing the job for Ubuntu, but I was already doing this 30 years ago for X Window, when I was responsible for the Linux platform of an ISP (Internet Service Provider). So it is compatible with any Linux Desktop or Server.

It has the advantage of being a very lightweight backup. You don’t need to back up /etc or /var, as long as you install a new OS and restore the folders that you backed up. You can back up and restore Wine (Windows Emulator) programs completely, and to/from VMs and Instances as well.

It’s based on users rather than machines.

And it backs up using a timestamp, so you keep all the different versions, modified over time. You can merge the backups into the same folder if you prefer to avoid time versions and keep only the latest backup. If that’s your case, replace s_PATH_BACKUP_NOW="${s_PATH_BACKUP}${s_DATETIME}/" with s_PATH_BACKUP_NOW="${s_PATH_BACKUP}", for instance. You can also add a folder per machine if you prefer, for example if you use the same userid across several Desktops/Servers.
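The path logic described above can be sketched in Python. This is only an illustration, not part of my script: the function name build_backup_path and its parameters (versioned, machine, now) are hypothetical, and the timestamp layout is assumed to match the date format used in the Bash script.

```python
from datetime import datetime
from typing import Optional


def build_backup_path(base: str, versioned: bool = True,
                      machine: Optional[str] = None,
                      now: Optional[datetime] = None) -> str:
    """Return the target folder for a backup run, always ending in '/'."""
    path = base if base.endswith("/") else base + "/"
    if machine:
        # Optional per-machine folder, for the same userid on several hosts
        path += machine + "/"
    if versioned:
        # Same layout as: date +"%Y-%m-%d-%H-%M-%S"
        stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
        path += stamp + "/"
    return path


if __name__ == "__main__":
    # One folder per run (keeps every version)
    print(build_backup_path("/home/carles/Desktop/Bck/"))
    # A single merged folder (keeps only the latest copy)
    print(build_backup_path("/home/carles/Desktop/Bck/", versioned=False))
```

With versioned=False every run lands in the same folder, which is the equivalent of dropping ${s_DATETIME} from the path.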

I offer you a much simplified version of my scripts, but it can serve your needs well.

#!/usr/bin/env bash

# Author: Carles Mateo
# Last Update: 2022-10-23 10:48 Irish Time

# User we want to backup data for
s_USER="carles"
# Target PATH for the Backups
s_PATH_BACKUP="/home/${s_USER}/Desktop/Bck/"

s_DATE=$(date +"%Y-%m-%d")
s_DATETIME=$(date +"%Y-%m-%d-%H-%M-%S")

s_PATH_BACKUP_NOW="${s_PATH_BACKUP}${s_DATETIME}/"

echo "Creating path $s_PATH_BACKUP and $s_PATH_BACKUP_NOW"
# -p creates missing parent folders and does not fail if the folder already exists
mkdir -p "$s_PATH_BACKUP"
mkdir -p "$s_PATH_BACKUP_NOW"

s_PATH_KEY="/home/${s_USER}/Desktop/keys/2007-01-07-cloud23.pem"
s_DOCKER_IMG_JENKINS_EXPORT=${s_DATE}-jenkins-base.tar
s_DOCKER_IMG_JENKINS_BLUEOCEAN2_EXPORT=${s_DATE}-jenkins-blueocean2.tar
s_PGP_FILE=${s_DATETIME}-pgp.zip

# Version the PGP files
echo "Compressing the PGP files as ${s_PGP_FILE}"
zip -r "${s_PATH_BACKUP_NOW}${s_PGP_FILE}" "/home/${s_USER}/Desktop/PGP/"*

# Copy to BCK folder, or ZFS or to an external drive Locally as defined in: s_PATH_BACKUP_NOW
echo "Copying Data to ${s_PATH_BACKUP_NOW}data/"
rsync -a --exclude={} --acls --xattrs --owner --group --times --stats --human-readable --progress -z "/home/${s_USER}/Desktop/data/" "${s_PATH_BACKUP_NOW}data/"
rsync -a --exclude={'Desktop','Downloads','.local/share/Trash/','.local/lib/python2.7/','.local/lib/python3.6/','.local/lib/python3.8/','.local/lib/python3.10/','.cache/JetBrains/'} --acls --xattrs --owner --group --times --stats --human-readable --progress -z "/home/${s_USER}/" "${s_PATH_BACKUP_NOW}home/${s_USER}/"
rsync -a --exclude={} --acls --xattrs --owner --group --times --stats --human-readable --progress -z "/home/${s_USER}/Desktop/code/" "${s_PATH_BACKUP_NOW}code/"


echo "Showing backup dir ${s_PATH_BACKUP_NOW}"
ls -hal "${s_PATH_BACKUP_NOW}"

df -h /

See how I exclude certain folders, like Desktop or Downloads, with --exclude.

It relies on the very useful rsync program. It also relies on zip to compress entire folders (the PGP keys in the example).
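In Bash, --exclude={'a','b'} is expanded by the shell into one --exclude= option per pattern before rsync sees it (with fewer than two items, as in --exclude={}, nothing is expanded). If you prefer to drive rsync from Python, a minimal sketch of the same idea could look like this. The function name build_rsync_command is hypothetical, and only a subset of the flags from the script is used.

```python
import shlex
from typing import List


def build_rsync_command(source: str, target: str, excludes: List[str]) -> List[str]:
    """Build the rsync argument list, with one --exclude option per pattern."""
    cmd = ["rsync", "-a"]
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    cmd += ["--acls", "--xattrs", "--stats", "--human-readable", "--progress"]
    cmd += [source, target]
    return cmd


if __name__ == "__main__":
    cmd = build_rsync_command(
        "/home/carles/",
        "/home/carles/Desktop/Bck/home/carles/",
        ["Desktop", "Downloads", ".cache/JetBrains/"],
    )
    # Print the command for review; to run it for real you could use
    # subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces.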

If you use the second part, to compress Docker Images (Jenkins in this example), you will need to run it with sudo, and you will also need gzip.

# continuation... sudo running required.

# Save Docker Images
echo "Saving Docker Jenkins /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_EXPORT}"
sudo docker save jenkins:base --output /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_EXPORT}
echo "Saving Docker Jenkins /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_BLUEOCEAN2_EXPORT}"
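# NOTE: this line saves jenkins:base a second time; replace the image name with the tag of your Blue Ocean image
sudo docker save jenkins:base --output /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_BLUEOCEAN2_EXPORT}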
echo "Setting permissions"
sudo chown ${s_USER}:${s_USER} /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_EXPORT}
sudo chown ${s_USER}:${s_USER} /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_BLUEOCEAN2_EXPORT}
echo "Compressing /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_EXPORT}"
gzip /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_EXPORT}
gzip /home/${s_USER}/Desktop/Docker_Images/${s_DOCKER_IMG_JENKINS_BLUEOCEAN2_EXPORT}

rsync -a --exclude={} --acls --xattrs --owner --group --times --stats --human-readable --progress -z "/home/${s_USER}/Desktop/Docker_Images/" "${s_PATH_BACKUP_NOW}Docker_Images/"

There is a final part, in case you want to back up to a remote server (or servers) using ssh:

# continuation... to copy to a remote Server.

s_PATH_REMOTE="bck7@cloubbck11.carlesmateo.com:/Bck/Desktop/${s_USER}/data/"

# Copy to the other Server
rsync -e "ssh -i $s_PATH_KEY" -a --exclude={} --acls --xattrs --owner --group --times --stats --human-readable --progress -z "/home/${s_USER}/Desktop/data/" ${s_PATH_REMOTE}

I recommend using the same methodology in all your Desktops, for example having a data/ folder in the Desktop for each user.

You can use Erasure Code to split the Backups in blocks and store the pieces with different Cloud Providers.

You can also store your Backups long-term, with services like Amazon Glacier.

Other ideas are storing certain files in git and in Hadoop HDFS.

If you want, you can checksum your files before copying them to another device or server.

You can use tools like sha512sum or md5sum for that.
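If you would rather do the check from Python, here is a minimal sketch using the standard hashlib module. The function names file_sha512 and same_content are hypothetical; the hex digest produced is the same value that sha512sum prints for the file.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha512(path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so big backups do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b) -> bool:
    """True when both files have the same SHA-512 digest."""
    return file_sha512(path_a) == file_sha512(path_b)


if __name__ == "__main__":
    # Demo with two throwaway files (hypothetical names)
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "original.bin"
        copy = Path(tmp) / "copy.bin"
        original.write_bytes(b"backup me")
        copy.write_bytes(b"backup me")
        print(same_content(original, copy))
```

Compare the digest computed on the source with the one computed on the destination after the copy; if they differ, the copy is not identical.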

News from the Blog 2022-02-22

My Open Source projects

zpool watch

zpool watch is a small Python program for Linux workstations with a graphical environment and ZFS, that checks every 30 seconds whether your OpenZFS pools are OK.

If a pool is not healthy, it displays a message in a window using tkinter.

Basically, it saves you from continuously checking zpool status from the terminal, or from having to customize the ZED service to send an email, figure out how it can spawn a window alert in the graphical system, and decide what to do if the session has not been started.
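To give an idea of the polling approach, here is a minimal sketch. It is not the actual zpool watch code: the function names are hypothetical, it prints to the terminal instead of opening a tkinter window, and it assumes that the zpool command is in the PATH and that "zpool status -x" prints "all pools are healthy" when nothing is wrong.

```python
import subprocess
import time


def pools_are_healthy(status_output: str) -> bool:
    """Interpret the text printed by `zpool status -x`."""
    # Any other output (a degraded pool, or no pools at all) counts as not healthy
    return "all pools are healthy" in status_output


def check_once() -> bool:
    """Run `zpool status -x` once and report whether all pools are healthy."""
    result = subprocess.run(["zpool", "status", "-x"],
                            capture_output=True, text=True, check=False)
    return pools_are_healthy(result.stdout)


def watch(interval_seconds: int = 30) -> None:
    """Check the pools forever, sleeping between checks."""
    while True:
        if not check_once():
            print("WARNING: a ZFS pool needs attention, run: zpool status")
        time.sleep(interval_seconds)
```

Calling watch() starts the loop; keeping the text parsing in its own function makes it easy to unit test without having ZFS installed.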

carleslibs

Since the last News from the Blog I’ve released carleslibs v.1.0.6, v.1.0.5 and v.1.0.4.

v.1.0.6 adds a new class OsUtils to deal with mostly-Linux OS tasks, like knowing the userid, the username, whether it’s root, the distribution name and the kernel version.

It also adds:

DatetimeUtils.sleep(i_seconds)

In v.1.0.5 I’ve included a new method for getting the Datetime in Unix Epoch format as an Integer, and increased Code Coverage to 95% for the ScreenUtils class.

v.1.0.4 contains a minor update, a method in StringUtils to escape HTML from a string.

It uses the html library (part of the Python standard library), so it was little work for me to create this method and the Unit Test for it, but I wanted to use carleslibs in more projects, and adding it as core functionality makes the code of the projects I’m working on much clearer.
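For reference, this is what escaping with the standard html module looks like. The wrapper name escape_html is hypothetical and is not necessarily the name or signature used in carleslibs.

```python
import html


def escape_html(s_text: str) -> str:
    """Escape &, <, > and quotes so the text can be shown inside an HTML page."""
    return html.escape(s_text, quote=True)


if __name__ == "__main__":
    print(escape_html('<a href="x">Tom & Jerry</a>'))
    # prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```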

I’m working on the future v.1.0.7.

CTOP.py

I released the stable version 0.8.8 and tagged it.

Minor refactors, more Code Coverage (Unit Testing), and protection in the code against division by zero when the seconds passed as int are 0 (this was not an actual error, but it is worth protecting the code just in case).

Working on branch 0.8.9.

Currently in Master there is a stable version of 0.8.9 mainly fixing https://gitlab.com/carles.mateo/ctop/-/issues/51 which was not detecting when CTOP was running inside a Docker Container (reporting Unable to decode DMI).

My Books

Docker Combat Guide

Added 20 new pages with some tricks, like clearing the logs (1.6GB in my workstation), using some cool tools, using bind mounts and using Docker in Windows from command line without activating Docker Desktop or WSL.

https://leanpub.com/docker-combat-guide/

BTW if you work with Windows and you cannot use Docker Desktop due to the new license, in this article I explain how to use docker stand alone in Windows, without using WSL.

ZFS on Ubuntu

One of my SATA 2TB 2.5″ 5,400 rpm drives got damaged and was generating errors, so that was a fantastic opportunity to show how to detect and deal with the situation, replacing it with a new SATA 2TB 3.5″ 7,200 rpm drive and fixing the pool.

So I updated my ZFS on Ubuntu 20.04 LTS book.

Python 3

I’ve updated Python 3 Exercises for Beginners and added a new example of how to parse the <title> tag from an HTML page, using the BeautifulSoup package, to the repository of the Python 3 Combat Guide book.
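As a taste of the idea, here is a minimal sketch that extracts the title of a page. Note that it is not the example from the repository: to stay dependency-free it uses html.parser from the standard library instead of BeautifulSoup, and the names TitleParser and get_title are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text while we are inside the title element
        if self.in_title:
            self.title += data


def get_title(s_html: str) -> str:
    """Return the page title without surrounding whitespace."""
    parser = TitleParser()
    parser.feed(s_html)
    return parser.title.strip()


if __name__ == "__main__":
    print(get_title("<html><head><title> My Blog </title></head><body></body></html>"))
```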

I also added three new exercises, and solved them.

My friend Michela is translating the book to Italian. Thanks! :)

If you already purchased any of my books, you can download the updates of them when I upload them to LeanPub.

Free courses

Code Challenges

One of my students sent me this platform, which is kind of like HackerRank but oriented to video games: you solve code challenges by programming video games.

He is having plenty of fun:

https://www.codingame.com/start

More Symfony, APIs

If you enjoyed the Free Videos about Symfony, there is more.

https://symfonycast.com/screencast/api-platform

It talks about a bundle for building APIs.

And this tutorial explains in detail how to work with Webpack Encore:

https://symfonycasts.com/screencast/webpack-encore

100 Days of Code: Python Bootcamp

A friend and colleague of mine, Michela, is following this bootcamp and recommends it for people learning from scratch.

https://udemy.com/course/100-days-of-code/

My work at Blizzard

The company sent me the Stein, which is sent to the employees that serve for two years, with a recognition and a celebration called “The Circle of Honor”.

Books purchased

I bought this book because I often discover new, better ways to explain things to my students.

Sometimes I buy books for beginners, as they explain what I want to do super fast, and sometimes they teach nice tricks that I didn’t know. I have huge Django books, and they took a long time to finish.

A simpler book may only cover how to install and work with it on one platform (Windows or Mac, for instance), but that is all I require, as the commands to create projects are the same across platforms.

For example, you can install it and create a simple project with the ORM, connected to the database, very quickly.

Software

So I just discovered that Zoom has an option to draw on the shared screen, like Slack has. It is called Annotate. It is super useful for my classes. :)

I also discovered the icons in the Chat. It seems that not all video calls accept them.

Hardware

As I’m Working From Home I needed a scanner. I looked on Amazon and all of them cost more than €200.

I changed my strategy and bought an All-in-One from HP, which cost me €68.

So I’ll have a scanner and a backup printer, which always comes in handy.

The nightmare started after I tried to connect it with Ubuntu.

Ubuntu was not recognizing it. According to the manuals, they force you to configure the printer from an Android/iPhone app or from their web page, which as far as I understand is for Windows only. In any case, I would not install the proprietary drivers on my Linux system.

Annoyed, I installed the Android application, and it requested Location permissions to configure it. No way. It was not possible to configure the printer without giving GPS/Location permissions to the app, so I cancelled the process.

I grabbed a Windows 10 laptop and plugged in the All-in-One through USB. I ran the wizard to search for Scanners and Printers and was not able to use my scanner, only to configure it as a printer, so I was forced to install the HP drivers.

Irritated, I did, and they suggested configuring the printer so I could print from the Internet or from the phone. Thanks HP, you’ll be the next SolarWinds big-security-hole. I said no way. In order to use the Wifi I would have to agree to open that security door, meaning the printer would be permanently connected to the Internet, sending and receiving information. I said no, I’ll use it only via USB.

Even selecting that, in order to scan, the Software forces me to create an account.

Disappointing. HP is making very big, stupid mistakes. They used to be a good company.

Since they stopped doing the drivers in Barcelona years ago, their Software and solutions (not the hardware) went to hell.

I checked the reviews in the App Store and so many people gave them 1 star and have problems… what a shame the way they created this solution.

Donations

I made a donation to OpenShot Video Editor.

This is a great Open Source, multi-platform editor, so I wanted to support the creator.

Security

Attacks: looking for exploits

This is just a sample of a set of attacks on the blog in a 3-minute interval.

Another one this morning:

Now all are blocked in the Firewall.

This is a non-stop practice from spammers and pirates that has been going on for years.

It was almost three decades ago, when I was responsible for Linux at an ISP. I was installing a brand new Linux system connected to a service called “infovia”, at the time when the Internet was used with dial-up and modems, and during the installation it got hacked. I had the Ethernet connected. So this was already happening back then.

The morning I was writing this, I blocked thousands of offending Ip addresses.

Protection solutions

I recommend using CloudFlare. It is a CDN/Cache/Accelerator with DoS protection, and even its Free version is really useful.

Fun/Games

So I bring you a quiz-style game that you can play with your friends, family or work colleagues working from home (WFH).

The idea is that the master shares screen and sound in Zoom, and then the rest connect to jackbox.tv and enter the code displayed on the master’s screen on their own browser, and an interactive game is started.

It is recommended that the master has two monitors so they can also play.

The games are as fun as a phrase appearing and people having to complete it with a lie. If your friends vote for your phrase, believing it is true, you get points. If you vote for the true answer, you get points too.

Very fun, and I recommend it.

Stuff

<humor>Skynet sent another terminator to end me, but I terminated it. Its processor lays exhibited in my home now</humor>

I bought a laminator.

It also has a ruler and a trimmer to cut the paper.

It was only €39 and I have to say that I’m very happy with the results.

It takes around 5 minutes to reach a hot-enough temperature and be ready, and it feeds the pages slowly, around 50 seconds per DIN-A4, but the results are worth the time.

I’ve protected my medical receipts and other valuable documents and the result was perfect. No bubbles at all. It is no big deal if the plastic covers are not fed in 100% straight. Even if you pass an already laminated document through again, all is good.

Fun

Databases

One of my friends sent me this image.

It is old, but it’s still fun. It assumes that the parking or speed cameras will OCR the plate to build a query, and that the code is not well protected. So basically it is exploiting a SQL Injection.

Anybody working on the systems side, and with databases, knows how annoying those potential situations are.
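The usual protection is to never build the query by concatenating the text that was read, and to pass it as a parameter instead. Here is a minimal sketch with sqlite3 from the Python standard library. The table cars, its columns and the hostile plate text are hypothetical, made up only for this illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with one car (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (plate TEXT PRIMARY KEY, owner TEXT)")
    conn.execute("INSERT INTO cars VALUES (?, ?)", ("1234ABC", "Alice"))
    return conn


def find_owner(conn: sqlite3.Connection, plate: str):
    """Look up the owner of a plate using a parameterized query."""
    # The ? placeholder sends the plate as data, never as SQL text
    row = conn.execute("SELECT owner FROM cars WHERE plate = ?", (plate,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_demo_db()
    hostile_plate = "X'; DROP TABLE cars; --"
    print(find_owner(conn, "1234ABC"))      # the known owner
    print(find_owner(conn, hostile_plate))  # no match, and nothing is executed
    print(conn.execute("SELECT COUNT(*) FROM cars").fetchone()[0])  # table is intact
```

With the placeholder, whatever the OCR reads is compared as a plain string, so a plate containing SQL simply does not match any row.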

Python and coding

One of my colleagues shared this :)

Migrating my 11 years Amazon AWS account services (Postmortem Analysis)

I had already explained that I was migrating some services away from Amazon, that some of my sites were under Maintenance, and that I would provide more information.

Here is the complete story of why I migrated all the services from my 11-year-old Amazon account to other CSPs.

Some lessons can be learned from my adventure.

I migrated my last services from Amazon to GCP

Amazon sent me an email on October 6th, this year 2021, telling me that they will disable EC2-Classic by August 2022. I thought I would not be able to keep my Static Ip’s, as in the past VPC Ip’s and EC2-Classic Ip’s were not transferable. So, considering that I would lose my Static Ip’s anyway, I started to migrate some services to other providers like Digital Ocean.

It is not cool to lose Static Ip (Elastic Ip in AWS) addresses, as this is bad for SEO. So, given that I thought I would lose the Static Ip’s that had been with me for years, I started to migrate certain services to much cheaper providers.

Amazon is terrible at communicating. I talked with some product managers about that in the past, when they lost one of my Volumes and the email was so cold and terrible that it actually hurt more than Amazon losing my Data. I believed that it was a poorly made Scam, and when I realized it was true I reached out to one of my friends, who is a manager there, as I know they care about doing things right, and he organized a meeting with two PMs so I could pass on my feedback.

The Cloud providers are changing things very fast, and nobody is able to be up to date with the changes, unless their work position allows plenty of time to get updated. Even if pages of documentation are provided, you have to react to an event that they externally generated forcing you to action. Action to read all the documentation about EC2-Classic migrations, action to prepare to have migrated by August 2022.

So August 2022… I was counting on having plenty of time, but I’m writing a new book about using the Amazon SDK for Python, boto3, and some API calls I was doing started to fail in a very unusual way: Exceptions with timeout, but only for the one region where I had EC2-Classic.

urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f0347d545e0>: Failed to establish a new connection: [Errno -2] Name or service not known

My config was:

        o_config = Config(
            region_name="us-east-1a",
            signature_version="v4",
            retries={
                'max_attempts': 10,
                'mode': 'standard'
            }
        )

But if I switched to another region name, it would work:

            region_name='us-west-2',

I made a mistake here: the region name is “us-east-1” and not “us-east-1a”. “us-east-1a” is the availability zone. The SDK was giving a timeout because it uses the region name as part of the hostname to connect to the endpoint, so it could not find that endpoint because it doesn’t exist.

I never understood why a company like Amazon is unable to provide the SDK with a sample project, or projects, 100% working, with the source code, so people have a working base to build upon.

For every API that I have created, I have provided documentation, but also examples in several languages showing how to use it.

In 2013 I was CTO of an online travel agency, and we had meta-searchers consuming our API and we were having several hundreds of thousands of requests per second. Everything was perfectly documented, examples were provided for several languages, the document and the SDK had version numbers…

Everybody forgets about Developers, and companies throw terrible, cold products that are very difficult to use at the poor Developers. How many Developers would like to say: Listen Mr. President of the big Cloud Company XXXX, I only want to spawn a VM that works, and fast, with easy wizards. I don’t want to spend 50 hours learning before being able to use your overpriced platform, doing 20 things first because your API’s are a reflection of your infrastructure and based on Microservices. Modern JavaScript frameworks can create nice, gentle wizards even if you have supercold APIs.

Honestly, I didn’t realize my typo in the region and I connected to the Amazon Console to investigate and I saw this.

Honestly, when I read it I understood that they were going to end my EC2 Networking on the 30th of October. It was the 29th. I misunderstood.

It was my fault for not reading it well to the end. I got shocked by the first part, which talked about the shutdown, and I didn’t fully understand that they were only going to shut down EC2-Classic for the zones where I didn’t have anything running.

From the long errors (3 exceptions chained) I didn’t realize that the endpoint is built with the region name. (And I was passing the availability zone)

botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.us-east-1a.amazonaws.com/"

Here is where I say that a good SDM would have thought about and cared for the Developers more, and would have made the SDK check whether that region exists. How difficult is it to create an SDK a bit more clever, that detects an invalid region id? It is not difficult.
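As an illustration of how cheap such a check can be, here is a minimal sketch. It is not boto3 code: the function names are hypothetical, the regular expression is only a heuristic for the usual shape of a region name (it does not query the real list of regions), and the URL layout is the one shown in the error message above.

```python
import re

# Heuristic shape of a region name such as "us-east-1" or "eu-west-3".
# An availability zone such as "us-east-1a" has an extra trailing letter.
_REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def looks_like_region(name: str) -> bool:
    """True when the text has the usual shape of a region name."""
    return _REGION_RE.match(name) is not None


def ec2_endpoint(region_name: str) -> str:
    """Build the endpoint URL, refusing names that do not look like a region."""
    if not looks_like_region(region_name):
        hint = ""
        if looks_like_region(region_name[:-1]):
            # Dropping the last character gives a valid shape: probably a zone
            hint = f" (it looks like an availability zone, did you mean '{region_name[:-1]}'?)"
        raise ValueError(f"Invalid region name '{region_name}'{hint}")
    return f"https://ec2.{region_name}.amazonaws.com/"


if __name__ == "__main__":
    print(ec2_endpoint("us-east-1"))
    try:
        ec2_endpoint("us-east-1a")
    except ValueError as e:
        print(e)
```

A message like that, raised before any network call, would have pointed me to the typo immediately instead of three chained exceptions.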

It is true that it was late in the evening and I was tired from the whole day. Two days a week, between work and Zoom university classes, I work 15 hours and 13 hours respectively, not counting the assignments, so by the end of the week I am very tired. But that’s why it is very important to follow a methodology and to read carefully. I think Amazon has 50% of the fault because of the way they do things: how they created the SDK, how they communicate, and the errors that the console returned when I tried to create a VPC instance from an EC2-Classic AMI (they seem related to the fact that I had old VPC Network objects with a shorter hash than the ones they currently use). The other 50% was my fault, for not identifying the source of the error and for not reading the message on their website well.

But the fact that I was getting those errors and timeouts from the API’s made me believe they were going to cut the EC2-Classic Networking the next day.

All the mistakes came together in a perfect storm.

I checked for documentation and I saw it was possible to migrate my Static Ip’s to VPC Static Ip’s.

It was Friday evening, and I cancelled my plans, in order to migrate the Blog to VPC in an attempt to keep running it with Amazon.

As Cloud Architect, I like to have running instances in several CSP as it allows me to stay up to date with the changes they do.

I checked the documentation for the migration. Disassociating the Static Ip (Elastic Ip in AWS jargon) was easy. Turning into VPC as well.

As I progressed, what had to be easy turned into a nightmare, as I was getting many errors from the Amazon API, without any information, and my Instances were not created.

I figured out that their API could have problems with the old VPC objects I created a long time ago, so I had to create new objects for several things.

I managed to spawn my instances, but they were being launched and terminated instantly, without any information. Frustrating.

When launching a new instance from the AMI (a Snapshot of the blog), I was being shown options to add more volumes that made no sense. My Instance was using 16GB of a 20GB total space, and I was shown different volume configs depending on the instance: in some cases an additional 20GB volume, in others a small ephemeral SSD and 10 GB for the AMI (which requires at least 16GB).

After some fighting I managed to make it work, by deleting the volumes that made no sense and keeping only one of 20GB, the same size as my AMI.

But then my nightmare started: making the VPC Instance have Internet access and be visible from outside. I had to create a new Internet Gateway, NAT, Network, etc…

As mentioned, the old objects I was trying to reuse were making the process fail.

I was running out of time, and I thought that they were going to shut down the EC2-Classic network shortly (as I did not read correctly), so I decided to download everything and migrate to another provider. To do that, I first blocked all the traffic, except for my Ip.

I worked in parallel, creating the new config in Google Cloud, just in case I had forgotten something. I had created a document for the migration and it was accurate.

I managed to do everything fast enough. The slowest part was downloading all the Data, as I hold entire VM’s for projects like Cassandra Universal Driver.

Then I powered off my Amazon Instance for the Blog forever.

In GCP I blocked all the traffic in the firewall, except for my Ip, so I could work calmly.

When everything was ready, I had to redirect the DNS to the new static Ip from Google.

The DNS provider I used had implemented some changes in their API, so I was getting errors when replacing my old entry ‘.’ (their JSON calls returned Internal Server Error). Finally I figured out how to work around it, and I was able to confirm that the first service was up and running.

I did some tests to make sure there were no unexpected permission problems, entries in the logs, etc…

Only then did I open the Google Firewall. I have a second firewall in each instance, where I block or open what I want at iptables level. Basically, I block the Ip’s of abusive bots trying to find exploits or to brute-force passwords by dictionary.

I checked with my phone, without Wifi, that the Firewall was all good. (It is always a good idea to use another external Ip, different from the management one, to check.)

I added a post explaining that I was migrating some of my Services and that they were under maintenance.

I mentioned in the blog that some of my services were being migrated from Amazon to Digital Ocean.

For some reason, one user was lost in the Backup of the Database, so I created it in MySQL with the typical commands:

CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON mydatabase.* TO 'username'@'localhost';
FLUSH PRIVILEGES;

News from the Blog 2021-07-01

  • Google Instances’ Performance
    I’ve updated the CMIPS score for the latest Google instances vs last Amazon’s I tried and baremetals.

This is the changelog for the latest version:

v. 0.99
 A whole new chapter showing sorting in Python and lambdas. (.sort() and sorted() package First)

 I show writing lambdas for Sorting, and also what makes them crash.

 Explained why Lambdas are not recommended unless you use them for working with data, like sorting or filtering, and unless you know what you are doing. They are difficult to Debug.

 Explained about PEP8 tool to validate style.

 Explaining why we define Instance variables in the Constructor.

 Provided more samples for Flask Applications.

 Fixed code sample https://gitlab.com/carles.mateo/python_combat_guide/-/blob/master/src/keywords.py as the editor removed the white line spaces.

 Added more books to the bibliography

 I explain the importance of running Unit Testing as both root and as regular users.

 Explain how to run as regular user inside a Docker Container.

 Explained the requirements.txt file, and how PyCharm integrates with it to create the venv/ Virtual Environment.

 Also how it is used in Dockerfile to make sure all the dependencies are satisfied in the Docker Container.
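To illustrate the kind of thing the sorting chapter covers, here is a minimal sketch. It is not taken from the book: the sample data and the function name sort_by_age are made up for this example.

```python
people = [
    {"name": "Ana", "age": 31},
    {"name": "Bob", "age": 25},
    {"name": "Cid", "age": 40},
]


def sort_by_age(items):
    """Return a new list sorted by age; the original list is not modified."""
    return sorted(items, key=lambda person: person["age"])


if __name__ == "__main__":
    # sorted() returns a new list
    print([p["name"] for p in sort_by_age(people)])

    # .sort() sorts the same list in place and returns None
    names = [p["name"] for p in people]
    names.sort(reverse=True)
    print(names)

    # What makes a lambda crash: one record without the expected key
    try:
        sort_by_age(people + [{"name": "Dan"}])
    except KeyError as e:
        print(f"The lambda crashed, missing key: {e}")
```

The traceback of such a crash points inside the lambda, with no function name to guide you, which is one of the reasons they are harder to Debug.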

As any project committed to saving human lives, she has all my support and admiration.

News from the blog 2020-10-16

  • I’ve been testing and adding more instances to CMIPS. I’m planning on testing the Azure instance with 120 cores.
  • News: Microsoft makes remote work a permanent option

https://www.bbc.com/news/business-54482245

  • One of my colleagues showed me dstat, a very nice tool for system monitoring, and bandwidth of a drive monitoring. Also ifstat, as complement to iftop is very cool for Network too. This functionality is also available in CTOP.py
  • As I shared in the past news of the blog, I’m resuming my contributions to ZFS Community.

A long time ago I created some ZFS tools that I want to share soon as Open Source.

I equipped myself with the proper Hardware to test on SAS and SATA:

  • 12G Internal PCI-E SAS/SATA HBA RAID Controller Card, Broadcom’s SAS 3008, compatible for SAS 9300-8I.
    This is just an HBA (Host Bus Adapter), it doesn’t support RAID. It only connects up to 8 drives, or 1024 through an expander, to my computer.
    It has a bandwidth of 9,600 MB/s which guarantees me that I’ll be able to add 12 SAS SSD Enterprise grade at almost the max speed of the drives. Those drives perform at 900 MB/s so if I’m using all of them at the same time, like if I have a pool of 8 + 3 and I rebuild a broken drive or I just push Data, I would be using 12×900 = 10,800 MB/s. Close. Fair enough.
  • VANDESAIL Mini-SAS Cables, 1m Internal Mini-SAS to 4x SAS SATA Forward Breakout Cable Hard Drive Data Transfer Cable (SAS Cable).
  • SilverStone SST-FS212B – Aluminium Trayless Hot Swap Mobile Rack Backplane / Internal Hard Drive Enclosure for 12x 2.5 Inch SAS/SATA HDD or SSD, fit in any 3x 5.25 Inch Drive Bay, with Fan and Lock, black
  • Terminator is here.
    I ordered this T-800 head a while ago and it finally arrived.

Finally I will have my empty USB keys located and protected. ;)

Remember to be always nice to robots. :)

How to block scanners that look for vulnerabilities to your Ubuntu Apache site

There are many robots scanning sites for vulnerabilities, to gain control of or exploit the servers. Most of them come from Chinese and Russian Ip’s.

Here I explain an easy way to block them using the Ubuntu Firewall ufw.

If you use a CMS like WordPress and you know there are extensions that have had security exploits, for example wp-file-manager, then you can search directly for those requests in the Apache Access Logs.

For example:

cat /var/log/apache2/blog_carlesmateo_com-access.log | grep "wp-file-manager" | awk '{ print $1; }' | sort -u >> 2020-10-03-offending-ips.txt

cat /var/log/apache2/blog_carlesmateo_com-access.log.1 | grep "wp-file-manager" | awk '{ print $1; }' | sort -u >> 2020-10-03-offending-ips.txt

zcat /var/log/apache2/blog_carlesmateo_com-access.log.2.gz | grep "wp-file-manager" | awk '{ print $1; }' | sort -u >> 2020-10-03-offending-ips.txt

In the example we look for the access.log file, for the rotated access.log.1 and for the rotated and compressed access.log.2.gz. We use the tool zcat which does a cat over a compressed file.

If we don’t expect to have anybody posting to our xmlrpc Service, we can check for the offending Ip’s by doing:

cat /var/log/apache2/blog_carlesmateo_com-access.log | grep "POST /xmlrpc.php" | wc --lines
2490

In my case I have 2490 requests just in the last log.

If you are interested in how many Ip’s are launching those requests, you can count the distinct Ip’s:

cat /var/log/apache2/blog_carlesmateo_com-access.log | grep "POST /xmlrpc.php" |awk '{ print $1; }' | sort -u | wc --lines
145

And to add those Ip’s to the offending Ip’s list:

cat /var/log/apache2/blog_carlesmateo_com-access.log | grep "POST /xmlrpc.php" | awk '{ print $1; }' | sort -u >> 2020-10-03-offending-ips.txt
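The same extraction can be sketched in Python, which is handy if you want to process several logs in one go. This is only an illustration under some assumptions: the function names are hypothetical, the client Ip is assumed to be the first field of each line (as in the awk command above), and the sample lines use documentation Ip addresses, not real offenders.

```python
import gzip
from typing import Iterable, Set


def open_log(path: str):
    """Open a plain or gzip-compressed log as text (like cat and zcat)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def offending_ips(lines: Iterable[str], needle: str) -> Set[str]:
    """Return the distinct first fields of the lines that contain needle."""
    ips = set()
    for line in lines:
        if needle in line:
            fields = line.split()
            if fields:
                ips.add(fields[0])  # same as: awk '{ print $1; }'
    return ips


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [03/Oct/2020:10:00:00 +0000] "POST /xmlrpc.php HTTP/1.1" 200 413',
        '203.0.113.5 - - [03/Oct/2020:10:00:01 +0000] "POST /xmlrpc.php HTTP/1.1" 200 413',
        '198.51.100.7 - - [03/Oct/2020:10:00:02 +0000] "GET / HTTP/1.1" 200 5120',
    ]
    for ip in sorted(offending_ips(sample, "POST /xmlrpc.php")):
        print(ip)
```

To read a real log you would pass open_log(path) as the lines argument; using a set gives the same de-duplication as sort -u.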

I can also check for repeated requests in the logs:

cat /var/log/apache2/blog_carlesmateo_com-access.log | awk '{ print $7; }' | sort | uniq -c | sort -rn | less

That shows me some legit requests and others that are not:

   2532 /xmlrpc.php
    209 /wp-login.php
    205 /wp-admin/admin-ajax.php
     84 /
     83 *
     48 /robots.txt
     21 /favicon.ico
     16 /wp-login.php?redirect_to=https%3A%2F%2Fblog.carlesmateo.com%2Fwp-admin%2F&reauth=1
     15 /wp-includes/js/jquery/jquery.js?ver=1.12.4-wp
     14 /wp-includes/css/dist/block-library/theme.min.css?ver=5.5.1
     14 /wp-includes/css/dist/block-library/style.min.css?ver=5.5.1
     14 /wp-content/themes/2012-carles/style.css?ver=5.5.1
     14 /wp-content/plugins/contact-form-7/includes/js/scripts.js?ver=5.2.2
     14 /wp-content/plugins/captcha/css/front_end_style.css?ver=4.4.5
     13 /wp-includes/css/dashicons.min.css?ver=5.5.1
     13 /wp-content/themes/2012-carles/css/blocks.css?ver=20181230
     13 /wp-content/plugins/contact-form-7/includes/css/styles.css?ver=5.2.2
     12 /wp-includes/js/wp-embed.min.js?ver=5.5.1
     12 /wp-includes/images/w-logo-blue-white-bg.png
     12 /wp-content/themes/2012-carles/js/navigation.js?ver=20140711
     11 /wp-includes/js/wp-emoji-release.min.js?ver=5.5.1
     11 /wp-content/plugins/captcha/css/desktop_style.css?ver=4.4.5
     11 /feed/
     11 /contact/
     10 /wp-comments-post.php
     10 /?author=1
      9 /2016/06/30/creating-a-content-filter-for-postfix-in-php/
      9 /2014/10/13/performance-of-several-languages/
      8 /wp-includes/js/comment-reply.min.js?ver=5.5.1
      8 /wp-content/plugins/captcha/js/front_end_script.js?ver=5.5.1
      8 /e/admin/index.php
      8 /e/admin/
      7 /wp-login.php?action=register
      7 /current-projects/
      7 //xmlrpc.php
      6 /.env
      5 /2019/08/12/a-sample-forensic-post-mortem-for-a-iscsi-initiator-client-that-had-connectivity-problems-to-the-server/
      5 /2017/03/26/csort-multithread-versus-quicksort-java/
      4 /wp-json/wp/v2/types/wp_block?_locale=user
      4 /wp-json/wp/v2/blocks?per_page=100&_locale=user
      4 /wp-admin/
      4 /diguo/index.php
      4 /diguo/
      4 /category/web-development/
      4 /category/news-for-the-blog/
      3 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
      3 /mt-notation-for-python/
      3 /ebk/index.php
      3 /ebk/
      3 /comments/feed/
      3 /bf/index.php
      3 /bf/
      3 /beifen/index.php
      3 /beifen/
      3 /Ebak/index.php
      3 /Ebak/
      3 /Bak/index.php
      3 /Bak/
      3 /2020/09/21/how-to-recover-access-to-your-amazon-aws-ec2-instance-if-you-loss-your-private-key-for-ssh/
      3 /2020/08/23/adding-a-ramdisk-as-slog-zil-to-zfs/
      3 /2019/07/03/adding-my-server-as-docker-with-php-catalonia-framework-explained/
      3 /2019/06/25/some-handy-tricks-for-working-with-zfs/
      3 /2015/02/01/stopping-definitively-the-massive-distributed-dos-attack/
      2 /ycadmin/login.php?gotopage=%2Fycadmin%2Findex.php
      2 /ueditor/net/controller.ashx
      2 /sql_beifen/index.php
      2 /sql_beifen/
      2 /sql/index.php
      2 /sql/
      2 /dgbf/index.php
      2 /dgbf/
      2 //xmlrpc.php?rsd
      2 //.env
      1 /wp-login.php?registration=disabled
      1 /wp-login.php?action=lostpassword
      1 /wp-json/wp/v2/users/me?_locale=user
      1 /wp-json/wp/v2/users/?who=authors&per_page=100&_locale=user
      1 /wp-json/wp/v2/taxonomies/post_tag?context=edit&_locale=user
      1 /wp-json/wp/v2/taxonomies/category?context=edit&_locale=user
      1 /wp-json/wp/v2/tags?per_page=100&orderby=count&order=desc&_fields=id%2Cname&search=ufw&_locale=user

You can manually identify which requests are attacks and which are legitimate.

Once you have your definitive list of offending IPs (make sure you didn’t accidentally include your own), you can execute the second part of the script:

echo '#!/bin/bash' > add_ufw_rules.sh

i_COUNTER_RULE=0; for s_OFFENDING_IP in $(cat 2020-10-03-offending-ips.txt); do i_COUNTER_RULE=$((i_COUNTER_RULE+1)); echo "ufw insert $i_COUNTER_RULE deny from $s_OFFENDING_IP to any" >> add_ufw_rules.sh; done

echo "ufw status numbered" >> add_ufw_rules.sh
echo "sudo ufw allow OpenSSH" >> add_ufw_rules.sh
echo "sudo ufw allow 22/tcp" >> add_ufw_rules.sh
echo 'sudo ufw allow "Apache Full"' >> add_ufw_rules.sh
echo "sudo ufw enable" >> add_ufw_rules.sh

Then review your file add_ufw_rules.sh with less to check that everything is OK:

#!/bin/bash
ufw insert 1 deny from 40.79.250.88 to any
ufw insert 2 deny from 52.173.148.212 to any
ufw insert 3 deny from 94.103.85.175 to any
ufw insert 4 deny from 40.79.250.88 to any
ufw insert 5 deny from 78.85.208.240 to any
ufw insert 6 deny from 80.82.68.173 to any
ufw insert 7 deny from 188.165.230.118 to any
ufw insert 8 deny from 195.201.117.103 to any
ufw insert 9 deny from 40.79.250.88 to any
ufw insert 10 deny from 5.135.138.188 to any
ufw insert 11 deny from 51.116.189.135 to any
...
ufw insert 223 deny from 95.173.161.167 to any
ufw insert 224 deny from 95.84.228.227 to any
ufw status numbered
sudo ufw allow OpenSSH
sudo ufw allow 22/tcp
sudo ufw allow "Apache Full"
sudo ufw enable

Then simply make it executable with chmod +x add_ufw_rules.sh and run the script as root (or with sudo) to apply it, since the ufw insert lines in it are not prefixed with sudo.
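Because the list file is appended to with >> on every run, the same address can end up in it several times, as with 40.79.250.88 in the listing above. The following is a minimal sketch of a variant that deduplicates the list, keeps only lines that look like IPv4 addresses, and skips your own address. The function name generate_ufw_deny_rules and its second parameter are hypothetical, and the pattern does not check that each octet is at most 255:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "ufw insert" line per unique IPv4-looking
# address found in LIST_FILE, skipping MY_IP so you cannot lock yourself out.
# Nothing is executed; the output is meant to be reviewed first.
generate_ufw_deny_rules() {
    local s_list_file="$1"
    local s_my_ip="${2:-}"

    grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' "$s_list_file" \
        | sort -u \
        | {
            i_counter_rule=0
            while IFS= read -r s_offending_ip; do
                # Skip your own address if it slipped into the list
                if [ -n "$s_my_ip" ] && [ "$s_offending_ip" = "$s_my_ip" ]; then
                    continue
                fi
                i_counter_rule=$((i_counter_rule + 1))
                echo "ufw insert $i_counter_rule deny from $s_offending_ip to any"
            done
        }
}

# Usage (not run here; 203.0.113.7 stands for your own static IP):
#   generate_ufw_deny_rules 2020-10-03-offending-ips.txt 203.0.113.7 >> add_ufw_rules.sh
```

The rule numbers stay consecutive because the counter only advances for addresses that are actually emitted.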

It’s up to you to turn on the Firewall logging:

sudo ufw logging on

How to recover access to your Amazon AWS EC2 instance if you lose your Private Key for SSH

This article covers the desperate situation where you created one or more instances and instructed Amazon to use an SSH Key Pair for which only you have the Private Key. Your instances have been running for months, hosting for example an eCommerce site, and then you lose your Private Key (.pem file), and with it the SSH access to your instances’ Data.

I’ve actually seen this situation happen several times in real companies, mainly start-ups, and I solved it for them.

Assuming that you didn’t have a secondary method to access, which is another combination of username/password or other user/KeyPairs, and so you completely lost the access to the Database, the Webservers, etc… I’m going to show you how to recover the data.

For this article I will consider a scenario where there is only one Instance, which contains everything for your eCommerce: Webserver, code, and Database… and is a simple config, with a single persistent drive.

Warning: be very careful if you use ephemeral drives, as their contents will be lost if you power off the instance.

Method 1: Quicker, launching a new instance from the previous

Step 1: The first step you will take is to close the access from outside, using the Firewall, to avoid any new changes going to the disk. You can allow access to the instance only from your static IP in the office/home.

Step 2: You’ll wait for 5 minutes to allow any transaction going on to conclude, and pending writes to be flushed to disk.

Step 3: From the Amazon AWS Console, EC2, you’ll request a Snapshot. This step is there to get extra safety. Taking a Snapshot of a live, mounted filesystem is not the best of ideas, especially with a Database on it, but we are facing a desperate situation, so we’re increasing the chances of getting out of it without Data loss. If everything goes well, at the end you will not need this snapshot.

Make sure you select No reboot.

Step 4: Be very careful if you have extra drives and ephemeral drives.

Step 5: Wait till the Snapshot completes.

Step 6: Then request a graceful poweroff. Amazon will try to poweroff the Server in a gentle way. This may take two minutes.

Step 7: When the instance is powered off, request a new Snapshot. This is the one we really want. The other was just to be safer. If you feel confident, you can just untick No reboot in Step 3 and take only one Snapshot.

Step 8: Wait till the Snapshot completes.

Step 9: Generate and upload the new key you will use to AWS Console, or ask Amazon to generate a key pair for you. You can do it while creating the new instance through the wizard.

Step 10: Launch a new instance, based on your snapshot AMI. This will generate a copy of your previous instance (using the Snapshot) for the new one. Select the new Key pair. Finish assigning the Security groups, the elastic ip…

Step 11: Start the new instance. You can select a different flavor, like a more powerful instance, if you prefer. (scale vertically)

Step 12: Test your access by logging in via SSH with the new key pair, from your static IP, which has access in the Firewall.

ssh -i /home/carles/Desktop/Data/keys/carles-ecommerce.pem ubuntu@54.208.225.14

Step 13: Check that the website starts correctly, and check the Database logs to see if there is any corruption. There should not be any if the graceful shutdown went well.

Step 14: Reopen the access from the Firewall, so the world can connect to your instance.
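The snapshot and power off steps of Method 1 can also be driven from the AWS CLI instead of the Console. This is only a sketch under assumptions: the instance ID and image name are placeholders, the function names are hypothetical, and by default it only prints the commands (DRY_RUN=1) so they can be reviewed. The --no-reboot flag of create-image corresponds to the No reboot option mentioned in Step 3:

```shell
#!/usr/bin/env bash
# Sketch of Steps 3, 6 and 7 of Method 1 with the AWS CLI.
# With DRY_RUN=1 (the default) commands are only printed, never executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_by_image() {
    local s_instance_id="$1"   # placeholder: the instance you lost access to
    local s_image_name="$2"    # placeholder: name for the new image

    # Step 3: safety image of the running instance, without rebooting it
    run aws ec2 create-image --instance-id "$s_instance_id" \
        --name "${s_image_name}-live" --no-reboot
    # Step 6: graceful power off, then wait until it has really stopped
    run aws ec2 stop-instances --instance-ids "$s_instance_id"
    run aws ec2 wait instance-stopped --instance-ids "$s_instance_id"
    # Step 7: the image we really want, taken from the stopped instance
    run aws ec2 create-image --instance-id "$s_instance_id" --name "$s_image_name"
}

# Usage (not run here; the ID is made up):
#   recover_by_image i-0123456789abcdef0 ecommerce-recovery
```

Waiting for each image to complete (Steps 5 and 8) and launching the new instance with the new Key pair are left to the Console, as described in the steps.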

Method 2: Slower, access the Data and rebuild whatever you need

The second method is exactly the same up to and including Step 6.

Step 7: After this, you will create a new instance based on your favorite OS, with a new pair of Keys.

Step 8: You’ll detach the Volume from the previous eCommerce instance (the one you lost access to).

Step 9: You’ll attach the Volume to the new instance.

Step 10: You’ll have access to the Data from the previous instance on the attached volume. Type cat /proc/partitions to see the partitions available, mount the one you need, and check the mountpoints with df -h. You can then download or backup, or install the Software again and import the Database…

Step 11: Check that everything works, and enable the access worldwide to the Web in the Firewall (Security Group Inbound Rules).
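Steps 8 to 10 can be sketched from the command line as well. Everything specific here is an assumption for illustration: the volume and instance IDs, the device name /dev/sdf, the partition /dev/xvdf1 and the mount point /mnt/recovery are placeholders, and the function names are hypothetical. As a precaution the sketch only prints the commands by default, and it mounts the old volume read-only:

```shell
#!/usr/bin/env bash
# Sketch of Steps 8 to 10 of Method 2.
# With DRY_RUN=1 (the default) commands are only printed, never executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run from your workstation: move the volume to the new instance
move_volume() {
    local s_volume_id="$1"
    local s_new_instance_id="$2"
    local s_device="${3:-/dev/sdf}"

    run aws ec2 detach-volume --volume-id "$s_volume_id"
    run aws ec2 wait volume-available --volume-ids "$s_volume_id"
    run aws ec2 attach-volume --volume-id "$s_volume_id" \
        --instance-id "$s_new_instance_id" --device "$s_device"
}

# Run on the new instance: mount the old partition read-only
mount_read_only() {
    local s_partition="$1"
    local s_mount_point="${2:-/mnt/recovery}"

    run lsblk
    run sudo mkdir -p "$s_mount_point"
    run sudo mount -o ro "$s_partition" "$s_mount_point"
}

# Usage (not run here; IDs and device names are made up):
#   move_volume vol-0123456789abcdef0 i-0fedcba9876543210
#   mount_read_only /dev/xvdf1
```

Mounting read-only keeps the old Data untouched while you copy it out; the device name seen inside the instance may differ from the one requested when attaching, which is why lsblk is listed first.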

If you are confident enough, you can use this method to upgrade the OS or base Software of your instance, making it part of your maintenance window. For example, to get the last version of Ubuntu or CentOS, MySQL, Python or PHP, etc…

Blocking some offending Ip’s easily with Ubuntu ufw

Ok, so we know that there are several ip’s that have attempted to hack the blog.

We know they try different urls looking for an exploit, or they try to hack a password by brute force…

We are using Amazon EC2 and the old infrastructure, not a VPC Network, so we cannot block a specific IP from reaching our Web Server at the network level.

In an article from 2015 I explained How to Stop a BitTorrent based DDoS attack, and was using iptables for that.

In this example I will show how to use ufw to block two specific IPs. Execute as root or with sudo:

ufw insert 1 deny from 89.35.39.60 to any
ufw insert 2 deny from 85.204.246.240 to any
ufw allow OpenSSH
ufw allow 22/tcp
ufw allow "Apache Full"
ufw enable
ufw status numbered

You can do ufw status numbered to see the status of ufw and the rules order.

root@ip-111-111-111-111:/home/ubuntu# ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   DENY IN     89.35.39.60
[ 2] Anywhere                   DENY IN     85.204.246.240
[ 3] Apache Full                ALLOW IN    Anywhere
[ 4] OpenSSH                    ALLOW IN    Anywhere
[ 5] 22/tcp                     ALLOW IN    Anywhere
[ 6] Apache Full (v6)           ALLOW IN    Anywhere (v6)
[ 7] OpenSSH (v6)               ALLOW IN    Anywhere (v6)
[ 8] 22/tcp (v6)                ALLOW IN    Anywhere (v6)

root@ip-111-111-111-111:/home/ubuntu#

If you need to delete a rule, take its number from the left column and type:

sudo ufw delete 2

Stopping and investigating a WordPress xmlrpc.php attack

One of my Servers got heavily attacked for several days. I describe here the steps I took to stop this.

The attack consisted of several connections per second to the Server, to the path /xmlrpc.php.

This is the WordPress file that handles, among other things, pingbacks: the notifications sent when someone links to you.

My Server is a small Amazon instance, an m1.small with only one core, 1.6 GB of RAM and magnetic disks, which scores a modest 203 CMIPS (my slow laptop scores 460 CMIPS).

Those massive connections caused the Server to use more and more RAM. As the xmlrpc requests were taking many seconds to reply, more and more Apache processes were spawned. That led to more memory consumption, until all the available RAM was used and the Server started swapping, with a heavy performance impact, and finally all the memory was exhausted and the mysql processes stopped.

I realized that I was under attack after MySql shut down. I checked the CloudWatch Statistics from Amazon AWS and it was clear that I was receiving an abnormal number of requests. The I/O was really high too.

These statistics cover the last three days. Look at the spikes when the attack was hitting hard and how relaxed the Server is now (flat line).

blog-carlesmateo-com-statistics-use-last-3-days

First I decided to simply rename the xmlrpc.php file as a quick solution to stop the attack but the number of http connections kept growing and then I saw very suspicious queries to the database.

blog-carlesmateo-suspicious-queries-2014-08-30-00-11-59

Those queries, in addition to what I had seen in Apache’s error log, suggested to me that maybe the Server had been hacked through a WordPress or plugin bug, and that the attackers were now trying to hide from the database’s logs (especially the DELETE FROM wp_useronline WHERE user_ip = the IP of the attacker).

[Tue Aug 26 11:47:08 2014] [error] [client 94.102.49.179] Error in WordPress Database Lost connection to MySQL server during query a la consulta SELECT option_value FROM wp_options WHERE option_name = 'uninstall_plugins' LIMIT 1 feta per include('wp-load.php'), require_once('wp-config.php'), require_once('wp-settings.php'), include_once('/plugins/captcha/captcha.php'), register_uninstall_hook, get_option
[Tue Aug 26 11:47:09 2014] [error] [client 94.102.49.179] Error in WordPress Database Lost connection to MySQL server during query a la consulta SELECT option_value FROM wp_options WHERE option_name = 'uninstall_plugins' LIMIT 1 feta per include('wp-load.php'), require_once('wp-config.php'), require_once('wp-settings.php'), include_once('/plugins/captcha/captcha.php'), register_uninstall_hook, get_option
[Tue Aug 26 11:47:10 2014] [error] [client 94.102.49.179] Error in WordPress Database Lost connection to MySQL server during query a la consulta SELECT option_value FROM wp_options WHERE option_name = 'widget_wppp' LIMIT 1 feta per include('wp-load.php'), require_once('wp-config.php'), require_once('wp-settings.php'), do_action('plugins_loaded'), call_user_func_array, wppp_check_upgrade, get_option

The error log was very ugly.

The access log was not reassuring either, as it showed many requests like these:

94.102.49.179 - - [26/Aug/2014:10:34:58 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
127.0.0.1 - - [26/Aug/2014:10:35:09 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Ubuntu) (internal dummy connection)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:35:00 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

It was difficult to determine whether the Server was receiving SQL injections, so I wanted to be sure.

Note: The connection from 127.0.0.1 with OPTIONS is an internal dummy connection that Apache makes to itself while managing its child processes.

As I had super fresh backups in another Server I was not afraid of the attack dropping the database.

I was also a bit suspicious because the /readme.html file said that the WordPress version was 3.6. In other installations it correctly reports version 3.9.2, and this file is updated by the auto-update. I was thinking about a possible very sophisticated trojan attack able to modify wp-includes/version.php and set a fake $wp_version = '3.9.2';
Later I realized that this blog had WordPress in Catalan, my native language, and discovered that the people who do the translations forgot to update this file (even in new installations it is outdated and shows 3.6). I have alerted them.

In fact, later I did a diff of all the files of my WordPress installation against the official WordPress 3.9.2-ca, and then a diff between WordPress 3.9.2-ca and WordPress 3.9.2 (English, the default), and found no differences. My Server was OK. But at this point, at the beginning of the investigation, I didn’t know that yet.

With the info I had (queries, times, attack, readme reporting v. 3.6…) I weighed the possibility that I was facing something serious, and I decided that I had a unique opportunity to discover how they inject that SQL, or whether my Server was compromised and how. The downside is that it was the same Amazon Server where this blog resides, and I wanted the attack to continue so I could get more information. So for two days I was recording logs and investigating; sorry if you visited my blog and the database was down, or the Server was extremely slow. I needed that info. It was worth it.

First I changed the Apache config so the massive connections had a bit less impact on the Server and I could work on it while the attack was going on.

I told my group of Senior friends what was going on. Two SysAdmins gave me some good suggestions on other logs to watch and on how to stop the attack, and later a Developer joined me to look at the logs and pointed out possible solutions. But basically all of them suggested blocking the incoming connections with iptables and doing things like reinstalling WordPress, disabling xmlrpc.php in .htaccess, changing passwords or moving wp-admin/ to another place, while the point is that I wanted to understand exactly what was going on and how.

I checked the logs, certificates, etc… and no one other than me was accessing the Server. I also double-checked the Amazon’s Firewall to be sure that no unnecessary ports were left open. Everything was Ok.

I took a look at the Apache logs for the site and all the attacks were coming from the same Ip:

94.102.49.179

It is an IP from a dedicated Servers company called ecatel.net. I reported the abuse to them, at the abuse address listed in the ripe.net database for the range.

I found that many people have complaints about this provider, and reports of them ignoring requests to stop the spam coming from their servers, so I decided that after my tests I would block their entire network from accessing my sites.

All the requests shown in the access.log pointed to requests to /xmlrpc.php. It was the only path requested by the attacker so that Ip did nothing more apparently.

I added some logging to WordPress xmlrpc.php file:

if ($_SERVER['REMOTE_ADDR'] == '94.102.49.179') {
    error_log('XML POST: '.serialize($_POST));
    error_log('XML GET: '.serialize($_GET));
    error_log('XML REQUEST: '.serialize($_REQUEST));
    error_log('XML SERVER: '.serialize($_SERVER));
    error_log('XML FILES: '.serialize($_FILES));
    error_log('XML ENV: '.serialize($_ENV));
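    // $HTTP_RAW_POST_DATA worked on the PHP of 2014; it was removed in PHP 7,
    // where file_get_contents('php://input') returns the same raw body
    error_log('XML RAW: '.$HTTP_RAW_POST_DATA);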
    error_log('XML ALL_HEADERS: '.serialize(getallheaders()));
}

This was the result, it is always the same:

[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML POST: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML GET: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML REQUEST: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML SERVER: a:24:{s:9:"HTTP_HOST";s:24:"barcelona.afterstart.com";s:12:"CONTENT_TYPE";s:8:"text/xml";s:14:"CONTENT_LENGTH";s:3:"287";s:15:"HTTP_USER_AGENT";s:50:"Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)";s:15:"HTTP_CONNECTION";s:5:"close";s:4:"PATH";s:28:"/usr/local/bin:/usr/bin:/bin";s:16:"SERVER_SIGNATURE";s:85:"<address>Apache/2.2.22 (Ubuntu) Server at barcelona.afterstart.com Port 80</address>\n";s:15:"SERVER_SOFTWARE";s:22:"Apache/2.2.22 (Ubuntu)";s:11:"SERVER_NAME";s:24:"barcelona.afterstart.com";s:11:"SERVER_ADDR";s:14:"[this-is-removed]";s:11:"SERVER_PORT";s:2:"80";s:11:"REMOTE_ADDR";s:13:"94.102.49.179";s:13:"DOCUMENT_ROOT";s:29:"/var/www/barcelona.afterstart.com";s:12:"SERVER_ADMIN";s:19:"webmaster@localhost";s:15:"SCRIPT_FILENAME";s:40:"/var/www/barcelona.afterstart.com/xmlrpc.php";s:11:"REMOTE_PORT";s:5:"40225";s:17:"GATEWAY_INTERFACE";s:7:"CGI/1.1";s:15:"SERVER_PROTOCOL";s:8:"HTTP/1.0";s:14:"REQUEST_METHOD";s:4:"POST";s:12:"QUERY_STRING";s:0:"";s:11:"REQUEST_URI";s:11:"/xmlrpc.php";s:11:"SCRIPT_NAME";s:11:"/xmlrpc.php";s:8:"PHP_SELF";s:11:"/xmlrpc.php";s:12:"REQUEST_TIME";i:1409338974;}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML FILES: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML ENV: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML RAW: <?xmlversion="1.0"?><methodCall><methodName>pingback.ping</methodName><params><param><value><string>http://seretil.me/</string></value></param><param><value><string>http://barcelona.afterstart.com/2013/09/27/afterstart-barcelona-2013-09-26/</string></value></param></params></methodCall>
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML ALL_HEADERS: a:5:{s:4:"Host";s:24:"barcelona.afterstart.com";s:12:"Content-type";s:8:"text/xml";s:14:"Content-length";s:3:"287";s:10:"User-agent";s:50:"Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)";s:10:"Connection";s:5:"close";}

So nothing in $_POST, nothing in $_GET, nothing in $_REQUEST, nothing unusual in $_SERVER, nothing in $_ENV, no files submitted, but a text/xml body was posted (logged through $HTTP_RAW_POST_DATA):

<?xmlversion="1.0"?><methodCall><methodName>pingback.ping</methodName><params><param><value><string>http://seretil.me/</string></value></param><param><value><string>http://barcelona.afterstart.com/2013/09/27/afterstart-barcelona-2013-09-26/</string></value></param></params></methodCall>

Here it is in a more readable format:

blog-carlesmateo-com-xml-xmlrpc-request

So basically they were trying to register a link to seretil dot me.

I tried it, and this page, hosted on Cloudflare, is not working.

accessing-seretil-withoud-id

The problem is that responding to this spam xmlrpc request took the Server around 16 seconds. And I was receiving several per second.

I granted access on port 80 in the Firewall to my IP only, restarted Apache, restarted MySql and submitted the same malicious request to the Server, and it still took 16 seconds in all my tests:

cat http_post.txt | nc barcelona.afterstart.com 80

blog-carlesmateo-com-response-from-the-server-to-xmlrpc-attack

I checked and confirmed that the log entries from the attacker showed the same Content-Length and http code.

Others tried xml requests as well, but only once or twice, and then left.

The problem was that this robot had been sending, and was still sending, many requests per second for days.
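The file http_post.txt piped into nc has to contain the complete raw request: request line, headers, an empty line and the XML body, with a Content-Length that matches the body exactly. This is a minimal sketch of how such a file could be built. The function name build_xmlrpc_post is hypothetical and the header set is an assumption modelled on the headers logged above; only use it against a Server you own:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a raw HTTP/1.0 POST request for /xmlrpc.php,
# with Content-Length computed from the body, ready to be piped into nc.
build_xmlrpc_post() {
    local s_host="$1"
    local s_body="$2"
    local i_length

    # Byte count of the body (wc -c may pad with spaces, so strip them)
    i_length=$(printf '%s' "$s_body" | wc -c | tr -d ' ')

    printf 'POST /xmlrpc.php HTTP/1.0\r\n'
    printf 'Host: %s\r\n' "$s_host"
    printf 'Content-Type: text/xml\r\n'
    printf 'Content-Length: %s\r\n' "$i_length"
    printf 'Connection: close\r\n'
    printf '\r\n'
    printf '%s' "$s_body"
}

# Usage (not run here; body.xml holds the XML you want to replay):
#   build_xmlrpc_post barcelona.afterstart.com "$(cat body.xml)" > http_post.txt
#   time nc barcelona.afterstart.com 80 < http_post.txt
```

Prefixing the nc call with time gives the response time for each test, which makes it easy to compare before and after a change.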

Maybe the idea was to knock down my Server, but I doubted it, as the address selected is the blog of a Social Event for Senior Internet Talents that I organize: afterstart.com. It is of no special interest; I do not see a political, hateful or other motivation to attack the blog of this project.

Ok, at this point it was clear that the Ip address was a robot, probably running from an infected or hacked Server, and was trying to publish a Spam link to a site (that was down). I had to clarify those strange queries in the logs.

I reviewed the WPUsersOnline plugin and saw that the strange (and inefficient) queries I had seen belonged to it.

blog-carlesmateo-com-grep-r-delete-from-wp-useronline-2014-08-30-21-11-21-cut

The thing was that when I renamed the xmlrpc.php the spamrobot was still posting to that file. According to WordPress .htaccess file any file that is not found on the filesystem is redirected to index.php.

So what was happening is that all the massive requests sent to xmlrpc.php were being handled by index.php, which showed a page not found error, but the WPUsersOnline plugin was still running its DELETE queries for those connections. And it was doing it many times, also overloading the Database.

Also I was able to reproduce the behaviour by myself, isolating by firewalling the WebServer from other Ips other than mine and doing the same post by myself many times per second.

I checked against a friend’s blog, but on his Server xmlrpc.php responds in 1.5 seconds. My friend’s Server is a Digital Ocean Virtual Server with 2 cores and SSD Disks. My magnetic disks on Amazon only deliver around 40 MB/second. I have to check in detail why my friend’s Server responds so much faster.

I checked the integrity of my databases, just in case, and they were perfect. Nothing strange with the collations, and the only errors in /var/log/mysql/error.log were due to MySql crashing when the Server ran out of memory.

Rechecked in my Server, now it takes 12 seconds.

I disabled 80% of the plugins but the times were the same. The Statistics show how things changed (see the spikes on the left, before I definitively patched the Server to block requests from that spam robot’s IP).

I checked against another WordPress that I have on the same Server and it only takes 1.5 seconds to reply. So I decided to continue investigating why this WordPress took so long to reply.

blog-carlesmateo-com-statistics-use-last-24-hours

As I said before, I checked that the files from my WordPress installation were the same as the original distribution, and they were. Having ruled out modified files, the cause had to be in the database.

Even though the MySql check told me that all the tables were OK, having seen that WPUsersOnline deletes all the records older than 5 minutes, I guessed that this could lead to fragmentation. So I decided to run OPTIMIZE TABLE on all the tables of the database of the failing WordPress; with InnoDb that basically recreates the Tables and the Indexes.
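Running OPTIMIZE TABLE on every table by hand is tedious, so the statements can be generated from the list of tables. This is a minimal sketch; the function name optimize_statements and the database name wordpress are hypothetical placeholders. The stock mysqlcheck tool, called with its --optimize option, is an alternative way to do the same:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read table names from stdin (one per line) and
# print one OPTIMIZE TABLE statement per table. Empty lines are skipped.
optimize_statements() {
    local s_table
    while IFS= read -r s_table; do
        [ -n "$s_table" ] || continue
        printf 'OPTIMIZE TABLE `%s`;\n' "$s_table"
    done
}

# Usage (not run here; "wordpress" is a placeholder database name):
#   mysql -N -e 'SHOW TABLES' wordpress | optimize_statements | mysql wordpress
```

Printing the statements first, instead of executing them directly, leaves room to review the list or to drop tables that should not be rebuilt during busy hours.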

I tried then the call via RPC and my Server replied in three seconds. Much better.

Looking with htop, when I call the xmlrpc.php the CPU uses between 50% and 100%.

I checked the logs and the robot was gone. Either it left or the provider finally blocked the Server. I don’t know.

Everything became clear: it was nothing more than a series of coincidences. After deactivating the plugin the DELETE queries disappeared, even under heavy load on the Server.

It only remained to clarify why a call to xmlrpc on this blog replies in 1.5 seconds, while the same request to barcelona.afterstart.com takes 3 seconds.

I activated the log of queries in mysql. To do that edit /etc/mysql/my.cnf and uncomment:

general_log_file        = /var/log/mysql/mysql.log
general_log             = 1
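With the general log enabled, a quick way to compare two requests is to count how many queries each connection ran. This is a minimal sketch, assuming log lines shaped like the excerpts in this article (a connection id followed by the word Query, optionally preceded by a timestamp); the function name count_queries_per_connection is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Query" entries per connection id in a MySQL
# general log. Continuation lines of multi-line queries are ignored because
# they do not carry the "<id> Query" prefix.
count_queries_per_connection() {
    local s_log_file="$1"

    awk '
        {
            # The id is the field right before "Query"; allow for up to two
            # leading timestamp fields
            for (i = 2; i <= NF && i <= 4; i++) {
                if ($i == "Query") { count[$(i - 1)]++; break }
            }
        }
        END { for (id in count) print id, count[id] }
    ' "$s_log_file" | sort -n
}

# Usage (not run here):
#   count_queries_per_connection /var/log/mysql/mysql.log
```

Each output line is a connection id followed by its number of queries, so a request that stops early stands out from one that runs the full set.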

Then I checked the queries. In the case of my blog it performs far fewer queries, because I was requesting a pingback to a url that did not exist, and WordPress runs this query:

SELECT   wp_posts.* FROM wp_posts  WHERE 1=1  AND ( ( YEAR( post_date ) = 2013 AND MONTH( post_date ) = 9 AND DAYOFMONTH( post_date ) = 27 ) ) AND wp_posts.post_name = 'afterstart-barcelona-2013-09-26-meet' AND wp_posts.post_type = 'post'  ORDER BY wp_posts.post_date DESC

As the url afterstart-barcelona-2013-09-26-meet with the dates indicated does not exist in my other blog, the execution ends there and does not perform the rest of the queries, that in the case of Afterstart blog were:

40 Query     SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (81) ORDER BY meta_id ASC
40 Query     SELECT ID, post_name, post_parent, post_type
FROM wp_posts
WHERE post_name IN ('http%3a','','seretil-me')
AND post_type IN ('page','attachment')
40 Query     SELECT   wp_posts.* FROM wp_posts  WHERE 1=1  AND (wp_posts.ID = '0') AND wp_posts.post_type = 'page'  ORDER BY wp_posts.post_date DESC
40 Query     SELECT * FROM wp_comments WHERE comment_post_ID = 81 AND comment_author_url = 'http://seretil.me/'

To confirm my theory I tried the request against my blog with a valid url, and it took 3-4 seconds, the same as Afterstart’s blog. Finally I double-checked with my friend’s blog and it was slower than before: I got between 1.5 and 6 seconds, with many 2-second responses (he has PHP 5.5 and OpCache, which help a bit, but the problem is in the queries to the database).

Honestly, the guys creating WordPress should cache these queries instead of performing 20 live queries, which are always the same, before returning the error message. Using Cache Lite or Stash, or creating an in-memory table to use as a cache, or of course allowing the use of Memcached, would eradicate the DoS component of this kind of attack, as the xmlrpc pingback feature hits the database with a lot of queries only to end up not allowing the publication.

While I was finishing those tests (remember that the attacker’s IP was gone) another attacker from the same network tried, but I had patched the Server to ignore it:

94.102.52.157 - - [31/Aug/2014:02:06:16 +0000] "POST /xmlrpc.php HTTP/1.0" 200 189 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

This one was trying to get a link published to a domain called socksland dot net, which is registered in Russia and whose page is not working.

As I had all the information I wanted, I finally blocked the provider’s network from accessing my Server ever again.

Unfortunately, Amazon’s Firewall does not allow blocking a specific IP or range.
So you can block at the iptables level, in the .htaccess file, or in the code.
I do not recommend blocking at code level, because sadly WordPress has many files accessible from outside, so you would have to add your code at the beginning of all of them, and because when there is a WordPress version update you’ll lose all your customizations.
But I do recommend patching your code to reject certain IPs if you use a CDN. The POST will reach your Server from the CDN, so the source IPs are the CDN’s and you can’t block them. You have to look at the X-Forwarded-For header, which lists the Client’s IP followed by the IPs of the proxies the request has passed through.
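The lookup logic behind such a patch is small. The sketch below shows it in shell for illustration only (the real check would live in the PHP code): take the left-most address of the X-Forwarded-For value and compare it with a blacklist file holding one IP per line. Both function names and the blacklist file are hypothetical, and keep in mind that a client can forge this header, so it should only be trusted when the request really comes from your CDN:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first (left-most) address of an
# X-Forwarded-For value, which by convention is the original client.
client_ip_from_xff() {
    printf '%s\n' "$1" | awk -F',' '{ gsub(/[[:space:]]/, "", $1); print $1 }'
}

# Hypothetical helper: succeed if the IP appears as a whole line in the
# blacklist file (fixed-string, exact-line match).
is_blacklisted() {
    local s_ip="$1"
    local s_blacklist_file="$2"

    grep -qxF "$s_ip" "$s_blacklist_file"
}

# Usage (not run here; blacklist.txt is a placeholder file):
#   s_client=$(client_ip_from_xff "94.102.49.179, 10.0.0.1")
#   is_blacklisted "$s_client" blacklist.txt && echo "blocked"
```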

I designed a program that is able to patch any PHP project to check for blacklisted IPs (even through a proxy) with minimal performance impact. It works with WordPress, Drupal, Joomla, eZ Publish and frameworks like Zend, Symfony, Catalonia… and I patched my code to block those unwanted robot requests.

A solution that will work for you probably is to disable the pingback functionality, there are several plugins that do that. Disabling completely xmlrpc is not recommended as WordPress uses it for several things (JetPack, mobile, validation…)

The same effect as adding the plugin that disables the xmlrpc pingback can be achieved by editing the functions.php from your Theme and adding:

add_filter( 'xmlrpc_methods', 'remove_xmlrpc_pingback_ping' );
function remove_xmlrpc_pingback_ping( $methods ) {
    unset( $methods['pingback.ping'] );
    
    return $methods;
}

Update: 2016-02-24 14:40 CEST
I also got a heavy dictionary attack against wp-login.php. Despite having a Captcha plugin, which makes it hard to hack, it was generating some load on the system.
What I did was to rename wp-login.php to another name, like wp-login-carles.php, and leave in wp-login.php a simple exit();

<?php
exit();

Take into account that this will work only until WordPress is updated to the next version. Then you have to reapply the renaming trick.
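Since the trick has to be reapplied after every update, it can be wrapped in a small helper. This is a minimal sketch: the function name hide_wp_login is hypothetical and the WordPress path in the usage line is a placeholder. It refuses to run when wp-login.php is already the stub, so that running it twice does not overwrite the real login file with the stub:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move wp-login.php to a private name and leave a stub
# that just exits in its place.
hide_wp_login() {
    local s_wp_dir="$1"
    local s_new_name="$2"   # e.g. wp-login-carles.php
    local s_stub='<?php
exit();'

    if [ ! -f "$s_wp_dir/wp-login.php" ]; then
        echo "wp-login.php not found in $s_wp_dir" >&2
        return 1
    fi

    # Already applied: keep the renamed file as it is
    if [ "$(cat "$s_wp_dir/wp-login.php")" = "$s_stub" ]; then
        echo "wp-login.php is already the stub, nothing to do"
        return 0
    fi

    # After an update wp-login.php is the fresh version, so it replaces
    # the previously renamed copy
    mv -f "$s_wp_dir/wp-login.php" "$s_wp_dir/$s_new_name"
    printf '%s\n' "$s_stub" > "$s_wp_dir/wp-login.php"
}

# Usage (not run here; the path is a placeholder):
#   hide_wp_login /var/www/blog wp-login-carles.php
```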