
News from the blog 2021-12-07

Charity

I’ve donated to Equitas Health.

Equitas Health helps thousands of HIV-positive people in Ohio, including Dayton and Columbus.

Thousands more are reached with our prevention, testing, and other services. We are excited about embracing our expanded mission as a strategic step to further that legacy and its reach by providing care for all – with a focus on a safe and open space and highest quality healthcare for the LGBTQ community and others who are medically underserved.

https://equitashealth.com/get-involved/give/donate-now/

I made my donation following a post by Terra Field, a former colleague at Blizzard who later led Netflix’s Trans* ERG. I hadn’t seen that she had organized a GoFundMe campaign, so I donated again :)

If you want to help them:

https://www.gofundme.com/f/transphobia-is-not-a-joke?utm_source=customer&utm_medium=copy_link_all&utm_campaign=m_pd+share-sheet

https://equitashealth.com/get-involved/give/donate-now/

Articles

I wrote an article about provisioning Amazon AWS EC2 instances and running Ansible playbooks (recipes) on them, using a dynamic inventory to store the public IPs or public DNS names.

I wrote it because I saw a lack of clarity in the existing articles about this topic.

I also provided two alternative approaches: one in pure Python 3 and the other Bash based (grep, awk, tr).
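
For flavor, here is a minimal sketch (not the exact code from the article) of the pure Python 3 approach: it uses boto3 to list the public IPs and public DNS names of the running instances, so they can be written to an inventory. The region name and the filter are assumptions; adapt them to your account.

#!/usr/bin/env python3
# Minimal sketch, not the article's exact code: list the public IPs and
# public DNS names of the running EC2 Instances with boto3, so they can be
# fed to an Ansible inventory. The region is an assumption.
import boto3


def get_public_addresses(s_region="eu-west-1"):
    o_ec2 = boto3.client("ec2", region_name=s_region)
    o_response = o_ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    a_addresses = []
    for o_reservation in o_response["Reservations"]:
        for o_instance in o_reservation["Instances"]:
            s_ip = o_instance.get("PublicIpAddress", "")
            s_dns = o_instance.get("PublicDnsName", "")
            if s_ip:
                a_addresses.append((s_ip, s_dns))
    return a_addresses


if __name__ == "__main__":
    for s_ip, s_dns in get_public_addresses():
        print(s_ip, s_dns)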

Books

The books I publish on LeanPub have two prices: the suggested price, which is the price I consider the right price for the book, and the minimum price, which is the minimum price I’ve authorized a reader to pay to have it.

You can buy it at the minimum price. You know your own finances better than anyone.

So when a reader buys one of my books at the suggested price, instead of the minimum price, it really shows how much they appreciate my work.

So thanks for all the support and appreciation you show! :)

One of the reasons I chose the Leanpub platform is that I think it is fair. No DRM, no BS. And the reader can ask for a refund within 45 days if they don’t like the book. It also makes me very happy to see that I don’t have any refunds. I take it as a token of the usefulness of my work. Thanks. :)

Updates to the Docker Combat Guide book (v.16, 2021-11-24)

I added a nice trick for reverse engineering the original Dockerfile from a running Image.

I also added another typical copy and paste error to the Troubleshooting section.

https://leanpub.com/docker-combat-guide

Automating and Provisioning Amazon AWS (EC2, EBS, S3, CloudWatch) with boto3 (Amazon’s SDK for Python 3) and Python 3 book

I’m writing a book about how to automate your Amazon AWS tasks using boto3, Amazon’s AWS SDK for Python 3: provisioning new instances, stopping and starting them, creating volumes, creating/deleting buckets in S3, uploading/downloading files from S3…

It is currently 20% complete. With 43 pages, it already covers the EC2 section.
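
As a hedged taste of the kind of operations the book covers (these are standard boto3 calls; the instance Id and the bucket name are just placeholders):

import boto3

# Standard boto3 calls; the instance Id and the bucket name are placeholders.
o_ec2 = boto3.client("ec2", region_name="us-east-1")
o_s3 = boto3.client("s3")

# Stop an instance and start it again, waiting for the "stopped" state in between
o_ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])
o_ec2.get_waiter("instance_stopped").wait(InstanceIds=["i-0123456789abcdef0"])
o_ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])

# Create a bucket in S3 and upload / download a file
o_s3.create_bucket(Bucket="my-example-bucket")
o_s3.upload_file("/tmp/report.txt", "my-example-bucket", "report.txt")
o_s3.download_file("my-example-bucket", "report.txt", "/tmp/report-copy.txt")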

https://leanpub.com/amazon-aws-boto3

Open Source

I’ve been working on carleslibs v.1.0.3. I added the MenuUtils class, which allows you to assemble menus super quickly that execute the code referenced in the menu array. It is ideal for building CLI applications very fast.

I also added the KeyboardUtils class, which allows you to ask the user for Strings within certain lengths, allowing or disallowing spaces and/or underscores, and to ask for Integer values within a certain min and max, with 0 meaning go back.
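
To give a rough idea of the pattern (this is a simplified sketch of the concept only, not the actual carleslibs API), a menu defined as an array of options where each entry references the function to execute could look like this:

# Simplified sketch of the concept only, not the carleslibs API:
# a menu driven by an array where each entry references the function to run.
def action_list_servers():
    print("Listing servers...")


def action_create_server():
    print("Creating a server...")


a_menu = [("List servers", action_list_servers),
          ("Create server", action_create_server)]

while True:
    for i_option, (s_title, _) in enumerate(a_menu, start=1):
        print(f"{i_option}. {s_title}")
    print("0. Exit")
    s_answer = input("Select an option: ")
    if s_answer == "0":
        break
    if s_answer.isdigit() and 1 <= int(s_answer) <= len(a_menu):
        # Execute the code referenced in the menu array
        a_menu[int(s_answer) - 1][1]()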

The plan is to release the new version of carleslibs as soon as I’ve tested it properly.

Social part

For those who follow my recommendations, as always, I have updated the list of new movies I watched and the list of new videogames I played.

News from the Blog 2021-11-11

New Articles

How to communicate with your Python program running inside a Docker Container, using Linux Signals

Hope you’ll have fun reading this article:

Communicating with Docker Containers via Linux Signals and Python
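
The core of the idea, as a minimal sketch (the chosen signals and messages are just illustrative), is a Python program that traps Linux Signals, so you can talk to it from outside the container, for example with docker kill --signal=SIGUSR1 container_name:

# Minimal sketch: a Python program that traps Linux Signals, so it can be
# controlled from outside the Docker Container, e.g. with:
#   docker kill --signal=SIGUSR1 container_name
import signal
import time


def signal_handler(i_signal, o_frame):
    if i_signal == signal.SIGUSR1:
        print("Received SIGUSR1: reloading configuration...")
    elif i_signal == signal.SIGTERM:
        print("Received SIGTERM: shutting down gracefully...")
        raise SystemExit(0)


signal.signal(signal.SIGUSR1, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)

while True:
    time.sleep(1)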

I migrated my last services, and the blog, from Amazon to Google Compute Engine (GCE / GCP)

I wrote a Postmortem analysis about the process of migrating my last services away from my 11-year-old Amazon account.

Updates

Updates to articles

I updated the article about weird Python things you may not know, adding the Ellipsis (…).
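
For context, the Ellipsis is an actual singleton object in Python 3, often used as a placeholder body; a quick example:

# The Ellipsis (...) is a real singleton object, often used as a placeholder.
def not_implemented_yet():
    ...


print(... is Ellipsis)  # True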

I’ve been working on some Cassandra examples. I may publish an article soon about using it from Python and Docker.

Updates to My Books

I updated my Python and Docker books.

I’m currently writing a book about using Amazon AWS Python SDK (boto3).

Updates to Open Source projects

I have updated ctop, fixed two bugs and increased Code Coverage.

I made a new tag and released the latest Stable Version:

https://gitlab.com/carles.mateo/ctop/-/tags/0.8.7

On top of my local Unit Testing, I have Jenkins checking that I don’t commit anything that breaks the Tests.

Some time ago I wrote some articles about how you can setup jenkins in a Docker Container.

Miscellaneous

Charity

I’ve donated to Wikipedia.

Only 2% of the viewers donate, so I answered the call every time it was made.

This is my 5th donation to Wikimedia.

I consider that Freedom is very important.

I bought these new books

One of my secrets to staying on top is that I’m always studying.

I study all the time, at work and in my free time.

I use Linux Academy and I buy books on paper. I don’t connect with reading on tablets; I think information is retained better when read on paper. I also use a marker and tabs to keep direct access to the most interesting points in the books.

And I study all kinds of topics. Obviously I know a lot about Web Scraping, but there is always room to learn more. And whatever new things I learn help me to be better with my students and clearer when writing my books.

I’ve never been a Front End developer, but I’ve been able to fix bugs in the Front End engines of the companies I worked for, like Privalia. I was passed a bug that prevented Internet Explorer users from buying, just one hour before we launched a massive campaign. I debugged it and found a variable named “value”, so the html looked like <input name="value" value="">. In less than 30 minutes I proved to the incredulous Head of Development and the CTO that a bug in Internet Explorer was causing a conflict when fetching the value from the input named value. We deployed the update to Production and the campaign was a total success. So I consider knowing JavaScript and the Front End a need too, even if I don’t work directly with it. I want to be able to understand all the requirements, possibilities and weaknesses, so I can fix bugs and save the day. That also allowed me to fix scalability problems in Node.js and PhantomJS projects. (They are Server-Side JavaScript, event-driven projects.)

It seems that Amazon.co.uk works well again for Ireland. My last two orders arrived on time and apparently I had no problems with border taxes.

Nice Python article

I enjoyed this article a lot, as it explains part of what I did with my student and friend Albert, in a project that analyzes the Apache access logs for patterns of attempted exploits, feeds a database, and then blocks the offending IP addresses in the Firewall.

The article only covers the Pandas part, reading the access.log file and working with it, but it is a very well written article:

https://mmas.github.io/read-apache-access-log-pandas
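
As a minimal sketch of the approach (assuming Apache’s combined log format; the separator regex splits on spaces that are neither inside quotes nor inside the [date] brackets, and the column names are my own assumption):

import pandas as pd

# Minimal sketch, assuming Apache's "combined" log format.
df_log = pd.read_csv(
    "access.log",
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine="python",
    header=None,
    names=["ip", "identity", "user", "time", "request",
           "status", "size", "referer", "user_agent"],
    na_values="-",
)

# For example, count the requests per IP to spot abusive addresses
print(df_log["ip"].value_counts().head(10))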

Nice Virtual Volumes article from VMware

I prefer Open Source, but there are very good commercial products too.

I liked this article about Virtual Volumes from VMWare:

Understanding Virtual Volumes (vVols) in VMware vSphere 6.7/7.0 (2113013)

https://kb.vmware.com/s/article/2113013

Thanks Blizzard (again)

There is a very nice initiative where we can nominate 4 colleagues a year who we think deserve recognition.

My colleagues voted for me, so I received a gift voucher that I can spend in Irish stores like Ikea, PC World, Argos, Adidas, App Store & iTunes…

So thanks a million buds. :)

Migrating my 11-year-old Amazon AWS account services (Postmortem Analysis)

I had started to explain that I was migrating some services from Amazon, that some of my sites were under Maintenance, and that I would provide more information.

Here is the complete story of why I migrated all the services from my 11-year-old Amazon account to other CSPs.

Some lessons can be learned from my adventure.

I migrated my last services from Amazon to GCP

Amazon sent me an email on October 6th of this year, 2021, telling me that they will disable EC2-Classic by August 2022. I thought I would not be able to keep my Static IPs, as in the past VPC IPs and EC2-Classic IPs were not transferable, so considering that I would lose my Static IPs anyway, I started to migrate some services to other providers like Digital Ocean.

It is not cool losing Static IP (Elastic IP in AWS) addresses, as this is bad for SEO, so given that I thought I would lose the Static IPs that had been with me for years, I started to migrate certain services to much more economical providers.

Amazon is terrible at communicating, and I talked with some product managers about that in the past, when they lost one of my Volumes: the email was so cold and terrible that it actually hurt more than Amazon losing my Data. I believed it was a poorly made scam, and when I realized it was true I reached out to one of my friends who is a manager there, as I know they care about doing things right, and he organized a meeting with two PMs so I could pass on my feedback.

The Cloud providers change things very fast, and nobody is able to stay up to date with the changes unless their job allows plenty of time for it. Even if pages of documentation are provided, you have to react to an event that they generated externally, forcing you to action: action to read all the documentation about EC2-Classic migrations, action to prepare to have migrated by August 2022.

So August 2022… I was counting on having plenty of time, but I’m writing a new book about using the Amazon SDK for Python, boto3, and I was doing some API calls and they started to fail in a very unusual way: Exceptions with timeouts, but only for the one region where I had EC2-Classic.

urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f0347d545e0>: Failed to establish a new connection: [Errno -2] Name or service not known

My config was:

        o_config = Config(
            region_name="us-east-1a",
            signature_version="v4",
            retries={
                'max_attempts': 10,
                'mode': 'standard'
            }
        )

But if I switched to another region name, it would work:

            region_name='us-west-2',

I had made a mistake here: the region name is “us-east-1”, not “us-east-1a”. “us-east-1a” is the Availability Zone. The SDK was giving a timeout because, in order to connect to the endpoint, it uses the region name as part of the hostname, so it doesn’t find that endpoint, because it doesn’t exist.

I never understood why a company like Amazon is unable to provide the SDK with a sample project, or projects, that are 100% working, with the source code, so people have a working base to build upon.

Every API that I have created, I have provided with documentation and also with examples in several languages showing how to use it.

In 2013 I was CTO of an online travel agency, and we had meta-searchers consuming our API and we were serving several hundred thousand requests per second. Everything was perfectly documented, examples were provided for several languages, the documents and the SDK had version numbers…

Everybody forgets about Developers, and companies throw terrible, cold products at the poor Developers, so difficult to use. How many Developers would like to say: Listen, Mr. President of the big Cloud Company XXXX, I only want to spawn a VM that works, and fast, with easy wizards. I don’t want to spend 50 hours learning before being able to use your overpriced platform, doing 20 things before your IPs reflect an infrastructure based on Microservices. Modern JavaScript frameworks can create nice, gentle wizards even if you have super-cold APIs.

Honestly, I didn’t realize my typo in the region, so I connected to the Amazon Console to investigate, and I saw this.

Honestly, when I read it I understood that they were going to end my EC2 Networking on the 30th of October. It was the 29th. I misunderstood.

It was my fault for not reading it well to the end; I was shocked by the first part about the shutdown and I didn’t fully understand that they were only going to shut down EC2-Classic for the zones where I didn’t have anything running.

From the long errors (3 exceptions chained) I didn’t realize that the endpoint is built with the region name. (And I was passing the availability zone)

botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.us-east-1a.amazonaws.com/"

This is where I say that a good SDM would have thought about and cared for the Developers more, and would have made the SDK check whether the region exists. How difficult is it to create an SDK that is a bit more clever and detects an invalid region id? It is not difficult.
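
In fact, a small check like this is possible with boto3 itself; a minimal sketch (the region value below reproduces my typo on purpose):

import boto3

# Minimal sketch: validate the region id before creating the client,
# using boto3's own list of regions for the EC2 service.
s_region = "us-east-1a"  # wrong: this is an Availability Zone, not a region

a_valid_regions = boto3.session.Session().get_available_regions("ec2")
if s_region not in a_valid_regions:
    raise ValueError(f"'{s_region}' is not a valid region id. "
                     f"Valid regions are: {a_valid_regions}")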

It is true that it was late in the evening and I was tired from the whole day; two days a week, between work and Zoom university classes, I work 15 hours and 13 hours respectively, not counting the assignments, so by the end of the week I am very tired. But that’s why it is very important to follow methodology and to read well. I think Amazon has 50% of the fault in the way they do things: how they created the SDK, how they communicate, and the errors the Console returned when I tried to create a VPC instance from an EC2-Classic AMI (they seem related to the fact that I had old VPC Network objects with a shorter hash than the ones they currently use); the other 50% was my fault, for not identifying the source of the error and not reading the message on their website well.

But the fact that I was having those errors and timeouts in the APIs made me believe they were going to cut the EC2-Classic Networking the next day.

All the mistakes fell together in a perfect storm.

I checked the documentation and saw that it was possible to migrate my Static IPs to VPC Static IPs.
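
For reference, this migration can also be requested through the API; a hedged boto3 sketch (the Elastic IP is a placeholder, and the call only applies to addresses allocated in EC2-Classic):

import boto3

# Hedged sketch: move an EC2-Classic Elastic IP to the VPC platform.
# The public IP below is a placeholder.
o_ec2 = boto3.client("ec2", region_name="us-east-1")
o_response = o_ec2.move_address_to_vpc(PublicIp="203.0.113.10")
print(o_response)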

It was Friday evening, and I cancelled my plans, in order to migrate the Blog to VPC in an attempt to keep running it with Amazon.

As a Cloud Architect, I like to have running instances in several CSPs, as it allows me to stay up to date with the changes they make.

I checked the documentation for the migration. Disassociating the Static IP (Elastic IP in AWS jargon) was easy. Turning it into VPC as well.

As I progressed, what should have been easy turned into a nightmare, as I was getting many errors from the Amazon API, without any information, and my Instances were not created.

I figured out that their API could have problems with old VPC objects I had created long ago, so I had to create new objects for several things.

I managed to spawn my instances, but they were being launched and terminated instantly, without any information. Frustrating.

When launching a new instance from the AMI (a Snapshot of the blog), I was shown options to add more volumes that made no sense. My Instance was using 16GB of a 20GB total space, and I was shown different volume configs depending on the instance: in some cases an additional 20GB volume, in others a small, ephemeral SSD and 10GB for the AMI (which requires at least 16GB).

After some fighting I managed to make it work by deleting the volumes that made no sense and keeping only one of 20GB, the same size as my AMI.

But then my nightmare started: making the VPC Instance have Internet access and be reachable from outside. I had to create a new Internet Gateway, NAT, Network, etc…

As mentioned, the old objects I was trying to reuse were making the process fail.

I was running out of time, and I thought that they were going to shut down the EC2-Classic network very soon (as I had not read it correctly), so I decided to download everything and migrate to another provider. To do that, I first blocked all the traffic, except from my IP.

I worked in parallel, creating the new config in Google Cloud, just in case I had forgotten something. I had created a document for the migration and it was accurate.

I managed to do everything fast enough. The slowest part was downloading all the Data, as I keep entire VMs for projects like the Cassandra Universal Driver.

Then I powered off my Amazon Instance for the Blog forever.

In GCP I blocked all the traffic in the firewall, except for my Ip, so I could work calmly.

When everything was ready, I had to redirect the DNS to the new static Ip from Google.

The DNS provider I used had implemented some changes in their API, so I was getting errors replacing my old entry ‘.’ (their JSON calls returned Internal Server Error). Finally I figured out how to work around it and I was able to confirm that the first service was up and running.

I did some tests to make sure there were not unexpected permission problems, entries in the logs, etc…

Only then did I open the Google Firewall. I have a second firewall in each instance, where I block or open what I want at the iptables level: basically abusive bots’ IPs trying to find exploits or to brute force passwords by dictionary.

I checked with my phone, without Wifi, that the Firewall was all good. (It is always a good idea to check from another external IP, different from the management one.)

I added a post explaining that I was migrating some of my Services and that they were under maintenance.

I mentioned in the blog that some of my services were being migrated from Amazon to Digital Ocean.

For some reason, one user was lost in the Backup of the Database, so I recreated it in MySQL with the typical commands:

CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON mydatabase.* TO 'username'@'localhost';
FLUSH PRIVILEGES;

News from the blog 2021-09-20

  • I’ve published a very simple game, Tic Tac Toe, that I created for my Python 3 Exercises for Beginners book.
  • I’ve raised the price of my books back to normal levels.
    I had been keeping the price at the minimum to help people who wanted to learn during covid-19. I consider that those who wanted to learn have already done so.

I still have bundles with a somewhat reduced price, and I have authorized the LeanPub platform to apply discounts of up to 50% at their discretion.

Bundle of four books in https://leanpub.com/b/python3-exercises-zfs-assemble-computer


  • I’ve been deleting AMIs, Snapshots, Volumes and backups from Amazon instances I’ll no longer use.

I’ve migrated some sites and WordPress sites to Docker, and now I’m CSP (Cloud Service Provider) agnostic. I can deploy wherever I want.

We pay per GB of storage used, so my money will be put to better use.

As I said in my old article from 2013, The Cloud is for Scaling. For Startups and for Enterprises. It is too expensive for small and medium companies.

  • For those studying Python there is a Virtual Meetup about Data Analysis, in Spanish, on the 23rd of September

https://www.meetup.com/tech-barcelona/events/280791310/

More meetups:

https://www.meetup.com/tech-barcelona/

Upgrading Amazon AWS EC2 Ubuntu 18.04 LTS to Ubuntu 20.04 LTS

I’ve upgraded one of my AWS machines from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS.

The process was really straightforward, basically run:

sudo apt update
sudo apt upgrade

Then reboot in order to load the latest kernel.

Then execute:

sudo do-release-upgrade

It will ask two or three questions at different moments.

Afterwards, reboot, and that’s it.

All my Firewall rules were kept, the services were restarted as they became available, or deferred to be executed when the dependent service was reinstalled (like PHP, which was upgraded before Apache), and I haven’t found anything out of place so far. The Kernels were special, with Amazon customizations too.

I always recommend, for Production, to run the Ubuntu LTS version.

How to recover access to your Amazon AWS EC2 instance if you lose your Private Key for SSH

This article covers the desperate situation where you had generated one or more instances, instructed Amazon to use an SSH Key Pair where only you have the Private Key, your instances are running (for example, an eCommerce site that has been running for months), and then you lose your Private Key (.pem file), and with it the SSH access to your instances’ Data.

I’ve actually seen this situation happen several times, in real companies, mainly Startups, and I solved it for them.

Assuming that you didn’t have a secondary access method, such as another username/password combination or other user/KeyPairs, and so you completely lost access to the Database, the Webservers, etc…, I’m going to show you how to recover the data.

For this article I will consider a scenario where there is only one Instance, which contains everything for your eCommerce: Webserver, code, and Database…, and it is a simple config, with a single persistent drive.

Warning: be very careful, as if you use ephemeral drives, their contents will be lost if you power off the instance.

Method 1: Quicker, launching a new instance from the previous

Step 1: The first step you will take is to close the access from outside, using the Firewall, to avoid any new changes going to the disk. You can allow access to the instance only from your static IP at the office/home.

Step 2: You’ll wait for 5 minutes to allow any ongoing transactions to conclude, and pending writes to be flushed to disk.

Step 3: From the Amazon AWS Console, EC2, you’ll request a Snapshot. This step is to get extra safety. Taking a Snapshot of a live, mounted filesystem is not the best of ideas, especially for a Database, but we are facing a desperate situation, so we’re increasing the odds of getting out of this situation without Data loss. This is just for extra safety, and if everything goes well, at the end you will not need this snapshot.

Make sure you select No reboot.

Step 4: Be very careful if you have extra drives and ephemeral drives.

Step 5: Wait till the Snapshot completes.

Step 6: Then request a graceful power off. Amazon will try to power off the Server in a gentle way. This may take two minutes.

Step 7: When the instance is powered off, request a new Snapshot. This is the one we really want. The other was just to be safer. If you feel confident, you can just untick No reboot on the previous Step and do only one Snapshot.

Step 8: Wait till the Snapshot completes.

Step 9: Generate and upload to the AWS Console the new key you will use, or ask Amazon to generate a key pair for you. You can do this while creating the new instance through the wizard.

Step 10: Launch a new instance, based on your snapshot AMI. This will generate a copy of your previous instance (using the Snapshot) for the new one. Select the new Key pair. Finish assigning the Security groups, the elastic ip…

Step 11: Start the new instance. You can select a different flavor, like a more powerful instance, if you prefer (scaling vertically).

Step 12: Test your access by logging in via SSH with the new key pair, from your static IP, which has access in the Firewall.

ssh -i /home/carles/Desktop/Data/keys/carles-ecommerce.pem ubuntu@54.208.225.14

Step 13: Check that the web starts correctly, and check the Database logs to see if there is any corruption. There should not be any if the graceful shutdown went well.

Step 14: Reopen the access from the Firewall, so the world can connect to your instance.
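
For those who prefer the API to the Console, a hedged boto3 sketch of the key calls of this method (the instance Id, the AMI name, the instance type and the Key pair name are placeholders):

import boto3

# Hedged sketch of Method 1 with boto3 instead of the web Console.
# The instance Id, the AMI name and the Key pair are placeholders.
o_ec2 = boto3.client("ec2", region_name="us-east-1")

# Steps 6-7: graceful power off, then an image (AMI) of the stopped instance
o_ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])
o_ec2.get_waiter("instance_stopped").wait(InstanceIds=["i-0123456789abcdef0"])
d_image = o_ec2.create_image(InstanceId="i-0123456789abcdef0",
                             Name="ecommerce-recovery",
                             NoReboot=True)

# Steps 10-11: launch a new instance from that AMI with the new Key pair
o_ec2.get_waiter("image_available").wait(ImageIds=[d_image["ImageId"]])
o_ec2.run_instances(ImageId=d_image["ImageId"],
                    InstanceType="t3.small",
                    KeyName="my-new-keypair",
                    MinCount=1,
                    MaxCount=1)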

Method 2: Slower, access the Data and rebuild whatever you need

The second method is exactly the same up to and including Step 6.

Step 7: After this, you will create a new instance based on your favorite OS, with a new pair of Keys.

Step 8: You’ll detach the Volume from the previous eCommerce instance (the one you lost access to).

Step 9: You’ll attach the Volume to the new instance.

Step 10: You’ll have access to the Data from the previous instance in the new volume. Type cat /proc/partitions or df -h to see the mountpoints available. You can then download or back up the data, or install the Software again and import the Database…

Step 11: Check that everything works, and enable worldwide access to the Web in the Firewall (Security Group Inbound Rules).
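
A hedged boto3 sketch of Steps 8 and 9, for those who prefer the API (all the Ids and the device name are placeholders):

import boto3

# Hedged sketch of Steps 8-9: detach the Volume from the old instance and
# attach it to the new one. All the Ids and the device name are placeholders.
o_ec2 = boto3.client("ec2", region_name="us-east-1")

o_ec2.detach_volume(VolumeId="vol-0123456789abcdef0",
                    InstanceId="i-0123456789abcdef0")
o_ec2.get_waiter("volume_available").wait(VolumeIds=["vol-0123456789abcdef0"])

o_ec2.attach_volume(VolumeId="vol-0123456789abcdef0",
                    InstanceId="i-0fedcba9876543210",
                    Device="/dev/sdf")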

If you are confident enough, you can use this method to upgrade the OS or the base Software of your instance, making it part of your maintenance window. For example, to get the latest version of Ubuntu or CentOS, MySQL, Python or PHP, etc…

Adding my Server as Docker, with PHP Catalonia Framework, explained

Update 2021-07-23: Ubuntu 19.04 is no longer available, so I updated the article to work with Ubuntu 20.04, PHP 7.4 and all their dependencies.

The previous day I explained how I migrated my old Server (Amazon Instance) to a more powerful model, with a more recent OS, WebServer, etc…

This was interesting from the point of view of dealing with Elastic IPs, Amazon AWS Volumes, etc…, but it was a basically manual process. I could have generated an immutable image to start from next time, but that is another discussion, especially because that Server Instance has different base Software, including a MySql Database.

This time I want to explain, step by step, how to containerize my Server, so I can port it to different platforms and be independent of what the Server Operating System is. It will always work, as we define the Operating System for the Docker Container.

So we start to use IaC (Infrastructure as Code).

So first you need to install docker.

So basically if your laptop is an Ubuntu 18.04 LTS or 20.04 LTS you have to:

sudo apt install docker.io

Start and Automate Docker

The Docker service needs to be set up to run at startup. To do so, type each command followed by enter:

sudo systemctl start docker
sudo systemctl enable docker

Create the Dockerfile

To do this you can use any text editor, but as we are working with IaC, why not use a Code Editor?

You can use the versatile PyCharm, which has modules for understanding Docker, and so you can use Version Control like git too.

This is the updated Dockerfile to work with Ubuntu 20.04 LTS

FROM ubuntu:20.04

MAINTAINER Carles <carles@carlesmateo.com>

ARG DEBIAN_FRONTEND=noninteractive

#RUN echo "nameserver 8.8.8.8" > /etc/resolv.conf

RUN echo "Europe/Ireland" | tee /etc/timezone

# Note: You should install everything in a single line concatenated with &&
#       and finish with: apt autoremove && apt clean
#       in order to use the least space possible, as every command creates a layer
RUN apt update && apt install -y apache2 ntpdate libapache2-mod-php7.4 mysql-server php7.4-mysql php-dev libmcrypt-dev php-pear git && apt autoremove && apt clean

RUN a2enmod rewrite

RUN mkdir -p /www

# In order to activate Debug
# RUN sed -i "s/display_errors = Off/display_errors = On/" /etc/php/7.2/apache2/php.ini 
# RUN sed -i "s/error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT/error_reporting = E_ALL/" /etc/php/7.2/apache2/php.ini 
# RUN sed -i "s/display_startup_errors = Off/display_startup_errors = On/" /etc/php/7.2/apache2/php.ini 
# To Debug remember to change:
# config/{production.php|preproduction.php|devel.php|docker.php} 
# in order to avoid Error Reporting being set to 0.

ENV PATH_CATALONIA /www/www.cataloniaframework.com/
ENV PATH_CATALONIA_WWW /www/www.cataloniaframework.com/www/
ENV PATH_CATALONIA_CACHE /www/www.cataloniaframework.com/cache/

ENV APACHE_RUN_USER  www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR   /var/log/apache2
ENV APACHE_PID_FILE  /var/run/apache2/apache2.pid
ENV APACHE_RUN_DIR   /var/run/apache2
ENV APACHE_LOCK_DIR  /var/lock/apache2
ENV APACHE_LOG_DIR   /var/log/apache2

RUN mkdir -p $APACHE_RUN_DIR
RUN mkdir -p $APACHE_LOCK_DIR
RUN mkdir -p $APACHE_LOG_DIR
RUN mkdir -p $PATH_CATALONIA
RUN mkdir -p $PATH_CATALONIA_WWW
RUN mkdir -p $PATH_CATALONIA_CACHE

# Remove the default Server
RUN sed -i '/<Directory \/var\/www\/>/,/<\/Directory>/{/<\/Directory>/ s/.*/# var-www commented/; t; d}' /etc/apache2/apache2.conf 

RUN rm /etc/apache2/sites-enabled/000-default.conf

COPY www.cataloniaframework.com.conf /etc/apache2/sites-available/

RUN chmod 777 $PATH_CATALONIA_CACHE
RUN chmod 777 $PATH_CATALONIA_CACHE.
RUN chown --recursive $APACHE_RUN_USER.$APACHE_RUN_GROUP $PATH_CATALONIA_CACHE

RUN ln -s /etc/apache2/sites-available/www.cataloniaframework.com.conf /etc/apache2/sites-enabled/

# Note: You should clone locally and COPY to the Docker Image
#       Also you should add the .git directory to your .dockerignore file
#       I made this way to show you and for simplicity, having everything
#       in a single file
##RUN git clone https://github.com/cataloniaframework/cataloniaframework_v1_sample_website /www/www.cataloniaframework.com
##RUN git checkout tags/v.1.16-web-1.0
# In order to change profile to Production
# RUN sed -i "s/define('ENVIRONMENT', DOCKER)/define('ENVIRONMENT', PRODUCTION)/" /var/www/www.cataloniaframework.com/config/general.php 
COPY *.php /www/www.cataloniaframework.com/www

# for debugging
#RUN apt-get install -y vim

RUN service apache2 restart

EXPOSE 80

CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]

The www.cataloniaframework.com.conf file

As you saw in the Dockerfile you have the line:

COPY www.cataloniaframework.com.conf /etc/apache2/sites-available/

This will copy the file www.cataloniaframework.com.conf, which must be in the same directory as the Dockerfile, to the /etc/apache2/sites-available/ folder in the container.

<VirtualHost *:80>
    ServerAdmin webmaster@cataloniaframework.com
    # Uncomment to use a DNS name in a multiple VirtualHost Environment
    #ServerName www.cataloniaframework.com
    #ServerAlias cataloniaframework.com
    DocumentRoot /www/www.cataloniaframework.com/www
    <Directory /www/www.cataloniaframework.com/www/>
            Options -Indexes +FollowSymLinks +MultiViews
            AllowOverride All
            Order allow,deny
            allow from all
            Require all granted
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/www-cataloniaframework-com-error.log
    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/www-cataloniaframework-com-access.log combined
</VirtualHost>

Stopping, starting the docker Service and creating the Catalonia image

service docker stop && service docker start

To build the Docker Image we will do:

docker build -t catalonia . --no-cache

I use --no-cache so git is pulled and everything is rebuilt, not kept from the cache.

Now we can run the Catalonia Docker, mapping the 80 port.

docker run -d -p 80:80 catalonia

If you want to check what’s going on inside the Docker, you’ll do:

docker ps

And so in this case, we will do:

docker exec -i -t distracted_wing /bin/bash

Finally I would like to check that the web page works, and I’ll use my preferred browser. In this case I will use lynx, the text browser, because I don’t want Firefox to save things in its cache.

Upgrading the Blog after 5 years, AWS Amazon Web Services, under DoS and Spam attacks

A few days ago I was under a heavy DoS attack.

Nothing new: zombie computers, hackers, pirates, networks of computers… trying to abuse the system and to hack into it. Why? There could be many reasons, from storing pirate movies to trying to use your Server for sending Spam, phishing, or hosting Ransomware pages…

Most of those guys don’t know that it is almost impossible to send Spam from Amazon. Only a few emails per hour can come out of the Server unless you explicitly request that change and configure everything.

But I thought it was a great opportunity to force myself to update the Operating System, core tools, versions of PHP and MySql.

Forensics / Postmortem of the incident

The task was divided into several parts:

  • Understanding the origin of the attack
  • Blocking the offending IP addresses or disabling XMLRPC
  • Making the VM boot again (problems with Amazon AWS)
    • I didn’t know why it was not booting.
  • Upgrading the OS

I disabled access to the site while I was working, using the Amazon Web Services Firewall. Basically, I restricted access to my IP only. Example: 8.8.8.8/32

I changed 0.0.0.0/0, the worldwide mask, to my_Ip/32

That way the logs were reflecting only what I was doing from my IP.

Dealing with Snapshots and Volumes in AWS

Well, the first thing was doing a Snapshot.

Afterwards, I tried to boot the original Blog Server (so I wouldn’t stop offering service), but no way: the Server appeared to be dead.

So then I attached the Volume to a new Server with the same base OS, in order to extract (dump) the database. Later I would attach the same Volume to a new Server with the most recent OS and base Software.

Something that is a bit annoying is that the new Instances, the new generation instances, run only in VPC, not in Amazon EC2 Classic. But my static IP addresses were created for Amazon EC2 Classic, so I could not use them with new generation instances.

I chose the option to see all the generations.

Upgrading the system base Software had its own challenges too.

Upgrading the OS / Base Software

My approach was to install Ubuntu 18.04 LTS, install the base Software clean, and add any modifications I might need.

I wanted to have all the supported packages, a recent version of PHP 7 and the latest Software pieces like Apache or MySQL.

sudo apt update

sudo apt install apache2

sudo apt install mysql-server

sudo apt install php libapache2-mod-php php-mysql

Apache2

Config files that worked before stopped working, as the new Apache version requires the files or symlinks under /etc/apache2/sites-enabled/ to end with the .conf extension.

Also some directives changed, so some websites will not be able to work properly.

Those projects using my Catalonia Framework were affected, although I have this very well documented to make it easy to work with both versions of Apache Http Server, so it was a very straightforward change.

From the previous version I had to change my www.cataloniaframework.com.conf file and enable:

    <Directory /www/www.cataloniaframework.com>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride All
        Order allow,deny
        allow from all
    </Directory>

Then open the ports for the Web Server (443 and 80).

sudo ufw allow in "Apache Full"

Then: service apache2 restart

Catalonia Framework Web Site, which is also created with Catalonia Framework itself once restored

MySQL

The problem was using the most up-to-date version of the Database. I could have used one of the backups I keep, from last week, but I wanted fresher data.

I had the .db files, and it should have been very straightforward to copy them to /var/lib/mysql/ … if they were the same version. But they weren’t. So I launched an instance with the same base Software as the old machine had, installed mysql-server, stopped it, copied the .db files, started it, and then made a dump with mysqldump --all-databases > 2019-04-29-all-databases.sql

Note: I copied the .db files using the mythical mc, which is a clone of Norton Commander.

Then I stopped that instance and I detached that volume and attached it to the new Blog Instance.

I did a Backup of my original /var/lib/mysql/ files for the purpose of faster restoring if something went wrong.

I mounted it under /mnt/blog_old and did mysql -u root -p < /mnt/blog_old/home/ubuntu/2019-04-29-all-databases.sql

That worked well; I had restored the blog. But as I was watching /var/log/mysql/error.log, I noticed some columns were not where they should be. That’s because I had inadvertently overwritten the mysql system database as well, which in MySQL 5.7 has a different structure than in MySQL 5.5. So I had screwed up. As I had foreseen this possibility, I restored from the backup in seconds.

So then I basically edited my .sql files and removed everything that was for the mysql database.

I started MySql and ran the mysql import procedure again. It worked, but I had to recreate the users for all the Databases and Grant them permissions.

GRANT ALL PRIVILEGES ON db_mysqlproxycache.* TO 'wp_dbuser_mysqlproxy'@'localhost' IDENTIFIED BY 'XWy$&{yS@qlC|<¡!?;:-ç';

PHP7

Some modules in my blogs were returning errors in /var/log/apache2/mysite-error.log, so I checked and found it was due to lack of support for the latest PHP versions, and so I patched the code manually or just disabled the offending plugin.

WordPress

As seen when checking /var/log/apache2/blog.carlesmateo.com-error.log, some URLs were not located by WordPress.

For example:

The requested URL /wordpress/wp-json/ was not found on this server

I had to activate mod_rewrite and then restart Apache.

a2enmod rewrite; service apache2 restart

Making the site more secure

Checking the Apache logs, /var/log/apache2/blog.carlesmateo.com-access.log, I looked for IPs accessing Admin areas, for 404 Errors pointing to attempts to exploit any unsafe WP Plugin, and for POST requests as well.

I added the offending IPs to the Ubuntu Uncomplicated Firewall (UFW) and patched the xmlrpc.php file to always exit.

Stopping and investigating a WordPress xmlrpc.php attack

One of my Servers got heavily attacked for several days. I describe here the steps I took to stop this.

The attack consisted of several connections per second to the Server, to the path /xmlrpc.php.

This is a WordPress file to control the pingback, when someone links to you.

My Server is a small Amazon instance, an m1.small with only one core, 1.6 GB of RAM and magnetic disks, which scores a modest 203 CMIPS (my slow laptop scores 460 CMIPS).

Those massive connections caused the server to use more and more RAM, and as the xmlrpc requests were taking many seconds to reply, more and more Apache processes were spawned. That led to more memory consumption, to using all the available RAM and starting to swap, with a heavy performance impact, until all the memory was exhausted and the mysql processes stopped.

I saw that I was suffering an attack after the shutdown of MySql. I checked the CloudWatch Statistics from Amazon AWS and it was clear that I was receiving many out-of-the-ordinary requests. The I/O was really high too.

These statistics cover from three days ago until today; look at the spikes when the attack was hitting hard and how relaxed the Server is now (flat line).

(Figure: blog statistics, usage over the last 3 days)

First I decided to simply rename the xmlrpc.php file as a quick solution to stop the attack but the number of http connections kept growing and then I saw very suspicious queries to the database.

(Figure: suspicious queries, 2014-08-30 00:11:59)

Those queries, in addition to what I had seen in Apache’s error log, suggested to me that maybe the Server had been hacked through a WordPress/plugin bug and that now they were trying to hide from the database’s logs. (Especially the DELETE FROM wp_useronline WHERE user_ip = the IP of the attacker.)

[Tue Aug 26 11:47:08 2014] [error] [client 94.102.49.179] Error in WordPress Database Lost connection to MySQL server during query a la consulta SELECT option_value FROM wp_options WHERE option_name = 'uninstall_plugins' LIMIT 1 feta per include('wp-load.php'), require_once('wp-config.php'), require_once('wp-settings.php'), include_once('/plugins/captcha/captcha.php'), register_uninstall_hook, get_option
[Tue Aug 26 11:47:09 2014] [error] [client 94.102.49.179] Error in WordPress Database Lost connection to MySQL server during query a la consulta SELECT option_value FROM wp_options WHERE option_name = 'uninstall_plugins' LIMIT 1 feta per include('wp-load.php'), require_once('wp-config.php'), require_once('wp-settings.php'), include_once('/plugins/captcha/captcha.php'), register_uninstall_hook, get_option
[Tue Aug 26 11:47:10 2014] [error] [client 94.102.49.179] Error in WordPress Database Lost connection to MySQL server during query a la consulta SELECT option_value FROM wp_options WHERE option_name = 'widget_wppp' LIMIT 1 feta per include('wp-load.php'), require_once('wp-config.php'), require_once('wp-settings.php'), do_action('plugins_loaded'), call_user_func_array, wppp_check_upgrade, get_option

The error log was very ugly.

The access log was not reassuring, as it showed many attacks like this:

94.102.49.179 - - [26/Aug/2014:10:34:58 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
127.0.0.1 - - [26/Aug/2014:10:35:09 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.2.22 (Ubuntu) (internal dummy connection)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:35:00 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
94.102.49.179 - - [26/Aug/2014:10:34:59 +0000] "POST /xmlrpc.php HTTP/1.0" 200 598 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

It was difficult to determine whether the Server was receiving SQL injections, so I wanted to be sure.

Note: The connection from 127.0.0.1 with OPTIONS is created by Apache when it spawns another Apache process.

As I had super fresh backups in another Server I was not afraid of the attack dropping the database.

I was also a bit suspicious because the /readme.html file said that the WordPress version was 3.6. Other installations correctly say that the version is 3.9.2, and this file is updated by the auto-update. I was thinking about a possible, very sophisticated trojan attack able to modify wp-includes/version.php and set a fake $wp_version = ‘3.9.2’;.
Later I realized that this blog had WordPress in Catalan, my native language, and discovered that the guys who do the translations forgot to update this file (in new installations it comes un-updated, and so shows 3.6). I have alerted them.

In fact, later I did a diff of all the files of my WordPress installation against the official WordPress 3.9.2-ca, and then a diff between WordPress 3.9.2-ca and WordPress 3.9.2 (English, default), and found no differences. My Server was OK. But at this point, at the beginning of the investigation, I didn’t know that yet.

With the info I had (queries, times, the attack, the readme claiming v. 3.6…) I weighed the possibility of being in front of something, and I decided that I had a unique opportunity to discover how they inject that SQL, or to discover whether my Server was compromised and how. The bad part is that it was the same Amazon Server where this blog resides, and I wanted the attack to continue so I could get more information, so for two days I was recording logs and doing some investigation; so sorry if you visited my blog and the database was down, or the Server was extremely slow. I needed that info. It was worth it.

First I changed the Apache config so the massive connections impacted the Server a bit less and I could work on it while the attack was going on.

I informed my group of Senior friends about what was going on, and two SysAdmins gave me some good suggestions on other logs to watch and on how to stop the attack; later a Developer joined me to look at the logs and pointed out possible solutions to stop the attack. But basically all of them suggested how to block the incoming connections with iptables and to do things like reinstalling WordPress, disabling xmlrpc.php in .htaccess, changing passwords or moving wp-admin/ to another place; the point, though, is that I wanted to understand exactly what was going on and how.

I checked the logs, certificates, etc…, and no one other than me was accessing the Server. I also double-checked the Amazon Firewall to be sure that no unnecessary ports were left open. Everything was OK.

I took a look at the Apache logs for the site and all the attacks were coming from the same Ip:

94.102.49.179

It is an IP from a dedicated Server company called ecatel.net. I reported the abuse to them at the abuse address indicated in the ripe.net database for that range.

I found that many people have complaints about this provider, and reports of them ignoring requests to stop the spam coming from their servers, so I decided that after my tests I would block their entire network from being able to access my sites.

All the requests shown in the access.log pointed to /xmlrpc.php. It was the only path requested by the attacker, so that IP apparently did nothing else.

I added some logging to WordPress xmlrpc.php file:

if ($_SERVER['REMOTE_ADDR'] == '94.102.49.179') {
    error_log('XML POST: '.serialize($_POST));
    error_log('XML GET: '.serialize($_GET));
    error_log('XML REQUEST: '.serialize($_REQUEST));
    error_log('XML SERVER: '.serialize($_SERVER));
    error_log('XML FILES: '.serialize($_FILES));
    error_log('XML ENV: '.serialize($_ENV));
    error_log('XML RAW: '.$HTTP_RAW_POST_DATA);
    error_log('XML ALL_HEADERS: '.serialize(getallheaders()));
}

This was the result, it is always the same:

[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML POST: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML GET: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML REQUEST: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML SERVER: a:24:{s:9:"HTTP_HOST";s:24:"barcelona.afterstart.com";s:12:"CONTENT_TYPE";s:8:"text/xml";s:14:"CONTENT_LENGTH";s:3:"287";s:15:"HTTP_USER_AGENT";s:50:"Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)";s:15:"HTTP_CONNECTION";s:5:"close";s:4:"PATH";s:28:"/usr/local/bin:/usr/bin:/bin";s:16:"SERVER_SIGNATURE";s:85:"<address>Apache/2.2.22 (Ubuntu) Server at barcelona.afterstart.com Port 80</address>\n";s:15:"SERVER_SOFTWARE";s:22:"Apache/2.2.22 (Ubuntu)";s:11:"SERVER_NAME";s:24:"barcelona.afterstart.com";s:11:"SERVER_ADDR";s:14:"[this-is-removed]";s:11:"SERVER_PORT";s:2:"80";s:11:"REMOTE_ADDR";s:13:"94.102.49.179";s:13:"DOCUMENT_ROOT";s:29:"/var/www/barcelona.afterstart.com";s:12:"SERVER_ADMIN";s:19:"webmaster@localhost";s:15:"SCRIPT_FILENAME";s:40:"/var/www/barcelona.afterstart.com/xmlrpc.php";s:11:"REMOTE_PORT";s:5:"40225";s:17:"GATEWAY_INTERFACE";s:7:"CGI/1.1";s:15:"SERVER_PROTOCOL";s:8:"HTTP/1.0";s:14:"REQUEST_METHOD";s:4:"POST";s:12:"QUERY_STRING";s:0:"";s:11:"REQUEST_URI";s:11:"/xmlrpc.php";s:11:"SCRIPT_NAME";s:11:"/xmlrpc.php";s:8:"PHP_SELF";s:11:"/xmlrpc.php";s:12:"REQUEST_TIME";i:1409338974;}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML FILES: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML ENV: a:0:{}
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML RAW: <?xmlversion="1.0"?><methodCall><methodName>pingback.ping</methodName><params><param><value><string>http://seretil.me/</string></value></param><param><value><string>http://barcelona.afterstart.com/2013/09/27/afterstart-barcelona-2013-09-26/</string></value></param></params></methodCall>
[Fri Aug 29 19:02:54 2014] [error] [client 94.102.49.179] XML ALL_HEADERS: a:5:{s:4:"Host";s:24:"barcelona.afterstart.com";s:12:"Content-type";s:8:"text/xml";s:14:"Content-length";s:3:"287";s:10:"User-agent";s:50:"Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)";s:10:"Connection";s:5:"close";}

So nothing in $_POST, nothing in $_GET, nothing in $_REQUEST, nothing in $_SERVER, no files submitted, but a text/xml Posted (that was logged by storing: $HTTP_RAW_POST_DATA):

<?xmlversion="1.0"?><methodCall><methodName>pingback.ping</methodName><params><param><value><string>http://seretil.me/</string></value></param><param><value><string>http://barcelona.afterstart.com/2013/09/27/afterstart-barcelona-2013-09-26/</string></value></param></params></methodCall>

I show it to you in a nicer format:

(Figure: the xmlrpc pingback request, formatted)

So basically they were trying to register a link to seretil dot me.

I tried, and this page, hosted on CloudFlare, is not working.


The problem is that responding to this spam xmlrpc request took the Server around 16 seconds. And I was receiving several per second.

I granted access on port 80 in the Firewall to my IP only, restarted Apache, restarted MySql and submitted the same malicious request to the Server, and it still took 16 seconds in all my tests:

cat http_post.txt | nc barcelona.afterstart.com 80

(Figure: the response from the Server to the xmlrpc attack)

I checked and confirmed that the logs from the attacker were showing the same Content-Length and HTTP code.

Other guys tried xml requests as well, but only once or twice, and then left.

The problem was that this robot had been, and still was, sending many requests per second for days.

Maybe the idea was to knock down my Server, but I doubted it, as the address targeted is the blog of a Social Event for Senior Internet Talents that I organize: afterstart.com. It has no special interest; I do not see a political, hateful or other motivation to attack the blog of this project.

Ok, at this point it was clear that the IP address was a robot, probably running from an infected or hacked Server, and was trying to publish a Spam link to a site (that was down). I had to clarify those strange queries in the logs.

I reviewed the WPUsersOnline plugin and saw that the strange (and inefficient) queries I had noticed belonged to the WPUsersOnline plugin.

(Figure: grep -r for the DELETE FROM wp_useronline queries, 2014-08-30 21:11:21)

The thing was that when I renamed xmlrpc.php, the spam robot was still posting to that file. According to the WordPress .htaccess file, any file that is not found on the filesystem is redirected to index.php.

So what was happening was that all the massive requests sent to xmlrpc.php were being served by index.php, which then showed a page-not-found error message, but the WPUsersOnline plugin was deleting those connections. And it was doing it many times, also overloading the Database.

I was also able to reproduce the behaviour by myself, firewalling the WebServer off from all IPs other than mine and doing the same post myself many times per second.

I checked against a friend’s blog, but on his Server xmlrpc.php responds in 1.5 seconds. My friend’s Server is a Digital Ocean Virtual Server with 2 cores and SSD Disks. My magnetic disks on Amazon only deliver around 40 MB/second. I have to check in detail why my friend’s Server responds so much faster.

I checked the integrity of my databases, just in case, and they were perfect. Nothing strange with collations, and the only errors in /var/log/mysql/error.log were due to MySql crashing when the Server ran out of memory.

I rechecked on my Server; now it takes 12 seconds.

I disabled 80% of the plugins, but the times were the same. The Statistics show how things changed (see the spikes, to the left, from before I definitively patched the Server to block requests from that Spam robot’s IP).

I checked against another WordPress that I have on the same Server and it only takes 1.5 seconds to reply. So I decided to continue investigating why this WordPress was taking so long to reply.

(Figure: blog statistics, usage over the last 24 hours)

As I said before, I checked that the files from my WordPress installation were the same as the original distribution, and they were. Having discarded differing files, the cause had to be in the database.

Even though MySql told me that all the tables were OK, having seen that WPUsersOnline deletes all the records older than 5 minutes, I guessed that this could lead to fragmentation, so I decided to run OPTIMIZE TABLE on all the tables of the database of the failing WordPress; with InnoDb this basically recreates the Tables and the Indexes.

I then tried the call via RPC and my Server replied in three seconds. Much better.

Looking with htop, when I call xmlrpc.php the CPU usage goes between 50% and 100%.

I checked the logs and the robot was gone. It left, or the provider finally blocked the Server. I don’t know.

Everything became clear; it was nothing more than a series of coincidences coming together. Deactivating the plugin, the DELETE queries disappeared, even under heavy Server load.

It only remained to clarify why, when I send an xmlrpc call to this blog, it replies in 1.5 seconds, while a request to barcelona.afterstart.com takes 3 seconds.

I activated the query log in mysql. To do that, edit /etc/mysql/my.cnf and uncomment:

general_log_file        = /var/log/mysql/mysql.log
general_log             = 1

Then I checked the queries, and in the case of my blog it performs far fewer queries, as I was requesting a pingback to a URL that did not exist, and WordPress does this query:

SELECT   wp_posts.* FROM wp_posts  WHERE 1=1  AND ( ( YEAR( post_date ) = 2013 AND MONTH( post_date ) = 9 AND DAYOFMONTH( post_date ) = 27 ) ) AND wp_posts.post_name = 'afterstart-barcelona-2013-09-26-meet' AND wp_posts.post_type = 'post'  ORDER BY wp_posts.post_date DESC

As the URL afterstart-barcelona-2013-09-26-meet with the dates indicated does not exist on my other blog, the execution ends there and does not perform the rest of the queries, which in the case of the Afterstart blog were:

40 Query     SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (81) ORDER BY meta_id ASC
40 Query     SELECT ID, post_name, post_parent, post_type
FROM wp_posts
WHERE post_name IN ('http%3a','','seretil-me')
AND post_type IN ('page','attachment')
40 Query     SELECT   wp_posts.* FROM wp_posts  WHERE 1=1  AND (wp_posts.ID = '0') AND wp_posts.post_type = 'page'  ORDER BY wp_posts.post_date DESC
40 Query     SELECT * FROM wp_comments WHERE comment_post_ID = 81 AND comment_author_url = 'http://seretil.me/'

To confirm my theory, I tried the request to my blog with a valid URL, and it lasted 3-4 seconds, the same as Afterstart’s blog. Finally I double-checked with my friend’s blog and it was slower than before: I got between 1.5 and 6 seconds, with a lot of 2-second responses. (He has PHP 5.5 and OpCache, which improves things a bit, but the problem is in the queries to the database.)

Honestly, the guys creating WordPress should cache these queries instead of performing 20 live queries, which are always the same, before returning the error message. Using Cache Lite or Stash, creating an in-memory table to use as a Cache, or of course allowing the use of Memcached, would eradicate the DoS component of this kind of attack, as the xmlrpc pingback feature hits the database with a lot of queries only to end up not allowing the publication.

While I was finishing those tests (remember that the attacker’s IP was gone), another attacker from the same network tried, but I had patched the Server to ignore it:

94.102.52.157 - - [31/Aug/2014:02:06:16 +0000] "POST /xmlrpc.php HTTP/1.0" 200 189 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

This one was trying to get a link published to a domain called socksland dot net, which is a domain registered in Russia and whose page is not working.

As I had all the information I wanted, I finally blocked the provider’s network from ever accessing my Server again.

Unfortunately, Amazon’s Firewall does not allow blocking a certain IP or range.
So you can block at the iptables level, in the .htaccess file, or in the code.
I do not recommend blocking at the code level, because sadly WordPress has many files accessible from outside, so you would have to add your code at the beginning of all of them, and because when there is a WordPress version update you’ll lose all your customizations.
But I do recommend patching your code to reject certain IPs if you use a CDN, as the POST will be sent directly to your Server and the IPs will be IPs from the CDN, which you can’t block. You have to look at the X-Forwarded-For Header, which indicates the IPs of the proxies the request has passed through, as well as the Client’s IP.

I designed a program that is able to patch any PHP project to check for blacklisted IPs (even through a proxy) with minimal performance impact. It works with WordPress, Drupal, Joomla, eZ Publish and Frameworks like Zend, Symfony, Catalonia…, and I patched my code to block those unwanted robot requests.

A solution that will probably work for you is to disable the pingback functionality; there are several plugins that do that. Completely disabling xmlrpc is not recommended, as WordPress uses it for several things (JetPack, mobile, validation…).

The same effect as adding the plugin that disables the xmlrpc pingback can be achieved by editing the functions.php of your Theme and adding:

add_filter( 'xmlrpc_methods', 'remove_xmlrpc_pingback_ping' );
function remove_xmlrpc_pingback_ping( $methods ) {
    unset( $methods['pingback.ping'] );
    
    return $methods;
}

Update: 2016-02-24 14:40 CEST
I also got a heavy dictionary attack against wp-login.php. Despite having a Captcha plugin, which makes it hard to crack, it was generating some load on the system.
What I did was to rename wp-login.php to another name, like wp-login-carles.php, and leave in wp-login.php simply an exit():

<?php
// wp-login.php is now just a stub; the real login lives in the renamed file
exit();

Take into account that this will only work until WordPress is updated to the next version. Then you'll have to reapply the renaming trick.

The Cloud is for Scaling

The Cloud is for Startups, and for Scaling. Nothing more.

In the future it will be used by phone operators to re-dimension their infrastructure and bandwidth in real time according to demand, but nowadays the Cloud is for Startups.

Examine the prices in my post on cmips, and take a look at the performance of the different CPUs. Do you see that, according to CMIPS v.1.03, a Desktop Processor Intel i7-4770S, worth USD $300, performs better than an Amazon M2 High Memory Quadruple Extra Large and than a Rackspace First gen. 30 GB RAM 8 Cores?

Today the public cost of an Amazon M2 High Memory Quadruple Extra Large running for a month (720 hours) is USD $1,180.80, that is USD $1.64 per hour, and the Rackspace First Generation 30 GB RAM 8 Cores with 1200 GB of disk costs USD $1,425.60, that is USD $1.98 per hour running.

And that’s the key, the cost per hour.

Because the greatness, the majesty of the Cloud is that you pay per hour, you pay as you need, as you go. No binding contracts. Everything on demand.

I had my company at a time when the hosting companies and the Data Centers were forcing customers to sign yearly contracts. What if a company only needs to host their Servers for three months? What if they have to close? No options. You take it or you leave it.

Even renting dedicated hosting was for at least a month or more, and what if the latency was not good? What if the bandwidth of the provider was not enough?

Amazon burst into the market with strength. I really like that company because they grew into the best eCommerce company for buying books, they built a system that really worked and was able to recommend very useful computer books, and their delivery, logistics and post-sales service were so good. They simply started to rent the same infrastructure they were using to serve their millions of customers, and it was a total success.

And for a while few people knew about Amazon's deep technologies and functionalities, but later it became a fashion.

Now people are using Amazon or whatever provider/Service that contains the word "Cloud", because the Cloud is on everyone's lips. Magazines and newspapers talk about the Cloud, so many, many companies use it simply because everyone is talking about the Cloud. And those ISPs that didn't have a Cloud have invested heavily to create one, just because they didn't want to be the ones without a Cloud, since everyone was asking for it and all the ISP companies were offering their "Clouds".

Every company claims to have a "Cloud", when the only thing many of them have is VMware servers, Xen servers, OpenStack… running the tenants or instances of the customers always on the same host servers. No real Cloud, no professional Cloud abstracted in layers in a professional way like Amazon, only the traditional "shared hosting" with another name, sharing CPU, RAM and Disk storage using virtual machines called instances.

So the Cloud fashion has become a confusing craziness where no one knows why they are in the Cloud, but they believe they have to be in it.

But do companies need the Cloud? Cloud instances?

It depends. The best would be to ask those companies: why did you choose the Cloud?

If you compare the cost of having an instance in the Cloud, it is much, much more expensive than having a dedicated server. And for that high cost you don't get more performance.

Virtualization is always slower, and disk speed is always an issue with Cloud providers, where all the data travels over the network from the disk cabinets (NAS) to the Host servers running the guest instances. Data cannot live on local disks, since every time you start an instance, resources like CPU and RAM are provisioned and your instance runs on totally different hardware. Only your data remains in the NAS (Network Attached Storage).

So unless you run your in-the-Cloud instance with a special provider that offers local disks, like DigitalOcean, which offers SSD but with monthly billing (and so you pay the price of losing the hardware abstraction, because you're attached to the host that has the disk connected, and you also lose the flexibility of paying per hour of use, as you go), you'll face a bottleneck in hard disk performance (which in reality fetches all the data from the NAS, where it is stored, over the local network).

So what are the motivations to use the Cloud? I'll try to give some examples; outside of these, I think, it doesn't make much sense. You can send me your happy-in-the-Cloud scenarios if you have found other good uses.

Example A) Saving initial costs, avoiding binding contracts and growing easily on your own

Imagine a Developer who starts his own project. Maybe it works, maybe not, but instead of having a monthly contract for a dedicated server, he starts with an Amazon Free Tier instance (better not: use a Small instance at least) and runs a web site. If it does not work, he simply stops the instance and pays no more. If the project works and has more and more users, he can re-dimension the server with a click: just stop the instance, change the instance type, and start it again with more RAM and more CPU power. Fast.
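With the AWS SDK for PHP, for example, the whole resize is a handful of calls (a minimal sketch: the region, instance id and target type are placeholders, and credentials are assumed to come from the environment):

<?php
// Sketch: resize an EC2 instance by stopping it, changing the type and starting it again.
use Aws\Ec2\Ec2Client;

require 'vendor/autoload.php';

$ec2 = new Ec2Client(array('region' => 'eu-west-1', 'version' => 'latest'));

$instanceId = 'i-0123456789abcdef0';   // placeholder

$ec2->stopInstances(array('InstanceIds' => array($instanceId)));
$ec2->waitUntil('InstanceStopped', array('InstanceIds' => array($instanceId)));

// Change to a bigger instance type (more RAM and more CPU)
$ec2->modifyInstanceAttribute(array(
    'InstanceId'   => $instanceId,
    'InstanceType' => array('Value' => 'm3.xlarge'),
));

$ec2->startInstances(array('InstanceIds' => array($instanceId)));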

Hiring a dedicated server implies at least monthly contracts, on average USD $100 per month, and it is not easy to move to a bigger server; it is not fast and it is expensive, as it requires the ISP tech guys to move the data and migrate from one Server to another.

The available bandwidth also has to be taken into consideration. Bandwidth is expensive, and Amazon can offer 150 Mbit even to smaller machines. Not all Internet Service Providers can offer that bandwidth, even with their most advanced packages.

If the project keeps growing, with a click, in seconds, 20 instances with a lot of bandwidth can be deployed and be serving traffic to your customers very quickly.

You save the initial costs of buying Servers and the time spent dealing with hardware and bandwidth limitations, and you avoid contracts, but you pay a much more expensive hourly rate. So in the long run using Amazon is much, much more expensive and less powerful than having dedicated servers. That happened to Zynga, which was paying $63M annually to Amazon and decided to step back from Amazon to their own Data Centers again. (another Fortune tech link)

The limited CPU power was also a deal breaker for many companies that needed a really powerful CPU and gigs of RAM for their Database Servers. Now this situation is much better with the introduction of the new Servers.

This developer can benefit from doing backups with a click, cloning, starting instances from an image, getting more static IPs with a click, deploying the Cloud provider's built-in load balancers, using monitoring services like CloudWatch, creating Volumes and attaching them to the servers for additional space…

Example B) A Startup with a fluctuating number of users and hopes of growing

Imagine a Startup with a wonderful Facebook Application.

During 80% of the day it has few visits and may only need 3 Servers, but during 20% of the hours of the day, from 10:00 to 15:00, users connect like hell, so they need 20 servers to attend to this traffic and workload, and maybe tomorrow they will need 30 servers.

With the Cloud they pay for 3 servers 24 hours per day, and for the other 17 servers they only pay for the hours they are on, that is 5 hours per day: 3×24 + 17×5 = 157 instance-hours per day, instead of the 480 they would pay for if all 20 servers ran around the clock. Doing that they save money and they have an unlimited* amount of power. (* There are limits in reality: you have to specifically request authorisation to run more than the default maximum of servers for the zone, which is normally 20 instances for Amazon. Also, theoretically it can happen that when you request new instances the Zone has no capacity available.)

So, for a growing Startup, avoiding hiring 20 dedicated servers and instead running in the Cloud as many as they need, for just the time they need, Auto-Scaling up and down, being able to use the servers NOW and pay next month with a Visa card, all of that can make a difference.

If the servers chosen are not powerful enough, that is solved with a click, changing the instance type. So fast. A minute.

It’s only a matter of money.

Example C) e-Learning companies and online universities

e-Learning platforms also benefit from Auto-Scaling during the peak-occupation hours.

The built-in functionality of the Cloud to clone instances is very useful for deploying new web servers, or new environments for students doing exercises, in the case of teaching Information Technology subjects where the users need to practise against a real server (Linux or Windows).

Those servers can be created and destroyed, cloned from the main, ready-to-go template. Servers can also be scheduled to stop and start at certain hours, saving the money of the hours not needed.

Example D) Digital agencies, sports and other events

When there is a Special event, like a motorcycle race, when a Football Team scores, when there is a TV spot announcing a product…

At those moments the traffic to the site can multiply, so more servers and more bandwidth have to be deployed instantly. That cannot be done with physical servers and hardware, but it is very easy to provision instances from the Cloud.

Mass mailing email campaigns can also benefit from creating new Servers when needed.

Example E) Proximity and SEO

Cloud providers have Data Centres everywhere. If you want to have servers in Asia, or in South America, or in Europe, or you want static content to be served faster… the Cloud providers have plenty of Data Centres all over the world.

Example F) Game aficionados and friends sharing content

People who love cooperative games can find the hungry bandwidth they need at a moderate price. If they run their private server for a few hours at night, from 22:00 to 01:00 for example, they will benefit from the great bandwidth of the big Cloud provider and pay for only 3 hours per day (excess traffic is usually charged by most providers, but the price of an additional GB is usually really, really competitive).

Friends sharing content over FTP can also benefit from these Cloud servers, but they will probably find services like Dropbox easier to use.

Example G) A Startup serving content

A Startup serving videos, images, or books can benefit not only from the great bandwidth of big Cloud providers (this has been covered before), but also from a very cheap price per additional Gigabyte transferred.

Local ISPs can't offer 150 Mbit for a USD $20 instance and USD $0.12 per additional GB transferred.

Many Cloud providers also allow unlimited incoming traffic from the Internet, and from Server to Server through private IPs.

Other cases

For other cases Dedicated Servers are much more Powerful, faster and cheaper, at the price of being "static" in the sense of attached, not abstracted in layers, but all the aspects of your Project have to be taken into account before deciding to step into or out of the Cloud.

In general terms I would say that the Cloud is for Scaling.