# Post-Mortem: The mystery of the duplicated Transactions in an e-Commerce

Together with four other Senior BackEnd Engineers, I wrote the new e-Commerce for a multinational.

The old legacy Software had evolved into a different code base for every country, which made it impossible to maintain.

The new Software we created used inheritance to share the same base code across all countries, overriding only the behavior specific to each country, such as the payment methods: for example, Brazil supporting “parcelados” (installments), or Germany with its specific payment players.

We rewrote the old procedural PHP BackEnd in modern PHP, with OOP and our own Framework, but we had to keep the transactional code in the existing MySQL Procedures, so the logic was split. There was a Front End Team consuming our JSONs. Basically all the Front End code was cached in Akamai, and pages were rendered according to the JSONs served from our BackEnd.

It was a huge success.

This e-Commerce site had Campaigns that started at a fixed time, so a large amount of traffic would arrive at the same moment, which was challenging.

The project was working very well, and after some time the original Team was split across different projects in the company, and a new Team was hired for maintenance and evolution.

At a certain point they started to encounter duplicate transactions, and nobody was able to solve the mystery.

I specialize in fixing impossible problems. They used to send me on Impossible Missions, and I am known for solving impossible problems easily.

So I started the task with an SRE approach.

The System had many components and layers. The problem could be in many places.

My arsenal of tools included Software like mysqldebugger, with which I had previously found an unnoticed bug in a decimals calculation, surprising everybody.

The Engineers previously involved believed the problem was on the Database side. They had difficulty identifying the issue because of the random nature of the repetitions.

Sometimes the order lines were duplicated, and other times the payments were, which meant charging the customer twice.

The Redis Cluster could also play a part in this, as it stored the session information and the basket.

If transactions from a customer were duplicated, then in the first place those requests must have arrived at the System. So that was a good starting point.

With a list of duplicated operations, I checked the Webserver logs.

That was a bit tricky, as the Webserver was recording the IP of the Load Balancer, not the IP of the customer. But we were tracking the sessionid, so with that I could trace a user's request history. Another good thing was that we were using cookies to stick the user to the same Webserver node. That has pros and cons, but in this case I didn’t have to worry about combining the logs of all the Webservers: I could just identify a transaction in one node and stick to that node’s log.

I was working with SSH and Bash; the log aggregators that exist today were not available at that time.

So when I started to pull the web logs and grep a bit, a smile was drawn on my face. :)

The transactions were not being repeated because of bad behavior on the MySQL Masters, or because of BackEnd problems. The HTTP requests were actually being performed twice.
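As an illustration, the kind of check done by hand with grep could be sketched in Python. This is only a sketch: the log format, the session IDs, the timestamps and the two-second window are assumptions made up for the example, not the real logs.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified log format (an assumption for this sketch):
#   <ISO timestamp> <session id> <HTTP method> <path>
SAMPLE_LOG = """\
2015-03-01T10:00:00.100 sess-a POST /checkout/pay
2015-03-01T10:00:00.350 sess-a POST /checkout/pay
2015-03-01T10:00:05.000 sess-b POST /checkout/pay
2015-03-01T10:02:00.000 sess-a GET /basket
"""


def find_repeated_requests(lines, window=timedelta(seconds=2)):
    """Return (session, method, path, first_ts, second_ts) tuples for identical
    requests made by the same session within `window` of each other."""
    last_seen = {}
    repeated = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed format
        ts = datetime.fromisoformat(parts[0])
        key = (parts[1], parts[2], parts[3])
        previous = last_seen.get(key)
        if previous is not None and ts - previous <= window:
            repeated.append((*key, previous, ts))
        last_seen[key] = ts
    return repeated


duplicates = find_repeated_requests(SAMPLE_LOG.splitlines())
for session, method, path, first, second in duplicates:
    print(session, method, path, (second - first).total_seconds())
```

Two identical requests from the same session a fraction of a second apart point to the client sending the request twice, rather than to the Database or the BackEnd.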

And the explanation for that was much simpler.

Many Windows and Mac users are used to double clicking on the Desktop to open programs, so when they started to use the Internet, they did the same. They double clicked on the Submit button of the forms, causing two JavaScript requests in parallel.

When I explained it they were really surprised, but then they started to worry about how they could fix it.

Well, there are many ways, like attaching a UUID to each request and not accepting two concurrent requests with the same one, but I came up with something that we could deploy super fast.

I explained how to change the JavaScript code so the buttons would have no default submit action and would trigger a JavaScript method instead. That method would check a boolean variable and perform the submit only if it was False, then set it to True and disable the button so it could not be clicked anymore. It was almost impossible to get a double click through, as the JavaScript disabled the button so fast that the second click would not trigger anything. But even if it did, only one request would be made, as the variable had already been set to True on the first click event.
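The other option mentioned earlier, a UUID per request with repeated ones rejected, could look roughly like this on the server side. This is a minimal sketch in Python, not the code of that project: the class and variable names are hypothetical, and it keeps the seen IDs in memory of a single process, while a real deployment with several Webserver nodes would need a shared store.

```python
import threading
import uuid


class RequestDeduplicator:
    """Accepts each request ID once; repeated IDs are rejected."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def try_accept(self, request_id: str) -> bool:
        # The lock makes check-and-add atomic, so two parallel requests
        # carrying the same ID cannot both be accepted.
        with self._lock:
            if request_id in self._seen:
                return False
            self._seen.add(request_id)
            return True


dedup = RequestDeduplicator()
form_id = str(uuid.uuid4())  # generated once, when the form is rendered

first_click = dedup.try_accept(form_id)   # accepted: process the payment
second_click = dedup.try_accept(form_id)  # rejected: same ID seen before
print(first_click, second_click)
```

The trade-off is that this protects the BackEnd whatever the client does, but it requires changes on both sides, which is why disabling the button was the faster fix to deploy.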

That case was very funny for me, because it was not necessary to go crazy inspecting the different layers of the system. The problem was detected simply with the HTTP logs. :)

People often forget to follow the logical steps, when many problems are much simpler than they seem.

As a curious note, I still see people double clicking on links and buttons on the Web, and some Software not handling it. :)

# News from the blog 2020-10-16

• I’ve been testing and adding more instances to CMIPS. I’m planning on testing the Azure instance with 120 cores.
• News: Microsoft makes permanent remote work an option

• One of my colleagues showed me dstat, a very nice tool for system monitoring that can also monitor the bandwidth of a drive. Also ifstat, as a complement to iftop, is very cool for Network monitoring. This functionality is also available in CTOP.py
• As I shared in past news on the blog, I’m resuming my contributions to the ZFS Community.

A long time ago I created some ZFS tools that I want to share soon as Open Source.

I equipped myself with the proper Hardware to test on SAS and SATA:

• 12G Internal PCI-E SAS/SATA HBA RAID Controller Card, Broadcom’s SAS 3008, compatible for SAS 9300-8I.
This is just an HBA (Host Bus Adapter); it doesn’t support RAID. It only connects up to 8 drives, or 1024 through an expander, to my computer.
It has a bandwidth of 9,600 MB/s, which means I’ll be able to attach 12 Enterprise grade SAS SSDs at almost the max speed of the drives. Those drives perform at 900 MB/s, so if I’m using all of them at the same time, like when I have a pool of 8 + 3 and I rebuild a broken drive, or I just push Data, I would be using 12×900 = 10,800 MB/s against the 9,600 MB/s of the card. Close. Fair enough.
• VANDESAIL Mini-SAS Cables, 1m Internal Mini-SAS to 4x SAS SATA Forward Breakout Cable Hard Drive Data Transfer Cable (SAS Cable).
• SilverStone SST-FS212B – Aluminium Trayless Hot Swap Mobile Rack Backplane / Internal Hard Drive Enclosure for 12x 2.5 Inch SAS/SATA HDD or SSD, fit in any 3x 5.25 Inch Drive Bay, with Fan and Lock, black
• Terminator is here.
I ordered this T-800 head a while ago and it finally arrived.

Finally I will have my empty USB keys located and protected. ;)

Remember to always be nice to robots. :)

# Refreshing settings in a Docker immutable image with Python and Flask

This is a trick to restart a Service that is running in an immutable Docker container after changing some setting, when you need to refresh the values very quickly without having to run the CI/CD Jenkins Pipeline and upload a new image.

So why would you need to do that?

I can think about possible scenarios like:

• Need to roll out an urgent fix in a time critical manner
• Jenkins is broken
• Somebody broke the git master branch
• Docker Hub is down
• GitHub is down
• The lines between your jumpbox or workstation and the secure Server are down and you have very little bandwidth
• You have to fix something critical and you only have a phone with you and SSH only
• Maybe the Dockerfile used the latest tag, and the latest image has changed:
FROM os:latest

The ideal is that if you work with immutable images, you roll out a new immutable image and that’s it.

But if for whatever reason you need to update this super fast, this trick may come in really handy.

Let’s go for it!

Normally you’ll start your container with a command similar to this:

docker run -d --rm -p 5000:5000 api_carlesmateo_com:v7 prod 

The first thing we have to do is to stop the container.

So:

docker ps

Locate your container in the list of running containers and stop it, and then start it again without the --rm:

docker stop container_name
docker run -d -p 5000:5000 api_carlesmateo_com:v7 prod

The --rm flag makes Docker clean up the container, removing its file system, when it exits. By default a container’s file system persists even after the container exits, which is what we need here. So don’t start it with --rm.

Ok, so log in to the container:

docker exec -it container_name /bin/sh 

Edit the config you need to change, for example config.yml

If what you have to update is a password, and it is encoded in base64, encode it:

echo -n "ThePassword" | base64
VGhlUGFzc3dvcmQ=
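If the container has Python but you prefer not to rely on the shell tools, the same encoding can be done with the standard base64 module. A small sketch, using the same sample password as above:

```python
import base64

# Encode the sample password; bytes in, ASCII text out
encoded = base64.b64encode("ThePassword".encode("utf-8")).decode("ascii")
print(encoded)

# Decode it back to verify the value before writing it to the config
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

Remember that base64 is an encoding, not encryption: anybody that can read the config can decode the password.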

Stop the container. You can do it with docker stop or, from inside the container, by killing the listening process, probably a Python Flask application.

If your Dockerfile ends with something like:

ENTRYPOINT ["./webservice.py"]

And webservice.py has Python Flask code similar to this:

#!/usr/bin/python3
#
# webservice.py
#
# Author: Carles Mateo
# Creation Date: 2020-05-10 20:50 GMT+1
# Description: A simple Flask Web Application
#              Part of the samples of https://leanpub.com/pythoncombatguide
#              More source code for the book at https://gitlab.com/carles.mateo/python_combat_guide
#

import logging

from flask import Flask

app = Flask(__name__)

# Sample route so http://127.0.0.1:5000/carles
@app.route('/carles', methods=['GET'])
def carles():
    logging.critical("A connection was established")
    return "200"

logging.info("Initialized...")

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)


Then you can kill the process, and so end the container, from inside the container by doing:

ps -ax | grep webservice
5750 root     56:31 {webservice.py} /usr/bin/python /opt/webservice/webservice.py
kill -9 5750

This will stop the container, the same way as docker stop container_name.

Then start the container (with start, not run):

docker start container_name

You can now test from outside or from inside the container. If from inside:

/opt/webservice # wget localhost:5000/carles
Connecting to localhost:5000 (127.0.0.1:5000)
carles               100% |**************************************************************************************************************|     3  0:00:00 ETA
/opt/webservice # cat debug.log
2020-05-06 20:46:24,349 Initialized...
2020-05-06 20:46:24,359  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
2020-05-06 20:46:24,360  * Restarting with stat
2020-05-06 20:46:24,764 Initialized...
2020-05-06 20:46:24,771  * Debugger is active!
2020-05-06 20:46:24,772  * Debugger PIN: 123-456-789
2020-05-07 13:18:43,890 127.0.0.1 - - [07/May/2020 13:18:43] "GET /carles HTTP/1.1" 200 -


If you don’t use YAML files, or what you need to change is the code, all this can be avoided: when you update the Python code, Flask detects the change and reloads (the sample above runs with debug=True). See this line in the logs:

2020-05-07 13:18:40,431  * Detected change in '/opt/webservice/webservice.py', reloading

You can also start a container with shell directly:

sudo docker run -it ctop /bin/bash

# Making responsive WordPress Theme Twenty Twelve to support greater resolutions

This is the first article I write about FrontEnd here. The topic is very casual and trivial, and I wanted to specialize the blog in Extreme IT, going deep into knowledge and difficult questions, and in any case focusing more on BackEnd, Engineering, Hardware and Operations.

But as it is something useful, and I didn’t find an answer myself when I googled it, I think it is not bad to share it here. Nevertheless, I’ll not make it appear on the front page, to be loyal to my essence.

So I like the Twenty Twelve WP Theme. It’s clear, and that’s what I expect from the blog of an Engineer: easy to read. Maybe it is too Spartan, but that’s great.

The instructions to do the same as me:

1. Make a copy of your original Twenty Twelve Theme in another directory, at the same level
2. Edit the file /var/www/blog.carlesmateo.com/wp-content/themes/2021-blog-carlesmateo-com/style.css
3. Add a new section like this

So I defined a new @media screen section with a min-width of 1800px.

Why 1800px and not 1920px like Full HD? Because Ubuntu uses some of the width for the lateral bar.

Then, in the body .site section, I set a max-width: 1800px that will do the trick for some browsers, and the rem value that will do the trick for Chrome.

Now the main section of the blog can be correctly displayed using most of the space available.

# Resources for Microservices and Business Domain Solutions for the Cloud Architect / Microservices Architect

First you have to understand that Python, Java and PHP are completely different worlds.

In Python you’ll probably use Flask, and listen on the port you want, inside a Docker Container.

In PHP you’ll use a Framework like Laravel, or Symfony, or Catalonia Framework (my Framework) :) and one repo or many (as the idea is that a change in one microservice cannot break another, it is recommended to have one git repo per Service), and you’ll split the requests with the API Gateway and Filters (so /billing/ goes to the right path in the right Server; it is like rewriting URLs). You’ll rely on Software to split your microservices. Usually you’ll use Docker, but you have to add a Web Server and any other tools, as the source code is not packaged with a Web Server and other Dependencies like it is in Java Spring Boot.

In Java you’ll use Spring Cloud and Spring Boot, and every Service will be self-contained in its own JAR file, which includes Apache Tomcat and all other Dependencies, normally running inside a Docker container. The TCP/IP listening port will be set at start via command line, or through the environment. You’ll have many git repositories, one per Service.

Using many repos, one per Service, also allows you to deploy only that repository and to have better security, with independent deployment tokens.

It is not unlikely that you’ll use one language for some of your Services and another for others, as well as one Database or another, as each Service is the owner of its data.

In any case, you will be using CI/CD and your pipeline will be something like this:

1. Pull the latest code for the Service from the git repository
2. Compile the code (if needed)
3. Run the Unit and Integration Tests
4. Package the service into an executable artifact (e.g. a Java JAR with the Tomcat server and other dependencies)
5. Generate a Machine image with your JAR deployed (for Java, look at the Spotify Docker Plugin to run Docker build from Maven), or with Apache, PHP, other dependencies, and the code. Normally it will be a Docker image. This image will be immutable. You will probably use Docker Hub.
6. The Machine image is started and the platform tests are run.
7. If the platform tests pass, the service is promoted to the next environment (for example Dev -> Test -> PreProd -> Prod), the exact same machine image is started in the next environment, and the platform tests are repeated.
8. Before deploying the new Service to Production, I recommend running special Application Tests / Behavior-driven tests. By this I mean conducting tests that really exercise the functionality of everything, using a real browser and emulating the actions of a user (for example with Behat, Cucumber or JMeter).
I recommend this especially because Microservices are end-points, independent of the implementation, but normally they are APIs that serve a whole application. In an Application there are several components, and often a change in the Front End can break the application. Imagine a change in the JavaScript Front End that results in a slightly different call, for example with a space before a name. Imagine that the Unit Tests for the Service do not test that, that it was not causing a problem in the old version of the Service, and that it crashes when the new Service is deployed. Or another example: imagine that our Service for paying with Visa cards generates IDs for the Payment Gateway, and as a result of the new implementation the IDs generated are returned. With the mocked objects everything works, but it is only when we deploy for real that we use the actual Bank Payment. This is also why it is a good idea to have a PreProduction environment, with PreProduction versions of the actual Services we use (all banks, or the GDS for flight/hotel reservations like Galileo or Amadeus, have a Test Gateway exactly like Production).
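The promotion logic of steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration of the idea, not a real pipeline: the function names, the image name and the fake test runner are hypothetical, and in practice your CI/CD tool does this for you.

```python
from typing import Callable, List

ENVIRONMENTS = ["Dev", "Test", "PreProd", "Prod"]


def promote(image: str, run_platform_tests: Callable[[str, str], bool],
            environments: List[str] = ENVIRONMENTS) -> List[str]:
    """Start the same image in each environment, in order, and stop at the
    first environment whose platform tests fail. Returns the environments
    where the image passed."""
    passed = []
    for env in environments:
        if not run_platform_tests(image, env):
            break
        passed.append(env)
    return passed


# Hypothetical test runner: pretend the image fails in PreProd.
def fake_tests(image: str, env: str) -> bool:
    return env != "PreProd"


reached = promote("billing-service:1.4.2", fake_tests)
print(reached)
```

The important point is that the image never changes between environments: only the environment it runs in does, so a failure in PreProd stops it before it reaches Prod.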

If you work with Microsoft .NET, you’ll probably use Azure DevOps.

We IT Engineers, CTOs and Architects serve the Business. We have to develop the most flexible approaches, enabling the business to release as fast as it needs.

Take into account that Microservices are a tool, a pattern. We will use them to bring more flexibility and speed to development, resilience to the services, and speed and independence to deployment. However, this comes at a cost in complexity.

Microservices are more related to giving flexibility to the Business, and to developing according to the Business Domains. They are normally oriented to suit an API. If you have an API that is consumed by third parties you will have things like independence of Services (if one is down the others will still function), gradual degradation, being able to scale only the Services that have more load, being able to deploy a new version of a Service independently of the rest of the Services, etc… The complexity of the technical solution comes from all this resilience and flexibility.

If your Dev Team has up to 10 Developers, or you are writing just a CRUD Web Application or a PoC, or you are a Startup with a critical Time to Market, you will probably not want to use the Microservices approach. It is like killing flies with laser cannons. You can use the typical Web services approach, do everything in one single HTTPS request, have transactions, a single Database, etc…

But if your team has 100 Developers, like a big eCommerce, you’ll have multiple Teams of between 5 and 10 Developers per Business Domain, and you need each Service to be independent, with less interdependence. Each Service will own its own Data, which is normally around 5 to 7 tables. Each Service will serve a Business Domain. You’ll benefit from having different technologies for the different needs; however, be careful to avoid having Teams with such different knowledge that rotation is hard and projects are difficult to continue when the only 2 or 3 Devs that know a technology leave. Typical beneficial scenarios can be having MySQL for the Billing Services, but a NoSQL Database for the image catalog, or to store logs of account activity. With Microservices, some Services will be calling other Services, often asynchronously, using Queues or Streams; you’ll have Callbacks and Databases for reading, and you’ll probably want gradual and graceful failure of your applications, client load balancing, caches and read only databases/in-memory databases… This complexity is there to protect one Service from the failure of others, and to give it the necessary speed under heavy load.

Here you can find a PDF Document of the typical resources I use for Microservice Projects.

https://github.com/carlesmateo/awesome-microservices

Do you use other solutions that are not listed? Leave a message. I’ll investigate them and update the Document, to share it with the Community.

Update 2020-03-06: I found this very nice article explaining the same. Microservices are not for everybody and not the default option: https://www.theregister.co.uk/AMP/2020/03/04/microservices_last_resort/

Update 2020-03-11: Qcom with 1,600 microservices says that microservices architecture is the last resort: https://www.theregister.co.uk/AMP/2020/03/09/monzo_microservices/

# Adding my Server as Docker, with PHP Catalonia Framework, explained

The other day I explained how I migrated my old Server (Amazon Instance) to a more powerful model, with a more recent OS, WebServer, etc…

This was interesting from the point of view of dealing with Elastic IPs, Amazon AWS Volumes, etc… but it was basically a manual process. I could have generated an immutable image to start from next time, but this is another discussion, especially because that Server Instance has different base Software, including a MySQL Database.

This time I want to explain, step by step, how to containerize my Server, so I can port it to different platforms and be independent of what the Server Operating System is. It will always work, as we define the Operating System for the Docker Container.

So we start to use IaC (Infrastructure as Code).

First you need to install Docker.

Basically, if your laptop runs Ubuntu 18.04 LTS you have to:

sudo apt install docker.io

### Start and Automate Docker

The Docker service needs to be set up to run at startup. To do so, type in each command followed by enter:

sudo systemctl start docker
sudo systemctl enable docker

### Create the Dockerfile

To do this you can use any text editor, but as we are working with IaC why not use a Code Editor?

You can use the versatile PyCharm, which has plugins that understand Docker, and so you can use Version Control like git too.

This is the Dockerfile

FROM ubuntu:19.04

MAINTAINER Carles <carles@carlesmateo.com>

ARG DEBIAN_FRONTEND=noninteractive

#RUN echo "nameserver 8.8.8.8" > /etc/resolv.conf

RUN echo "Europe/Dublin" | tee /etc/timezone

# Note: You should install everything in a single line concatenated with
#       && and finalising with apt-get autoremove && apt-get clean
#       In order to use the least space possible, as every command is a layer
RUN apt-get update && apt-get install -y apache2 ntpdate libapache2-mod-php7.2 \
    mysql-server php7.2-mysql php-dev libmcrypt-dev php-pear git && \
    apt-get autoremove -y && apt-get clean

RUN a2enmod rewrite

RUN mkdir -p /www

# In order to activate Debug
# RUN sed -i "s/display_errors = Off/display_errors = On/" /etc/php/7.2/apache2/php.ini
# RUN sed -i "s/error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT/error_reporting = E_ALL/" /etc/php/7.2/apache2/php.ini
# RUN sed -i "s/display_startup_errors = Off/display_startup_errors = On/" /etc/php/7.2/apache2/php.ini
# To Debug remember to change:
# config/{production.php|preproduction.php|devel.php|docker.php}
# in order to avoid Error Reporting being set to 0.

ENV PATH_CATALONIA_CACHE /www/www.cataloniaframework.com/cache/

ENV APACHE_RUN_USER  www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR   /var/log/apache2
ENV APACHE_PID_FILE  /var/run/apache2/apache2.pid
ENV APACHE_RUN_DIR   /var/run/apache2
ENV APACHE_LOCK_DIR  /var/lock/apache2
ENV APACHE_LOG_DIR   /var/log/apache2

RUN mkdir -p $APACHE_RUN_DIR
RUN mkdir -p $APACHE_LOCK_DIR
RUN mkdir -p $APACHE_LOG_DIR

# Remove the default Server
RUN sed -i '/<Directory \/var\/www\/>/,/<\/Directory>/{/<\/Directory>/ s/.*/# var-www commented/; t; d}' /etc/apache2/apache2.conf

RUN rm /etc/apache2/sites-enabled/000-default.conf

COPY www.cataloniaframework.com.conf /etc/apache2/sites-available/

RUN chmod 777 $PATH_CATALONIA_CACHE
RUN chmod 777 $PATH_CATALONIA_CACHE.
RUN chown --recursive $APACHE_RUN_USER.$APACHE_RUN_GROUP $PATH_CATALONIA_CACHE

RUN ln -s /etc/apache2/sites-available/www.cataloniaframework.com.conf /etc/apache2/sites-enabled/

# Note: You should clone locally and COPY to the Docker Image
#       Also you should add the .git directory to your .dockerignore file
#       I made this way to show you and for simplicity, having everything
#       in a single file
RUN git clone https://github.com/cataloniaframework/cataloniaframework_v1_sample_website /www/www.cataloniaframework.com
# git checkout must run inside the cloned repository, so use -C with its path
RUN git -C /www/www.cataloniaframework.com checkout tags/v.1.16-web-1.0
# In order to change profile to Production
# RUN sed -i "s/define('ENVIRONMENT', DOCKER)/define('ENVIRONMENT', PRODUCTION)/" /www/www.cataloniaframework.com/config/general.php

# for debugging
#RUN apt-get install -y vim

RUN service apache2 restart

EXPOSE 80

CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]


### The www.cataloniaframework.com.conf file

As you saw in the Dockerfile, there is the line:

COPY www.cataloniaframework.com.conf /etc/apache2/sites-available/

This will copy the file www.cataloniaframework.com.conf, which must be in the same directory as the Dockerfile, to the /etc/apache2/sites-available/ folder in the container.

<VirtualHost *:80>
    ServerAdmin webmaster@cataloniaframework.com
    # Uncomment to use a DNS name in a multiple VirtualHost Environment
    #ServerName www.cataloniaframework.com
    #ServerAlias cataloniaframework.com
    DocumentRoot /www/www.cataloniaframework.com/www
    <Directory /www/www.cataloniaframework.com/www/>
        Options -Indexes +FollowSymLinks +MultiViews
        AllowOverride All
        Order allow,deny
        allow from all
        Require all granted
    </Directory>
    ErrorLog ${APACHE_LOG_DIR}/www-cataloniaframework-com-error.log
    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/www-cataloniaframework-com-access.log combined
</VirtualHost>

### Stopping, starting the docker Service and creating the Catalonia image

service docker stop && service docker start

To build the Docker Image we will do:

docker build -t catalonia . --no-cache

I use --no-cache so that the git repository is cloned again and everything is rebuilt, not taken from the cache.

Now we can run the Catalonia Docker container, mapping port 80.

docker run -d -p 80:80 catalonia

If you want to check what’s going on inside the container, first list the running containers to find its name:

docker ps

In this case Docker named it distracted_wing, so we will do:

docker exec -i -t distracted_wing /bin/bash

Finally I would like to check that the web page works, and I’ll use my preferred browser. In this case I will use lynx, the text browser, because I don’t want Firefox to save things in the cache.

# Using Windows 10 Appliance in Ubuntu Virtual Box 4.3.10

Microsoft has released Windows 10, and with it the possibility to download a Windows 10 Appliance to run under Virtual Box, VMWare Player, Hyper-V (for Windows) or Parallels (Mac). Their idea is to allow you to test the new Microsoft Edge browser, in addition to being able to test the older browsers in older VM images.

I wanted to use Windows 10 to check compatibility with my messenger c-client.

Also I wanted to know how Java behaves.

The Windows 10 VM image will work for 90 days. You can download it from here (http://dev.modern.ie/tools/vms/linux/).

The instructions are very sparse and they don’t specify a minimum version; however, if you use Virtual Box under Ubuntu 14.04, that is Virtual Box 4.3.10, you’ll not be able to import the Appliance, as you’ll get an error.

Update: Thanks to Razvan and Eric!, readers that reported that this also works for Mac OS 10.9.5 + Virtual Box 4.3.12, and for VirtualBox 4.3.20 running under Windows 7, respectively.

‘Windows10_64’ is not a valid Guest OS type.

 Result Code: NS_ERROR_INVALID_ARG (0x80070057) Component: VirtualBox Interface: IVirtualBox {fafa4e17-1ee2-4905-a10e-fe7c18bf5554} Callee: IAppliance {3059cf9e-25c7-4f0b-9fa5-3c42e441670b}

I looked for a solution and found none on the Internet, so I decided to give it a chance and try to fix it by myself.

The error is: ‘Windows10_64’ is not a valid Guest OS type. So obviously Windows10_64 is not on the list of VirtualBox OS types yet, as it is a pretty new release. Microsoft could have shipped it with OS Type Windows 64 Other, or Windows 8 64 bits, but they didn’t. I wondered if I could edit the image to trick VirtualBox into seeing it as a recognized image.

I edited the file (MSEdge – Win10.ova) with Bless Hex Editor, a hexadecimal editor.

I looked for the String “Windows10_64” and found two occurrences.

I had to replace the string while keeping the exact number of bytes it has, so the same length (do not insert additional bytes). I searched the list of supported OSes and found that “WindowsXP_64” would be a perfect match. I replaced that 10 with XP twice.

Then I tried to import the Appliance and it worked.
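The same kind of same-length patch could also be scripted instead of using a hex editor. The following is a minimal sketch in Python: the function name is hypothetical, and the demonstration runs on a small temporary file with made-up content, not on a real appliance. Work on a copy of the file if you try something like this.

```python
import mmap
import os
import tempfile


def patch_same_length(path: str, old: bytes, new: bytes) -> int:
    """Overwrite every occurrence of `old` with `new` in place.
    Both must have the same length so no byte offsets in the file shift.
    Returns the number of occurrences replaced."""
    if len(old) != len(new):
        raise ValueError("replacement must have exactly the same length")
    count = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        pos = mm.find(old)
        while pos != -1:
            mm[pos:pos + len(new)] = new
            count += 1
            pos = mm.find(old, pos + len(new))
        mm.flush()
    return count


# Demonstration on a small temporary file with made-up content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"<OS type='Windows10_64'/> ... <OSType>Windows10_64</OSType>")
    demo_path = tmp.name

replaced = patch_same_length(demo_path, b"Windows10_64", b"WindowsXP_64")
with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
print(replaced, patched)
```

Memory-mapping the file avoids loading a multi-gigabyte image into RAM, and refusing replacements of a different length keeps the rest of the file exactly where it was.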

I tried to run it like that, but it froze on boot, at the new blue Windows logo.

I figured that Windows XP would probably not be the most similar architecture, so I edited the VM settings and set the type to Windows 8.1 (64 bit). I also increased the RAM to 4096 MB and set 32 MB of memory for the video card.

Then I just started the VM and everything worked.

Ok, a funny note: as soon as it started, it installed an update without asking ;)

# Scaling phantomjs with PHP

One of my clients had a problem with a Phantomjs Software.

I was asked to help in their project, which was relying on one of its features.

Phantomjs is an interesting project, but unfortunately it has not had enough maintenance and it has a terrible lack of documentation. The last contributions to the repo are from mid May, and infrequent. (The latest releases are from Feb 2015, see the Phantomjs releases on github)

The Software from my client ran well for certain requests, but not for others, and after a random time, seconds or minutes, it became unresponsive.

My client wanted to fix that, or to use nodejs to scale their phantom code, or in the worst case to rewrite the code in nodejs. And it was urgent, because they were losing a lot of money because of their programs malfunctioning.

I began to investigate. That’s the story of how I fixed…

Connections being unresponsive

My client was using the Phantomjs webserver.

The problem with Phantom’s webserver is that it has a hard limit of 10 concurrent connections. After that, all further http connections are queued until one becomes free.

So if you telnet to that port, the connection is accepted, but nothing happens, even if you send malformed GET requests.

My guess was that something in the process of parsing the requests was wrong, and then some of those 10 connections became frozen. I started to debug.

I implemented a timeout that quits the worker after some time.

mTimerExit = setTimeout(forceExitByTimeout, DEFAULT_TIME_TO_EXIT);

Before exiting it is important to clear the timers

clearTimeout(mTimerExit);

I also implemented a debug mode to see what was going on, with a method consoleDebug that basically did console.log only if a debug parameter was set to true.

My quick win system was working, but many urls still were not being parsed by the phantomjs Engine.

Connecting with nodejs

My client had had a bad experience with previous versions of Phantomjs crashing a lot.

So they had the idea of running nodejs as the main webserver, for scaling, and invoking Phantomjs from it.

I did quite some work along this line.

I tried to link with nodejs with products like:

Unfortunately those packages are no longer maintained, the last update being from 2013.

They didn’t work. I found no documentation, and no traces of errors.

I also got errors like:

XMLHttpRequest cannot load http://localhost:8888/start Origin file:// is not allowed by Access-Control-Allow-Origin

And I had to figure out what parameters to tune. I did it by starting phantomjs with the param:

 --web-security=false

In the js scene products and packages change very fast, sadly often breaking backward compatibility.

So you’d better have a very well defined package.json that installs exactly the software versions that you need, or soon, when you deploy to another server, it will be a disaster.

Ghost Town is a product that allows you to run phantomjs from inside nodejs.

It is a product maintained by a company, through a contributor, Teddy.

He was very nice replying to my questions, but it didn’t help.

The process was failing with no debug, no info.

The package really lacks documentation, and only the same sample appears all across the web.

I provide this ghost-town code sample, in case it is useful for people looking for more:

var phantomClusterOptions = require("./phantomClusterOptions");
var town = require("ghost-town")(phantomClusterOptions);
var PORT = 8080;

if (town.isMaster) {
    var express = require('express');
    var app = express();
    app.get('/', function(req, res) {
        // Every request comes here
        var data = {url: req.query.url, device: req.query.quality};

        town.queue(data, function(err, result) {
            res.set('Content-Type', 'text/plain');
            if (!err) {
                res.send(result);
            } else {
                res.send(err);
            }
        }, phantomClusterOptions.pageTries);

    });
    app.listen(PORT);
    console.log('App running');
} else {
    town.on("queue", function(page, data, callback) {
        town.phantom.set('onError', function(msg, trace) {});
        // quality is the exported method, you pass the useful page object as parameter
        quality(page, data, function(str) {
            callback(null, str);
        });
    });
    town.on("error", function(err) { console.log("error"); });
}

And the file phantomClusterOptions has:

//Options here https://github.com/buzzvil/ghost-town
var phantomClusterOptions = {
    //phantomBinary:'./phantomjs', //if you want to use a different phantomjs version
    //phantomBinary:'/usr/bin/phantomjs',
    workerDeath: 3, //number of times that instance of phantom will be reused
    pageTries: 5, //tries to the page before rejecting
    pageCount: 1, //number of pages analysed concurrently by the same phantom instance (1 is recommended)
    // This is for versions 1.9 and older of ghost-town
    //phantomFlags:['--load-images=no', '--local-to-remote-url-access=yes', '--ignore-ssl-errors=true', '--web-security=false', '--debug=true'] //flags (http://phantomjs.org/api/command-line.html)
    // For v.2 and newer versions
    phantomFlags: {"load-images" : false, "local-to-remote-url-access" : true, "ignore-ssl-errors" : true, "web-security" : false, "debug" : true}
};
module.exports = phantomClusterOptions;

3) Other products

https://www.npmjs.com/package/node-phantom-simple

https://github.com/sgentle/phantomjs-node

I tried to debug with node debugger from command-line:

node debug myapp.js

And with node-debug (very nice integration with Chrome):

node-debug myapp.js

But I was unable to see what was failing. The nodejs App was up, and the ghost-town queue was growing, but apparently the worker processing the queue was not working or was unable to execute phantomjs. But I saw no errors. When I switched the params for ghost-town to the v.2 format, I got an exception, and it really looks like it is unable to execute Phantom, or perhaps phantomjs could not execute the .js due to some dependency problem.

(throw err and error spawn EACCES)

Also:

Error: /mypath/node_modules/ghost-town/node_modules/phantom/node_modules/dnode/node_modules/weak/build/Release/weakref.node: undefined symbol: node_module_register
at Module.require (module.js:364:17)
at require (module.js:380:17)
at bindings (/mypath/node_modules/ghost-town/node_modules/phantom/node_modules/dnode/node_modules/weak/node_modules/bindings/bindings.js:76:44)
at Object.<anonymous> (/mypath/node_modules/ghost-town/node_modules/phantom/node_modules/dnode/node_modules/weak/lib/weak.js:7:35)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)

/mypath/node_modules/ghost-town/node_modules/phantom/node_modules/dnode/node_modules/weak/node_modules/bindings/bindings.js:83
throw e

But I was unable to find more info on the net. I tried installing additional modules and I even straced the processes, but I didn’t find the origin of the problem.

I was using:

npm install browserify express ghost-town phantom socket.io URIjs async dnode forever node-phantom request underscore.string waitfor

Some SysAdmins love CentOS. I’m in love with Ubuntu.

Basically, it is because of the package system. The packages are really well maintained.

Ubuntu has LTS (Long Term Support) versions, which are supported for 5 years.

On the other hand, they release a new version every 6 months, so if you install a recent server release, you have the latest stable packages of Software.

Working with Open Source, this is a really important point, as I have access to modern versions of PHP, Apache, Tomcat, etc…

To use phantomjs with CentOS you have to download the sources and compile them. It took about an hour on a commodity Cloud Virtual Server, and there were dependency problems. Also, a phantomjs binary compiled on one CentOS system didn’t work on a Server with a different CentOS version. So it was a bit painful to distribute across heterogeneous machines.

With Ubuntu 14.04 LTS, just:

sudo apt-get install phantomjs

did the trick, installing phantomjs (1.9.0-1).

Scaling with PHP

So we had to decide between:

• completely rewriting the application in nodejs, which would certainly take time
• investing more time trying to determine why workers freeze under phantomjs

Phantomjs is a scriptable headless WebKit, so it was very convenient.

Nodejs is built on Chrome’s Javascript runtime, so it would do what we wanted.

As we had a time constraint, and for my client it was very important to have the system working asap, I decided to debug a bit more.

So I could keep track of every url and, after a timeout, force a page.open(url) inside the event if the navigation had stopped (timed out):

mPage.onNavigationRequested = function(url, type, willNavigate, main) {

That was working, finally, but was not my favourite solution. I wanted to understand why it was failing initially.
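The idea can be sketched roughly as follows. This is not the original code: the function name `installNavigationWatchdog`, the `state` object and the injectable `schedule`/`cancel` parameters are assumptions made for the illustration. It only relies on the page object offering `onNavigationRequested`, `onLoadFinished` and `open`, as the PhantomJS page object does. The fake page at the end exists only so the logic can be tried without a browser.

```javascript
// Sketch of the watchdog: remember the last url seen in onNavigationRequested
// and re-open it if nothing else happens before the timeout expires.
// schedule/cancel default to setTimeout/clearTimeout; they are parameters
// only so the logic can be exercised without real timers.
function installNavigationWatchdog(page, timeoutMs, schedule, cancel) {
    schedule = schedule || setTimeout;
    cancel = cancel || clearTimeout;
    var state = { lastUrl: null, forcedOpens: 0, timer: null };

    function stopTimer() {
        if (state.timer !== null) {
            cancel(state.timer);
            state.timer = null;
        }
    }

    page.onNavigationRequested = function (url, type, willNavigate, main) {
        if (!main) {
            return; // ignore navigation that happens inside frames
        }
        state.lastUrl = url;
        stopTimer();
        state.timer = schedule(function () {
            // Navigation seems stuck: force loading the last known url
            state.timer = null;
            state.forcedOpens++;
            page.open(state.lastUrl);
        }, timeoutMs);
    };

    // A finished load means the page did not freeze, so disarm the watchdog
    page.onLoadFinished = function () {
        stopTimer();
    };

    return state;
}

// Demonstration with a fake page and a manually fired timer
var opened = [];
var fakePage = { open: function (url) { opened.push(url); } };
var pending = null;
var watchdog = installNavigationWatchdog(
    fakePage,
    5000,
    function (fn) { pending = fn; return 1; },
    function () { pending = null; }
);

fakePage.onNavigationRequested("http://example.com/a", "Other", true, true);
fakePage.onNavigationRequested("http://example.com/b", "Other", true, true);
pending(); // simulate the timeout expiring after the second redirect
console.log(opened); // [ 'http://example.com/b' ]
```

A real version would also want a maximum number of forced opens, since a url that always freezes would otherwise be retried forever.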

The lack of documentation was frustrating, but debugging the problematic urls, I found that they were doing several redirections, and after some of them I was getting an SSL certificate error on one of the destination urls.

As nowadays there are many cheap SSL certificate providers, based on chained certificates, and many sites configure them wrong, phantomjs was sensitive to that and stopped following the urls.

--ignore-ssl-errors=true

But investigating I found a very interesting contribution on stackoverflow from user Micah:

http://stackoverflow.com/questions/12021578/phantomjs-failing-to-open-https-site

Note that as of 2014-10-16, PhantomJS defaults to using SSLv3 to open HTTPS connections. With the POODLE vulnerability recently announced, many servers are disabling SSLv3 support.

To get around that, you should be able to run PhantomJS with:

phantomjs --ssl-protocol=tlsv1


Hopefully, PhantomJS will be updated soon to make TLSv1 the default instead of SSLv3.

I decided to try forcing the SSL protocol version to TLSv1:

--ssl-protocol=tlsv1

And it worked. It did the trick. All the urls were now being parsed right and following the redirects to the end (or to my timeout).

The problem and the solution have been known since October 2014, and tlsv1 has still not been made the default in Phantomjs. I found that lack of maintenance disappointing.

That is why, when a multinational recently interviewed me and asked me about technologies like nodejs, I told them that I’m conservative until a version has been proven stable. And I told them that, in any case, a member of the company should be a core member of the contributors to the technology. They were surprised, but they shouldn’t have been; they should have known what I told them. I explained to them that if you use a new technology in production, you should at least have a member of your staff in the core team of that product. So you pay a guy to build an Open Source technology, basically. This guarantees that if a serious bug or security flaw appears, you’ll not be screwed until the next release. Your guy can fix it immediately and share the solution with the community.

That conservativeness is what I showed in an interview with Facebook Operations, where I was asked about a scenario in which some Developers and DevOps would request me to upgrade the Load Balancers’ Software. They were more for taking action; I told them that LBs are critical, and they replied that everything at FB was critical. I argued that if a chat component fails, only the chat fails, but if the Load Balancers fail, everything fails, as they are the entry point. I had the confirmation that I was right when some months ago they had an outage for hours.

Sometimes you have to stay strong and defend your point, because you know you’re right, even if you are in front of a person who doesn’t see things like you do and will take a decision that leaves you out. Being honest is priceless.

Scaling Phantomjs with PHP

So cool, the system was working fine.

But there was something that could be improved.

The webserver built into Phantomjs has a limit of 10 connections. That is the maximum number of concurrent connections it can serve, and so it was a bottleneck.

// Sample code to create a webserver from PhantomJS
mWebserver = require('webserver');
mServer = mWebserver.create();
console.log("Server created");
//consoleDebug('Debug enabled');
mService = mServer.listen(8080, {'keepAlive': true}, function(request, response) {
    //consoleDebug('URL:' + request.url);
    s_params = request.url;
    doRender(s_params, function(res) {
        //consoleDebug('Response from URL:' + request.url + ' (processed)');
        writeStringResponse(response, res);
    });
    //consoleDebug('URL:' + request.url + ' ready for processing');
});

I decided to propose to the company to use one of my tricks:

Launching phantomjs from PHP.

This means writing a wrapper that launches Phantomjs from the command line and gets the response. I did the same in my CQLSÍ Cassandra wrapper around cqlsh before Cassandra drivers for PHP were available. I also did this in 1999 to connect the payment gateway of a bank, written in C, with the Java libraries from Ticketing Solutions.

That way the server would be able to process as many concurrent Phantomjs instances as we wanted, as each one would be running in its own process.

I modified the js code to remove the webserver functionality and to get the parameters from the command line.

var system = require('system');
var args = system.args;
var b_debug_write = false;

// args[0] is the script name, so url and quality are args[1] and args[2]
if (args.length < 3) {
    console.log("Minimum 2 parameters");
    console.log("call with: phantomjs program.js http://myurl.com quality");
    console.log("Parameter debug is optional");
    args.forEach(function(arg, i) {
        console.log(i + ': ' + arg);
    });
    // Exit with error level 1
    phantom.exit(1);
}

var s_url = args[1];
var s_quality = args[2];

if (args.length > 3) {
    // Enable debug
    b_debug_write = true;
}

consoleDebug("Starting with url:" + s_url + " and quality:" + s_quality);

Then the PHP code:

<?php
/**
* Creator: Carles Mateo
* Date: 2015-05-11 11:56
*/

// Report all PHP errors
error_reporting(E_ALL);

$b_debug = false;

if (!isset($_GET['url']) || !isset($_GET['quality'])) {
    echo 'Invalid parameters';
    exit();
}

if (isset($_GET['debug'])) {
    $b_debug = true;
}

$s_url = $_GET['url'];
$s_quality = $_GET['quality'];

// Just in case is not decoded by the PHP installed
$s_url = urldecode($s_url);

// reencode url
$s_url = urlencode($s_url);

$s_script = '/mypath/myapp_commandline.sh';

// escapeshellarg() prevents injecting shell commands through the quality parameter
$s_script_with_params = $s_script.' '.$s_url.' '.escapeshellarg($s_quality);

if ($b_debug == true) {
    $s_script_with_params .= ' debug';
    echo 'Executing '.$s_script_with_params."<br />\n";
}

//$message=shell_exec("/var/www/scripts/testscript 2>&1");
$s_message = shell_exec($s_script_with_params);

echo $s_message;

And finally the bash script myapp_commandline.sh:

#!/bin/bash

PATH_QUALITY=/mypath/

#tlsv1 is recommended to avoid problems with certificates
PARAMETERS="--local-to-remote-url-access=yes --ignore-ssl-errors=true --web-security=false --ssl-protocol=tlsv1"

cd "$PATH_QUALITY"

#echo "Debug param1=$1 param2=$2 param3=$3"

if [ -z "$3" ]
then
    phantomjs $PARAMETERS quality.js "$1" "$2"
else
    echo "Launching phantomjs with debug. url=$1 quality=$2"
    phantomjs $PARAMETERS quality.js "$1" "$2" "$3"
fi

If you don’t need to load the images you can speed things up with the parameter:

--load-images=false

So finally we were able to use only 285 MB of RAM to handle more than 20 concurrent phantomjs processes.

# Performance of several languages

Notes on 2017-03-26 18:57 CEST – Unix time: 1490547518 :

1. As several of you have noted, it would be much better to use a random value, for example, read from disk. This improvement will be made in the next benchmark. Good suggestion, thanks.
2. Due to my lack of time, updating the article took longer than expected. I was in a long process with google, and now I’m looking for a new job.
3. I note that most people don’t read the article and comment about things that are well indicated in it. Please read before posting; otherwise don’t be surprised if the comment is not published. I have to keep the blog clean of trash.
4. I’ve left out a few comments because they were disrespectful. Mediocrity is present in society, so simply avoid publishing comments that lack basic respect and good education. If a comment brings a point, from the point of view of Engineering, it is always published.

Thanks.

(This article was last updated on 2015-08-26 15:45 CEST – Unix time: 1440596711. See changelog at bottom)

One may think that Assembler is always the fastest, but is that true?

If I write code in Assembler in 32 bit instead of 64 bit, so it can run in 32 and 64 bit, will it be faster than the code that a dynamic compiler is optimizing at execution time to benefit from the architecture of my computer?

What if a future JIT compiler is able to use all the cores to execute a program developed as a single thread?

Are PHP, Python, or Ruby fast compared to C++? Does the Facebook Hip Hop Virtual Machine really speed up PHP execution?

This article shows some results and shares my conclusions.
It is meant as a base to discuss with my colleagues. It is not an end: we are always doing tests, looking for the edge, and looking at the root of things in detail. And often things change from one version to the other. This article does not show an absolute truth, but brings some light into interesting aspects.

It can only show the performance for the particular case used in the test, although generic core instructions have been selected. Many more tests are necessary, and some functions differ in performance. But this article is a necessary starting point for the discussion with my IT-extreme-lover friends and a necessary step for the upcoming tests.

It brings very important data for Managers and Decision Makers, as choosing a language with adequate performance can save millions in hardware (especially when you use the Cloud and pay per hour of use) or thousands of hours in Map Reduce processes.

## Acknowledgements and thanks

Credit to the great Eduard Heredia, for porting my C source code to:

• Go
• Ruby
• Node.js

And for the nice discussions of the results, and on the optimizations and dynamic vs static compilers.

Thanks to Juan Carlos Moreno, CTO of ECManaged Cloud Software, for suggesting adding Python and Ruby to the languages tested when we discussed my initial results.

Thanks to Joel Molins for the interesting discussions on Java performance and garbage collection.

Thanks to Cliff Click for his wonderful article on Java vs C performance that I found when I wanted to confirm some of my results and findings.

I was inspired to do my own comparisons by the benchmarks comparing different frameworks by techempower. It is amazing to see the results of the tests, like how C++ can serialize JSON 1,057,793 times per second and raw PHP only 180,147 (17%).

# For the impatient

I present the results of the tests, and the conclusions, for those who don’t want to read about the details.
For those who want to examine the code, the versions of every compiler, and more in-depth conclusions, this information is provided below.

# Results

This image shows the results of the tests with every language and compiler.

All the tests are invoked from the command line. All the tests use only one core. No tests for the web or frameworks have been made; those are other scenarios worth an article of their own.

More seconds means a worse result. The worst is Bash, which I removed from the graphics, as the bar was crazily high compared to the others.

* As discussed later, my initial Assembler code was outperformed by the C binary because the final Assembler code that the compiler generated was better than mine. After knowing why (it is explained in detail later in this article) I could have reduced it to the same time as the C version, as I understood the improvements made by the compiler.

Table of times:

| Seconds executing | Language | Compiler used | Version |
|---|---|---|---|
| 6 s. | Java | Oracle Java | Java JDK 8 |
| 6 s. | Java | Oracle Java | Java JDK 7 |
| 6 s. | Java | Open JDK | OpenJDK 7 |
| 6 s. | Java | Open JDK | OpenJDK 6 |
| 7 s. | Go | Go | Go v.1.3.1 linux/amd64 |
| 7 s. | Go | Go | Go v.1.3.3 linux/amd64 |
| 8 s. | Lua | LuaJit | Luajit 2.0.2 |
| 10 s. | C++ | g++ | g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2 |
| 10 s. | C | gcc | gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2 |
| 10 s. (* first version was 13 s. and then was optimized) | Assembler | nasm | NASM version 2.10.09 compiled on Dec 29 2013 |
| 10 s. | Nodejs | nodejs | Nodejs v0.12.4 |
| 14 s. | Nodejs | nodejs | Nodejs v0.10.25 |
| 18 s. | Go | Go | go version xgcc (Ubuntu 4.9-20140406-0ubuntu1) 4.9.0 20140405 (experimental) [trunk revision 209157] linux/amd64 |
| 20 s. | Phantomjs | Phantomjs | phantomjs 1.9.0 |
| 21 s. | Phantomjs | Phantomjs | phantomjs 2.0.1-development |
| 38 s. | PHP | Facebook HHVM | HipHop VM 3.4.0-dev (rel) |
| 44 s. | Python | Pypy | Pypy 2.2.1 (Python 2.7.3 (2.2.1+dfsg-1, Nov 28 2013, 05:13:10)) |
| 52 s. | PHP | Facebook HHVM | HipHop VM 3.9.0-dev (rel) |
| 52 s. | PHP | Facebook HHVM | HipHop VM 3.7.3 (rel) |
| 128 s. | PHP | PHP | PHP 7.0.0alpha2 (cli) (built: Jul 3 2015 15:30:23) |
| 278 s. | Lua | Lua | Lua 5.2.3 |
| 294 s. | Gambas3 | Gambas3 | 3.7.0 |
| 316 s. | PHP | PHP | PHP 5.5.9-1ubuntu4.3 (cli) (built: Jul 7 2014 16:36:58) |
| 317 s. | PHP | PHP | PHP 5.6.10 (cli) (built: Jul 3 2015 16:13:11) |
| 323 s. | PHP | PHP | PHP 5.4.42 (cli) (built: Jul 3 2015 16:24:16) |
| 436 s. | Perl | Perl | Perl 5.18.2 |
| 523 s. | Ruby | Ruby | ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux] |
| 694 s. | Python | Python | Python 2.7.6 |
| 807 s. | Python | Python | Python 3.4.0 |
| 47630 s. | Bash | | GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu) |

## Conclusions and Lessons Learnt

1. There are languages that will execute faster than a native Assembler program, thanks to the JIT Compiler and to the ability to optimize the program at runtime for the architecture of the computer running the program (even if there is a small initial penalty of around two seconds from the JIT when running the program, as it is being analysed, it is more than worth it in our example)
2. Modern Java can be really fast in certain operations; it is the fastest in this test, thanks to the use of JIT Compiler technology and a very good implementation of it
3. Oracle’s Java and OpenJDK show no difference in performance in this test
4. Script languages really suck in performance. Python, Perl and Ruby are terribly slow. That costs a lot of money if you Scale, as you need more Servers in the Cloud
5. JIT compilers for Python (Pypy) and for Lua (LuaJit) make them really fly. The difference is truly amazing
6. The same language can offer very different performance from one version to another; for example, the go that comes from Ubuntu packages is slower than the latest version from the official page, and Python 3.4 is much slower than Python 2.7 in this test
7. Bash is the worst language for doing the loop and inc operations in the test, taking more than 13 hours to complete it
8. From the command line PHP is much faster than Python, Perl and Ruby
9. Facebook Hip Hop Virtual Machine (HHVM) improves PHP’s speed a lot
10. It looks like the future of compilers is JIT.
11. Assembler is not always the fastest when executed. If you write a generic Assembler program with the purpose of being able to run on many platforms, you’ll not use the most powerful instructions specific to an architecture, and so a JIT compiler can outperform your code. A static compiler can also outperform your code with very clever optimizations. People that write the compilers are really good. Unless you’re really brilliant with Assembler, C/C++ code will probably beat the performance of your code. Even if you’re fantastic with Assembler it could happen that a JIT compiler notices that some executions can be avoided (like code not really used) and brings magnificent runtime optimizations. (For example a near JMP is much less costly than a far JMP Assembler instruction. Avoiding dead code could result in a far JMP being executed as a near JMP, saving many cycles per loop)
12. Optimizations really need people dedicated just to optimizations and to checking the speed of the newly added code on the running platforms
13. Node.js was a big surprise. It really performed well. It is promising. The new version performs even faster
14. go is promising. Similar to C, but performance is much better thanks to deciding at runtime if the architecture of the computer is 32 or 64 bit, a very quick compilation at launch time, and it compiling to very good assembler (that uses the 64 bit instructions efficiently, for example)
15. Gambas 3 performed surprisingly fast. Better than PHP
16. You should be careful when using the C/C++ optimization -O3 (and -O2), as sometimes it doesn’t work well (bugs) or as you may expect, for example by completely removing blocks of code if the compiler believes that they have no utility (like loops)
17. Perl performance really changes from using one for style to another. (See Perl section below)
18. Modern CPUs change the frequency to save energy. To run the tests it is strictly recommended to use a dedicated machine, disabling the CPU governor and setting a fixed frequency for all the cores, booting with a text-only live system, without background services, not mounting disks, no swap, no network

(Please read the article completely before commenting)

# Explanations in details

Obviously a statically compiled language binary should be faster than an interpreted language. C or C++ are much faster than PHP. And good machine code is much faster, of course.

But there are also other languages that are not compiled to a binary and have really fast execution.

For example, good Web Java Application Servers generate compiled code after the first request. Then it really flies.

C# for the web, or .NET in general, does the same: the IIS Application Server creates a native DLL after the first call to the script. And after this, as it is compiled, the page is really fast.

With statically linked C you could generate binary code for a particular processor, but then it won’t work on other processors. So normally we write code that will work on all the processors, at the cost of not using all the performance of the different CPUs, or we use another approach and provide a set of different binaries for the different architectures. A set of directives doing one thing or another depending on the platform detected can also be written, but it is a hard, long and tedious job with a lot of special cases to treat.

There is another approach, dynamic linking, where certain things are decided at run time and optimized for the computer that is running the program by the JIT (Just-in-time) Compiler.

Java, with JIT, is able to offer optimizations for the CPU that is running the code, with awesome results. And it is able to optimize loops and mathematical operations and outperform C/C++ and Assembler code in some cases (like in our tests) or to be really near in others.
It sounds crazy, but nowadays the JIT is able to know the result of blocks of code that are executed several times and to optimize them with several strategies, speeding things up incredibly and outperforming code written in Assembler. Demonstrations with code are provided later.

A new generation has grown up knowing only how to program for the Web. Many of them never saw Assembler, and never or barely programmed in C++.

None of my Senior friends would assert that one technology is better than another without doing a lot of investigation first. We are serious. There is so much to take into account, so much to learn always, that one has to be sure of not missing things before affirming such things categorically. If you want to be taken seriously, you have to take many things into account.

# Environment for the tests

## Hardware and OS

Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz with 32 GB RAM and SSD Disk.

Ubuntu Desktop 14.04 LTS 64 bit

## Software base and compilers

### PHP versions

Shipped with my Ubuntu distribution:

    php -v
    PHP 5.5.9-1ubuntu4.3 (cli) (built: Jul 7 2014 16:36:58)
    Copyright (c) 1997-2014 The PHP Group
    Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies

Compiled from sources:

    PHP 5.6.10 (cli) (built: Jul 3 2015 16:13:11)
    Copyright (c) 1997-2015 The PHP Group
    Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies

    PHP 5.4.42 (cli) (built: Jul 3 2015 16:24:16)
    Copyright (c) 1997-2014 The PHP Group
    Zend Engine v2.4.0, Copyright (c) 1998-2014 Zend Technologies

### Java 8 version

    java -showversion
    java version "1.8.0_05"
    Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

### C++ version

    g++ -v
    Using built-in specs.
    COLLECT_GCC=g++
    COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
    Target: x86_64-linux-gnu
    Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.2-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
    Thread model: posix
    gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

### Gambas 3

    gbr3 --version
    3.7.0

### Go (downloaded from google)

    go version
    go version go1.3.1 linux/amd64

### Go (Ubuntu packages)

    go version
    go version xgcc (Ubuntu 4.9-20140406-0ubuntu1) 4.9.0 20140405 (experimental) [trunk revision 209157] linux/amd64

### Nasm

    nasm -v
    NASM version 2.10.09 compiled on Dec 29 2013

### Lua

    lua -v
    Lua 5.2.3 Copyright (C) 1994-2013 Lua.org, PUC-Rio

### Luajit

    luajit -v
    LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/

### Nodejs

Installed with apt-get install nodejs:

    nodejs --version
    v0.10.25

Installed by compiling the sources:

    node --version
    v0.12.4

### Phantomjs

Installed with apt-get install phantomjs:

    phantomjs --version
    1.9.0

Compiled from sources:

    /path/phantomjs --version
    2.0.1-development

### Python 2.7

    python --version
    Python 2.7.6

### Python 3

    python3 --version
    Python 3.4.0

### Perl

    perl -version
    This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
    (with 41 registered patches, see perl -V for more detail)

### Bash

    bash --version
    GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software; you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

## Test: Time required for nested loops

This is the first sample. It is an easy one.

The main idea is to generate a set of nested loops, with a simple counter inside.

When the counter reaches 51 it is set to 0.

This is done for:

1. Preventing overflow of the integer if it grows without control
2. Preventing the compiler from optimizing the code away (a clever compiler like Java’s, or gcc with the -O3 optimization flag, will notice if the var is never used, see that the whole block is unnecessary and simply never execute it)

Doing only loops, the increment of a variable and an if gives us basic structures of the language that are easily translated to Assembler. We also want to avoid System calls.

This is the base for the metrics of my Cloud Analysis of Performance project, cmips.net.

Here I present the times for each language; later I analyze the details and the code.

Take into account that this code only executes in one thread / core.

## C++

C++ result: it takes 10 seconds.
Code for the C++:

    /*
     * File:   main.cpp
     * Author: Carles Mateo
     *
     * Created on August 27, 2014, 1:53 PM
     */

    #include <cstdlib>
    #include <iostream>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <ctime>

    using namespace std;

    typedef unsigned long long timestamp_t;

    static timestamp_t get_timestamp()
    {
        struct timeval now;
        gettimeofday (&now, NULL);
        return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
    }

    int main(int argc, char** argv) {

        timestamp_t t0 = get_timestamp();

        // current date/time based on current system
        time_t now = time(0);

        // convert now to string form
        char* dt_now = ctime(&now);

        printf("Starting at %s\n", dt_now);

        int i_loop1 = 0;
        int i_loop2 = 0;
        int i_loop3 = 0;
        int i_counter = 0;

        for (i_loop1 = 0; i_loop1 < 10; i_loop1++) {
            for (i_loop2 = 0; i_loop2 < 32000; i_loop2++) {
                for (i_loop3 = 0; i_loop3 < 32000; i_loop3++) {
                    i_counter++;

                    if (i_counter > 50) {
                        i_counter = 0;
                    }
                }
                // If you want to test how the compiler optimizes that, remove the comment
                //i_counter = 0;
            }
        }

        // This is another trick to avoid compiler's optimization. To use the var somewhere
        printf("Counter: %i\n", i_counter);

        timestamp_t t1 = get_timestamp();
        double secs = (t1 - t0) / 1000000.0L;
        time_t now_end = time(0);

        // convert now to string form
        char* dt_now_end = ctime(&now_end);

        printf("End time: %s\n", dt_now_end);

        return 0;
    }

You can try removing the part of the code that makes the checks:

    /* if (i_counter > 50) {
        i_counter = 0;
    }*/

And the use of the var, later:

    //printf("Counter: %i\n", i_counter);

Note: also add an i_counter = 0; at the beginning of the loop to make sure that the counter doesn’t overflow.

Then the C or C++ compiler will notice that this result is never used, and so it will eliminate the code from the program, resulting in an execution time of 0.0 seconds.
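As a side check of what these nested loops compute, here is a small JavaScript sketch. It is an illustration only, not the Node.js port used in the benchmark, and the function names are made up for the example. Because the counter goes back to 0 when it passes 50, after n iterations its value is n modulo 51, so the value printed at the end of the real test can be predicted without running it.

```javascript
// Same loop structure as the test, with configurable sizes so that it can
// be tried quickly with small numbers.
function nestedLoops(outer, middle, inner) {
    var counter = 0;
    for (var i = 0; i < outer; i++) {
        for (var j = 0; j < middle; j++) {
            for (var k = 0; k < inner; k++) {
                counter++;
                if (counter > 50) {
                    counter = 0;
                }
            }
        }
    }
    return counter;
}

// The counter cycles through 0..50, so after n iterations it equals n % 51.
function expectedCounter(outer, middle, inner) {
    return (outer * middle * inner) % 51;
}

var total = 10 * 32000 * 32000;                  // iterations in the real test
console.log(total);                              // 10240000000
console.log(total > 2147483647);                 // true: too big for a signed 32 bit int
console.log(nestedLoops(2, 100, 100));           // 8
console.log(expectedCounter(2, 100, 100));       // 8
console.log(expectedCounter(10, 32000, 32000));  // 37
```

The second line of output shows why the reset matters: without it, a 32 bit counter would overflow long before the 10,240,000,000 iterations complete.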
## Java

The code in Java:

    package cpu;

    /**
     *
     * @author carles.mateo
     */
    public class Cpu {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            int i_loop1 = 0;
            //int i_loop_main = 0;
            int i_loop2 = 0;
            int i_loop3 = 0;
            int i_counter = 0;

            String s_version = System.getProperty("java.version");

            System.out.println("Java Version: " + s_version);

            System.out.println("Starting cpu.java...");

            for (i_loop1 = 0; i_loop1 < 10; i_loop1++) {
                for (i_loop2 = 0; i_loop2 < 32000; i_loop2++) {
                    for (i_loop3 = 0; i_loop3 < 32000; i_loop3++) {
                        i_counter++;

                        if (i_counter > 50) {
                            i_counter = 0;
                        }
                    }
                }
            }

            System.out.println(i_counter);
            System.out.println("End");
        }
    }

It is really interesting how Java, with JIT, outperforms C++ and Assembler.

It takes only 6 seconds.

## Go

The case of Go is interesting because I saw a big difference between the go shipped with Ubuntu and the go I downloaded from http://golang.org/dl/. I downloaded 1.3.1 and 1.3.3, which offer the same performance: 7 seconds.

Source code for nested_loops.go

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        fmt.Printf("Starting: %s", time.Now().Local())
        var i_counter = 0;
        for i_loop1 := 0; i_loop1 < 10; i_loop1++ {
            for i_loop2 := 0; i_loop2 < 32000; i_loop2++ {
                for i_loop3 := 0; i_loop3 < 32000; i_loop3++ {
                    i_counter++;
                    if i_counter > 50 {
                        i_counter = 0;
                    }
                }
            }
        }

        fmt.Printf("\nCounter: %#v", i_counter)
        fmt.Printf("\nEnd: %s\n", time.Now().Local())
    }

## Assembler

Here is the Assembler code for Linux, with SASM, that I created initially (the optimized version is below).
    %include "io.inc"

    section .text
    global CMAIN
    CMAIN:
        ;mov rbp, rsp; for correct debugging
        ; Set to 0, the faster way
        xor esi, esi

    DO_LOOP1:
        mov ecx, 10
    LOOP1:
        mov ebx, ecx
        jmp DO_LOOP2
    LOOP1_CONTINUE:
        mov ecx, ebx
        loop LOOP1
        jmp QUIT

    DO_LOOP2:
        mov ecx, 32000
    LOOP2:
        mov eax, ecx
        ;call DO_LOOP3
        jmp DO_LOOP3
    LOOP2_CONTINUE:
        mov ecx, eax
        loop LOOP2
        jmp LOOP1_CONTINUE

    DO_LOOP3:
        ; Set to 32000 loops
        MOV ecx, 32000
    LOOP3:
        inc esi
        cmp esi, 50
        jg COUNTER_TO_0
    LOOP3_CONTINUE:
        loop LOOP3
        ;ret
        jmp LOOP2_CONTINUE

    COUNTER_TO_0:
        ; Set to 0
        xor esi, esi
        jmp LOOP3_CONTINUE

        ; jmp QUIT

    QUIT:
        xor eax, eax
        ret

It took 13 seconds to complete.

One interesting explanation of why binary C or C++ code is faster than my Assembler is that the C compiler generates better Assembler/binary code at the end. For example, the use of JMP is expensive in terms of CPU cycles, and the compiler can apply other optimizations and tricks that I’m not aware of, like using faster registers, while in my code I use ebx, ecx, esi, etc… (for example, imagine that using cx is cheaper than using ecx or rcx and I’m not aware of it, but the guys that created the GNU C compiler are)

To be sure of what was going on, in LOOP3 I replaced the JG and the JMP of the code with groups of 50 INC ESI instructions, one after the other, and the time was reduced to 1 second.

(In C it was also reduced, even a bit more, when doing the same)

To see the translation of the C code into Assembler when compiled, you can do:

    objdump --disassemble nested_loops

Look for the section main and you’ll get something like:

    0000000000400470 <main>:
      400470:    bf 0a 00 00 00           mov    $0xa,%edi
      400475:    31 c9                    xor    %ecx,%ecx
      400477:    be 00 7d 00 00           mov    $0x7d00,%esi
      40047c:    0f 1f 40 00              nopl   0x0(%rax)
      400480:    b8 00 7d 00 00           mov    $0x7d00,%eax
      400485:    0f 1f 00                 nopl   (%rax)
      400488:    83 c2 01                 add    $0x1,%edx
      40048b:    83 fa 33                 cmp    $0x33,%edx
      40048e:    0f 4d d1                 cmovge %ecx,%edx
      400491:    83 e8 01                 sub    $0x1,%eax
      400494:    75 f2                    jne    400488 <main+0x18>
      400496:    83 ee 01                 sub    $0x1,%esi
      400499:    75 e5                    jne    400480 <main+0x10>
      40049b:    83 ef 01                 sub    $0x1,%edi
      40049e:    75 d7                    jne    400477 <main+0x7>
      4004a0:    48 83 ec 08              sub    $0x8,%rsp
      4004a4:    be 34 06 40 00           mov    $0x400634,%esi
      4004a9:    bf 01 00 00 00           mov    $0x1,%edi
      4004ae:    31 c0                    xor    %eax,%eax
      4004b0:    e8 ab ff ff ff           callq  400460 <__printf_chk@plt>
      4004b5:    31 c0                    xor    %eax,%eax
      4004b7:    48 83 c4 08              add    $0x8,%rsp
      4004bb:    c3                       retq

Note: this is in the AT&T syntax and not in the Intel one. That means that add $0x1,%edx is adding 1 to the EDX register (source, destination).

As you can see, the C compiler has created a very different Assembler version from the one I created.
For example at 400470 it uses the EDI register to store 10, to control the outer loop.
It uses ESI to store 32000 (hexadecimal 0x7D00), for the second loop.
And EAX for the inner loop, at 400480.
It uses EDX for the counter, and compares it to 51 (hexadecimal 0x33) at 40048B, because i_counter > 50 is the same as i_counter >= 51.
At 40048E it uses CMOVGE (Move if Greater or Equal), an instruction that was introduced with the P6 family processors, to move the contents of ECX to EDX if EDX was (in the CMP) greater than or equal to 51. As a XOR ECX, ECX was performed at 400475, ECX contains 0.
And it cleverly uses SUB and JNE (JNE means Jump if Not Equal; it jumps if ZF = 0 and is equivalent to JNZ, Jump if Not Zero).
A jump uses between 4 and 16 clocks, and the jump must be within -128 to +127 bytes of the next instruction. As you see, Jump is very costly.

It looks like the biggest improvement comes from the use of CMOVGE, as it saves the two jumps that my original Assembler code was performing.
Those two jumps, multiplied by 32000 x 32000 x 10 times, are a lot of CPU clocks.

So, with this in mind, as this Assembler code takes 10 seconds, I updated the graph from 13 seconds to 10 seconds.
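To put a number on "32000 x 32000 x 10", the amount of work and the value the counter should end with can be worked out directly. The following is a small Python sketch; the function names are illustrative and not part of the original benchmark. The literal simulation is only meant for small sizes, since the point of the article is that the full loop is slow.

```python
# Closed-form check of the benchmark's work and of its final counter value.
OUTER, MIDDLE, INNER = 10, 32000, 32000
LIMIT = 50  # the counter is reset to 0 once it exceeds this value


def final_counter(total_iterations, limit=LIMIT):
    """Counter value after total_iterations increments with reset.

    The counter walks 1, 2, ..., limit and the next increment
    (limit + 1) resets it to 0, so a full cycle is limit + 1 steps.
    """
    return total_iterations % (limit + 1)


def simulate(outer, middle, inner, limit=LIMIT):
    """Run the same nested loops literally (only sensible for small sizes)."""
    counter = 0
    for _ in range(outer):
        for _ in range(middle):
            for _ in range(inner):
                counter += 1
                if counter > limit:
                    counter = 0
    return counter


total = OUTER * MIDDLE * INNER
print("Inner-loop iterations:", total)
print("Expected final counter:", final_counter(total))
```

Every implementation of the benchmark should print the same final counter, which is a cheap way to confirm that a port to another language does the same work.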

### Lua

This is the initial code:

local i_counter = 0

local i_time_start = os.clock()

for i_loop1=0,9 do
    for i_loop2=0,31999 do
        for i_loop3=0,31999 do
            i_counter = i_counter + 1
            if i_counter > 50 then
                i_counter = 0
            end
        end
    end
end

local i_time_end = os.clock()
print(string.format("Counter: %i\n", i_counter))
print(string.format("Total seconds: %.2f\n", i_time_end - i_time_start))

In the case of Lua theoretically one could take great advantage of the use of local inside a loop, so I tried the benchmark with modifications to the loop:

for i_loop1=0,9 do
    for i_loop2=0,31999 do
        local l_i_counter = i_counter
        for i_loop3=0,31999 do
            l_i_counter = l_i_counter + 1
            if l_i_counter > 50 then
                l_i_counter = 0
            end
        end
        i_counter = l_i_counter
    end
end

I ran it with LuaJit and saw no improvements on the performance.

### Node.js

var s_date_time = new Date();
console.log('Starting: ' + s_date_time);

var i_counter = 0;

for (var $i_loop1 = 0; $i_loop1 < 10; $i_loop1++) {
    for (var $i_loop2 = 0; $i_loop2 < 32000; $i_loop2++) {
        for (var $i_loop3 = 0; $i_loop3 < 32000; $i_loop3++) {
            i_counter++;
            if (i_counter > 50) {
                i_counter = 0;
            }
        }
    }
}

var s_date_time_end = new Date();
console.log('Counter: ' + i_counter + '\n');
console.log('End: ' + s_date_time_end + '\n');

Execute with:

nodejs nested_loops.js

## Phantomjs

The same code as nodejs, adding at the end:

phantom.exit(0);

In the case of Phantom it performs the same in both versions, 1.9.0 and 2.0.1-development compiled from sources.

### PHP

The interesting thing about PHP is that you can write your own extensions in C, so you can have the ease of use of PHP, create functions in C that bring really fast performance, and invoke them from PHP.

<?php
$s_date_time = date('Y-m-d H:i:s');

echo 'Starting: '.$s_date_time."\n";

$i_counter = 0;

for ($i_loop1 = 0; $i_loop1 < 10; $i_loop1++) {
    for ($i_loop2 = 0; $i_loop2 < 32000; $i_loop2++) {
        for ($i_loop3 = 0; $i_loop3 < 32000; $i_loop3++) {
            $i_counter++;
            if ($i_counter > 50) {
                $i_counter = 0;
            }
        }
    }
}

$s_date_time_end = date('Y-m-d H:i:s');

echo 'End: '.$s_date_time_end."\n";

Facebook’s Hip Hop Virtual Machine is a very powerful alternative, that is JIT powered.

git clone https://github.com/facebook/hhvm.git
cd hhvm
rm -r third-party
git submodule update --init --recursive
./configure
make

Or just grab precompiled packages from https://github.com/facebook/hhvm/wiki/Prebuilt%20Packages%20for%20HHVM

### Python

from datetime import datetime
import time

print ("Starting at: " + str(datetime.now()))
f_unixtime_start = time.time()

i_counter = 0

# From 0 to 31999
for i_loop1 in range(0, 10):
    for i_loop2 in range(0, 32000):
        for i_loop3 in range(0, 32000):
            i_counter += 1
            if ( i_counter > 50 ) :
                i_counter = 0

print ("Ending at: " + str(datetime.now()))
f_unixtime_end = time.time()

# Whole seconds elapsed (int() works both in Python 2 and Python 3)
i_seconds = int(f_unixtime_end - f_unixtime_start)
s_seconds = str(i_seconds)

print ("Total seconds:" + s_seconds)

### Ruby

#!/usr/bin/ruby -w

time1 = Time.new

puts "Starting : " + time1.inspect

i_counter = 0;

for i_loop1 in 0..9
    for i_loop2 in 0..31999
        for i_loop3 in 0..31999
            i_counter = i_counter + 1
            if i_counter > 50
                i_counter = 0
            end
        end
    end
end

time1 = Time.new

puts "End : " + time1.inspect

### Perl

The case of Perl was very interesting one.

This is the current code:

#!/usr/bin/env perl

print "$s_datetime Starting calculations...\n";

$i_counter=0;

$i_unixtime_start=time();

for my $i_loop1 (0 .. 9) {
    for my $i_loop2 (0 .. 31999) {
        for my $i_loop3 (0 .. 31999) {
            $i_counter++;
            if ($i_counter > 50) {
                $i_counter = 0;
            }
        }
    }
}

$i_unixtime_end=time();

$i_seconds=$i_unixtime_end-$i_unixtime_start;

print "Counter: $i_counter\n";
print "Total seconds: $i_seconds";

But before that I created a slightly different one, with the for loops in the C style:

#!/usr/bin/env perl

$i_counter=0;

$i_unixtime_start=time();

for (my $i_loop1=0; $i_loop1 < 10; $i_loop1++) {
    for (my $i_loop2=0; $i_loop2 < 32000; $i_loop2++) {
        for (my $i_loop3=0; $i_loop3 < 32000; $i_loop3++) {
            $i_counter++;
            if ($i_counter > 50) {
                $i_counter = 0;
            }
        }
    }
}

$i_unixtime_end=time();

$i_seconds=$i_unixtime_end-$i_unixtime_start;

print "Total seconds: $i_seconds";

I repeated this test, with the same version of Perl, because of the comment of a reader (thanks mpapec) who said:

In this particular case perl style loops are about 45% faster than original code (v5.20)

And indeed, and surprisingly, the time went from 796 seconds to 436 seconds.

So the graphics are updated to reflect the result of 436 seconds.

### Bash

#!/bin/bash
echo "Bash version ${BASH_VERSION}..."
date

let "s_time_start=$(date +%s)"
let "i_counter=0"

for i_loop1 in {0..9}
do
    echo "."
    date
    for i_loop2 in {0..31999}
    do
        for i_loop3 in {0..31999}
        do
            ((i_counter++))
            # -gt compares numbers; > inside [[ ]] would compare strings
            if [[ $i_counter -gt 50 ]]
            then
                let "i_counter=0"
            fi
        done

        #((var+=1))
        #((var=var+1))
        #((var++))
        #let "var=var+1"
        #let "var+=1"
        #let "var++"
    done
done

let "s_time_end=$(date +%s)"

let "s_seconds = s_time_end - s_time_start"
echo "Total seconds: $s_seconds"

# Just in case it overflows
date

### Gambas 3

Gambas is a language and an IDE to create GUI applications for Linux.
It is very similar to Visual Basic, but better, and it is not a clone.

I created a command line application and it performed better than PHP. An excellent job has been done with the compiler.

Note: in the screenshot the first test ran for a few seconds more than the second. This was because I deliberately put the machine under some load and I/O during the tests. The valid value for the test, confirmed with more iterations, is the second one, done under the same conditions (no load) as the previous tests.

' Gambas module file MMain.module

Public Sub Main()

  ' @author Carles Mateo http://blog.carlesmateo.com

  Dim i_loop1 As Integer
  Dim i_loop2 As Integer
  Dim i_loop3 As Integer
  Dim i_counter As Integer
  Dim s_version As String

  i_loop1 = 0
  i_loop2 = 0
  i_loop3 = 0
  i_counter = 0

  s_version = System.Version

  Print "Performance Test by Carles Mateo blog.carlesmateo.com"
  Print "Gambas Version: " & s_version

  Print "Starting..." & Now()

  For i_loop1 = 0 To 9
    For i_loop2 = 0 To 31999
      For i_loop3 = 0 To 31999
        i_counter = i_counter + 1
        If (i_counter > 50) Then
          i_counter = 0
        Endif
      Next
    Next
  Next

  Print i_counter
  Print "End " & Now()

End

## Changelog

2015-08-26 15:45

Thanks to the comment of a reader (thanks Daniel) pointing out a mistake. The phrase I mentioned was in the conclusions, point 14, and was inaccurate. The original phrase said “go is promising. Similar to C, but performance is much better thanks to the use of JIT“. The allusion to JIT is incorrect and has been replaced by this: “thanks to deciding at runtime if the architecture of the computer is 32 or 64 bit, a very quick compilation at launch time, and it compiling to very good assembler (that uses the 64 bit instructions efficiently, for example)”

2015-07-17 17:46

Benchmarked Facebook HHVM 3.9 (dev., the release date is August 3 2015) and HHVM 3.7.3, they take 52 seconds.
Re-benchmarked Facebook HHVM 3.4: before it took 72 seconds, now it takes 38 seconds. I checked the screen captures from 2014 to rule out a human error. It looks like a turbo frequency issue on the test computer, with the CPU governor making it work below the optimal speed, or a CPU-hungry/IO process that was triggered during the tests and that I didn’t detect. I’m thinking about forcing a fixed CPU speed for all the cores for the tests, like 2.4 GHz, and booting a text-only live system without disk access and network to prevent Ubuntu launching processes in the background.

2015-07-05 13:16

Added performance of Phantomjs 1.9.0 installed via apt-get install phantomjs in Ubuntu, and Phantomjs 2.0.1-development.
Added performance of nodejs 0.12.04 (compiled).
Added bash to the graphic. It has such bad performance that I had to edit the graphic to make it fit (pink color) in order to prevent breaking the scale.

2015-07-03 18:32

Added benchmarks for PHP 7 alpha 2, PHP 5.6.10 and PHP 5.4.42.

2015-07-03 15:13

Thanks to the contribution of a reader (thanks mpapec!) I tried with Perl for style, resulting in passing from 796 seconds to 436 seconds. (I used the same Perl version: Perl 5.18.2) Updated test value for Perl.
Added new graphics showing the updated value.
Thanks to the contribution of a reader (thanks junk0xc0de!) added some additional warnings and explanations about the dangers of using -O3 (and -O2) in C/C++.
Updated the Lua code, to print i_counter and do the if i_counter > 50. This makes it take a bit longer, a few tenths of a second, passing from 7.8 to 8.2 seconds.
Updated graphics.

# Improving performance in PHP

This year I was invited to speak at the PHP Conference in Berlin 2014. It was really nice, but I had to decline as I was working hard in a Start up, and I didn’t have the time required to prepare the nice conference I wanted and that people deserve. However, now that I have time, I decided to write an article about what I would have spoken about at the conference.
I will cover improving performance in a single server, and scaling out to a multi-Server architecture, focusing on the needs of growing and Start up projects. Many of those techniques can be used to improve performance with other languages, not just with PHP.

Many of my friends are very good at Developing, but know nothing about Architecture and Scaling. I hope this brings the two worlds, Development and Operations, together into a DevOps bridge.

# Improving performance on a single server

## Hosting

Choose a good hosting. And if you can afford it, choose a dedicated server.

Shared hostings are really bad. Some of them kill your http and mysql instances if you reach a certain (really low) CPU use, while others share the same hardware between 100+ users, serving your pages sloooooow. Others cap the number of queries that your MySql will handle per hour at such a ridiculously low amount that even Drupal or WordPress are unable to complete a request in development.

Other ISPs (Internet Service Providers) have poor Internet bandwidth, and so your web will load slowly for users.

Some companies invest hundreds of thousands in developing a web, and then spend 20 € a year on the hosting. Less than the cost of a dinner.

You can get a decent dedicated server from 50 to 99 €/month and you will celebrate this decision every day.

Take into account that virtualization wastes between 20% and 30% of the CPU power. And if there are several virtual machines the loss will be bigger, because you lose the benefits of the CPU caching for optimizing parallel instructions execution and prediction. Also, if the hypervisor host allows allocating more RAM than is physically available and at some point it swaps, the performance of all the VMs will be much worse. If you have a VM and it swaps, in most providers the swap goes over the network, so there is an additional bottleneck and performance penalty.
To compare the performance of dedicated servers and instances from different Cloud Providers you can take a look at my project cmips.net

## Improve your Server

If your Server has little RAM, add more. And if your project is running slow and you can afford a better Server, do it.

Using SSD disks will incredibly improve the performance of I/O operations and of swap operations. (But please, do backups and keep them in another place.)

If you use a CMS like ezpublish with http_cache enabled you will probably prefer a Server with faster cores, rather than a Server with one or more CPUs with plenty of cores that are slower and take longer to render the page to the http cache.

That may seem obvious, but often companies invest 320 hours in optimizing the code by 2%, at a cost of let’s say 50 €/h * 320 hours = 16.000 €, while hiring a better Server would have brought between a 20% and a 1000% improvement at a cost of only 50 €/month more, or at the cost of 100 € for increasing the RAM.

The point here is that the hardware is cheap, while the time of the Engineers is expensive. And good Engineers are really hard to find. And you probably, as a CEO or PO, prefer to use the talent to guarantee a nice time to market for your project, or to add more features, rather than wasting this time in refactoring.

Even with the most optimal code in the universe, if your project is successful at a certain point you’ll have to scale, that is, add more Servers. Saving a Server now at the cost of slowing the business does not make any sense.

## Upgrade your PHP version

Many projects still use PHP 5.3 and 5.4. The latest versions of PHP bring more and more performance. If you use old versions of PHP you can have a Quick Win by just upgrading to the latest PHP version.

## Use OpCache (or other cache accelerator)

OpCache is shipped with PHP 5.5 by default now, so it is the recommended option. It is meant to replace APC.
To activate OpCache edit php.ini and add:

Linux/Unix:

zend_extension=/path/to/opcache.so

Windows:

zend_extension=C:\path\to\php_opcache.dll

It will greatly improve your PHP performance.

Ensure that OpCache in Production has the optimal config for Production, which will be different from the Development environment.

Note: If you plan to use it with XDebug in Development environments, load OpCache before XDebug.

## Disable Profiling and xdebug in Production

In Production disable the profiling and xdebug, and if you use a Framework ensure the Development/Debug features are disabled in Production.

## Ensure your logs are not full of warnings

Check that Production logs are not full of warnings.
I’ve seen systems where 200 warnings were written to the logs every second, the same ones all the time, and that was obviously slowing down the system.

Typical warnings like this can be easily fixed:

Message: date() [function.date]: It is not safe to rely on the system’s timezone settings. You are required to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods and you are still getting this warning, you most likely misspelled the timezone identifier. We selected ‘UTC’ for ‘8.0/no DST’ instead

## Profile in Development

To detect where your slow code is, profile it in Development to see where most of the CPU/time is spent.

Check the slow-queries if you use MySql.

## Cache html to disk

Imagine you have a sort of craigslist and you are displaying all the categories, and the number of new messages, in the landing page. To do that you are performing many queries to the database, SELECT COUNTs, etc… every time a user visits your page. That will certainly overload your database even with few concurrent visitors.

Instead of querying the Database all the time, cache the generated page for a while.

This can be achieved by checking if the cache html file exists, checking the TTL, and generating a new page if needed.
A simple sample would be:

<?php

// Cache pages for 5 minutes
$i_cache_TTL = 300;

$b_generate_cache = false;

$s_cache_file = '/tmp/index.cache.html';

if (file_exists($s_cache_file)) {
    // Get the last modification date
    $i_file_timestamp = filemtime($s_cache_file);
    $i_time_now = microtime(true);
    if ($i_time_now > ($i_file_timestamp + $i_cache_TTL)) {
        $b_generate_cache = true;
    } else {
        // Up to date, get from the disk
        $o_fh = fopen($s_cache_file, "rb");
        $s_html = stream_get_contents($o_fh);
        fclose($o_fh);

        // If the file was empty something went wrong (disk full?), so don't use it
        if (strlen($s_html) == 0) {
            $b_generate_cache = true;
        } else {
            // Print the page and exit
            echo $s_html;
            exit();
        }
    }
} else {
    $b_generate_cache = true;
}

ob_start();

// Render your page normally here
// ....

$s_html = ob_get_clean();

if ($b_generate_cache == true) {
    // Create the file with fresh contents
    $o_fp = fopen($s_cache_file, 'w');
    if (fwrite($o_fp, $s_html) === false) {
        // Error. Impossible to write to disk
        // throw new Exception('CacheCantWrite');
    }
    fclose($o_fp);
}

// Send the page to the browser
echo $s_html;

This sample is simple, and works for many cases, but presents problems.

Imagine for example that the page takes 5 seconds to be generated with a single request, and you have high traffic on that page, let’s say 500 requests per second.

What will happen when the cache expires is that the first user will trigger the cache generation, and the second, and the third… so all the requests arriving during those 5 seconds (500 requests per second * 5 seconds = 2,500 requests) will be hitting the database to generate the cache. But if creating the page for one request takes 5 seconds, doing this 2,500 times at once will not take 5 seconds… so your process will enter a vicious state where the first queries have not ended after minutes, and more and more queries are being added to the queue until:

a) Apache runs out of children/processes, per configuration
b) Mysql runs out of connections, per configuration
c) Linux runs out of memory, and processes crash/are killed

Not to mention the users or the API clients, waiting indefinitely for the http request to complete, and other processes reading a partial file (size bigger than 0 but incomplete).

Different strategies can be used to prevent that, like:

a) using semaphores to lock access to the cache generation (only one process at a time)
b) using a .lock file to indicate that the file is being generated, so the next requests are served from the cache until the cache generation process ends the task, also writing to a buffer like acachefile.buffer (to prevent incomplete content being read) and finally, when it is complete, renaming it to the final name and removing the .lock
c) using memcached, or similar, to keep an index in memory of what pages are being generated now and, why not, keeping the cached files there instead of in a filesystem
d) using crons to generate the cache files, so they run hourly and you ensure only one process generates the cache files

If you use crons, a cheap way to generate the .html content is that the cron curls/wgets your webpage.
I don’t recommend this as it has some problems: if that web request fails for any reason, you’ll have cached an error instead of content.

I prefer preparing my projects to be able to render the content whether invoked from HTTP/S or from the command line. But if you use curl because it is cheap and easy and time to market is important for your project, then be sure that your backend code writes a Status OK in the HTML that the cron can check to ensure that the content has been properly generated. (Some crons only check for the http status, like 200, but if your database or an xml gateway you use fails you will likely get a 200 and won’t detect that you’re caching pages with “error I can’t connect to the database” instead.)

Many Frameworks have their own cache implementation that prevents the corruption that could come from several processes writing to the same file at the same time, or from PHP dying in the middle of the render.

You can see a more complex MVC implementation, with Views, from my Framework Catalonia here:

By serving .html files instead of executing PHP with logic and performing queries to the database you will be able to serve hundreds of thousands of requests per day with a single machine, and really fast (that’s important for SEO also). I’ve done this in several Start ups with wonderful results, and my Framework Catalonia also incorporates this functionality in a way that is very easy to use.

Note: This is only one of the techniques to reduce the load on the Database Servers. Many more come later.

## Cache languages to disk

If you have an application that is multi-language, or if the place where Marketing edits the Strings (sections, pages, campaigns..) is the Database, there is no need to query it all the time.

Simply provide a tool to “generate language files”.

Your language files can be Javascript files loaded by the page, or can be generated PHP files.
For example, the file common_footer_en.php could be generated reading from the Database and look like this:

<?php
/* Autogenerated English translations file common_footer_en.php on 2014-08-10 02:22 from the database */
$st_translations['seconds']                = 'seconds';
$st_translations['Time']                   = 'Time';
$st_translations['Vars used']              = 'Vars used in these templates';
$st_translations['Total Var replacements'] = 'Total replaced';
$st_translations['Exec time']              = 'Execution time';
$st_translations['Cached controller']      = 'Cached controller';

So the PHP file is going to be generated when someone at your organization updates the languages, and your code includes it normally, like any other PHP file.

## Use the Crons

You can set cron jobs to do many operations, like map reduce, counting in the database or effectively deleting the data that the user selected to delete.

Imagine that you have a classifieds portal, and you want to display the number of announcements for each category. You can have a table NUM_ANNOUNCES to store the number of announcements, and update it hourly. Then your database will only do the counting once per hour, and your application will read the number from the table NUM_ANNOUNCES.

The Cron can also be used to expire old announcements. That way you can avoid making a user wait for that clean up process during a http request to PHP.

A cron file can be invoked by:

php -f cron.php

Or by:

./cron.php

if you give it execution permissions with chmod +x and set the first line in cron.php to:

#!/usr/bin/env php

Or you can do a trick, that is to emulate a http request from bash, by invoking a url with curl or with wget.

Set the .htaccess so the folder for the cron tasks can only be executed from localhost, for added security.

This last trick has the inconvenience that the call has the same problems as any http request: restarting Apache will kill the process, the connection can be closed by a time out (e.g. if the process takes more seconds than the max. execution time), etc…

## Use Ramdisk for PHP files

With Linux it is very easy to set up a RamDisk.

You can set up a RamDisk and rsync all your web .PHP files to it at system boot time and when deploying changes, and configure Apache to use the Ramdisk folder for the website.

That way for every request to the web, PHP files will be served from RAM directly, saving the slow disk access. Even with OpCache active, it is a great improvement.
Nowadays, when one Gigabyte of memory is really cheap, there is a huge difference between reading files from disk and getting them from memory. (Reading and writing to RAM is many many many times faster than magnetic disks, and many times faster than SSD disks.)

Also .js, .css, images… can be served from a Ram disk folder, depending on how big your web is.

## Ramdisk for /tmp

If your project does operations on disk, like resizing images, compressing files, reading/writing large CSV files, etcetera, you can greatly improve the performance by setting the /tmp folder to a Ramdisk.

If your PHP project receives file uploads they will also benefit (a bit) from storing the temporary files in RAM instead of on disk.

## Use Cache Lite

Cache Lite is a Pear package that allows you to keep data in a local cache on the Web Server. You can cache .html pages, or you can cache Queries and their results.

<?php
require_once "Cache/Lite.php";

$options = array(
    'cacheDir' => '/tmp/',
    'pearErrorMode' => CACHE_LITE_ERROR_DIE
);
$cache = new Cache_Lite($options);

if ($data = $cache->get('id_of_the_page')) {

    // Cache hit !
    // Content is in $data
    echo $data;

} else {

    // No valid cache found (you have to make and save the page)
    $data = '<html><head><title>test</title></head><body><p>this is a test</p></body></html>';
    echo $data;
    $cache->save($data);

}

It is nice that Cache Lite handles the TTL and keeps the info stored in different sub-directories in order to keep a decent performance. (As you may know, having many files in the same directory slows down the access a lot.)
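The idea of spreading cache files over sub-directories can be sketched as follows. This is a minimal Python illustration, not Cache Lite’s actual code or directory layout: the functions cache_path and save, and the two-level scheme based on an MD5 of the id, are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def cache_path(base_dir, cache_id, levels=2):
    """Map a cache id to base_dir/<h0>/<h0h1>/cache_<hash>.

    Nested sub-directories named after the first characters of the hash
    keep each directory small. The layout is an illustrative assumption.
    """
    digest = hashlib.md5(cache_id.encode("utf-8")).hexdigest()
    parts = [digest[: i + 1] for i in range(levels)]
    return os.path.join(base_dir, *parts, "cache_" + digest)


def save(base_dir, cache_id, data):
    """Write data to the hashed location, creating directories as needed."""
    path = cache_path(base_dir, cache_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(data)
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    print(save(base, "id_of_the_page", "<html>test</html>"))
```

With one hexadecimal character per level, two levels give up to 16 x 16 = 256 leaf directories, so a million cached pages would average about 4,000 files per directory instead of a million in one.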

## Use HHVM (HipHop Virtual Machine) from Facebook

Facebook Engineers are always trying to optimize what is run on the Servers.

Faster code means fewer machines. Even a 1% improvement in CPU use means a lot fewer Servers. Fewer Servers to maintain, less money wasted, less space in the Data Centers…

So they created HHVM, the HipHop Virtual Machine, which is able to run PHP code much, much faster than the standard PHP interpreter. And it is compatible with most of the Frameworks and Open Source projects.

They also created the Hack language that is an improved PHP, with type hinting.

So you can use HHVM to make your code run faster with the same Server and without investing a single penny.

## Use C extensions

You can create and use your own C extensions.

C extensions will bring really fast execution. Just to get the idea:
I built a PHP extension to compare the performance from calculating the Bernoulli number with PHP and with the .so extension created in C.
In my Core i7 times were:
PHP:
Computed in 13.872583150864 s
PHP calling the C compiled extension:
Computed in 0.038495063781738 s

That’s 360.37 times faster using the C extension. Not bad.

## Use Zephir

Zephir is an Open Source language, very similar to PHP, that makes it easy to create and maintain extensions for PHP.

## Use Phalcon

Phalcon is a Web MVC Framework implemented as C extension, so it offers a high performance.

The view syntax is very similar to Twig.

## Check if you’re using the correct Engine for MySql

Many Developers create the tables and never worry about that. And many are using MyIsam by default. It was the default Engine prior to MySql 5.5.

While MyIsam can bring good performance in certain cases, my recommendation is to use InnoDb.

Normally you’ll have a gain in performance with MyIsam if you have a table where you only write or only read, but in all the other cases InnoDb is expected to be much more performant and safe.

MyIsam tables also get corrupted from time to time and need manual fixing, and its writes to disk are not as reliable as InnoDb’s.

As MyIsam uses table-locking for updates and deletes of any existing row, it is easy to see that in a web environment with multiple users, blocking the table (so the other operations have to wait) will make things slow.

If you have to use Joins you will clearly benefit from using InnoDb also.

## Use the MEMORY (in-memory) Engine from MySql

MySql has a very powerful in-memory Engine, called MEMORY.

The MEMORY Engine stores the data in RAM and loses it when MySql is restarted.

However it is very fast and very easy to use.

Imagine that you have a travel application that constantly looks up which country the city specified by the user belongs to. A Quick Win would be to INSERT all this data in a MEMORY table when MySql is started, and do just one change in your code: to use that table.

Really easy. Quick improvement.

## Use curl asynchronously

If your PHP has to communicate with other systems using curl, you can do the http/s call and, instead of waiting for a response, let your PHP do more things in the meantime, and then check the results.

You can also run multiple curl calls in parallel, and so avoid doing them one by one in serial.
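In PHP this is done with the curl_multi_* functions. The effect of overlapping the waiting time can be illustrated with a small Python sketch; fake_http_call is a hypothetical stand-in that sleeps instead of using the network, and the service names are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_http_call(name, delay=0.2):
    """Stand-in for a slow HTTP request (hypothetical: it only sleeps)."""
    time.sleep(delay)
    return name + ": OK"


def call_in_serial(names):
    """One request after the other: total time is the sum of the delays."""
    return [fake_http_call(name) for name in names]


def call_in_parallel(names):
    """All requests at once: the waits overlap, so total time is about one delay."""
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(fake_http_call, names))


services = ["payments", "stock", "shipping"]

start = time.perf_counter()
call_in_serial(services)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
results = call_in_parallel(services)
parallel_seconds = time.perf_counter() - start

print(results)
print("serial: %.2f s, parallel: %.2f s" % (serial_seconds, parallel_seconds))
```

The gain comes from the fact that the process is mostly waiting on I/O, so this helps when the remote systems are slow, not when the local CPU is the bottleneck.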

## Serialize

Suppose that you have a query that returns 1000 results. Then you add them one by one to an array.

You’re probably going to have a substantial gain if you keep a single row in the database, with the array serialized.

So an array like:

$st_places = Array('Barcelona', 'Dublin', 'Edinburgh', 'San Francisco', 'London', 'Berlin', 'Andorra la Vella', 'Prats de Lluçanès');

Would be serialized to a string like:

a:8:{i:0;s:9:"Barcelona";i:1;s:6:"Dublin";i:2;s:9:"Edinburgh";i:3;s:13:"San Francisco";i:4;s:6:"London";i:5;s:6:"Berlin";i:6;s:16:"Andorra la Vella";i:7;s:19:"Prats de Lluçanès";}

This can be easily stored as a String and unserialized later back to an array.

Note: On the Internet we have a lot of encodings and languages (Hebrew, Japanese…). Be careful with encodings when serializing, using JSon, XML, storing in databases without UTF support, etc…
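The warning about encodings is visible in the serialized string itself: the s:19 of ‘Prats de Lluçanès’ counts bytes, not characters. The Python sketch below rebuilds that format for a list of strings with a hypothetical helper, php_serialize_strings (written for this illustration, assuming UTF-8), and also shows a JSON round trip as an alternative.

```python
import json


def php_serialize_strings(items):
    """Hypothetical helper: PHP serialize() text format for a list of strings.

    String lengths are counted in bytes (UTF-8 is assumed here), which is
    why the same text serializes differently under another encoding.
    """
    parts = []
    for index, text in enumerate(items):
        size = len(text.encode("utf-8"))
        parts.append('i:%d;s:%d:"%s";' % (index, size, text))
    return "a:%d:{%s}" % (len(items), "".join(parts))


places = ["Barcelona", "Dublin", "Edinburgh", "San Francisco", "London",
          "Berlin", "Andorra la Vella", "Prats de Lluçanès"]

serialized = php_serialize_strings(places)
print(serialized)

# 17 characters, but 19 bytes in UTF-8 because of the c-cedilla and the e-grave
print(len("Prats de Lluçanès"), len("Prats de Lluçanès".encode("utf-8")))

# JSON is an alternative text format that any language can read back
as_json = json.dumps(places, ensure_ascii=False)
restored = json.loads(as_json)
```

If the string is stored in a column or sent through a layer that converts the encoding, the byte counts no longer match the content and PHP’s unserialize() will fail, so keeping everything in UTF-8 end to end is the safe option.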

## Use Memcached to store common things

Memcached is a NoSql database in memory that can run in cluster.

The idea is to keep things there, in order to offload the load of the database. And as everything is in RAM it really runs fast.

You can use Memcached to cache Queries and their results also.

For example:

You have the query SELECT * FROM translations WHERE section='MAIN'.

Then you check if that String exists as a key in Memcached, and if it exists you fetch the results (that are serialized) and you avoid the query. If it doesn’t exist, you do the query to the database normally, serialize the array and store it in Memcached with a TTL (Time to Live), using the Query (String) as the key. For security you may prefer to hash the query with MD5 or SHA-1 and use the hash as the key instead of using the plain query.

When the TTL is reached the data expires, and so the next query has to fetch the contents from the database and reinsert them.

Be careful, I’ve seen projects that were caching private data from users without isolating the key properly, so users were getting the info of other users.

For example, if the key used was ‘Name’ and the value ‘Carles Mateo’, obviously the next user that fetches the key ‘Name’ would get my name and not theirs.

If you store private data of users in Memcached, it is a good idea to prepend the owner of that info to the hash. E.g. using the key: 10701577-FFADCEDBCCDFFFA10C

Where ‘10701577’ would be the user_id of the owner of the info, and ‘FFADCEDBCCDFFFA10C’ a hash of the query.
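The flow described above (look up the hashed query, fall back to the database, store with a TTL, and isolate private data per user) can be sketched like this in Python. TinyCache is an in-process stand-in for Memcached written for the illustration, and cache_key and cached_query are hypothetical names; a real project would use a Memcached client instead.

```python
import hashlib
import time


class TinyCache:
    """In-process stand-in for Memcached (illustration only)."""

    def __init__(self):
        self._items = {}  # key -> (expiry timestamp, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or entry[0] < time.time():
            return None  # missing or expired
        return entry[1]

    def set(self, key, value, ttl):
        self._items[key] = (time.time() + ttl, value)


def cache_key(query, user_id=None):
    """Hash the query; prefix the owner's id when the data is private."""
    digest = hashlib.sha1(query.encode("utf-8")).hexdigest()
    if user_id is None:
        return digest
    return "%s-%s" % (user_id, digest)


def cached_query(cache, run_query, query, ttl=300, user_id=None):
    """Return cached rows if present, otherwise run the query and store it."""
    key = cache_key(query, user_id)
    rows = cache.get(key)
    if rows is None:
        rows = run_query(query)
        cache.set(key, rows, ttl)
    return rows
```

A call such as cached_query(cache, run_query, "SELECT * FROM translations WHERE section='MAIN'") only reaches the database on a miss, and passing user_id=10701577 guarantees that two users running the same query never share an entry.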

Earlier I suggested that you can keep a table with the count of the announcements in a classifieds portal. This number can be stored in Memcached instead.

You can also store common things, like translations, or cities like in the example before, or the exchange rates for a currency exchange website…

The most common way to store things there is serialized or Json encoded.

Be aware of the memory limits of Memcached and monitor the cache hit ratio, to avoid inserting data and losing it constantly because it is rarely used and Memcached has little memory.

You can also use Redis.

## Use jQuery for Production (small file) and minified files for js

Use the Production (minified) jQuery library in Production; do not serve the bigger Development jQuery file there.

There are products that eliminate all the unnecessary spaces in .js and .css files, so they are served much faster. This process is called minification.

It is important to know that in many emerging markets, like Brazil, DSL lines are slow. Many are 512 Kbit/second, and there are even modem connections!

## Activate compression in the Server

If you send large text files or JSON responses, you’ll benefit from activating compression at the Server.

It consumes some CPU, but it often brings an important improvement in the speed of serving pages to the users.
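To get a feeling for the gain, here is a small Python sketch that gzips a JSON payload and compares sizes. The payload is made up for illustration. In a real setup the compression is usually enabled in the Web Server itself rather than in application code.

```python
import gzip
import json


def compress_json(payload):
    """Return the raw and the gzip-compressed bytes of a JSON payload."""
    raw = json.dumps(payload).encode("utf-8")
    packed = gzip.compress(raw)
    return raw, packed


# Hypothetical, repetitive payload similar to a typical API listing
rows = [{"id": i, "city": "Barcelona", "country": "Spain"} for i in range(500)]
raw, packed = compress_json(rows)
print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

Repetitive text such as JSON listings compresses very well, which is exactly the case where the CPU cost pays off.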

## Use a CDN

You can use a Content Delivery Network to offload your Servers from sending plain text, html, images, videos, js, css…

You can delegate this to the CDN. They have very fast Internet lines and Servers, so your Servers can concentrate on doing only BackEnd operations.

The most well known are Akamai and Amazon CloudFront.

Please pay attention to the documentation. A common mistake is to send Cache headers to the CDN servers without realizing that they will use these headers to set the cache TTL and ignore the parameters configured in their web panel. (For example s-maxage, like: Cache-Control: public, s-maxage=600)

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 20 Aug 2014 10:50:21 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Vary: Accept-Encoding
Cache-Control: max-age=0, public, s-maxage=10800
Vary: X-User-Hash,Accept-Encoding
X-Location-Id: 2
X-Content-Digest: ezlocation/2/end5139244ced4b25606ef0a39235982b1662d01cc
Content-Length: 68250
Age: 3

Headers like the ones above can be inspected for any website by telnetting to port 80 and typing the request manually, or more easily by using lynx.
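Once you have a Cache-Control value in hand, a few lines of Python are enough to see which TTL a shared cache would apply. This is a simplified sketch: the function names are hypothetical, and it assumes the standard rule that s-maxage takes precedence over max-age for shared caches. A given CDN may add its own rules on top.

```python
def parse_cache_control(header_value):
    """Return the directives of a Cache-Control header as a dict."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (e.g. "public") are stored as True
        directives[name.lower()] = int(value) if value.isdigit() else (value or True)
    return directives


def shared_cache_ttl(header_value):
    """TTL in seconds for a shared cache: s-maxage wins over max-age."""
    directives = parse_cache_control(header_value)
    return directives.get("s-maxage", directives.get("max-age"))
```

For the value `max-age=0, public, s-maxage=10800`, browsers would not cache the page, while a shared cache honoring s-maxage would keep it for three hours.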

## Do you need a Framework?

If you’re only processing BackEnd requests, like in the video games industry, serving APIs, RESTful services, etc… you probably don’t need a Framework.

Frameworks are generic and use many more resources than you really need for a fast reply.

Often a heavy Framework makes a request several times slower than plain PHP.

## Save database connections until really needed

Many Frameworks create a connection to the Database Server by default, but certain parts of your application do not need to connect to the database.

For example, validating the data from a form. If there are missing fields, the PHP will not operate with the Database; it will just return an error via JSON or by refreshing the page, informing that the required field is missing.

If a user who is not logged in requests the dashboard page, there is no need to open a connection to the database (unless you want to write the access attempt to an error log in the database).

In fact, opening connections by default makes it easier for attackers to perform DoS attacks.

With a Singleton pattern you can easily implement a Db class that handles this transparently for you.
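A minimal sketch of the idea, written in Python with sqlite3 as a stand-in for the real Database Server. The class layout and the `handle_form` function are hypothetical; the same shape translates directly to a PHP class.

```python
import sqlite3


class Db:
    """Singleton wrapper that opens the connection only on first use."""

    _instance = None

    def __init__(self, dsn):
        self._dsn = dsn
        self._conn = None            # nothing is opened yet

    @classmethod
    def instance(cls, dsn=":memory:"):
        if cls._instance is None:
            cls._instance = cls(dsn)
        return cls._instance

    @property
    def connected(self):
        return self._conn is not None

    def query(self, sql, params=()):
        if self._conn is None:
            # First real query: connect now, not at bootstrap
            self._conn = sqlite3.connect(self._dsn)
        return self._conn.execute(sql, params).fetchall()


def handle_form(fields):
    # Validation fails before any database work, so no connection is opened
    if not fields.get("email"):
        return {"error": "email is required"}
    Db.instance().query("SELECT 1")
    return {"ok": True}
```

Requests that fail validation never touch `query()`, so they never cost a connection.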

## Memcached session

When you have several Web Servers you’ll need something more flexible than the default PHP session handler (which stores the session in a file on the Web Server).

The most common approach is to store the Session, serialized, in a Memcached Cluster.

## Use Cassandra

Apache Cassandra is a NoSQL database that lets you Scale out very easily.

The main advantage is that it scales linearly: if you have 4 nodes and add 4 more, your performance will roughly double. It has no single point of failure, is resilient to node failures, replicates the data among the nodes, splits the load over the nodes automatically and supports distributed datacenter architectures.

To learn more about NoSQL and Cassandra, read my article: Upgrade your scalability with NoSql. And to start developing with Cassandra in PHP, Python or Java read my contributed article: Begin developing with Cassandra.

## Use MySQL primary and secondaries

An easy way to split the load is to have a MySQL primary Server that handles the writes, and MySQL secondary (or Slave) Servers that handle the reads.

Every write sent to the primary is replicated to the secondaries. Then your application reads from the secondaries.

You have to tell your code to send the writes to the primary Server and the reads to the secondaries. You can have a Load Balancer so your code always asks the Load Balancer for the reads and it makes the connection to the least used server.
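The routing decision itself is tiny. Here is a hedged Python sketch where the class name `ReplicaRouter` and the server names are made up, and a simple round robin stands in for the Load Balancer:

```python
import itertools


class ReplicaRouter:
    """Sends writes to the primary and spreads reads over the secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        # Round robin stands in for a Load Balancer in front of the secondaries
        self._reads = itertools.cycle(secondaries)

    def server_for(self, sql):
        # Decide by the first word of the statement
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._reads)
        return self.primary
```

Keep in mind that replication is not instantaneous, so a read sent to a secondary right after a write may not see that write yet. Reads that must see it can be sent to the primary.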

## Do Database sharding

Sharding consists of splitting the data according to some criterion.

For example, imagine we have 8 MySQL Servers, named mysql0 to mysql7. If we want to insert or read data for user 1714, the Server is chosen by dividing the user_id, 1714, by the number of Servers and taking the remainder (MOD).

So 1714 % 8 gives 2. This means that the MySQL Server to use is mysql2.

For the user_id 16: 16 % 8 gives 0, so we would use mysql0. And so on.
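The modulo rule fits in a couple of lines. This is a sketch in Python, with the server names taken from the example and a hypothetical function name `shard_for`:

```python
SHARDS = [f"mysql{i}" for i in range(8)]    # mysql0 .. mysql7


def shard_for(user_id, shards=SHARDS):
    # The remainder of user_id divided by the number of Servers picks the shard
    return shards[user_id % len(shards)]


print(shard_for(1714))   # mysql2
print(shard_for(16))     # mysql0
```

One consequence of this scheme: if the number of Servers changes, the remainder changes for most user_ids, so the data has to be redistributed.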

You can shard according to the email or other fields as well. And each shard can also have its own primary and secondaries.

When sharding in MySQL you cannot do joins with data on other Servers (but you can replicate all the data from the several shards into one big in-house server, in your offices, and query and join it there if you need that for marketing purposes).

I always use my own sharding, but there is a very nice product from CodeFutures called dbshards. It handles the traffic transparently. I used it at a video games Start up with very satisfying results.

## Use Cassandra async queries

Cassandra supports asynchronous queries. That means you can send the query to the Server and, instead of waiting, do other work, then check for the result later, when it has finished.
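The pattern looks like this in Python. Note that this sketch does not use a Cassandra driver: `slow_query` is a hypothetical stand-in for the database round trip, and a thread pool from the standard library provides the future. Real drivers expose a similar future object (for example `execute_async` in the DataStax Python driver).

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_query(user_id):
    # Stand-in for a database round trip (hypothetical)
    time.sleep(0.05)
    return {"user_id": user_id, "score": 42}


def render_page(user_id):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_query, user_id)    # send the query, do not wait
        header = f"<h1>User {user_id}</h1>"          # other work happens meanwhile
        row = future.result()                        # collect the result when needed
    return header, row
```

The gain comes from overlapping the wait with useful work, so it matters most when several independent queries are needed for one response.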

## Consider using Hadoop + HBASE

A Cluster alternative to Cassandra.

## Use a Load Balancer

You can put a Load Balancer or a Reverse Proxy in front of your Web Servers. The Load Balancer knows the state of the Web Servers, so it will remove a Web Server from the pool if it stops responding and everything will continue being served to the users transparently.

There are many ways to do Load Balancing: Round Robin, based on the load on the Web Servers, on the number of connections to each Web Server, by cookie…

Using a Cookie based Load Balancer is a very easy way to split the load for WordPress and Drupal Servers.

Imagine you have 10 Web Servers. In the .htaccess of each one you set a rule to set a Cookie like:

SERVER_ID=WEB01

That is for the first Web Server. For the second:

SERVER_ID=WEB02

And so on.

The first time a user connects to the Load Balancer, it sends the user to one of the 10 Web Servers. Then the Web Server sends its cookie to the Client’s browser. E.g. WEB07

After that, the Load Balancer will direct the client’s next requests to the Server that set the Cookie, in this example WEB07.

The nice thing about this way of splitting the traffic is that you don’t have to change your code, nor handle the Sessions differently.
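The decision the Load Balancer takes on each request can be sketched like this in Python. The names `pick_server` and `alive` are hypothetical, and a real Load Balancer does this in its own configuration rather than in your code.

```python
import itertools

SERVERS = [f"WEB{i:02d}" for i in range(1, 11)]    # WEB01 .. WEB10
_counter = itertools.count()


def pick_server(cookies, alive=frozenset(SERVERS)):
    """Choose the Web Server for a request, honoring the SERVER_ID cookie."""
    wanted = cookies.get("SERVER_ID")
    if wanted in alive:
        return wanted                        # sticky: same Server as before
    # First visit, or the pinned Server is down: rotate over the live ones
    candidates = [s for s in SERVERS if s in alive]
    if not candidates:
        raise RuntimeError("no Web Server available")
    return candidates[next(_counter) % len(candidates)]
```

If the pinned Server is down, the user lands on another one and receives that Server’s cookie on the response, so whether the Session survives depends on where the Sessions are stored.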

If you use two Load Balancers you can run a heartbeat process on them together with a Virtual IP, so in case your main Load Balancer becomes unresponsive the Virtual IP will be remapped to the second Load Balancer in milliseconds. That provides HA.

## Use http accelerators

Use Nginx, Varnish, Squid… to serve static content and offload the PHP Web Servers.

## Auto-Scale in the Cloud

If you use the Cloud you can easily set up Auto-Scaling for different parts of your core.

A quick win is to Scale the Web Servers.

As in the Cloud you pay per hour of use of each computer, you will reduce costs if you stop the Servers when you don’t need them and add more Servers when more users come to your sites.

Video game companies are a good example of peak hours of heavy use and valleys with few users, although as users come from all over the planet the effect is more and more diluted.

Some cool tools for Auto-Scaling are: ECManaged, RightScale, Amazon CloudWatch.

Actually, the performance of Google Cloud when Scaling without any prior warm-up is great.

Unlike other Clouds that are based on instances, Google Cloud offers the platform, which will spawn your code across as many servers as needed, transparently to you. It’s a black box.

## Schedule operations with RabbitMQ

Or other Queue Manager.

The idea is to send the jobs to the Queue Manager: the PHP continues working while the jobs are performed asynchronously, and it is notified when they finish.

RabbitMQ is also cool because it can work in a cluster and in HA.
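The shape of the pattern, in a Python sketch. To keep it runnable without a broker, an in-process `queue.Queue` and a thread stand in for RabbitMQ and its consumer. The job (a welcome email) and all the names are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
done = []


def worker():
    # Consumer: runs apart from the request handler and does the slow work
    while True:
        job = jobs.get()
        done.append(f"sent email to {job['to']}")
        jobs.task_done()                     # notify that the job has finished


def enqueue_welcome_email(address):
    # The request handler only enqueues the job and returns immediately
    jobs.put({"to": address})
    return {"status": "queued"}


threading.Thread(target=worker, daemon=True).start()
print(enqueue_welcome_email("user@example.com"))
jobs.join()                                  # wait here only so the demo can print
print(done)
```

With a real Queue Manager the producer and the consumers are separate processes, often on separate Servers, which is what lets the work survive a Web Server restart.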

## Use GlusterFs for NAS

GlusterFs (and other products) allow you to have a Distributed File System that splits the load and the data across the Servers and resists node failures.

If you need a shared folder for the users’ uploads, for example for the profile pictures, a nice option is to keep the PHP and general files locally on the Servers and the Shared folder in GlusterFs.

## Avoid NFS for PHP files and config files

As said before, try to keep the PHP files on a RAM disk or on the local disk (Linux caches files well, and so does OpCache), and try not to write code that reads files from disk to determine the config setup.

I remember a Start up incubator that had a very nice Server, but the PHP files were read from a mounted NFS folder.

That meant that on every request, the Server had to go over the network to fetch the files.

Sadly for the project’s performance, the PHP was reading a file called ENVIRONMENT that contained “PROD” or “DEVEL”, and this was done on every single request.

Even worse, I discovered that the switch connecting the Web Server and the NFS Server was a cheap 10 Mbit one, so all the traffic was going at 10 Mbit/s. Nice bottleneck.