Tag Archives: Python

Performance of several languages

Notes on 2017-03-26 18:57 CEST – Unix time: 1490547518 :

  1. As several of you have noted, it would be much better to use a random value, for example, read by disk. This will be an improvement done in the next benchmark. Good suggestion thanks.
  2. Due to my lack of time it took more than expected updating the article. I was in a long process with google, and now I’m looking for a new job.
  3. I note that most of people doesn’t read the article and comment about things that are well indicated on it. Please before posting, read, otherwise don’t be surprise if the comment is not published. I’ve to keep the blog clean of trash.
  4. I’ve left out few comments cause there were disrespectful. Mediocrity is present in the society, so simply avoid publishing comments that lack the basis of respect and good education. If a comment brings a point, under the point of view of Engineering, it is always published.


(This article was last updated on 2015-08-26 15:45 CEST – Unix time: 1440596711. See changelog at bottom)

One may think that Assembler is always the fastest, but is that true?.

If I write a code in Assembler in 32 bit instead of 64 bit, so it can run in 32 and 64 bit, will it be faster than the code that a dynamic compiler is optimizing in execution time to benefit from the architecture of my computer?.

What if a future JIT compiler is able to use all the cores to execute a single thread developed program?.

Are PHP, Python, or Ruby fast comparing to C++?. Does Facebook Hip Hop Virtual machine really speeds PHP execution?.

This article shows some results and shares my conclusions. It is as a base to discuss with my colleagues. Is not an end, we are always doing tests, looking for the edge, and looking at the root of the things in detail. And often things change from one version to the other. This article shows not an absolute truth, but brings some light into interesting aspects.

It could show the performance for the certain case used in the test, although generic core instructions have been selected. Many more tests are necessary, and some functions differ in the performance. But this article is a necessary starting for the discussion with my IT-extreme-lover friends and a necessary step for the next upcoming tests.

It brings very important data for Managers and Decision Makers, as choosing the adequate performance language can save millions in hardware (specially when you use the Cloud and pay per hour of use) or thousand hours in Map Reduce processes.

Acknowledgements and thanks

Credit for the great Eduard Heredia, for porting my C source code to:

  • Go
  • Ruby
  • Node.js

And for the nice discussions of the results, an on the optimizations and dynamic vs static compilers.

Thanks to Juan Carlos Moreno, CTO of ECManaged Cloud Software for suggesting adding Python and Ruby to the languages tested when we discussed my initial results.

Thanks to Joel Molins for the interesting discussions on Java performance and garbage collection.

Thanks to Cliff Click for his wonderful article on Java vs C performance that I found when I wanted to confirm some of my results and findings.

I was inspired to do my own comparisons by the benchmarks comparing different framework by techempower. It is amazing to see the results of the tests, like how C++ can serialize JSon 1,057,793 times per second and raw PHP only 180,147 (17%).

For the impatients

I present the results of the tests, and the conclusions, for those that doesn’t want to read about the details. For those that want to examine the code, and the versions of every compiler, and more in deep conclusions, this information is provided below.


This image shows the results of the tests with every language and compiler.

All the tests are invoked from command line. All the tests use only one core. No tests for the web or frameworks have been made, are another scenarios worth an own article.

More seconds means a worst result. The worst is Bash, that I deleted from the graphics, as the bar was crazily high comparing to others.

* As later is discussed my initial Assembler code was outperformed by C binary because the final Assembler code that the compiler generated was better than mine.

After knowing why (later in this article is explained in detail) I could have reduced it to the same time than the C version as I understood the improvements made by the compiler.


Table of times:

Seconds executing Language Compiler used Version
6 s. Java Oracle Java Java JDK 8
6 s. Java Oracle Java Java JDK 7
6 s. Java Open JDK OpenJDK 7
6 s. Java Open JDK OpenJDK 6
7 s. Go Go Go v.1.3.1 linux/amd64
7 s. Go Go Go v.1.3.3 linux/amd64
8 s. Lua LuaJit Luajit 2.0.2
10 s. C++ g++ g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
10 s. C gcc gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
10 s.
(* first version was 13 s. and then was optimized)
Assembler nasm NASM version 2.10.09 compiled on Dec 29 2013
10 s. Nodejs nodejs Nodejs v0.12.4
14 s. Nodejs nodejs Nodejs v0.10.25
18 s. Go Go go version xgcc (Ubuntu 4.9-20140406-0ubuntu1) 4.9.0 20140405 (experimental) [trunk revision 209157] linux/amd64
20 s. Phantomjs Phantomjs phantomjs 1.9.0
21 s. Phantomjs Phantomjs phantomjs 2.0.1-development
38 s. PHP Facebook HHVM HipHop VM 3.4.0-dev (rel)
44 s. Python Pypy Pypy 2.2.1 (Python 2.7.3 (2.2.1+dfsg-1, Nov 28 2013, 05:13:10))
52 s. PHP Facebook HHVM HipHop VM 3.9.0-dev (rel)
52 s. PHP Facebook HHVM HipHop VM 3.7.3 (rel)
128 s. PHP PHP PHP 7.0.0alpha2 (cli) (built: Jul 3 2015 15:30:23)
278 s. Lua Lua Lua 2.5.3
294 s. Gambas3 Gambas3 3.7.0
316 s. PHP PHP PHP 5.5.9-1ubuntu4.3 (cli) (built: Jul 7 2014 16:36:58)
317 s. PHP PHP PHP 5.6.10 (cli) (built: Jul 3 2015 16:13:11)
323 s. PHP PHP PHP 5.4.42 (cli) (built: Jul 3 2015 16:24:16)
436 s. Perl Perl Perl 5.18.2
523 s. Ruby Ruby ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
694 s. Python Python Python 2.7.6
807 s. Python Python Python 3.4.0
47630 s. Bash GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)


Conclusions and Lessons Learnt

  1. There are languages that will execute faster than a native Assembler program, thanks to the JIT Compiler and to the ability to optimize the program at runtime for the architecture of the computer running the program (even if there is a small initial penalty of around two seconds from JIT when running the program, as it is being analysed, is it more than worth in our example)
  2. Modern Java can be really fast in certain operations, it is the fastest in this test, thanks to the use of JIT Compiler technology and a very good implementation in it
  3. Oracle’s Java and OpenJDK shows no difference in performance in this test
  4. Script languages really sucks in performance. Python, Perl and Ruby are terribly slow. That costs a lot of money if you Scale as you need more Server in the Cloud
  5. JIT compilers for Python: Pypy, and for Lua: LuaJit, make them really fly. The difference is truly amazing
  6. The same language can offer a very different performance using one version or another, for example the go that comes from Ubuntu packets and the last version from official page that is faster, or Python 3.4 is much slower than Python 2.7 in this test
  7. Bash is the worst language for doing the loop and inc operations in the test, lasting for more than 13 hours for the test
  8. From command line PHP is much faster than Python, Perl and Ruby
  9. Facebook Hip Hop Virtual Machine (HHVM) improves a lot PHP’s speed
  10. It looks like the future of compilers is JIT.
  11. Assembler is not always the fastest when executed. If you write a generic Assembler program with the purpose of being able to run in many platforms you’ll not use the most powerful instructions specific of an architecture, and so a JIT compiler can outperform your code. An static compiler can also outperform your code with very clever optimizations. People that write the compilers are really good. Unless you’re really brilliant with Assembler probably a C/C++ code beats the performance of your code. Even if you’re fantastic with Assembler it could happen that a JIT compiler notices that some executions can be avoided (like code not really used) and bring magnificent runtime optimizations. (for example a near JMP is much more less costly than a far JMP Assembler instruction. Avoiding dead code could result in a far JMP being executed as near JMP, saving many cycles per loop)
  12. Optimizations really needs people dedicated to just optimizations and checking the speed of the newly added code for the running platforms
  13. Node.js was a big surprise. It really performed well. It is promising. New version performs even faster
  14. go is promising. Similar to C, but performance is much better thanks to deciding at runtime if the architecture of the computer is 32 or 64 bit, a very quick compilation at launch time, and it compiling to very good assembler (that uses the 64 bit instructions efficiently, for example)
  15. Gambas 3 performed surprisingly fast. Better than PHP
  16. You should be careful when using C/C++ optimization -O3 (and -O2) as sometimes it doesn’t work well (bugs) or as you may expect, for example by completely removing blocks of code if the compiler believes that has no utility (like loops)
  17. Perl performance really change from using a for style or another. (See Perl section below)
  18. Modern CPUs change the frequency to save energy. To run the tests is strictly recommended to use a dedicated machine, disabling the CPU governor and setting a frequency for all the cores, booting with a text only live system, without background services, not mounting disks, no swap, no network

(Please, before commenting read completely the article )

Explanations in details

Obviously an statically compiled language binary should be faster than an interpreted language.

C or C++ are much faster than PHP. And good code machine is much faster of course.

But there are also other languages that are not compiled as binary and have really fast execution.

For example, good Web Java Application Servers generate compiled code after the first request. Then it really flies.

For web C# or .NET in general, does the same, the IIS Application Server creates a native DLL after the first call to the script. And after this, as is compiled, the page is really fast.

With C statically linked you could generate binary code for a particular processor, but then it won’t work in other processors, so normally we write code that will work in all the processors at the cost of not using all the performance of the different CPUs or use another approach and we provide a set of different binaries for the different architectures. A set of directives doing one thing or other depending on the platform detected can also be done, but is hard, long and tedious job with a lot of special cases treatment. There is another approach that is dynamic linking, where certain things will be decided at run time and optimized for the computer that is running the program by the JIT (Just-in-time) Compiler.

Java, with JIT is able to offer optimizations for the CPU that is running the code with awesome results. And it is able to optimize loops and mathematics operations and outperform C/C++ and Assembler code in some cases (like in our tests) or to be really near in others. It sounds crazy but nowadays the JIT is able to know the result of several times executed blocks of code and to optimize that with several strategies, speeding the things incredible and to outperform a code written in Assembler. Demonstrations with code is provided later.

A new generation has grown knowing only how to program for the Web. Many of them never saw Assembler, neither or barely programmed in C++.

None of my Senior friends would assert that a technology is better than another without doing many investigations before. We are serious. There is so much to take in count, so much to learn always, that one has to be sure that is not missing things before affirming such things categorically. If you want to be taken seriously, you have to take many things in count.

Environment for the tests

Hardware and OS

Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz with 32 GB RAM and SSD Disk.

Ubuntu Desktop 14.04 LTS 64 bit

Software base and compilers

PHP versions

Shipped with my Ubuntu distribution:

php -v
PHP 5.5.9-1ubuntu4.3 (cli) (built: Jul  7 2014 16:36:58)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies

Compiled from sources:

PHP 5.6.10 (cli) (built: Jul  3 2015 16:13:11)
Copyright (c) 1997-2015 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies
PHP 5.4.42 (cli) (built: Jul  3 2015 16:24:16)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2014 Zend Technologies


Java 8 version

java -showversion
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

C++ version

g++ -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.2-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

Gambas 3

gbr3 --version

Go (downloaded from google)

go version
go version go1.3.1 linux/amd64

Go (Ubuntu packages)

go version
go version xgcc (Ubuntu 4.9-20140406-0ubuntu1) 4.9.0 20140405 (experimental) [trunk revision 209157] linux/amd64


nasm -v
NASM version 2.10.09 compiled on Dec 29 2013


lua -v
Lua 5.2.3  Copyright (C) 1994-2013 Lua.org, PUC-Rio


luajit -v
LuaJIT 2.0.2 -- Copyright (C) 2005-2013 Mike Pall. http://luajit.org/


Installed with apt-get install nodejs:

nodejs --version

Installed by compiling the sources:

node --version


Installed with apt-get install phantomjs:

phantomjs --version

Compiled from sources:

/path/phantomjs --version

Python 2.7

python --version
Python 2.7.6

Python 3

python3 --version
Python 3.4.0


perl -version
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
(with 41 registered patches, see perl -V for more detail)


bash --version
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Test: Time required for nested loops

This is the first sample. It is an easy-one.

The main idea is to generate a set of nested loops, with a simple counter inside.

When the counter reaches 51 it is set to 0.

This is done for:

  1. Preventing overflow of the integer if growing without control
  2. Preventing the compiler from optimizing the code (clever compilers like Java or gcc with -O3 flag for optimization, if it sees that the var is never used, it will see that the whole block is unnecessary and simply never execute it)

Doing only loops, the increment of a variable and an if, provides us with basic structures of the language that are easily transformed to Assembler. We want to avoid System calls also.

This is the base for the metrics on my Cloud Analysis of Performance cmips.net project.

Here I present the times for each language, later I analyze the details and the code.

Take in count that this code only executes in one thread / core.


C++ result, it takes 10 seconds.

Code for the C++:

* File:   main.cpp
* Author: Carles Mateo
* Created on August 27, 2014, 1:53 PM

#include <cstdlib>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <ctime>

using namespace std;

typedef unsigned long long timestamp_t;

static timestamp_t get_timestamp()
    struct timeval now;
    gettimeofday (&now, NULL);
    return  now.tv_usec + (timestamp_t)now.tv_sec * 1000000;

int main(int argc, char** argv) {

    timestamp_t t0 = get_timestamp();

    // current date/time based on current system
    time_t now = time(0);

    // convert now to string form
    char* dt_now = ctime(&now);

    printf("Starting at %s\n", dt_now);

    int i_loop1 = 0;
    int i_loop2 = 0;
    int i_loop3 = 0;


    for (i_loop1 = 0; i_loop1 < 10; i_loop1++) {
        for (i_loop2 = 0; i_loop2 < 32000; i_loop2++) {
            for (i_loop3 = 0; i_loop3 < 32000; i_loop3++) {

                if (i_counter > 50) {
                    i_counter = 0;
            // If you want to test how the compiler optimizes that, remove the comment
            //i_counter = 0;

    // This is another trick to avoid compiler's optimization. To use the var somewhere
    printf("Counter: %i\n", i_counter);

    timestamp_t t1 = get_timestamp();
    double secs = (t1 - t0) / 1000000.0L;
    time_t now_end = time(0);

    // convert now to string form
    char* dt_now_end = ctime(&now_end);

    printf("End time: %s\n", dt_now_end);

    return 0;


You can try to remove the part of code that makes the checks:

                /* if (i_counter > 50) {
                    i_counter = 0;

And the use of the var, later:

    //printf("Counter: %i\n", i_counter);

Note: And adding a i_counter = 0; at the beginning of the loop to make sure that the counter doesn’t overflows. Then the C or C++ compiler will notice that this result is never used and so it will eliminate the code from the program, having as result and execution time of 0.0 seconds.


The code in Java:

package cpu;

 * @author carles.mateo
public class Cpu {

     * @param args the command line arguments
    public static void main(String[] args) {
        int i_loop1 = 0;
        //int i_loop_main = 0;
        int i_loop2 = 0;
        int i_loop3 = 0;
        int i_counter = 0;
        String s_version = System.getProperty("java.version");
        System.out.println("Java Version: " + s_version);

        System.out.println("Starting cpu.java...");
        for (i_loop1 = 0; i_loop1 < 10; i_loop1++) {            
                for (i_loop2 = 0; i_loop2 < 32000; i_loop2++) {
                    for (i_loop3 = 0; i_loop3 < 32000; i_loop3++) {
                        if (i_counter > 50) { 
                            i_counter = 0;

It is really interesting how Java, with JIT outperforms C++ and Assembler.

It takes only 6 seconds.

Netbeans with Java IDE executing with OpenJDK 1.6 in 6 seconds


The case of Go is interesting because I saw a big difference from the go shipped with Ubuntu, and the the go I downloaded from http://golang.org/dl/. I downloaded 1.3.1 and 1.3.3 offering the same performance. 7 seconds.

blog-carlesmateo-com-go1-3-3-linux-amd64-performance-37Source code for nested_loops.go

package main

import ("fmt"

func main() {
   fmt.Printf("Starting: %s", time.Now().Local())
   var i_counter = 0;
   for i_loop1 := 0; i_loop1 < 10; i_loop1++ {
       for i_loop2 := 0; i_loop2 < 32000; i_loop2++ {
           for i_loop3 := 0; i_loop3 < 32000; i_loop3++ {
               if i_counter > 50 {
                   i_counter = 0;

   fmt.Printf("\nCounter: %#v", i_counter)
   fmt.Printf("\nEnd: %s\n", time.Now().Local())


Here is the Assembler for Linux code, with SASM, that I created initially (bellow is optimized).

%include "io.inc"

section .text
global CMAIN
    ;mov rbp, rsp; for correct debugging
    ; Set to 0, the faster way
    xor     esi, esi

    mov ecx, 10
    mov ebx, ecx
    jmp DO_LOOP2
    mov ecx, ebx
    loop LOOP1
    jmp QUIT

    mov ecx, 32000
    mov eax, ecx
    ;call DO_LOOP3
    jmp DO_LOOP3
    mov ecx, eax
    loop LOOP2

    ; Set to 32000 loops    
    MOV ecx, 32000 
    inc     esi
    cmp     esi, 50
    jg      COUNTER_TO_0

    loop LOOP3
    ; Set to 0
    xor     esi, esi
;    jmp QUIT

    xor eax, eax

It took 13 seconds to complete.

One interesting explanation on why binary C or C++ code is faster than Assembler, is because the C compiler generates better Assembler/binary code at the end. For example, the use of JMP is expensive in terms of CPU cycles and the compiler can apply other optimizations and tricks that I’m not aware of, like using faster registers, while in my code I use ebx, ecx, esi, etc… (for example, imagine that using cx is cheaper than using ecx or rcx and I’m not aware but the guys that created the Gnu C compiler are)

blog-carlesmateo-com-sasm-assembler-linux-64-bits-code-12-13-secondsTo be sure of what’s going on I switched in the LOOP3 the JE and the JMP of the code, for groups of 50 instructions, INC ESI, one after the other and the time was reduced to 1 second.

(In C also was reduced even a bit more when doing the same)

To know what’s the translation of the C code into Assembler when compiled, you can do:

objdump --disassemble nested_loops

Look for the section main and you’ll get something like:

0000000000400470 <main>:
400470:    bf 0a 00 00 00           mov    $0xa,%edi
400475:    31 c9                    xor    %ecx,%ecx
400477:    be 00 7d 00 00           mov    $0x7d00,%esi
40047c:    0f 1f 40 00              nopl   0x0(%rax)
400480:    b8 00 7d 00 00           mov    $0x7d00,%eax
400485:    0f 1f 00                 nopl   (%rax)
400488:    83 c2 01                 add    $0x1,%edx
40048b:    83 fa 33                 cmp    $0x33,%edx
40048e:    0f 4d d1                 cmovge %ecx,%edx
400491:    83 e8 01                 sub    $0x1,%eax
400494:    75 f2                    jne    400488 <main+0x18>
400496:    83 ee 01                 sub    $0x1,%esi
400499:    75 e5                    jne    400480 <main+0x10>
40049b:    83 ef 01                 sub    $0x1,%edi
40049e:    75 d7                    jne    400477 <main+0x7>
4004a0:    48 83 ec 08              sub    $0x8,%rsp
4004a4:    be 34 06 40 00           mov    $0x400634,%esi
4004a9:    bf 01 00 00 00           mov    $0x1,%edi
4004ae:    31 c0                    xor    %eax,%eax
4004b0:    e8 ab ff ff ff           callq  400460 <__printf_chk@plt>
4004b5:    31 c0                    xor    %eax,%eax
4004b7:    48 83 c4 08              add    $0x8,%rsp
4004bb:    c3                       retq

Note: this is in the AT&T syntax and not in the Intel. That means that add $0x1,%edx is adding 1 to EDX registerg (origin, destination).

As you can see the C compiler has created a very differed Assembler version respect what I created.
For example at 400470 it uses EDI register to store 10, so to control the number of the outer loop.
It uses ESI to store 32000 (Hexadecimal 0x7D00), so the second loop.
And EAX for the inner loop, at 400480.
It uses EDX for the counter, and compares to 50 (Hexa 0x33) at 40048B.
In 40048E it uses the CMOVGE (Mov if Greater or Equal), that is an instruction that was introduced with the P6 family processors, to move the contents of ECX to EDX if it was (in the CMP) greater or equal to 50. As in 400475 a XOR ECX, ECX was performed, EXC contained 0.
And it cleverly used SUB and JNE (JNE means Jump if not equal and it jumps if ZF = 0, it is equivalent to JNZ Jump if not Zero).
It uses between 4 and 16 clocks, and the jump must be -128 to +127 bytes of the next instruction. As you see Jump is very costly.

Looks like the biggest improvement comes from the use of CMOVGE, so it saves two jumps that my original Assembler code was performing.
Those two jumps multiplied per 32000 x 32000 x 10 times, are a lot of Cpu clocks.

So, with this in mind, as this Assembler code takes 10 seconds, I updated the graph from 13 seconds to 10 seconds.


This is the initial code:

local i_counter = 0

local i_time_start = os.clock()

for i_loop1=0,9 do
    for i_loop2=0,31999 do
        for i_loop3=0,31999 do
            i_counter = i_counter + 1
            if i_counter > 50 then
                i_counter = 0

local i_time_end = os.clock()
print(string.format("Counter: %i\n", i_counter))
print(string.format("Total seconds: %.2f\n", i_time_end - i_time_start))

In the case of Lua theoretically one could take great advantage of the use of local inside a loop, so I tried the benchmark with modifications to the loop:

for i_loop1=0,9 do
    for i_loop2=0,31999 do
        local l_i_counter = i_counter
        for i_loop3=0,31999 do
             l_i_counter = l_i_counter + 1
             if l_i_counter > 50 then
                 l_i_counter = 0
        i_counter = l_i_counter

I ran it with LuaJit and saw no improvements on the performance.


var s_date_time = new Date();
console.log('Starting: ' + s_date_time);

var i_counter = 0;

for (var $i_loop1 = 0; $i_loop1 < 10; $i_loop1++) {
   for (var $i_loop2 = 0; $i_loop2 < 32000; $i_loop2++) {
       for (var $i_loop3 = 0; $i_loop3 < 32000; $i_loop3++) {
           if (i_counter > 50) {
               i_counter = 0;

var s_date_time_end = new Date();

console.log('Counter: ' + i_counter + '\n');

console.log('End: ' + s_date_time_end + '\n');

Execute with:

nodejs nested_loops.js


The same code as nodejs adding to the end:


In the case of Phantom it performs the same in both versions 1.9.0 and 2.0.1-development compiled from sources.


The interesting thing on PHP is that you can write your own extensions in C, so you can have the easy of use of PHP and create functions that really brings fast performance in C, and invoke them from PHP.


$s_date_time = date('Y-m-d H:i:s');

echo 'Starting: '.$s_date_time."\n";

$i_counter = 0;

for ($i_loop1 = 0; $i_loop1 < 10; $i_loop1++) {
   for ($i_loop2 = 0; $i_loop2 < 32000; $i_loop2++) {
       for ($i_loop3 = 0; $i_loop3 < 32000; $i_loop3++) {
           if ($i_counter > 50) {
               $i_counter = 0;

$s_date_time_end = date('Y-m-d H:i:s');

echo 'End: '.$s_date_time_end."\n";

Facebook’s Hip Hop Virtual Machine is a very powerful alternative, that is JIT powered.

Downloading the code and compiling it is just easy, just:

git clone https://github.com/facebook/hhvm.git
cd hhvm
rm -r third-party
git submodule update --init --recursive

Or just grab precompiled packages from https://github.com/facebook/hhvm/wiki/Prebuilt%20Packages%20for%20HHVM


from datetime import datetime
import time

print ("Starting at: " + str(datetime.now()))
s_unixtime_start = str(time.time())

i_counter = 0

# From 0 to 31999
for i_loop1 in range(0, 10):
    for i_loop2 in range(0,32000):
         for i_loop3 in range(0,32000):
             i_counter += 1
             if ( i_counter > 50 ) :
                 i_counter = 0

print ("Ending at: " + str(datetime.now()))
s_unixtime_end = str(time.time())

i_seconds = long(s_unixtime_end) - long(s_unixtime_start)
s_seconds = str(i_seconds)

print ("Total seconds:" + s_seconds)


#!/usr/bin/ruby -w

time1 = Time.new

puts "Starting : " + time1.inspect

i_counter = 0;

for i_loop1 in 0..9
    for i_loop2 in 0..31999
        for i_loop3 in 0..31999
            i_counter = i_counter + 1
            if i_counter > 50
                i_counter = 0

time1 = Time.new

puts "End : " + time1.inspect


The case of Perl was very interesting one.

This is the current code:

#!/usr/bin/env perl

print "$s_datetime Starting calculations...\n";


for my $i_loop1 (0 .. 9) {
    for my $i_loop2 (0 .. 31999) {
        for my $i_loop3 (0 .. 31999) {
            if ($i_counter > 50) {
                $i_counter = 0;



print "Counter: $i_counter\n";
print "Total seconds: $i_seconds";

But before I created one, slightly different, with the for loops like in the C style:

#!/usr/bin/env perl



for (my $i_loop1=0; $i_loop1 < 10; $i_loop1++) {
    for (my $i_loop2=0; $i_loop2 < 32000; $i_loop2++) {
        for (my $i_loop3=0; $i_loop3 < 32000; $i_loop3++) {
            if ($i_counter > 50) {
                $i_counter = 0;



print "Total seconds: $i_seconds";

I repeated this test, with the same version of Perl, due to the comment of a reader (thanks mpapec) that told:

In this particular case perl style loops are about 45% faster than original code (v5.20)

And effectively and surprisingly the time passed from 796 seconds to 436 seconds.

So graphics are updated to reflect the result of 436 seconds.


echo "Bash version ${BASH_VERSION}..."

let "s_time_start=$(date +%s)"
let "i_counter=0"

for i_loop1 in {0..9}
     echo "."
     for i_loop2 in {0..31999}
         for i_loop3 in {0..31999}
             if [[ $i_counter > 50 ]]
                 let "i_counter=0"
#let "var=var+1"
#let "var+=1"
#let "var++"

let "s_time_end=$(date +%2)"

let "s_seconds = s_time_end - s_time_start"
echo "Total seconds: $s_seconds"

# Just in case it overflows

Gambas 3

Gambas is a language and an IDE to create GUI applications for Linux.
It is very similar to Visual Basic, but better, and it is not a clone.

I created a command line application and it performed better than PHP. There has been done an excellent job with the compiler.

blog-carlesmateo-com-gbr3-gambas-performanceNote: in the screenshot the first test ran for few seconds more than in the second. This was because I deliberately put the machine under some load and I/O during the tests. The valid value for the test, confirmed with more iterations is the second one, done under the same conditions (no load) than the previous tests.

' Gambas module file MMain.module

Public Sub Main()

    ' @author Carles Mateo http://blog.carlesmateo.com
    Dim i_loop1 As Integer
    Dim i_loop2 As Integer
    Dim i_loop3 As Integer
    Dim i_counter As Integer
    Dim s_version As String
    i_loop1 = 0
    i_loop2 = 0
    i_loop3 = 0
    i_counter = 0
    s_version = System.Version
    Print "Performance Test by Carles Mateo blog.carlesmateo.com"    
    Print "Gambas Version: " & s_version

    Print "Starting..." & Now()
    For i_loop1 = 0 To 9
        For i_loop2 = 0 To 31999
            For i_loop3 = 0 To 31999
                i_counter = i_counter + 1
                If (i_counter > 50) Then
                    i_counter = 0
    Print i_counter
    Print "End " & Now()



2015-08-26 15:45

Thanks to the comment of a reader, thanks Daniel, pointing a mistake. The phrase I mentioned was on conclusions, point 14, and was inaccurate. The original phrase told “go is promising. Similar to C, but performance is much better thanks to the use of JIT“. The allusion to JIT is incorrect and has been replaced by this: “thanks to deciding at runtime if the architecture of the computer is 32 or 64 bit, a very quick compilation at launch time, and it compiling to very good assembler (that uses the 64 bit instructions efficiently, for example)”

2015-07-17 17:46

Benchmarked Facebook HHVM 3.9 (dev., the release date is August 3 2015) and HHVM 3.7.3, they take 52 seconds.

Re-benchmarked Facebook HHVM 3.4, before it was 72 seconds, it takes now 38 seconds. I checked the screen captures from 2014 to discard an human error. Looks like a turbo frequency issue on the tests computer, with the CPU governor making it work bellow the optimal speed or a CPU-hungry/IO process that triggered during the tests and I didn’t detect it. Thinking about forcing a fixed CPU speed for all the cores for the tests, like 2.4 Ghz and booting a live only text system without disk access and network to prevent Ubuntu launching processes in the background.

2015-07-05 13:16

Added performance of Phantomjs 1.9.0 installed via apt-get install phantomjs in Ubuntu, and Phantomjs 2.0.1-development.

Added performance of nodejs 0.12.04 (compiled).

Added bash to the graphic. It has so bad performance that I had to edit the graphic to fit in (color pink) in order prevent breaking the scale.

2015-07-03 18:32

Added benchmarks for PHP 7 alpha 2, PHP 5.6.10 and PHP 5.4.42.

2015-07-03 15:13
Thanks to the contribution of a reader (thanks mpapec!) I tried with Perl for style, resulting in passing from 796 seconds to 436 seconds.
(I used the same Perl version: Perl 5.18.2)
Updated test value for Perl.
Added new graphics showing the updated value.

Thanks to the contribution of a reader (thanks junk0xc0de!) added some additional warnings and explanations about the dangers of using -O3 (and -O2) if C/C++.

Updated the Lua code, to print i_counter and do the if i_counter > 50
This makes it take a bit longer, few cents, but passing from 7.8 to 8.2 seconds.
Updated graphics.

Begin developing with Cassandra

This article is contributed to Luna Cloud blog by Carles Mateo.

We architects, developers and start ups are facing new challenges.

We have now to create applications that have to scale and scale at world-wide level.

That puts over the table big and exciting challenges.

To allow that increasing level of scaling, we designed and architect tools and techniques and tricks, but fortunately now there are great products born to scale out and to deal with this problems: like NoSql databases like Cassandra, MongoDb, Riak, Hadoop’s Hbase, Couchbase or CouchDb, NoSql in-Memory like Memcached or Redis, big data solutions like Hadoop, distributed files systems like Hadoop’s HDFS, GlusterFs, Lustre, etc…

In this article I will cover the first steps to develop with Cassandra, under the Developer point of view.

As a first view you may be interested in Cassandra because:

  • Is a Database with no single point of failure
  • Where all the Database Servers work in Peer to Peer over Tcp/Ip
  • Fault-tolerance. You can set replication factor, and the data will be sharded and replicated over different servers and so being resilient to node failures
  • Because the Cassandra Cluster splits and balances the work across the Cluster automatically
  • Because you can scale by just adding more nodes to the Cluster, that’s scaling horizontally, and it’s linear. If you double the number of servers, you double the performance
  • Because you can have cool configurations like multi-datacenter and multi-rack and have the replication done automatically
  • You can have several small, cheap, commodity servers, with big SATA disks with better result than one very big, very expensive, and unable-to-scale-more server with SSD or SAS expensive disks.
  • It has the CQL language -Cassandra Query Language-, that is close to SQL
  • Ability to send querys in async mode (the CPU can do other things while waiting for the query to return the results)

Cassandra is based in key/value philosophy but with columns. It supports multiple columns. That’s cool, as theoretically it supports 2 GB per column (at practical level is not recommended to go with data so big, specially in multi-user environments).

I will not lie to you: It is another paradigm, and comes with a lot of knowledge to acquire, but it is necessary and a price worth to pay for being able of scaling at nowadays required levels.

Cassandra only offers native drivers for: Java, .NET, C++ and Python 2.7. The rest of solutions are contributed, sadly most of them are outdated and unmantained.

You can find all the drivers here:


To develop with PHP

Cassandra has no PHP driver officially, but has some contributed solutions.

By myself I created several solutions: CQLSÍ uses cqlsh to perform queries and interfaces without needing Thrift, and Cassandra Universal Driver is a Web Gateway that I wrote in Python that allows you to query Cassandra from any language, and recently I contributed to a PHP driver that speaks the Cassandra binary protocol (v1) directly using Tcp/Ip sockets.

That’s the best solution for me by now, as it is the fastest and it doesn’t need any third party library nor Thrift neither.

You can git clone it from:


Here we go with some samples:

Create a keyspace

KeySpace is the equivalent to a database in MySQL.


require_once 'Cassandra/Cassandra.php';

$o_cassandra = new Cassandra();

$s_server_host     = '';    // Localhost
$i_server_port     = 9042; 
$s_server_username = '';  // We don't use username
$s_server_password = '';  // We don't use password
$s_server_keyspace = '';  // We don't have created it yet

$o_cassandra->connect($s_server_host, $s_server_username, $s_server_password, $s_server_keyspace, $i_server_port);

// Create a Keyspace with Replication factor 1, that's for a single server
$s_cql = "CREATE KEYSPACE cassandra_tests WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 };";

$st_results = $o_cassandra->query($s_cql);

We can run it from web or from command line by using:

php -f cassandra_create.php

Create a table


require_once 'Cassandra/Cassandra.php';

$o_cassandra = new Cassandra();

$s_server_host     = '';    // Localhost
$i_server_port     = 9042; 
$s_server_username = '';  // We don't use username
$s_server_password = '';  // We don't use password
$s_server_keyspace = 'cassandra_tests';

$o_cassandra->connect($s_server_host, $s_server_username, $s_server_password, $s_server_keyspace, $i_server_port);

$s_cql = "CREATE TABLE carles_test_table (s_thekey text, s_column1 text, s_column2 text,PRIMARY KEY (s_thekey));";

$st_results = $o_cassandra->query($s_cql);

If we don’t plan to insert UTF-8 strings, we can use VARCHAR instead of TEXT type.

Do an insert

In this sample we create an Array of 100 elements, we serialize it, and then we store it.


require_once 'Cassandra/Cassandra.php';

// Note this code uses the MT notation http://blog.carlesmateo.com/maria-teresa-notation-for-php/
$i_start_time = microtime(true);

$o_cassandra = new Cassandra();

$s_server_host     = '';    // Localhost
$i_server_port     = 9042; 
$s_server_username = '';  // We don't have username
$s_server_password = '';  // We don't have password
$s_server_keyspace = 'cassandra_tests';  

$o_cassandra->connect($s_server_host, $s_server_username, $s_server_password, $s_server_keyspace, $i_server_port);

$s_time = strval(time()).strval(rand(0,9999));
$s_date_time = date('Y-m-d H:i:s');

// An array to hold a emails
$st_data_emails = Array();

for ($i_bucle=0; $i_bucle<100; $i_bucle++) {
    // Add a new email
    $st_data_emails[] = Array('datetime'  => $s_date_time,
                              'id_email'  => $s_time);


// Serialize the Array
$s_data_emails = serialize($st_data_emails);

$s_cql = "INSERT INTO carles_test_table (s_thekey, s_column1, s_column2)
VALUES ('first_sample', '$s_data_emails', 'Some other data');";

$st_results = $o_cassandra->query($s_cql);



$i_finish_time = microtime(true);
$i_execution_time = $i_finish_time-$i_start_time;

echo 'Execution time: '.$i_execution_time."\n";
echo "\n";

This insert took Execution time: 0.0091850757598877 seconds executed from CLI (Command line).

If the INSERT works well you’ll have a [result] => ‘success’ in the resulting array.


Do some inserts

Here we do 9000 inserts.


require_once 'Cassandra/Cassandra.php';

// Note this code uses the MT notation http://blog.carlesmateo.com/maria-teresa-notation-for-php/
$i_start_time = microtime(true);

$o_cassandra = new Cassandra();

$s_server_host     = '';    // Localhost
$i_server_port     = 9042; 
$s_server_username = '';  // We don't have username
$s_server_password = '';  // We don't have password
$s_server_keyspace = 'cassandra_tests';  

$o_cassandra->connect($s_server_host, $s_server_username, $s_server_password, $s_server_keyspace, $i_server_port);

$s_date_time = date('Y-m-d H:i:s');

for ($i_bucle=0; $i_bucle<9000; $i_bucle++) {
    // Add a sample text, let's use time for example
    $s_time = strval(time());

    $s_cql = "INSERT INTO carles_test_table (s_thekey, s_column1, s_column2)
VALUES ('$i_bucle', '$s_time', 'http://blog.carlesmateo.com');";

    // Launch the query
    $st_results = $o_cassandra->query($s_cql);



$i_finish_time = microtime(true);
$i_execution_time = $i_finish_time-$i_start_time;

echo 'Execution time: '.$i_execution_time."\n";
echo "\n";

Those 9,000 INSERTs takes 6.49 seconds in my test virtual machine, executed from CLI (Command line).


Do a Select


require_once 'Cassandra/Cassandra.php';

// Note this code uses the MT notation http://blog.carlesmateo.com/maria-teresa-notation-for-php/
$i_start_time = microtime(true);

$o_cassandra = new Cassandra();

$s_server_host     = '';    // Localhost
$i_server_port     = 9042; 
$s_server_username = '';  // We don't have username
$s_server_password = '';  // We don't have password
$s_server_keyspace = 'cassandra_tests';  

$o_cassandra->connect($s_server_host, $s_server_username, $s_server_password, $s_server_keyspace, $i_server_port);

$s_cql = "SELECT * FROM carles_test_table LIMIT 10;";

// Launch the query
$st_results = $o_cassandra->query($s_cql);
echo 'Printing 10 rows:'."\n";



$i_finish_time = microtime(true);
$i_execution_time = $i_finish_time-$i_start_time;

echo 'Execution time: '.$i_execution_time."\n";
echo "\n";

Printing 10 rows passing the query with LIMIT:

$s_cql = "SELECT * FROM carles_test_table LIMIT 10;";

echoing as array with print_r takes Execution time: 0.01090407371521 seconds (the cost of printing is high).


If you don’t print the rows, it takes only Execution time: 0.00714111328125 seconds.
Selecting 9,000 rows, if you don’t print them, takes Execution time: 0.18086194992065.


The official driver for Java works very well.

The only initial difficulties may be to create the libraries required with Maven and to deal with the different Cassandra native data types.

To make that travel easy, I describe what you have to do to generate the libraries and provide you with a Db Class made by me that will abstract you from dealing with Data types and provide a simple ArrayList with the field names and all the data as String.

Datastax provides the pom.xml for maven so you’ll create you jar files. Then you can copy those jar file to Libraries folder of any project you want to use Cassandra with.

cmateo-cassandra-java-dependenciesMy Db class:

 * By Carles Mateo blog.carlesmateo.com
 * You can use this code freely, or modify it.

package server;

import java.util.ArrayList;
import java.util.List;
import com.datastax.driver.core.*;

 * @author carles_mateo
public class Db {

    public String[] s_cassandra_hosts = null;
    public String s_database = "cchat";
    public Cluster o_cluster = null;
    public Session o_session = null;
    Db() {
        // The Constructor
        this.s_cassandra_hosts = new String[10];
        String s_cassandra_server = "";
        this.s_cassandra_hosts[0] = s_cassandra_server;
        this.o_cluster = Cluster.builder()
                                        .addContactPoints(s_cassandra_hosts[0]) // More than 1 separated by comas
        this.o_session = this.o_cluster.connect(s_database);  // This is the KeySpace

    public static String escapeApostrophes(String s_cql) {
        String s_cql_replaced = s_cql.replaceAll("'", "''");
        return s_cql_replaced;
    public void close() {
        // Destructor calles by the garbagge collector
    public ArrayList query(String s_cql) {
        ResultSet rows = null;
        rows = this.o_session.execute(s_cql);
        ArrayList st_results = new ArrayList();
        List<String> st_column_names = new ArrayList<String>();
        List<String> st_column_types = new ArrayList<String>();

        ColumnDefinitions o_cdef = rows.getColumnDefinitions();

        int i_num_columns = o_cdef.size();
        for (int i_columns = 0; i_columns < i_num_columns; i_columns++) {
        for (Row o_row : rows) {
            List<String> st_data = new ArrayList<String>();
            for (int i_column=0; i_column<i_num_columns; i_column++) {
                if (st_column_types.get(i_column).equals("varchar") || st_column_types.get(i_column).equals("text")) {
                } else if (st_column_types.get(i_column).equals("timeuuid")) {
                } else if (st_column_types.get(i_column).equals("integer")) {
                // TODO: Implement other data types
        return st_results;
    public static String getFieldFromRow(ArrayList st_results, int i_row, String s_fieldname) {
        List<String> st_column_names = (List)st_results.get(0);
        boolean b_column_found = false;
        int i_column_pos = 0;
        for (String s_column_name : st_column_names) {
            if (s_column_name.equals(s_fieldname)) {
                b_column_found = true;
        if (b_column_found == false) {
            return null;
        int i_num_columns = st_results.size();
        List<String> st_data = (List)st_results.get(i_row);
        String s_data = st_data.get(i_column_pos);
        return s_data;


Python 2.7

There is no currently driver for Python 3. I requested Datastax and they told me that they are working in a new driver for Python 3.

To work with Datastax’s Python 2.7 driver:

1) Download the driver from http://planetcassandra.org/client-drivers-tools/ or git clone from https://github.com/datastax/python-driver

2) Install the dependencies for the Datastax’s driver

Install python-pip (Installer)

sudo apt-get install python-pip

Install python development tools

sudo apt-get install python-dev

This is required for some of the libraries used by original Cassandra driver.

Install Cassandra driver required libraries

sudo pip install futures
sudo pip install blist
sudo pip install metrics
sudo pip install scales

Query Cassandra from Python

The problem is the same as with Java, the different data types are hard to deal with.
So I created a function convert_to_string that converts known data types to String, and so later we will only deal with Strings.

In this sample, the results of the query are rendered in xml or in html.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Use with Python 2.7+

__author__ = 'Carles Mateo'
__blog__ = 'http://blog.carlesmateo.com'

import sys

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

s_row_separator = u"||*||"
s_end_of_row = u"//*//"
s_data = u""

b_error = 0
i_error_code = 0
s_html_output = u""
b_use_keyspace = 1 # By default use keyspace
b_use_user_and_password = 1 # Not implemented yet

def return_success(i_counter, s_data, s_format = 'html'):
    i_error_code = 0
    s_error_description = 'Data returned Ok'

    return_response(i_error_code, s_error_description, i_counter, s_data, s_format)

def return_error(i_error_code, s_error_description, s_format = 'html'):
    i_counter = 0
    s_data = ''

    return_response(i_error_code, s_error_description, i_counter, s_data, s_format)

def return_response(i_error_code, s_error_description, i_counter, s_data, s_format = 'html'):

    if s_format == 'xml':
        print ("Content-Type: text/xml")
        print ("")
        s_html_output = u"<?xml version='1.0' encoding='utf-8' standalone='yes'?>"
        s_html_output = s_html_output + '<response>' \
                                        '<status>' \
                                        '<error_code>' + str(i_error_code) + '</error_code>' \
                                        '<error_description>' + '<![CDATA[' + s_error_description + ']]>' + '</error_description>' \
                                        '<rows_returned>' + str(i_counter) + '</rows_returned>' \
                                        '</status>' \
                                        '<data>' + s_data + '</data>' \
        print("Content-Type: text/html; charset=utf-8")
        s_html_output = str(i_error_code)
        s_html_output = s_html_output + '\n' + s_error_description + '\n'
        s_html_output = s_html_output + str(i_counter) + '\n'
        s_html_output = s_html_output + s_data + '\n'


def convert_to_string(s_input):
    # Convert other data types to string

    s_output = s_input

        if value is not None:

            if isinstance(s_input, unicode):
                # string unicode, do nothing
                return s_output

            if isinstance(s_input, (int, float, bool, set, list, tuple, dict)):
                # Convert to string
                s_output = str(s_input)
                return s_output

            # This is another type, try to convert
            s_output = str(input)
            return s_output

            # is none
            s_output = ""
            return s_output

    except Exception as e:
        # Were unable to convert to str, will return as empty string
        s_output = ""

    return s_output

def convert_to_utf8(s_input):
    return s_input.encode('utf-8')

# ********************
# Start of the program
# ********************

s_format = 'xml'  # how you want this sample program to output

s_cql = 'SELECT * FROM test_table;'
s_cluster = ''
s_port = "9042" # default port
i_port = int(s_port)

b_use_keyspace = 1
s_keyspace = 'cassandra_tests'
if s_keyspace == '':
    b_use_keyspace = 0

s_user = ''
s_password = ''
if s_user == '' or s_password == '':
    b_use_user_and_password = 0

    cluster = Cluster([s_cluster], i_port)
    session = cluster.connect()
except Exception as e:
    return_error(200, 'Cannot connect to cluster ' + s_cluster + ' on port ' + s_port + '.' + e.message, s_format)

if (b_use_keyspace == 1):
        return_error(210, 'Keyspace ' + s_keyspace + ' does not exist', s_format)

    o_results = session.execute_async(s_cql)
except Exception as e:
    return_error(300, 'Error executing query. ' + e.message, s_format)

    rows = o_results.result()
except Exception as e:
    return_error(310, 'Query returned result error. ' + e.message, s_format)

# Query returned values
i_counter = 0
    if rows is not None:
        for row in rows:
            i_counter = i_counter + 1

            if i_counter == 1 and s_format == 'html':
                # first row is row titles
                for key, value in vars(row).iteritems():
                    s_data = s_data + key + s_row_separator

                s_data = s_data + s_end_of_row

            if s_format == 'xml':
                s_data = s_data + ''

            for key, value in vars(row).iteritems():
                # Convert to string numbers or other types
                s_value = convert_to_string(value)
                if s_format == 'xml':
                    s_data = s_data + '<' + key + '>' + '<![CDATA[' + s_value + ']]>' + ''
                    s_data = s_data + s_value
                    s_data = s_data + s_row_separator

            if s_format == 'xml':
                s_data = s_data + ''
                s_data = s_data + s_end_of_row

except Exception as e:
    # No iterable data
    return_success(i_counter, s_data, s_format)

# Just print the data
return_success(i_counter, s_data, s_format)


If you did not create the namespace like in the samples before, change those lines to:

s_cql = 'CREATE KEYSPACE cassandra_tests WITH REPLICATION = { \'class\': \'SimpleStrategy\', \'replication_factor\': 1 };'
s_cluster = ''
s_port = "9042" # default port
i_port = int(s_port)

b_use_keyspace = 1
s_keyspace = ''

Run the program to create the Keyspace and you’ll get:

carles@ninja8:~/Desktop/codi/python/test$ ./lunacloud-create.py 
Content-Type: text/xml


Then you can create the table simply by setting:

s_cql = 'CREATE TABLE test_table (s_thekey text, s_column1 text, s_column2 text,PRIMARY KEY (s_thekey));'
s_cluster = ''
s_port = "9042" # default port
i_port = int(s_port)

b_use_keyspace = 1
s_keyspace = 'cassandra_tests'

IDE PyCharm Community Edition

Cassandra Universal Driver

As mentioned above if you use a language Tcp/Ip enabled very new, or very old like ASP or ColdFusion, or from Unix command line and you want to use it with Cassandra, you can use my solution http://www.cassandradriver.com/.


It is basically a Web Gateway able to speak XML, JSon or CSV alike. It relies on the official Datastax’s python driver.

It is not so fast as a native driver, but it works pretty well and allows you to split your architecture in interesting ways, like intermediate layers to restrict even more security (For example WebServers may query the gateway, that will enstrict tome permissions instead of having direct access to the Cassandra Cluster. That can also be used to perform real-time map-reduce operations on the amount of data returned by the Cassandras, so freeing the webservers from that task and saving CPU).

Tip: If you use Cassandra for Development only, you can limit the amount of memory used by editing the file /etc/cassandra/cassandra-env.sh and hardcoding:

    # limit the memory for development environment
    # --------------------------------------------
    # --------------------------------------------

Just before the line:

# set max heap size based on the following

That way Cassandra will believe your system memory is 512 MB and reserve only 256 MB for its use.