Category Archives: Development

Upgrading my new HP 14-bp060sa

As the company I was working for, Sanmina, has decided to move all the Software Development to Colorado, US, and closing the offices in Bishopstown, Cork, I found myself with the need to get a new laptop. At work I was using two Dell laptops, one very powerful and heavy equipped with an Intel Xeon processor and 32 GB of RAM. The other a lightweight one that I updated to 32 GB of RAM.

I had an accident around 8 months ago, that got my spine damaged, and so I cannot carry much weight.

My personal laptops at home, in Ireland are a 15″ with 16 GB of RAM, too heavy, and an Acer 11,6″ with 8GB of RAM and SSD (I upgraded it), but unfortunately the screen crashed. I still use it through the HDMI port. My main computer is a tower with a Core i7, 64GB of RAM and a Samsung NVMe drive. And few Raspberrys :)

I was thinking about what ultra-lightweight laptop to buy, but I wanted to buy it in Barcelona, as I wanted a Catalan keyboard (the layout with the broken c and accents). I tried by Amazon.es but I have problems to have shipped the laptops to my address in Ireland.

I was trying to find the best laptop for me.

While I was investigating I found out that none of the laptops in the market were convincing me.

The ones in around 1Kg, which was my initial target, were too big, and lack a proper HDMI display and Gigabit Ethernet. Honestly, some models get the HDMI from an USB 3.1, through an adapter, or have mini-HDMI, many lack the Gigabit port, which is very annoying. Also most of the models came with 8GB of RAM only and were impossible to upgrade. I enrolled my best friend in my quest, in the research, and had the same conclusions.

I don’t want to have to carry adapters with me to just plug to a monitor or projector. I don’t even want to carry the power charger. I want a laptop that can work with me for a complete day, a full work session, without needing to recharge.

So while this investigation was going on, I decided to buy a cheap laptop to be able to work on the coffee. I needed it for writing documents in Google Docs, creating microservices architectures, programming in Java and PHP, and writing articles in my blog. I also decided that this would be my only laptop with Windows, as honestly I missed playing Star Craft 2, and my attempts with Wine and Linux did not success.

Please note: although I use mainly Linux everywhere (Ubuntu, CentOS, and RedHat mainly) and I contribute to Open Source projects, I do have Windows machines.

I had my Start up in 2004, and I still have Windows Servers, physical machines in a Data Center in Barcelona, and I still have VMs and Instances in Public Clouds with Windows Servers. Also I programmed some tools using Visual Studio and Visual Basic .NET, ASP.NET and C#, but when I needed to do this I found more convenient spawn an instance in Amazon or Azure.

When I created my Start up I offered my infrastructure as a way to get funding too, and I offered VMs with VMWare. I found that having my Mail Servers in VMs was much more convenient for Backups, to scale up, to avoid disruption and for Disaster and Recovery.

I wanted a cheap laptop that will not make feel bad if transporting it in a daily basis gets a hit and breaks, or that if it rains (and this happens more than often in Ireland) and it breaks is not super-hurtful, or even if it gets stolen. Yes, I’m from a big city, like is Barcelona, Catalonia, and thieves are a real problem. I travel, so I want a laptop that I can take to travel, and for going for a coffee, and I feel comfortable enough that if something happens to it is not the end of the world.

Cork is not a big city, so the options were reduced. I found a laptop that meets my needs.

I got a HP s14-bp060sa for 439€.

It is equipped with a Intel® Core™ i3-6006U (2 GHz, 3 MB cache, 2 cores) , a 500GB SAS HDD, and 4 GB of DDR4 RAM.

The information on HP webpage is really scarce, but checking other pages I was able to see that the motherboard has 2 memory banks, accepting a max of 16GB of RAM.

I saw that there was an slot, unclear if supporting NVMe SSD drives, but supporting M.2 SSD for sure.

So I bought in Amazon 2x8GB and a M.2 500GB drive.

Since I was 5 years old I’ve been upgrading and assembling by myself all the computers. And this is something that I want to keep doing. It keeps me sharp, knowing the new ports, CPUs, and motherboard architectures, and keeps me in contact with the Hardware. All my live I’ve thought that specializing Software Engineers and Systems Engineers, like if computers were something separate is a mistake, so I push myself to stay up to date of the news in all the fields.

I removed the spinning 500 GB SATA HDD, cause it’s slow and it consumes a lot of energy. With the M.2 the battery last forever.

The interesting part is how I cloned the drive from the Spinning HDD to the new M.2.

I did:

  • Open the computer (see pics below) and Insert the new drive M.2
  • Boot with an USB Linux Rescue distribution (to do that I had to enable Legacy Boot on BIOS and boot with the USB)
  • Use lsblk command to identify the HDD drive, it was easy as it was the one with partitions
  • dd from if=/dev/sda to of=/dev/sdb with status=progress to see live status and speed (around 70MB/s) and estimated time to complete.
  • Please note that the new drive should be bigger or at least have the same number of bytes to avoid problems with the last partition.
  • I removed the HDD drive, this reduces the weight of my laptop by 100 grams
  • Disable Legacy Boot, and boot the computer. Windows started perfectly :)

I found so few information about this model, that I wanted to share the pictures with the Community. Here are the pictures of the upgrade process.

Here you can see the Crucial M.2 SSD installed and the Spinning HDD removed. Yes, I did in a coffee :)
Final step, installing the 2x8GB RAM memory modules

Creating a VM for compiling ZFS with RHEL6.10

As you know I created the DRAID project, based in ZFS.

One of our customers wanted a special custom version for their RHEL6.10 installation with a custom Kernel.

This post describes how to compile and install ZFS 7.x for RHEL6.

First create a VM with RHEL6.10. Myself I used Virtual Box on Ubuntu.

If you need to install a Custom Kernel matching the destination Servers, do it.

Download the source code from ZFS for Linux.

install the following packages which are required by zfs compiler:

sudo yum groupinstall "Development Tools"
sudo yum install autoconf automake libtool wget libtirpc-devel rpm-build
sudo yum install zlib-devel libuuid-devel libattr-devel libblkid-devel libselinux-devel libudev-devel
sudo yum install parted lsscsi ksh openssl-devel elfutils-libelf-develsudo yum install kernel-devel-$(uname -r)

steps to compile the code:1- make sure  the zfs file exists under zfs/contrib/initramfs/scripts/local-top/

if not exists, create a file called zfs  under zfs/contrib/initramfs/scripts/local-top/  and add the following to that file:

#!/bin/sh
PREREQ=”mdadm mdrun multipath”

prereqs()
{
       echo “$PREREQ”
}

case $1 in
# get pre-requisites
prereqs)
       prereqs
       exit 0
       ;;
esac


#
# Helper functions
#
message()
{
       if [ -x /bin/plymouth ] && plymouth –ping; then
               plymouth message –text=”$@”
       else
               echo “$@” >&2
       fi
       return 0
}

udev_settle()
{
       # Wait for udev to be ready, see https://launchpad.net/bugs/85640
       if [ -x /sbin/udevadm ]; then
               /sbin/udevadm settle –timeout=30
       elif [ -x /sbin/udevsettle ]; then
               /sbin/udevsettle –timeout=30
       fi
       return 0
}


activate_vg()
{
       # Sanity checks
       if [ ! -x /sbin/lvm ]; then
               [ “$quiet” != “y” ] && message “lvm is not available”
               return 1
       fi

       # Detect and activate available volume groups
       /sbin/lvm vgscan
       /sbin/lvm vgchange -a y –sysinit
       return $?
}

udev_settle
activate_vg

exit 0

make the created zfs file executable:

chmod +x  zfs/contrib/initramfs/scripts/local-top/zfs

2-  inside  draid-zfs-2019-05-09 folder, execute the following commands:execute Auto generate script:

./autogen.sh

execute configuration script:

./configure

Please note we use this specific configuration for bettter results:

./configure –disable-pyzfs –with-spec=redhat

create rpms:

make rpm

remove all test rpms:

rm zfs-test*.rpm

3- install all created rpms

yum install *x86_64* -y

4- verify that zfs is been installed

zfs

this command will display zfs help. 

Another interesting trick I instructed my Team to do is to add a version number to zfs, with a parameter -v or –version.

So if you want to do the same, you have to edit:

zfs/cmd/zfs/zfs_main.c

Under:

cmdname = argv[1];

In my code is line 7926, then add:

/* DRAIDTEAM - added new command to display zfs version*/
if ((strcmp(cmdname, "-v") == 0) || (strcmp(cmdname, "--version") == 0)) {
    (void) fprintf(stdout, "0.7.0_DRAID-1.2.9.08021755\n");
    return (0);
}

You can check the Kernel Module info by using modinfo zfs, but I found it handy to allow to just do:

zfs -v

A handy trick command line to get the usages of our Python Methods in the code

We all use powerful code analysis tool, but sometimes you’re presented with a problem and you have just… the terminal.

This Bash code is handy.

grep "def " /home/carles/code/gitlab/cloud/terraform/src/scale/lib/iscsi.py | tr "()" "  " | awk '{ print $2; }' |  grep -v "__init" > ~/function_names_iscsi.txt

So this basically will get all the methods (“def ” whatever), strip the parenthesis with tr, and get the second column with awk, so basically the method name.

Then I will cd to the src directory and execute the seconds part:

cd /home/carles/code/gitlab/cloud/terraform/src/
for fname in $(cat ~/function_names_iscsi.txt); do printf "%s: %s\n" "$fname" "$(grep -r $fname *|grep -v 'def ' -c)"; done > ~/functions_being_used.txt

That will produce a nice list in the form of:

method_name: occurrences

That’s the equivalent to doing Find Usages is PyCharm.

It’s easy to identify dead code then, with method_name: 0.

You can also run this to your Jenkins to warn when there is Dead Code in your repository.

Google Compute Engine Talk for Group Google Developers Cork

My talk in Google Developers Cork Group.
It’s about deploying an Instance in GCE and grows in complexity until Deploying a Load Balancer with AutoScaling for a group of LAMP Webservers.

Join the group at: https://www.meetup.com/GDG-Cork/

The videos:

Keshan Sodimana: Tensors

Curiosity Python string.strip() removes just more than white spaces

Another Python curiosity.

If you see the Official Python3 documentation for strip(), it says that strip without parameters will return the string without the leading and trailing white spaces.
Optionally you can pass a string with the characters you want to eliminate.

The official documentation for Python 2 says:

string.strip(s[, chars])

Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.

Changed in version 2.2.3: The chars parameter was added. The chars parameter cannot be passed in earlier 2.2 versions.

https://docs.python.org/2/library/string.html

A white space is a white space. Is not an Enter.
But strip() without parameters will remove white spaces (space), and Enter \n and Tabs \t.

Probably you will not realize that unless you read from a file that has empty lines at the end for a reason, and you use strip().

You can see a demonstration following this small program, that runs the same for Python2 and Python3.

And the corresponding output for python2 and python3:

The [ ] characters where added to show that there are no hidden tabs or similar after.

Here I paste the code so you can try yourself:

import sys


def print_bar():
    print("-----------------------------------------------------")


def print_between_brackets(s_test):
    print("[" + s_test + "]")


s_string_with_enters = "  Testing strip not only removing white spaces, but Enter and Tabs s well\n\t\n\n"

print("Testing .strip()")
print("You are running Python " + sys.version)
print("This is the original string")
print_bar()
print_between_brackets(s_string_with_enters)
print_bar()
print("Now after strip()...")
print_bar()
print_between_brackets(s_string_with_enters.strip())
print_bar()
print("As you can see the Enters and the Tabs have been removed, not just the spaced")

I think this should be disambiguate so I decided to take action. Is very easy to blame and never contribute. Not me. I went to Python to fix that and I located a bug reporting this issue:

https://bugs.python.org/issue25433

The issue was registed and made specially interesting contributions by Dimitri Papadopoulos Orfanos.

The thread is really interesting to read. I recommend it.

At a glance:

“Python heavily relies on isspace() to detect “whitespace” characters.”

* Lib/string.py near line 23:
  whitespace = ' \t\n\r\v\f'

So all those characters will be stripped in Python2.7 if you use just string.strip()

The ticket was opened the 2015-10-18 12:15. So it’s a shame the documentation has not been updated yet, more than 3 years later. Those are the kind of things, lack of care, that I can’t understand. Not looking for the excellence.

Please, do note that Python3 supports Unicode natively and things are always a bit different than with Python2 and AscII.

Curiosity Python 2.7 and print() from Python 3.6

I lead a project where I decided to go with Python 2.7, for the wide compatibility across all the Servers around.

With RHEL now supporting Python 3 as well, it doesn’t make much sense any more, as all the major brands do support Python 3 directly.

I saw it coming so in my Coding Style Guide for my Team I explained that we will use print(“”) which is the required way to proceed with Python 3, as opposite to print “whatever” from Python 2. Noye Python 2 supports both methods.

But today something unexpected appeared in the Tests. One line of code was making a print of ().

The line of code was:

print()

And not

print("")

And the curiosity is that if you do print() in Python 2.7 it outputs ().

 

Troubleshooting upgrading and loading a ZFS module in RHEL7.4

I illustrate this troubleshooting as it will be useful for some of you.

I requested to one of the members of my Team to compile and to install ZFS 7.9 to some of the Servers loaded with drives, that were running ZFS 7.4 older version.

Those systems were running RHEL7.4.

The compilation and install was fine, however the module was not able to load.

My Team member reported that: when trying to run “modprobe zfs”. It was giving the error:

modprobe: ERROR: could not insert 'zfs': Invalid argument

Also when trying to use a zpool command it gives the error:

Failed to initialize the libzfs library

That was only failing in one of the Servers, but not in the others.

My Engineer ran dmesg and found:

zfs: `' invalid for parameter `metaslab_debug_unload

He though it was a compilation error, but I knew that metaslab_debug_unload is an option parameter that you can set in /etc/zfs.conf

So I ran:

 modprobe -v zfs

And that confirmed my suspicious, so I edited /etc/zfs.conf and commented the parameter and tried again. And it failed.

As I run modprobe -v zfs (verbose) it was returning me the verbose info, and so I saw that it was still trying to load those parameters so I knew it was reading those parameters from some file.
I could have grep all the files in the filesystem looking for the parameter failing in the verbose or find all the files in the system named zfs.conf. To me it looked inefficient as it would be slow and may not bring any result (as I didn’t know how exactly my team member had compiled the code), however I expected to get the result. But what if I found 5 or 7 zfs.conf files?. Slow.
I used strace. It was not installed but the RHEL license was active so I simple did:

 yum install strace

strace is for System Trace and so it records all the System Calls that the programs do.
That’s a pro trick that will accompany you all your career.

So I did strace modprobe zfs

I did not use -v in here cause all the verbose would had been logged as a System Call and made more difficult my search.
I got the output of all the System Calls and I just had to look for which files were being read.

Then I found that zfs.conf under /etc/modprobe.d/zfs.conf
That was the one being read. So I commented the line and tried modprobe zfs and it worked perfectly. :)

 

An Epic fail that are committing all the universities

Article created on: 1528997557 | 2018-06-14 18:32:15 IST

Recently a mentor of the UCC university came to visit me to my office, in order to do the following of one of the members of my Team, an intern.
Conversation was well, and then at some point he asked what courses could do the university teach to their students in order to be more prepared for working with us.
The Head of Business Development, that was in the meeting with me, mentioned something interesting:
– Make the publish their best code in github, bitbucket or similar git repository, and maintain it. It is like a CV.
He pointed that some of the students sent me their repository page, and they have not committed a thing for more than a year. And usually the code that I find there is less than a tic-tac-toe exercise.
– Obviously, to have git experience.
– Having contributed to an Open Source project

I exposed some things that would be helpful to have in the interns and grads that I hire:
– git experience
– Python programming
– C programming
– Unit Testing experience
– Networking experience, in particular iSCSI exports, tcpdump
– Programming Best practices, PEP-8 at least for Python
– Usage of Professional Tools like PyCharm, JetBrains IntelliJ, PHPStorm, Code Lion, Netbeans, Eclipse
– Linux experience. Many of them use Windows at home cause they also play video games. Really few programmers in real life use windows. So at least guys install Virtual Box or VMWare and run Linux in an Virtual Machine.
– Cloud experience. Using instances, CDNs, APIs, tools…

And as the talking advanced I gave him a hint of the Epic fail that all the universities are committing.
They teach git for a semester. They teach Python for one or two semester, the first year usually one, the second year another. And that’s it. Is gone.
When they exit the university they have not programmed in Python for 2 or 3 years, they have not used git, they have not used SQL for the same amount of time, etc…

My boss pointed that the best candidates do side projects in their spare time, and have that bright in their eyes. That sparkling in the eyes is what I call the eye of the tiger, the desire to improve, to learn. That spark.

I told the mentor of my intern that the big mistake is doing things in small parcels, isolated, one block and is gone. That the best way to proceed would be to:
Make the student start a project from the very beginning, from the first semester. Then keep making it bigger and better over time.
Let them improve it over time. Screw it in all the ways possible. Make them reach the limits of their initial architecture. Allow them to face having to redo the thing from the scratch. Allow them to do screw it, to break things, and to learn from their mistakes. Over and over.

Nobody becomes a great programmer coding average things for two semesters.
But let them realize where the problems are. Let them come back to their code of two or three months ago, before holidays, and realize how important is to make comments, to give proper names to the files and to the variables. Let them run that project over so many time, that at some point they have to change computer and they realize that what worked with windows Uppercases and Lowercase mixed files, does not work with Unix (case sensitive).
Let them grow.
Let them see their mistakes over the time.

Let them run the project for so long so they switch several times from Cloud provider, and discover the pros and the cons and the not-to-do, and things like run for your life before using sharing hostings that limit your CPU quota even that kills your MySql instances when they look at the email (true history, connecting to POP3 was raising the CPU and the provider was killing the MySQL instances, and so the queries) or that limits your queries per second, and then ask them to install a drupal and they will learn the hard way why Quality is always better than price and will make the right decisions when they work for somebody else or for their own Startup.

Even many of the supposedly Senior guys never learned from their mistakes, for example the Outsourcing guys, cause they work 6 months to a year in a project and then jump to another. Nobody explains the hell in maintenance and incidental they have left there. Nobody teach them.

Programming an small project for 6 months doesn’t make a master. Doing it for 5 years, growing it, learning from your mistakes and learning the YES and DO-NOT the hard way, the real way that works, cause makes you understand why something is better than other things, is the path.

That also remembers me why I love the MT Notation and many of the guys in Barcelona that saw it criticized the method, while my colleagues at Facebook and Dropbox actually told me that they use it, specially for Python and C/C++.

Allow them to thing about how to solve sorting a list of 1000 items by themselves. Let them think. The lazy will copy, but they will not grow.

Then let them implement a Bubble sort. Let them improve it, if they can. Allow them a week to try to improve that. Then make them sort 1,000,000 items so they see that is bloody slow. How can I improve that?. May I read the data from the drive at once, reading line by line was slow… let them think. Like if they were learning Martial Arts, and so discovering their strengths, that they have fast reflexes, allow them to grow.

Universities have to create good professional, not just machines of passing the exams. Real world demands talent, problem solving abilities, passion, ability to learn, and will to do the things well and to improve, and discipline.

After 5-6 years of programming on a daily basis, with an IDE, git, deploying to the Cloud as the basic, and growing a program and seeing the downsides of the solutions chosen, observing that the caveats where for a reason, learning that the Hardware is important, that is not the same to write to memory that to disk or to network, detecting the problems, redoing things, ending in a cul-de-sac, fixing, improving, learning, growing the project, growing himself/herself as a mind, as a programmer, as a thinker, as an expert, daily, even if it’s 30 minutes per day, then that person is prepared for some serious business.

Like piano, guitar, painting, writing… and any other activity, one require continue training in order to improve.

Students have to follow a journey in order to improve.

Let them start with Command Line, i.e. in C and files. Let’s do add later database support.

Deal with buffer overflow, file descriptor, locks and conversion types. Let them migrate to another language the entire project, using Git from the beginning.

Let them migrate again when they need to add Web support. Allow them to discover that instead of reloading all the page they can use Ajax/JSON. Let them deal with click-click that many common users do on the page buttons (so they submit twice the information). To discover SQL Injections. To use a Web Framework. To add Unit Testing. Add some improvement via Javascript Frameworks like responsive for mobiles.

Allow them to use a new Database, new Webserver or technology that is fashion and everybody on Twitter talks about, so it crashes in their face. And so they discover that they will not play or discover new technologies in actual project time in the Company of their future employers, cause shit happens, and impacts the Schedule, and the Company loses money. Universities: Teach them, let the students learn this for themselves, rather than screwing it up in several companies after university.

A sample way to return in Python not-to-do

Today I was checking the code, the latest push to the git repo, as I always do, and I saw something that was wrong.

Often Engineers can be confused by the ways different languages treat similar operations, so similarly as POSIX I try to use an standard way to program in any language that makes the code very clear and easy to understand, no matter if it’s C, Java, Python, PHP…

My code and the code of my Teams will be clear, and easy to understand. And as the good Engineers jump from language to language upon the needs, is better for all to proceed like this to avoid confusions.

In this case I want to cover a simple case that I detected. A wrong usage.

The code was returning True on success and if not simply return.

Here I show a simple demonstration that return itself will be returning return None.

# Proof of Concept for avoiding return without the type
# Author: Carles Mateo
# Creation Date: 2018-03-27
#

from pprint import pprint

def boolean_test(b_value):
if b_value is False:
return

return True

b_true = boolean_test(True)
b_false = boolean_test(False)

pprint(b_true)
pprint(b_false)

if b_false is False:
print “I detect it as False (even if it’s None)”

if b_false is True:
print “I detect it as True (even if it’s None)”

if b_false is None:
print “It is None!”

print “Be careful”

Variables use the MT Notation. I include tips like this and guidelines in programming guide for my Teams.

See: Wiki Python Programming/Data Types

CSort multithread versus QuickSort with Java Source Code

Updated on 2017-04-04 12:58 Barcelona Time 1491303515:

  • A method writeValuesFromArrayListToDisk(String sFilename) has been introduced as per a request, to easily check that the data is properly sorted.
  • A silly bug in the final ArrayList generation has been solved. It was storing iCounter that was always 1 as this is not the compressed version, for supporting repeated numbers, of the algorithm. I introduced this method for the article, as it is not necessary for the algorithm as it is already sorted, and unfortunately I didn’t do a final test on the output. My fault.
  • Some JavaDoc has been updated

Past Friday I was discussing with my best friend about algorithms and he told me that hadoop is not fast enough, and about when I was in Amazon and as part of the test they asked me to defined an S3 system from the scratch, and I did using Java and multiple streams per file and per node (replication factor) and they told me that what I just created was the exact way their system works, and we ended talking about my sorting algorithm CSort, and he asked me if it could run in MultiThread. Yes, it is one of the advantages in front of QuickSort. Also it can run in multinode, different computers. So he asked me how much faster it would be a MultiThread version of CSort, versus a regular QuickSort.

Well here is the answer with 500 Million of registers, with values from 1 to 1000000, and deduplicating.

2017-03-26 18:50:41 CSort time in seconds:0.189129089
2017-03-26 18:51:47 QuickSort cost in seconds:61.853190885

That’s Csort is 327 times faster than QuickSort!. In this example and with my busy 4 cores laptop. In my 8 cores computer it is more than 525 times faster. Imagine in a Intel Xeon Server with a 64 cores!.

How is it possible? The answer is easy, it has O(n) complexity. I use linear access.

This depends on your universe of Data. Please read my original posting about CSort here, that explains it on detail.

Please note that CSort with compression is also available for keeping duplicated values and also saving memory (space) and time, with equally totally astonishing results.

Please note that in this sample I first load the values to an Array, and then I work from this. This is just to avoid bias by discarding the time of loading the data from disk, but, in the other article you have samples where CSort sorts at the same time that loads the data from disks. I have a much more advanced algorithm that self allocates the memory needed for handling an enormous universe of numbers (big numbers and small with no memory penalty), but I’m looking forward to discuss this with a tech giant when it hires me. ;) Yes, I’m looking for a job.

In my original article I demonstrated it in C and PHP, this time here is the code in Java. It uses the MT Notation.

You can download the file I used for the tests from here:

Obviously it runs much more faster than hashing. I should note that hashing and CSorting with .containsKey() is faster than QuickSorting. (another day I will talk about sorting Strings faster) ;)

/*
 * (c) Carles Mateo blog.carlesmateo.com
 * Proof of concept of CSort, with multithread, versus QuickSort
 * For the variables notation MT Notation is used.
 */
package com.carlesmateo.blog;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;

/**
 * @author carles mateo
 */
public class CSortMultiThread extends Thread {
    // Download the original file from the blog
    public static final int piNUM_REGISTERS_IN_FILE = 50000000;
    // Max value found, to optimize the memory used for CSort
    // Note that CSort can be implemented in the read from file mechanism
    // in order to save memory (space)
    public static int piMaxValue = 0;
    // The array containing the numbers read from disk
    public static int[] paiNumbers;
    // The array used by CSort (if not using direct loading from disk)
    public static int[] paiNumbersCsorted;
    // Final ArrayList Sorted. CSort and QuickSort finally fullfil this
    public static ArrayList<Integer> pliNumbers = new ArrayList<>();
   
    // For the Threads
    private Thread oT;
    private String sThreadName;
    private int piStart;
    private int piEnd;
    private boolean bFinished = false;
   
    CSortMultiThread (String name, int iStart, int iEnd) {
        sThreadName = name;
        piStart = iStart;
        piEnd = iEnd;
        writeWithDateTime("Creating " +  sThreadName );
    }
   
    public void run() {
        writeWithDateTime("Running " +  sThreadName + " to sort from " + piStart + " to " + piEnd);
        int iCounter;
        int iNumber;        

        for (iCounter=piStart; iCounter < piEnd; iCounter++) {
            iNumber = paiNumbers[iCounter];
            paiNumbersCsorted[iNumber] = 1;
        }          

      System.out.println("Thread " +  sThreadName + " exiting.");
      bFinished = true;
    }
   
    public void start () {
        writeWithDateTime("Starting " +  sThreadName );
        if (oT == null) {
            oT = new Thread (this, sThreadName);
            oT.start ();
        }
    }
    /**
     * Write values to Disk to demonstrate that are sorted ;)
     * @param sFilenameData 
     */
    private static void writeValuesFromArrayListToDisk(String sFilenameData) {
        ObjectOutputStream out = null;
        int iCounter;
        try {
            out = new ObjectOutputStream(new FileOutputStream(sFilenameData));
            
            for (iCounter=0; iCounter<pliNumbers.size(); iCounter++) {
                out.writeChars(pliNumbers.get(iCounter).toString() + "\n");
            }
            
            // To store the object instead
            //out.writeObject(pliNumbers);
            
        } catch (IOException e) {
            System.out.println("I/O Error!");
            e.printStackTrace();
            displayHelpAndQuit(10);
        } finally {
            if (out != null) {
                try {
                    out.close();    
                } catch (IOException e) {
                    System.out.println("I/O Error!");
                    e.printStackTrace();
                    displayHelpAndQuit(10);
                }   
            }
        }
    }
   
    /** 
     *  Reads the data from the disk. The file has 50M and we will be duplicating
     *  to get 500M registers.
     *  @param sFilenameData 
     */
    private static void readValuesFromFileToArray(String sFilenameData) {

        BufferedReader oBR = null;
        String sLine;
        int iCounter = 0;
        int iRepeat;
        int iNumber;
        
        // We will be using 500.000.000 items, so dimensionate the array
        paiNumbers = new int[piNUM_REGISTERS_IN_FILE * 10];

        try {

            oBR = new BufferedReader(new FileReader(sFilenameData));

            while ((sLine = oBR.readLine()) != null) {
                for (iRepeat = 0; iRepeat < 10; iRepeat++) {
                    int iPointer = (piNUM_REGISTERS_IN_FILE * iRepeat) + iCounter;
                    iNumber = Integer.parseInt(sLine);
                    paiNumbers[iPointer] = iNumber;
                    if (iNumber > piMaxValue) {
                        piMaxValue = iNumber;
                    }
                }                
                
                iCounter++;
            }
            
            if (iCounter < piNUM_REGISTERS_IN_FILE) {
                write("Warning... only " + iCounter + " values were read");
            }

        } catch (FileNotFoundException e) {
            System.out.println("File not found! " + sFilenameData);
            displayHelpAndQuit(100);
        } catch (IOException e) {
            System.out.println("I/O Error!");
            e.printStackTrace();
            displayHelpAndQuit(10);
        } finally {
            if (oBR != null) {
                try {
                    oBR.close();
                } catch (IOException e) {
                    System.out.println("I/O Error!");
                    e.printStackTrace();
                    displayHelpAndQuit(10);
                }
            }
        }        
    }
    
    private static String displayHelp() {
        String sHelp = "Help\n" +
                "====\n" +
                "Csort from Carles Mateo blog.carlesmateo.com\n" +
                "\n" +
                "Proof of concept of the fast load algorithm\n" +
                "\n";
        
        return sHelp;
    }
    
    /**
     * Displays Help Message and Quits with and Error Level
     * Errors:
     *  - 1   - Wrong number of parameters
     *  - 10  - I/O Error
     *  - 100 - File not found
     * @param iErrorLevel 
     */
    private static void displayHelpAndQuit(int iErrorLevel) {
        System.out.println(displayHelp());
        System.exit(iErrorLevel);
        
    }
    
    // This is QuickSort from vogella http://www.vogella.com/tutorials/JavaAlgorithmsQuicksort/article.html
    public static void sort() {
        int piNumber = paiNumbers.length;
        quicksort(0, piNumber - 1);        
    }
    
    private static void quicksort(int low, int high) {
        int i = low, j = high;
        // Get the pivot element from the middle of the list
        int pivot = paiNumbers[low + (high-low)/2];

        // Divide into two lists
        while (i <= j) {
            // If the current value from the left list is smaller then the pivot
            // element then get the next element from the left list
            while (paiNumbers[i] < pivot) {
                i++;
            }
            // If the current value from the right list is larger then the pivot
            // element then get the next element from the right list
            while (paiNumbers[j] > pivot) {
                j--;
            }

            // If we have found a values in the left list which is larger then
            // the pivot element and if we have found a value in the right list
            // which is smaller then the pivot element then we exchange the
            // values.
            // As we are done we can increase i and j
            if (i <= j) {
                exchange(i, j);
                i++;
                j--;
            }
        }
        // Recursion
        if (low < j)
            quicksort(low, j);
        if (i < high)
            quicksort(i, high);
    }

    private static void exchange(int i, int j) {
        int temp = paiNumbers[i];
        paiNumbers[i] = paiNumbers[j];
        paiNumbers[j] = temp;
    }
    
    /** 
     * We want to remove duplicated values
     */
    private static void removeDuplicatesFromQuicksort() {
        int iCounter;
        int iOldValue=-1;
        int iNewValue;
        
        for (iCounter=0; iCounter<paiNumbers.length; iCounter++) {
            iNewValue = paiNumbers[iCounter];
            if (iNewValue != iOldValue) {
                iOldValue = iNewValue;
                pliNumbers.add(iNewValue);
            }
        }
    }
// End of vogella QuickSort code

    /**
     * Generate the final Array
     */
    private static void copyFromCSortToArrayList() {
        int iCounter;
        int iNewValue;
        
        for (iCounter=0; iCounter<=piMaxValue; iCounter++) {
            iNewValue = paiNumbersCsorted[iCounter];
            if (iNewValue > 0) {
                pliNumbers.add(iCounter);
            }
        }
    }

    /**
     * Write with the date
     * @param sText 
     */
    private static void writeWithDateTime(String sText) {
        DateFormat oDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date oDate = new Date();
        String sDate = oDateFormat.format(oDate);
        write(sDate + " " + sText);
    }
    
    /**
     * Write with \n
     * @param sText 
     */
    private static void write(String sText) {
        System.out.println(sText + "\n");
    }    
   
    public static void main(String args[]) throws InterruptedException {
        // For Profiling
        long lStartTime;
        double dSeconds;
        long lElapsedTime;
        
        int iThreads = 8;
        int iRegistersPerThread = piNUM_REGISTERS_IN_FILE / iThreads;
        CSortMultiThread[] aoCSortThreads = new CSortMultiThread[iThreads];
        
        writeWithDateTime("CSort MultiThread proof of concept by Carles Mateo");
        writeWithDateTime("Reading values from Disk...");
        readValuesFromFileToArray("/home/carles/Desktop/codi/java/carles_sort.txt");
        writeWithDateTime("Readed");

        writeWithDateTime("Total values to sort de deduplicate " + paiNumbers.length);
        writeWithDateTime("The max value between the Data is " + piMaxValue);
        paiNumbersCsorted = new int[piMaxValue + 1];
                
        writeWithDateTime("Performing CSort with removal of duplicates");
        lStartTime = System.nanoTime();
        for (int iThread=0; iThread < iThreads; iThread++) {
            int iStart = iThread * iRegistersPerThread;
            int iEnd = ((iThread + 1) * iRegistersPerThread) - 1;
            if (iThread == (iThreads -1)) {
                // Last thread grabs the remaining. 
                // For instance 100/8 = 12 so each Thread orders 12 registers,
                // but last thread orders has 12 + 4 = 16
                iEnd = piNUM_REGISTERS_IN_FILE -1 ;
            }
            CSortMultiThread oThread = new CSortMultiThread("Thread-" + iThread, iStart, iEnd);
            oThread.start();
            
            aoCSortThreads[iThread] = oThread;
        }       
        
        boolean bExit = false;
        while (bExit == false) {
            bExit = true;
            for (int iThread=0; iThread < iThreads; iThread++) {
                if (aoCSortThreads[iThread].bFinished == false) {
                    bExit = false;
                    // Note: 10 milliseconds. This takes some CPU cycles, but we need
                    // to ensure that all the threads did finish.
                    sleep(10);
                    continue;
                }
            }
        }
        writeWithDateTime("Main loop ended");
                
        writeWithDateTime("Copy to the ArrayList");
        copyFromCSortToArrayList();
        
        writeWithDateTime("The final array contains " + pliNumbers.size());
        
        lElapsedTime = System.nanoTime() - lStartTime;
        dSeconds = (double)lElapsedTime / 1000000000.0;
        writeWithDateTime("CSort time in seconds:" + dSeconds);

        writeWithDateTime("Writing values to Disk...");
        writeValuesFromArrayListToDisk("/home/carles/Desktop/codi/java/carles_sort-csorted.txt");

        // Reset the ArrayList
        pliNumbers = new ArrayList<>();
        
        lStartTime = System.nanoTime();
        /** QuickSort begin **/
        writeWithDateTime("Sorting with QuickSort");
        sort();
        writeWithDateTime("Finished QuickSort");
        
        writeWithDateTime("Removing duplicates from QuickSort");
        removeDuplicatesFromQuicksort();
        writeWithDateTime("The final array contains " + pliNumbers.size());
        lElapsedTime = System.nanoTime() - lStartTime;
        dSeconds = (double)lElapsedTime / 1000000000.0;
        
        writeWithDateTime("QuickSort cost in seconds:" + dSeconds);
    }
}

The complete traces:

run:
2017-03-26 19:28:13 CSort MultiThread proof of concept by Carles Mateo
2017-03-26 19:28:13 Reading values from Disk...
2017-03-26 19:28:39 Readed
2017-03-26 19:28:39 Total values to sort de deduplicate 500000000
2017-03-26 19:28:39 The max value between the Data is 1000000
2017-03-26 19:28:39 Performing CSort with removal of duplicates
2017-03-26 19:28:39 Creating Thread-0
2017-03-26 19:28:39 Starting Thread-0
2017-03-26 19:28:39 Creating Thread-1
2017-03-26 19:28:39 Starting Thread-1
2017-03-26 19:28:39 Running Thread-0 to sort from 0 to 6249999
2017-03-26 19:28:39 Creating Thread-2
2017-03-26 19:28:39 Starting Thread-2
2017-03-26 19:28:39 Running Thread-1 to sort from 6250000 to 12499999
2017-03-26 19:28:39 Creating Thread-3
2017-03-26 19:28:39 Running Thread-2 to sort from 12500000 to 18749999
2017-03-26 19:28:39 Starting Thread-3
2017-03-26 19:28:39 Creating Thread-4
2017-03-26 19:28:39 Starting Thread-4
2017-03-26 19:28:39 Running Thread-3 to sort from 18750000 to 24999999
2017-03-26 19:28:39 Creating Thread-5
2017-03-26 19:28:39 Running Thread-4 to sort from 25000000 to 31249999
2017-03-26 19:28:39 Starting Thread-5
2017-03-26 19:28:39 Creating Thread-6
2017-03-26 19:28:39 Starting Thread-6
2017-03-26 19:28:39 Running Thread-5 to sort from 31250000 to 37499999
2017-03-26 19:28:39 Creating Thread-7
2017-03-26 19:28:39 Starting Thread-7
2017-03-26 19:28:39 Running Thread-6 to sort from 37500000 to 43749999
2017-03-26 19:28:39 Running Thread-7 to sort from 43750000 to 49999999

Thread Thread-0 exiting.
Thread Thread-2 exiting.
Thread Thread-1 exiting.
Thread Thread-6 exiting.
Thread Thread-7 exiting.
Thread Thread-5 exiting.
Thread Thread-4 exiting.
Thread Thread-3 exiting.
2017-03-26 19:28:39 Main loop ended

2017-03-26 19:28:39 Copy to the ArrayList
2017-03-26 19:28:39 The final array contains 1000001
2017-03-26 19:28:39 CSort time in seconds:0.189129089

2017-03-26 19:28:39 Sorting with QuickSort
2017-03-26 19:29:40 Finished QuickSort
2017-03-26 19:29:40 Removing duplicates from QuickSort
2017-03-26 19:29:41 The final array contains 1000001
2017-03-26 19:29:41 QuickSort cost in seconds:61.853190885

BUILD SUCCESSFUL (total time: 1 minute 28 seconds)