Note 2019-05-28: This article was written in 2013. It is still valid, but since then 10Gbps have dropped in price a lot. In my DRAID Solution I’ve qualified 10GbE based in copper, RJ45, and for fiber: 10, 25, 40 and 100 Gbps. Mellanox switches and NICs are a reference. 10GbE based in copper are cheap and easy to deploy, as you can reuse existing infrastructure and grow your segments. https://en.wikipedia.org/wiki/10_Gigabit_Ethernet
I’ve found this problem in several companies, and I’ve had to show their error and convince experienced SysAdmins, CTOs and CEO about the erroneous approach. Many of them made heavy investments in NAS, that they are really wasting, and offering very poor performance.
Normally the rack servers have their local disks, but for professional solutions, like virtual machines, blade servers, and hundreds of servers the local disk are not used.
NAS – Network Attached Storage- Servers are used instead.
This NAS Servers, when are powerful (and expensive) offer very interesting features like hot backups, hot backups that do not slow the system (the most advanced), hot disk replacement, hot increase of total available space, the Enterprise solutions can replicate and copy data from different NAS in different countries, etc…
Smaller NAS are also used in configurations like Webservers’ Webfarms, were all the nodes has to have the same information replicated, and when a used uploads a new profile image, has to be available to all the webservers for example.
In this configurations servers save and retrieve the needed data from the NAS Servers, through LAN (Local Area Network).
The main error I have seen is that no one ever considers the pipe where all the data is travelling, so most configurations are simply Gigabit, and so are bottleneck.
This enclosure hosts 16 servers, hot plugable, with up to two CPU’s each blade, we also call those blade servers “pizza” (like we call before to rack servers).
A common use is to use those servers to have Vmware, OpenStack, Xen or other virtualization software, so the servers run instances of customers. In this scenario the virtual disks (the hard disk of the virtual machines) are stored in the NAS Server.
So if a customer shutdown his virtual server, and start it later, the physical server where its virtual machine is running will be another, but the data (the disk of the virtual server) is stored in the NAS and all the data is saved and retrieved from the NAS.
The enclosure is connected to the NAS through a Gigabit connection, as 10 Gigabit connections are still too expensive and not yet supported in many servers.
Once we have explained that, imagine, those 16 servers, each with 4 or 5 virtual machines, accessing to their disks through a Gigabit connection.
If only one of these 80 virtual machines is accessing to disk, the will be no problem, but if more than one is accessing the Gigabit connection, that’s a maximum of 125 MB (Megabytes) per second, will be shared among all the virtual machines.
So imagine, 70 virtual machines are accessing NAS to serve web pages, with not much traffic, OK, but the other 10 virtual machines are doing heavy data transmission: for example one is serving data through FTP server, the other is broadcasting video, the other is copying heavy log files, and so… Imagine that scenario.
The 125 MB per second is divided between the 80 servers, so those 10 servers using extensively the disk will monopolize the bandwidth, but even those 10 servers will have around 12,5 MB each, that is 100 Mbit each and is very slow.
Imagine one of the virtual machines broadcast video. To broadcast video, first it has to get it from the NAS (the chunks of data), so this node serving video will be able to serve different videos to few customers, as the network will not provide more than 12,5 MB under the circumstances provided.
This is a simplified scenario, as many other things has to be taken in count, like the SATA, SCSI and SAS disks do not provide sustained speeds, speed depends on locating the info, fragmentation, etc… also has to be considered that NAS use protocol iSCSI, a sort of SCSI commands sent through the Ethernet. And Tcp/Ip uses verifications in their protocol, and protocol headers. That is also an overhead. I’ve considered only traffic in one direction, so the servers downloading from the NAS, as assuming Gigabit full duplex, so Gigabit for sending and Gigabit for receiving.
So instead of 125 MB per second we have available around 100 MB per second with a Gigabit or even less.
Also the virtualization servers try to handle a bit better the disk access, by keeping a cache in memory, and not writing immediately to disk.
So you can’t do dd tests in virtual machines like you would do in any Linux with local disks, and if you do go for big files, like 10 GB with random data (not just 0, they have optimizations for that).
Let’s recalculate it now:
70 virtual machines using as low as 0.10 MB/second each, that’s 7 MB/second. That’s really optimist as most webservers running PHP read many big files for attending a simple request and webservers server a lot of big images.
10 virtual machines using extensively the NAS, so sharing 100 MB – 7 MB = 93 MB. That is 9.3 MB each.
So under these circumstances for a virtual machine trying to read from disk a file of 1 GB (1000 MB), this operation will take 107 seconds, so 1:47 minutes.
So with this considerations in mind, you can imagine that the performance of the virtual machines under those configurations are leaved to the luck. The luck that nobody else of the other guests in the servers are abusing the disk I/O.
I’ve explained you in a theoretical plan. Sadly reality is worst. A lot worst. Those 70 web virtual machines with webservers will be so slow that they will leave your company very disappointed, and the other 10 will not even be happier.
One of the principal problems of Amazon EC2 has been always disk performance. Few months ago they released IOPS, high performance disks, that are more expensive, but faster.
It has to be recognized that in Amazon they are always improving.
They have also connection between your servers at 10 Gbit/second.
Returning to the Blades and NAS, an easy improvement is to aggregate two Gigabits, so creating a connection of 2 Gbit. This helps a bit. Is not the solution, but helps.
Probably different physical servers with few virtual machines and a dedicated 1 Gbit connection (or 2 Gbit by 1+1 aggregated if possible) to the NAS, and using local disks as much as possible would be much better (harder to maintain at big scale, but much much better performance).
But if you provide infrastructure as a Service (IaS) go with 2 x 10 Gbit Fibre aggregated, so 20 Gigabit, or better aggregate 2 x 20 Gbit Fibre. It’s expensive, but crucial.
Now compare the 9.3 MB per second, or even the 125 MB theoretical of Gigabit of the average real sequential read of 50 MB/second that a SATA disk can offer when connected on local, or nearly the double for modern SAS 15.000 rpm disks… (writing is always slower)
… and the 550 MB/s for reading and 550 MB/s for writing that some SSD disks offer when connected locally. (I own two OSZ SSD disks that performs 550 MB).
I’ve seen also better configuration for local disks, like a good disk controller with Raid 5 and disks SSD. With my dd tests I got more than 900 MB per second for writing!.
So if you are going to spend 30.000 € in your NAS with SATA disks (really bad solution as SATA is domestic technology not aimed to work 24×7 and not even fast) or SAS disks, and 30.000 € more in your blade servers, think very well what you need and what configuration you will use. Contact experts, but real experts, not supposedly real experts.
Otherwise you’ll waste your money and your customers will have very very poor performance on these times where applications on the Internet demand more and more performance.