Ignite Day 3: Getting My Nerd On

Geek Speak


Aside from the expo hall, I spent a long day learning about the technical details of Windows Server 2016, Azure, Hyper-V and more. Microsoft has done an excellent job of providing speakers who are leaders or senior members of the product teams, and in several sessions they had a few others “on the bench.” These folks sat in the audience to ensure that questions were answered. In only one of the sessions did I hear, “I’ll need to get back to you.”

Windows Server 2016
As you can imagine, there are a ton of new features and improvements in the new version. It was released at the beginning of Ignite, so if you manage servers, now is the time to get up to speed if you have not already.

As an architect of cloud solutions, I have been watching storage infrastructure evolve very rapidly. SANs ruled this space for years. Then in Windows Server 2012/2012 R2 we were able to deploy Storage Spaces and Scale-Out File Servers (SOFS). With these technologies, inexpensive disk shelves and converged networks dramatically reduced cost while improving speed and maintaining high availability.

Converged networks mean all traffic (storage and data) uses the same network. Obviously this greatly simplifies the network architecture and increases reliability. RDMA (Remote Direct Memory Access) is used to keep data moving quickly between disk and memory, and it also significantly reduces the CPU overhead of storage access.

Finally, disk technologies have improved with the advent of SSDs and NVMe. Although capacities have grown and SSD prices have fallen somewhat, these are still much more expensive than traditional hard disks.

So what’s new in this area of Windows Server 2016?

First of all, what I am about to describe is in Windows Server 2016 Datacenter edition. As always, be sure to check the features available in the particular edition you plan to deploy.

Server Storage Health
With Windows Server 2016, Microsoft has incorporated storage health monitoring and statistics into the Windows kernel. This makes it much more stable, less resource intensive and more comprehensive. There is a new set of storage health PowerShell commands that provides fast, easy access to a ton of data, everything from the health of individual disks to average IOPS and latency across an entire storage pool.
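
If you want to poke at the new health service yourself, a minimal sketch looks something like this (run from an elevated PowerShell session on a 2016 cluster node; the exact output will obviously depend on your hardware):

    # Overall health and performance rollup for the clustered storage subsystem
    Get-StorageSubSystem -FriendlyName *Cluster* | Get-StorageHealthReport

    # Per-disk health at a glance
    Get-PhysicalDisk | Select-Object FriendlyName, MediaType, HealthStatus, OperationalStatus, Usage

    # Any outstanding faults the health service has raised
    Get-StorageSubSystem -FriendlyName *Cluster* | Debug-StorageSubSystem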

One demo showed how some fairly simple PowerShell scripts could gather real-time data, which was then fed into Grafana to create a web-based dashboard. You could use Power BI or any other business intelligence tool of your choosing.
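
Their exact script wasn’t shared, but conceptually the collection side could be as simple as this hypothetical polling loop (the output path and interval are my own placeholders; a production setup would more likely push samples into a time-series database that Grafana reads):

    # Hypothetical sampler: grab a health report every 30 seconds and append it
    # as a timestamped JSON record that a dashboard tool can consume
    while ($true) {
        $report = Get-StorageSubSystem -FriendlyName *Cluster* | Get-StorageHealthReport
        [pscustomobject]@{
            Timestamp = (Get-Date).ToString('o')
            Report    = $report
        } | ConvertTo-Json -Depth 5 -Compress | Add-Content -Path 'C:\Dashboards\storage-health.json'
        Start-Sleep -Seconds 30
    }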


Hyper Convergence
Obviously health monitoring and reporting are necessary and awesome. I mentioned “converged” networks earlier, so what is “hyper-converged”? For those of us who work on high availability (that’s all of us, right?), this is making our job a lot easier. Hyper-converged means that all resources (compute, storage and network) are on all servers.

One of the effects of this is we no longer have to have a separate storage platform (e.g. SAN or SOFS). We can build a set of identical servers – same RAM, same CPU and a bunch of local storage. Then we can use these servers to provide highly available services (e.g. VMs) without anything else (except for some network switches).

Using Storage Spaces Direct (S2D) and RDMA, Server B has direct, high-speed access to storage on Server A. This not only simplifies the design and deployment, it also lowers our cost. At the same time, other storage features dramatically improve performance and further lower disk hardware cost.
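
If you have never seen it, standing up a basic S2D hyper-converged cluster really is only a handful of commands. A rough sketch, assuming four nodes I’ll call NODE1 through NODE4 (the cluster name, node names and volume size are just examples):

    # Validate the nodes, including the Storage Spaces Direct checks
    Test-Cluster -Node NODE1, NODE2, NODE3, NODE4 -Include "Storage Spaces Direct", Inventory, Network, "System Configuration"

    # Create the cluster without any shared storage
    New-Cluster -Name S2D-CLU -Node NODE1, NODE2, NODE3, NODE4 -NoStorage

    # Claim the local disks in every node into a single pool
    Enable-ClusterStorageSpacesDirect

    # Carve out a resilient, cluster-wide volume to host VMs
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName Volume01 -FileSystem CSVFS_ReFS -Size 2TB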

Storage Design
RAID1, RAID5, RAID6, etc. have been around for years, and there are tradeoffs between speed and storage efficiency. Storage efficiency is the usable capacity vs. the raw disk capacity. RAID1 is fastest, but at a cost of 50% efficiency: two disks get you the capacity of one, and a three-disk mirror gets you 33%.

No longer do we need to have all the disks in the same chassis. What? With the advancements in storage networking I described, the disks can now reside in different servers/chassis and still make up a redundancy set. Again, this means we don’t need fancy cables or shared storage buses; the traffic goes over Ethernet.

You can set this up on as few as two servers, and it scales all the way up to 16 servers and more than 400 disks. That enables over 3 petabytes (PB) in a single cluster! These high-performance, highly available designs do require some flash drives for caching (SSD or NVMe); this is the only way to ensure the kind of IOPS we need. In fact, Microsoft is setting world records for IOPS and storage throughput. Their maximum to date is 6,000,000 IOPS (100% 4K read, N = 16, 4 x P3700 NVMe per node, Intel Xeon E5-2699 v4, Chelsio 40 GbE). If you are not familiar with NVMe and maximum IOPS and throughput are critical for your workloads, look it up. Yes, NVMe drives are more expensive per GB than SSD, but they are cheaper per IOP by more than 50%.

Back to the storage efficiency conversation. Microsoft Research has developed new algorithms and designs that deliver much greater efficiency while also improving speed.

Typically, in a three-disk mirror you get 33.3% efficiency (300 TB physical = 100 TB usable). With four or more servers you can sustain up to two failures and achieve up to 80% efficiency. As an example, with 300 TB of physical storage you can have 240 TB of usable space! However, parity schemes like this introduce some issues. First, to rebuild from one failure you have to read from every column, which chews up IOPS. Second, every write requires an update to the erasure coding, which uses CPU.

Microsoft Research has figured out a way to extend erasure coding to hard disks, and this is what can provide up to 80% efficiency. Accelerated erasure coding solves the issues I mentioned by using an SSD/NVMe mirror for hot data and HDD for cold data. Data is written to the mirror first, which is extremely fast; when the mirror fills up or the data goes cold, it is moved to the slower HDDs.
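
In Windows Server 2016 this shows up when you create a volume with both a mirror tier and a parity tier on an S2D pool. A hedged sketch, assuming the Performance (mirror) and Capacity (parity) tiers that enabling S2D typically defines (the volume name and tier sizes are mine; check Get-StorageTier for what your pool actually created):

    # See which tiers the pool defined and how each one is resilient
    Get-StorageTier | Select-Object FriendlyName, ResiliencySettingName, MediaType

    # Mirror-accelerated parity: hot writes land on the mirror tier,
    # cold data is rotated down to the erasure-coded tier on HDD
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName Archive01 -FileSystem CSVFS_ReFS `
        -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes 100GB, 900GB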

From a redundancy standpoint, there is now an option to designate chassis or racks to ensure fault tolerance across a chassis (e.g. blade chassis) or rack failure. If you designate them, data is automatically written across multiple chassis or racks in the same way that we write across multiple disks, to prevent data loss even if hardware fails.
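
Defining those fault domains is, as far as I can tell, just a few cmdlets run before you enable Storage Spaces Direct (the rack and node names below are examples):

    # Describe the physical layout
    New-ClusterFaultDomain -Name Rack1 -Type Rack
    New-ClusterFaultDomain -Name Rack2 -Type Rack

    # Tell the cluster which rack each node lives in
    Set-ClusterFaultDomain -Name NODE1 -Parent Rack1
    Set-ClusterFaultDomain -Name NODE2 -Parent Rack1
    Set-ClusterFaultDomain -Name NODE3 -Parent Rack2
    Set-ClusterFaultDomain -Name NODE4 -Parent Rack2

    # Review the resulting topology
    Get-ClusterFaultDomain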

Disk pools are also now self-healing. If a disk fails, data is automatically evacuated to other disks, and information on the failed disk (down to the make, model and serial number) is sent up through the health service. When the disk is replaced, the pool automatically recognizes the new disk and begins reallocating data across it.


Using this same function, we can take a disk offline, replace it with a larger disk, wait for the automatic repair, and then move on to the next disk, ultimately increasing the capacity of the pool simply by upgrading the disks one by one. Obviously you MUST wait for the repair to finish after each disk swap. Gone are the days of having to build new RAID arrays and migrate data just to use larger disks.
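
The swap workflow looks roughly like this sketch (the serial number is a placeholder; the important part is letting Get-StorageJob report the repair as complete before you touch the next disk):

    # Retire the disk you are about to pull
    $old = Get-PhysicalDisk -SerialNumber 'Z1X2C3V4'
    Set-PhysicalDisk -InputObject $old -Usage Retired

    # Trigger the rebuild and watch the repair jobs until they complete
    Get-StoragePool S2D* | Get-VirtualDisk | Repair-VirtualDisk -AsJob
    Get-StorageJob

    # Only after the repair finishes: drop the retired disk from the pool, then swap the hardware
    Get-StoragePool S2D* | Remove-PhysicalDisk -PhysicalDisks $old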

Windows Defender
No, it’s not a typo. Defender has been steadily improving over the years and it is excellent protection for clients. Now it is on by default in Windows Server 2016. The GUI isn’t installed by default, but it is easy to add. No longer are your servers deployed unprotected, and you may decide that an add-on anti-malware solution is no longer a requirement on your servers.
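
Two quick commands if you want to verify it or add the GUI (the feature name below is what I believe ships with 2016; confirm with Get-WindowsFeature *Defender* on your build):

    # Confirm the Defender engine is active on the server
    Get-MpComputerStatus | Select-Object AMServiceEnabled, RealTimeProtectionEnabled, AntivirusSignatureLastUpdated

    # Add the management GUI; the engine is already running without it
    Install-WindowsFeature -Name Windows-Defender-GUI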

Funny side note: the hip techs refer to infrastructure as “infra.”

I have one more day here and am excited to share some of my Day 4 with you. I also have a growing list of future posts on specific topics. And I’ve got a list of things I will add or change in my sessions for the upcoming Tour de Cloud, which starts next week!

PS - Join us for the Tour de Cloud workshops in California next week and Florida the following week. Learn more and sign up at www.tourdecloud.com