
SMB 3.1 Quick Overview

Microsoft has announced preview details about SMB 3.1, with the emphasis on improved security.

SMB 3.0 already offered the ability to encrypt data packets. SMB 3.0 also offered an enhanced level of security that includes signing to detect man-in-the-middle attacks. The key shortcoming of SMB 3.0 (addressed in SMB 3.1) is that the SMB 3.0 signing algorithm first negotiates signing keys. These negotiation packets are vulnerable to a man-in-the-middle attack that can cause the negotiated SMB protocol level to be downgraded to CIFS (SMB 1), which is completely vulnerable. Effectively, somebody can bypass the SMB 3.0 security features by making sure the data is exchanged using older, less secure protocols instead of SMB 3.0. SMB 3.1 allows both the client and the server to detect such attacks.

SMB 3.0 supports AES-128-CCM as the sole encryption algorithm. SMB 3.1 extends the encryption capability in two ways:

  • SMB 3.1 allows for negotiation of the encryption algorithm, which makes the encryption capability extensible
  • SMB 3.1 introduces AES-128-GCM as an encryption algorithm. AES-128-GCM is as secure as AES-128-CCM, but much less expensive to compute, which enables higher IOPS and throughput.

Separately, SMB 3.1 continues to support SMB Multichannel, where TCP channels are aggregated at the SMB protocol layer for both speed and reliability.

Finally, SMB 3.1 introduces the capability to have a mixed cluster where some cluster nodes are running SMB 3.0 and some are running SMB 3.1. The SMB 3.1 enhancements ensure that clients that connect to an SMB 3.1 node fail over only to a node that is also running SMB 3.1.
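As an aside (this is not part of Microsoft's announcement, just a quick way to poke at your own estate, assuming the SMB cmdlets that shipped with Windows Server 2012 and Windows 8 carry forward unchanged), PowerShell already exposes the negotiated dialect and the encryption knob:

    # On an SMB client: list the dialect negotiated for each open connection.
    # Against an SMB 3.1 capable server, the Dialect column should show the
    # newer protocol revision.
    Get-SmbConnection | Select-Object ServerName, ShareName, Dialect

    # On an SMB server: require encryption for all shares. The algorithm used
    # is whatever the protocol negotiates (AES-128-CCM today, AES-128-GCM once
    # both ends speak SMB 3.1).
    Set-SmbServerConfiguration -EncryptData $true -Force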


Why NAS hosting Hyper-V VDI VMs needs to support SMB 2 not just SMB 3

If you spend some time on storage startup websites, whether they are providing flash storage, converged storage/compute, or VM-aware storage, you will find that all of them seem to treat VDI as low-hanging fruit. They all have a dedicated description of how they can supply a great VDI storage solution.

Hyper-V requires a NAS to be SMB 3.0 capable in order to host Hyper-V VM files. This is reasonable, given that SMB 3.0 provides speed and reliability that SMB 2.X cannot.

But while Microsoft Hyper-V 2012 imposes the requirement that the NAS be SMB 3.0 capable, customer requirements often impose an additional requirement that the NAS also be SMB 2.X capable. And some other requirements as well, as we shall shortly see.

This is because many customers use Hyper-V to run VDI VMs. If these VMs run Microsoft Windows 7, the VMs are only SMB 2 capable. Further, a typical use of VDI VMs is to redirect all logged-in users’ home directories to a NAS share. This is where the Microsoft Excel, Word, and PowerPoint files generated by the VDI VM users are stored.

And the customer will always ask, “I just bought a NAS to store my Hyper-V VMs. Why can’t the same NAS also offer a share for the user home directories?” That is where the additional requirement comes in: the Hyper-V capable SMB 3.0 NAS should not only offer SMB 2.X support, but also richer support for oplocks, byte range locks, and other such features used by Microsoft Office, but not by Hyper-V.

This is why a particular company in which I have an interest, www.HvNAS.com, has implemented both SMB 2 and SMB 3, and regularly tests that its protocol stack implements the full range of SMB 2 and SMB 3 features, especially the SMB features regularly exercised by Microsoft Office.

Windows Write Caching – Part 1: Overview


Certain Windows applications, such as database applications, need to ensure their I/O is committed to media, even at the cost of reduced throughput. However, at times an administrator has faith in the hardware and is willing to accept a small risk of data corruption in exchange for higher throughput by allowing caching to occur.

This is a three-part blog series that concentrates on write caching behavior in the Windows storage stack.

  1. Part 1 presents an overview of the Windows storage stack with specific reference to write caching
  2. Part 2 presents the “knobs” an application programmer can twist and turn to affect write caching
  3. Part 3 presents the “knobs” a system administrator can twist and turn to affect write caching

Windows Storage Stack

Figure 1 Windows Storage I/O Stack

Figure 1 shows a simplified overview of the Windows Storage I/O stack. Starting from the top of Figure 1,

  • Applications make I/O requests. Figure 1 concentrates on write requests and hence the unidirectional arrows from the application towards the disk media.
  • Depending upon the nature of the I/O (decided partly by the way the application opens a file or volume), some I/O requests completely bypass the Windows Cache Manager and go straight from the file system to the Volume Manager layer. This is labeled Unbuffered I/O in Figure 1. As will be explained later, applications can ensure that their I/O is Unbuffered.
  • Alternatively, application I/O may traverse the buffered path labeled in Figure 1. While applications may strive to ensure their I/O is buffered, in reality there is no way to guarantee this. Whether I/O is buffered depends upon a number of factors, such as the nature of the file open, the type of I/O, the history of the application's I/O, the load on the system, etc.
  • The Volume Manager performs sector I/O. While the application may strive to ensure that there is no caching at the sector I/O level, the reality is that applications have only limited success in some cases. This is discussed in more detail later in this series.

Different types of Write Caching

Irrespective of whether the data is written at the file system level or at the disk block level, write caching can be broadly classified into two categories:

  • Write-through caching: data is written to the cache AND also written to non-volatile media. Data integrity is high, but write performance is slower, whereas read performance is enhanced.
  • Write-back caching: data is written to the cache, the operating system write request is completed, and the data is lazily written to media at a later point in time. Write-back caching emphasizes write performance, but at the possible cost of data integrity.

Part 2 of this blog will describe the APIs an application developer can use to control write caching behavior.
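As a small preview of Part 2, and purely as an illustration (using .NET from PowerShell rather than the native CreateFile call that the next part will cover), the write-through knob is exposed at file open time:

    # Hypothetical demo path; FileOptions.WriteThrough corresponds to the Win32
    # FILE_FLAG_WRITE_THROUGH flag, i.e. write-through rather than write-back.
    $path = "C:\Temp\writethrough-demo.dat"
    $data = New-Object byte[] 4096

    $fs = New-Object System.IO.FileStream -ArgumentList $path, `
            ([System.IO.FileMode]::Create), ([System.IO.FileAccess]::Write), `
            ([System.IO.FileShare]::None), 4096, ([System.IO.FileOptions]::WriteThrough)
    try {
        $fs.Write($data, 0, $data.Length)
        $fs.Flush($true)   # Flush(true) also asks the OS to flush its buffers to the device
    } finally {
        $fs.Dispose()
    }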

VHDX file investigations on regular spinning disk

Refer to my earlier blog on VHDX file best practices. This blog explains the details behind how I arrived at those recommended best practices. Based upon a TechEd talk, I ran a series of experiments with VHDX files. While the investigations led to the summary published in my previous blog, some unanswered questions remain. So far, my attempts to find answers to these questions have been unfruitful. Presumably the appropriate Microsoft personnel are extremely busy. Once I publish what I found, I will publish my questions as purely that – questions – in the hope that some fellow MVPs may have some insight.

For the investigation, I created a Windows 8 based guest VM running on a Windows Server 2012 Hyper-V host. I created a dynamically expanding VHDX file and attached it as a SCSI attached volume to the Windows 8 guest. The investigation consists of performing a series of operations on the Windows 8 SCSI attached volume and observing the size of the VHDX file backing that SCSI attached volume. For the purposes of this particular blog, the VHDX file was placed on an external USB spinning hard disk, which was not thin provisioning capable.
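For readers who want to reproduce the setup, the host-side commands were along these lines (the path and VM name here are illustrative placeholders, not the exact ones from my test machine):

    # On the Hyper-V host: create the dynamically expanding VHDX and attach it
    # to the guest as a SCSI disk.
    New-VHD -Path "E:\Test\data.vhdx" -SizeBytes 127GB -Dynamic
    Add-VMHardDiskDrive -VMName "Win8Guest" -ControllerType SCSI -Path "E:\Test\data.vhdx"

    # After each operation inside the guest, check the size of the backing file.
    (Get-Item "E:\Test\data.vhdx").Length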

Operation performed and resulting VHDX file size in bytes:

  • Create an empty dynamically expanding VHDX file that is not yet initialized and not yet NTFS formatted: 4,194,304
  • Initialize the disk as MBR partitioned and NTFS format the volume: 205,520,896
  • Copy a single file of size 949,350,400 bytes into the VHDX file: 1,178,599,124
  • Delete the file, making sure it is deleted from the Recycle Bin as well: 1,178,599,124
  • Shut down the VM: 1,178,599,124
  • Run the PS cmdlet Optimize-VHD with Mode Retrim on the VHDX: 1,178,599,124

The fact that Optimize-VHD did not shrink the VHDX file was both a little surprising and disappointing, but presumably the error is mine. After numerous presentations that claimed the TRIM/UNMAP commands flowed from the guest VM into the parent partition, I was under the impression that given a Windows 8 guest OS and a Windows Server 2012 Hyper-V host, the TRIM/UNMAP commands flowed natively. Perhaps they do, and I don’t understand it. But requiring a PS cmdlet to make them flow is not what I would call native.
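For completeness, this is the shape of the Optimize-VHD invocation used in the table above, run on the Hyper-V host (path illustrative; my recollection is that the cmdlet wants the VHDX detached, or attached read-only, when it runs):

    # Ask the VHDX to release space that the guest file system has trimmed.
    Optimize-VHD -Path "E:\Test\data.vhdx" -Mode Retrim

    # Did the backing file actually shrink?
    (Get-Item "E:\Test\data.vhdx").Length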

The next experiment was to see if the VHDX file reused the space that had been freed up.

Operation performed and resulting VHDX file size in bytes:

  • Create an empty dynamically expanding VHDX file that is not yet initialized and not yet NTFS formatted: 4,194,304
  • Initialize the disk as MBR partitioned and NTFS format the volume: 205,520,896
  • Copy a single file of size 949,350,400 bytes into the VHDX file: 1,178,599,124
  • Delete the file, making sure it is deleted from the Recycle Bin as well: 1,178,599,124
  • Copy the file again: 2,118,123,520

Note: The implication is quite obvious. Without any intervention, the VHDX file does not necessarily reuse the recently freed up disk space, even when the amount of new space required is exactly equal to the recently freed up space.

The next experiment consisted of using the PS cmdlet Optimize-Volume, again with the ReTrim option.

Operation performed and resulting VHDX file size in bytes:

  • Create an empty dynamically expanding VHDX file that is not yet initialized and not yet NTFS formatted: 4,194,304
  • Initialize the disk as MBR partitioned and NTFS format the volume: 205,520,896
  • Copy a single file of size 949,350,400 bytes into the VHDX file: 1,178,599,124
  • Delete the file, making sure it is deleted from the Recycle Bin as well: 1,178,599,124
  • Run the PS cmdlet Optimize-Volume with the ReTrim option: 1,178,599,124
  • Shut down the VM: 205,520,896

 Note: Shutting down the VM is not really an option in production systems, but it certainly is an option for people like my fellow MVPs running VMs on their laptops.

And now, let me describe the last, but most important experiment for this blog. I expect to write more blogs on this topic.

Operation performed and resulting VHDX file size in bytes:

  • Create an empty dynamically expanding VHDX file that is not yet initialized and not yet NTFS formatted: 4,194,304
  • Initialize the disk as MBR partitioned and NTFS format the volume: 205,520,896
  • Copy a single file of size 949,350,400 bytes into the VHDX file: 1,178,599,124
  • Delete the file, making sure it is deleted from the Recycle Bin as well: 1,178,599,124
  • Run the PS cmdlet Optimize-Volume with the ReTrim option: 1,178,599,124
  • Copy the same single file again: 1,178,599,124

Note: The implication again is quite obvious. Running the PS cmdlet Optimize-Volume allows the VHDX file to reuse the newly freed up space.
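For reference, the guest-side command used in the last two tables is simply (drive letter illustrative; run from an elevated prompt inside the VM):

    # Sends TRIM/UNMAP hints for the free space on the volume down the storage
    # stack, which is what lets the VHDX reuse the freed space.
    Optimize-Volume -DriveLetter D -ReTrim -Verbose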

 

Best practices for utilizing VHDX files in Windows Server 2012

A number of blogs and presentations have explained the advantages of VHDX files in Windows Server 2012 over the VHD files in Windows Server 2008. These advantages include:

  • VHDX files can host larger data volumes – in particular, VHDX files can host 64 TB volumes versus 2 TB for VHD files
  • VHDX files have their metadata aligned on 1 MB boundaries, whereas VHD files have 512-byte metadata structures that cause alignment issues and, in particular, cause significant slowdown when used with newer disks that use 4096-byte sectors instead of 512-byte sectors
  • VHDX files are more resistant to corruption

However, VHDX files have one more huge advantage that has not been adequately explained, or at least seems to be less widely known: VHDX files are much more disk space efficient in a number of ways, especially when the system administrator follows proper best practices. This blog and some successor blogs will explain the necessary and sufficient conditions for each particular use case, and also lay out the best practices to obtain the maximum benefit.

In particular:

  • VHDX files declare themselves to be storage volumes capable of thin provisioning
  • When used properly, VHDX files based on appropriate type of storage can free up disk space and utilize the thin provisioning features of the underlying storage
  • Even when VHDX files are placed on regular storage, best practices will prevent the VHDX files from growing in size by reusing free space within the VHDX file

So what are the necessary and sufficient conditions to take advantage of these VHDX file features? They can be summarized as:

  • These tips apply to running Windows Server 2012 and Windows 8 as guest operating systems. I am hoping to find a way to have them apply to Windows Server 2008 R2 and Windows 7 as well, but as of now, that is just a possibility and not a reality.
  • Separate the data volume from the system volume within the VM. Don't just put data into the system volume.
  • Seriously consider using dynamically expanding VHDX files rather than statically fully allocated VHDX files. Metadata overheads for VHDX files have been considerably reduced in Windows Server 2012, so VHDX files grow less often, and with the tips in this blog, they will grow even less often. The idea is to accept the minimal overhead of expanding VHDX files in exchange for storage space optimization. Note that if the VHDX file is placed on thin provisioned storage, e.g. a thin provisioned Storage Space, using a fixed size VHDX file does not guarantee that the VHDX file will never face an issue of running out of disk space.
  • Always present the VHDX based data volume as either SCSI or virtual HBA attached storage within the VM.
  • Periodically run a PS cmdlet to optimize the VHDX based volume. In particular, run the PS cmdlet Optimize-Volume with the ReTrim option (see the sketch just after this list). Optimize-Volume is described at this MSDN link, and the description is certainly incomplete. While the MSDN web page claims this PS cmdlet applies only to Windows Server 2012, it certainly applies to Windows 8 as well. Yours truly has submitted feedback on the web page and hopefully the page will be updated soon. Further note that this PS cmdlet needs to be run with administrative privileges, again something the MSDN web page does not mention.
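Here is one hedged sketch of how the “periodically” part might be automated inside the guest: a weekly scheduled task that runs the retrim. The task name, schedule, and drive letter are all placeholders; adjust to taste.

    # Run once, from an elevated PowerShell prompt inside the guest, to register
    # a weekly retrim of the D: data volume. Optimize-Volume needs administrative
    # privileges, hence -RunLevel Highest.
    $action  = New-ScheduledTaskAction -Execute "powershell.exe" `
               -Argument "-NoProfile -Command Optimize-Volume -DriveLetter D -ReTrim"
    $trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 3am
    Register-ScheduledTask -TaskName "ReTrim data volume" -Action $action `
        -Trigger $trigger -RunLevel Highest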

Details of the investigation that led to this summary of recommendations will be published in follow up blogs soon.   

Aligned I/O and large sector disks – VHD, VHDX, and VS 2012 install

The issues of aligned disk I/O have been understood for a while. One example is how the Microsoft VHD file format ensured that ¾ of the data blocks within a VHD file would be aligned on a 512-byte boundary, but not on a 4096-byte boundary. See the TechNet blog I authored a few years ago, “Some observations on Dynamic VHD Performance”.

That makes using VHD files with large sector drives (which read and write data in 4 KB units) extremely painful. See the Microsoft KB article “Using Hyper-V with large sector drives on Windows Server 2008 and Windows Server 2008 R2”. The KB article mentions a performance degradation of up to 30%.

The problem here is that Hyper-V performs non-cached I/Os, which result in a read-modify-write cycle as the KB article describes. For example, if 4 KB of data needs to be written, and that data spans 2 adjacent physical 4 KB sectors, Windows needs to read 8 KB, modify 4 KB of that 8 KB of data, and then write back 8 KB of data.

Recognizing the problem, the Hyper-V team introduced the VHDX file format in Windows Server 2012. Look at the fellow MVP blog “Why Windows Server Hyper-V VHDX 4k alignment is so important”.

And now, let us return to my earlier blog about the Visual Studio (VS) 2012 installer performing non-aligned writes. Since these are cached writes, the performance degradation on large sector disks will not be as noticeable. But just because one issues cached writes does not mean the data stays in cache until the file is fully written and closed, at which point the cache is flushed. Since Visual Studio writes in a pattern where the first and last writes to a file are an odd number of bytes and all intermediate writes are an even number of bytes, the data in the cache, while the file is being written, is always an odd number of bytes in length. So should the cache get flushed for any reason, you now have the equivalent of misaligned I/Os, and the penalty on large sector disks is even more noticeable.

I will be looking to buy a large sector disk and try installing VS 2012 to that disk. In the meantime, I would love to hear from any reader who has already tried this experiment.
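If you want to check whether a disk you already own reports 4K physical sectors before trying this, fsutil on Windows 8 / Windows Server 2012 reports both sector sizes (I am quoting the field names from memory, so treat this as a pointer rather than gospel):

    # Run from an elevated prompt. Look for "Bytes Per Sector" (logical) and
    # "Bytes Per Physical Sector": a 512e drive reports 512 / 4096, a native
    # 4K drive reports 4096 / 4096.
    fsutil fsinfo ntfsinfo C: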

The perils of alignment for memory access and disk I/O

In my earlier blog, I described how Visual Studio (VS) 2012 is now a requirement for writing kernel mode drivers, both for the x86/x64 Intel/AMD version and the ARM version of Windows 8. So I installed VS 2012 RC on two different laptops and was unhappy with the installation time. I must place on record my appreciation for the Visual Studio team, which has been very diligent in following up and looking into the issue. Of course, I will acknowledge that my belief that “it takes too long” could be incorrect, and I may be encountering unusual circumstances on both my systems. So with that caveat that perhaps “I am encountering a one off situation”, here we go with my analysis.
First, a couple of references are in order.

  1. To quote from MSDN: “In this document we explain why you should care about data alignment, the costs if you do not, how to get your data aligned, and what to do when you cannot. You will never look at your data access the same way again.” The point is: aligned memory access in Windows is very important.
  2. It is equally important to ensure that writes are aligned as well. Most current disks write data in 512-byte chunks called sectors. So if you write 512 bytes at offset zero, a single write suffices. But if you write 512 bytes at offset 1, the I/O spans 2 disk sectors. A single write becomes: read 2 disk sectors, copy over the new 512 bytes of data, and issue 2 sector writes, each of 512 bytes. So a 512-byte write becomes a 1024-byte read and a 1024-byte write. Here is an MSDN blog explaining, among other things, the importance of aligned I/Os for SQL. And here is another MSDN SQL blog explaining the importance of aligned I/O.

Now back to the topic at hand – installing Visual Studio 2012 RC and analyzing possible causes for why it takes as long as it does. I decided to investigate further by tracing the I/Os using the Sysinternals (now part of Microsoft) tool Process Monitor.

Here is a screen shot showing a small part of the I/O of the installation. Note that I located this I/O pattern at random. I also cursorily checked that other files show similar behavior; in particular, a write of an odd number of bytes at offset zero, followed by writes for the rest of the file.

[Screenshot: Process Monitor trace of the writes to DataCollection.dll]

For the file DataCollection.dll, please notice:

  1. The write at offset zero for 32,447 bytes
  2. The write at offset 32,447 for 32,768 bytes
  3. The write at offset 65,215 for 16,761 bytes
  4. The total file size is 81,976 bytes, and 32,447 + 32,768 + 16,761 = 81,976
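Purely to make the arithmetic concrete, here is a small sketch that takes the three writes from the trace and reports which of them start or end on a 512-byte sector boundary; only the very first write even starts on one, and none of them ends on one:

    # Offsets and lengths as observed in the Process Monitor trace above.
    $writes = @(
        @{ Offset = 0;     Length = 32447 },
        @{ Offset = 32447; Length = 32768 },
        @{ Offset = 65215; Length = 16761 }
    )
    $sector = 512
    foreach ($w in $writes) {
        $end = $w.Offset + $w.Length
        "Offset {0,6} Length {1,6} starts on a sector boundary: {2,-5} ends on one: {3}" -f `
            $w.Offset, $w.Length, (($w.Offset % $sector) -eq 0), (($end % $sector) -eq 0)
    }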

Now apply the logic of the references quoted – in particular, the importance of aligned memory access, and aligned disk I/O access.

At the very least, each of the 3 I/Os will consist of a 1 or 3 byte copy, a copy of some N DWORDs, followed by another 1 or 3 byte copy. This could have been completely avoided by doing 3 I/Os, each consisting of an even number of bytes. There is a penalty to be paid for the 1 byte and 3 byte memory accesses.

I must admit that this trace is at the file system layer. It is certain that before the I/O hits the disk, which is a block mode I/O, the Windows Cache Manager and I/O subsystem will have intervened to make the I/O aligned and an integral number of sectors. There will still be some disk I/O penalties however, when some writes get split across 2 adjacent sectors. This could be avoided. Consider the case where say part of the file has been written, and is in cache. And the I/O pattern guarantees that there will be an odd number of bytes cached, until the final odd length write arrives. Now imagine that for some reason, the cache gets flushed before the last write arrives. This could be because the file is very large, or there is memory pressure. This means that the cache manager will zero fill a buffer until the end of a sector (an odd number of bytes) and then write out that sector. When the next write arrives, this just flushed sector needs to be read, the zero filled bytes are copied over with the newly arrived data, and then the same sector is written – again!

There is no perceivable advantage in making the I/O non-aligned – and significant potential harm. It is difficult to estimate how much the VS 2012 installation would speed up were the writes aligned.
There are other oddities as well in the trace, but I will write about those in future blogs.

I invite reader comments on whether they believe this I/O pattern is within acceptable bounds. For readers willing to trace their VS 2012 install, I would also welcome feedback as to whether they observe this pattern.

Windows 8, ReFS, and Extended Attributes

By now, a lot of my readers will be aware of the new ReFS file system in Windows 8. I personally believe ReFS will evolve and will add support for some of the missing features. To recap, some of the features present in NTFS but absent from ReFS (at least the ReFS in Windows 8) include:

  • No support for Extended Attributes
  • No support for Alternate Data Streams
  • No support for quotas
  • No support for bootable volumes

This is not a comprehensive list. I decided to look briefly at each of these features in a series of blog posts.

Extended Attributes are known to be used in two specific cases, if one believes some of the postings on the Internet, including on TechNet forums. To be clear, I am not attributing these postings to Microsoft employees; I am simply saying users have posted on some Microsoft owned forums:

  • Internet Explorer and Windows use Extended Attributes to mark files downloaded from the Internet and provide a security warning when the user attempts to execute the file.
  • Windows Client Side Caching (yes, the name keeps evolving) uses Extended Attributes to mark files in its cache

What else, if anything, uses Extended Attributes?

Since there was no convenient way to find out, I wrote a small utility that examines a specified directory and inspects each of its component files and sub-directories to determine whether any of them have Extended Attributes. I am happy to provide this utility free to any curious folks. I ran the utility on a Windows 8 Server box, and a partial output is at the end of this blog.

What I found is that a number of Windows system files and directories use Extended Attributes.

It will be interesting to see what path Microsoft takes to improve ReFS functionality. For example, if and when Microsoft enhances ReFS to provide support for bootable volumes, will ReFS evolve to support Extended Attributes, or will Windows evolve to stop using Extended Attributes?

Here is the promised partial output

Directory \??\C:\Documents and Settings has Extended Attributes.

Directory \??\C:\ProgramData\Application Data has Extended Attributes.

Directory \??\C:\ProgramData\Desktop has Extended Attributes.

Directory \??\C:\ProgramData\Documents has Extended Attributes.

 File \??\C:\ProgramData\Microsoft\Windows\Hyper-V\Resource Types6FF76FA-2D58-4BAF-9F8D-455773824F37.xml has Extended Attributes.

File \??\C:\ProgramData\Microsoft\Windows\Hyper-V\Resource Types\118C3BE5-0D31-4804-85F0-5C6074ABEA8F.xml has Extended Attributes.

File \??\C:\ProgramData\Microsoft\Windows\Hyper-V\Resource Types\146C56A0-3546-469B-9737-FCBCF82428F4.xml has Extended Attributes.

File \??\C:\ProgramData\Microsoft\Windows\Hyper-V\Resource Types\19839BFF-6F04-4B24-B4B5-1AFCCBE729DE.xml has Extended Attributes.

Windows 8 client/server file traffic approaches DAS speed

Traditionally, client/server file traffic has been considered very slow compared to file access when the file is placed on Direct Attached Storage (DAS).

Note that “client” as used in this particular blog applies not just to clients such as laptops, but also to server-to-server communication where one of the servers acts as a client. For example, a laptop connects across the Internet to an IIS server, and the IIS server fetches some files from a file server to satisfy the client request.

Microsoft just posted a white paper showing some very interesting performance benchmarks for file access over SMB 2.2 when both client and server are running Windows 8. The paper can be found here.

A one line summary of the paper could be “client/server file access speeds go from the high twenty percent range to almost parity with the speed of Direct Attached Storage”; in particular, from say 28% to 97%.

Note that the “client” used in the test had 48GB RAM, and while I admit that the test involved non cached I/O, so the extra RAM was not used for caching, it is still worth noting that this is not a typical client as in a laptop client. This is more like a server behaving as a client.

Nevertheless, this makes the client/server world more interesting, and also makes a compelling case for upgrading to Windows 8. Especially so when you can simultaneously upgrade all your servers, e.g. SQL, IIS, and NAS file servers, to Windows 8. Upgrading just a single server does not help much, since in that case the client is still an old client that does not speak the SMB 2.2 protocol.

Windows Server 8 encrypts user data in flight between client and server

I have been at the MVP Summit in Redmond, and have been carefully noting “advances” between Windows Server 8 Developer Preview (made available at the BUILD conference in Anaheim) and the newer Windows Server 8 Beta build made available on Feb 29 2012.

One new feature is that the SMB 2.2 protocol used for client/server file data exchange now allows for data encryption. Both the client and the server need to be running the new SMB 2.2 protocol – so this means both need to be Windows 8, or the server needs to be a fairly new one from a vendor that has implemented SMB 2.2.

Verticals such as the health care industry should find this particularly useful. The huge overhead of using a solution such as IPSEC can now be avoided.

Details such as which encryption algorithm is used, as well as the performance characteristics of this encryption, will be blogged about once I perform some experiments with the beta bits.

The only public Microsoft information as of today is at http://technet.microsoft.com/en-us/library/hh831795.aspx
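I have not yet checked exactly which of these knobs are present in the February 29 beta bits, but the SMB PowerShell module that appears in this timeframe lets you turn encryption on either server-wide or per share, roughly like this (share name and path are placeholders):

    # Require encryption for every share on this file server.
    Set-SmbServerConfiguration -EncryptData $true -Force

    # Or encrypt only a specific share (placeholder name and path).
    New-SmbShare -Name "PatientRecords" -Path "D:\Shares\PatientRecords" -EncryptData $true

    # Encryption can also be toggled on an existing share.
    Set-SmbShare -Name "PatientRecords" -EncryptData $true -Force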