SMB 3.1 Quick Overview

Microsoft announced preview details about SMB 3.1, with an emphasis on improved security.

SMB 3.0 already offered the ability to encrypt data packets, as well as an enhanced level of security that includes signing to detect man-in-the-middle attacks. The key shortcoming of SMB 3.0 (addressed in SMB 3.1) is that the signing algorithm first negotiates signing keys, and these negotiation packets are themselves vulnerable to a man-in-the-middle attack that causes the SMB protocol level to be negotiated down to CIFS (SMB 1), which is completely vulnerable. Effectively, an attacker can bypass the SMB 3.0 security features by forcing the data exchange onto older, less secure protocols. SMB 3.1 allows both the client and the server to detect such downgrade attacks.
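
The mechanism, which Microsoft calls pre-authentication integrity, chains a SHA-512 hash over every negotiation packet on both sides and binds the session keys to the final hash value, so a man in the middle that rewrites any negotiate packet is caught. Below is a minimal sketch of the hash-chaining idea in C (using OpenSSL's SHA-512); the messages are toy stand-ins, not real SMB packets.

/* Hash-chaining sketch of SMB 3.1 style pre-authentication integrity. */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

/* hash = SHA-512(previous hash || raw message bytes) */
static void chain(unsigned char hash[SHA512_DIGEST_LENGTH],
                  const unsigned char *msg, size_t len)
{
    SHA512_CTX ctx;
    SHA512_Init(&ctx);
    SHA512_Update(&ctx, hash, SHA512_DIGEST_LENGTH);
    SHA512_Update(&ctx, msg, len);
    SHA512_Final(hash, &ctx);
}

int main(void)
{
    /* The chain starts as all zeroes on both client and server. */
    unsigned char hash[SHA512_DIGEST_LENGTH] = { 0 };

    /* Toy stand-ins for the real NEGOTIATE request and response. */
    const char *req  = "NEGOTIATE: dialects=2.0,3.0,3.1";
    const char *resp = "NEGOTIATE_RESP: dialect=3.1";

    chain(hash, (const unsigned char *)req,  strlen(req));
    chain(hash, (const unsigned char *)resp, strlen(resp));

    /* Both ends mix this value into session key derivation; if a man
     * in the middle rewrote either packet, the two sides disagree and
     * authentication fails. */
    for (int i = 0; i < 8; i++) printf("%02x", hash[i]);
    printf("...\n");
    return 0;
}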

SMB 3.0 supports AES-128-CCM as its sole encryption algorithm. SMB 3.1 extends the encryption capability in two ways:

  • SMB 3.1 allows for negotiation of the encryption algorithm and thus makes the encryption capability extensible
  • SMB 3.1 introduces AES-128-GCM as an encryption algorithm. AES-128-GCM is as secure as AES-128-CCM, but much cheaper to compute, which enables higher IOPS and throughput (see the sketch after this list)
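
To make this concrete, here is a minimal sketch of encrypting one payload with AES-128-GCM using OpenSSL's EVP API; the key, nonce, and payload are toy values, not anything SMB actually derives. GCM pairs counter-mode encryption with a parallelizable GHASH authenticator, whereas CCM's CBC-MAC pass is inherently serial – that is the root of the performance difference.

/* Minimal AES-128-GCM sketch (OpenSSL). Toy key, nonce, and payload. */
#include <stdio.h>
#include <openssl/evp.h>

int main(void)
{
    unsigned char key[16] = { 0 };   /* all-zero toy key            */
    unsigned char iv[12]  = { 0 };   /* 96-bit GCM nonce, toy value */
    unsigned char msg[]   = "SMB3 encrypted payload";
    unsigned char ct[sizeof msg], tag[16];
    int len = 0, ctlen = 0;

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_128_gcm(), NULL, key, iv);
    EVP_EncryptUpdate(ctx, ct, &len, msg, sizeof msg);
    ctlen = len;
    EVP_EncryptFinal_ex(ctx, ct + len, &len);
    ctlen += len;
    /* The 16-byte tag authenticates the packet: tampering is detected. */
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);
    EVP_CIPHER_CTX_free(ctx);

    printf("%d ciphertext bytes, tag: ", ctlen);
    for (int i = 0; i < 16; i++) printf("%02x", tag[i]);
    printf("\n");
    return 0;
}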

Separately, SMB 3.1 continues to support Multi Channel, where TCP channels are aggregated at the SMB protocol layer for both speed and reliability.

Finally, SMB 3.1 introduces the capability to have a mixed cluster where some cluster nodes are running SMB 3.0 and some are running SMB 3.1. The SMB 3.1 enhancements allow clients that connect to an SMB 3.1 node to fail over only to a node that is also running SMB 3.1.


SMB 3 NAS is preferable to DAS in a Windows environment

Microsoft is investing heavily in the Network Attached Storage (NAS) protocol SMB 3 and is clearly laying out a road map that suggests NAS is the future as opposed to Direct Attached Storage (DAS). Consider:

  • SQL Server 2012 system and user databases, as well as Hyper-V 2012 workloads, can be placed on NAS, provided the NAS speaks SMB 3!
  • Microsoft made significant speed improvements in the SMB 3 client and server to have NAS achieve 97% of the speed of DAS, and this is without hardware acceleration.
  • Microsoft invested in SMB 3 Multi Channel, which aggregates bandwidth across parallel TCP channels over multiple NICs at the SMB 3 protocol layer. Multi Channel is about speed AND reliability: failed I/Os are seamlessly moved to a different TCP channel when one channel fails.
  • Continuing on the speed theme, Microsoft invested in RDMA support via SMB Direct, which requires not just SMB 3, but also SMB 3 Multi Channel. The maximum IOPS on a Windows system is achieved when using SMB 3 NAS with SMB Direct support, NOT with DAS!
  • Going back to the reliability theme, SMB 3 includes support for Persistent Handles which, combined with the Witness Protocol, ensure that applications such as SQL, Exchange, and Hyper-V never see an I/O failure; the I/O is seamlessly moved to a different node as needed. This only works with SMB 3 NAS, and does NOT work with DAS!
  • I have been asked numerous times “But Microsoft has invested in Storage Spaces and Tiering where data is moved between SSD and spinning media to optimize performance. Does that not indicate Microsoft advocates DAS?” And my answer has always been “Storage Spaces is even more valuable when used as the storage backing a Windows Server 2012/R2 NAS!” Using Storage Spaces does not mean one has to abandon NAS.
  • Microsoft supports deduplication of VDI VMs, but the only supported configuration is with the VDI VM files residing on an SMB 3 based Windows Server 2012 R2 NAS (and not with DAS)!
  • To provide examples of other Microsoft efforts leveraging SMB 3, consider the simple “copy” or “xcopy” command copying a multi-GB file. Microsoft changed the CopyFileEx API to leverage all SMB 3 features, including SMB 3 credits, SMB 3 Multi Channel, and SMB Direct (RDMA), to ensure the file copy is as fast as possible (a sketch follows this list).
  • The Microsoft Hyper-V team rewrote live migration in Hyper-V 2012 R2 to leverage SMB 3. While migrating a VM, Hyper-V 2012 set up its own TCP channel to copy the VM RAM; Hyper-V 2012 R2 uses SMB 3 instead, and thereby gets the speed/reliability improvements of SMB 3 while doing the same copy.
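
Returning to the CopyFileEx bullet above, this is roughly what using the API looks like; the single call below inherits whatever SMB 3 offers (credits, Multi Channel, SMB Direct) when either path points at an SMB 3 share. The UNC paths are hypothetical placeholders.

/* CopyFileEx sketch: one call, and the SMB 3 optimizations come free. */
#include <windows.h>
#include <stdio.h>

/* Optional progress callback, invoked as chunks complete. */
static DWORD CALLBACK Progress(LARGE_INTEGER total, LARGE_INTEGER done,
                               LARGE_INTEGER streamSize, LARGE_INTEGER streamDone,
                               DWORD stream, DWORD reason,
                               HANDLE hSrc, HANDLE hDst, LPVOID ctx)
{
    printf("\rcopied %lld of %lld bytes", done.QuadPart, total.QuadPart);
    return PROGRESS_CONTINUE;
}

int main(void)
{
    if (!CopyFileExW(L"\\\\testserver\\share\\vm.vhdx",  /* hypothetical source */
                     L"\\\\prodserver\\share\\vm.vhdx",  /* hypothetical target */
                     Progress, NULL, NULL, 0))
    {
        printf("\ncopy failed, error %lu\n", GetLastError());
        return 1;
    }
    printf("\ndone\n");
    return 0;
}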

Why NAS hosting Hyper-V VDI VMs needs to support SMB 2, not just SMB 3

If you spend some time on storage startup websites, no matter whether they are providing Flash storage, converged storage/computing, or VM aware storage, you will find that all of them treat VDI as low hanging fruit. They all have a dedicated description of how they can supply a great VDI storage solution.

Hyper-V requires NAS to be SMB 3.0 capable to host Hyper-V VM files. This is reasonable, given that SMB 3.0 provides both the speed and reliability that SMB 2.X cannot match.

But while Microsoft Hyper-V 2012 imposes the requirement that the NAS be SMB 3.0 capable, customer requirements often impose an additional requirement that the NAS also be SMB 2.X capable. And some other requirements as well, as we shall shortly see.

This is because many customers use Hyper-V to run VDI VMs. If these VMs run Microsoft Windows 7, the VMs are only SMB 2 capable. Further, a typical use of VDI VMs is to redirect all logged-in users' home directories to a NAS share. This is where the Microsoft Excel, Word, and PowerPoint files generated by the VDI VM users are stored.

And the customer will always ask, “I just bought a NAS to store my Hyper-V VMs. Why can't the same NAS also offer a share for the user home directories?” And that is where the additional requirement comes in: the Hyper-V capable SMB 3.0 NAS should not only offer SMB 2.X support, but also richer support for oplocks, byte range locks, and other such features used by Microsoft Office, but not by Hyper-V.

This is why a particular company in which I have an interest, www.HvNAS.com, has implemented both SMB 2 and SMB 3, and regularly tests that its protocol stack implements the full range of SMB 2 and SMB 3 features, especially those regularly exercised by Microsoft Office.

Protocol Converter between CIFS, SMB2, SMB3, and NFS

As a Microsoft Storage MVP, I am always looking for ways to fill in the gaps between what Windows natively offers, and what seems to be useful for enterprise and consumer scenarios.

One intriguing product idea that I have built and have an advanced prototype of is a “Protocol Converter”. I am open to different names for the product, since that name really does not do justice to the myriad use cases I can see. And I am sure some readers will point out use cases that I am missing at the moment. This is the first of a series of planned blogs around this “Protocol Converter” idea.

The “product requirements” as I set them include:

  • Be able to freely convert between any of CIFS, SMB2, SMB3, and NFS. So in particular, be able to do all of these conversions:

      o CIFS <-> SMB2, CIFS <-> SMB3, CIFS <-> NFS
      o SMB2 <-> CIFS, SMB2 <-> SMB3, SMB2 <-> NFS
      o SMB3 <-> CIFS, SMB3 <-> SMB2, SMB3 <-> NFS
      o NFS <-> CIFS, NFS <-> SMB2, NFS <-> SMB3

  • Develop this product with minimal resources
  • Develop a highly maintainable product
  • Develop a product with a very high probability of working with future protocol revisions such as SMB 3.1 or SMB 4.0 (imagined names). Of course, some testing and development may be needed depending upon what features these unknown protocols will have.
  • Have an enterprise ready product, but of course, even such products begin life as a prototype

[Figure 1]

Figure 1 summarizes what this conceptual Protocol Converter looks like.

After reflecting on the product requirements for a while, I decided to write as little protocol specific code as possible. I have spent years developing CIFS and SMB 2/SMB 3 stacks, and while this work has been enjoyable, I decided the world does not need yet another implementation of any of the CIFS, NFS, SMB 2, or SMB 3 protocol stacks. (By the way, I am extremely happy about the awesome SMB 2/SMB 3 protocol stack we have developed at www.HvNAS.com; it runs on any Linux/Unix and on any CPU, including Intel x86 and other little endian CPUs.) But back to the Protocol Converter, where a key observation is that Windows Server 2012 (and 2012 R2) already ships with all the protocol parsers I am looking for, both on the client and the server side! And that is what the prototype code leverages – as of now, it has zero protocol specific code!

I see three main phases in terms of code development for this project:

  • Develop a “data path solution” where all data I/O works for all protocol conversions, so that operations such as file creation, deletion, enumeration, read, and write work. This piece is already working, though it needs more testing.
  • Develop a security solution that enforces enterprise class access control across multiple domains, etc. The “data path” solution does not enforce proper access control, but then again, this is just a product development milestone and not a shipping product yet!
  • Add some protocol specific features that deal with differences between protocols, e.g. oplocks that exist on one side of the Protocol Converter but not on the other.

I will write blogs to track progress on these additional development tasks as well as what I perceive to be use cases for this Protocol Converter.

I welcome any potential beta testers for this product.


Backup performance and SMB 3 Multi Channel

In this day and age of exploding data volumes, backup and restore is increasingly important, more common, and taken for granted. But not all backup “target systems” – i.e. the systems to which data is backed up – are created equal, especially when the system being backed up is Windows based.

  1. If your backup target system is based upon CIFS (also sometimes referred to as SMB 1), backup (and restore) is limited to 64KB serial I/O. In other words, the backup/restore software does a 64KB I/O, waits for the I/O to complete, and only then issues the next I/O. In fact, it is worse than this: the 64KB limit includes protocol overhead, so even well behaved apps that want to perform I/O in 4MB blocks will move only about a 60KB payload (data) per request.
  2. If your backup target system is running SMB 2.0, the I/O is 1MB serial, which is certainly an improvement.
  3. If your backup target is SMB 2.1, the I/O is again 1MB, but SMB 2.1 has the server issuing multiple credits, which means the client can issue multiple I/Os without having to wait for any one of them to complete. A typical Windows to Windows flow will show 10 1MB I/Os on the wire at the same time, all on a single TCP channel. So the backup/restore speed is significantly higher.
  4. Now recall that in most cases, BOTH the system being backed up AND the backup target are servers. For example, you could be backing up a file server or SQL server or Hyper-V server, and of course, the backup target also typically operates as a NAS (file server). Thus it is very likely that at least one of the two has multiple NICs. If one (or both) ends of an SMB 3 connection have multiple NICs, and provided these NICs are 10GbE RSS capable (fairly cheap now), SMB 3 Multi Channel will kick in. SMB 3 Multi Channel establishes multiple TCP channels and engages multiple credits on each TCP channel. So with just 2 TCP channels, you could now have 20MB of I/O in flight at any given moment. The back-of-the-envelope sketch below shows why the amount of I/O in flight dominates throughput.
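
Here is that back-of-the-envelope model in code. With serial or credit-limited I/O, throughput is capped at bytes-in-flight divided by round-trip time, regardless of wire speed; the 1 ms round trip below is my illustrative assumption, and real results are further capped by the wire and the disks.

/* Effective throughput ~= bytes in flight / round trip time. */
#include <stdio.h>

static double mb_per_sec(double bytes_in_flight, double rtt_ms)
{
    return bytes_in_flight / (rtt_ms / 1000.0) / (1024.0 * 1024.0);
}

int main(void)
{
    double rtt = 1.0;  /* assumed 1 ms per request round trip */
    printf("CIFS, one ~60KB payload in flight   : %8.1f MB/s\n", mb_per_sec(60 * 1024.0, rtt));
    printf("SMB 2.0, one 1MB I/O in flight      : %8.1f MB/s\n", mb_per_sec(1024 * 1024.0, rtt));
    printf("SMB 2.1, ten 1MB I/Os in flight     : %8.1f MB/s\n", mb_per_sec(10 * 1024 * 1024.0, rtt));
    printf("SMB 3, two channels, 20MB in flight : %8.1f MB/s\n", mb_per_sec(20 * 1024 * 1024.0, rtt));
    return 0;
}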

In short, if Windows, and especially Windows 2012, is part of your IT environment (or planned environment), make sure your backup target has an upgrade path to SMB 3! And don’t be fooled by just the SMB 3 label: ask your vendor whether it supports SMB 3 Multi Channel. The SMB 3 protocol allows a storage device to negotiate SMB 3 but not support SMB 3 Multi Channel!

Wishing you higher backup/restore speeds with SMB 3 Multi Channel!

The need for Change Block Tracking to perform Differential backups of Hyper-V VMs

Hyper-V has been gaining momentum, and with Hyper-V 2012 supporting SMB 3.0 based NAS storage, this momentum is likely to accelerate. And of course, any commercial deployment needs a proper backup policy. This blog examines some simple but often overlooked problems in Hyper-V backup, which, of course, all backup vendors have tackled in some way or another.

In general, there are two ways to back up VMs:

  1. Backup from within a VM
  2. Backup from the hypervisor aka Hyper-V parent partition

In some particular cases, only one of these choices is feasible. Figure 1 shows a VM that uses an iSCSI LUN that is passed through directly to the VM.

[Figure 1]

After the advent of the VHDX file format and the associated performance and robustness improvements, there are even fewer reasons to use the configuration depicted in Figure 1. However, in this configuration, the only way to back up is by running a backup application within the VM. The speed boost gained by eliminating the NTFS stack within the parent partition is marginal, and live migration of a VM with this configuration involves a LUN transfer when the iSCSI volume needs to be moved to a different Hyper-V host.

[Figure 2]

Figure 2 shows a more typical configuration, with a VM using a VHD(x) file (VHD or VHDX). Figure 2 shows this VM being backed up from within the VM, even though other choices exist. The main drawback here is that if the Hyper-V host were running 20 such VMs, one would have to pay for 20 Backup App licenses.

[Figure 3]

Figure 3 shows a VM again using a VHD(x) file, but with backup performed from the Hyper-V parent partition. This is a popular configuration, since the cost of the Backup App can be amortized over all the VMs being hosted. The Backup App depends upon the VSS infrastructure Microsoft has created, which runs in both the Hyper-V parent partition and inside the VM. Of course, if the VM is running an OS for which no VSS integration component exists, this configuration is not feasible.

Once a snapshot is created and a full backup is done, the full backup will include a complete copy of the VHD(x) file. Given that the VHD(x) file will typically be tens of GBs – anywhere from 20GB to 100GB or more – it is highly desirable that subsequent backups be differential backups that only back up changed data within the VHD(x) file.

[Figure 4]

And that is where the problem lies. None of Windows Server 2012, Windows Server 2012 R2, or Hyper-V 2012 provides a facility to determine the changed blocks within the VHD(x) file.

An ideal solution would install, uninstall, load, and unload in the Hyper-V parent partition without requiring a reboot of the parent partition. The VMs would have to be restarted for the change tracking to take effect. This is shown in Figure 5, and a toy sketch of the tracking idea follows it.

[Figure 5]
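
To make the idea concrete, here is a toy sketch of what such change block tracking amounts to: a filter in the parent partition sets a bit for every 1MB block of the VHDX that is written, and the differential backup copies only the marked blocks before clearing the bitmap. Everything here (names, sizes, the in-memory bitmap) is hypothetical; it illustrates the concept, not any shipping driver.

/* Toy change block tracking: one dirty bit per 1MB block of a VHDX. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE (1024 * 1024)       /* granularity of tracking      */
#define MAX_BLOCKS (100 * 1024)        /* enough bits for a 100GB file */

static uint8_t dirty[MAX_BLOCKS / 8];  /* one bit per 1MB block        */

/* Called by the (hypothetical) write filter for every write. */
static void track_write(uint64_t offset, uint32_t length)
{
    uint64_t first = offset / BLOCK_SIZE;
    uint64_t last  = (offset + length - 1) / BLOCK_SIZE;
    for (uint64_t b = first; b <= last; b++)
        dirty[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Differential backup: copy only blocks whose dirty bit is set. */
static void differential_backup(void)
{
    for (uint64_t b = 0; b < MAX_BLOCKS; b++)
        if (dirty[b / 8] & (1u << (b % 8)))
            printf("copy 1MB block %llu\n", (unsigned long long)b);
    memset(dirty, 0, sizeof dirty);    /* reset for the next cycle     */
}

int main(void)
{
    track_write(5ULL * BLOCK_SIZE + 100, 4096);      /* guest writes 4KB */
    track_write(70ULL * BLOCK_SIZE, 2 * BLOCK_SIZE); /* guest writes 2MB */
    differential_backup();
    return 0;
}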

Any ISV or OEM looking for such a generic solution is encouraged to contact me via LinkedIn.

How SMB 3 Witness Protocol detects failure without any timeouts

The SMB 3 protocol that first shipped with Windows Server 2012 (and Windows 8) is remarkable for making Network Attached Storage (NAS) comparable, and in some senses even superior, to Direct Attached Storage (DAS). NAS is now almost as fast as DAS even without hardware acceleration; with hardware acceleration via the sister protocol SMB Direct (RDMA), the speed can be even higher! Further, SMB 3.0 based NAS is just as reliable, since it provides detection of node failures and failover of open file handles (without invalidating the handle), all within a matter of 5 seconds or less. See Jose Barreto's blogs for descriptions of SMB Direct and SMB Multi Channel that emphasize the speed aspects of SMB 3.0.

Given that SMB timeouts are on the order of 40 seconds, and TCP timeouts are of a similar order, SMB 3.0 cannot rely upon timeouts to detect failures. This blog explains the basics of how the Witness Protocol works in conjunction with SMB 3.0 to achieve the required failure detection and failover.

This blog provides an overview and is NOT aimed at a developer audience since some technical details are skipped.

It all starts with an SMB 3 client connecting to an SMB 3 clustered file server, as shown in Diagram 1.

[Diagram 1]

The client notices the highly available share and, using the Witness Protocol (which is RPC based), asks the node to which it has its data path connection for a list of IP addresses of the cluster nodes running the Witness service. This is shown in Diagram 2.

[Diagram 2]

As shown in Diagram 3, the server responds with a list of IP addresses for all cluster nodes running the Witness service. The protocol allows for returning both IPv4 and IPv6 addresses.

[Diagram 3]

The client receives this information and registers for notifications with one of the cluster nodes other than Node A (the node from which it is already consuming data via SMB 3.0). The idea is that the cluster nodes run a cluster quorum protocol, whatever that may be, and hence cluster nodes B, C, and D (in this example) will notice if and when Node A becomes unavailable. This is shown in Diagram 4.

[Diagram 4]

Now imagine that Node A becomes unavailable for some reason, as shown in Diagram 5. The exact reason is immaterial: it could be a power failure, a network failure, a system crash, or something else.

[Diagram 5]

Node B (and also C and D) notices via the cluster quorum protocol running within the cluster that Node A is unavailable. Node B (in this example) issues an RPC callback to the client, notifying it that Node A is unavailable, as shown in Diagram 6.

[Diagram 6]

The client then performs an SMB Session Setup, Tree Connect, etc., to any one of the remaining nodes. In Diagram 7, the client has connected to Node C.

[Diagram 7]

Note that the “client” can itself be another server, e.g. a SQL Server or an IIS server.
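
For readers who prefer code to diagrams, the client-side sequence condenses to the sketch below. Every function is a hypothetical stub standing in for the real SMB and Witness (RPC) calls; the point is the ordering, not the plumbing.

/* Condensed client-side flow of the Witness Protocol, Diagrams 1-7. */
#include <stdio.h>

static void smb_connect(const char *node)       { printf("SMB connect to %s\n", node); }
static void witness_get_nodes(const char *node) { printf("Witness: get node list from %s\n", node); }
static void witness_register(const char *node)  { printf("Witness: register with %s\n", node); }
static int  witness_wait_notification(void)     { printf("Witness: Node A is down!\n"); return 1; }

int main(void)
{
    smb_connect("NodeA");            /* Diagram 1: data path connection    */
    witness_get_nodes("NodeA");      /* Diagrams 2-3: learn about B, C, D  */
    witness_register("NodeB");       /* Diagram 4: watch from another node */

    if (witness_wait_notification()) /* Diagrams 5-6: RPC callback fires   */
        smb_connect("NodeC");        /* Diagram 7: immediate failover, no  */
                                     /* 40 second timeout needed           */
    return 0;
}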

Tiered Storage and write back caching in Windows Server 2012 R2

With Windows Server 2012 R2, Microsoft introduces support for tiered storage and write back caching. With only rudimentary details available, this blog examines some highlights and also asks a few questions that I hope will be the content of a future blog.

Tiered storage and write back caching with Windows Server 2012 R2 requires:

  • A Storage Spaces capable set of rotating hard disks, i.e. SAS, SATA, or USB hard disks. Obviously, USB disks have their limitations in terms of IOPS.
  • A Storage Spaces (set) of flash storage – the word “flash” is used loosely to include SLC, MLC, and other kinds of SSD; again, these must be SAS, SATA, or USB
  • Creation of a Storage Space that includes both rotating hard disks and flash disks

Tiered storage in Windows Server 2012 R2 provides just two tiers. At any given time, a particular file may be:

  • Fully on SSD, because that is where Windows decided the file should be
  • Fully on HDD, because that is where Windows decided the file should be
  • Partly on SSD and partly on HDD, again as decided by Windows
  • Pinned fully to either HDD or SSD by the administrator

With tiered storage, Windows tracks access to files at a granularity of 1MB ranges. By default, a scheduled job runs at 1AM and moves the often accessed ranges of a file to SSD and the less often accessed ranges to HDD. This retiering can also be run on demand by the system administrator. A toy sketch of this kind of heat tracking follows.
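
The sketch below keeps an access counter per 1MB range and has a periodic job classify ranges as SSD-worthy or HDD-worthy. The cutoff and all data structures are mine; Microsoft has not published the actual algorithm, so treat this purely as an illustration of the concept.

/* Toy heat tracking: access counts per 1MB range, periodic retiering. */
#include <stdio.h>
#include <stdint.h>

#define SLAB   (1024 * 1024)      /* Windows tracks 1MB ranges        */
#define NSLABS 8                  /* an 8MB file, for illustration    */

static uint32_t heat[NSLABS];     /* access count per 1MB range       */

static void on_file_access(uint64_t offset) { heat[offset / SLAB]++; }

/* The 1AM optimization job: ranges accessed more often than the
 * cutoff go to the SSD tier, the rest go to HDD. */
static void retier(uint32_t cutoff)
{
    for (int i = 0; i < NSLABS; i++)
        printf("range %d (%u hits) -> %s\n", i, heat[i],
               heat[i] > cutoff ? "SSD" : "HDD");
}

int main(void)
{
    for (int i = 0; i < 100; i++)
        on_file_access(2 * SLAB + 512);   /* a hot 1MB range  */
    on_file_access(5 * SLAB);             /* a cold 1MB range */
    retier(10);
    return 0;
}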

Windows Server 2012 R2 also introduces write back caching along with tiered storage. When writes happen, some (or all) of them land with the new data on the SSD tier. Presumably, when the scheduled optimization job runs later, the data is moved to HDD.

The pros of Windows Server 2012 R2 tiered storage and write back caching:

  • Built into the operating system and free
  • If the understanding is correct that write back caching simply places data on SSD and uses regular file structures, the likelihood of data corruption due to cache coherency problems or cache corruption is minimized

The cons of Windows Server 2012 R2 tiered storage and write back caching:

  • Only works with Storage Spaces, which requires SAS, SATA, or USB, and in addition requires all storage to be non-RAID
  • Does not support a dedicated SSD designated as cache; in other words, the likelihood of the SSD filling up and write back caching then being turned off is higher
  • Is not “real time”. Potentially all writes go into the SSD first – I could be wrong here, since not enough details are available – but certainly the process of moving often accessed file ranges to SSD, and less often accessed file ranges to HDD, is not real time. The file can still be used during this retiering, but the process is only periodic, and by default runs only once per day.
  • The retiering process, and the process of monitoring and logging statistics on which file ranges are more and less actively accessed, may be resource intensive.

Hyper-V 2012 operations and the importance of SMB 3.0 Multichannel

Jose Barreto from Microsoft has put out numerous blogs and talks, including some on SMB 3.0 and Multichannel. Some examples include The basics of SMB 3.0 Multichannel and Windows Server 2012 NIC Teaming and Multichannel. While it is difficult to add to the voluminous material Jose has contributed, this blog highlights the Hyper-V 2012 operational scenarios where SMB 3.0 Multichannel is useful, and also points out an often overlooked fact about which configurations are SMB 3.0 Multichannel capable. The goal of this blog is to draw attention to things Jose has already adequately explained, but that people have missed for some reason.

The Hyper-V 2012 team rewrote pieces of Hyper-V to support placing Hyper-V workloads on an SMB 3.0 share. Significantly, Hyper-V uses SMB 3.0 as a transport to move large amounts of data/files from one SMB 3.0 NAS to another during live migration. But SMB 3.0 Multichannel is useful in scenarios other than live migration as well.

With Windows Server 2012, Microsoft also rewrote APIs such as the CopyFile API to leverage SMB 3.0 and its performance. CopyFile ensures there are multiple 1MB I/Os in flight, and using the SMB 2/3 credit algorithm, the number of I/Os in flight can be increased. These multiple in-flight I/Os can also flow on parallel TCP connections using SMB 3.0 Multichannel. Deploying a VM would typically involve copying a large vhdx file from, say, a test server to a production server. Another example would be Microsoft System Center copying a large (tens of GBs) vhdx file from a System Center Library Server to the production Hyper-V server. These file copies all benefit from SMB Multichannel.

And that brings me to the last point in this blog. Many folks I have talked to miss the fact that there is a configuration that allows for Multichannel without requiring multiple NICs. Here are the cases where SMB 3.0 Multichannel can come into play:

  • Either client OR server has multiple 1GbE NICs. Note that it is not necessary for BOTH client and server to have multiple NICs.
  • Either client OR server has multiple 10GbE NICs. Again, it is not necessary for BOTH client and server to have multiple NICs.
  • Both client AND server have a single 10GbE NIC, and both the client and server NICs are RSS capable

As Jose Barreto has pointed out multiple times, even when client and server each have only a single 10GbE RSS-capable NIC, that is sufficient to enable Multichannel. Microsoft observed that without RSS, TCP send/receive completion interrupts get serviced by a single CPU; that CPU becomes a bottleneck and prevents a typical Windows system from pumping a full 10Gbps through the NIC. A single TCP channel using RSS can alleviate the single CPU bottleneck, but still cannot use the full capability of the 10GbE NIC. Multiple TCP channels spread across CPUs by RSS, however, do have a chance of saturating the full 10Gbps capability.

Given that backup and vhdx file copy scenarios occur often in a typical Hyper-V 2012 environment, SMB Multichannel not only plays an important role, but is very likely to be enabled by the hardware used in commercial deployments of Hyper-V 2012 VM workloads.

Windows Write Caching – Part 3 – An Overview For System Administrators

The Windows Cache Manager (also referred to as the System Cache) acts as a single system-wide cache that holds data for both user mode applications and drivers. While an application can make API calls in a manner that guarantees its data bypasses this cache, there is no way for an application to guarantee that its data WILL be cached. Because the behavior of the cache depends upon a number of factors and is very often not repeatable, the application and system administrator can only increase the likelihood that application data will be cached. In other words, executing the same program multiple times is very likely to result in slightly different cache behavior each time. This is part of the reason why applications such as Microsoft SQL and Microsoft Exchange bypass the System Cache.
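
For example, this is how an application opts out of the System Cache the way SQL and Exchange do: open the file with FILE_FLAG_NO_BUFFERING, optionally adding FILE_FLAG_WRITE_THROUGH to push writes toward the media. A minimal sketch follows; the path is a placeholder, and note that no-buffering I/O must be sector aligned.

/* Bypassing the System Cache with FILE_FLAG_NO_BUFFERING. */
#include <windows.h>
#include <malloc.h>   /* _aligned_malloc */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Hypothetical path. NO_BUFFERING bypasses the System Cache;
     * WRITE_THROUGH asks that writes reach the storage media. */
    HANDLE h = CreateFileW(L"C:\\temp\\nocache.dat", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                           NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    /* No-buffering I/O must be sector aligned in size, file offset,
     * and buffer address; 4096 covers current Advanced Format disks. */
    unsigned char *buf = _aligned_malloc(4096, 4096);
    memset(buf, 0xAB, 4096);

    DWORD written = 0;
    if (!WriteFile(h, buf, 4096, &written, NULL))
        printf("write failed: %lu\n", GetLastError());

    _aligned_free(buf);
    CloseHandle(h);
    return 0;
}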

To illustrate the complexities involved, consider the seemingly simple act of copying a file from one volume to another. Some, but not all, of these complexities were originally described in References 8 and 9.

  • Either the source or the destination volume may be a local volume or a network volume
  • The access speed for the source and destination volumes may be the same, or one may be significantly slower than the other. Further, the access speed can change depending upon a variety of factors such as network load, system load from other executing applications, and resource usage; e.g. I/O may switch from being cached to non-cached and vice versa.
  • The optimum I/O size for the source and destination volumes may be either the same or significantly different
  • If both the source and destination volumes are on a Windows system, then the System Cache is involved in both reads and writes
  • The Windows team has spent a considerable amount of resources fine tuning the CopyFile and CopyFileEx APIs. Details are described in Reference 8, but the lesson to take away is the complexity of the issue and that further changes are probably forthcoming

Applications may

  • Use the CopyFile or CopyFileEx APIs and utilize the system cache
  • Use the CreateFile, ReadFile, and WriteFile APIs and utilize the system cache for the source only, the destination only, both, or neither.

Once you combine the various permutations and combinations of the above-mentioned elements, the following situations can and do occur when large files are being copied:

  • The system cache on the computer hosting the source file gets filled to a large extent with data from the source file. At the very least, this will affect other programs executing on that system.
  • The system cache on the computer system hosting the destination file gets filled with data for the destination file. This occurs fairly often since, in the beginning, all of the destination file data is cached and thus writes appear to complete quickly. Once the destination system cache hits a limit, disk writes (flushing that cache) may proceed slowly because the disk subsystem may be relatively slow
  • To complicate matters further, even when the data is flushed from system cache, it may be cached inside the block storage device (storage array)

When suspecting problems that may involve the System Cache, an administrator can

  • Inspect the application being used and switch to a different application that explicitly does not use the System Cache. The Microsoft Server Performance Team blog (Reference 7) explicitly suggests using the Microsoft Exchange EseUtil as a file copy tool. The legal implications of using software shipped with Microsoft Exchange on a regular file server are beyond the scope of this document and best decided by your legal department
  • Use some other means to affect the System Cache, e.g. run another application that will consume the System Cache but not otherwise unduly load the system.
  • Attempt to administer the System Cache behavior using built-in utilities and/or registry keys

System caching can be controlled using administrative utilities and/or a registry key.

To change the setting by editing the registry – as always, beware of making registry changes and do so at your own risk – edit the registry key:

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache

By default, this DWORD is set to one (enabled) on Server SKUs and to zero (disabled) on desktop SKUs.
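
A program can inspect the current setting with RegGetValue; here is a minimal read-only sketch (deliberately not writing the value, given the warnings above).

/* Read the LargeSystemCache setting from the registry. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD value = 0, size = sizeof(value);
    LONG rc = RegGetValueW(HKEY_LOCAL_MACHINE,
        L"SYSTEM\\CurrentControlSet\\Control\\Session Manager\\Memory Management",
        L"LargeSystemCache", RRF_RT_REG_DWORD, NULL, &value, &size);
    if (rc == ERROR_SUCCESS)
        printf("LargeSystemCache = %lu (%s)\n", value,
               value ? "favor system cache" : "favor working sets");
    else
        printf("RegGetValueW failed: %ld\n", rc);
    return 0;
}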

On Windows XP, Microsoft provides a GUI to make the same changes, which is preferable to making them via registry edits. Figure 2 shows the GUI that results from launching the Control Panel System applet and then clicking the Advanced tab.


Figure 2 – Windows XP Control Panel System Applet Advanced Tab

Figure 3 shows the System Cache size adjustment GUI that results from clicking the Advanced tab shown in Figure 2 on a Windows XP system.


Figure 3 – Windows XP Control Panel System Applet Advanced Tab to adjust System Cache size

Note that this GUI to change the System Cache size has been removed in Windows Vista.

Windows Server 2003, Windows Vista, and Windows Server 2008 Block Storage Cache Administration

Recall the earlier explanation of the bug in previous versions of Windows that ignored application requests to ensure data/metadata got committed to storage media, and the subsequent fix made in Windows 2000 SP3 and also Windows XP SP2. To allow system administrators an informed choice, Microsoft made available a cache administration utility called DskCache.exe. This utility was only available by calling Microsoft PSS, and could be obtained without incurring any monetary charge. To make it very clear that this facility should only be used in rare circumstances, Microsoft labeled it the “Power Protected Write Cache” option and shipped it natively with Windows Server 2003 and higher versions of Windows. The new name emphasizes that it should be used only when the administrator is sure that the disk storage cache has a battery backup to ensure data integrity.

For Windows Server 2003 and higher versions of Windows, Microsoft has provided the equivalent of the DskCache.exe tool built into Windows. To use this feature:

  • Start Device Manager
  • Select the drive for which you wish to administer the caching policy
  • Select Properties
  • Click on Policies tab
  • Look for the option “Enable write caching on the disk” and make sure it is selected
  • Just below that, look for the option “Enable advanced performance”. This option favors throughput/speed at the potential risk of data corruption.

The resulting GUI from following these steps is shown in Figure 4.


Figure 4 – Windows Server 2003, Windows Vista & Windows Server 2008 disk caching policy administration

For Windows Server 2012, Figure 5 shows what the disk caching policy GUI looks like.


Figure 5 – Windows Server 2012 disk caching policy administration

Conclusion

This article described means by which application programmers can

  • Ensure that their file level data does not get cached in the Windows System Cache
  • Ensure that their file data does not get cached in the block storage layer and does get committed to storage media, given the correct hardware
  • Attempt to ensure, with no guarantee of success, that their file data does indeed get cached in the Windows System Cache

This article also described means by which system administrators can attempt to ensure that data gets committed to storage media and is not cached at either the System Cache or any block storage cache.

References

  1. Microsoft KB 241374: Read and Write Access Required for SCSI Pass Through Request (http://support.microsoft.com/kb/241374/EN-US/)
  2. Microsoft KB 837331: About Cache Manager in Windows Server 2003
  3. Microsoft KB 332023: Slow Disk Performance When Write Caching Is Disabled
  4. Nuances of Windows NT and SCSI disk performance, article by Dilip Naik
  5. Force Unit Access Proposal
  6. Microsoft KB 870894: You receive a “Delayed Write Failed” error message in Windows XP Service Pack 2 or Windows XP Tablet PC Edition 2005
  7. Slow Large File Copy Issues, Microsoft Server Performance Team blog (http://blogs.technet.com/askperf/archive/2007/05/08/slow-large-file-copy-issues.aspx)
  8. Inside Vista SP1 File Copy Improvements, Mark Russinovich's blog (http://blogs.technet.com/markrussinovich/)
  9. Server Generates Delayed Errors Copying Very Large Files (http://www.eggheadcafe.com/software/aspnet/32252624/server-generates-delayed.aspx)
  10. Microsoft KB 920739: Decreased performance when copying files larger than 500 MB (http://support.microsoft.com/kb/920739)
  11. Serial ATA Program Revision 1.2 (http://www.sata-io.org/documents/Interop_UnifiedTest_Rev1_2_v10_091707_000.pdf)
  12. Disks, Lies, and Damn Disks (http://perspectives.mvdirona.com/2008/04/17/DisksLiesAndDamnDisks.aspx)
  13. Serial ATA in the Microsoft operating system environment (http://www.microsoft.com/whdc/device/storage/serialATA_FAQ.mspx)
  14. Enforcing Database Recoverability on Disks that Lack Write-Through (ftp://ftp.research.microsoft.com/pub/tr/TR-2008-36.pdf)