Monthly Archives: November 2011

Notes on Windows 8 VHDX file format

Last year, I wrote an article on TechNet regarding the performance issues with the dynamic VHD file format. That article can be found here. The main point made in the article was that the dynamic VHD file format ensures that 3 out of every 4 2MB data blocks are misaligned, in the sense that they do not begin on a 4096 byte boundary. With large sector disks on the horizon, this will become even more important.

With Windows 8, Microsoft appears to have addressed this issue with the new VHDX file format. As disclosed in some talks at the BUILD conference, the VHDX file format:

  • Extends VHD file based volumes to 16TB from the current 2TB limit
  • Improves performance
  • Makes the file format more resilient to corruption

While the best way to evaluate these features would be to compare the VHDX file format directly with the VHD file format, unfortunately Microsoft has not yet released the VHDX file format specification.

I decided to look into this a little further and wrote a small applet that writes 4KB at offset zero in a file, and 4KB at offset 2MB in the same file. I ran the program twice:

  • Once with the file housed in a VHD based volume
  • Once with the file housed in a VHDX based volume

Given that the VHD file format in Windows Server 2008 R2 uses 2MB data blocks, the applet effectively writes at the beginning of the first two data blocks.
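For reference, here is a sketch of what such an applet might look like. It is illustrative rather than the exact program I ran; the drive letter of the mounted volume and the flush at the end are assumptions.

    #include <windows.h>

    int wmain(void)
    {
        // The path on the VHD/VHDX backed volume is illustrative
        HANDLE h = CreateFileW(L"X:\\testfile.bin", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        BYTE buf[4096] = { 0 };   // 4KB of data to write
        DWORD written;
        LARGE_INTEGER pos;

        // 4KB write at offset zero - the start of the first 2MB data block
        WriteFile(h, buf, sizeof(buf), &written, NULL);

        // Seek to offset 2MB and write another 4KB - the start of the second data block
        pos.QuadPart = 2 * 1024 * 1024;
        SetFilePointerEx(h, pos, NULL, FILE_BEGIN);
        WriteFile(h, buf, sizeof(buf), &written, NULL);

        // Flush so the writes reach the backing VHD/VHDX file and show up in the trace
        FlushFileBuffers(h);
        CloseHandle(h);
        return 0;
    }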

While I plan to analyze the I/Os in more detail, for now, there is one interesting observation. Here are the I/Os on the VHD file traced using Process Monitor in the Windows Server 2008 R2 parent.

And here are the I/Os traced with the file housed in a VHDX based volume.

The immediate observation is that there are some 512 byte writes on the VHD based volume, whereas the VHDX based volume shows no 512 byte writes at all. The 512 byte writes are presumably updates to the sector bitmaps defined by the VHD file format. While the conclusion is not definitive, one is drawn to believe that the 512 byte sector bitmap has been replaced or moved; perhaps the sector bitmaps are now grouped together rather than interspersed between data blocks.

More on this topic in a later blog.

NTFS volume defragmentation – Part 1 – a developer perspective

This blog is about NTFS volume fragmentation from a developer perspective.

As a developer, my perspective is that many applets, including but not limited to Microsoft tools and utilities, provide NTFS with insufficient information to place a file on a volume in a way that reliably avoids fragmentation.

As an example, assume we are coding a file copy applet. The typical applet, with some oversimplification, might look something like this:

    Open source file
    Open destination file
    While (!EndOfSourceFile)
    {
        Read(SourceFile)
        CheckForEndOfFile
        WriteToDestinationFile (including writing a partial buffer, if any)
    }
    Close source file
    Close destination file

The example ignores the code to get and set file attributes and ACLs, purely to concentrate on the fragmentation that occurs while writing the default data stream.
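In concrete Win32 terms, the loop might look like the following sketch; the buffer size is arbitrary and error handling is omitted for brevity:

    #include <windows.h>

    // Simplified buffered copy; attribute/ACL handling and error checks omitted
    void CopyFileSimple(LPCWSTR src, LPCWSTR dst)
    {
        HANDLE hSrc = CreateFileW(src, GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE hDst = CreateFileW(dst, GENERIC_WRITE, 0, NULL,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

        BYTE buffer[64 * 1024];
        DWORD bytesRead, bytesWritten;

        // ReadFile returns TRUE with bytesRead == 0 at end of file
        while (ReadFile(hSrc, buffer, sizeof(buffer), &bytesRead, NULL) && bytesRead > 0)
        {
            // Write whatever was read, including the final partial buffer
            WriteFile(hDst, buffer, bytesRead, &bytesWritten, NULL);
        }

        // Note that nothing above tells NTFS how large the destination will eventually be
        CloseHandle(hSrc);
        CloseHandle(hDst);
    }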

Assume both files are opened for buffered I/O. At some point in time, the Cache Manager will decide to flush the destination file. That point is typically non-deterministic; it depends upon system state, including what else is in the Cache Manager buffers, what other applications are running, and so on. At that moment, the Cache Manager will have N buffers cached for the destination file, with N being a number that depends upon system state. If you copy the same file one more time, N may be the same or different when the Cache Manager decides to flush the file buffers during that second copy.

Now consider what NTFS needs to do. It needs to decide where to place the destination file, and write out the buffers being flushed. In effect, the copy applet we wrote is telling NTFS:

  1. Find a place for the file on the volume
  2. I am not telling you how big the file is, but make sure it does not cause fragmentation!

Unconfirmed rumors suggest that NTFS guesses the file size to be twice the amount of data the Cache Manager is flushing and, based on that assumption, places the file on the volume. Whether that is true or not is immaterial.

The takeaway is that NTFS needs to know the size of a file before it decides where to place the file optimally, and very often that size information is unavailable when NTFS is asked to make the placement decision.

A future blog will explore this further and examine what APIs an application can invoke to provide NTFS with all the information it needs to avoid fragmentation.
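As a preview, one such hint is to pre-extend the destination file to its final size before the copy loop starts writing. The sketch below assumes the standard GetFileSizeEx/SetFilePointerEx/SetEndOfFile sequence; it is illustrative, not a complete recipe:

    #include <windows.h>

    // Sketch: tell NTFS the eventual size of the destination before any data is written.
    // Assumes hSrc and hDst were opened as in the copy loop above.
    void PreallocateDestination(HANDLE hSrc, HANDLE hDst)
    {
        LARGE_INTEGER size, zero = { 0 };

        // The source size is the size the destination will eventually reach
        GetFileSizeEx(hSrc, &size);

        // Set end-of-file at that size so NTFS can choose a placement knowing the full size
        SetFilePointerEx(hDst, size, NULL, FILE_BEGIN);
        SetEndOfFile(hDst);

        // Move the pointer back so the copy loop starts writing at offset zero
        SetFilePointerEx(hDst, zero, NULL, FILE_BEGIN);
    }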