Category Archives: Hyper-V

Windows 8 VHDX file instant dedupe wish list

I have been testing the Windows 8 dedupe feature, particularly with large VHDX files. The testing has revealed one major “wish”. Hopefully somebody from the right department at Microsoft reads this and at least puts it on a feature list for the future.

Here is a scenario I exercised, and it seems to be a very common one:

  1. Create a Windows Server VM inside a 40GB VHDX file; call it VM1.vhdx.
  2. Xcopy the VM1.vhdx file to, say, VM2.vhdx (and yes, use xcopy /J; see my previous blog “Tips for copying large VHD and VHDX files”). That’s 40 GB of reads and 40 GB of writes.
  3. Repeat the xcopy to a different destination file: xcopy /J VM1.vhdx to VM3.vhdx. That’s 40 GB more reads and 40 GB more writes.
  4. Fire up VM1, enter license info, assign computer name, assign IP address, etc. Turn it into a file server.
  5. Fire up VM2, enter license info, etc., install Microsoft Exchange, and turn it into an Exchange Server.
  6. Fire up VM3, enter license info, etc., install SQL Server, and turn it into a SQL Server.
  7. Now let the system idle, make sure it does not hibernate, and wait for dedupe to read all 3 VHDX files (3 x 40 GB worth of reads, etc.) and dedupe them.

Instead, here is an alternative sequence that would be really useful:

  1. Create a Windows Server VM inside a 40GB VHDX file.
  2. Run a PowerShell script that creates an instantly deduped second copy of this VHDX file, with all the associated dedupe metadata. The script would have to invoke some custom dedupe code that Microsoft could ship: create a new file entry for, say, VM2.vhdx and create the dedupe metadata for VM1.vhdx and VM2.vhdx. So now I have 2 VHDX files that are identical and have been deduped.
  3. Repeat the same PowerShell script with different parameters, and now I have 3 identical VHDX files, all deduped.
  4. Repeat steps 4 through 6 from the first sequence. Step 7, the dedupe step, is not needed.

This would save hundreds of GB of reads and writes, as well as administrator time, increasing productivity. Whether you call this instant dedupe or not is up to you.
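As a back-of-the-envelope check on that claim, here is a minimal Python sketch that adds up the I/O of the first sequence using only the figures quoted above (a 40 GB VHDX, two copies, then a dedupe pass that reads all three files). The function name and the simplification that each copy and each dedupe read touches the full 40 GB are my assumptions; any writes the dedupe pass itself performs are not counted.

```python
VHDX_GB = 40   # size of VM1.vhdx in the scenario above
COPIES = 2     # VM2.vhdx and VM3.vhdx

def conventional_io(vhdx_gb=VHDX_GB, copies=COPIES):
    """Estimate GB read and written by the copy-then-dedupe sequence."""
    copy_reads = copies * vhdx_gb          # steps 2 and 3: read the source each time
    copy_writes = copies * vhdx_gb         # steps 2 and 3: write each destination
    dedupe_reads = (copies + 1) * vhdx_gb  # step 7: dedupe reads all the files
    return {"reads_gb": copy_reads + dedupe_reads, "writes_gb": copy_writes}

totals = conventional_io()
print(totals)  # {'reads_gb': 200, 'writes_gb': 80}
```

If the instant-dedupe copy only has to create a file entry and metadata, most of that roughly 280 GB of I/O would simply not happen.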

In the interest of keeping the focus on the instant dedupe scenario, I have deliberately avoided the details of requiring Sysprep’ed installations. But the audience I am targeting with this blog will certainly understand the nuances of requiring Sysprep.

If you are a Microsoft MVP reading this blog, and you agree, please comment on the blog, and email your MVP lead asking for this feature.

Tips for copying large VHD and VHDX files

I have been copying VHD files for a while and have been partly putting up with some issues, but finally devoted the time to look at the issues a little closer.

I have 2 different systems running Windows Server 2008 R2, one with 4GB of physical RAM and one with 16GB of physical RAM. Obviously these are developer systems and certainly not production systems. The problem happens when I copy large VHD files, large being defined as anything significantly larger than the amount of physical RAM on the system that is doing the copying. So for these 2 systems, say anything larger than 20GB in size.

I used copy or xcopy with the default options to copy the large VHD file from one local volume to another. Investigation showed that the physical RAM in use grew to 100% and stayed there while the copy was happening. I was careful not to run any other tasks on either system. It also appeared that the physical RAM was all being consumed by the Cache Manager.

Once the physical RAM usage hit 100% (as observed via PerfMon), I tried starting up NotePad. In a highly unscientific study that does not have enough data points, I found that at least half the time, the system “hiccupped” noticeably before NotePad ran; it took a while before I could even type “Start Run notepad”. There certainly were times, especially early in the copy process, when the systems were highly responsive even with physical RAM usage at 100%.

My speculation – and I have not done any investigation to verify – is that the physical RAM is being consumed by the Cache Manager for 2 different purposes. One is read ahead of the source VHD file, and the second is caching the data written into the destination VHD file. The cache manager is more willing – and able – to give up memory that has read ahead cached data. The cache manager has to work harder – and will consume system resources – when it needs to free up RAM that has cached data written to the destination VHD file.

Looking around, I noticed that copying large files on Windows seems to be a well-known problem. The Microsoft performance team has written a blog “Slow large file copy issues”. They clearly conclude that the solution is to copy the large file non-cached. While the Microsoft Performance team suggests using Exchange EseUtil, I am not sure how many of my readers have access to that utility. I will also point out that I don’t understand the legal issues in taking a utility that ships with Microsoft Exchange and copying it to a system that does not run Microsoft Exchange!

The simpler solution, again as the Microsoft Performance Team advocates, is using the Windows 7 or Windows Server 2008 R2 xcopy utility and making sure to specify the /J option indicating that the file should be opened and copied in a non-cached manner.

My same unscientific testing shows that xcopy /J works well in copying large VHD files. Someday, I will trace this to figure out whether xcopy /J performs non-cached I/O on both the source and destination VHD files, or on just one of them.

In the meantime, do consider using xcopy /J to copy large VHD and VHDX files.
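For anyone scripting their copies, here is a minimal Python sketch of that advice. It decides whether a file counts as “large” relative to physical RAM and builds the matching xcopy command line. The helper names are hypothetical, and the plain “bigger than RAM” threshold is my simplification of the looser “significantly larger than physical RAM” rule of thumb above; the only xcopy option used is /J.

```python
GB = 1024 ** 3

def needs_unbuffered_copy(file_size_bytes, physical_ram_bytes):
    """Rule of thumb (assumed threshold): the file is larger than physical RAM."""
    return file_size_bytes > physical_ram_bytes

def build_copy_command(source, destination, unbuffered):
    """Return the xcopy argument list, adding /J for a non-cached copy."""
    command = ["xcopy", source, destination]
    if unbuffered:
        command.append("/J")
    return command

# Example: a 40 GB VHDX on the 16 GB developer system described above.
use_j = needs_unbuffered_copy(40 * GB, 16 * GB)
print(build_copy_command("VM1.vhdx", "VM2.vhdx", use_j))
```

The resulting list can be handed to something like subprocess.run on a Windows 7 or Windows Server 2008 R2 (or later) system. Note that xcopy may prompt for whether a new destination is a file or a directory.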

Notes on Windows 8 VHDX file format

Last year, I wrote an article on Technet regarding the performance issues with the dynamic VHD file format. That article can be found here. The main point made in the article was that the dynamic VHD file format ensured that 3 out of every 4 of the 2MB data blocks were misaligned, in the sense that they did not begin on a 4096-byte boundary. With large-sector disks on the horizon, this will become even more important.

With Windows 8, Microsoft appears to have addressed this issue with the new VHDX file format. As disclosed in some talks at the BUILD conference, the VHDX file format

  • Extends VHD file based volumes to 16TB from the current 2TB limit
  • Improves performance
  • Makes the file format more resilient to corruption

While the best way to evaluate these features would be to compare the VHDX file format with the VHD file format, unfortunately Microsoft has not yet released the VHDX file format specification.

I decided to look into this a little further and wrote a little applet that writes 4KB at offset zero in a file, and 4KB at offset 2MB in the same file. I ran the program twice:

  • Once with the file housed in a VHD based volume
  • Once with the file housed in a VHDX based volume

 Given that the VHD file format in Windows Server 2008 R2 uses 2MB data blocks, the applet effectively writes at the beginning of the first 2 data blocks.
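For readers who want to repeat the experiment, here is a minimal Python sketch of such an applet. It is not the program I actually used; the function name, the fill byte, and the explicit flush to disk are illustrative assumptions. It writes 4KB at offset zero and 4KB at offset 2MB, which is all the test needs.

```python
import os

BLOCK_SIZE = 2 * 1024 * 1024   # 2MB, the VHD data block size mentioned above
WRITE_SIZE = 4 * 1024          # 4KB per write

def write_probe(path, fill=b"\xAB"):
    """Write 4KB at offset 0 and 4KB at offset 2MB; return the offsets written."""
    payload = fill * WRITE_SIZE
    offsets = (0, BLOCK_SIZE)
    with open(path, "wb") as f:
        for offset in offsets:
            f.seek(offset)        # jump to the start of the data block
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())      # push the data to disk so the writes show up in a trace
    return offsets
```

Run it once against a path on the VHD-based volume and once against a path on the VHDX-based volume, with the trace running in the parent.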

While I plan to analyze the I/Os in more detail, for now, there is one interesting observation. Here are the I/Os on the VHD file traced using Process Monitor in the Windows Server 2008 R2 parent.

And here are the I/Os traced with the file housed in a VHDX based volume

The immediate observation is that there are some 512 byte writes on the VHD based volume whereas the VHDX based volume shows no 512 byte writes at all. The 512 byte writes are presumably the writes to the Sector bitmap of the VHD (file format). While the conclusion is not definitive, one is drawn to believe that the 512 byte sector bitmap has been replaced and/or moved – maybe the sector bitmaps are now all together and not interspersed between data blocks.
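The 512-byte figure is consistent with a sector bitmap, as a quick calculation shows. The sketch below assumes, as I understand the published VHD format, one bitmap bit per 512-byte sector of a data block; the function name is only for illustration.

```python
SECTOR_BYTES = 512                # sector size tracked by the bitmap (assumed)
BLOCK_BYTES = 2 * 1024 * 1024     # 2MB data block

def sector_bitmap_bytes(block_bytes=BLOCK_BYTES, sector_bytes=SECTOR_BYTES):
    """Size in bytes of a bitmap holding one bit per sector of a data block."""
    sectors_per_block = block_bytes // sector_bytes   # 4096 sectors in a 2MB block
    return sectors_per_block // 8                     # 8 bits per byte

print(sector_bitmap_bytes())  # 512
```

So a 2MB data block needs exactly one 512-byte sector of bitmap, which matches the size of the extra writes seen on the VHD-based volume.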

More on this topic in a later blog.