This post looks at NTFS volume fragmentation from a developer's perspective.
As a developer, my view is that many applets, including but not limited to Microsoft tools and utilities, give NTFS too little information to place a file on a volume in a way that reliably avoids fragmentation.
As an example, assume we are coding a file copy applet. With some oversimplification, the typical applet might look something like this:
Open source file
Open destination file
While (!EndOfSourceFile)
{
    Read(SourceFile)
    CheckForEndOfFile
    WriteToDestinationFile (including writing a partial buffer, if any)
}
Close source file
Close destination file
The example ignores the code to get and set file attributes and ACLs, purely to concentrate on the fragmentation that occurs while writing the default data stream.
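For concreteness, the same loop can be sketched in Python. This is a minimal illustration rather than code from any particular tool; the function name `naive_copy` and the 64 KB buffer size are assumptions made purely for the example.

```python
CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, chosen only for illustration


def naive_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path one buffer at a time.

    The destination is never told how large it will become; it simply
    grows with each write. Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of the source file
                break
            dst.write(chunk)  # the final chunk may be a partial buffer
            copied += len(chunk)
    return copied
```

Note that nothing in this loop communicates the final file size to the file system before the data starts arriving.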
Assume both files are buffered. At some point, the Cache Manager will decide to flush the destination file. That point is typically non-deterministic: it depends on system state, including what else is in the Cache Manager buffers, what other apps are running, and so on. When the flush happens, the Cache Manager will have N buffers cached, where N depends on system state. If you copy the same file a second time, N may or may not be the same when the Cache Manager decides to flush the file buffers during that second copy.
Now consider what NTFS needs to do. It needs to decide where to place the destination file, and write out the buffers being flushed. In effect, the copy applet we wrote is telling NTFS:
- Find a place for the file on the volume
- I am not telling you how big the file is, but make sure it does not cause fragmentation!
Unconfirmed rumor has it that NTFS guesses the file size to be twice the size of the Cache Manager contents and places the file on the volume based on that assumption. Whether that is true or not is immaterial.
The takeaway is that NTFS needs to know the size of a file before it can decide where best to place it, and very often that information is unavailable when NTFS is asked to make the placement decision.
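To make the idea of "telling the file system the size first" concrete, here is a hedged Python sketch that sets the destination's length before writing any data. The function name `copy_with_size_hint` is hypothetical, and the sketch only assumes that `truncate` sets the file's length; whether that gives NTFS enough information to place the file contiguously is not something this sketch claims.

```python
import os

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, chosen only for illustration


def copy_with_size_hint(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path, setting the destination length up front.

    Assumption: setting the length before writing is one way to make the
    final size known early. Returns the number of bytes copied.
    """
    size = os.path.getsize(src_path)
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.truncate(size)  # set the length; the write position stays at 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of the source file
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.truncate(copied)  # correct the length if the source changed size
    return copied
```

The only difference from a plain read/write loop is that the size is known to the destination before the first buffer is written, rather than being discovered one buffer at a time.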
A future blog will explore this further and examine what APIs an application can invoke to provide NTFS with all the information it needs to avoid fragmentation.