
The Case for Larger Than 2TB Virtual Disks and The Gotcha with VMFS

Hypervisor competition is really starting to heat up. VMware just released vSphere 5.1, and Microsoft has recently released Windows Server 2012 and the new version of Hyper-V. A significant new feature now available in Hyper-V on Windows Server 2012 is the new VHDX disk format, which has a maximum size of 64TB. With the new filesystem in Windows Server 2012 (ReFS) the maximum volume size increases to 256TB (NTFS was limited to 16TB at a 4K cluster size). So how do vSphere 5 and 5.1 compare, what are the key considerations and gotchas, and what are the implications for business critical applications? Read on to find out.

Before we get started I’d like to say this article isn’t going to cover the performance of large volumes, but rather the argument for supporting larger than 2TB individual virtual disks and large volumes. There are many considerations around performance, and I will cover some of the implications when you start to scale up volume size, but for specific performance design considerations I recommend you read my article titled Storage Sizing Considerations when Virtualizing Business Critical Applications.

The Case for Larger than 2TB Virtual Disks

Recently I have been having an interesting debate with some of my VCDX peers on the merits of, and reasons for, having larger than 2TB virtual disk support in vSphere. As of vSphere 5 VMware supports 64TB VMFS5 datastores and 64TB Physical Mode (Pass-through) Raw Device Mappings (RDMs), but the largest single VMDK file supported on a VMFS5 volume is still 2TB minus 512 bytes (hereafter referred to as 2TB). The same 2TB limit also applies to virtual mode RDMs. In this debate I’ve been suggesting that for now most applications can be supported within the 2TB virtual disk limit. If larger than 2TB volumes are required for a VM, that is easily accommodated with in-guest volume managers and concatenation of multiple 2TB disks, or by using an alternative to VMFS. Realistically, however, this can only go so far. I’ll cover both the pros and the cons as I see them.

Pros:

  • Support for an individual VM with larger than 120TB storage requirements, 120TB being the theoretical limit with 4 x vSCSI controllers, each with 15 disks (60 disks total), at the maximum size of 2TB each. You’ll find out why it’s a theoretical limit later.
  • Fewer devices and fewer volumes are easier to manage, and space can potentially be utilised more efficiently.
  • No need to use in-guest volume managers for very large volumes.
  • Easier to support very large individual files (>2TB) without the use of in-guest volume managers.
  • It could be argued that losing one 2TB device from an in-guest managed volume has the same risk profile as losing a single large volume of the same size, as in both cases the entire volume is potentially lost.

Cons:

  • Larger individual devices and volumes take longer to back up and restore. This may require a major change in data protection architecture.
  • Larger volumes will potentially take longer to replicate and recover in a DR scenario.
  • The risk profile of losing a large volume or device is significantly higher than losing a smaller one. Where no volume manager is being used, losing a single smaller device means only that device has to be recovered instead of everything.
  • Larger individual devices still have the same number of IO queues to the vSCSI controller, which effectively limits their performance. This increases the risk of running out of performance before running out of capacity (until ultra low latency solid state flash storage is of massive capacity and abundantly available, anyway).
  • Significantly harder to take snapshots: a snapshot could still grow to be as large as the original virtual disk. This is probably one of the more significant reasons VMware hasn’t yet introduced VMDKs above 2TB.
  • Significantly longer to check a disk for integrity if there is any type of corruption; and how would a very large disk be recovered?
  • Impact on Storage vMotion times.

In my opinion the arguments are pretty even. But I always err on the side of performance, and I think having more devices of a smaller size is in many cases the better option, as it gives you access to far more queues and more parallel IO channels. However this is only relevant for some applications, mostly OLTP and messaging type applications. File servers, data warehousing, big data and the like may well benefit greatly from larger volume sizes, which would make those applications significantly easier to manage. But the requirements will all be driven by the applications, and at the moment I only see a very small minority of workloads requiring storage capacities that would justify very large individual SCSI devices, and where the performance tradeoffs from an IO parallelism perspective are acceptable. Most of those corner cases have a suitable alternative for now (discussed below). I agree with my friend Alastair Cooke that I don’t want hypervisor limitations dictating my designs, yet all designs have constraints we have to work within. Alastair has posted a good article in response on this topic, titled VM Disks Greater Than 2TB, and I recommend you read it.

Options for Larger than 2TB Volumes

So if you’ve looked at the requirements for your application and you decide that you need a volume larger than 2TB, what are your options with vSphere 5.x?

  1. Using one or more VMFS volumes with virtual disks up to 2TB and an in-guest volume manager to concatenate them. Implications: The more devices, the more storage IO queues and potentially the more performance. Oracle RAC vMotion supported. Theoretically supports up to 120TB of storage per VM.
  2. Physical Mode RDM – Supports up to a 64TB individual device, more than 3PB per VM. Implications: No Storage vMotion, No Hypervisor Snapshot Support, No Cloning, No vSphere APIs for Data Protection (vADP) Support, No vCloud Director Support, No FT Support, No Oracle RAC vMotion Support, No Clustering vMotion Support.
  3. In-Guest iSCSI – Supports 16TB or larger individual devices depending on the iSCSI target. Implications: No Storage vMotion (of iSCSI devices), No Hypervisor Snapshot Support (of iSCSI devices), No Cloning (of iSCSI devices), No vSphere APIs for Data Protection (vADP) Support (of iSCSI devices), vCloud Director Supported, FT Supported, vMotion Supported, Clustering vMotion Supported, higher CPU utilization.
  4. In-Guest NFS – Supports very large volumes depending on the array. Implications: No Storage vMotion (of NFS devices), No Hypervisor Snapshot Support (of NFS devices), No Cloning (of NFS devices), No vSphere APIs for Data Protection (vADP) Support (of NFS devices), vCloud Director Supported, FT Supported, vMotion Supported, Oracle RAC vMotion Supported, higher CPU utilization.
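
As a rough sketch of option 1, here is what in-guest concatenation might look like on a Linux guest. The device names (/dev/sdb through /dev/sde), volume group and logical volume names, and mount point are all assumptions for the example, not part of the original article:

```shell
# Create LVM physical volumes on each of four hypothetical 2TB virtual disks
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Group them into a single volume group
vgcreate vg_data /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Create one logical volume spanning all free extents (linear concatenation,
# which is LVM's default allocation policy), yielding roughly 8TB
lvcreate -l 100%FREE -n lv_data vg_data

# Put a filesystem on it and mount it
mkfs.ext4 /dev/vg_data/lv_data
mkdir -p /data
mount /dev/vg_data/lv_data /data
```

Note that `lvcreate -i 4` would stripe across the four disks instead of concatenating, spreading IO across more devices and queues, which ties back to the IO parallelism tradeoff discussed above.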

You can’t evaluate the alternatives in isolation, and to be fair they are workarounds you wouldn’t even have to consider if larger than 2TB VMDKs were possible. Physical Mode RDMs in particular have operational implications, as you lose hypervisor snapshots, cloning, and backup API integration, just to name a few. So any alternative you choose needs to be thoroughly considered.

The Gotcha with VMFS

If you are going to have databases or systems with large disk footprints (and multiple of them per host) you may need to modify the ESXi VMFS heap size by changing the advanced setting VMFS3.MaxHeapSizeMB. Review VMware KB 1004424, Jason Boche’s article Monster VMs & ESX(i) Heap Size: Trouble In Storage Paradise, and Virtual Kenneth’s article VMFS3 Heap Size. Currently VMFS5 is limited to a maximum of 25TB of virtual disks open per host (yes, per host), with the default setting allowing only 8TB of VMDKs to be open per host. This means that even if it is acceptable to you for a single VM to have multiple 2TB virtual disks concatenated with in-guest volume managers, you would not be able to configure or open more than a 25TB total maximum on a single host (it was 32TB with VMFS3). This is why the limit of 120TB per VM on VMFS is at this point purely theoretical.
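
As a sketch of how this setting can be inspected and raised from the ESXi shell (the 256MB maximum is the commonly cited ESXi 5.x value from KB 1004424 at the time of writing; verify against your own build before changing anything):

```shell
# Show the current VMFS heap size (the default allows only ~8TB of open VMDKs per host)
esxcfg-advcfg -g /VMFS3/MaxHeapSizeMB

# Raise it to 256MB, the 5.x maximum, allowing ~25TB of open VMDKs per host
esxcfg-advcfg -s 256 /VMFS3/MaxHeapSizeMB
```

The equivalent esxcli form is `esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 256`, and a host reboot is generally required before the new heap size takes effect.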

If you want to work around this limitation you will need to adopt option 2, 3 or 4 above, or use virtual mode RDMs. The limit is purely with VMFS and doesn’t impact RDMs (physical or virtual), in-guest iSCSI, or in-guest NFS.

[Updated 20/09/2012] A great example of where it would be good to support more than 25TB of VMDKs per host, and more than 2TB per VMDK, is a customer with a requirement to virtualize 20 x 4TB file servers. Each file server may not need much in the way of RAM or CPU, but does need a decent amount of storage. In theory these 20 VMs could easily be consolidated onto a single host (although they wouldn’t be, for availability reasons), but because of the VMFS limitation this is not possible, and due to the 2TB per VMDK limit you will require a minimum of two VMDKs per VM. It may be more convenient to have a single 4TB VMDK for these types of servers. One option is to design for a consolidation ratio of 5:1 and size the physical hosts accordingly, making sure to increase the default VMFS heap size. However this would introduce additional operational costs and effort. This brings us back to options 2, 3 and 4 above again. In this case vRDM may be a better option than pRDM even with the 2TB limit, as it allows easy migration to VMFS / VMDKs in the future. pRDM would have the advantage of reducing the total number of LUNs required for the VMs, which might be 60 LUNs in total, not taking into account other VMs and LUNs in the cluster (which could bring them close to the 256 LUN limit per host), but with the tradeoff of a harder migration path in the future.
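
The arithmetic behind this example can be sketched as a quick back-of-envelope check, assuming the 25TB-per-host limit of open VMDKs at the maximum heap size described above:

```shell
# 20 file servers at 4TB each, against ~25TB of open VMDKs per host
total_tb=$((20 * 4))                # 80TB across all file servers
per_host_limit_tb=25                # VMFS5 cap with the 256MB maximum heap
# Ceiling division: minimum hosts dictated purely by the heap limit
min_hosts=$(( (total_tb + per_host_limit_tb - 1) / per_host_limit_tb ))
echo "${total_tb}TB total needs at least ${min_hosts} hosts under the heap limit"
```

This works out to at least 4 hosts, which is consistent with the 5:1 consolidation ratio mentioned above (5 VMs x 4TB = 20TB per host, under the 25TB cap).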

Final Word

Microsoft appears to have put the cat squarely among the pigeons in terms of large virtual disk storage support with the latest releases of Windows Server 2012 and Hyper-V. In this respect VMware is indeed playing catch-up. But are greater than 2TB virtual disks really required right now for most applications? In my opinion, no. For the majority of applications the existing vSphere hypervisor can adequately cater for their size and performance needs. But this is only going to last so long. There are some good use cases documented in Cormac Hogan’s blog article How Much Storage Can I Present to a Virtual Machine.

Most applications in my experience, especially the performance and latency sensitive messaging and OLTP database applications, would benefit more from a greater number of SCSI devices and queues. In their case supporting more than 256 datastores per host would be of benefit, especially if there are multiple of them grouped in a cluster. The benefits of using VMFS and virtual disks are compelling, and not being able to support very large virtual disks is definitely going to become a major problem in the future, considering VMFS5 already supports 64TB volumes and data is growing explosively. But do we want larger virtual disks if it means sacrificing functionality, such as snapshots? I don’t think so. I hope VMware will support larger virtual disks, even if only up to 4TB or 16TB, and without sacrificing functionality. In the meantime, alternatives such as RDMs and in-guest storage access will fill the gap for the minority of workloads that need larger disks, with the resulting trade-offs in functionality. Workloads for which the workarounds are unacceptable may not be virtualization candidates, at least on vSphere, until some of these problems are solved.

Just because you can do something doesn’t mean you necessarily should. The back-end array architecture needs to be considered, and so do the data protection and disaster recovery aspects of the solution. It’s no good having a massive volume and a massive amount of storage per VM if you can’t protect that data and recover it in a reasonable timeframe when required. I would like to hear of your use cases that require greater than 2TB virtual disks, and of your very large data Monster VMs. Hopefully if enough customers require larger than 2TB VMDKs, VMware will implement the necessary changes.

Here is what I’d like to see from VMware (In no particular order):

  • Larger than 2TB VMDK Support
  • More than 4 vSCSI Controllers per VM
  • More than 256 SCSI Devices per Host

I would be very interested to get your feedback on this.

This post first appeared on the Long White Virtual Clouds blog at longwhiteclouds.com by Michael Webster. Copyright © 2012 – IT Solutions 2000 Ltd and Michael Webster. All rights reserved. Not to be reproduced for commercial purposes without written permission.


  2. September 18, 2012 at 9:42 am

    Alternatives have worked alright for now, but my main concern with this is how it affects performance. Moreover, backups and restores take a ridiculously long time, which could be a problem depending on a company’s RTO/RPO requirements.

  3. November 5, 2012 at 1:13 am

    Thanks Mike. Always enjoyed reading your article. Just adding some points, and do correct me if I’m wrong:
    – doing in-guest means the IP storage traffic is not visible to ESXi (and hence vCenter). So it can’t be monitored using the standard (built-in) tools. vCenter Operations will also “miss” this data as it won’t classify it as Storage. For example, high workload on this vmnic will not impact the Workload badge of the corresponding VM.
    – doing in-guest means the VM sees the storage network. This creates complexity and security should be incorporated to address this, because VM local admin is typically given to the VM owner (the Sys Admin of that VM). Personally, I’d like to keep the separation clean, so it’s easier operationally.
    – I’m not 100% certain whether doing concatenation at the software level (be it the hypervisor or the Guest OS) is a potential bottleneck. I thought it was always the physical spindle. An EMC Resident Engineer told me that’s the bottleneck when we were discussing a storage issue at a large client.

    I also agree with Alastair. Well said mate 🙂

    • November 5, 2012 at 12:27 pm

      Hi Iwan, You’ve raised some good and valid points that should also be considered when looking at guest storage design. The back end storage isn’t always the bottleneck though. Often the Guest OS configuration is also a bottleneck. The bottlenecks will vary greatly between different customers, different workloads and different designs or configurations. For example if you have SSDs backing the VM that can easily handle a queue depth of 255, and you’re using a VM with a single virtual disk and a queue depth of 32, your Guest VM config could be a major bottleneck. But even concatenation isn’t a silver bullet if all the IO is happening on a single virtual disk that makes up the larger volume. It all depends on the workload. Most solutions are never perfect, as there are always constraints and compromises that need to be made.

  4. Jim Nickel
    December 10, 2012 at 12:41 pm

    I recently had to use in-guest disk managers to build one 20TB file server. Then I also made two 20TB Exchange mailbox servers.

    Both of these for a fairly large client. While this works today, I can see potential problems with this in the future.

    I would very much like to see >2TB VMDK support soon.


  5. Troy MacVay
    December 13, 2012 at 8:30 am

    Very interesting post. We are a Cloud Provider and had a long-standing issue in our CommVault environment that was the result of Heap Size. We run CommVault on stand-alone ESXi hosts and use the HotAdd transport for backup. We started getting random HotAdd failures and spent way too much time troubleshooting without any real resolution. We had even worked with VMware support. Come to find out, we found the Heap Size issue in some last-ditch troubleshooting, and for us it totally added up.

    We were limited to 8TB per host of active VMDK. This was an issue for us as we tend to HotAdd much more than this on the hosts as part of the backup process. Increased the value to max and HotAdd issues are gone.

    It does lead us to some questions around the possibility of large RAM hosts and capacity planning. Think of a host that has 1TB of RAM; I can bet it will need to have more than 25TB of attached VMDKs to support the VM workloads.


  6. January 7, 2013 at 2:41 pm

    We are also a cloud provider and have design issues while trying to stay below the 256 LUN limit per host. When each customer has multiple datastores, the number of datastores (and thus LUNs or NFS mounts) can escalate pretty quickly.

    The 25TB of attached VMDKs seems a bit absurd. I certainly hope some of these scalability issues are taken seriously soon. It seems that ever since day one, whether it be ESX or vCenter, VMware hasn’t thought this through carefully, and has instead let customers troubleshoot ridiculous issues, while the support staff at VMware has little or no real-world knowledge of larger environments.


