SQL Server VM Best Practices
By: Eric Blinn | Updated: 2021-08-27
I am building a virtual machine to run SQL Server and I need to know what the best practices are for such a machine.
The recorded presentation, Running SQL Server on Virtual Machines Best Practices, covers many different settings related to building SQL Server VMs. This tip summarizes many of those same settings and answers some of the FAQs that came out of that session. Both the presentation and this tip are skewed heavily toward the Windows operating system.
While choosing between editions of Windows Server used to be a difficult decision, with newer versions of the operating system there isn't a significant reason to select one edition over another when building a SQL Server virtual machine. Both Windows Server 2019 Standard and Datacenter support 64 sockets and 24TB of RAM.
The big decision to be made with regards to the operating system is whether to include the full UI or to use the Windows "Core" installation. SQL Server VMs should focus on serving clients of the SQL Server and not on running RDP sessions with SSMS or any other resource-hogging applications. As such, installing the full UI should be the exception rather than the rule.
If the full UI needs to be installed it would still be a best practice to limit the RDP sessions on the server. Do this by installing client tools like SSMS on workstations or jump boxes that are dedicated to such activities.
Does Microsoft still recommend rebooting SQL server monthly?
- I reboot my SQL Servers only when they need to be patched. That often means monthly anyway, but if there were no patches I would not reboot.
Editions of SQL Server
The following information is accurate for SQL Server 2019 as of the publication date.
There are very significant differences between the 3 main editions of SQL Server. SQL Server Express is completely free, but is limited to just under 1.4GB of buffer pool memory, the lesser of 1 socket or 4 cores of CPU, and a maximum size of 10GB per database. SQL Server Express also does not come with the SQL Server Agent job scheduler.
SQL Server Standard Edition is the most common edition and can reference 128GB of RAM and the lesser of 4 sockets or 24 cores of CPU. It includes the SQL Server Agent and Basic Availability Groups. Basic Availability Groups are limited to exactly 2 nodes. Starting with SQL Server 2019, Transparent Data Encryption (TDE) became a Standard Edition feature after being Enterprise only since its inception in SQL Server 2008.
SQL Server Enterprise Edition is the most expensive with a list price over 3x higher than Standard Edition. Enterprise Edition removes the CPU and RAM constraints of Express and Standard. It is limited only by operating system limits. It includes a more robust set of HADR options (like Availability Groups with more than 2 nodes) along with several features designed to improve uptime such as online index rebuilds.
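To confirm which edition, version, and patch level an existing instance is running, a quick check like the following can help (a minimal sketch using the built-in SERVERPROPERTY function):

```sql
-- Edition, version number, and service pack / CU level of this instance
SELECT SERVERPROPERTY('Edition')        AS [Edition],
       SERVERPROPERTY('ProductVersion') AS [ProductVersion],
       SERVERPROPERTY('ProductLevel')   AS [ProductLevel];
```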
Both Standard and Enterprise Editions of SQL Server are most commonly licensed by the CPU core with a minimum of 4 core licenses required per server. A less common method to license Standard Edition is to license each server (regardless of cores) and purchase a client access license (CAL) for every client.
Question regarding Server-based licensing: What if there is only a web service account that is accessing the SQL Server instance and users are connecting via the application itself? Does each user still need to be licensed in SQL?
- Yes. Even an indirect hit like this requires a CAL for server-based licensing. This is why server-based licensing is almost never the right choice for a SQL Server supporting web sites, and you should use core-based licensing instead.
Pay attention to sockets vs. cores. SQL Server edition limits apply to the lesser of the socket or core count, so a VM presented with many single-core virtual sockets can leave Standard Edition using far fewer cores than you assigned. This is an easy misstep.
Since the most common method to license SQL Server is by the core, there is real value in spending the money to make sure the VM hosts have high-powered CPUs rather than CPUs with a lower cost, lower power consumption, or lower performance statistics. The money spent on the physical hardware can often be recouped through savings on the SQL Server licensing cost when the VM runs with fewer vCPUs.
If possible, get physical CPUs that have a large cache. Modern CPUs often have a power-saving mode which can be great in some scenarios, but not for SQL Server. Make sure all cores are running at full power rather than any sort of power-saver mode.
What if the vendor wants a bunch of CPUs but the specs seem like they are based on physical vs. vCPU? Do you have a thought on assigning "too many" vCPUs to SQL?
- If the vendor advice is old enough to be based on physical cores, then they are probably thinking about you having less powerful CPU cores than are commonly available on the market today. Since you generally pay for your SQL Server licenses by the core, I would see if I could get by with fewer cores. In the virtual world you can always add more later if you need to. It is very hard to give vCPU cores back, especially if you already paid for the SQL Server licensing.
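One way to verify how many of a VM's vCPUs SQL Server can actually use is to look at the scheduler DMVs. This is a sketch; the socket_count and cores_per_socket columns are only available on newer builds (roughly SQL Server 2016 SP2 and later):

```sql
-- How the OS presents CPUs to SQL Server
SELECT cpu_count, hyperthread_ratio, socket_count, cores_per_socket
FROM sys.dm_os_sys_info;

-- Schedulers stuck at VISIBLE OFFLINE often mean the edition's
-- socket/core cap is lower than the vCPU count assigned to the VM
SELECT [status], COUNT(*) AS scheduler_count
FROM sys.dm_os_schedulers
WHERE [status] IN ('VISIBLE ONLINE', 'VISIBLE OFFLINE')
GROUP BY [status];
```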
The biggest issue to consider regarding memory on virtual machines running SQL Server is that SQL Server does not like it when the amount of memory assigned to the operating system changes. Do not use Dynamic Memory on a SQL Server. Instead, get a full reservation of all the memory required.
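On the SQL Server side, pairing that full memory reservation with an explicit max server memory setting keeps the buffer pool stable. A minimal sketch, where the 114688 MB value is only an example for a VM with 128GB fully reserved:

```sql
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Example value only: leave headroom for the OS and any other processes
EXEC sys.sp_configure 'max server memory (MB)', 114688;
RECONFIGURE;
```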
Any opinion on the Lock Pages in Memory setting?
- Personally, I don't use it for SQL Servers running on VMs. I used to enable the setting when I had physical boxes with smaller amounts of memory, where I felt a stronger need to protect what memory I had. Also, since I no longer let users RDP into my servers, I have fewer processes running that are likely to attempt to steal pages from the SQL Server buffer pool.
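If you want to check whether Lock Pages in Memory is actually in effect on an instance, the memory model column reports it (this column was added around SQL Server 2016 SP1):

```sql
-- LOCK_PAGES indicates Lock Pages in Memory is in effect;
-- CONVENTIONAL indicates it is not
SELECT sql_memory_model_desc
FROM sys.dm_os_sys_info;
```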
For this section the FAQs cover it pretty well, so jumping right in…
What is your opinion or recommendation of creating different LUNs for Data/TLOG/Temp?
- Gone are the days where we need separate volumes for each type of SQL Server file. That was common when disks were internal to servers and different RAID choices were needed to support different workloads. It is still useful to have OS and other binaries on a separate logical volume from SQL Server data and log files.
- The goal here is to get enough IOPS to get the performance we desire. Whether that is completed via many LUNs or one big LUN doesn't matter.
Should system database files live on the same logical volume as the SQL binaries or with the user databases?
- I generally keep my system database files right next to my user database files. I would not recommend storing them on the OS volume – despite their small size.
If we can set up separate logical volumes to be optimized for different workloads, would it make sense to separate data/log/tempdb files?
- If you are able to create different logical volumes that are optimized for different workloads, then it makes sense to separate files by type onto these tuned disks.
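As an example of moving files onto a tuned volume, the tempdb files can be relocated with ALTER DATABASE. The T:\ path below is hypothetical, and the change only takes effect when the SQL Server service restarts:

```sql
-- tempdev/templog are the default logical file names on a
-- default install; the T:\TempDB path is an example only
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');
-- The files move to the new location at the next service restart
```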
Do you recommend formatting a Windows logical volume that is going to be used for SQL Server data/log files with a 64KB block size (allocation unit size)?
- Yes! I probably should have mentioned that in the presentation. Great catch!
What about RAID type for SQL Server for performance?
- In most modern architectures the disk is supplied to the VMs from very large aggregates on SAN or NAS storage devices. These storage devices often use their own proprietary array technologies rather than traditional RAID numbers. If you are building an environment where you are using internal storage or otherwise allowed to select a RAID type then RAID 1+0 would be the best in most situations.
If you have multiple databases on an instance and one database has a large IO load on the server and some other databases combine to have a much smaller IO footprint - would it help to move this high IO database to its own disk?
- Would the new volume come from the same disk aggregate or different disks? If it comes from the same aggregate that the first disk uses then not much has been accomplished as no new IOPS are likely to be made available to SQL Server. If the new volume would come from different disks and introduce new IOPS then it would surely help.
Backing up a SQL Server VM is an important step, but backing up the SQL Server data is of paramount importance. These 2 operations are not the same thing! Backing up the SQL Server data must be accomplished with database backups.
Virtual Machine snapshots or backups are NOT database backups.
SAN snapshots are NOT database backups.
These database backups can be accomplished with native SQL backups (full, differential, log) or by many 3rd party applications that are database aware.
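A minimal sketch of the three native backup types follows; the database name and backup paths are placeholders, and log backups require the database to use the FULL (or BULK_LOGGED) recovery model:

```sql
-- Full backup (names and paths are placeholders)
BACKUP DATABASE [YourDatabase]
TO DISK = N'B:\Backups\YourDatabase_FULL.bak'
WITH CHECKSUM, COMPRESSION;

-- Differential backup: changes since the last full
BACKUP DATABASE [YourDatabase]
TO DISK = N'B:\Backups\YourDatabase_DIFF.bak'
WITH DIFFERENTIAL, CHECKSUM, COMPRESSION;

-- Log backup: requires FULL or BULK_LOGGED recovery model
BACKUP LOG [YourDatabase]
TO DISK = N'B:\Backups\YourDatabase_LOG.trn'
WITH CHECKSUM, COMPRESSION;
```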
Remember, if something goes wrong with a VM running SQL Server, a new, blank SQL Server VM could be rebuilt from a template in very short order. In that same situation restoring the SQL Server data without effective backups isn't going to happen.
Is my current backup utility database aware?
- I got this question a lot and can't answer them all here. If you aren't sure, you can put it in the comments below and I'll try to answer everyone there.
When I used VM for SQL Server 2008, neither hot add CPU nor hot add memory were possible without a restart. What might be the cause?
- I'm no expert in such things, but my understanding is that this is a function of the guest OS version and the options chosen during the initial build of the VM. Back in the age of SQL Server 2008 it may well have been a limitation of the OS.
What would be causing Page Life Expectancy (PLE) to fluctuate so much throughout the day?
- PLE is showing us the age of the data page in the buffer pool that has been there the longest. If some large activity, such as an index maintenance operation, needs a whole lot of data that isn't already in the buffer pool it will push out a whole bunch of data pages from the buffer pool to make room for its own operation. This can cause the PLE value to drop very quickly. After the large operation is over the buffer pool will start to reload with more commonly used data. Since the metric shows us the oldest page in the buffer pool it can only go up by 1 second per second. It will do this until the next time the data is purged from the pool and the value goes down again.
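To watch PLE yourself, it can be read from the same performance counter DMV used later in this tip; on NUMA machines there is also a per-node Buffer Node value in addition to the Buffer Manager total:

```sql
-- Current Page Life Expectancy, in seconds
SELECT [object_name], cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';
```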
My buffer cache hit ratio is 5173. Why is it not just a 2-digit value like you were showing?
- It would seem that I just got lucky to get a 2-digit number on my screen during that demo. I referred to Microsoft's books online which told me I could get that value from the dm_os_performance_counters DMV using the "Buffer cache hit ratio" row. Looking around online after getting your question suggested that in order to get the true ratio, I needed to divide that counter by the "Buffer cache hit ratio base" value which books online declares is "for internal use only". See below for the code that will determine the actual cache hit ratio.
DECLARE @base DECIMAL(10,1);
SELECT @base = cntr_value FROM sys.dm_os_performance_counters WHERE counter_name = 'Buffer cache hit ratio base' AND [OBJECT_NAME] = 'SQLServer:Buffer Manager';
SELECT cntr_value / NULLIF(@base, 0) * 100 AS buffer_cache_hit_ratio FROM sys.dm_os_performance_counters WHERE counter_name = 'Buffer cache hit ratio' AND [OBJECT_NAME] = 'SQLServer:Buffer Manager';
- Build a SQL Server backup plan
- Checklist for a new SQL Server
- SQL Server licensing models and costs
- How to determine the growth of databases over time