One of our SQL Server instances was frequently hitting fatal errors during batch processing. The instance was SQL Server 2008 running on Windows Server 2008. Here are some details captured from the error log:
This fatal error suggests that something is wrong on the network and is causing packets to be dropped, although the error recorded in the SQL Server error log can have several different causes. I searched for this error on the web and found some MSDN forum threads where similar issues were discussed. Some responses suggested disabling the TCP/IP Chimney Offload feature, so I decided to research this further. Let's start with TCP/IP Chimney Offload, RSS and NetDMA.
Microsoft released the Scalable Networking Pack (SNP) that consists of three main features. These three features are TCP/IP Chimney, Receive Side Scaling (RSS) and NetDMA.
- TCP/IP Chimney Offload, as per TechNet, is designed to move network processing tasks, such as packet segmentation and reassembly, from a computer's CPU to a network adapter that supports TCP Chimney Offload. This reduces the workload on the host CPU, which lets the host OS run faster and speeds up the processing of network traffic.
- Receive Side Scaling (RSS) enables the network load from a network adapter to be distributed across multiple CPUs in a multiprocessor computer.
- Network Direct Memory Access (NetDMA) provides services for offloading the memory copy operation that is performed by the networking subsystem to a dedicated direct memory access (DMA) engine when receiving network packets.
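To make the RSS idea more concrete, here is a minimal Python sketch of how traffic could be spread across CPUs on a per-connection basis. It is only an illustration: the CRC32 hash, the `pick_cpu` function and the IP addresses are assumptions made for this example, and a real adapter computes its own hash in hardware.

```python
import zlib


def pick_cpu(src_ip, src_port, dst_ip, dst_port, cpu_count):
    """Map a TCP connection to a CPU index by hashing its 4-tuple.

    Assumption: CRC32 stands in for the hash a real NIC would use.
    The point is only that every packet of one connection lands on
    the same CPU while different connections are spread out.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode("ascii")
    return zlib.crc32(key) % cpu_count


# Hypothetical client connections to a SQL Server listening on port 1433.
flows = [("10.0.0.%d" % i, 50000 + i, "10.0.1.5", 1433) for i in range(1, 9)]

# Each connection is assigned to one of four CPUs.
assignment = {flow: pick_cpu(*flow, cpu_count=4) for flow in flows}

for flow, cpu in assignment.items():
    print(flow, "-> CPU", cpu)
```

Because the CPU is derived from the connection's addresses and ports, calling `pick_cpu` again for the same connection always returns the same CPU.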
Now let's check these settings on our impacted system. We ran the below command to check the existing values of these SNP settings:
netsh int tcp show global
All of these settings were enabled on the affected server. By default, TCP Chimney Offload is disabled in Windows Server 2008, but vendor applications can sometimes turn it on. We can also check whether TCP Chimney Offload is actually in use by running the netstat -t command.
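If you need to check more than a handful of servers, reading this output by eye gets tedious. Below is a minimal Python sketch that parses the text printed by netsh int tcp show global and lists the SNP features that are not disabled. The function names are hypothetical, and the sample text is an illustrative stand-in written for this example, not output captured from the affected server.

```python
SNP_KEYS = (
    "Receive-Side Scaling State",
    "Chimney Offload State",
    "NetDMA State",
)

# Illustrative sample only (assumption): the shape of the netsh output
# with all three SNP features turned on.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : enabled
NetDMA State                        : enabled
Receive Window Auto-Tuning Level    : normal
"""


def parse_tcp_globals(text):
    """Turn 'Name : value' lines into a dict, skipping headers and rulers."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def enabled_snp_features(settings):
    """Return the SNP settings that are present and not set to 'disabled'."""
    return [
        key for key in SNP_KEYS
        if key in settings and settings[key].lower() != "disabled"
    ]


parsed = parse_tcp_globals(SAMPLE_OUTPUT)
print(enabled_snp_features(parsed))
```

Any value other than "disabled" is reported, so a setting in a mode such as "automatic" would also be flagged for review.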
I checked these settings on other production servers for the same application and found that these settings were disabled on those servers. See the below output of SNP settings on two production servers:
SERVER: DELHI2013SQL01
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : disabled
Chimney Offload State : disabled
NetDMA State : disabled
Direct Cache Access (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : ctcp
ECN Capability : disabled
RFC 1323 Timestamps : disabled

SERVER: DELHI2013SQL02
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State : disabled
Chimney Offload State : disabled
NetDMA State : disabled
Direct Cache Access (DCA) : disabled
Receive Window Auto-Tuning Level : normal
Add-On Congestion Control Provider : ctcp
ECN Capability : disabled
RFC 1323 Timestamps : disabled
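The comparison we did by hand can also be scripted. The Python sketch below takes the SNP settings of several servers and reports any server that differs from a baseline in which all three features are disabled. The values for DELHI2013SQL01 and DELHI2013SQL02 come from the output above; the name IMPACTED-SERVER and the `find_drift` function are hypothetical, used only for illustration.

```python
BASELINE = {
    "Receive-Side Scaling State": "disabled",
    "Chimney Offload State": "disabled",
    "NetDMA State": "disabled",
}


def find_drift(servers, baseline=BASELINE):
    """Return {server: {setting: actual_value}} for settings off baseline."""
    drift = {}
    for server, settings in servers.items():
        diffs = {}
        for key, wanted in baseline.items():
            actual = settings.get(key, "<missing>")
            if actual.lower() != wanted:
                diffs[key] = actual
        if diffs:
            drift[server] = diffs
    return drift


all_disabled = dict(BASELINE)
all_enabled = {key: "enabled" for key in BASELINE}

servers = {
    "DELHI2013SQL01": all_disabled,
    "DELHI2013SQL02": all_disabled,
    "IMPACTED-SERVER": all_enabled,  # hypothetical name for the affected box
}

for server, diffs in find_drift(servers).items():
    print(server, diffs)
```

Only servers with at least one setting off the baseline appear in the result, which keeps the report short when most servers are configured consistently.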
The problem is that many NICs report to the OS that they support these features, and they do, but in practice many do not perform these functions well. These SNP options look good from an OS perspective, but misbehaving NIC drivers can turn them into a source of hard-to-diagnose issues. To fix such issues, it is better to turn the features off, get an update from the NIC vendor, or use a better NIC driver that handles these features effectively.
If no updated NIC driver is available, or the NIC drivers are not compatible with these features, we should disable the SNP features both in Windows and in the NIC hardware settings, using the vendor's firmware tools. We can check these settings on the network adapters by launching Device Manager and expanding the Network adapters section.
Steps to Disable NIC Settings
So we decided to replicate the settings from the other production servers on the impacted server where the fatal errors were reported. We followed a Microsoft article to disable TCP/IP Chimney Offload, RSS and NetDMA using the steps below.
1: Open a command prompt with administrative privileges.
2: Run the below command to disable TCP/IP Chimney Offload.
netsh int tcp set global chimney=disabled
3: Run the below command to disable RSS.
netsh int tcp set global rss=disabled
4: NetDMA is disabled through the registry, so make sure to back up your registry before doing the next steps. To open the Registry Editor, click Start, click Run, type regedit, and then click OK.
5: Locate the registry sub-key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters and click on it.
6: Locate the EnableTCPA registry entry. If this registry entry does not exist, right-click the Parameters sub-key, point to New, and then click DWORD Value.
7: Replace New Value #1 by typing EnableTCPA, and then press ENTER. Double-click the EnableTCPA registry value you just created, type 0 in the Value data box to disable NetDMA, and then click OK.
8: TCP Chimney Offload and RSS are disabled as soon as the above commands are executed, but the NetDMA change requires a system reboot after the registry is modified.
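When the same change has to be applied to several servers, the steps above can be sketched as a small Python script. This is a sketch under assumptions, not the procedure we actually ran: it swaps the Registry Editor steps for the Windows reg add command, and the function names and the dry-run switch are hypothetical. It defaults to a dry run that only prints the commands; run it for real only from an elevated prompt on Windows, after backing up the registry.

```python
import subprocess

REG_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def snp_disable_commands():
    """Build the commands that disable Chimney Offload, RSS and NetDMA."""
    return [
        ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
        ["netsh", "int", "tcp", "set", "global", "rss=disabled"],
        # Assumption: 'reg add' replaces the manual regedit steps and
        # sets EnableTCPA to 0; /f overwrites an existing value silently.
        ["reg", "add", REG_KEY, "/v", "EnableTCPA",
         "/t", "REG_DWORD", "/d", "0", "/f"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)


# Dry run: nothing is changed, the commands are only printed.
run_all(snp_disable_commands(), dry_run=True)
```

The NetDMA change still needs a reboot to take effect, so the script deliberately leaves the restart as a manual step.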
After making the above changes and rebooting, we can run the below command to check whether all of these settings are disabled. On our server they all were.
netsh int tcp show global
After we made these changes, we no longer received any fatal errors. These features are designed to enhance the networking capability of Windows, but unfortunately they are not always useful if your NIC drivers do not support them well.
- If you run into these errors, use the steps in this tip to see if these features are enabled for your NICs.
- If your NIC drivers can make good use of these features, enable TCP Chimney Offload; otherwise it is better to turn the features off.
Last Update: 2015-03-05