SQL Server Monitoring with SolarWinds SAM
I need to monitor the SQL Server instances in my environment and our IT team has a need to monitor the health and availability of our overall environment. We need to understand the performance issues in the environment at a high level, prioritize our findings then drill into the root cause in order to correct the issues. Ideally, we would like to have a single monitoring tool for the DBA, Development and Networking teams to use. If we can also provide an interface for our first line managers and higher-level technical management that would be a plus. How can we tackle these needs?
There are a number of different options available to address your monitoring needs. As a SQL Server DBA, we have tools like Extended Events, Query Store, Profiler, Performance Monitor, Dynamic Management Objects, T-SQL, WMI, SNMP, JMX, VMware protocols, PowerShell scripts and API pollers, etc. that capture performance counters, SQL statements and wait stats. The same is true for Microsoft Windows, Virtual Machines, Exchange, Dynamics, etc.; there are native tools that ship with the product that give you insight into the performance metrics from your infrastructure and applications. Each of these tools can be configured separately to capture data, store the historical data and create reports manually, which can quickly become time consuming for servers on-premises and in the cloud. This situation gets even more complex when each of these data sets are stored in different formats, in different locations and correlation becomes very problematic to associate the data.
The business needs to be supported, so this performance monitoring data and reporting is necessary for optimization of the entire environment. As a technology professional we need to nimbly respond to unexpected issues impacting the business and be proactively alerted to prevent issues. Building a system to support these needs and more, is no small feat. This is where turning to a tool focused on monitoring, alerting and reporting could be your best answer.
Let me introduce you to Server & Application Monitor (or SAM for short) from SolarWinds to see if they can help you solve your monitoring issues. They have been helping organizations around the globe monitor their environment at a broad level, so the entire organization will benefit from their monitoring solution, which is a portion of the Orion platform.
What Microsoft SQL Server problems can SolarWinds SAM solve for you?
- Overall status for the environment, servers and prioritize issues
- Graphical and detailed views of metrics for easy comparison
- AppInsight for SQL metrics and monitoring
- Top 10 issues to address in terms of Response Time, CPU, Memory, IO, etc.
- Current application alerts and events
What overall IT problems can SolarWinds SAM solve for my IT team?
- Single web based interface for application performance monitoring Microsoft Windows, Linux, Hyper-V, VMware, SQL Server, Azure, AWS, Oracle, Dynamics, DNS, IIS, Exchange, IBM DB2, MySQL, SharePoint, Active Directory and more
- Summarization of issues with the ability to drill into the details in an intuitive manner
- Insight into active processes, Service Control Manager and real time Event Log Viewer
- Inventory of your environment
Overall SQL Server Environment Status from SolarWinds SAM
SAM has a number of dashboards within the monitoring tool that can help organize and easily determine the overall health of your environment; there is even one specifically for SQL Server. For SQL Server DBAs, Developers and Network Admins responsible for SQL Server 2008 to 2019, the dashboard below is your launching pad to start to determine the status for your responsibilities. From this interface, you can get details related to the following:
- Prioritized data to guide your application performance troubleshooting
- Insight into the SQL Server instances and SQL Server databases which are all color coded to easily determine status
- Active application alerts
- Last 25 application events
- Numerous Top 10 counters
Here is a sample screen shot of the SQL Server Application Summary:
Managing more than a few servers in your head becomes very difficult to configure, manage and improve. With SQL Server DBAs having to manage SQL Server performance for hundreds of instances, this single consolidated view of the environment helps to prioritize your monitoring (alerts for downtime, warnings, unknown as well as uptime metrics), distribute tasks among team members and ensure the business is properly supported.
Graphical and Detailed Views of Metrics for Easy Comparison in SolarWinds SAM
From the dashboard above there are a number of ways to monitor the SQL Server environment and overall database performance with the polling technology that SAM delivers. One of the ways is with the All Applications interface as shown below with a subset of the counters. Keep in mind there are over a 100 SQL Server counters that you have probably seen before in Profiler, Perfmon and the DMVs. In the All Applications interface, you have the ability to drill down from the server name to the instance, then to specific counters as well as to the database level. At each level you have a new view of the data to help guide you to identify and, in some circumstances, directly correct the issue. This is accomplished by the AppInsight for SQL that was recently introduced by SolarWinds for the Server & Application Monitor suite of tools.
As we drill down in the All Applications interface, we can see which counters are problematic based on the color coding. Since we see issues with the Page Writes/sec and Page Splits/sec on this instance, we are able to click on the counter and see two valuable interfaces. First are details related to the counter value, definition and remediation. The second interface is a graphical view of your data that is fairly interactive. You have the ability to see your average, min and max values on the chart as well as hovering your mouse over the counter to get a detailed value. Further you can fine tune your time period with the slider on the bottom or zooming in with the buttons on the top left.
Another set of functionality from the All Applications interface is accessing information about the databases. In this interface there are more than 100 metrics and 15 sub reports with valuable information succinctly provided in each section to understand the status for the database. In this interface you have the ability to review the following data collection:
- Database and transaction log size, disk IO, queue length, latency, reads, writes, growth, etc.
- Overall database metrics such as availability, active transactions, transactions per second, number of active users, fragmentation per index, etc.
- Table size with the overall size, percentage of index space and total row count
- Top 10 expensive queries with the complete set of code and CPU time in milliseconds
As mentioned above, the database interface is packed full of valuable information, but one sub report in particular is very interesting. It is the database storage report as shown below. The data in this report is compact and provides a great deal of information that can be easily scanned to determine issues.
AppInsight for SQL Metrics and Monitoring
AppInsight for SQL provides SQL Server DBAs, Developers and Network Administrators with detailed metrics related to the SQL Server instance in about 20 sub reports. These sub reports cover the following aspects of a SQL Server database:
- Application Details - Instance name, version, edition, status, etc.
- All Databases - Name, status, database size, transaction log size
- Buffer Manager - Cache hit ratio, lazy writes, total pages, buffer cache, etc.
- Cache - Used, object counts, object counts in use
- Latches and Locks - Waits, timeouts, deadlocks, bottlenecks, etc.
- SQL Server - Compilations and recompilations, probe and range scans, batch requests, etc.
- Sessions and Connections - Active, inactive, system and user
- Event Log - Message Details, Active Alerts and SQL Server Error Log entries
- Application Availability - Status, Online, Offline, etc.
- Processes and Services - Statistics for CPU, memory and IOPS
- Memory and Pages - Total memory usage, cache, lock, optimizer, page life expectancy, free pages, page faults, etc.
- Disk Activity - Average disk read and write time per second, read ahead, disk space, etc.
- Expensive Queries - Code, CPU Time, Database with the ability to expand and see all code
- SQL Server Agent Job Status - Job name, last run, duration and outcome
Two components in the SAM App Insight for SQL Server that are highlighted below are the SQL Server and the Pages sub report. These specific sub reports are being highlighted based on the data available as well as to demonstrate the functionality in SAM. In the first sub report, as you hover your mouse over the report, a pop-up shows the detailed statistics and as your mouse is moved across the sub report, the data is updated to understand the metrics at various time periods. In the Pages report, the individual data sets are highlighted red to catch your attention and notify you of an issue.
Identify Top 10 Issues in SQL Server with SolarWinds SAM
As a portion of many of the dashboards in SolarWinds SAM, there are numerous Top 10 lists to help outline the most used resources. Depending on the dashboard, the data can be rolled up across the environment, per server or per application. The format for each of these interfaces is the same and includes the following:
- The metric such as response time
- Process Name
- Application Name
- Server Name or Network Node
- Metric value
Here is a listing of some of the Top 10 lists available:
- Top 10 Most Expensive Queries
- Top 10 Response Time, Utilization, Transactions by Duration
- Top 10 CPU Load - Total and Average
- Top 10 Memory - Physical and Virtual
- Top 10 IO operations by total, reads and writes
- Top 10 Errors
- Top 10 Packet Loss and Traffic
- Top 10 Storage Utilization
A second Top 10 List to note is the Most Expensive Queries by CPU Time. This sub report contains the full set of code, CPU time and database where the code is being issued.
As an aside, when you click on any of the code from the Top 10 Expensive Queries by CPU time sub report, the Most Expensive Queries report loads as shown below where you can review the code and metrics related to the host, database, login, number of executions, CPU time, reads, writes and duration. This is a good example of being able to review high level metrics and then drilling into your code for additional metrics and troubleshooting.
Current Application Alerts and Events Report in SolarWinds
In SAM on the SQL Server Application Summary, there are two sub reports to review alerts and events. The alerts are at a server or application level as shown below with the date and time of the alert, server name, application name and message. These alerts are merely a summary across your environment as shown below.
From the summary page, you can click on the application name loading the AppInsight for SQL page for the individual SQL Server instance, and you can see the detailed alerts as well as the SQL Server Error Log as shown below. These alerts and error log entries are a point in time set of information to guide your monitoring efforts.
Single Web Based Interface for Monitoring your Environment
SAM has a number of beneficial features to help monitor your SQL Server environment. Some of the information is beneficial for multiple IT professionals, which can help support the IT team. Along these lines, other portions of SAM are intended to monitor the following technologies and more:
- Operating Systems - Windows, Linux, AIX, Solaris, Hyper-V, VMware, etc.
- Databases - SQL Server, Oracle, IBM DB2, MySQL, PostgreSQL, Azure, AWS, etc.
- Applications - Exchange, Dynamics, SharePoint, etc.
- Network Resources - DNS, IIS, Apache, DHCP, Citrix, Symantec Backup, etc.
With this type of depth and breadth of monitoring, IT management can check out a few of SAM's interfaces to get a snapshot of the environment. The first interface is the All Applications sub report and is found on the Application Summary page. This sub report provides an alphabetical view of all the resources monitored by SAM with color coding to indicate the status as well as a textual description of the status. This sub report provides a consolidated launch pad to learn more about the servers or to contact your team to find out more.
The second interface that quickly provides value to IT management is the Applications with Problems sub report. On this sub report there are a listing of each application issue on the associated server. You can click on each resource and get additional details on the issues as show previously with SQL Server. This single detailed summarization of issues can save your management a great deal of time to understand the environment status in two simple reports.
In terms of interfaces, there are also a few other SAM sub reports to acknowledge that help provide a quick overview of the status. The first interface is the Application and Hardware Health Overview. It provides a total count of the monitored resources in addition to a pie chart and counts of resources Up, Critical, Unknown, Warning, Down or Other. The counts on the right are hyperlinks to the associated servers where you can click on the count (URL) to review the details on a new page (not shown).
Another set of sub reports found throughout SAM are gauges. These are located on some of the high level dashboards as well as on the individual server pages. These gauges are once again a simple means to determine the status for a particular counter to visually get a sense of the value as well as determine if the value is problematic when in the yellow or red zone. See one example below displaying the CPU Load & Memory Utilization for a single server. In other interfaces you will see gauges for user connections, logins per second, transactions per second, SQL compilations per second, average response time, packet loss, etc.
A final feature that can help you quickly determine the status of a server are the pop-ups available on the top navigation in SAM as shown below as well as throughout the entire tool. Depending on where you are in the hierarchy, you can hover your mouse over the server or application to get a pop-up with the current status, CPU load, memory utilization and components with problems as shown below. You can also get information on pop-up screens depending on the information you are reviewing as you hover your mouse over server names, application names, process names, events, etc. These high level metrics on the pop-up screens, along with the others shown in the previous screen shots, show the level of effort taken to deliver valuable data in a streamlined manner to the IT Professional.
Insight into Active Processes, Service Control Manager and Real Time Event Log Viewer
On a few of the reports in SAM, there is information about the server or node highlighting the status, server resources and basic configuration. This report is easy to scan in order to find out the foundation information about the server and use it to begin troubleshooting. See the first sub report from the image below.
In the second sub report, there is a means to run four applications:
- Real Time Process Explorer
- Service Control Manager
- Real Time Event Log Viewer
These applications are conveniently included in some of the sub reports to review additional real time information and take action if necessary.
For example, below is an image of the Real Time Process Explorer for one server. This interface is also web based, but very similar to the native interface in Windows where you can see which processes are using the most resources (CPU, Memory, Disk), sort the data (Name, Process ID, User Name, Command, etc.) and end a process. As added functionality with SAM, you also have the ability to start polling and start monitoring processes in real time to determine if they are the culprit to the issue at hand.
SAM API Pollers - REST API
SAM has always provided out-of-the-box monitoring and the capability to build custom templates using WMI, SNMP, JMX, and VMware protocols. More recently, they’ve added API pollers that monitor metrics via External REST APIs. You can use the API poller feature to monitor metrics via external REST APIs, providing an easier way to collect data from modern application stacks, unique hardware, IoT devices, and other entities using the REST API to pass information. Define your API endpoints, pick the metrics you want to monitor, provide credentials, and let SAM begin monitoring without the need for custom scripts. Create alerts for numeric and string monitoring; the SAM API poller also allows you to chain multiple requests within a single poller. SAM provides an API poller management page to simplify API poller management to create, modify, and deploy API pollers in bulk across your SAM nodes.
Inventory of your Environment
On the SAM Application Summary dashboard, there is a single sub report with servers with expired warranties, servers with warranties set to expire in 90 days and the next set of server warranties to expire. This information is neatly organized in a simple single interface infrastructure monitoring as shown below with the ability to drill into additional details about each server and take action to contact your hardware vendor to correct a warranty.
Once you drill into a server, there you will find all the node details well beyond just warranty expiration information. The four main tabs on the left provide the following information:
- All Details - Nearly all of the information from the three tabs below
- Summary - Node Details, Hardware Health, Polling Details, Availability Statistics, CPU and Response Time, Last 5 Audit Events, etc.
- Vital Stats - Average Response Time, CPU Load, Memory, Volumes, etc.
- Asset Inventory - Manufacturer, Operating System, Processors, Memory, NIC's, Hard Drives, Logical Volumes, Peripherals, Controllers, Firmware, Software, Drivers, etc.
Below is a screen shot of the Asset Inventory tab in SAM for a single server. In terms of the basic warranty information, it is available in the System Information sub report, but this is only the tip of the iceberg in terms of the server information available. This tab provides a detailed view of all hardware (Memory, CPU, Disk, Controllers, NICs, etc.) and software (Firmware, Software, Drivers, etc.) resources to provide a single consolidated view of the server.
SolarWinds SAM Take-A-Ways
Check out the following take-a-ways from SolarWinds SAM:
- Interface - Simple web based interface available to your entire team with color coding and inherit prioritization to streamline your monitoring
- Reports - Sub reports with meaningful data presented in list, chart, gauge and in pop-up screens to quickly monitor and troubleshoot issues
- SQL Server - SQL Server monitoring and performance tuning at the instance level with 100+ counters, database information related to storage, tables, resource utilization, identification of problematic code, access to the event logs, real time processes and more
- Enterprise Server Monitoring and Management - Monitoring across numerous operating systems, database platforms and applications - all consolidated into a single view, to understand the dependencies
- Investment - Prices for SAM start at $2995 USD for 150 monitors including the first year of maintenance and support
- Learn more about SolarWinds Server & Application Monitor - click here and if you are interested in downloading a fully functional evaluation version of SAM - click here.
- Check out the SolarWinds Thwack community for more tips, tutorials and videos about SAM.
- Get the details on AppInsight for SQL - click here.
MSSQLTips.com Product Spotlight sponsored by SolarWinds
Last Updated: 2021-02-20
About the author
View all my tips