By: Aaron Bertrand
Problem
The first public CTP of SQL Server 2019 was released today and, let me tell you, it is overflowing with enhancements and new features (many of which you will also find, in preview form, in Azure SQL Database). I've had the luxury of early access, allowing me to share my experience with many of the changes, even if only at a high level. You can also check out the latest posts from the SQL Server team and the updated documentation.
Solution
I'm going to discuss several of the new engine features, at a high level, under five categories: performance, troubleshooting, security, availability, and development. At this time, I have more details for some features than others, and in fact have already written full posts detailing a few. I will come back and update these sections as more documentation and articles are made available. Rest assured, this is not an exhaustive look, only the things I've checked out, up to and including CTP 2.0. There is plenty more to come.
Performance
Table variable deferred compilation
Table variables have a bit of a bad reputation, mostly because of estimates. By default, SQL Server will estimate that only one row will ever be present in the table variable, which can lead to some interesting plan selection when the row counts end up being much higher. A typical workaround to avoid this is to use OPTION (RECOMPILE), but this requires a code change, and it is wasteful to recompile every single time when row counts are usually the same.
Trace flag 2453 was introduced to simulate the recompile behavior, but it requires running under a trace flag, and a recompile only happens when a significant change in rows is observed.
In 150 compatibility, you now get deferred compilation when table variables are involved, meaning it won't build a plan until the table variable has been populated once. Estimates will be based on the first use of that table variable, and no recompiles will occur after that. It is a compromise between recompiling all the time to get accurate estimates every time, and recompiling never and always having an estimate of 1. This is great if your row counts remain relatively stable (and better if that number is further away from 1), but may be less beneficial if they fluctuate widely.
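To make the behavior concrete, here is a minimal sketch (dbo.Orders and the date filter are made up for illustration): under compatibility level 150, the final SELECT isn't compiled until the table variable has been loaded, so the join is estimated from its actual row count.
-- Hypothetical example: dbo.Orders and the filter are placeholders
DECLARE @t table (OrderID int PRIMARY KEY);
INSERT @t (OrderID)
  SELECT OrderID
    FROM dbo.Orders
   WHERE OrderDate >= '20180101';
-- Under compatibility level 150, this statement is compiled after @t is
-- populated, so the estimate reflects the real row count instead of 1.
SELECT o.OrderID, o.CustomerID
  FROM dbo.Orders AS o
  INNER JOIN @t AS t
     ON t.OrderID = o.OrderID;
-- Previously, you might have added OPTION (RECOMPILE) here to get similar estimates.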
I dug deeper into this functionality in a recent tip, Table Variable Deferred Compilation in SQL Server, and Brent Ozar has talked about it too, in Faster Table Variables (And New Parameter Sniffing Issues).
Row mode memory grant feedback
SQL Server 2017 introduced batch mode memory grant feedback, which is described in detail here. Essentially, for any memory grant involved with a plan involving batch mode operators, SQL Server will evaluate the memory used by the query, and compare it to the memory requested. If the memory requested is too low or too high, leading to spills or wasted memory, it will adjust the memory grant associated with the execution plan the next time it runs. This will either reduce the grant to allow higher concurrency, or increase it to improve performance.
Now we get this behavior for row mode queries as well, under compatibility level 150. If a query is found to spill to disk, the memory grant will be increased for subsequent executions. If the actual memory used by the query is less than half the memory granted, subsequent grant requests will be lower. Brent Ozar goes into more detail in his post on Adaptive Memory Grants.
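If the feedback turns out to be counter-productive for a volatile workload, you should be able to opt a database out. A minimal sketch, assuming the ROW_MODE_MEMORY_GRANT_FEEDBACK database scoped configuration name holds through to release:
-- Sketch: disable row mode memory grant feedback for the current database
-- (configuration name assumed from the CTP documentation).
ALTER DATABASE SCOPED CONFIGURATION
  SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;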
Batch mode over rowstore
Since SQL Server 2012, queries against tables with columnstore indexes have benefitted from the performance enhancements of batch mode. The improvements come from the query processor handling rows in batches rather than row by row. The rows also surface from the storage engine in batches, and parallelism exchange operators can be avoided. Paul White (@SQL_Kiwi) reminded me that if you introduce an empty columnstore table to make batch mode operations possible, the surfaced rows are gathered into batches by an invisible operator. However, this hack may negate any improvements gained from the batch mode processing. There is more information in this Stack Exchange answer.
Under compatibility level 150, SQL Server 2019 will automatically choose batch mode in certain “sweet spot” cases, even when no columnstore index is present. You might think, well, why not just create a columnstore index and be done with it? Or keep using the hack mentioned above? This is being extended to traditional rowstore objects because columnstore indexes are not always possible, for a variety of reasons, including lack of feature support (e.g. triggers), the overhead in an update- or delete-heavy workload, or lack of 3rd party vendor support. And the hack is simply bad news.
I created a very simple table with 10 million rows and a single clustered index on an integer column, then ran this query:
SELECT sa5, sa2, SUM(i1), SUM(i2), COUNT(*) FROM dbo.FactTable WHERE i1 > 100000 GROUP BY sa5, sa2 ORDER BY sa5, sa2;
The plan clearly shows a clustered index seek and parallelism, but no sign of a columnstore index anywhere (as shown in SentryOne Plan Explorer, a free query tuning tool):
And if you dig a little deeper, you can see that almost all of the operators ran in batch mode, even the sort and the compute scalar:
You can disable this functionality by staying in a lower compatibility level, by changing a database scoped configuration (to arrive in a future CTP), or by using the DISALLOW_BATCH_MODE query hint:
SELECT … OPTION (USE HINT ('DISALLOW_BATCH_MODE'));
In this case the above query plan has an additional exchange operator, all the operators execute in row mode, and the query takes almost three times as long to run:
You can see it to some degree in the diagram, but in the plan tree details, you can also see the impact of the predicate not truly being able to eliminate rows until after the sort:
The selection of batch mode isn't always a slam dunk – the heuristics involved in making the decision include row counts, the types of operators involved, and the expected benefit of batch mode.
APPROX_COUNT_DISTINCT
This new aggregate function is designed for data warehouse scenarios, and is an alternative to COUNT(DISTINCT()). Instead of performing expensive distinct sort operations to determine actual counts, it relies on statistics to get something relatively accurate. You should find that the margin of error is within 2% of the precise count, 97% of the time, which is usually fine for high-level analytics, values that populate a dashboard, or quick estimates.
On my system I created a table with integer columns ranging from 100 to 1,000,000 unique values, and string columns ranging from 100 to 100,000 unique values. There were no indexes other than a clustered primary key on the leading integer column.
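The comparison queries were simple side-by-side aggregates along these lines (treat sa5 as representative of each of the columns tested):
-- Exact vs. approximate distinct count for one of the columns
SELECT
  exact_count  = COUNT(DISTINCT sa5),
  approx_count = APPROX_COUNT_DISTINCT(sa5)
FROM dbo.FactTable;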
Here are the results of COUNT(DISTINCT()) vs. APPROX_COUNT_DISTINCT() against those columns, so you can see where it is off by a bit (but always well within 2%):
The payoff is huge if you're memory-bound, which applies to most of us. If you look at the execution plans in this specific case, you'll see a huge difference in memory footprint on the hash match operator:
Note that you will typically only see significant performance improvements if you are already memory-bound. On my system, the runtime was slightly longer due to more CPU being used by the new function:
That would likely be very different if I had bigger tables, less memory available to SQL Server, higher concurrency, or any combination.
Query-scoped compatibility level hints
Have a specific query that runs better under a certain compatibility level that is different from the current database's? Now you can request exactly that, with new query hints supporting six different compatibility levels and five different cardinality estimation models. The following shows the compatibility levels available, example syntax, and the CE model that is used in each case. You can see how it can affect estimates, even for system catalog views:
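For example, you can pin a single query to the SQL Server 2017 optimizer model, or to the legacy cardinality estimator, without touching the database setting; a sketch using two of the hints:
-- Optimize this query as if the database were at compatibility level 140:
SELECT COUNT(*)
  FROM sys.all_objects
  OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));
-- Or keep the current level but fall back to the legacy (pre-2014) CE model:
SELECT COUNT(*)
  FROM sys.all_objects
  OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));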
Long story short: no more memorizing trace flags, or wondering whether your optimizer fix is covered under TF 4199 or undone by some service pack or cumulative update. Note that these additional hints were also recently added to SQL Server 2017 Cumulative Update #10 (see Pedro Lopes' blog post for more details). You can see all the available hints using:
SELECT name FROM sys.dm_exec_valid_use_hints;
But please always keep in mind that hints are a last resort – they are often great for getting you out of a jam, but you shouldn't plan on leaving code like that long term, since the behavior can change with subsequent updates.
Troubleshooting
Lightweight profiling on by default
This enhancement requires a bit of background. SQL Server 2014 introduced the DMV sys.dm_exec_query_profiles, allowing the user running the query to also collect diagnostic information about all of the operators throughout the duration of the query. This information can be used after the query is finished to determine which operators actually did the most work and why. Any user not running that specific query could still get visibility into this data for any session that enabled STATISTICS XML or STATISTICS PROFILE, or for all sessions through the query_post_execution_showplan extended event – though that event specifically can put a potentially large strain on overall performance.
Management Studio 2016 added functionality to show the data moving through the execution plan in real time based on the information collected from the DMV, which makes it a lot more powerful for troubleshooting purposes. Plan Explorer also offers both live and replay capabilities for visualizing the data throughout the duration of a query.
Starting with SQL Server 2016 SP1, you could also enable a lightweight version of this data collection for all sessions, using trace flag 7412 or the query_thread_profile extended event, so that you could immediately get relevant information about any session without them having to enable anything explicitly in their session (particularly things that affect performance negatively). Much more information on this is available in this blog post by Pedro Lopes.
In SQL Server 2019, this thread profile will be enabled by default – so you don't need a special Extended Events session running, or any trace flags, or STATISTICS options on in any individual query; you'll just be able to look at data from the DMV at any time for all concurrent sessions. You'll be able to turn this off using a new database scoped configuration called LIGHTWEIGHT_QUERY_PROFILING, but the syntax won't work in CTP 2.0 – that will be fixed in a forthcoming CTP.
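In practice that means any session can inspect the per-operator progress of any other running query; a quick sketch (the session_id is just an example):
-- Sketch: operator-level progress for a session you are troubleshooting
SELECT session_id, node_id, physical_operator_name,
       row_count, estimate_row_count
  FROM sys.dm_exec_query_profiles
 WHERE session_id = 53
 ORDER BY node_id;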
Clustered Columnstore Index Statistics Available in Clone Databases
In current versions of SQL Server, cloning a database brings over only the original statistic object from clustered columnstore indexes, ignoring any updates that have been made to the table after creation. If you are using the clone for query tuning and other performance testing that relies on cardinality estimates, those use cases may not be valid. Parikshit Savjani described the limitation in this blog post, and provided a workaround – before initiating the clone, you generate a script that runs DBCC SHOW_STATISTICS … WITH STATS_STREAM against every object. This is potentially expensive, and definitely easy to forget.
In SQL Server 2019, those updated stats will just be automatically available in the clone too, so you can test different query scenarios and get reliable plans based on real statistics, without manually running STATS_STREAM against all of the tables.
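That should make a plain clone sufficient for plan testing; a minimal sketch (the database names are made up):
-- Clone schema, statistics, and Query Store (but no data); in SQL Server 2019
-- the updated columnstore statistics come along automatically.
DBCC CLONEDATABASE (FactDB, FactDB_Clone);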
Compression estimates for Columnstore
In current versions, the procedure sys.sp_estimate_data_compression_savings has the following check:
if (@data_compression not in ('NONE', 'ROW', 'PAGE'))
This means it only allows you to check for row or page compression (or to see the impact of removing the current compression). In SQL Server 2019, that check now looks like this:
if (@data_compression not in ('NONE', 'ROW', 'PAGE', 'COLUMNSTORE', 'COLUMNSTORE_ARCHIVE'))
This is great news, because it allows you to roughly predict the impact of adding a columnstore index to a table that currently has none, or converting a table or partition to the more aggressive columnstore format, without having to restore that table to another system and actually try it. I had a 10 million row table on my system, and ran the stored procedure for each of the five options:
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactTable',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'NONE';
-- repeat for ROW, PAGE, COLUMNSTORE, COLUMNSTORE_ARCHIVE
Results:
As with the other compression types, the accuracy depends wholly on the rows sampled, and how representative they are of the rest of the data. Still, it's a pretty powerful way to get a ballpark guess without any of the work required for more accuracy.
New function to retrieve page info
DBCC PAGE and DBCC IND have been used for a long time to gather information about the pages that comprise a partition, index, or table. But they are undocumented and unsupported commands, and it can be quite tedious to automate solutions around them for problems that involve more than one index or page.
Along came sys.dm_db_database_page_allocations, a dynamic management function (DMF) that returns a set representing all of the pages in the specified object. Still undocumented, this function exhibits a predicate pushdown issue that can be a real problem on larger tables: even to get the information about a single page, it has to read the entire structure, which can be quite prohibitive.
SQL Server 2019 introduces another DMF, sys.dm_db_page_info. This returns basically all of the information on a page, without the overhead of the allocations DMF. In current builds, though, you have to already know the page number you're looking for in order to use the function. That may be intentional, as it could be the only way to ensure performance. So if you're trying to determine all of the pages in an index or table, you'll still need to use the allocations DMF.
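For a single page, usage is straightforward; a sketch, assuming the signature shown in the CTP documentation (the file and page numbers are just example values):
-- Sketch: page header details for page 1:280 in the current database
SELECT page_type_desc, [object_id], index_id
  FROM sys.dm_db_page_info(DB_ID(), 1, 280, 'DETAILED');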
I'll be writing a lot more about this function in an upcoming tip.
Security
Always Encrypted using Secure Enclaves
Today, Always Encrypted protects sensitive data over the wire and in memory by encrypting/decrypting at each end of the process. Unfortunately, this introduces often critical processing constraints, such as not being able to perform computations and filtering – meaning the whole data set has to be sent across in order to perform, say, a range search.
An enclave is a protected area of memory where such computations and filtering can be delegated (in Windows, this uses virtualization-based security) – the data remains encrypted in the engine but can be encrypted or decrypted, securely, inside the enclave. You simply need to add the ENCLAVE_COMPUTATIONS option to the master key, which you can do in SSMS by checking the “Allow enclave computations” checkbox:
Now you'll be able to encrypt data almost instantly when compared to the old way (in which the wizard, or the Set-SqlColumnEncryption cmdlet, or your application, would have to move all the data completely out of the database, encrypt it, and then send it back):
ALTER TABLE dbo.Patients
  ALTER COLUMN SSN char(9) -- currently not encrypted!
  ENCRYPTED WITH
  (
    COLUMN_ENCRYPTION_KEY = ColumnEncryptionKeyName,
    ENCRYPTION_TYPE = Randomized,
    ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
  ) NOT NULL;
This allows for wildcard and range searches, ordering, and so on, as well as in-place encryption within queries, with no loss of security, since the enclave allows both encryption and decryption to happen on the server. You can also perform encryption key rotation within the enclave.
I suspect this will move the goal posts for a lot of organizations, but in this CTP some of these optimizations are still being perfected, so they are not enabled by default – but you can see how to turn them on in the topic enable rich computations.
Certificate Management in Configuration Manager
Managing SSL and TLS certificates has always been a pain, and many folks rely on gobs of tedious work and home-grown scripts to deploy and maintain certificates across their enterprise. With SQL Server 2019, updates to SQL Server Configuration Manager will help, allowing you to quickly view and validate any instance's certificates, find certs close to expiration, and sync deployments of certificates across all of the replicas in an Availability Group (from one place: the primary) or all of the nodes in a Failover Cluster Instance (from one place: the active node).
I haven't tried all of these operations, but they should work across older versions of SQL Server, as long as they are performed from a SQL Server 2019 version of SQL Server Configuration Manager.
Built-In Data Classification and Auditing
In SSMS 17.5, the SQL Server team added the ability to classify data within SSMS, so that you could identify any columns that might contain sensitive information or could otherwise jeopardize compliance with various standards (HIPAA, SOX, PCI, and of course GDPR). The wizard uses an algorithm to suggest columns it thinks are likely to cause compliance issues, but you can add your own, adjust its suggestions, and eliminate any columns from the list. It stores these classifications using extended properties; an SSMS-based report uses that same information to display those columns that have been identified. Outside of the report, these properties are not highly visible.
In SQL Server 2019 there is a new command for this metadata, already available in Azure SQL Database, called ADD SENSITIVITY CLASSIFICATION. This allows you to do the same sort of thing as the SSMS wizard, but the information is no longer stored as extended properties, and any access to that data automatically shows in audits in a new XML column called data_sensitivity_information. This contains all of the types of information that were accessed during the audited event.
As a quick example, let's say I have a table for outside contractors:
CREATE TABLE dbo.Contractors
(
  FirstName  sysname,
  LastName   sysname,
  SSN        char(9),
  HourlyRate decimal(6,2)
);
Just looking at that, I know that all four columns are either vulnerable to data breach, or simply should not be available to everyone who can access the table. Permissions can play a role here but, at a minimum, we need to raise the visibility. So we can classify these columns in different ways:
ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.FirstName, dbo.Contractors.LastName
  WITH (LABEL = 'Confidential - GDPR', INFORMATION_TYPE = 'Personal Info');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.SSN
  WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'National ID');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.HourlyRate
  WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'Financial');
Now, instead of looking in sys.extended_properties, you can see this metadata in sys.sensitivity_classifications:
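The view keys each classification by object and column id, so a quick join makes the output readable; a sketch:
-- Sketch: resolve classifications back to table and column names
SELECT [table]  = OBJECT_NAME(sc.major_id),
       [column] = c.name,
       sc.label,
       sc.information_type
  FROM sys.sensitivity_classifications AS sc
  INNER JOIN sys.columns AS c
     ON c.[object_id] = sc.major_id
    AND c.column_id   = sc.minor_id;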
And if we're auditing selects (or DML) against this table, we don't have to change anything about the audit itself; a SELECT * after the classifications are created is going to yield this type of info in the audit log's new data_sensitivity_information column:
<sensitivity_attributes>
  <sensitivity_attribute label="Confidential - GDPR" information_type="Personal Info" />
  <sensitivity_attribute label="Highly Confidential" information_type="National ID" />
  <sensitivity_attribute label="Highly Confidential" information_type="Financial" />
</sensitivity_attributes>
This obviously doesn't solve all compliance issues, but it can give you a real head start. If you use the wizard to automatically identify columns, and then automate the translation of those sp_addextendedproperty calls into ADD SENSITIVITY CLASSIFICATION commands, you can be well on your way to compliance.
I will write more about this in an upcoming tip.
You can also automate generating (or updating) permissions based on the label metadata – creating a dynamic SQL command that denies access to all Confidential - GDPR columns to a user, group, or role is definitely manageable. Something I may explore in a future tip.
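As a rough sketch of that idea (the role name is made up, and you would want to review the generated statements before running them):
-- Sketch: build column-level DENY statements for everything labeled 'Confidential - GDPR'
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'DENY SELECT ON '
             + QUOTENAME(OBJECT_SCHEMA_NAME(sc.major_id)) + N'.'
             + QUOTENAME(OBJECT_NAME(sc.major_id))
             + N' (' + QUOTENAME(c.name) + N') TO SomeRole;' + CHAR(13) + CHAR(10)
  FROM sys.sensitivity_classifications AS sc
  INNER JOIN sys.columns AS c
     ON c.[object_id] = sc.major_id
    AND c.column_id   = sc.minor_id
 WHERE sc.label = N'Confidential - GDPR';

PRINT @sql; -- review first, then EXEC sys.sp_executesql @sql;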
Availability
Resumable online index creation
SQL Server 2017 introduced the ability to pause and resume online index rebuilds, which can be very useful if you need to change the number of CPUs being used, continue where you left off after a failover event, or simply bridge a gap between maintenance windows. I talked about this feature in a previous tip, More on Resumable Online Index Rebuilds in SQL Server 2017.
In SQL Server 2019, you will be able to use the same syntax to create online indexes in such a way that you can pause and resume, and even set a runtime cap (at which point it will pause):
CREATE INDEX foo ON dbo.bar(blat) WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 10 MINUTES);
If this is running too long, you can use ALTER INDEX from another session to pause (even though the index doesn't technically exist yet):
ALTER INDEX foo ON dbo.bar PAUSE;
In current builds you cannot reduce degree of parallelism when you resume, like you can with a rebuild. If you try to reduce DOP:
ALTER INDEX foo ON dbo.bar RESUME WITH (MAXDOP = 2);
You will get:
Cannot resume index build as required DOP 4 (DOP operation was started with) is not available. Please ensure sufficient DOP is available or abort existing index operation and try again.
The statement has been terminated.
In fact, if you try that, and then you simply issue a resume with no options, you'll get the same error message, at least on current builds. I guess the resume attempt gets logged somewhere and is reused. You'll need to specify the correct (or higher) DOP in order to continue:
ALTER INDEX foo ON dbo.bar RESUME WITH (MAXDOP = 4);
To be clear, you can increase DOP when resuming a paused index creation, you just can't go lower.
An additional benefit to all of this is that you will be able to make online and/or resumable index operations the default for a database, using the new database scoped configurations ELEVATE_ONLINE and ELEVATE_RESUMABLE.
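A minimal sketch, assuming the option names and the WHEN_SUPPORTED value survive to release:
-- Sketch: default index DDL in this database to online and resumable where the engine supports it
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_ONLINE    = WHEN_SUPPORTED;
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_RESUMABLE = WHEN_SUPPORTED;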
Online create / rebuild for Clustered Columnstore Indexes
In addition to resumable online index creation, we also get the ability to create or rebuild clustered columnstore indexes online. This is a big change in that you no longer have to take a maintenance window to perform maintenance on these indexes or – a more compelling use case – to convert from rowstore to columnstore:
CREATE TABLE dbo.splunge
(
  id int NOT NULL
);
GO
CREATE UNIQUE CLUSTERED INDEX PK_Splunge ON dbo.splunge(id);
GO
CREATE CLUSTERED COLUMNSTORE INDEX PK_Splunge
  ON dbo.splunge
  WITH (DROP_EXISTING = ON, ONLINE = ON);
One caveat: converting from a traditional clustered index to a clustered columnstore index is only possible online if your existing clustered index was created in a certain way. If it is part of an explicit primary key constraint, inline or not…
CREATE TABLE dbo.splunge
(
  id int NOT NULL
    CONSTRAINT PK_Splunge PRIMARY KEY CLUSTERED (id)
);
GO
-- or after the fact
-- ALTER TABLE dbo.splunge ADD CONSTRAINT PK_Splunge PRIMARY KEY CLUSTERED(id);
…you'll receive this error:
Cannot recreate index 'PK_Splunge'. The new index definition does not match the constraint being enforced by the existing index.
You'll have to drop the constraint first in order to convert to a clustered columnstore index, but you'll still be able to perform both operations online:
ALTER TABLE dbo.splunge DROP CONSTRAINT PK_Splunge
  WITH (ONLINE = ON);
GO
CREATE CLUSTERED COLUMNSTORE INDEX PK_Splunge
  ON dbo.splunge
  WITH (ONLINE = ON);
That works, but on larger tables, it will likely take longer than if the primary key had initially been implemented as a unique clustered index. I am not sure yet if this is intended behavior or just a limitation in current CTP builds.
Secondary to Primary Replica Connection Redirection
This feature allows you to configure redirects without a listener, so you can shift a connection to the primary even if a secondary is explicitly named in the connection string. You can use this feature when your clustering technology does not support a listener, or when you are using cluster-less AGs, or when you have a complex redirection scheme in a multi-subnet scenario. This will prevent a connection from, for example, attempting write operations against a read-only replica (and failing).
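Configuration is expected to be per replica; a sketch, assuming the READ_WRITE_ROUTING_URL option under PRIMARY_ROLE (the availability group, replica, and URL are placeholders):
-- Sketch: when SQLNODE1 holds the primary role, read-write connections that land
-- on a secondary are redirected to this URL (names and URL are placeholders).
ALTER AVAILABILITY GROUP MyAG
  MODIFY REPLICA ON N'SQLNODE1'
  WITH (PRIMARY_ROLE (READ_WRITE_ROUTING_URL = N'tcp://SQLNODE1.contoso.com:1433'));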
Development
Graph enhancements
Graph relationships will now support the MERGE statement for node or edge tables using MATCH predicates; now one statement can update an existing edge or insert a new one. A new edge constraint will allow you to dictate which nodes an edge can connect.
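A quick sketch of the edge constraint piece (the node and edge table names are invented):
-- Sketch: restrict the 'likes' edge so it can only connect Person nodes to Restaurant nodes
CREATE TABLE dbo.Person     (PersonID     int PRIMARY KEY) AS NODE;
CREATE TABLE dbo.Restaurant (RestaurantID int PRIMARY KEY) AS NODE;

CREATE TABLE dbo.likes
(
  likedDate date NULL,
  CONSTRAINT EC_likes CONNECTION (dbo.Person TO dbo.Restaurant)
) AS EDGE;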
UTF-8
SQL Server 2012 added support for UTF-16 and supplementary characters through a set of collations named with an _SC suffix, like Latin1_General_100_CI_AI_SC, for use with Unicode columns (nchar/nvarchar). In SQL Server 2017, it is possible to import and export data in UTF-8 format, to and from these columns, through facilities like BCP and BULK INSERT.
In SQL Server 2019, there are new collations to support natively storing UTF-8 data within SQL Server. So you can create a char or varchar column to properly store UTF-8 data using new collations with an _SC_UTF8 suffix, like Latin1_General_100_CI_AI_SC_UTF8. This can help improve compatibility with external applications and other database platforms and systems, without paying the performance and storage penalties of nvarchar.
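For example (the table and column are made up), the only change from a typical definition is the collation:
-- Hypothetical table: varchar storing UTF-8 text natively via one of the new collations
CREATE TABLE dbo.Reviews
(
  ReviewID   int IDENTITY(1,1) PRIMARY KEY,
  ReviewText varchar(4000) COLLATE Latin1_General_100_CI_AI_SC_UTF8
);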
A little easter egg I found
For as long as I can remember, SQL Server users have been complaining about this vague error message:
String or binary data would be truncated.
In the CTP builds I've been playing with, I've spotted an interesting error message that didn't exist before:
String or binary data would be truncated in table '%.*ls', column '%.*ls'. Truncated value: '%.*ls'.
I don't think anything else needs to be stated here; this is a great (if long overdue) improvement, and is going to make many people very happy. However, this functionality won't be available in CTP 2.0; I am just giving you a little sneak peek at a forthcoming enhancement that you might not have discovered on your own. Brent Ozar has listed out all of the new messages found in the current CTP, and peppered some helpful commentary throughout his post about sys.messages: Revealing more features.
Summary
SQL Server 2019 offers substantial enhancements that are going to improve the way you use your favorite relational database platform, and there are plenty of changes I haven't talked about here. Persistent memory support, clustering for Machine Learning Services, replication and distributed transactions on Linux, Kubernetes, connectors for Oracle / Teradata / MongoDB, synchronous AG replicas bumped up to five, Java support (implementation similar to Python/R) and, last but not least, a new endeavor called the “Big Data Cluster.” Some of these features you'll need to sign up for using this EAP form.
Bob Ward's forthcoming book, Pro SQL Server on Linux - Including Container-Based Deployment with Docker and Kubernetes, might give you some clues about some other things that are coming as well. And this post from Brent Ozar talks about a potential forthcoming fix to scalar user-defined functions.
But even in this first public CTP, there is something substantial for just about everyone, and I encourage you to try it out for yourself and let me know what you think!
Next Steps
Read on to learn more about SQL Server 2019 through these tips and other resources:
- All SQL Server 2019 tips
- SQL Server 2019 Information – Download CTP 2.0!
- SQL Server 2019 Official Documentation
- Watch the SQL Server Blog for more SQL Server 2019 info