By: Aaron Bertrand
Problem
The first public CTP of SQL Server 2019 was released today and, let me tell you, it is overflowing with enhancements and new features (many of which you will also find, in preview form, in Azure SQL Database). I've had the luxury of early access, allowing me to share my experience with many of the changes, even if only at a high level. You can also check out the latest posts from the SQL Server team and the updated documentation.
Solution
I'm going to discuss several of the new engine features, at a high level, under five categories: performance, troubleshooting, security, availability, and development. At this time, I have more details for some features than others, and in fact have already written full posts detailing a few. I will come back and update these sections as more documentation and articles are made available. Rest assured, this is not an exhaustive look, only the things I've checked out, up to and including CTP 2.0. There is plenty more to come.
Performance
Table variable deferred compilation
Table variables have a bit of a bad reputation, mostly because of estimates. By default, SQL Server will estimate that only one row will ever be present in the table variable, which can lead to some interesting plan selection when the row counts end up being much higher. A typical workaround to avoid this is to use OPTION (RECOMPILE), but this requires a code change, and it is wasteful to recompile every single time when row counts are usually the same.
Trace flag 2453 was introduced to simulate the recompile behavior, but it requires running under a trace flag, and a recompile only happens when a significant change in rows is observed.
In 150 compatibility, you now get deferred compilation when table variables are involved, meaning it won't build a plan until the table variable has been populated once. Estimates will be based on the first use of that table variable, and no recompiles will occur after that. It is a compromise between recompiling all the time to get accurate estimates every time, and recompiling never and always having an estimate of 1. This is great if your row counts remain relatively stable (and better if that number is further away from 1), but may be less beneficial if they fluctuate widely.
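To make the behavior concrete, here is a minimal sketch (dbo.Orders and the date filter are made up for illustration): under compatibility level 150, the final SELECT isn't compiled until the table variable has been loaded, so the join is estimated from its actual row count.
-- Hypothetical example: dbo.Orders and the filter are placeholders
DECLARE @t table (OrderID int PRIMARY KEY);
INSERT @t (OrderID)
  SELECT OrderID
    FROM dbo.Orders
   WHERE OrderDate >= '20180101';
-- Under compatibility level 150, this statement is compiled after @t is
-- populated, so the estimate reflects the real row count instead of 1.
SELECT o.OrderID, o.CustomerID
  FROM dbo.Orders AS o
  INNER JOIN @t AS t
     ON t.OrderID = o.OrderID;
-- Previously, you might have added OPTION (RECOMPILE) here to get similar estimates.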
I dug deeper into this functionality in a recent tip, Table Variable Deferred Compilation in SQL Server, and Brent Ozar has talked about it too, in Faster Table Variables (And New Parameter Sniffing Issues).
Row mode memory grant feedback
SQL Server 2017 introduced batch mode memory grant feedback, which is described in detail here. Essentially, for any memory grant involved with a plan involving batch mode operators, SQL Server will evaluate the memory used by the query, and compare it to the memory requested. If the memory requested is too low or too high, leading to spills or wasted memory, it will adjust the memory grant associated with the execution plan the next time it runs. This will either reduce the grant to allow higher concurrency, or increase it to improve performance.
Now we get this behavior for row mode queries as well, under compatibility level 150. If a query is found to spill to disk, the memory grant will be increased for subsequent executions. If the actual memory used by the query is less than half the memory granted, subsequent grant requests will be lower. Brent Ozar goes into more detail in his post on Adaptive Memory Grants.
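If the feedback turns out to be counter-productive for a volatile workload, you should be able to opt a database out. A minimal sketch, assuming the ROW_MODE_MEMORY_GRANT_FEEDBACK database scoped configuration name holds through to release:
-- Sketch: disable row mode memory grant feedback for the current database
-- (configuration name assumed from the CTP documentation).
ALTER DATABASE SCOPED CONFIGURATION
  SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;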
Batch mode over rowstore
Since SQL Server 2012, queries against tables with columnstore indexes have benefitted from the performance enhancements of batch mode. The improvements come from the query processor handling rows in batches rather than row by row. The rows also surface from the storage engine in batches, and parallelism exchange operators can be avoided. Paul White (@SQL_Kiwi) reminded me that if you introduce an empty columnstore table to make batch mode operations possible, the surfaced rows are gathered into batches by an invisible operator. However, this hack may negate any improvements gained from the batch mode processing. There is more information in this Stack Exchange answer.
Under compatibility level 150, SQL Server 2019 will automatically choose batch mode in certain “sweet spot” cases, even when no columnstore index is present. You might think, well, why not just create a columnstore index and be done with it? Or keep using the hack mentioned above? This is being extended to traditional rowstore objects because columnstore indexes are not always possible, for a variety of reasons, including lack of feature support (e.g. triggers), the overhead in an update- or delete-heavy workload, or lack of 3rd party vendor support. And the hack is simply bad news.
I created a very simple table with 10 million rows and a single clustered index on an integer column, then ran this query:
SELECT sa5, sa2, SUM(i1), SUM(i2), COUNT(*) FROM dbo.FactTable WHERE i1 > 100000 GROUP BY sa5, sa2 ORDER BY sa5, sa2;
The plan clearly shows a clustered index seek and parallelism, but no sign of a columnstore index anywhere (as shown in SentryOne Plan Explorer, a free query tuning tool):
And if you dig a little deeper, you can see that almost all of the operators ran in batch mode, even the sort and the compute scalar:
You can disable this functionality by staying in a lower compatibility level, by changing a database scoped configuration (to arrive in a future CTP), or by using the DISALLOW_BATCH_MODE query hint:
SELECT … OPTION (USE HINT ('DISALLOW_BATCH_MODE'));
In this case the above query plan has an additional exchange operator, all the operators execute in row mode, and the query takes almost three times as long to run:
You can see it to some degree in the diagram, but in the plan tree details, you can also see the impact of the predicate not truly being able to eliminate rows until after the sort:
The selection of batch mode isn't always a slam dunk – the heuristics involved in making the decision include row counts, the types of operators involved, and the expected benefit of batch mode.
APPROX_COUNT_DISTINCT
This new aggregate function is designed for data warehouse scenarios, and is an alternative to COUNT(DISTINCT()). Instead of performing expensive distinct sort operations to determine actual counts, it relies on statistics to get something relatively accurate. You should find that the margin of error is within 2% of the precise count, 97% of the time, which is usually fine for high-level analytics, values that populate a dashboard, or quick estimates.
On my system I created a table with integer columns ranging from 100 to 1,000,000 unique values, and string columns ranging from 100 to 100,000 unique values. There were no indexes other than a clustered primary key on the leading integer column.
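The comparison queries were simple side-by-side aggregates along these lines (treat sa5 as representative of each of the columns tested):
-- Exact vs. approximate distinct count for one of the columns
SELECT
  exact_count  = COUNT(DISTINCT sa5),
  approx_count = APPROX_COUNT_DISTINCT(sa5)
FROM dbo.FactTable;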
Here are the results of COUNT(DISTINCT()) vs. APPROX_COUNT_DISTINCT() against those columns, so you can see where it is off by a bit (but always well within 2%):
The payoff is huge if you're memory-bound, which applies to most of us. If you look at the execution plans in this specific case, you'll see a huge difference in memory footprint on the hash match operator:
Note that you will typically only see significant performance improvements if you are already memory-bound. On my system, the runtime was slightly longer due to more CPU being used by the new function:
That would likely be very different if I had bigger tables, less memory available to SQL Server, higher concurrency, or any combination.
Query-scoped compatibility level hints
Have a specific query that runs better under a certain compatibility level that is different from the current database's? Now you can request exactly that, with new query hints supporting six different compatibility levels and five different cardinality estimation models. The following shows the compatibility levels available, example syntax, and the CE model that is used in each case. You can see how it can affect estimates, even for system catalog views:
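For example, you can pin a single query to the SQL Server 2017 optimizer model, or to the legacy cardinality estimator, without touching the database setting; a sketch using two of the hints:
-- Optimize this query as if the database were at compatibility level 140:
SELECT COUNT(*)
  FROM sys.all_objects
  OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));
-- Or keep the current level but fall back to the legacy (pre-2014) CE model:
SELECT COUNT(*)
  FROM sys.all_objects
  OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));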
Long story short: no more memorizing trace flags, or wondering whether your optimizer fix is covered under TF 4199 or undone by some service pack or cumulative update. Note that these additional hints were also recently added to SQL Server 2017 Cumulative Update #10 (see Pedro Lopes' blog post for more details). You can see all the available hints using:
SELECT name FROM sys.dm_exec_valid_use_hints;
But please always keep in mind that hints are a last resort – they are often great for getting you out of a jam, but you shouldn't plan on leaving code like that long term, since the behavior can change with subsequent updates.
Troubleshooting
Lightweight profiling on by default
This enhancement requires a bit of background. SQL Server 2014 introduced the DMV sys.dm_exec_query_profiles, allowing the user running the query to also collect diagnostic information about all of the operators throughout the duration of the query. This information can be used after the query is finished to determine which operators actually did the most work and why. Any user not running that specific query could still get visibility into this data for any session that enabled STATISTICS XML or STATISTICS PROFILE, or for all sessions through the query_post_execution_showplan extended event – though that event specifically can put a potentially large strain on overall performance.
Management Studio 2016 added functionality to show the data moving through the execution plan in real time based on the information collected from the DMV, which makes it a lot more powerful for troubleshooting purposes. Plan Explorer also offers both live and replay capabilities for visualizing the data throughout the duration of a query.
Starting with SQL Server 2016 SP1, you could also enable a lightweight version of this data collection for all sessions, using trace flag 7412 or the query_thread_profile extended event, so that you could immediately get relevant information about any session without them having to enable anything explicitly in their session (particularly things that affect performance negatively). Much more information on this is available in this blog post by Pedro Lopes.
In SQL Server 2019, this thread profile will be enabled by default – so you don't need a special Extended Events session running, or any trace flags, or STATISTICS options on in any individual query; you'll just be able to look at data from the DMV at any time for all concurrent sessions. You'll be able to turn this off using a new database scoped configuration called LIGHTWEIGHT_QUERY_PROFILING, but the syntax won't work in CTP 2.0 – that will be fixed in a forthcoming CTP.
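In practice that means any session can inspect the per-operator progress of any other running query; a quick sketch (the session_id is just an example):
-- Sketch: operator-level progress for a session you are troubleshooting
SELECT session_id, node_id, physical_operator_name,
       row_count, estimate_row_count
  FROM sys.dm_exec_query_profiles
 WHERE session_id = 53
 ORDER BY node_id;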
Clustered Columnstore Index Statistics Available in Clone Databases
In current versions of SQL Server, cloning a database brings over only the original statistic object from clustered columnstore indexes, ignoring any updates that have been made to the table after creation. If you are using the clone for query tuning and other performance testing that relies on cardinality estimates, those use cases may not be valid. Parikshit Savjani described the limitation in this blog post, and provided a workaround – before initiating the clone, you generate a script that runs DBCC SHOW_STATISTICS … WITH STATS_STREAM against every object. This is potentially expensive, and definitely easy to forget.
In SQL Server 2019, those updated stats will just be automatically available in the clone too, so you can test different query scenarios and get reliable plans based on real statistics, without manually running STATS_STREAM against all of the tables.
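That should make a plain clone sufficient for plan testing; a minimal sketch (the database names are made up):
-- Clone schema, statistics, and Query Store (but no data); in SQL Server 2019
-- the updated columnstore statistics come along automatically.
DBCC CLONEDATABASE (FactDB, FactDB_Clone);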
Compression estimates for Columnstore
In current versions, the procedure sys.sp_estimate_data_compression_savings has the following check:
if (@data_compression not in ('NONE', 'ROW', 'PAGE'))
This means it only allows you to check for row or page compression (or to see the impact of removing the current compression). In SQL Server 2019, that check now looks like this:
if (@data_compression not in ('NONE', 'ROW', 'PAGE', 'COLUMNSTORE', 'COLUMNSTORE_ARCHIVE'))
This is great news, because it allows you to roughly predict the impact of adding a columnstore index to a table that currently has none, or converting a table or partition to the more aggressive columnstore format, without having to restore that table to another system and actually try it. I had a 10 million row table on my system, and ran the stored procedure for each of the five options:
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactTable',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'NONE';
-- repeat for ROW, PAGE, COLUMNSTORE, COLUMNSTORE_ARCHIVE
Results:
As with the other compression types, the accuracy depends wholly on the rows sampled, and how representative they are of the rest of the data. Still, it's a pretty powerful way to get a ballpark guess without any of the work required for more accuracy.
New function to retrieve page info
DBCC PAGE and DBCC IND have been used for a long time to gather information about the pages that comprise a partition, index, or table. But they are undocumented and unsupported commands, and it can be quite tedious to automate solutions around them for problems that involve more than one index or page.
Along came sys.dm_db_database_page_allocations, a dynamic management function (DMF) that returns a set representing all of the pages in the specified object. Still undocumented, this function exhibits a predicate pushdown issue that can be a real problem on larger tables: even to get the information about a single page, it has to read the entire structure, which can be quite prohibitive.
SQL Server 2019 introduces another DMF, sys.dm_db_page_info. This returns basically all of the information on a page, without the overhead of the allocations DMF. In current builds, though, you have to already know the page number you're looking for in order to use the function. That may be intentional, as it could be the only way to ensure performance. So if you're trying to determine all of the pages in an index or table, you'll still need to use the allocations DMF.
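For a single page, usage is straightforward; a sketch, assuming the signature shown in the CTP documentation (the file and page numbers are just example values):
-- Sketch: page header details for page 1:280 in the current database
SELECT page_type_desc, [object_id], index_id
  FROM sys.dm_db_page_info(DB_ID(), 1, 280, 'DETAILED');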
I'll be writing a lot more about this function in an upcoming tip.
Security
Always Encrypted using Secure Enclaves
Today, Always Encrypted protects sensitive data over the wire and in memory by encrypting/decrypting at each end of the process. Unfortunately, this introduces often critical processing constraints, such as not being able to perform computations and filtering – meaning the whole data set has to be sent across in order to perform, say, a range search.
An enclave is a protected area of memory where such computations and filtering can be delegated (in Windows, this uses virtualization-based security) – the data remains encrypted in the engine but can be encrypted or decrypted, securely, inside the enclave. You simply need to add the ENCLAVE_COMPUTATIONS option to the master key, which you can do in SSMS by checking the “Allow enclave computations” checkbox:
Now you'll be able to encrypt data almost instantly when compared to the old way (in which the wizard, or the Set-SqlColumnEncryption cmdlet, or your application, would have to move all the data completely out of the database, encrypt it, and then send it back):
ALTER TABLE dbo.Patients
  ALTER COLUMN SSN char(9) -- currently not encrypted!
  ENCRYPTED WITH
  (
    COLUMN_ENCRYPTION_KEY = ColumnEncryptionKeyName,
    ENCRYPTION_TYPE = Randomized,
    ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
  ) NOT NULL;
This allows for wildcard and range searches, ordering, and so on, as well as in-place encryption within queries, with no loss of security, since the enclave allows both encryption and decryption to happen on the server. You can also perform encryption key rotation within the enclave.
I suspect this will move the goal posts for a lot of organizations, but in this CTP some of these optimizations are still being perfected, so they are not enabled by default – but you can see how to turn them on in the topic enable rich computations.
Certificate Management in Configuration Manager
Managing SSL and TLS certificates has always been a pain, and many folks rely on gobs of tedious work and home-grown scripts to deploy and maintain certificates across their enterprise. With SQL Server 2019, updates to SQL Server Configuration Manager will help, allowing you to quickly view and validate any instance's certificates, find certs close to expiration, and sync deployments of certificates across all of the replicas in an Availability Group (from one place: the primary) or all of the nodes in a Failover Cluster Instance (from one place: the active node).
I haven't tried all of these operations, but they should work across older versions of SQL Server, as long as they are performed from a SQL Server 2019 version of SQL Server Configuration Manager.
Built-In Data Classification and Auditing
In SSMS 17.5, the SQL Server team added the ability to classify data within SSMS, so that you could identify any columns that might contain sensitive information or could otherwise jeopardize compliance with various standards (HIPAA, SOX, PCI, and of course GDPR). The wizard uses an algorithm to suggest columns it thinks are likely to cause compliance issues, but you can add your own, adjust its suggestions, and eliminate any columns from the list. It stores these classifications using extended properties; an SSMS-based report uses that same information to display those columns that have been identified. Outside of the report, these properties are not highly visible.
In SQL Server 2019 there is a new command for this metadata, already available in Azure SQL Database, called ADD SENSITIVITY CLASSIFICATION. This allows you to do the same sort of thing as the SSMS wizard, but the information is no longer stored as extended properties, and any access to that data automatically shows in audits in a new XML column called data_sensitivity_information. This contains all of the types of information that were accessed during the audited event.
As a quick example, let's say I have a table for outside contractors:
CREATE TABLE dbo.Contractors
(
  FirstName  sysname,
  LastName   sysname,
  SSN        char(9),
  HourlyRate decimal(6,2)
);
Just looking at that, I know that all four columns are either vulnerable to data breach, or simply should not be available to everyone who can access the table. Permissions can play a role here but, at a minimum, we need to raise the visibility. So we can classify these columns in different ways:
ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.FirstName, dbo.Contractors.LastName
  WITH (LABEL = 'Confidential - GDPR', INFORMATION_TYPE = 'Personal Info');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.SSN
  WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'National ID');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.HourlyRate
  WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'Financial');
Now, instead of looking in sys.extended_properties, you can see this metadata in sys.sensitivity_classifications:
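The view keys each classification by object and column id, so a quick join makes the output readable; a sketch:
-- Sketch: resolve classifications back to table and column names
SELECT [table]  = OBJECT_NAME(sc.major_id),
       [column] = c.name,
       sc.label,
       sc.information_type
  FROM sys.sensitivity_classifications AS sc
  INNER JOIN sys.columns AS c
     ON c.[object_id] = sc.major_id
    AND c.column_id   = sc.minor_id;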
And if we're auditing selects (or DML) against this table, we don't have to change anything about the audit itself; a SELECT * after the classifications are created is going to yield this type of info in the audit log's new data_sensitivity_information column:
<sensitivity_attributes>
  <sensitivity_attribute label="Confidential - GDPR" information_type="Personal Info" />
  <sensitivity_attribute label="Highly Confidential" information_type="National ID" />
  <sensitivity_attribute label="Highly Confidential" information_type="Financial" />
</sensitivity_attributes>
This obviously doesn't solve all compliance issues, but it can give you a real head start. If you use the wizard to automatically identify columns, and then automate the translation of those sp_addextendedproperty calls into ADD SENSITIVITY CLASSIFICATION commands, you can be well on your way to compliance.
I will write more about this in an upcoming tip.
You can also automate generating (or updating) permissions based on the label metadata – creating a dynamic SQL command that denies access to all Confidential - GDPR columns to a user, group, or role is definitely manageable. Something I may explore in a future tip.
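As a rough sketch of that idea (the role name is made up, and you would want to review the generated statements before running them):
-- Sketch: build column-level DENY statements for everything labeled 'Confidential - GDPR'
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'DENY SELECT ON '
             + QUOTENAME(OBJECT_SCHEMA_NAME(sc.major_id)) + N'.'
             + QUOTENAME(OBJECT_NAME(sc.major_id))
             + N' (' + QUOTENAME(c.name) + N') TO SomeRole;' + CHAR(13) + CHAR(10)
  FROM sys.sensitivity_classifications AS sc
  INNER JOIN sys.columns AS c
     ON c.[object_id] = sc.major_id
    AND c.column_id   = sc.minor_id
 WHERE sc.label = N'Confidential - GDPR';

PRINT @sql; -- review first, then EXEC sys.sp_executesql @sql;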
Availability
Resumable online index creation
SQL Server 2017 introduced the ability to pause and resume online index rebuilds, which can be very useful if you need to change the number of CPUs being used, continue where you left off after a failover event, or simply bridge a gap between maintenance windows. I talked about this feature in a previous tip, More on Resumable Online Index Rebuilds in SQL Server 2017.
In SQL Server 2019, you will be able to use the same syntax to create online indexes in such a way that you can pause and resume, and even set a runtime cap (at which point it will pause):
CREATE INDEX foo ON dbo.bar(blat) WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 10 MINUTES);
If this is running too long, you can use ALTER INDEX from another session to pause (even though the index doesn't technically exist yet):
ALTER INDEX foo ON dbo.bar PAUSE;
In current builds you cannot reduce degree of parallelism when you resume, like you can with a rebuild. If you try to reduce DOP:
ALTER INDEX foo ON dbo.bar RESUME WITH (MAXDOP = 2);
You will get:
Cannot resume index build as required DOP 4 (DOP operation was started with) is not available. Please ensure sufficient DOP is available or abort existing index operation and try again.
The statement has been terminated.
In fact, if you try that, and then you simply issue a resume with no options, you'll get the same error message, at least on current builds. I guess the resume attempt gets logged somewhere and is reused. You'll need to specify the correct (or higher) DOP in order to continue:
ALTER INDEX foo ON dbo.bar RESUME WITH (MAXDOP = 4);
To be clear, you can increase DOP when resuming a paused index creation, you just can't go lower.
An additional benefit to all of this is that you will be able to make online and/or resumable index operations the default for a database, using the new database scoped configurations ELEVATE_ONLINE and ELEVATE_RESUMABLE.
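A minimal sketch, assuming the option names and the WHEN_SUPPORTED value survive to release:
-- Sketch: default index DDL in this database to online and resumable where the engine supports it
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_ONLINE    = WHEN_SUPPORTED;
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_RESUMABLE = WHEN_SUPPORTED;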
Online create / rebuild for Clustered Columnstore Indexes
In addition to resumable online index creation, we also get the ability to create or rebuild clustered columnstore indexes online. This is a big change in that you no longer have to take a maintenance window to perform maintenance on these indexes or – a more compelling use case – to convert from rowstore to columnstore:
CREATE TABLE dbo.splunge
(
  id int NOT NULL
);
GO
CREATE UNIQUE CLUSTERED INDEX PK_Splunge ON dbo.splunge(id);
GO
CREATE CLUSTERED COLUMNSTORE INDEX PK_Splunge
  ON dbo.splunge
  WITH (DROP_EXISTING = ON, ONLINE = ON);
One caveat: converting from a traditional clustered index to a clustered columnstore index is only possible online if your existing clustered index was created in a certain way. If it is part of an explicit primary key constraint, inline or not…
CREATE TABLE dbo.splunge
(
  id int NOT NULL
    CONSTRAINT PK_Splunge PRIMARY KEY CLUSTERED (id)
);
GO
-- or after the fact
-- ALTER TABLE dbo.splunge ADD CONSTRAINT PK_Splunge PRIMARY KEY CLUSTERED(id);
…you'll receive this error:
Cannot recreate index 'PK_Splunge'. The new index definition does not match the constraint being enforced by the existing index.
You'll have to drop the constraint first in order to convert to a clustered columnstore index, but you'll still be able to perform both operations online:
ALTER TABLE dbo.splunge DROP CONSTRAINT PK_Splunge
  WITH (ONLINE = ON);
GO
CREATE CLUSTERED COLUMNSTORE INDEX PK_Splunge
  ON dbo.splunge
  WITH (ONLINE = ON);
That works, but on larger tables, it will likely take longer than if the primary key had initially been implemented as a unique clustered index. I am not sure yet if this is intended behavior or just a limitation in current CTP builds.
Secondary to Primary Replica Connection Redirection
This feature allows you to configure redirects without a listener, so you can shift a connection to the primary even if a secondary is explicitly named in the connection string. You can use this feature when your clustering technology does not support a listener, or when you are using cluster-less AGs, or when you have a complex redirection scheme in a multi-subnet scenario. This will prevent a connection from, for example, attempting write operations against a read-only replica (and failing).
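Configuration is expected to be per replica; a sketch, assuming the READ_WRITE_ROUTING_URL option under PRIMARY_ROLE (the availability group, replica, and URL are placeholders):
-- Sketch: when SQLNODE1 holds the primary role, read-write connections that land
-- on a secondary are redirected to this URL (names and URL are placeholders).
ALTER AVAILABILITY GROUP MyAG
  MODIFY REPLICA ON N'SQLNODE1'
  WITH (PRIMARY_ROLE (READ_WRITE_ROUTING_URL = N'tcp://SQLNODE1.contoso.com:1433'));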
Development
Graph enhancements
Graph relationships will now support the MERGE statement for node or edge tables using MATCH predicates; now one statement can update an existing edge or insert a new one. A new edge constraint will allow you to dictate which nodes an edge can connect.
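A quick sketch of the edge constraint piece (the node and edge table names are invented):
-- Sketch: restrict the 'likes' edge so it can only connect Person nodes to Restaurant nodes
CREATE TABLE dbo.Person     (PersonID     int PRIMARY KEY) AS NODE;
CREATE TABLE dbo.Restaurant (RestaurantID int PRIMARY KEY) AS NODE;

CREATE TABLE dbo.likes
(
  likedDate date NULL,
  CONSTRAINT EC_likes CONNECTION (dbo.Person TO dbo.Restaurant)
) AS EDGE;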
UTF-8
SQL Server 2012 added support for UTF-16 and supplementary characters through a set of collations named with an _SC suffix, like Latin1_General_100_CI_AI_SC, for use with Unicode columns (nchar/nvarchar). In SQL Server 2017, it is possible to import and export data in UTF-8 format, to and from these columns, through facilities like BCP and BULK INSERT.
In SQL Server 2019, there are new collations to support natively storing UTF-8 data within SQL Server. So you can create a char or varchar column to properly store UTF-8 data using new collations with an _SC_UTF8 suffix, like Latin1_General_100_CI_AI_SC_UTF8. This can help improve compatibility with external applications and other database platforms and systems, without paying the performance and storage penalties of nvarchar.
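For example (the table and column are made up), the only change from a typical definition is the collation:
-- Hypothetical table: varchar storing UTF-8 text natively via one of the new collations
CREATE TABLE dbo.Reviews
(
  ReviewID   int IDENTITY(1,1) PRIMARY KEY,
  ReviewText varchar(4000) COLLATE Latin1_General_100_CI_AI_SC_UTF8
);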
A little easter egg I found
For as long as I can remember, SQL Server users have been complaining about this vague error message:
String or binary data would be truncated.
In the CTP builds I've been playing with, I've spotted an interesting error message that didn't exist before:
String or binary data would be truncated in table '%.*ls', column '%.*ls'. Truncated value: '%.*ls'.
I don't think anything else needs to be stated here; this is a great (if long overdue) improvement, and is going to make many people very happy. However, this functionality won't be available in CTP 2.0; I am just giving you a little sneak peek at a forthcoming enhancement that you might not have discovered on your own. Brent Ozar has listed out all of the new messages found in the current CTP, and peppered some helpful commentary throughout his post about sys.messages: Revealing more features.
Summary
SQL Server 2019 offers substantial enhancements that are going to improve the way you use your favorite relational database platform, and there are plenty of changes I haven't talked about here. Persistent memory support, clustering for Machine Learning Services, replication and distributed transactions on Linux, Kubernetes, connectors for Oracle / Teradata / MongoDB, synchronous AG replicas bumped up to five, Java support (implementation similar to Python/R) and, last but not least, a new endeavor called the “Big Data Cluster.” Some of these features you'll need to sign up for using this EAP form.
Bob Ward's forthcoming book, Pro SQL Server on Linux - Including Container-Based Deployment with Docker and Kubernetes, might give you some clues about some other things that are coming as well. And this post from Brent Ozar talks about a potential forthcoming fix to scalar user-defined functions.
But even in this first public CTP, there is something substantial for just about everyone, and I encourage you to try it out for yourself and let me know what you think!
Next Steps
Read on to learn more about SQL Server 2019 through these tips and other resources:
- All SQL Server 2019 tips
- SQL Server 2019 Information – Download CTP 2.0!
- SQL Server 2019 Official Documentation
- Watch the SQL Server Blog for more SQL Server 2019 info