By: Aaron Bertrand | Updated: 2023-03-22 | Comments (3) | Related: > Views
Problem
As much as we tell people to use SCHEMABINDING and avoid SELECT *, there is still a wide range of reasons people do not. A well-documented problem with SELECT * in views, specifically, is that the system caches the metadata about the view from the time the view was created, not when the view is queried. If the underlying table later changes, the view doesn't reflect the updated schema without refreshing, altering, or recreating the view. Wouldn't it be great if you could stop worrying about that scenario and have the system automatically keep the metadata in sync?
Solution
To illustrate the problem, let's create a simple table and view representing job candidates. When a candidate enters the system, we enter a row for that candidate and when they started the application process:
CREATE TABLE dbo.Candidates ( ID int, StartDate date ); GO INSERT dbo.Candidates(ID, CreateDate) VALUES(1, '19000101'); GO
There are more columns like name, what job they applied for, and so on, but they are irrelevant to the problem. Let's also say there's a view, and I'm leaving out the reason for a view – maybe it joins to another table to make sure a manager can only see candidates for their department or to prevent an employee from seeing candidates they referred. Anyway, keeping the view as simple as humanly possible:
CREATE VIEW dbo.CandidatesView AS SELECT * FROM dbo.Candidates; GO
If we query the view, we see the data we expect:
SELECT * FROM dbo.CandidatesView;
Results:
Let's say we've been instructed to rename the column StartDate to CreatedDate because someone realized StartDate implies they were hired. We rename the column:
EXEC sys.sp_rename @objname = N'dbo.Candidates.StartDate', @newname = N'CreatedDate', @objtype = 'COLUMN';
We get the below warning, but I suspect most people ignore that, don't even notice it, or may have no idea there are any objects out there that use SELECT *:
Caution: Changing any part of an object name could break scripts and stored procedures.
Now, if we select from the view again, we get… interesting results compared to selecting from the table:
SELECT * FROM dbo.Candidates; SELECT * FROM dbo.CandidatesView;
Next, let's add a column to track their candidate score.
ALTER TABLE dbo.Candidates ADD Score tinyint;
Now when we select from both objects:
Further, let's say we do want to add a StartDate column to indicate that a candidate did get hired:
ALTER TABLE dbo.Candidates ADD StartDate date; GO UPDATE dbo.Candidates SET StartDate = '20010101';
We don't get any warning here as we do with the rename.
And let's say our query against the view only looked at ID and StartDate, making the problem a lot harder to spot:
SELECT ID, StartDate FROM dbo.Candidates; SELECT ID, StartDate FROM dbo.CandidatesView;
Results:
Oof. I would classify these as major problems that all have the potential to break other modules and external applications – especially if they start explicitly referencing updated or new column names.
We can fix the issue, of course, as explained in a previous tip, "Table changes not automatically reflected in a SQL Server View." Running the following code addresses all the issues highlighted above and once again makes SELECT * against the view behave the same as SELECT * against the table:
EXEC sys.sp_refreshview @viewname = N'dbo.CandidatesView';
Other scenarios that are less of a problem:
Dropping a column has an immediate impact when a SELECT * view is referenced:
Msg 4502, Level 16, State 1
View or function 'dbo.CandidatesView' has more
column names specified than columns defined.
And even worse when the view references the column explicitly:
Msg 207, Level 16, State 1, Procedure CandidatesView
Invalid column name 'Score'
Msg 4413, Level 16, State 1
Could not use view or function 'dbo.CandidatesView' because of binding errors.
We can't fix these scenarios by simply refreshing the view as we can for the simple SELECT * case where columns have been added. In larger teams – and even in one-person teams – it can be unrealistic to expect humans to take care of this. It would mean that every time anyone modifies any table, they must find all the views they might not even know they're inadvertently affecting and run that procedure against all of them. We should be able to use a DDL trigger to automate the corrections, even if they won't resolve all possible scenarios.
A DDL Trigger to the Rescue
I've written about DDL triggers before; they are objects that sit at the database (or server) level and respond to defined events. In this case, we can hook to an event called ALTER_TABLE, determine all views that depend on the table, and generate sp_refreshview commands to run against each one. We'll run these commands and output diagnostics, letting the user know – which will only help if they run ALTER TABLE from a context that consumes PRINT or SELECT output. If alters come in from pipelines, PowerShell, or any source that isn't expecting output, you may consider adding logging, as I discussed in the aforementioned tip.
Let's create a new set of tables and views so the updates have more substance.
CREATE TABLE dbo.Authors ( AuthorID int, Name nvarchar(255) ); INSERT dbo.Authors(AuthorID, Name) VALUES(1,N'Me Tarzan'),(2,N'You Jane'); CREATE TABLE dbo.Books ( BookID int, Title nvarchar(4000), MainAuthorID int ); INSERT dbo.Books(BookID, Title, MainAuthorID) VALUES(1,N'Stuff',1),(2,N'Things',1),(3,N'Such',2);
And a few silly views:
CREATE VIEW dbo.AuthorBooks AS SELECT * FROM dbo.Authors AS a INNER JOIN dbo.Books AS b ON a.AuthorID = b.MainAuthorID; GO CREATE VIEW dbo.BooksByTarzan AS SELECT * FROM dbo.Books WHERE MainAuthorID = 1; GO CREATE VIEW dbo.AuthorsLikeJane AS SELECT * FROM dbo.Authors WHERE Name LIKE N'%Jane%'; GO
If we query these views using SELECT *, there are no surprises:
But if we add a column to each table:
ALTER TABLE dbo.Authors ADD LastName nvarchar(50); ALTER TABLE dbo.Books ADD ISBN varchar(12);
You might want to buckle up before querying them again:
We added a column to each table. For the first view, which joined the two tables, it shifted the columns from the second table in the join to the right. What's labeled as BookID is actually the new column LastName, the Title is BookId, and MainAuthorID is Title. ISBN didn't even make the cut. For the other two views that only reference a single table, the new column simply isn't presented by the view.
Like before, it can be fixed by manually running sp_refreshview, but let's tackle our DDL trigger to do that for us.
First, drop the objects and recreate them.
The trigger below determines the schema and object name of the table affected by the ALTER, then builds a series of sp_refreshview commands for every view that references that table, according to sys.sql_expression_dependencies. This specific implementation relies on STRING_AGG. So it is meant for SQL Server 2017+, but there are other ways to handle that on older versions.
CREATE TRIGGER FixViews ON DATABASE FOR ALTER_TABLE AS BEGIN DECLARE @EventData xml = EVENTDATA(); DECLARE @sch sysname = QUOTENAME(@EventData.value (N'(/EVENT_INSTANCE/SchemaName)[1]', N'sysname')), @obj sysname = QUOTENAME(@EventData.value (N'(/EVENT_INSTANCE/ObjectName)[1]', N'sysname')), @base nvarchar(max) = N'EXEC sys.sp_refreshview N''$sch$.$obj$'';', @sql nvarchar(max); SELECT @sql = STRING_AGG(REPLACE(REPLACE(@base, N'$sch$',QUOTENAME(s.name)), N'$obj$',QUOTENAME(o.name)), char(13)) FROM sys.schemas AS s INNER JOIN sys.objects AS o ON s.[schema_id] = o.[schema_id] INNER JOIN sys.sql_expression_dependencies AS d ON o.[object_id] = d.referencing_id WHERE d.referenced_id = OBJECT_ID(CONCAT(@sch,N'.',@obj)) AND o.type = 'V'; SELECT @sql; EXEC sys.sp_executesql @sql; END
Now, if we again alter these tables:
ALTER TABLE dbo.Authors ADD LastName nvarchar(50); ALTER TABLE dbo.Books ADD ISBN varchar(12);
We get two sets of commands in the output:
EXEC sys.sp_refreshview N'[dbo].[AuthorBooks]'; EXEC sys.sp_refreshview N'[dbo].[AuthorsLikeJane]'; EXEC sys.sp_refreshview N'[dbo].[AuthorBooks]'; EXEC sys.sp_refreshview N'[dbo].[BooksByTarzan]';
And the views now work without any manual interference.
A Caveat
This doesn't help the scenario where a column has been added, dropped, or renamed, and the query in the view actually has to change (maybe it's not even a SELECT * view, maybe it's part of a WHERE clause or needs to be added to an ORDER BY). Refreshing the view also doesn't magically handle cases where the table itself is renamed, the table is transferred to a different schema, or the table and view don't exist on the same database or instance (though some of those could likely also be addressed in a future enhancement). This is only meant to take away the tedious process of refreshing views that use SELECT * simply because they are an abstraction layer for the underlying table(s). This could still leave some views in an unusable state after certain types of table changes.
The Real Lesson
If you can, avoid SELECT * in views, and use SCHEMABINDING. Sure, it makes maintenance a little more cumbersome, but it's a great way to prevent some of the issues presented earlier in the article.
Next Steps
See these tips and other resources:
- Table changes not automatically reflected in a SQL Server View
- Detect and rename invalid VIEWs in a SQL Server database using PowerShell
- Benefits of SCHEMABINDING in SQL Server
- Why You Should Avoid SELECT * in SQL Server T-SQL Code
- SQL Server DDL Triggers to Track All Database Changes
- Enforce SQL Server Database Naming Conventions Using DDL Triggers
- Extending SQL Server DDL Triggers for more functionality: Part 1
- Extend SQL Server DDL Triggers for more functionality: Part 2
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-03-22