SQL Server UDF to pad a string


By:   |   Updated: 2009-04-27   |   Comments (8)   |   Related: More > Functions - User Defined UDF

Problem

Unlike other relational database management systems that shall remain nameless, SQL Server's underlying coding language, T/SQL, does not have a built-in function for padding string values.  I recently took it upon myself to create my own and as you'll see I got a little carried away.

Solution

The task seems simple enough:  create a user-defined function that will allow padding of a string value with a finite count of a desired character.  What ended up as an experiment out of necessity became a slightly more-involved function once I decided that padding position should be customizable to meet the end user's needs. 

While Transact-SQL (T/SQL) does not offer a comparable function similar to LPAD or RPAD available in other RDBMSs, the SQL Server Professional does have the REPLICATE() function that can be used to build a simple user-defined function that can be used to pad a string.  Let's take a look at the REPLICATE() function and what it offers before moving on to the code for the custom padding function.

REPLICATE (string_expression ,integer_expression) will allow you to replicate a character string (the string_expression parameter the number of times consecutively per the integer_expression parameter). 

A simple example of this function is presented below:

SELECT REPLICATE('ABCDE|', 3)  

Results

------------------ 
ABCDE|ABCDE|ABCDE| 

(1 row(s) affected)

Whereas the REPLICATE() function will allow you to return a string to a maximum size of 8000 bytes, the function I'll be creating will be based upon an output value of varchar(100).  You will be able to modify this value to fit your needs, however I very rarely have a need to pad a string value greater than even 20 characters.  I thought it worthwhile to create a single function for padding either to the left or right of the unpadded string.  Then it became interesting, what if for some reason you wished to pad the center of the string?  What about padding both the left and right sides of the string evenly?  Whether you use those options or not, the functionality is there.  The code below represents the function I've created.

CREATE FUNCTION [dbo].[usp_pad_string]  
 ( 
 @string_unpadded VARCHAR(100),  
 @pad_char VARCHAR(1),  
 @pad_count tinyint,  
 @pad_pattern INT) 
RETURNS VARCHAR(100) 
AS 
BEGIN 
 DECLARE @string_padded VARCHAR(100) 

 SELECT @string_padded =  
 CASE @pad_pattern 
   WHEN 0 THEN REPLICATE(@pad_char, @pad_count) + @string_unpadded --pad left 
   WHEN 1 THEN @string_unpadded + REPLICATE(@pad_char, @pad_count) --pad right 
   WHEN 2 THEN  
     --pad center 
     LEFT(@string_unpadded, FLOOR(LEN(@string_unpadded)/2))  
     + REPLICATE(@pad_char, @pad_count)  
     + RIGHT(@string_unpadded, LEN(@string_unpadded) - FLOOR(LEN(@string_unpadded)/2))  
   WHEN 3 THEN  
     --pad edges 
     REPLICATE(@pad_char, FLOOR(@pad_count/2))  
     + @string_unpadded  
     + REPLICATE(@pad_char, @pad_count - FLOOR(@pad_count/2))   
 END 
 RETURN @string_padded 
END

The function expects four parameters:

  • @string_unpadded - the raw string value you wish to pad. 
  • @pad_char - the single character to pad the raw string. 
  • @pad_count - the amount of times to repeat the padding character
  • @pad_pattern -  an integer value that determines where to insert the pad character
    • 0 - all pad characters placed left of the raw string value (pad left)
    • 1 - all pad characters placed right of the raw string value (pad right)
    • 2 - all pad characters placed at the midpoint of the raw string value (pad center)
    • 3 - the raw string value will be centered between the pad characters (pad ends)

Notes

  • If either of the length of the supplied @pad_count or @string_unpadded values are odd, centering will be affected.  We will show that behavior in the samples presented below. 
  • Since the return value of the function is limited to 100 characters, the length of parameter corresponding to the unpadded string must be sized according to allow for padding.  The length of the return value and this parameter are completely customizable to fit your needs. 
  • While the REPLICATE() function allows you to pad more than a single character, this function expects you'll only pad a single character.

Example 1 Evenly-Distributable Padding

--Even distribution possible 
--Pad Left 
SELECT '1234' AS [raw string], dbo.[usp_pad_string]('1234', 'X', 4, 0) AS [padded string], '0 - pad LEFT' AS [pad pattern value]; 

--Pad Right 
SELECT '1234' AS [raw string], dbo.[usp_pad_string]('1234', 'X', 4, 1) AS [padded string], '1 - pad RIGHT' AS [pad pattern value]; 

--Pad Center 
SELECT '1234' AS [raw string], dbo.[usp_pad_string]('1234', 'X', 4, 2) AS [padded string], '2 - pad CENTER' AS [pad pattern value]; 

--Pad Edges 
SELECT '1234' AS [raw string], dbo.[usp_pad_string]('1234', 'X', 4, 3) AS [padded string], '3 - pad EDGES' AS [pad pattern value]; 
raw string

The previous set of examples shows what results to expect when the parameters are evenly distributed.  What happens though when this is not possible?  This occurs under two situations: you are attempting to center-pad a raw string that has an odd number of characters or you're trying to pad an odd number of characters using the pad edges option (@pad_pattern = 3).  I am not going to show you some method for splitting an odd string at a quantum level.  Sorry, I am not all that brilliant as it is.  Therefore I had to make a judgment call on how the behavior is going to occur using the structure of the three dimensions we have to work with here currently.  The following examples will present how this function will behave.

Example 2 - Center-Pad an Odd-Length Raw String

--Pad Center 
SELECT '12345' AS [raw string], dbo.[usp_pad_string]('12345', 'X', 4, 2) AS [padded string], '2 - pad CENTER' AS [pad pattern value]; 
raw string

Example 3 - Center-Edges an Odd Number of Times

--Pad Edges 
SELECT '1234' AS [raw string], dbo.[usp_pad_string]('1234', 'X', 5, 3) AS [padded string], '3 - pad EDGES' AS [pad pattern value];
padded string

As you can see, the padding will be right-heavy in both cases.  Though I doubt you'll have much use for center or edge-padding strings in Transact-SQL, you never really know.

Padding is a frequent need in returning string results to the end user of a RDBMS.  This function allows you to do so as if the functionality existed inherently in T-SQL.

Next Steps


Last Updated: 2009-04-27


get scripts

next tip button



About the author
MSSQLTips author Tim Ford Tim Ford is a Senior Database Administrator with MindBody.

View all my tips





Comments For This Article




Thursday, September 29, 2011 - 9:30:04 AM - Usman Butt Back To Top

Hi Mr. Jeff Moden,

Thanks a lot for your response. I have taken the same function as an example and test it WithoutSchemaBinding, WithSchemaBinding and through iTVF on a 100,000 rows table. Following are the time statistics

For WithoutSchemaBinding
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 23 ms.

(100000 row(s) affected)

SQL Server Execution Times:
   CPU time = 3563 ms,  elapsed time = 7897 ms.

 

For WithSchemaBinding
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 11 ms.

(100000 row(s) affected)

SQL Server Execution Times:
   CPU time = 3343 ms,  elapsed time = 8073 ms.

 

For iTVF (With Cross Apply)
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 2 ms.

(100000 row(s) affected)

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 1943 ms.

 

Your suggested iTVF function has outperformed both the scalar functions (With and without schema binding). This is really nice to see.

But I am not able to understand why withschemabound function is a bit slower than withoutschemabound function. It should have atleast matched with withoutschemabinding function.

Anyhow these are interesting results and I will work on this more. Although its early to ask but I have more questions in my mind

Is it safe to say iTVF would always perform better than Scalar functions? if Yes then, do we need Scalar functions if they can be replaced by iTVF?

I believe another great tip is waiting to be written by you. Cheers.

 

Best Regards,
Usman Butt


Wednesday, September 28, 2011 - 8:18:16 AM - Jeff Moden Back To Top

To answer Usman's questions...

1. Can we safely use SCHEMABINDING for performance matters

That's actually a 3 part question... Yes, you can safely use SCHEMABINDING.  No, it won't always help performance matters.  Yes, because of the TableSpool it eliminates, it will almost always reduce the impact scalar functions have on Temp DB in the form of "Work" tables.

2. Is there any need to Declare a variable and then return that variable. I guess we can return the Scalar statement.

No... but if you can write the code without such a return variable, why not turn it into an iTVF (Inline Table Valued Function) even if it only returns one value.  Much like VIEWs, iTVF's are evaluated as part of whatever SQL Statement you've written whereas Scalar and mTVF's (Multi-line Table Valued Functions) are not.

To be sure, the use of SCHEMABINDING is a good idea to save resources and possibly improve performance but it is NOT a performance panacea.  I'll try to get back tonight with some examples of this.

 

 

 


Wednesday, September 28, 2011 - 1:39:21 AM - Usman Butt Back To Top

Hi,

Since Mr. Jeff Moden is active on this tip and I am a keen follower of him, I would request him and others to shed some light on two things in such UDFs (in order to optimize them a bit further)

1. Can we safely use SCHEMABINDING for performance matters

2. Is there any need to Declare a variable and then return that variable. I guess we can return the Scalar statement.

Thanks.

 

Best Regards,
Usman Butt


Tuesday, September 27, 2011 - 7:51:02 AM - Jeff Moden Back To Top

dmill2001 wrote:

>> Searching all over web for guidance on ZERO PADDING a numeric that may or may not have a negative and how keep the negative all the way to the left consuming on of  the pad positions.

Just pad the ABS value of the number and, if the SIGN of the number is negative, then STUFF the first character of the padded value with a "-".


Tuesday, September 27, 2011 - 7:47:12 AM - Jeff Moden Back To Top

Hmmm... I thought "Padding" also produced a fixed length regardless of content of the original string.  For example, the value of "999" with a "0" LPAD of 4 would yield 0999 and the same functionality would yield 0009 for a value of "9". A "0" LPAD of 4 for a value of "99999" would result in "****" to indicate the overflow.


Monday, September 26, 2011 - 8:33:52 AM - Rick Back To Top

While this is an interesting function since padding is almost always right-padding it seems it would be more efficient to simply use LEFT(originalfield+REPLICATE(' ',100),desired_length) in your original query.


Friday, June 26, 2009 - 6:54:26 AM - dmill2001 Back To Top

I really like this example but really curious how it can be chnaged to deal with a NEGATIVE value and shift a (-) to the left of padding a value?


Searching all over web for guidance on ZERO PADDING a numeric that may or may not have a negative and how keep the negative all the way to the left consuming on of  the pad positions.

 

Thanks,

Miller


Tuesday, April 28, 2009 - 5:38:53 AM - MichDBA Back To Top

Nice tip (although very specific to your need).

Due to performance implications of UDFs in SQL statements, I would recommend using the CASE statement rather than the function.



download


Recommended Reading

Auto Generate Create Table Script Based on SQL Server Query

Validate Integer and Decimal Values in SQL Server

Four ways to improve scalar function performance in SQL Server

Computed Columns with Scalar Functions SQL Server Performance Issue

SQL Server User Defined Function Example





get free sql tips
agree to terms


Learn more about SQL Server tools