Varchar Data Types in SQL Server, Oracle and PostgreSQL
We have seen in one of my previous tutorials, SQL String functions in SQL Server, Oracle and PostgreSQL, the basic functions used to manipulate strings in Microsoft SQL Server, Oracle and PostgreSQL. In this tutorial, we are going to look at how to store the maximum number of characters in a variable length column and the differences between these three database systems.
In this tutorial we will review the string data types used in SQL Server, Oracle and PostgreSQL in order to store a large number of characters. In particular, we will see the various procedures used in order to have big strings stored in a column to handle large text or structured data like JSON (this will be a separate tip).
As always, we will use the github freely downloadable database sample Chinook, as it is available in multiple RDBMS formats. It is a simulation of a digital media store, with some sample data, all you have to do is download the version you need and you have all the scripts for data structure and all the inserts for the data.
In SQL Server the data types string data are essentially: VARCHAR(n), VARCHAR(MAX) and NVARCHAR(n), NVARCHAR(MAX), there is also TEXT, but since it is deprecated it should not be used anymore.
In the VARCHAR data types, the n indicates the maximum number of bytes stored, it is not the number of characters, and it can be a maximum of 8000. If instead you use MAX, this indicates the maximum possible size is up to 2 GB (storage size) for the character data type that are non-Unicode.
Similarly, in the NVARCHAR data types the n indicates the byte-pairs so the limit is 4000 (maximum length) and the MAX indicates the maximum amount of 2 GB (maximum storage) for Unicode characters.
Obviously, the numbers of bytes or byte-pairs for Latin character sets is equivalent to the number of characters, but in other sets this may not be true!
Let’s setup a T-SQL example creating a new table in which we will store messages to customer service for the digital media shop.
CREATE TABLE CustomerMessages( MessageId int not null, CustomerId int NOT NULL, InsertDate smalldatetime NOT NULL, UpdateDate smalldatetime NULL, MessageContent varchar(5000) CONSTRAINT PK_Message PRIMARY KEY CLUSTERED ( MessageId ASC ) ON [PRIMARY] ) ON [PRIMARY]; ALTER TABLE CustomerMessages ADD CONSTRAINT [FK_CustomerMessages_Customer] FOREIGN KEY(CustomerId) REFERENCES Customer (CustomerId)
AAs you can see, we have created the new table adding a Foreign Key in order to reference the CustomerID and check that the customer exists, as well as a Primary Key for the MessageId. In this case we have given a maximum of 5000 bytes (in our case, since we have Latin character set, 5000 characters) to the MessageContent column.
Now let’s suppose that we’ve seen that 5000 characters is not enough in some cases and also that we noticed the text stored in Messages varies greatly from a few words to long paragraphs, so we want to change to varchar(max). This can be done as follows.
ALTER TABLE CustomerMessages ALTER COLUMN MessageContent varchar(max)
This is very simple to change in SQL Server as you can see!
IIn Oracle the string data types used are similar, but we have different limits. First of all, we have VARCHAR2 and NVARCHAR2 which have a limit of 4000 bytes if the MAX_STRING_SIZE parameter is set to STANDARD (more on this later). These are more or less like VARCHAR and NVARCHAR variable length strings in SQL Server.
Two other data types used for storing large texts are CLOB and NCLOB. LOB is an acronym for Large Objects and in this case the C stands for characters, and N as in NVARCHAR identifies the National Character set for Unicode data. These datatypes are quite handy to store a large amount of data and the limit in this case is 4 GB. These can be very tricky and generally speaking not good for performance, as they are stored in separate tablespaces and also require special handling from the application side.
Let’s add the same table as in SQL Server.
CREATE TABLE CHINOOK.CUSTOMERMESSAGES (MESSAGEID NUMBER NOT NULL ENABLE, CUSTOMERID NUMBER NOT NULL ENABLE, INSERTDATE date NOT NULL ENABLE, UPDATEDATE date, MESSAGECONTENT varchar2(4000), CONSTRAINT PK_MESSAGE PRIMARY KEY (MESSAGEID)); ALTER TABLE CHINOOK.CUSTOMERMESSAGES add CONSTRAINT FK_CUSTOMERID FOREIGN KEY (CUSTOMERID) REFERENCES CHINOOK.CUSTOMER (CUSTOMERID) ENABLE;
The structure and data types of the table are the same as in SQL Server, but notice that the MESSAGECONTENT column is limited to 4000 bytes (characters in our case), now letís try to change it to 5000 like in SQL Server.
ALTER TABLE CHINOOK.CUSTOMERMESSAGES MODIFY (MESSAGECONTENT VARCHAR2(5000 BYTE));
As you can see we have reached a limit, but starting with version 12c of Oracle there is a solution, as I hinted before, that allows us to raise the limit of VARCHAR2 and NVARCHAR2 data types to 32767 bytes.
First of all letís check the MAX_STRING_SIZE parameter that I mentioned before.
SHOW PARAMETER MAX_STRING_SIZE
As expected the parameter is set to the default which is STANDARD, in order to raise the limit of VARCHAR2 and NVARCHAR2 to 32767 bytes we need to set this initialization parameter to EXTENDED.
Before starting the procedure to change this initialization parameter we need to check the COMPATIBLE parameter that should be set to 184.108.40.206 or higher in order to do this procedure.
SELECT name, value, description FROM v$parameter WHERE name = 'compatible';
Now that we are sure, we can go ahead but must keep in mind that this procedure will require at least a couple of restarts of the database and two external conversion scripts to be run.
I will show the steps for changing the parameter on a single PDB, for an explanation of CDB/PDB Multitenant Database architecture in Oracle please refer to my previous tip: Relational Database Comparison between SQL Server, Oracle and PostgreSQL.
The first step is to shut down the PDB and reopen it in UPGRADE (or migrate) mode, that can be done connecting in ssh to the host, then connecting to the Container DB or CDB using sqlplus like I did in this previous tip: SQL Identity Column Gaps and Conversion in SQL Server, Oracle and PostgreSQL and issuing:
We can now issue the command for opening it in migrate mode:
We can then connect to the PDB and change the initialization parameter:
We check the parameter:
Now we can run these two scripts for the conversion and recompile objects in our database:
Both should end with this statement:
Now we are ready to restart the database (PDB):
And again check the parameter:
Now we can try again to modify the column MESSAGECONTENT to 5000 bytes:
ALTER TABLE CHINOOK.CUSTOMERMESSAGES MODIFY (MESSAGECONTENT VARCHAR2(5000 BYTE));
And checking the table in Oracle Developer we can see that now we have a column of 5000 bytes:
Remember that the limit is 32767 bytes so VARCHAR2(32767) is the maximum that we can now define.
ALTER TABLE CHINOOK.CUSTOMERMESSAGES MODIFY (MESSAGECONTENT VARCHAR2(32767 BYTE) );
In PostgreSQL we have two data types that are used for storing large texts: VARCHAR(N) and TEXT.
Here we have a couple of differences from SQL Server and Oracle: first the N is the number of character and not bytes, moreover we can omit this number and at that point is equivalent to VARCHAR(MAX) in SQL Server and to the TEXT data type. In this case the limit of characters is not specified in the official documentation of PostgreSQL, however there is this information: "In any case, the longest possible character string that can be stored is about 1 GB." In fact googling around there is this answer in StackExchange shows limit of 1 GB in PostgreSQL source code.
So now we can try to create the same table as in the other two RDBMS:
CREATE TABLE IF NOT EXISTS public."CustomerMessages" ( "MessageId" integer not null, "CustomerId" integer NOT NULL, "InsertDate" timestamp without time zone NOT NULL, "UpdateDate" timestamp without time zone, "MessageContent" varchar(5000), CONSTRAINT "PK_Message" PRIMARY KEY ("MessageId") ) ALTER TABLE public."CustomerMessages" ADD CONSTRAINT "FK_CustomerMessages_Customer" FOREIGN KEY("CustomerId") REFERENCES public."Customer" ("CustomerId") MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION;
Please notice the clause IF NOT EXISTS in the CREATE TABLE that is very handy!
Now since we saw that 5000 characters is not enough we will modify the column taking out the limit so we can store the maximum amount in this column:
ALTER TABLE "CustomerMessages" ALTER COLUMN "MessageContent" type varchar
That’s it for PostgreSQL!
IIn this tutorial we have reviewed the data types commonly used in to store big strings in the SQL Server, Oracle and PostgreSQL. We’ve seen the different limits and syntax as well as a modification in the initialization parameters of the database in order to have a bigger limit in Oracle.
- Here are links to the official documentation:
- Some links to other tips regarding string data types:
About the author
View all my tips
Article Last Updated: 2022-01-31