Optimal Storage of IP addresses in a SQL Server database

By:   |   Comments (12)   |   Related: > Database Design


Problem

Often databases are used with web-based interfaces and recording the IP address of the end user can assist with debugging, marketing, bandwidth planning and collation selection to name a few. In a scenario where each page access is logged, is there an optimal way to store IP addresses?

Solution

This tutorial will work through two obvious options and an obscure one on different ways to store IP addresses in your database.


The first option is using a varchar(15):

  • From CGI variables, the four octets are separated with periods (.) which form a string.
  • Writing this string as a varchar to the database is straight-forward.
  • The simplicity has some tradeoffs.
    • Some space is under-utilized whenever an octet is less than 100, and the mask always uses three bytes. (101.100.99.9 uses 12 characters.)
    • Octets need to parsed using either SUBSTRING or CHARINDEX to sort and search by geographic location (see here).

The second option is using four tinyints:

  • Octets range between 0 and 255 making them perfect matches for tinyints and therefore storage space is perfectly utilized.
  • Some parsing will need to be done when writing the data, but the benefits are many.
    • Because they are numbers, sorting will work as expected.
    • A clustered index can be created on all four fields to improve performance.

A view can be created to display the original string...

CREATE VIEW V_IP_ADDRESS AS 
SELECT IP_1, IP_2, IP_3, IP_4, 
 CAST(IP_1) AS VARCHAR + '.' + 
 CAST(IP_2) AS VARCHAR + '.' + 
 CAST(IP_3) AS VARCHAR + '.' + 
 CAST(IP_4) AS VARCHAR 
 AS IP_ALL 
FROM IP_ADDRESS

...as can a computed column.

CREATE TABLE IP_ADDRESS ( 
 IP_1 TINYINT NOT NULL, 
 IP_2 TINYINT NOT NULL, 
 IP_3 TINYINT NOT NULL, 
 IP_4 TINYINT NOT NULL, 
 IP_ALL AS 
 CAST(IP_1 AS VARCHAR(3)) + '.' + 
 CAST(IP_2 AS VARCHAR(3)) + '.' + 
 CAST(IP_3 AS VARCHAR(3)) + '.' + 
 CAST(IP_4 AS VARCHAR(3)) 
 PERSISTED, 
 CONSTRAINT PK_IP PRIMARY KEY (IP_1, IP_2, IP_3, IP_4) 
)

Note that this example creates a clustered index on the four IP octets immediately. Another option is an autonumber Primary Key (PK) to start, then dropping that and creating a clustered index on the IP segments later.


The third option is using four binary(1)s:

  • For completeness, four binary(1)s use the same amount of disk space as four tinyints, but their use is more complex.
  • Writing will require parsing and casting.
    • To write, each octet needs to be parsed and casted [to 0x__] using this function master.dbo.fn_varbintohexstr(): master.dbo.fn_varbintohexstr(cast (IP_1 as varbinary)).
    • A computed column or view will need to cast to tinyint then varchar to return the original address.
  • The main benefit of this approach is to obfuscate anyone viewing the database directly, and chances are if they are viewing your data they can probably also read hexadecimal.
CREATE TABLE IP_ADDRESS_BIN ( 
 IP_1 binary(1) NOT NULL, 
 IP_2 binary(1) NOT NULL, 
 IP_3 binary(1) NOT NULL, 
 IP_4 binary(1) NOT NULL, 
 IP_ALL AS 
  CAST(cast(IP_1 as tinyint) AS VARCHAR(3)) + '.' + 
  CAST(cast(IP_2 as tinyint) AS VARCHAR(3)) + '.' + 
  CAST(cast(IP_3 as tinyint) AS VARCHAR(3)) + '.' + 
  CAST(cast(IP_4 as tinyint) AS VARCHAR(3)) 
 PERSISTED, 
 CONSTRAINT PK_IP_BIN PRIMARY KEY (IP_1, IP_2, IP_3, IP_4) 
)

Here is a comparison of the three options outlined above:

Comparison Table
Datatype Storage Writing Displaying Sorting Drawback(s) Benefit(s)
varchar(15) 15 b Verbatim Verbatim Parse needed Highest disk space needed Readability, ease to insert
four tinyints 4 b Parse and one CAST View or computed column Native Sew together with cast to recreate string Ideal storage size, sorting, performance
four binary(1)s 4 b Parse and two CASTs View or computed column Requires two casts to sort and read, code is harder to read/maintain Ideal storage size, obscure values Obfuscate the data
Next Steps
  • Consider migrating IP addresses into their own table with an autonumber PK and link using foreign keys; this will save space.
  • Read about VIEWs and COMPUTED COLUMNs.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Brent Shaub Brent Shaub has been creating, renovating and administering databases since 1998.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Sunday, January 26, 2020 - 10:18:55 AM - Johannes Back To Top (83989)

Hi!

IPv4 addresses are 32bit unsigned integers. So store them in a database unsinged long is the way to go. The four byte dotted notation is ideal for human beings, not computers. Consider using functions to translate unsinged longs from and to four byte dotted notation. Python and PHP both have function that do so.


Thursday, December 8, 2011 - 1:40:27 AM - James Back To Top (15315)

For MS SQL, your best bet is to store IP addresses in IPv6 format (BINARY(16)), and stuff IPv4 addresses appropriately.  This is especially true on SQL 2008 and later, as you can use filtered indexes to have an index specifically to find IPv4 addresses rapidly.


Wednesday, December 7, 2011 - 10:54:33 AM - Scott C Back To Top (15310)

The code I posted earlier (32-bit INT version) was mostly an attempt to improve on using BIGINT to store the addresses simply to avoid dealing with negative integers.  It works if you only have IPv4 addresses to deal with, and the sort order doesn't matter if you only need an index or key to lookup values and ensure uniqueness.  If the sort order for negative addresses is an issue, use BINARY(4) instead.  I think using four separate byte columns would be less efficient to sort, especially if they are nullable.

I would certainly recommend allowing for IPv6 addresses in any new code, but there will be a lot of legacy code that can't deal with them and it doesn't hurt to have some functions available to handle IPv4 addresses is the most efficient way possible.

IPv6 addresses are defined as 128-bit values by the way, so I have to question the sanity of anyone who suggests storing them as NVARCHAR(50).  (Hexadecimal isn't good enough for you, you want to use BASE 65536 representation and need Unicode digits?)  The preferred format may change, but the most obvious storage types are BINARY(16) or two BIGINT/BINARY(8) (64-bit network prefix and 64-bit host address).

Assuming BINARY(16) is used, here are some text conversion functions for the standard representation (according to Wikipedia).  If other variations on the format are desired, at some point you have to fall back on letting the database store/retrieve 16 bytes and put the formatting in the client apps.

DECLARE @ipstr VARCHAR(50) = '2001:db8:85a3::00:8a2e:370:7334'
SELECT  CAST('' AS XML).value('xs:hexBinary( sql:column("y.ips") )', 'binary(16)')
FROM (
    SELECT  ips = (SELECT RIGHT('0000' + SUBSTRING(ips, number, CHARINDEX(':', ips, number) - number), 4)
                   FROM master.dbo.spt_values,
                        ( SELECT ips = ':' + @ipstr + ':') x
                   WHERE type = 'P' AND number BETWEEN 2 AND LEN(ips)
                         AND SUBSTRING(ips, number-1, 1) = ':'
                   FOR XML PATH('')) ) y
-- 0x20010DB885A3000000008A2E03707334
GO
DECLARE @ip BINARY(16) = 0x20010DB885A3000000008A2E03707334
SELECT STUFF((
    SELECT ':' + CAST('' AS XML).value('xs:hexBinary( sql:column("x.p") )', 'char(4)')
    FROM (
        SELECT  p = SUBSTRING(@ip, 2*number+1, 2)
        FROM master.dbo.spt_values
        WHERE type = N'P' AND number BETWEEN 0 AND 7
    ) x   
    FOR XML PATH('')), 1, 1, '')
-- 2001:0DB8:85A3:0000:0000:8A2E:0370:7334

There is also a hybrid IPv4-IPv6 format that extends an IPv4 address with 80 zero bits and 16 one bits.  The text-to-binary routine can be extended to handle this, anyone who wants to also output this format is free to write their own.

DECLARE @ipstr VARCHAR(50) = '::ffff:192.0.2.128'
DECLARE @ipv6 BINARY(16)
IF @ipstr LIKE '::ffff:%.%.%.%'
    SELECT  ipv6 = 0x00000000000000000000FFFF + [1] + [2] + [3] + [4]
    FROM (
        SELECT  byte = CAST(CAST(SUBSTRING(ips, number, CHARINDEX('.', ips, number) - number) AS TINYINT) AS BINARY(1)),
                rn = ROW_NUMBER() OVER (ORDER BY number)
        FROM master.dbo.spt_values,
            ( SELECT ips = '.' + RIGHT(@ipstr, CHARINDEX(':', REVERSE(@ipstr))-1) + '.' ) x
        WHERE type = 'P' AND number BETWEEN 2 AND LEN(ips) AND SUBSTRING(ips, number-1, 1) = '.'
    ) b
    PIVOT (MAX(byte) FOR rn IN ([1],[2],[3],[4])) p
ELSE
    SELECT  @ipv6 = CAST('' AS XML).value('xs:hexBinary( sql:column("y.ips") )', 'binary(16)')
    FROM (
        SELECT  ips = (SELECT    RIGHT('0000' + SUBSTRING(ips, number, CHARINDEX(':', ips, number) - number), 4)
        FROM master.dbo.spt_values,
            ( SELECT ips = ':' + @ipstr + ':') x
        WHERE type = 'P' AND number BETWEEN 2 AND LEN(ips) AND SUBSTRING(ips, number-1, 1) = ':'
        FOR XML PATH('')) ) y
SELECT @ipv6
-- 0x00000000000000000000FFFFC0000280

Wednesday, December 7, 2011 - 10:49:43 AM - Robert Back To Top (15309)

You bring up a good point for storage space for IP addresses.  I had never previously thought of splitting the IP address into its octects to store.  It allows for easy sorting.

 

Personally, I favor hex for working with IP addresses.  Generally, the applications that I am storing and using IP addresses, I need to work with other fundimentals of IP addressing, like subnets.  If the application uses hex for all its internal functions and only displays as an IP address for displaying to the user, the number of type conversions is reduced, which improves performance.  In my opinion, a little bit of extra disk space utilization is a good trade off for perofrmance.


Wednesday, December 7, 2011 - 3:27:24 AM - Remi Back To Top (15306)

Let's create another painful bug for the next 10 years,

 

To store IP adress, there is only one way : nvarchar(50), it's an adress, its format may change from one year to an other, if you need to make some special treatment on IPv4 adresses, just create a temp table, but for storage, you shouldn't over engineer like that?


Wednesday, December 7, 2011 - 1:34:51 AM - Drew Back To Top (15305)

Hi Wilfred,

My code below takes your 4 binary(1)'s example and merges them into one binary(4) field. I think it has an advantage over Scott's int example in that it will still sort correctly as-is. I also think it being viewed as hex in the table is a bit more readable that a negative int.

Let me know what you guys think.

Drew

 

create function dbo.ip2bin_drew(@ip as varchar(15)) returns binary(4) as
begin
declare @bin binary(4)
set @bin =convert(binary(1),convert(tinyint,parsename(@ip,4))) +
convert(binary(1),convert(tinyint,parsename(@ip,3))) +
convert(binary(1),convert(tinyint,parsename(@ip,2))) +
convert(binary(1),convert(tinyint,parsename(@ip,1)))
return @bin
end
go
 
create function dbo.bin2ip_drew(@bin as binary(4)) returns varchar(15) as
begin
declare @ip varchar(15)
set @ip =convert(varchar(3),(convert(tinyint,substring(@bin,1,1)))) + '.' +
convert(varchar(3),(convert(tinyint,substring(@bin,2,1)))) + '.' +
convert(varchar(3),(convert(tinyint,substring(@bin,3,1)))) + '.' +
convert(varchar(3),(convert(tinyint,substring(@bin,4,1))))
return @ip
end
go
 
declare @ip varchar(15), @bin binary(4)
set @ip = '74.125.237.114'
set @bin = dbo.ip2bin_drew(@ip)
 
select @ip, @bin, dbo.bin2ip_drew(@bin)

Wednesday, December 7, 2011 - 1:28:28 AM - tommyh Back To Top (15304)

Your writing an articel at the end of 2011 for IPv4?!? Seriously if one is in the works off storing IPadresses today it should surely include support for IPv6. Its the 2000 problem all over.

 

/T


Wednesday, November 30, 2011 - 2:53:53 AM - Wilfred van Dijk Back To Top (15243)

Couldn't find the credits for my suggestion (probably from Scott). However in my enthousiasm to reformat the T-SQL some things got mixed up.

Thanks Scott for the INT version of these functions.

Wilfred

 


Monday, November 28, 2011 - 10:36:51 AM - Brent Shaub Back To Top (15225)

Hi Scott,

I was looking at the max/min of int and thinking it would handle all IPs, but wasn't sure how to get the computer to agree.

Using just the right amount of space needed to get the job done, especially if the system will scale in volume, is prudent and elegant.  Your second point that bigint can store invalid data while int could not is also sound.

I'm confused by the PARSENAME() function, but I was able to get both functions to work on the first go.  Addresses 100.100.100.100, 128.100.100.100 and 255.100.100.100 all returned integers, and those integers returned the same IPs.

This is true bit-shifting, and I find the code readable.  Perhaps a VIEW with a bigint column could create a human-readable address from the four-byte integer that wraps IPs starting with 128 and over.  That or a view that uses dbo.INT2IP.

This solution may be ideal by having one column rather than four, while using just four bytes as well.  I took the code above and popped it into Notepad to have those two long lines display.

Thank you for posting, Scott.

- Brent

CREATE FUNCTION dbo.IP2INT_scott(@ip AS CHAR(15)) RETURNS INT AS
BEGIN
    DECLARE @b BIGINT = 256 ;
   
    RETURN CAST((((PARSENAME(@ip,4) * @b
   + PARSENAME(@ip,3)) * @b
                 + PARSENAME(@ip,2)) * @b
                 + PARSENAME(@ip,1)) AS BINARY(4))
END
GO

CREATE FUNCTION dbo.INT2IP_scott(@ip AS INT) RETURNS CHAR(15) AS
BEGIN
    DECLARE @bip BIGINT = @ip & CAST(0xFFFFFFFF AS BIGINT) ;
    RETURN LTRIM(@bip / 0x1000000 & 0xFF) + '.' +
    LTRIM(@bip / 0x10000 & 0xFF) + '.' +
    LTRIM(@bip / 0x100 & 0xFF) + '.' +
    LTRIM(@bip & 0xFF)
END

select dbo.IP2INT_scott('255.100.100.100')
select dbo.INT2IP_scott(-10197916)


Monday, November 28, 2011 - 9:56:54 AM - Scott C Back To Top (15222)

I would prefer to store IP addresses as INT rather than BIGINT, why waste the extra 4 bytes?  Of course you have to use BIGINT values in the conversion functions to avoid arithmetic overflow errors, but you only need INT for storage.  All addresses with leading octets of 128 or higher will appear as negative integers and will affect sorting, but they are still perfectly valid for use in indexes and primary keys.

CREATE FUNCTION dbo.IP2INT(@ip AS CHAR(15)) RETURNS INT AS
BEGIN
    DECLARE @b BIGINT = 256 ;
    RETURN CAST((((PARSENAME(@ip,4) * @b
            + PARSENAME(@ip,3)) * @b
            + PARSENAME(@ip,2)) * @b
            + PARSENAME(@ip,1)) AS BINARY(4))
END
GO
CREATE FUNCTION dbo.INT2IP(@ip AS INT) RETURNS CHAR(15) AS
BEGIN
    DECLARE @bip BIGINT = @ip & CAST(0xFFFFFFFF AS BIGINT) ;
    RETURN LTRIM(@bip / 0x1000000 & 0xFF) + '.' + LTRIM(@bip / 0x10000 & 0xFF) + '.' + LTRIM(@bip / 0x100 & 0xFF) + '.' + LTRIM(@bip & 0xFF)
END

You can use BIGINT if you want to see human-readable integers that sort correctly such as 100100100100, and don't mind the extra space required.  I'm not comfortable with this solution because INT can store all valid IP addresses, while BIGINT allows 4,294,967,295 invalid values for every valid one (total of 18,446,744,069,414,584,320 invalid values).

CREATE FUNCTION dbo.IP2BIGINT(@ip AS CHAR(15)) RETURNS BIGINT AS
BEGIN
    DECLARE @b BIGINT = 1000 ;
    RETURN (((PARSENAME(@ip,4) * @b
            + PARSENAME(@ip,3)) * @b
            + PARSENAME(@ip,2)) * @b
            + PARSENAME(@ip,1))
END
GO
CREATE FUNCTION dbo.BIGINT2IP(@ip AS BIGINT) RETURNS CHAR(15) AS
BEGIN
    RETURN LTRIM(@ip / 1000000000 % 1000) + '.' + LTRIM(@ip / 1000000 % 1000) + '.' + LTRIM(@ip / 1000 % 1000) + '.' + LTRIM(@ip % 1000)
END

Monday, November 28, 2011 - 7:45:49 AM - Brent Shaub Back To Top (15219)

Hi Wildred,

Thanks for the alternative.  A few comments in reply:

  1. I'm not getting the udf_INT2IP function's use of parsename() function.  It takes a sysname parameter and not a bigint.  Perhaps you're referring to user-defined parsing function that extracts octets from an entire IP and just bumped an existing function name?  It will return object name, schema name, database name and server name respectively.
  2. The storage space of a bigint is 8 bytes: larger than the article's options.  I'm unsure the benefit of this alternative, but I value discussing the multiple ways to move forward.
  3. Kudos for a creative solution.  I was curious to see this in action, and with a bigint parameter of 100100100100, the first function returns NULL and the second an arithmetic overflow for tinyint.  Did you test it before posting?  What values have you used, and what were the results?

I appreciate the brainstorming.  Why would you use this method over any of the others?  Storing the full IP in one field?

- Brent


Monday, November 28, 2011 - 3:41:27 AM - wilfred van dijk Back To Top (15215)

You could also use a BIGINT to store an IP address:

if exists (select 1 from information_schema.routines where routine_name = 'udf_INT2IP' and routine_schema='dbo' and routine_type='FUNCTION')

drop function [dbo].[udf_INT2IP];

go

 

create function dbo.udf_INT2IP(@IP AS bigint)

returns bigint

as

begin

return(

CONVERT(bigint, PARSENAME(@IP,1)) +

convERT(bigint, PARSENAME(@IP,2)) * 256 +

CONVERT(bigint, PARSENAME(@IP,3)) * 65536 +

cONVERT(bigint, PARSENAME(@IP,4)) * 16777216

)

end

go

 

if exists (select 1 from information_schema.routines where routine_name = 'udf_IP2INT' and routine_schema='dbo' and routine_type='FUNCTION')

drop function [dbo].[udf_IP2INT];

go

 

create function dbo.udf_IP2INT(@IP BIGINT)

returns varchar(15)

as

begin

DECLARE @Octet1 tinyint

DECLARE @Octet2 tinyint

DECLARE @Octet3 tinyint

DECLARE @Octet4 tinyint

DECLARE @RestOfIP bigint

 

SET @Octet1 = @IP / 16777216

SET @RestOfIP = @IP - (@Octet1 * 16777216)

SET @Octet2 = @RestOfIP / 65536

SET @RestOfIP = @RestOfIP - (@Octet2 * 65536)

SET @Octet3 = @RestOfIP / 256

SET @Octet4 = @RestOfIP - (@Octet3 * 256)

RETURN(CONVERT(varchar, @Octet1) + '.' +CONVERT(varchar, @Octet2) + '.' +CONVERT(varchar, @Octet3) + '.' +CONVERT(varchar, @Octet4))

end

go

 














get free sql tips
agree to terms