Optimal Storage of IP addresses in a SQL Server database
Often databases are used with web-based interfaces and recording the IP address of the end user can assist with debugging, marketing, bandwidth planning and collation selection to name a few. In a scenario where each page access is logged, is there an optimal way to store IP addresses?
This tutorial will work through two obvious options and an obscure one on different ways to store IP addresses in your database.
The first option is using a varchar(15):
- From CGI variables, the four octets are separated with periods (.) which form a string.
- Writing this string as a varchar to the database is straight-forward.
- The simplicity has some tradeoffs.
- Some space is under-utilized whenever an octet is less than 100, and the mask always uses three bytes. (188.8.131.52 uses 12 characters.)
- Octets need to parsed using either SUBSTRING or CHARINDEX to sort and search by geographic location (see here).
The second option is using four tinyints:
- Octets range between 0 and 255 making them perfect matches for tinyints and therefore storage space is perfectly utilized.
- Some parsing will need to be done when writing the data, but the benefits are many.
- Because they are numbers, sorting will work as expected.
- A clustered index can be created on all four fields to improve performance.
A view can be created to display the original string...
CREATE VIEW V_IP_ADDRESS AS SELECT IP_1, IP_2, IP_3, IP_4, CAST(IP_1) AS VARCHAR + '.' + CAST(IP_2) AS VARCHAR + '.' + CAST(IP_3) AS VARCHAR + '.' + CAST(IP_4) AS VARCHAR AS IP_ALL FROM IP_ADDRESS
...as can a computed column.
CREATE TABLE IP_ADDRESS ( IP_1 TINYINT NOT NULL, IP_2 TINYINT NOT NULL, IP_3 TINYINT NOT NULL, IP_4 TINYINT NOT NULL, IP_ALL AS CAST(IP_1 AS VARCHAR(3)) + '.' + CAST(IP_2 AS VARCHAR(3)) + '.' + CAST(IP_3 AS VARCHAR(3)) + '.' + CAST(IP_4 AS VARCHAR(3)) PERSISTED, CONSTRAINT PK_IP PRIMARY KEY (IP_1, IP_2, IP_3, IP_4) )
Note that this example creates a clustered index on the four IP octets immediately. Another option is an autonumber Primary Key (PK) to start, then dropping that and creating a clustered index on the IP segments later.
The third option is using four binary(1)s:
- For completeness, four binary(1)s use the same amount of disk space as four tinyints, but their use is more complex.
- Writing will require parsing and casting.
- To write, each octet needs to be parsed and casted [to 0x__] using this function master.dbo.fn_varbintohexstr(): master.dbo.fn_varbintohexstr(cast (IP_1 as varbinary)).
- A computed column or view will need to cast to tinyint then varchar to return the original address.
- The main benefit of this approach is to obfuscate anyone viewing the database directly, and chances are if they are viewing your data they can probably also read hexadecimal.
CREATE TABLE IP_ADDRESS_BIN ( IP_1 binary(1) NOT NULL, IP_2 binary(1) NOT NULL, IP_3 binary(1) NOT NULL, IP_4 binary(1) NOT NULL, IP_ALL AS CAST(cast(IP_1 as tinyint) AS VARCHAR(3)) + '.' + CAST(cast(IP_2 as tinyint) AS VARCHAR(3)) + '.' + CAST(cast(IP_3 as tinyint) AS VARCHAR(3)) + '.' + CAST(cast(IP_4 as tinyint) AS VARCHAR(3)) PERSISTED, CONSTRAINT PK_IP_BIN PRIMARY KEY (IP_1, IP_2, IP_3, IP_4) )
Here is a comparison of the three options outlined above:
|varchar(15)||15 b||Verbatim||Verbatim||Parse needed||Highest disk space needed||Readability, ease to insert|
|four tinyints||4 b||Parse and one CAST||View or computed column||Native||Sew together with cast to recreate string||Ideal storage size, sorting, performance|
|four binary(1)s||4 b||Parse and two CASTs||View or computed column||Requires two casts to sort and read, code is harder to read/maintain||Ideal storage size, obscure values||Obfuscate the data|
- Consider migrating IP addresses into their own table with an autonumber PK and link using foreign keys; this will save space.
- Read about VIEWs and COMPUTED COLUMNs.
About the author
View all my tips
Article Last Updated: 2011-11-28