Saturday, August 1, 2009

Query For An IP In A Database

Overview



Below are several ways to query for an IP within a database and the shortcomings of each one. I'll assume that you have a PHP Web page where you would like to get the visitor's IP and look for that IP within your MySQL database. Also, I'll assume that you are able to query your database.



What Is An IP?



IP Addresses are 32-bit values which are typically represented as 8-bit quadruplets or "dotted quads" (IPv4). It consists of four unsigned integers separated by three dots. Each integer is less than or equal to 255. For example, one of the IPs that Google has registered would be 216.239.32.0 as seen on ARIN WHOIS



The format of an IP is [0-255].[0-255].[0-255].[0-255] and there are 4,294,967,296 (2^32) possible combinations (256*256*256*256), but some are reserved for private networks or multicast networks. Any user may use the reserved blocks within their network. For simplicity sake, any IP that is not in the table below should be considered external to your network.









Private Network IPs
IP address rangelargest CIDR block (subnet mask)
10.0.0.0 - 10.255.255.25510.0.0.0/8
172.16.0.0 - 172.31.255.255 172.16.0.0/12
192.168.0.0 - 192.168.255.255192.168.0.0/16


Getting A Visitor's IP Using PHP



If you don't already know how to get a visitor's IP, then here's the PHP code to do that. Someone else came up with the PHP code, so I can't take credit for it. Keep in mind that many computers within the visitor's internal network can share this external IP. Note, the IP could also be a computer from within your internal network connecting to the Web server which is also within your internal network given that your network configurations permit that computer to connect to the Web server using an internal IP. This type of configuration is common as it reduces the external traffic. See the Private Network IPs above to see if it is an internal IP.




PHP Code:
function getRealIpAddr() {
if(!empty($_SERVER['HTTP_CLIENT_IP'])) {
$ip=$_SERVER['HTTP_CLIENT_IP']; // share internet
} elseif(!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
$ip=$_SERVER['HTTP_X_FORWARDED_FOR']; // pass from proxy
} else {
$ip=$_SERVER['REMOTE_ADDR'];
}
return $ip;
}
$ip = getRealIpAddr(); // Get the visitor's IP

$ip = $_SERVER['REMOTE_ADDR']; // Simple version


IP Lookup Using An IP Address



The simplest way to lookup an IP in a database is to store the IP as it is. If you are looking for "216.239.32.0", then store that IP in the database.




Database Table "ips":
ip [char(15)]
216.239.32.0 <-returned
216.239.32.1
216.239.32.255

SQL Statement:
SELECT * FROM ips WHERE ip = '216.239.32.0'
ORDER BY ip DESC
LIMIT 1;


Unfortunately, if you want to lookup multiple IPs, you will need a record for each IP. That could be a lot considering that there are a few billion IPs out there.



IP Lookup Using A Reverse LIKE On An IP Range



If you want to lookup a range of IPs, you can perform a reverse LIKE comparison where a "%" character (wildcard) is stored in the database as part of the IP representing a range. The following searches for an IP between 64.233.160.0 and 64.233.169.255 using the IP with the third octave of "16%". The wildcard, in this case, means that the remainder of the third octave and the entire forth octave can be anything.




Database Table "ips":
ip [char(15)]
216.239.32.%
70.89.39.1%
64.233.16% <-returned

SQL Statement:
SELECT * FROM ips
WHERE '64.233.160.0' LIKE ip
ORDER BY ip DESC
LIMIT 1;


Unfortunately, IPs aren't always registered in octaves and so one company may own a portion of the range while other companies may own the rest. Such is the case for one of Google's blocks. ARIN WHOIS shows that Google owns 64.124.112.[24 - 31], but Google doesn't own 64.124.112.[0 - 23] or 64.124.112.[32 - 255]. You would have to use "64.124.112.2%" and "64.124.112.3%" to better cover that IP range which still includes some unwanted IPs in the range.



What Is CIDR Notation?



A range of IPs can be written out using CIDR notation so it looks like IP/CIDR. I showed a few examples in the "Private Network IPs" table above. To understand CIDR notation, you first need to understand how a computer sees an IP which is in bits or a series of 1s and 0s (on and off bits). There are 32 bits total in an IP and if you divide that by four, you get four sets of eight or four octaves. The CIDR number basically says that all the IPs in the CIDR block share a certain number of bits from the left, so a CIDR of 24 means that the last 8 bits of the IP can be anything which is the entire last octave (8 bits). If you want to learn more, then read about CIDR notation.



To use CIDR notation in our database example, you need to get the "Subnet ID" of the IP range which you can quickly do using a CIDR Calculator.



Use the calculator and entier in an IP of 192.168.0.1 and set the "Mask Bits" to 24. The "Subnet ID" and "Mask Bits" are what need to be stored in the database which can be found in the "Net: CIDR Notation" field which are 192.168.0.0 and 24. Notice if you change the IP to 192.168.0.255, the "Subnet ID" doesn't change. That's because IPs 192.168.0.[0-255] belong to the 24 CIDR block. Remember how I kept referring to the last octave, well it is a series of eight bits where 0 is "00000000" and 255 is "11111111" in binary format. An equation to remember is 2^(32-m) where m = the "Mask Bits", so 2^(32-24) = 256 possible IPs or Hosts in this CIDR block. Check out the CIDR table in the CIDR link I provided above.



I seem to be getting too deep into this, so let's just assume you will be using the calculator and move on. The last thing you need to remember is that the CIDR number is an unsigned integer between 0 and 32 where 0 would be all IPs and 32 would be one IP. CIDR blocks of 32 and 31 aren't generally used on networks as those blocks only have one and two IPs respectively and defeat the purpose behind using a CIDR block. Also, the starting and ending IPs in a CIDR block cannot be used by a host as they are reserved for other reasons. However, these two IPs still belong to that CIDR block and so this doesn't change anything in our database lookup. We will use the 31 and 32 CIDR blocks in our database in those cases where we only need one or two IPs.



IP Lookup Using An IP Subnet ID And A CIDR Value



The key to using the CIDR notation in the database query is that you have to convert the IP (Subnet ID) to a base 10 number which is the counting system that we use in our every day lives. Also you have to calculate the last IP in the range using the CIDR value. There are two functions you can use in SQL to convert an IP to a base 10 number and back again. Those functions are inet_aton() and inet_ntoa() where "aton" means address to number and visa versa. In addition, the pow() function will be used to calculate the number of IPs in that CIDR block. Here's a quick breakup of the SQL equations:




Start IP = subnet
End IP = inet_ntoa(inet_aton(subnet) + pow(2,32-cidr) - 1)



Database Table "ips":
subnet [char(15)] cidr [tinyint(4)]
216.239.32.8 29
70.89.39.0 24
64.233.16.4 30 <-returned

SQL Statement:
SELECT * FROM ips
WHERE '64.233.16.6' BETWEEN
subnet AND INET_NTOA(INET_ATON(subnet) + POW(2,32-cidr) - 1)
ORDER BY subnet DESC
LIMIT 1;

SQL Statement (Optimized):
SELECT * FROM ips
WHERE '64.233.16.6' >= subnet
AND '64.233.16.6' <= INET_NTOA(INET_ATON(subnet) + POW(2,32-cidr) - 1)
ORDER BY subnet DESC
LIMIT 1;


How does this work? The SQL statement returned the record where 64.233.16.6 (visitor's IP) was between 64.233.16.4 and 64.233.16.7. Since the subnet and CIDR values were known, we could calculate the last IP in the CIDR block. The equation pow(2,32-cidr) gives the number of IPs in this CIDR block which in this case was 2^(32-30) or 4. Given that the subnet is the starting IP, we needed to go up three more IPs to get a total of four IPs which is why the "minus 1" is there. Converting from IP to a numeric value and back again makes this arithmetic easy.



Unfortunately, this method has the same kind of flaws that the previous "reverse LIKE" lookup method had. There may be cases where you want the results we obtained from the above query as well as a few more IPs above it such as 64.233.16.4 to 64.233.10. Since the total number of IPs in the CIDR block is calculated by 2 to the power of something, having 7 IPs would only work if you had multiple records in the database such as:



64.233.16.4/30 (4 IPs - 64.233.16.4 to 64.233.16.7)
64.233.16.8/31 (2 IPs - 64.233.16.8 to 64.233.16.9)
64.233.16.10/32 (1 IP - 64.233.16.10 to 64.233.16.10)


IP Lookup Using A Starting And Ending IP Address



Just get the visitor's IP, the starting IP, and the ending IP and convert everything to a base 10 integer and do your query.




Database Table "ips":
ip_start [char(15)] ip_end [char(15)]
216.239.32.8 216.239.32.15
70.89.39.0 70.89.39.255
64.233.16.4 64.233.16.10 <-returned

SQL Statement:
SELECT * FROM ips
WHERE INET_ATON('64.233.16.6') BETWEEN
INET_ATON(ip_start) AND INET_ATON(ip_end)
ORDER BY ip_start DESC, ip_end DESC
LIMIT 1;


SQL Statement (Optimized):
SELECT * FROM ips
WHERE INET_ATON('64.233.16.6') >= INET_ATON(ip_start)
AND INET_ATON('64.233.16.6') <= INET_ATON(ip_end)
ORDER BY ip_start DESC, ip_end DESC
LIMIT 1;


This seems to better cover the issues that the preceding methods had with IPs outside of a range or CIDR block. The only problem is that there are a lot of IP conversions being done. I think the query would be simpler if the IPs were converted to their numeric values before the lookup is done.



Converting A Visitor's IP To A Numeric Value Using PHP



To convert an IP to a numeric value in PHP, you need to use the ip2long() function with some extra code to convert it to an unsigned integer. I found two methods to doing this. Someone else came up with the PHP code, so I can't take credit for it.




PHP Code:
// Converts 64.233.16.6 to 1089015814
$ip_aton = sprintf("%u", ip2long($ip));
$ip_aton = (substr($ip,0,3)>127) ? ((ip2long($ip) & 0x7FFFFFFF) + 0x80000000) : ip2long($ip);


IP Lookup Using A Starting And Ending IP Numeric Value



Using the previous method, let's try converting the IPs before the lookup is done. To store the numeric value in the database, simply do an INSERT query using the inet_aton() function. Keep in mind that there are 4,294,967,296 possible IPs, so set the type to an unsigned INT which has a range of 0 to 4,294,967,295 according to the MySQL Reference Manual. Perfect for what we need. You can also store the readable IPs in the database for reference.




Database Table "ips":
ip_start_aton [int(10)] ip_end_aton [int(10)]
3639549960 3639549967
1180247808 1180248063
1089015812 1089015818 <-returned

SQL Statement:
SELECT * FROM ips
WHERE 1089015814 BETWEEN ip_start_aton AND ip_end_aton
ORDER BY ip_start_aton DESC, ip_end_aton DESC
LIMIT 1;

SQL Statement (Optimized):
SELECT * FROM ips
WHERE 1089015814 >= ip_start_aton
AND 1089015814 <= ip_end_aton
ORDER BY ip_start_aton DESC, ip_end_aton DESC
LIMIT 1;


Doing it this way, you can have a starting IP of 0.0.0.0 and an ending IP of 255.255.255.255 covering all 4,294,967,296 IPs. This is overkill as networks aren't that large.



IP Lookup Using A Starting IP Numeric Value And A Range Number



Using the previous method, let's remove the oversized ending IP numeric value and replace it with a more convenient value like an "add range" value to calculate the ending IP. For my purposes, I am going to assume the IP ranges I am looking for won't be more then 255 IPs from the starting IP and so I will set the type to an unsigned TINYINT which has a range of 0 to 255 according to the MySQL Reference Manual. Feel free to change the type if you need a larger range. Remember that the range is added to the starting IP numeric value and so setting the range to 0 means that there is no range and so there is only one IP.




Database Table "ips":
ip_start_aton [int(10)] ip_add_range [tinyint(3)]
3639549960 7
1180247808 255
1089015812 6 <-returned

SQL Statement:
SELECT * FROM ips
WHERE 1089015814 BETWEEN
ip_start_aton AND ip_start_aton + ip_add_range
ORDER BY ip_start_aton DESC
LIMIT 1;

SQL Statement (Optimized):
SELECT * FROM ips
WHERE 1089015814 >= ip_start_aton
AND 1089015814 <= ip_start_aton + ip_add_range
ORDER BY ip_start_aton DESC
LIMIT 1;



This seems to be the most optimized solution that covers all the scenarios mentioned above plus some.



Resources