Friday, February 16, 2018

Regex: Valid IP Address

How to create a regular expression that will check if an IP is valid.

What is a valid IPv4 address? It is an address that has four parts each separated by a dot "." and each part having a value from 0 to 255. Example:


Since you have four repeating numeric ranges [0-255], lets start by focusing on that one range. First break the [0-255] numeric range into smaller numeric ranges that still fall into the larger numeric range that we can easily convert to regular expression ranges.

[0-9], [10-99], [100-199], [200-249], and [250-255]

Next, we need to convert these numeric ranges into regular expression ranges. Note, a regular expression (regex) range of [0-9] and \d mean the same thing.

Numeric       Regex
[0-9]      =  \d
[10-99]    =  [1-9]\d
[100-199]  =  1\d\d
[200-249]  =  2[0-4]\d
[250-255]  =  25[0-5]

Now, put the regex ranges together inside of round brackets "()" and place a "|" bar between each part to indicate "OR" conditions where one part of the condition makes the whole condition true. What you have now is a regular expression that allows for a numeric range of [0-255].


Next step is to allow four of these numeric ranges with a "." dot between them. To put a dot in regex, you will need to escape it "\." because a dot by itself "." means "any character" which could be non-numeric. Using our previous regular expression for [0-255], we repeat it four times and put a "\." between them.


This is quite long with the part for the ".[0-255]" numeric range repeating three times, so lets consolidate those pieces by placing one of them in round brackets "()" and adding curly brackets "{3}" to the end to indicate it should repeat three times.


One final step that we want to do is set it so the entire value has to match this regular expression by using the starting "^" and ending "$" characters. Note, some people replace the "^" and "$" characters with a word boundary "\b".


Here is an online regular expression checker.

Tuesday, February 7, 2012

Linux: Find Symbolic Links Full Paths

How to find all the symbolic links (soft links and hard links) along with their full paths on Linux/Unix command line.

find / -type l | xargs -r ls -l > symlinks.log


First, find all symbolically linked files/folders by using "-type l" which restricts the results to symbolic links. This example starts from the root of the server, but I would specify as much of the path as possible to speed up the search.

Second, list out all the details of each result by piping what was found to the "ls -l" command. To do this, "xargs" is used which passes the results from the first part as arguments to the "ls -l" command of the second part. The xargs option "-r" is used which tells it to not run the command if no results were returned.

Third, save the results from the second part to a file in the current directory named "symlinks.log".

Linux: Count Unique Error Log Entries

How to get the unique error log entries and a count of each entry on Linux/Unix command line. This is very useful for programmers and developers so they can look through website error log files and quickly fix any problems.

sed 's^\[.*\]^^g' error_log | sort | uniq -c > error_log-unique


In error log files, generally speaking, most of the errors which repeat will differ by the IP and create date stored along side them. In my example, I am assuming these differing parts are stored in brackets which I remove so I can get unique values and a count of those unique values. How the error logs are stored can be modified in your Apache config file assuming you are using Apache.

First, open the error log file using "sed" which is a stream editor and replace all bracketed values with an empty string using a regular expression. The "^" carrot characters are where the find and replace values start and end. \[.*\] basically says match anything inside of brackets. This regular expression will remove the IP and create date as well as other values that aren't really needed leaving the actual error messages.

Second, sort the results. If you don't sort, unique won't work as it works by comparing consecutive lines.

Third, get all the unique values by comparing consecutive lines. Also show a count of those values by using the "-c" option. You can delete the option if you don't need a count.

Forth, save the results to a file in the current directory named "error_log-unique".


If you find that the log file is too large to open, sort, and return a count of the unique lines, then try using the "tail" command to take a specific number of lines off the end of the file. Below I set the "tail" command to give me the last 100,000 lines of the file.

tail -100000 error_log | sed 's^\[.*\]^^g' error_log | sort | uniq -c > error_log-unique

Here is another variation to only keep errors that contain the string "PHP ". By using the "-n" option in conjunction with the "p" argument, sed will not print out any thing except matching lines. In addition, I've added a sed command to remove the ", referer: ..." that can appear at the end of each line which can vary. By removing the referer, the error logs can be grouped together more precisely.

sed -n '/PHP /p' error_log | sed 's^\[.*\]^^g' | sed 's^\, referer: [^\n]*^^g' | sort | uniq -c > error_log-unique

Adding to the customization, I wanted to loop though all my log files ([filename]) and create unique log files for each ([filename]-unique). In the for loop, I look for all files in the current directory with a syntax of "*error_log". I perform the same sed command previously discussed except with the "$filename" variable set in the for loop. I use an if/else statement with the "-s" flag to test if the file exists and is not empty. If it is not empty, display the filename, else remove the file.

for filename in *error_log; do
 sed -n '/PHP /p' $filename | sed 's^\[.*\]^^g' | sed 's^\, referer: [^\n]*^^g' | sort | uniq -c > $filename-unique;
 if [ -s $filename-unique ]; then echo "$filename-unique"; else rm -f $filename-unique; fi;

Saturday, May 1, 2010

PHP: Convert Word Characters to UTF-8 HTML Numbered Entities


Have you ever tried to paste text from a Word document straight into a Web page and you get all these question marks "?", boxes or weird characters? What about when you store the same text to a database and you loss all those nice curly single quotes and curly double quotes? This will show you how to fix that and convert Microsoft Word ASCII characters [ISO-8859-1] to UTF-8 compliant HTML numbered entity codes (decimal).

The ISO 8859-1 Character Set

First you need to understand a few things about the ISO 8859-1 character set. There are 256 possible ranging from 0 to 255 and of these, some have not been assigned a value while others will display correctly on an ISO 8859-1 or UTF-8 Web page. You can display a character within PHP using the chr() function. Example chr(255). When displaying the characters in PHP, these characters repeat when greater then 255 such that chr(255) will be the same as chr(511) and chr(767).

So to jump straight to the point, the characters that need to be converted to a numbered entity range from 128 to 255, inclusively. There are probably a few of these characters that still display properly in Internet Explorer and not other browsers, but we will keep those characters included just to be thorough. Within this range, there are a few characters that need excluding which are 129, 141, 143, 144 and 157. They either are not assigned a value or the value doesn't display correctly; such would be the case with a backspace. Here is a nice visual ISO-8859-1 and Numbered Entities.

HTML Numbered Entities

Numbered entities are displayed on Web pages using the format of ÿ where the numerical value varies. The ISO 8859-1 character set range described above will match up with to the numbered entity equivalent so chr(255) is the same as ÿ. Keep in mind that &#255 is not the same as ǿ or ˿, so the desired range is restricted to that of the ISO-8859-1 range set in the above paragraph.

Converting the ISO-8859-1 Characters to HTML Numbered Entities

First a list will need to be made of all the characters that need converting. This can be accomplished best with a for loop and an array where the numbered entity is the key and the ASCII character is the value. Once that is done, you need to assign variables according to the desired conversion such as going from ASCII characters to numbered entities or vice-versa. A simple switch statement does the trick. Finally, you convert all the applicable characters within the string and return the converted value.

PHP Code:
function ConvertCharacters($String, $ConvertTo='entity') {
// Build character map list
$exclude = array(129, 141, 143, 144, 157);
for($i=128; $i<=255; $i++)
$characterMap['&#'.$i.';'] = chr($i);
foreach($exclude as $i)

// Assign find and replace variables
case 'ascii': // To ascii characters
$find = array_keys($characterMap);
$replace = array_values($characterMap);
case 'entity': // To numbered entities
$find = array_values($characterMap);
$replace = array_keys($characterMap);

// Convert characters within string and return results
return str_replace($find, $replace, $String);
} // ConvertCharacters()

You can call the PHP function using one of three methods:
$String = 'string of text';
$ConvertedString = ConvertCharacters($String); // Convert to entity
$ConvertedString = ConvertCharacters($String, 'entity'); // Convert to entity
$ConvertedString = ConvertCharacters($String, 'ascii');// Convert to ascii


Saturday, August 1, 2009

Query For An IP In A Database


Below are several ways to query for an IP within a database and the shortcomings of each one. I'll assume that you have a PHP Web page where you would like to get the visitor's IP and look for that IP within your MySQL database. Also, I'll assume that you are able to query your database.

What Is An IP?

IP Addresses are 32-bit values which are typically represented as 8-bit quadruplets or "dotted quads" (IPv4). It consists of four unsigned integers separated by three dots. Each integer is less than or equal to 255. For example, one of the IPs that Google has registered would be as seen on ARIN WHOIS

The format of an IP is [0-255].[0-255].[0-255].[0-255] and there are 4,294,967,296 (2^32) possible combinations (256*256*256*256), but some are reserved for private networks or multicast networks. Any user may use the reserved blocks within their network. For simplicity sake, any IP that is not in the table below should be considered external to your network.

Private Network IPs
IP address rangelargest CIDR block (subnet mask) - - -

Getting A Visitor's IP Using PHP

If you don't already know how to get a visitor's IP, then here's the PHP code to do that. Someone else came up with the PHP code, so I can't take credit for it. Keep in mind that many computers within the visitor's internal network can share this external IP. Note, the IP could also be a computer from within your internal network connecting to the Web server which is also within your internal network given that your network configurations permit that computer to connect to the Web server using an internal IP. This type of configuration is common as it reduces the external traffic. See the Private Network IPs above to see if it is an internal IP.

PHP Code:
function getRealIpAddr() {
if(!empty($_SERVER['HTTP_CLIENT_IP'])) {
$ip=$_SERVER['HTTP_CLIENT_IP']; // share internet
} elseif(!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
$ip=$_SERVER['HTTP_X_FORWARDED_FOR']; // pass from proxy
} else {
return $ip;
$ip = getRealIpAddr(); // Get the visitor's IP

$ip = $_SERVER['REMOTE_ADDR']; // Simple version

IP Lookup Using An IP Address

The simplest way to lookup an IP in a database is to store the IP as it is. If you are looking for "", then store that IP in the database.

Database Table "ips":
ip [char(15)] <-returned

SQL Statement:
SELECT * FROM ips WHERE ip = ''

Unfortunately, if you want to lookup multiple IPs, you will need a record for each IP. That could be a lot considering that there are a few billion IPs out there.

IP Lookup Using A Reverse LIKE On An IP Range

If you want to lookup a range of IPs, you can perform a reverse LIKE comparison where a "%" character (wildcard) is stored in the database as part of the IP representing a range. The following searches for an IP between and using the IP with the third octave of "16%". The wildcard, in this case, means that the remainder of the third octave and the entire forth octave can be anything.

Database Table "ips":
ip [char(15)]
64.233.16% <-returned

SQL Statement:

Unfortunately, IPs aren't always registered in octaves and so one company may own a portion of the range while other companies may own the rest. Such is the case for one of Google's blocks. ARIN WHOIS shows that Google owns 64.124.112.[24 - 31], but Google doesn't own 64.124.112.[0 - 23] or 64.124.112.[32 - 255]. You would have to use "" and "" to better cover that IP range which still includes some unwanted IPs in the range.

What Is CIDR Notation?

A range of IPs can be written out using CIDR notation so it looks like IP/CIDR. I showed a few examples in the "Private Network IPs" table above. To understand CIDR notation, you first need to understand how a computer sees an IP which is in bits or a series of 1s and 0s (on and off bits). There are 32 bits total in an IP and if you divide that by four, you get four sets of eight or four octaves. The CIDR number basically says that all the IPs in the CIDR block share a certain number of bits from the left, so a CIDR of 24 means that the last 8 bits of the IP can be anything which is the entire last octave (8 bits). If you want to learn more, then read about CIDR notation.

To use CIDR notation in our database example, you need to get the "Subnet ID" of the IP range which you can quickly do using a CIDR Calculator.

Use the calculator and entier in an IP of and set the "Mask Bits" to 24. The "Subnet ID" and "Mask Bits" are what need to be stored in the database which can be found in the "Net: CIDR Notation" field which are and 24. Notice if you change the IP to, the "Subnet ID" doesn't change. That's because IPs 192.168.0.[0-255] belong to the 24 CIDR block. Remember how I kept referring to the last octave, well it is a series of eight bits where 0 is "00000000" and 255 is "11111111" in binary format. An equation to remember is 2^(32-m) where m = the "Mask Bits", so 2^(32-24) = 256 possible IPs or Hosts in this CIDR block. Check out the CIDR table in the CIDR link I provided above.

I seem to be getting too deep into this, so let's just assume you will be using the calculator and move on. The last thing you need to remember is that the CIDR number is an unsigned integer between 0 and 32 where 0 would be all IPs and 32 would be one IP. CIDR blocks of 32 and 31 aren't generally used on networks as those blocks only have one and two IPs respectively and defeat the purpose behind using a CIDR block. Also, the starting and ending IPs in a CIDR block cannot be used by a host as they are reserved for other reasons. However, these two IPs still belong to that CIDR block and so this doesn't change anything in our database lookup. We will use the 31 and 32 CIDR blocks in our database in those cases where we only need one or two IPs.

IP Lookup Using An IP Subnet ID And A CIDR Value

The key to using the CIDR notation in the database query is that you have to convert the IP (Subnet ID) to a base 10 number which is the counting system that we use in our every day lives. Also you have to calculate the last IP in the range using the CIDR value. There are two functions you can use in SQL to convert an IP to a base 10 number and back again. Those functions are inet_aton() and inet_ntoa() where "aton" means address to number and visa versa. In addition, the pow() function will be used to calculate the number of IPs in that CIDR block. Here's a quick breakup of the SQL equations:

Start IP = subnet
End IP = inet_ntoa(inet_aton(subnet) + pow(2,32-cidr) - 1)

Database Table "ips":
subnet [char(15)] cidr [tinyint(4)] 29 24 30 <-returned

SQL Statement:
subnet AND INET_NTOA(INET_ATON(subnet) + POW(2,32-cidr) - 1)

SQL Statement (Optimized):
WHERE '' >= subnet
AND '' <= INET_NTOA(INET_ATON(subnet) + POW(2,32-cidr) - 1)

How does this work? The SQL statement returned the record where (visitor's IP) was between and Since the subnet and CIDR values were known, we could calculate the last IP in the CIDR block. The equation pow(2,32-cidr) gives the number of IPs in this CIDR block which in this case was 2^(32-30) or 4. Given that the subnet is the starting IP, we needed to go up three more IPs to get a total of four IPs which is why the "minus 1" is there. Converting from IP to a numeric value and back again makes this arithmetic easy.

Unfortunately, this method has the same kind of flaws that the previous "reverse LIKE" lookup method had. There may be cases where you want the results we obtained from the above query as well as a few more IPs above it such as to 64.233.10. Since the total number of IPs in the CIDR block is calculated by 2 to the power of something, having 7 IPs would only work if you had multiple records in the database such as: (4 IPs - to (2 IPs - to (1 IP - to

IP Lookup Using A Starting And Ending IP Address

Just get the visitor's IP, the starting IP, and the ending IP and convert everything to a base 10 integer and do your query.

Database Table "ips":
ip_start [char(15)] ip_end [char(15)] <-returned

SQL Statement:
INET_ATON(ip_start) AND INET_ATON(ip_end)
ORDER BY ip_start DESC, ip_end DESC

SQL Statement (Optimized):
WHERE INET_ATON('') >= INET_ATON(ip_start)
AND INET_ATON('') <= INET_ATON(ip_end)
ORDER BY ip_start DESC, ip_end DESC

This seems to better cover the issues that the preceding methods had with IPs outside of a range or CIDR block. The only problem is that there are a lot of IP conversions being done. I think the query would be simpler if the IPs were converted to their numeric values before the lookup is done.

Converting A Visitor's IP To A Numeric Value Using PHP

To convert an IP to a numeric value in PHP, you need to use the ip2long() function with some extra code to convert it to an unsigned integer. I found two methods to doing this. Someone else came up with the PHP code, so I can't take credit for it.

PHP Code:
// Converts to 1089015814
$ip_aton = sprintf("%u", ip2long($ip));
$ip_aton = (substr($ip,0,3)>127) ? ((ip2long($ip) & 0x7FFFFFFF) + 0x80000000) : ip2long($ip);

IP Lookup Using A Starting And Ending IP Numeric Value

Using the previous method, let's try converting the IPs before the lookup is done. To store the numeric value in the database, simply do an INSERT query using the inet_aton() function. Keep in mind that there are 4,294,967,296 possible IPs, so set the type to an unsigned INT which has a range of 0 to 4,294,967,295 according to the MySQL Reference Manual. Perfect for what we need. You can also store the readable IPs in the database for reference.

Database Table "ips":
ip_start_aton [int(10)] ip_end_aton [int(10)]
3639549960 3639549967
1180247808 1180248063
1089015812 1089015818 <-returned

SQL Statement:
WHERE 1089015814 BETWEEN ip_start_aton AND ip_end_aton
ORDER BY ip_start_aton DESC, ip_end_aton DESC

SQL Statement (Optimized):
WHERE 1089015814 >= ip_start_aton
AND 1089015814 <= ip_end_aton
ORDER BY ip_start_aton DESC, ip_end_aton DESC

Doing it this way, you can have a starting IP of and an ending IP of covering all 4,294,967,296 IPs. This is overkill as networks aren't that large.

IP Lookup Using A Starting IP Numeric Value And A Range Number

Using the previous method, let's remove the oversized ending IP numeric value and replace it with a more convenient value like an "add range" value to calculate the ending IP. For my purposes, I am going to assume the IP ranges I am looking for won't be more then 255 IPs from the starting IP and so I will set the type to an unsigned TINYINT which has a range of 0 to 255 according to the MySQL Reference Manual. Feel free to change the type if you need a larger range. Remember that the range is added to the starting IP numeric value and so setting the range to 0 means that there is no range and so there is only one IP.

Database Table "ips":
ip_start_aton [int(10)] ip_add_range [tinyint(3)]
3639549960 7
1180247808 255
1089015812 6 <-returned

SQL Statement:
WHERE 1089015814 BETWEEN
ip_start_aton AND ip_start_aton + ip_add_range
ORDER BY ip_start_aton DESC

SQL Statement (Optimized):
WHERE 1089015814 >= ip_start_aton
AND 1089015814 <= ip_start_aton + ip_add_range
ORDER BY ip_start_aton DESC

This seems to be the most optimized solution that covers all the scenarios mentioned above plus some.