Tuesday, February 7, 2012

Linux: Count Unique Error Log Entries

How to get the unique error log entries and a count of each entry on the Linux/Unix command line. This is very useful for programmers and developers who need to scan website error log files and quickly find and fix recurring problems.

sed 's^\[.*\]^^g' error_log | sort | uniq -c > error_log-unique

Explanation:

In error log files, generally speaking, most of the errors that repeat will differ only by the IP address and creation date stored alongside them. In my example, I assume these differing parts are stored in brackets, which I remove so I can get unique values and a count of those unique values. How the error logs are formatted can be changed in your Apache config file, assuming you are using Apache.
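
For example, here is what a single line might look like in the default Apache 2.2 error log format (the date, IP, and message are hypothetical):

[Tue Feb 07 10:22:13 2012] [error] [client 192.168.1.50] File does not exist: /var/www/html/favicon.ico

After the sed command above strips the bracketed values, only the message remains:

 File does not exist: /var/www/html/favicon.ico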

First, open the error log file using "sed", which is a stream editor, and replace all bracketed values with an empty string using a regular expression. The "^" caret characters are delimiters that separate the find and replace parts of the substitution (sed lets you use nearly any character here in place of the usual "/"). \[.*\] basically says match anything inside of brackets; because ".*" is greedy, it matches from the first "[" to the last "]" on the line. This regular expression will remove the IP and creation date, as well as other bracketed values that aren't really needed, leaving the actual error messages.
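
If the caret delimiter looks unfamiliar, this equivalent command (just a variant, same regular expression) uses the more common "/" delimiter:

sed 's/\[.*\]//g' error_log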

Second, sort the results. If you don't sort, "uniq" won't work, because it only compares consecutive lines.
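
Here is a quick demonstration (with made-up lines) of why the sort matters:

printf 'error A\nerror B\nerror A\n' | uniq -c
printf 'error A\nerror B\nerror A\n' | sort | uniq -c

The first command reports "error A" twice with a count of 1 each, because the duplicates are not adjacent; the second reports it once with a count of 2.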

Third, "uniq" collapses the duplicate consecutive lines into unique values. The "-c" option prefixes each value with a count of how many times it occurred; you can drop the option if you don't need a count.
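
With "-c", the output looks something like this (hypothetical counts and messages):

     42 File does not exist: /var/www/html/favicon.ico
      3 PHP Notice:  Undefined variable: foo in /var/www/html/index.php on line 12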

Fourth, save the results to a file in the current directory named "error_log-unique".

Customization:

If you find that the log file is too large to open, sort, and return a count of the unique lines, then try using the "tail" command to take a specific number of lines off the end of the file. Below I set the "tail" command to give me the last 100,000 lines of the file. Note that the filename now belongs to "tail", and sed reads from the pipe.

tail -100000 error_log | sed 's^\[.*\]^^g' | sort | uniq -c > error_log-unique


Here is another variation that only keeps errors containing the string "PHP ". By using the "-n" option in conjunction with the "p" command, sed will not print anything except matching lines. In addition, I've added a sed command to remove the ", referer: ..." text that can appear at the end of each line and varies from entry to entry. By removing the referer, the error logs can be grouped together more precisely.

sed -n '/PHP /p' error_log | sed 's^\[.*\]^^g' | sed 's^, referer: .*^^' | sort | uniq -c > error_log-unique
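
As a side note, the three sed calls can be collapsed into a single invocation. This is a minimal variant that should behave the same, assuming GNU sed:

sed -n '/PHP /{s^\[.*\]^^g; s^, referer: .*^^; p;}' error_log | sort | uniq -c > error_log-unique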


Adding to the customization, I wanted to loop through all my log files ([filename]) and create a unique log file for each ([filename]-unique). In the for loop, I look for all files in the current directory matching the pattern "*error_log". I perform the same sed commands previously discussed, except on the "$filename" variable set by the for loop. I use an if/else statement with the "-s" test to check that the output file exists and is not empty. If it is not empty, display the filename; otherwise remove the file.

for filename in *error_log; do
 sed -n '/PHP /p' "$filename" | sed 's^\[.*\]^^g' | sed 's^, referer: .*^^' | sort | uniq -c > "$filename-unique";
 if [ -s "$filename-unique" ]; then echo "$filename-unique"; else rm -f "$filename-unique"; fi;
done
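
One caveat: if no file matches "*error_log", bash runs the loop once with the literal string "*error_log" as the filename. If you are using bash, enabling nullglob beforehand makes the loop simply skip in that case:

shopt -s nullglob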

1 comment:

  1. Uniq's man page states that ADJACENT duplicate lines are removed, which is why the sort command is needed first, but is there a way to run uniq and keep the order in which the entries were found? I.e. a log file has the same error on line 1 and 12, but running it through sort would, alphabetically, move the errors further down than the first line, which is where the first entry originated.
