2012年8月16日 星期四

Using gawk (GNU awk) to monitor /var/log/messages

I have a requirement to periodically scan through /var/log/messages to catch specific error message. The error itself is very time-sensitive and I want to be informed as soon as the message is shown on the log, ideally in a minute interval.

Usually people would suggest to periodically run grep against the log which is the simplest way to facilitate the need but it doesn't work for my scenario. The problem to simply run grep is that it may also capture unnecessary information. For example, I have below logs on my /var/log/messages.

Aug 3 12:40:06 test-box kernel: [1645652.295156] CPU0: Temperature/speed normal
Aug 3 15:05:28 test-box kernel: [1645673.980296] CPU1: Temperature/speed normal

If I simply use grep to catch above log entry in a minute interval, I will probably be informed at 12:40.06 and then all the way till the end of the day (i.e. when the /var/log/messages be rotated and cleaned up). Not to mention starting from 15:05:28, the grep script will catch 2 errors from the log, the lines being recorded at 12:40:05 and 15:05:28. However, the fact is that I actually need the one on 15:05:28.

After some research, it looks like gawk (GNU awk) would be the perfect tool to solve the problem. Eventually I come up with below gawk script to read /var/log/messages.


$ cat gawk-script
BEGIN {
    # Declare field seperator.
    FS="[- :.]";


    
    # Generate the timestamp
    NOW=systime();

    
    # The time interval that I want this script to read from
    # As long as I will run this script with cron in every minutes
    # I set the time interval to 1sec*60
    # So, for example when the script is run at 12:05, only log entries between 12:04 ~ 12:05 will be read.
    PAST=NOW-(60);


    
    # /var/log/messages start with the line in format of "Month date" (e.g Mar 23)
    # Ideally we can convert it to something like 23/3 but I want to be lazy, 
    # so I simply make use of the system time library and 
    # picked %Y %m to represent the Year and Month attributes.
    format="%Y %m";

    # LOGMTH will be something like "2012 10" (Oct 2012)
    # This will be used later to generate the timestamp of the log entry.
    LOGMTH=strftime(format, THEN)
}
{

    # Read the line of the log and convert it to a timestamp
    LOGTIME=mktime(LOGMTH " " $2 " " $3 " " $4 " " $5);

    # Below 3 lines can be commented out to debug with the value being read.
    #{print $2, $3, $4, $5};
    #{print LOGMTH};
    #{print LOGTIME};

    
    # print the log if the timestamp of the line is between PAST <> NOW.
    if(PAST<LOGTIME){print}
}


So here is a demo of the script.

$ cat /var/log/messages  ### totally 4 lines here
Aug 3 12:40:06 test-box kernel: [1645652.295156] CPU0: Temperature/speed normal
Aug 3 15:05:28 test-box kernel: [1645673.980296] CPU1: Temperature/speed normal Aug 3 18:03:31 test-box kernel: [1640277.321612] usb 2-2: new high speed USB device using ehci_hcd and address 28
Aug 3 18:03:32 test-box kernel: [1640277.474483] usb 2-2: configuration #1 chosen from 4 choices


$ date  #Lets check the current time
Wed Oct 3 18:04:24 HKT 2012


$ gawk -f gawk-script /var/log/messages
Oct 3 18:03:31 test-box kernel: [1640277.321612] usb 2-2: new high speed USB device using ehci_hcd and address 28
Oct 3 18:03:32 test-box kernel: [1640277.474483] usb 2-2: configuration #1 chosen from 4 choices


So, the script now only print everything captured within last minute. You can then use the script together with grep utility to catch the string you are interested.