How to exclude Apache Access log entries

The Apache access log records each request for content received by your Apache web server.  This is important information to have available.  You can see where requests are coming from and detect requests that may be attacks from hackers trying to get into your system.  However, you will also see a lot of routine requests that may not interest you.  This post uses a practical example to show how to keep repetitive, routine entries out of this log.

Zend Server 6 for IBM i introduced a new service, the Zend Server daemon.  This service performs a couple of routine checks about twice a minute, resulting in these two entries in the Apache access log:

127.0.0.1 - - [12/May/2014:13:50:15 -0700] "GET /UserServer/zsd_print_extensions.php HTTP/1.1" 200 3915 "-" "Mozilla/5.0"
127.0.0.1 - - [12/May/2014:13:50:15 -0700] "GET /UserServer/zsd_is_webserver_alive.php HTTP/1.1" 200 6 "-" "Mozilla/5.0"

At that rate, these messages fill the log with 240 entries an hour, or 5760 a day in a 24-hour shop.  That’s a lot of messages.  All they really tell us is that the Zend Server daemon process is running, and there are other ways to determine that.  Beyond that, they mostly just take up space, and they can be a real hindrance when you are checking the access log for more interesting requests.
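The arithmetic behind those figures can be checked with a quick sketch.  The rate of two checks per minute is the approximate figure quoted above, not a measured value, and the constant names are made up for this illustration.

```python
# Rough access log volume generated by the Zend Server daemon checks.
ENTRIES_PER_CHECK = 2   # one entry for each of the two zsd_*.php requests
CHECKS_PER_MINUTE = 2   # "about twice a minute" (approximate)

entries_per_hour = ENTRIES_PER_CHECK * CHECKS_PER_MINUTE * 60
entries_per_day = entries_per_hour * 24

print(entries_per_hour)  # 240
print(entries_per_day)   # 5760
```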

For this example, we are using IBM HTTP Server powered by Apache on an IBM i at 7.1. We are modifying Zend Server for IBM i version 6.3.0.

Important: Before making this change, please make a backup copy of the Apache configuration file:
/www/zendsvr6/conf/httpd.conf

You can edit the httpd.conf (Apache configuration) file in Zend Studio from the Remote System Explorer perspective, or you can edit it in the IBM HTTP Administrator in your browser.

This is a simple change.  It adds a SetEnvIf directive, and modifies the existing CustomLog directive to test for the environment variable.

In the httpd.conf file, please find this line:

CustomLog logs/access_log combined

Replace that line with these:

# Check for requests to exclude from the access log
SetEnvIf Request_URI "^/UserServer/zsd_.+\.php$" log_exclude=true
# Change log to test for not log_exclude
CustomLog logs/access_log combined env=!log_exclude

The lines that start with ‘#’ are comments.  The SetEnvIf directive tests a request attribute against a pattern, and sets an environment variable if the test succeeds.  In the example above, the Request_URI is checked against the given regular expression.  If it matches, the environment variable log_exclude is created.  Request_URI is one of the request attributes this directive can test.  It holds the part of the request after the host, but not including the query string (GET parameters).  Basically, it is the file being requested.

The next part is a regular expression.  It is a pattern that can be used to test a string.  If the string matches the pattern, the test returns true.  This regular expression says the string should start with “/UserServer/zsd_” and end with “.php”, with at least one character in between.  This matches requests for “/UserServer/zsd_print_extensions.php” and “/UserServer/zsd_is_webserver_alive.php”, the files being requested in the messages shown at the start of this post.  It also matches anything else that fits the pattern.  For example, “/UserServer/zsd_Paul_is_dead.php” or “/UserServer/zsd_I_am_the_walrus.php” would also match.
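If you would rather try the pattern in a script than in an online tester, here is a minimal sketch in Python.  It assumes Python's re module treats this simple pattern the same way the Apache regex engine does, which should hold for anchors, ".+" and an escaped dot, but it is not a substitute for testing on the server.  The function name would_exclude is made up for this illustration.

```python
import re

# Same pattern as in the SetEnvIf directive.
PATTERN = re.compile(r"^/UserServer/zsd_.+\.php$")

def would_exclude(request_uri):
    """Return True if the URI matches the exclusion pattern."""
    return PATTERN.search(request_uri) is not None

samples = [
    "/UserServer/zsd_print_extensions.php",
    "/UserServer/zsd_is_webserver_alive.php",
    "/UserServer/zsd_Paul_is_dead.php",
    "/UserServer/zsd_.php",                        # ".+" needs at least one character
    "/UserServer/index.php",                       # no "zsd_" prefix
    "/other/UserServer/zsd_print_extensions.php",  # "^" anchors at the start
]

for uri in samples:
    print(uri, would_exclude(uri))
```

The first three samples match and the last three do not.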

Be careful constructing your regex pattern. Try not to make it too generic, so that you do not accidentally exclude entries you might want to see. If regular expressions are new to you, consider using a regex tester and try putting in some strings to see what matches. You can also look at a regex tutorial to learn how to make the regex pattern.

The last argument of the SetEnvIf directive specifies the environment variable to create if the test returns true, and optionally sets a value in the environment variable.  In this example, I did not really need to set log_exclude to true.  The “=true” part of the expression could be left off.  It just seems to make the argument in the CustomLog directive a little more obvious to a PHP programmer.

To modify the CustomLog directive, we are just adding the env argument, “env=!log_exclude”.  In this case, the environment variable will only exist if we want to skip logging the request, so the CustomLog will NOT record the request if the environment variable exists.  The exclamation mark means “not” in this expression, just as it would in PHP.  The difference is that in this case, “not” means “does not exist”, rather than just boolean false.
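The way the two directives work together can be modelled in a few lines of Python.  This is only a sketch of the logic described above, not how Apache is implemented; the function names and the dictionary standing in for the request environment are assumptions made for the illustration.

```python
import re

EXCLUDE = re.compile(r"^/UserServer/zsd_.+\.php$")

def set_env_if(request_uri, env):
    """Mimic SetEnvIf: create log_exclude only when the URI matches."""
    if EXCLUDE.search(request_uri):
        env["log_exclude"] = "true"

def should_log(env):
    """Mimic env=!log_exclude: log only when the variable does not exist."""
    return "log_exclude" not in env

def handle(request_uri):
    """Run one request through both steps and report whether it is logged."""
    env = {}
    set_env_if(request_uri, env)
    return should_log(env)

print(handle("/UserServer/zsd_is_webserver_alive.php"))  # False
print(handle("/index.php"))                              # True

# The test is for existence, not value: even "false" suppresses the entry.
print(should_log({"log_exclude": "false"}))              # False
```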

After making the changes to the httpd.conf file, stop and start Apache for the change to take effect.

You can have more than one SetEnvIf directive in your httpd.conf.  For example, if you really want to see those requests for “/UserServer/zsd_Paul_is_dead.php”, you can test for each of the two files specifically:

SetEnvIf Request_URI "^/UserServer/zsd_print_extensions\.php$" log_exclude=true

SetEnvIf Request_URI "^/UserServer/zsd_is_webserver_alive\.php$" log_exclude=true
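Here is a small Python sketch of the two exact-match patterns, written with the dot before “php” escaped as in the first example.  It also shows why the escape matters: an unescaped dot matches any single character.  As before, this assumes Python's re module behaves like the server's regex engine for patterns this simple, and the names are made up for the illustration.

```python
import re

# One pattern per file, with the literal dot escaped.
EXACT_PATTERNS = [
    re.compile(r"^/UserServer/zsd_print_extensions\.php$"),
    re.compile(r"^/UserServer/zsd_is_webserver_alive\.php$"),
]

def is_excluded(request_uri):
    """Return True if any of the exact patterns matches the URI."""
    return any(p.search(request_uri) for p in EXACT_PATTERNS)

print(is_excluded("/UserServer/zsd_print_extensions.php"))  # True
print(is_excluded("/UserServer/zsd_Paul_is_dead.php"))      # False

# An unescaped dot is a wildcard, so it accepts "X" where "." was intended.
unescaped = re.compile(r"^/UserServer/zsd_print_extensions.php$")
print(bool(unescaped.search("/UserServer/zsd_print_extensionsXphp")))  # True
print(is_excluded("/UserServer/zsd_print_extensionsXphp"))             # False
```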

You could also add more SetEnvIf directives to check for more requests you would like to exclude.  Here are some reference links to help you learn more.

SetEnvIf 7.1 documentation in IBM i Knowledge Center

CustomLog 7.1 documentation in IBM i Knowledge Center

Regular-Expressions.info – An excellent guide to regular expressions.  Keep in mind that IBM HTTP Server has an older regex engine, so keep it simple.  In particular, avoid back references.  They are not supported.

regexpal – A free online regex tester.

 
