Made of Everything You're Not

No, not the flute playing Eric Lamb; the guitar playing, PHP programmer Eric Lamb. The better Eric Lamb.
  • Home
  • Projects
  • Portfolio
  • Resume
« Get Rid of HTML Templates With HAML
Introducing WP-hResume »

Parse Apache Log Files With PHP

Parsing the log files generated by Apache is one of those random tasks with a random occurrence in my world. This is a task that, until recently, hadn’t come up enough to warrant any sort of a ready solution (and it was just fun enough to be ok to write a custom solution). So every time this came up I would always fire up Google and go on a scavenger hunt for a starter script written in php.

Parse Apache Log Files With PHP

Parse Apache Log Files With PHP

This always felt like a good idea at the time the need came up. These days, for some ungodly reason, parsing Apache logs seems to come up a little too frequently to keep this up. In the spirit of making my life a hell of a lot easier for tomorrow I’ve taken a shot at writing an Apache log parser written in PHP.

One thing I decided to implement is a filtering system so you can filter out based on a provided regex. Might not be too useful to everyone but it should be trivial to remove the functionality.

Anyway, I hope someone finds this useful (even to learn from and, of course, use)

Here’s the main class:

<?php
/**
 * Apache Log Parser
 * Parses an Apache log file and runs the strings through filters to find what you're looking for.
 * @author Eric Lamb
 *
 */
class apache_log_parser
{
	/**
	 * The path to the log file
	 * @var string
	 */
	private $file = FALSE;
 
	/**
	 * What filters to apply. Should be in the format of array('KEY_TO_SEARCH' => array('regex' => 'YOUR_REGEX'))
	 * @var array
	 */
	public $filters = FALSE;
 
	/**
	 * Duh.
	 * @param string $file
	 * @return void
	 */
	public function __construct($file)
	{
		if(!is_readable($file))
		{
			return 	FALSE;
		}
 
		$this->file = $file;
	}
 
	/**
	 * Executes the supplied filter to the string
	 * @param $filer
	 * @param $status
	 * @return string
	 */
	private function applyFilters($str)
	{
		if(!$this->filters || !is_array($this->filters))
		{
			return $str;
		}
 
		foreach($this->filters AS $area => $filter)
		{
			if(preg_match($filter['regex'], $str[$area], $matches, PREG_OFFSET_CAPTURE))
			{
				return $str;
			}
		}
	}
 
	/**
	 * Returns an array of all the filtered lines 
	 * @param $limit
	 * @return array
	 */
	public function getData($limit = FALSE)
	{
		$handle = fopen($this->file, 'rb');
		if ($handle) {
			$count = 1;
			$lines = array();
		    while (!feof($handle)) {
		        $buffer = fgets($handle);
		        $data = $this->applyFilters($this->format_line($buffer));
		        if($data)
		        {
		        	$lines[] = $data;
		        }
 
		        if($limit && $count == $limit)
		        {
		        	break;
		        }
		        $count++;
		    }
		    fclose($handle);
		    return $lines;
		}		
	}
 
	/**
	 * Regex to parse the log file line
	 * @param string $line
	 * @return array
	 */
	function format_log_line($line)
	{
		preg_match("/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(\S+) (.*?) (\S+)\" (\S+) (\S+) (\".*?\") (\".*?\")$/", $line, $matches); // pattern to format the line
		return $matches;
	}
 
	/**
	 * Takes the format_log_line array and makes it usable to us stupid humans
	 * @param $line
	 * @return array
	 */
	function format_line($line)
	{
		$logs = $this->format_log_line($line); // format the line
 
		if (isset($logs[0])) // check that it formated OK
		{
			$formated_log = array(); // make an array to store the lin info in
			$formated_log['ip'] = $logs[1];
			$formated_log['identity'] = $logs[2];
			$formated_log['user'] = $logs[2];
			$formated_log['date'] = $logs[4];
			$formated_log['time'] = $logs[5];
			$formated_log['timezone'] = $logs[6];
			$formated_log['method'] = $logs[7];
			$formated_log['path'] = $logs[8];
			$formated_log['protocal'] = $logs[9];
			$formated_log['status'] = $logs[10];
			$formated_log['bytes'] = $logs[11];
			$formated_log['referer'] = $logs[12];
			$formated_log['agent'] = $logs[13];
			return $formated_log; // return the array of info
		}
		else
		{
			$this->badRows++; // if the row is not in the right format add it to the bad rows
			return false;
		}
	}
}
?>

And here’s an example of how to use it:

<?php
$data = new apache_log_parser($d->path.'/'.$entry); // Create an apache log parser
$data->filters = array(
	'path' => array('regex' => '/^.*\.(FLV|flv)$/') //pull only flv files
);
 
$data = $data->getData();
?>

A couple things to note about this script though:

1. The regex and parsing was pretty stolen from the Apache Log Parser on PHPClasses.org.
2. Without filters the script is pretty memory intensive. My needs don’t require anything client facing but heed my adivice; Don’t use this on a public web server.

Bookmark and Share

Related Posts

Stand Alone ExpressionEngine Authentication
Importing Legacy Users Into ExpressionEngine
CartThrob 2.0 Beta Fun
ExpressionEngine and the Mystery of M00o93H7pQ09L8X1t49cHY01Z5j4TT91fGfr
Custom Routes With Zend Framework

Tags: apache, php

This entry was written by Eric Lamb and posted on Saturday, January 9th, 2010 at 12:35 am and is filed under IT, Programming. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

6 Comments

  1. abcphp.com says:
    January 9, 2010 at 2:13 am

    Parse Apache Log Files With PHP | Made of Everything You’re Not | Eric Lamb…

    Parsing the log files generated by Apache is one of those random tasks with a random occurrence in my world. This is a task that, until recently, hadn’t come up | Eric Lamb…

    Reply
  2. Daniel says:
    January 11, 2011 at 11:00 am

    Nice code!

    You have a copy and paste typo on the line:
    $formated_log['user'] = $logs[2];

    (should be 3, not 2)

    Cheers,

    DM

    Reply
  3. matsu says:
    July 22, 2011 at 8:24 pm

    Thank you for your work.
    I was searching PHP apache log parser.

    Reply
  4. Pete says:
    August 26, 2011 at 10:10 am

    Seems like there’s a bug in this version that wasn’t in the original class:

    Fatal error: Call to undefined method apache_log_parser::getData() in /wwwroot/example.php on line 48

    include ‘apache-log-parser.php’;
    $log = ‘Jun-2011.log’;

    $parser = new apache_log_parser($log); // Create an apache log parser
    if($parser == false)
    {
    echo “Unable to create parser for: $log”;
    exit;
    }

    $parser->filters = array(
    ‘path’ => array(‘regex’ => ‘/^.*\.(PHP|php)$/’) //pull only php files
    );

    $log_data = $parser->getData();
    echo “

    ";
    print_r($log_data); // print out the array
    echo "

    “;

    Reply
  5. DJ says:
    October 17, 2011 at 7:18 am

    You sir, are a wonderful human being. Saved me my morning. Great code.

    Reply
  6. yauri says:
    November 30, 2011 at 5:47 am

    Thanks for the class :)
    I enhance the code with jQuery Plugin

    Reply

Leave a Reply

Click here to cancel reply.

  • Subscribe: Entries | Comments
  • About Me

    Email Email
    Twitter Twitter
    310.739.3322
  • Categories

    • Brain Dump
    • Business
    • Code
    • IT
    • Programming
    • Rant
    • Servers
  • Archives

    • October 2011
    • August 2011
    • July 2011
    • June 2011
    • May 2011
    • April 2011
    • March 2011
    • February 2011
    • January 2011
    • December 2010
    • November 2010
    • October 2010
    • September 2010
    • August 2010
    • July 2010
    • June 2010
    • May 2010
    • April 2010
    • March 2010
    • February 2010
    • January 2010
    • December 2009
    • November 2009
    • October 2009
    • September 2009
    • August 2009
    • July 2009
    • June 2009
    • May 2009
    • April 2009
    • March 2009
    • February 2009
    • January 2009
    • December 2008
    • November 2008
    • October 2008

Copyright © 2008 - 2012 Eric Lamb - All rights reserved