Parsing web logs with grep

If your Web server writes its logs in the standard format, with each hit recorded as a single line of text, you can easily use grep to extract useful data about your recent traffic. For example, I run the BeOS Tip Server as an adjunct to Birdhouse Arts. Since both sites are on the same server (actually, they haven't been for a long time, since this site now runs on BeOS, but you get the idea), they share a common traffic log, but I like to follow the number of hits on just the Tip Server each week. Hits in the log look like this (each hit is one long line):
www.birdhouse.org 209.211.101.75 - - [18/Jul/1998:03:14:44 -0700] "GET /beos/tips/archive/tip103.html HTTP/1.0" 304 -
www.dnai.com 38.232.165.5 - - [08/Jul/1998:13:10:57 -0700] "GET /~waxwing/dreams/confirm.htm HTTP/1.0" 200 844
The distinguishing characteristic of the Tip Server hits is that all of them include the string "beos/tips". To sort out only those hits from a log file called waxwing.log, I use this grep command (the forward slash has no special meaning to grep, so it doesn't need to be escaped):

grep beos/tips waxwing.log 

This spits back a list of Tip Server hits, and now all I have to do is count the number of lines in the command's standard output. There are a few ways to get this number, but the easiest is to pipe the output through the wc command, using its -l flag to count lines. Thus, typing:

grep beos/tips waxwing.log | wc -l 

yields "3754" or some other number. Since I get a fresh web log every week, this tells me how many hits the Tip Server got that week. If you have many such strings to check on regularly, put a series of greps in a script and send the results to a report file.
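
Here's a minimal sketch of such a script, assuming a Bourne-style shell. The log name waxwing.log and the two patterns come from the examples above; the report file name hits-report.txt is just an illustration:

#!/bin/sh
# Weekly traffic report: count hits on each area of the site
# and append the totals to a report file.
# The report name hits-report.txt is only an example.
LOG=waxwing.log
REPORT=hits-report.txt

echo "Hits for week of $(date)" >> "$REPORT"
for pattern in beos/tips waxwing/dreams
do
    echo "$pattern: $(grep -c "$pattern" "$LOG") hits" >> "$REPORT"
done

Note that grep's -c flag prints the count of matching lines itself, so inside a script there's no need to pipe through wc -l.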
