90.2 Web Site Analyser

20210707 Use webalizer from the command line to analyse the Apache2 access logs.

wajig install webalizer

After installing webalizer, use mkdir to create a folder in which the HTML results will be saved, then cd (change directory) into that folder.

mkdir webalizer
cd webalizer

Next, collect all of the Apache logs you can access into the file tmp.log, using zcat for the compressed files and cat for the plain text files:

zcat /var/log/apache2/access*.gz > tmp.log
cat /var/log/apache2/access.log.1 /var/log/apache2/access.log >> tmp.log
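With GNU gzip, zcat -f passes input that is not gzip compressed through unchanged, behaving like cat, so the two commands above can be folded into one. This is a minimal sketch, assuming the Debian log naming (access.log, access.log.1, access.log.2.gz, ...); collect_logs is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: gather every Apache access log, compressed or not,
# into one file. Assumes Debian-style names under the log directory.
collect_logs() {
  logdir="${1:-/var/log/apache2}"   # directory holding the access logs
  out="${2:-tmp.log}"               # combined output file
  # zcat -f decompresses the .gz files and copies plain text files unchanged.
  zcat -f "$logdir"/access* > "$out"
}
```

For example, collect_logs /var/log/apache2 tmp.log gathers the same records as the two commands above. The files may be concatenated in a different order, which does not matter because the next step sorts the records by date.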

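We sort the lines of tmp.log by date, saving the result to access.log, as webalizer expects the log records in chronological order. The command line is adapted from https://stackoverflow.com/questions/5672733/how-can-i-sort-an-apache-log-file-by-date. Each -k option picks a character range within the fourth field (the [dd/Mon/yyyy:hh:mm:ss part of each record), ordering the records by year, month, day, hour, minute and second. We use -s (stable) rather than -u: when sort keys are given, -u treats lines whose keys are equal as duplicates and would keep only one request per second. Setting LC_ALL=C ensures the M (month) ordering recognises the English month abbreviations whatever the local language setting.

LC_ALL=C sort -s -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n tmp.log > access.log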
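To see what the date sort does, here is a small self-contained check on three made-up log records (the sample records and the sort_log helper name are hypothetical, for illustration only). It shows that records are ordered by year before month, and that two requests logged in the same second both survive a stable sort, while adding -u with the same keys drops one of them.

```shell
#!/bin/sh
# Hypothetical helper wrapping the date sort used in this section.
sort_log() {
  # Keys within field 4 ("[dd/Mon/yyyy:hh:mm:ss"): year, month, day, hh, mm, ss.
  # LC_ALL=C makes the M key recognise English month abbreviations.
  LC_ALL=C sort -s -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
    -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n "$@"
}

# Invented sample records in Apache common log style.
sample=$(mktemp)
printf '%s\n' \
  '10.0.0.1 - - [02/Jan/2021:09:00:00 +1000] "GET /b HTTP/1.1" 200 10' \
  '10.0.0.2 - - [31/Dec/2020:23:59:59 +1000] "GET /a HTTP/1.1" 200 10' \
  '10.0.0.3 - - [02/Jan/2021:09:00:00 +1000] "GET /c HTTP/1.1" 200 10' \
  > "$sample"

# Stable sort: /a (Dec 2020) first, then /b and /c in their input order.
sort_log "$sample"

# Same keys with -u: /b and /c compare equal on every key, so only one is kept.
LC_ALL=C sort -u -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
  -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n "$sample"
```

The first command prints all three records; the second prints only two. The temporary sample file can be removed afterwards with rm.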

Now run webalizer over the sorted access.log, using -o to write the output to the current working directory (.):

webalizer -o . access.log

Finally, open the generated report, index.html, in a browser:

brave-browser ./index.html

Hosted by Togaware. Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.