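Analog is a powerful open-source web access log analyzer written in C. It supports multiple languages (including Chinese), runs on Linux and Windows, and reads logs from mainstream web servers such as Apache, nginx and IIS. It is very fast: it can process 20 million log entries in under 10 minutes. Its statistics centre on page views (PV). Compared with AWStats and Webalizer its report pages are rather plain; for prettier charts, Report Magic 2.21 can be used.
The latest version is analog-6.0; the author has not updated it since 19-Dec-04. Demo address
Installation is simple. Download the appropriate version from http://www.analog.cx/download.html. The source release is used as the example here: unpack the downloaded source package into the installation directory, then enter that directory and run make.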
wget http://www.analog.cx/analog-6.0.tar.gz
tar zxvf analog-6.0.tar.gz
cp -ar analog-6.0 /usr/local/
cd /usr/local/analog-6.0
make
ln -s /usr/local/analog-6.0 /usr/local/analog
mkdir /opt/htdocs/www/analog
chown www:website /opt/htdocs/www/analog
cp -r images /opt/htdocs/www/analog/
mkdir conf
cp analog.cfg conf/c1g.cfg
Configuration
vi conf/c1g.cfg
# Set the report language to Simplified Chinese
LANGUAGE SIMP-CHINESE
# nginx log format
LOGFORMAT (%s - %j [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b "%f" "%B"\n)
# Log files
LOGFILE /opt/log/%Y.%M/*/*c1gstudio.com.log.gz
# Output file
OUTFILE /opt/htdocs/www/analog/c1gstudio%Y.%M/index.html
# Host name
HOSTNAME "c1gstudio.com"
# Host URL
HOSTURL http://www.c1gstudio.com/
# Web image directory
IMAGEDIR ../images/
# List only URLs with at least 1000 page requests (a top-200 list would be REQFLOOR -200r)
REQFLOOR 1000p
# Count every forum.php request as a single file
FILEALIAS /forum.php* /forum.php
# Report on subdirectories
SUBDIR */*
LOGFORMAT reference
%S host (the client hostname, or address of the computer making the request)
%s numerical IP address of client (if recorded in a separate field; used when %S is empty)
%r file requested
%q query string (part of filename after ?, if recorded in a separate field)
%B browser
%A browser with +'s instead of spaces
%f referrer
%u user (tip: a cookie or session id can usefully be defined as %u too)
%v virtual host (the server hostname, also called the virtual domain)
%d day of the month
%m month in digits
%M month, three letter English abbreviation
%y year, last two digits
%Y year, four digits
%Z year, two or four digits (less efficient)
%h hour of the day
%n minute of the hour
%a a or A for am, or p or P for pm, if %h is in the 12-hour clock. (So to match "am" you need %am and to match "AM" you need %aM)
%U "Unix time" (seconds since beginning of 1970, GMT). If it includes decimals, use %U.%j
%b number of bytes transferred
%t processing time in seconds
%T processing time in milliseconds
%D processing time in microseconds
%c HTTP status code
%C code words used instead of HTTP status code in some servers; only used internally
%j junk: ignore this field (field can be empty too)
%w white space: spaces or tabs
%W optional white space
%% % sign
\n new line
\t tab stop
\\ single backslash
My nginx log format
'$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" $http_x_forwarded_for';
183.62.5.13 - - [06/Aug/2014:17:16:44 +0800] "GET /aboutc1g.html HTTP/1.1" 200 6642 "http://www.c1gstudio.com/web/hello.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 183.62.5.13
My format has an extra $http_x_forwarded_for field at the end, so the LOGFORMAT also needs a trailing %j to discard it; without that, Analog does not cope with the "-" that appears there.
LOGFORMAT (%s - %j [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b "%f" "%B" %j\n)
Further reading
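To see how the LOGFORMAT tokens line up with a real entry, the sample log line above can be taken apart with a hand-written regular expression. This is only a rough sketch: the regex and the parse_field helper are illustrative assumptions, not something Analog provides, and Analog's own matching may be stricter or looser than this.

```shell
#!/bin/sh
# Sample entry from the nginx log format above (plain ASCII quotes and hyphens).
line='183.62.5.13 - - [06/Aug/2014:17:16:44 +0800] "GET /aboutc1g.html HTTP/1.1" 200 6642 "http://www.c1gstudio.com/web/hello.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 183.62.5.13'

# Regex mirroring: %s - %j [%d/%M/%Y:%h:%n:%j %j] "%j %r %j" %c %b "%f" "%B" %j
# Capture groups: 1=%s 2=%d 3=%M 4=%Y 5=%r 6=%c 7=%b 8=%f 9=%B
# Everything Analog would treat as %j (junk) is matched but not captured.
re='^([^ ]+) - [^ ]* \[([0-9]+)/([A-Za-z]+)/([0-9]+):[0-9]+:[0-9]+:[0-9]+ [^]]*] "[^ ]* ([^ ]*) [^"]*" ([0-9]+) ([0-9]+|-) "([^"]*)" "([^"]*)" .*$'

# parse_field N: print capture group N (1-9), or nothing if the line does not match.
parse_field() {
    printf '%s\n' "$line" | sed -n -E "s~${re}~\\${1}~p"
}

echo "client  (%s): $(parse_field 1)"
echo "date    (%d/%M/%Y): $(parse_field 2)/$(parse_field 3)/$(parse_field 4)"
echo "request (%r): $(parse_field 5)"
echo "status  (%c): $(parse_field 6)"
echo "bytes   (%b): $(parse_field 7)"
echo "referer (%f): $(parse_field 8)"
```

If parse_field prints nothing, the line does not fit the pattern, which is roughly the situation in which Analog reports a log format error.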
LOGFILE and OUTFILE notes
LOGFILE new1.log,old*.log
LOGFILE /opt/log/%Y.%M/%D/*.c1gstudio.com.log.gz
Wildcards, date variables and gz compression are supported. OUTFILE does not create directories automatically.
%D date of month
%m month name, in English
%M month number
%y two-digit year
%Y four-digit year
%H hour
%n minute
%w day of week, in English
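However, the date variables do not support arithmetic, which is a little inconvenient; that has to be handled outside Analog with a shell script.
Further reading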
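As a rough illustration of doing the date arithmetic in the shell, the sketch below works out the previous month in the same YYYY.MM form as the log directories and creates the report directory that OUTFILE will not create by itself. The helper names prev_month and ensure_outdir are made up for this example, and a temporary directory stands in for the real web root.

```shell
#!/bin/sh
# prev_month YEAR MONTH: print the previous month as YYYY.MM
prev_month() {
    y=$1
    m=${2#0}              # strip a leading zero so "08" is not read as octal
    m=$((m - 1))
    if [ "$m" -eq 0 ]; then
        m=12
        y=$((y - 1))
    fi
    printf '%04d.%02d\n' "$y" "$m"
}

# ensure_outdir FILE: create the directory that will hold FILE if it is missing
ensure_outdir() {
    mkdir -p "$(dirname "$1")"
}

month=$(prev_month "$(date +%Y)" "$(date +%m)")
web_dir=$(mktemp -d)      # stand-in for /opt/htdocs/www/analog
outfile="${web_dir}/c1gstudio${month}/index.html"
ensure_outdir "$outfile"

# Lines that could then be written into the Analog configuration
echo "LOGFILE /opt/log/${month}/*/*.c1gstudio.com.log.gz"
echo "OUTFILE ${outfile}"
```

Doing the arithmetic on plain integers avoids depending on any particular date implementation, at the cost of handling the year rollover by hand.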
==================================
Updated 2014-8-26
The arguments to LOGFILE and CACHEFILE commands are checked for containing only certain allowed characters (specifically, letters, digits, /\.:_*? space, and - between two {letter, digit, underscore}'s). This is because they could match an UNCOMPRESS command and thus be passed to the shell when the uncompress command is popen()'ed.
A month can be split into three parts to reduce the load:
LOGFILE /opt/log/%Y.%M/[2-3]?/*.c1gstudio.com.log.gz
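Analog reads the logs into memory while it runs, so for speed it is best to have more memory than the total size of the logs. CACHEOUTFILE and CACHEFILE take up a lot of disk space and did not seem useful to me.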
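A quick way to check that a three-way split covers every day directory exactly once is to run the day numbers through the shell's own pattern matching. Only the [2-3]? pattern appears above; the 0? and 1? patterns for the other two parts are my assumption, as is the idea that Analog expands these wildcards the same way the shell does. The month_part helper is hypothetical.

```shell
#!/bin/sh
# month_part DD: print which of the three directory patterns matches day DD
month_part() {
    case $1 in
        0?)     echo '0?' ;;       # days 01-09
        1?)     echo '1?' ;;       # days 10-19
        [2-3]?) echo '[2-3]?' ;;   # days 20-31
        *)      echo 'none' ;;
    esac
}

for day in 01 09 10 19 20 31; do
    printf '%s -> /opt/log/%%Y.%%M/%s/\n' "$day" "$(month_part "$day")"
done
```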
==================================
Report on/off switches in the configuration file
MONTHLY ON # one line for each month
WEEKLY ON # one line for each week
DAILYREP ON # one line for each day
DAILYSUM ON # one line for each day of the week
HOURLYREP ON # one line for each hour of the day
GENERAL ON # the General Summary at the top
REQUEST ON # which files were requested
FAILURE ON # which files were not found
DIRECTORY ON # Directory Report
HOST ON # which computers requested files
ORGANISATION ON # which organisations they were from
DOMAIN ON # which countries they were in
REFERRER ON # where people followed links from
FAILREF ON # where people followed broken links from
SEARCHQUERY ON # the phrases and words they used…
SEARCHWORD ON # …to find you from search engines
BROWSERSUM ON # which browser types people were using
OSREP ON # and which operating systems
FILETYPE ON # types of file requested
SIZE ON # sizes of files requested
STATUS ON # number of each type of success and failure
Command-line options
x GENERAL General Summary
1 YEARLY Yearly Report
Q QUARTERLY Quarterly Report
m MONTHLY Monthly Report
W WEEKLY Weekly Report
D DAILYREP Daily Report
d DAILYSUM Daily Summary
H HOURLYREP Hourly Report
h HOURLYSUM Hourly Summary
w WEEKHOUR Hour of the Week Summary
4 QUARTERREP Quarter-Hour Report
6 QUARTERSUM Quarter-Hour Summary
5 FIVEREP Five-Minute Report
7 FIVESUM Five-Minute Summary
S HOST Host Report
l REDIRHOST Host Redirection Report
L FAILHOST Host Failure Report
Z ORGANISATION Organisation Report
o DOMAIN Domain Report
r REQUEST Request Report
i DIRECTORY Directory Report
t FILETYPE File Type Report
z SIZE File Size Report
P PROCTIME Processing Time Report
E REDIR Redirection Report
I FAILURE Failure Report
f REFERRER Referrer Report
s REFSITE Referring Site Report
N SEARCHQUERY Search Query Report
n SEARCHWORD Search Word Report
Y INTSEARCHQUERY Internal Search Query Report
y INTSEARCHWORD Internal Search Word Report
k REDIRREF Redirected Referrer Report
K FAILREF Failed Referrer Report
B BROWSERREP Browser Report
b BROWSERSUM Browser Summary
p OSREP Operating System Report
v VHOST Virtual Host Report
R REDIRVHOST Virtual Host Redirection Report
M FAILVHOST Virtual Host Failure Report
u USER User Report
j REDIRUSER User Redirection Report
J FAILUSER User Failure Report
c STATUS Status Code Report
# +a can be used to include all reports
Further reading
# Print the current configuration
analog -settings > file
# Set LOGFILE and OUTFILE on the command line
./analog +O/opt/htdocs/www/analog/c1gstudio2014.html /opt/log/2014.08/02/*.c1gstudio.com.log.gz
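When I ran it this way it kept reporting log format errors and produced no report.
# The options I use
/usr/local/analog/analog -G +g/usr/local/analog/conf/c1g.cfg +b +s +S -n -o -Z -r
+b Browser Summary on
-n Search Word Report off
+s Referring Site Report on
-o Domain Report off
-Z Organisation Report off
+S Host Report on
-r Request Report off
-G do not read the default analog.cfg
+g read the given configuration file
Here the daily reports are produced with awstats and the monthly reports with analog, with one monthly report aggregated per domain.
Logs are stored by day, for example under /opt/log/2014.08/07/: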
www.c1gstudio.com.log.gz
blog.c1gstudio.com.log.gz
www.c1g.com.log.gz
Analog runs each day after awstats has finished.
crontab
10 5 * * * /bin/sh /opt/shell/analog.sh > /dev/null 2>&1
vi /opt/shell/analog.sh
#!/bin/sh
ana_dir=/usr/local/analog/
web_dir=/opt/htdocs/www/analog/
conf_dir="${ana_dir}conf/"
today=`date +%d`
yesterday=`date +%Y%m%d -d '1 day ago'`
# Use yesterday's month, so the run on the 1st still completes the previous month
lastday_month=`date +%Y.%m -d '1 day ago'`
lastday_day=`date +%d -d '1 day ago'`
c1g_LOGFILE="/opt/log/${lastday_month}/*/*c1gstudio.com.log.gz"
c1g_OUTFILE="${web_dir}c1gstudio${lastday_month}/index.html"
# Leading * so that www.c1g.com.log.gz is matched
POST_LOGFILE="/opt/log/${lastday_month}/*/*c1g.com.log.gz"
POST_OUTFILE="${web_dir}c1g${lastday_month}/index.html"
#if [ "$today" = "02" ]; then
# analog does not create the OUTFILE directory, so create it here
if [ ! -d "$(dirname "${c1g_OUTFILE}")" ]; then
    mkdir -p "$(dirname "${c1g_OUTFILE}")"
    chown www:website "$(dirname "${c1g_OUTFILE}")"
fi
if [ ! -d "$(dirname "${POST_OUTFILE}")" ]; then
    mkdir -p "$(dirname "${POST_OUTFILE}")"
    chown www:website "$(dirname "${POST_OUTFILE}")"
fi
# Anchor at line start so that CACHEOUTFILE and similar lines are left alone
sed -i "s;^LOGFILE .*;LOGFILE ${c1g_LOGFILE};" "${conf_dir}c1gstudio.cfg"
sed -i "s;^OUTFILE .*;OUTFILE ${c1g_OUTFILE};" "${conf_dir}c1gstudio.cfg"
sed -i "s;^LOGFILE .*;LOGFILE ${POST_LOGFILE};" "${conf_dir}c1g.cfg"
sed -i "s;^OUTFILE .*;OUTFILE ${POST_OUTFILE};" "${conf_dir}c1g.cfg"
#fi
${ana_dir}analog -G +g${conf_dir}c1gstudio.cfg +b +D -d +s +S -n -o -Z -r
${ana_dir}analog -G +g${conf_dir}c1g.cfg +b +D -d +s +S -n -o -Z +r
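The sed rewrite that the script depends on can be tried safely on a throwaway configuration before pointing it at the real files. The sketch below is an assumption-laden example: the rewrite_cfg helper and the sample configuration text are made up for illustration, and the substitutions are anchored at the start of the line so that a CACHEOUTFILE line is not rewritten by the OUTFILE rule.

```shell
#!/bin/sh
# rewrite_cfg NEW_LOGFILE NEW_OUTFILE: read a config on stdin, print it with
# the LOGFILE and OUTFILE lines replaced. The values must not contain ';' or '&',
# because they are pasted into the sed replacement text.
rewrite_cfg() {
    sed -e "s;^LOGFILE .*;LOGFILE $1;" \
        -e "s;^OUTFILE .*;OUTFILE $2;"
}

# Made-up sample configuration, only for demonstration
sample='LANGUAGE SIMP-CHINESE
LOGFILE /opt/log/old/*.log.gz
OUTFILE /opt/htdocs/www/analog/old/index.html
CACHEOUTFILE none'

printf '%s\n' "$sample" | rewrite_cfg \
    '/opt/log/2014.08/*/*c1gstudio.com.log.gz' \
    '/opt/htdocs/www/analog/c1gstudio2014.08/index.html'
```

Without the ^ anchor, the pattern OUTFILE.* would also match inside CACHEOUTFILE and overwrite that line's value.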