

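Practice: sorting Apache access logs

Notes from a practice exercise.
Suppose the logs from several web servers have been merged into one file and need to be re-sorted by date.

Sample: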

127.0.0.1 - - [01/Dec/2005:14:00:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:01:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:05:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:04:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:15:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
207.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
227.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
217.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2004:14:12:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Feb/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2004:14:22:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32

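Initial idea:
I did not know that sort could sort on parts of a field,
so I first rewrote the timestamp into a regular form, used awk to copy the key to the start of the line, and sorted with sort,
then stripped the sort key from the start of the line and restored the original format.
Drawbacks:
It cannot sort by year or month, it is slow, and the code is clumsy.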

cat http.log |sed -e 's#\[#:#' -e 's#/#:#' -e 's#/#:#' -e 's# +800]#:#'|awk -F: '{print $2$5$6$7"|"$0}'|sort -n|cut -d'|' -f2|sed -e 's/:/[/' -e 's#:#/#' -e 's#:#/#' -e 's/: "/ +800] "/'
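For comparison, here is a minimal sketch of the same decorate, sort, undecorate idea that also covers year and month. It is not the command above; the function name sort_by_timestamp is made up for illustration, and it assumes the timestamp is always the fourth whitespace-separated field with an English month abbreviation. The -s (stable) flag is available in GNU and BSD sort but is not required by POSIX.

```shell
# Hypothetical helper: reads access-log lines on stdin and writes them
# in chronological order on stdout.
sort_by_timestamp() {
    awk '
    BEGIN {
        # Map English month abbreviations to two-digit numbers.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
    }
    {
        # Field 4 looks like "[01/Dec/2005:14:00:55"; drop the "[" and split
        # on "/" and ":" into day, month, year, hour, minute, second.
        split(substr($4, 2), t, "[/:]")
        # Prefix a fixed-width YYYYMMDDhhmmss key, then "|", then the line.
        print t[3] month[t[2]] t[1] t[4] t[5] t[6] "|" $0
    }' |
    LC_ALL=C sort -s -t'|' -k1,1 |   # compare the key only; keep ties in input order
    cut -d'|' -f2-                   # remove the key, keep the original line intact
}
```

It could be used as `sort_by_timestamp < http.log`. Because the original line is carried along unchanged behind the key, nothing has to be rewritten back into its original format afterwards.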

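The simplest method:
Split on spaces and sort on the fourth column.
Drawbacks:
The month names in the log are English abbreviations, so the order may be wrong once the log spans more than one month, but it is very fast.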

cat http.log |sort -t" " -k4
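A small sketch of that weakness, using three made-up log lines that are already in chronological order. The function name demo_plain_sort is hypothetical, and LC_ALL=C is set only so the result does not depend on the local collation rules.

```shell
# Feed three chronologically ordered lines through the plain field-4 sort.
demo_plain_sort() {
    printf '%s\n' \
        'x - - [01/Dec/2004:00:00:00 +800] "GET / HTTP/1.1" 200 1' \
        'x - - [01/Jan/2005:00:00:00 +800] "GET / HTTP/1.1" 200 1' \
        'x - - [01/Feb/2005:00:00:00 +800] "GET / HTTP/1.1" 200 1' |
    LC_ALL=C sort -t' ' -k4
}

# Show only the month of each output line.
demo_plain_sort | cut -d/ -f2
# Prints Dec, Feb, Jan: "Feb" sorts before "Jan" as text,
# so February 2005 ends up ahead of January 2005.
```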

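Improved version:
Sort on several keys: year, month, day, and time.
Drawbacks:
The character positions are tedious to count and it is not fast either, but it does sort the dates correctly.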

export LC_ALL=POSIX
cat http.log |sort -t' ' -f -i -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14

# If LC_ALL is not set, sorting by month and then by day may produce the wrong order.
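To make the character offsets easier to check, the sketch below prints the part of field 4 that each key covers: 4.9,4.12 is the year, 4.5,4.7 the month, 4.2,4.3 the day, and 4.14 onward the time. The helper name show_sort_keys is made up for illustration; it only mirrors the offsets with awk's substr and does not sort anything.

```shell
# Print year, month, day and time exactly as the four sort keys see them.
show_sort_keys() {
    awk '{
        f = $4                       # e.g. "[01/Dec/2005:14:00:55"
        print substr(f, 9, 4),       # characters 9-12: year  (key 4.9,4.12n)
              substr(f, 5, 3),       # characters 5-7:  month (key 4.5,4.7M)
              substr(f, 2, 2),       # characters 2-3:  day   (key 4.2,4.3n)
              substr(f, 14)          # character 14 on: time  (key 4.14)
    }'
}

head -n 1 http.log 2>/dev/null | show_sort_keys
```

If the columns printed for a real log line are not a year, a month name, a day and a time, the log format differs from the sample and the key positions need adjusting.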

Output:

127.0.0.1 - - [01/Jan/2004:14:22:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2004:14:12:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Feb/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:00:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:01:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
207.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
217.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
227.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:04:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:05:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:15:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32

References
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
http://www.softpanorama.org/Tools/sort.shtml
http://www.phpman.info/index.php/info/sort
http://www.technow.com.hk/bash-shell-use-sort
http://www.chedong.com/tech/rotate_merge_log.html

Posted in shell, technology.
