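Apache access log sorting exercise

Notes from an exercise: suppose the logs from several web servers have been merged into one file and need to be re-sorted by date.

Sample: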

127.0.0.1 - - [01/Dec/2005:14:00:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:01:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:05:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:04:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:15:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
207.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
227.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
217.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2004:14:12:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Feb/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2004:14:22:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32

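Initial idea: I did not know that sort can sort on parts of a field, so I first rewrote the timestamp with a uniform delimiter, used awk to copy a sort key to the start of each line, sorted with sort, then stripped the leading key and restored the original timestamp format. Drawbacks: the key holds only the day and the time, so it cannot order by year or month; it is slow; and the code is clumsy.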

cat http.log | sed -e 's#\[#:#' -e 's#/#:#' -e 's#/#:#' -e 's# +800]#:#' | awk -F: '{print $2$5$6$7"|"$0}' | sort -n | cut -d'|' -f2- | sed -e 's/:/[/' -e 's#:#/#' -e 's#:#/#' -e 's/: "/ +800] "/'
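The same decorate, sort, strip idea can be extended so that the key also covers the year and the month. The following is only a sketch, not the original pipeline: the function name sort_by_timestamp is made up for illustration, it assumes the timestamp is the fourth whitespace-separated field as in the sample, it ignores the time zone offset, and it relies on the -s (stable) option that GNU and BSD sort provide.

```shell
# Hypothetical helper (not from the original post): reads access-log lines on
# stdin and writes them to stdout ordered by year, month, day, and time.
# Assumes the timestamp is the 4th whitespace-separated field, as in the sample.
sort_by_timestamp() {
    # 1) awk prepends a YYYYMMDDhhmmss key and a tab to every line
    # 2) sort orders on that key only; -s (a GNU/BSD extension) keeps the
    #    input order of lines whose timestamps are equal
    # 3) cut removes the key again
    awk '
    BEGIN {
        # Map English month abbreviations to two-digit numbers.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
    }
    {
        # $4 looks like [01/Dec/2005:14:00:55 ; drop the bracket, split on / and :
        split(substr($4, 2), t, "[/:]")
        printf "%s%s%s%s%s%s\t%s\n", t[3], month[t[2]], t[1], t[4], t[5], t[6], $0
    }' | LC_ALL=C sort -s -k1,1 | cut -f2-
}

# Small demonstration with three lines from the sample.
sort_by_timestamp <<'EOF'
127.0.0.1 - - [01/Feb/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2004:14:12:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
EOF
```

The demonstration prints the Dec/2004 line first, then Jan/2005, then Feb/2005. For a real file the call would be something like sort_by_timestamp < http.log > sorted.log.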

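Simplest method: split each line on spaces and sort on the fourth column. Drawback: the month in the log is an English abbreviation, so the order may be wrong once the log spans more than one month. It is very fast, though.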

cat http.log | sort -t' ' -k4
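To make the cross-month drawback concrete, here is a small sketch that runs the same plain sort over three lines taken from the sample and prints only the timestamp field. LC_ALL=C is set here as an assumption, so that the comparison is bytewise and the result does not depend on the local collation rules; the variable name result is only for illustration.

```shell
# Three lines from the sample: Feb 2005, Jan 2005, and Dec 2004.
result=$(printf '%s\n' \
  '127.0.0.1 - - [01/Feb/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32' \
  '127.0.0.1 - - [01/Jan/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32' \
  '127.0.0.1 - - [01/Dec/2004:14:12:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32' |
  LC_ALL=C sort -t' ' -k4 | cut -d' ' -f4)

# Show the timestamp field in the order the plain sort produced.
printf '%s\n' "$result"
# [01/Dec/2004:14:12:55
# [01/Feb/2005:14:02:55
# [01/Jan/2005:14:02:55
```

The field is compared as text, so "Feb" sorts before "Jan" and February 2005 ends up ahead of January 2005.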

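Improved version: sort on several keys, covering year, month, day, and time. Drawback: the character offsets inside the field are awkward to work out, and it is not especially fast, but it does sort the dates correctly.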

export LC_ALL=POSIX
cat http.log | sort -t' ' -f -i -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14

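# If LC_ALL is not set, sorting by month and then by day may come out in the wrong order, because sort -M recognizes month abbreviations according to the current locale.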
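The character offsets in the -k options can be checked by cutting the same positions out of one timestamp field. This is only a sketch for verifying the arithmetic; the variable names are made up for illustration, and the field value is the fourth space-separated field of the first sample line.

```shell
# Field 4 as sort sees it with -t' ': the opening bracket is character 1.
field='[01/Dec/2005:14:00:55'

year=$(printf '%s\n' "$field" | cut -c9-12)    # matches -k 4.9,4.12n
month=$(printf '%s\n' "$field" | cut -c5-7)    # matches -k 4.5,4.7M
day=$(printf '%s\n' "$field" | cut -c2-3)      # matches -k 4.2,4.3n
hms=$(printf '%s\n' "$field" | cut -c14-)      # start of -k 4.14

printf 'year=%s month=%s day=%s time=%s\n' "$year" "$month" "$day" "$hms"
# year=2005 month=Dec day=01 time=14:00:55
```

One difference to keep in mind: -k 4.14 has no end position, so in sort that last key runs from the hour to the end of the line, not just to the end of field 4.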

Output:

127.0.0.1 - - [01/Jan/2004:14:22:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2004:14:12:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Jan/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Feb/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:00:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:01:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
207.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
217.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
227.0.0.1 - - [01/Dec/2005:14:02:54 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:04:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:14:05:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32
127.0.0.1 - - [01/Dec/2005:15:02:55 +800] "GET /test/testregx.php HTTP/1.1" 200 32

References:
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
http://www.softpanorama.org/Tools/sort.shtml
http://www.phpman.info/index.php/info/sort
http://www.technow.com.hk/bash-shell-use-sort
http://www.chedong.com/tech/rotate_merge_log.html

