10. Logging: ELK
What to collect
In day-to-day Kubernetes operations, the logs that usually need to be collected fall into the following categories:
Server system logs:
/var/log/messages
/var/log/kube-xxx.log
Kubernetes component logs:
kube-apiserver logs
kube-controller-manager logs
kube-scheduler logs
kubelet logs
kube-proxy logs
Application logs
Cloud-native: console (stdout/stderr) logs
Non-cloud-native: log files inside the container
Gateway logs (e.g. ingress-nginx)
Service-to-service call-chain logs
Log collection tooling
Logging stacks generally fall into ELK, EFK, and Grafana + Loki:
ELK is made up of Elasticsearch, Logstash and Kibana.
EFK is made up of Elasticsearch, Fluentd and Kibana.
Filebeat + Kafka + Logstash + ES is another common combination.
Grafana + Loki: Loki stores and queries the logs, Promtail collects the logs and ships them to Loki, and Grafana is used to query and visualize them.
An ELK-style pipeline can be assembled in several ways (the components combine freely; pick what fits your workload); common layouts are listed below, with a sketch of the Kafka-buffered variant after the list:
Filebeat / Logstash / Fluentd (collect and process) -> Elasticsearch (store) -> Kibana (visualize)
Filebeat / Logstash / Fluentd (collect) -> Logstash (aggregate and process) -> Elasticsearch (store) -> Kibana (visualize)
Filebeat / Logstash / Fluentd (collect) -> Kafka/Redis (buffering / peak shaving) -> Logstash (aggregate and process) -> Elasticsearch (store) -> Kibana (visualize)
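A minimal sketch of the Kafka-buffered variant (the third layout above); the broker address, topic name and log paths are placeholders, not values from this install:
```
# filebeat: ship logs into a Kafka topic (placeholder broker, topic and paths)
cat > filebeat-to-kafka.yml <<EOF
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/app/*.log
output.kafka:
  hosts: ["kafka-0.kafka:9092"]
  topic: "app-logs"
EOF

# logstash: consume the topic, process, and write to Elasticsearch
cat > logstash-from-kafka.conf <<EOF
input {
  kafka {
    bootstrap_servers => "kafka-0.kafka:9092"
    topics => ["app-logs"]
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
EOF
```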
Logstash
Logstash is an open-source tool for collecting, processing and shipping data. It can pull data from many sources (log files, message queues, and so on), filter, parse and transform it, and finally send it to a destination store such as Elasticsearch.
Strengths: a very large plugin ecosystem.
Weaknesses: performance and resource consumption (the default JVM heap size is 1 GB).
Fluentd/FluentBit
Language: Ruby + C
GitHub: https://github.com/fluent/fluentd-kubernetes-daemonset
Docs: https://docs.fluentd.org/
Because Logstash is fairly heavy and somewhat complex to configure, the EFK stack appeared as an alternative. Compared with Logstash in ELK, Fluentd takes an all-in-one approach: it can ship the contents of certain log files straight into Elasticsearch and let Kibana do the visualization. However, Fluentd here only collects console logs (what `kubectl logs` shows), not log files inside containers, which often falls short of production needs. Programs that were not built along cloud-native lines usually write many log files, and those in-container files cannot be collected unless every Pod gets a sidecar that turns the files into console output with tail -f, which is cumbersome.
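A minimal sketch of that sidecar approach (image names, container names and the log path are illustrative, not taken from this install):
```
cat > app-with-log-sidecar.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: repo.example.com/app:latest      # assumed app image; writes /app/logs/app.log
    volumeMounts:
    - name: app-logs
      mountPath: /app/logs
  - name: log-tailer                         # sidecar: turns the file into console output
    image: busybox
    args: ["/bin/sh", "-c", "tail -n+1 -F /app/logs/app.log"]
    volumeMounts:
    - name: app-logs
      mountPath: /app/logs
  volumes:
  - name: app-logs
    emptyDir: {}
EOF
```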
Also, it is not advisable to run the Elasticsearch cluster that stores the logs inside the Kubernetes cluster, because it wastes a lot of cluster resources; in most cases Fluentd ships the collected logs to an external Elasticsearch cluster.
Strengths: Fluentd is light on resources and its syntax is simple.
Weaknesses: no buffering before parsing, which can create back pressure in the log pipeline; limited support for transforming data compared with, say, Logstash's mutate filter or rsyslog's variables and templates.
Fluentd only collects console logs (what `kubectl logs` shows), not other log files, so it does not fully meet production needs; it depends on Elasticsearch, and both maintenance effort and resource usage are on the high side.
Like syslog-ng, its buffering exists only on the output side; the single-threaded core and the Ruby GIL in its plugins mean performance is limited on large nodes.
Fluent-bit
Language: C
A slimmed-down Fluentd.
Docs: https://docs.fluentbit.io/manual/about/fluentd-and-fluent-bit
Filebeat
Language: Go
GitHub: https://github.com/elastic/beats/tree/master/filebeat
Docs: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-getting-started.html
Strengths: a single binary with no dependencies; very low CPU and memory footprint; can ship to Logstash, Elasticsearch, Kafka and Redis.
Weaknesses: limited parsing and enrichment; a Logstash layer can be added behind it when more processing is needed.
In early ELK deployments Logstash did both collection and parsing, which is heavy on memory, CPU and IO, whereas Filebeat's CPU and memory usage is almost negligible.
Because Filebeat is a lightweight collector, it is often run as a sidecar inside a Pod to pick up custom log files written by the application. It can equally be deployed as a DaemonSet to collect system logs and console output across the cluster. The reasons Filebeat is deployed as a DaemonSet rather than as a Deployment or StatefulSet are:
Node-level log collection: Filebeat has to reach and read the log files on every node, including system logs and container logs. Deployments and StatefulSets are aimed at managing application Pods, not node-level collection, so a DaemonSet is the better fit.
Scaling: Deployments and StatefulSets manage replica counts and keep the desired number of Pods through failures and horizontal scaling. Filebeat does not need to scale with load or with application replicas; exactly one instance per node is enough, which is precisely what a DaemonSet provides.
Availability: Deployments and StatefulSets provide replica management and recovery for applications, but Filebeat is a log-collection agent whose recovery needs are different. A DaemonSet guarantees one running Filebeat per node, so collection continues even if the Filebeat Pod on some node becomes unavailable.
Like Fluentd and Logstash, Filebeat can write collected logs directly to Elasticsearch, and like Logstash it remembers the last read offset. In practice, to ease the load on Elasticsearch and allow richer analysis, logs are usually shipped to Kafka first, lightly processed by Logstash, and finally written to Elasticsearch.
LogAgent:
Language: JavaScript
GitHub: https://github.com/sematext/logagent-js
Docs: https://sematext.com/docs/logagent/
Strengths: can pick up everything under /var/log, parses many formats (Elasticsearch, Solr, MongoDB, Apache HTTPD, and more), can mask sensitive data, and has local buffering, so unlike Logstash it does not lose logs when the destination is unavailable.
Weaknesses: not as flexible as Logstash.
logtail:
The agent of Alibaba Cloud Log Service. It has run on machines across Alibaba for more than three years and now provides log collection for Alibaba public-cloud users.
Implemented in C++, with a lot of effort put into stability, resource control and manageability, so performance is good. Compared with the community-backed Logstash and Fluentd, logtail is narrower in scope and focuses purely on log collection.
Strengths:
logtail has the smallest CPU and memory footprint, and the end-to-end experience together with Alibaba Cloud Log Service is good.
Weaknesses:
logtail's parsing support for specific log formats is still weak and needs to be filled in.
rsyslog
The default syslog daemon on most Linux distributions.
Strengths: the fastest shipper in tests.
rsyslog suits very light setups (small apps, small VMs, Docker containers). If further processing is needed in another tool such as Logstash, it can forward JSON over TCP directly, or go through a Kafka/Redis buffer.
syslog-ng
Strengths: like rsyslog, a lightweight shipper with very good performance.
Grafana Loki
Loki and its ecosystem are an alternative to the ELK stack. Compared with ELK, ingestion is faster: less indexing and no merging.
Strengths: small storage footprint; smaller indexes, and data is written only once to long-term storage.
Weaknesses: queries and analysis over long time ranges are slower than with ELK, and there are fewer log-shipper options (e.g. Promtail or Fluentd).
ElasticSearch
A healthy ES cluster has exactly one master node, which manages the whole cluster: creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards are allocated to which nodes. All nodes in the cluster elect the same node as master.
Split brain:
Split brain occurs when the other nodes disagree over the choice of master, so the cluster ends up with more than one master and splits into an abnormal state. A common trigger is a node that is both master and data: under heavy data traffic the master may stop responding (appear dead).
Avoiding split brain:
1. Network: raise discovery.zen.ping.timeout; the default is 3s.
2. Node load: separate the roles (dedicated master nodes).
3. JVM garbage collection: set -Xms and -Xmx in config/jvm.options to half of the machine's memory.
Example: 5 master-eligible nodes, one active master and four candidates, with the quorum ("split-brain factor") set to 3.
Node types / roles
Before Elasticsearch 7.9 there were four main node types: data node, coordinating node, master-eligible node and ingest node.
From 7.9 onwards, node types became node roles.
An ES cluster consists of multiple nodes; each node's name is set with node.name.
A node can carry a single role or several roles.
1. Master node
A node with node.master: true in its configuration is eligible to be elected master.
The master controls cluster-wide operations such as creating and deleting indexes, and manages the non-master nodes, cluster metadata, cluster node information and index metadata.
node.master: true
node.data: false
2. Data node
A node with node.data: true stores the actual data, serves search and aggregation requests, and can also act as a coordinating node.
It mainly executes data-related operations.
node.master: false
node.data: true
3. Client (coordinating-only) node
node.master and node.data are both false (it can be neither master nor data).
It answers client requests and forwards them to other nodes.
node.master: false
node.data: false
4. Tribe node
A node configured with tribe.* is a special client that can connect to several clusters and run index and other operations across all of them.
Role summary
| Abbreviation (7.9+) | Role | Description |
|---|---|---|
| c | cold node | cold data node |
| d | data node | data node |
| f | frozen node | frozen data node |
| h | hot node | hot data node |
| i | ingest node | ingest (pre-processing) node |
| l | machine learning node | machine learning node |
| m | master-eligible node | master-eligible node |
| r | remote cluster client node | remote cluster client node |
| s | content node | content data node |
| t | transform node | transform node |
| v | voting-only node | voting-only node |
| w | warm node | warm data node |
| (empty) | coordinating node only | coordinating-only node |
Newer versions define roles with node.roles:
node.roles: [data,master]
The relationship between node roles and hardware sizing is a frequent question; a reference configuration:

| Role | Purpose | Storage | Memory | Compute | Network |
|---|---|---|---|---|---|
| Data node | store and retrieve data | very high | high | high | medium |
| Master node | manage cluster state | low | low | low | low |
| Ingest node | transform incoming data | low | medium | high | medium |
| Machine learning node | machine learning | low | very high | very high | medium |
| Coordinating node | route requests and merge results | low | medium | medium | medium |
Cluster election
ES uses a master/standby model: only one master node is active at a time, the rest are candidates, and in principle the number of candidates is unlimited. Many big-data products support only one active and one standby manager, e.g. Greenplum, Hadoop, Presto.
The active master is elected automatically, and shutting it down triggers a new election automatically, with no third-party component and no manual intervention. Many big-data products need a manual switchover or third-party software, e.g. Greenplum, Hadoop, Presto.
discovery.zen.minimum_master_nodes = (master_eligible_nodes / 2) + 1
With 1 active master + 4 candidates this works out to 3:
conf/elasticsearch.yml:
discovery.zen.minimum_master_nodes: 3
Coordination and routing
An Elasticsearch cluster has many nodes, and any of them can accept reads or writes; the nodes route and forward requests internally to the nodes that hold the relevant index shards. This is what lets us switch external access from the old cluster's data nodes to the new cluster's data nodes through an application proxy during a migration.
Finding the master node
http://192.168.111.200:9200/_cat/nodes?v
The row marked with * is the current master.
http://192.168.111.200:9200/_cat/master
Troubleshooting
Cluster data rebalancing
Elasticsearch balances shard placement itself: whenever a data node joins or leaves the cluster, the cluster automatically rebalances the shard distribution.
Index shards drift between data nodes until they are evenly spread; frequent node joins and departures cause heavy cluster IO and hurt response times, so avoid this situation. Frequent stop/restart cycles can easily get the cluster into trouble.
#Disable before a cluster migration, re-enable afterwards
#Disable shard allocation in the cluster
cluster.routing.allocation.enable: none
#Disable automatic rebalancing
cluster.routing.rebalance.enable: none
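These two switches are dynamic settings, so they can also be flipped at runtime through the cluster settings API instead of elasticsearch.yml (a sketch; substitute your own ES address):
```
# disable allocation and rebalancing before the migration
curl -XPUT 'http://10.96.128.50:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "none",
    "cluster.routing.rebalance.enable": "none"
  }
}'
# restore the defaults afterwards
curl -XPUT 'http://10.96.128.50:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.enable": null,
    "cluster.routing.rebalance.enable": null
  }
}'
```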
Enable the ES slow-query log
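A sketch of turning the slow logs on across all indices; the thresholds are examples to tune for your workload:
```
curl -XPUT 'http://10.96.128.50:9200/_all/_settings' -H 'Content-Type: application/json' -d '{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}'
```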
Switching cluster access
Hadoop
The Hadoop platform writes offline data into ES and pulls data back out; Elastic provides a Hadoop connector for direct access. Hive, for instance, associates mapping tables with Elasticsearch indexes, so after new data nodes start up, every existing Hive-ES mapping table has to be recreated with the new IP+PORT. Because Hive has many tables tied to Elastic, this cannot be finished quickly, so the old and new data nodes have to coexist for a while; the old ones cannot be shut down the moment data migration completes.
#Hive connection setting
es.nodes=<IP:PORT list of the data nodes>
Real-time queries from business systems
The Elastic cluster is exposed to callers through a proxy.
Data ingestion
Kafka queue
Installation
ELK data flow
Data collected by Beats can either be pushed straight to Elasticsearch for indexing, or sent to Logstash for processing first and then on to Elasticsearch; in both cases Kibana provides the visualization.
Prepare the images
docker pull docker.io/fluent/fluentd-kubernetes-daemonset:v1.16.2-debian-elasticsearch8-amd64-1.1
wget https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch.yaml
Look up the available elasticsearch tags on hub.docker.com:
elasticsearch 7.17.14
elasticsearch 8.11.0
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker pull docker.elastic.co/kibana/kibana:8.11.0
docker pull docker.elastic.co/logstash/logstash:8.11.0
docker pull docker.elastic.co/beats/filebeat:8.11.0
docker tag docker.elastic.co/elasticsearch/elasticsearch:8.11.0 repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker tag docker.elastic.co/kibana/kibana:8.11.0 repo.k8s.local/docker.elastic.co/kibana/kibana:8.11.0
docker tag docker.elastic.co/beats/filebeat:8.11.0 repo.k8s.local/docker.elastic.co/beats/filebeat:8.11.0
docker tag docker.elastic.co/logstash/logstash:8.11.0 repo.k8s.local/docker.elastic.co/logstash/logstash:8.11.0
docker push repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker push repo.k8s.local/docker.elastic.co/kibana/kibana:8.11.0
docker push repo.k8s.local/docker.elastic.co/beats/filebeat:8.11.0
docker push repo.k8s.local/docker.elastic.co/logstash/logstash:8.11.0
docker rmi docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker rmi docker.elastic.co/kibana/kibana:8.11.0
docker rmi docker.elastic.co/beats/filebeat:8.11.0
docker rmi docker.elastic.co/logstash/logstash:8.11.0
Deploy elasticsearch + kibana
node.name sets the node name; use the metadata.name value, which must be DNS-resolvable.
cluster.initial_master_nodes is the metadata.name plus an ordinal, numbered from 0.
elasticsearch config file:
cat > log-es-elasticsearch.yml <<EOF
cluster.name: log-es
node.name: "log-es-elastic-sts-0"
path.data: /usr/share/elasticsearch/data
#path.logs: /var/log/elasticsearch
bootstrap.memory_lock: false
network.host: 0.0.0.0
http.port: 9200
#transport.tcp.port: 9300
#discovery.seed_hosts: ["127.0.0.1", "[::1]"]
cluster.initial_master_nodes: ["log-es-elastic-sts-0"]
xpack.security.enabled: "false"
xpack.security.transport.ssl.enabled: "false"
#增加参数,使head插件可以访问es
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization,X-Requested-With,Content-Length,Content-Type
EOF
kibana config file:
Pods managed by a StatefulSet have ordered, stable names; a Pod recreated after deletion keeps the same name.
A StatefulSet must reference a Service (serviceName); when that Service has no cluster IP, DNS resolution returns the per-Pod domain names.
A StatefulSet supports volumeClaimTemplates, so each Pod gets its own independent volume.
Pods created by a StatefulSet have their own DNS names, so they can be addressed by name instead of IP: the IP may change but the name does not (format: <pod-name>.<svc-name>.<namespace>.svc.cluster.local).
cat > log-es-kibana.yml <<EOF
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: "http://localhost:9200"
i18n.locale: "zh-CN"
EOF
Create the Kubernetes namespace
kubectl create namespace log-es
kubectl get namespace log-es
kubectl describe namespace log-es
Create ConfigMaps for the elasticsearch and kibana config files:
1. A ConfigMap passes configuration to Pods in plain text; its size is limited to 1 MiB.
2. Files, directories and literals can all be turned into a ConfigMap and consumed in a Pod via env vars or volumes.
3. It can be updated online without rebuilding the image or restarting the container, although the update takes a little while to propagate.
When mounted via subPath, a ConfigMap update is not propagated into the container.
kubectl create configmap log-es-elastic-config -n log-es --from-file=log-es-elasticsearch.yml
kubectl create configmap log-es-kibana-config -n log-es --from-file=log-es-kibana.yml
Update method 1
#kubectl create configmap log-es-kibana-config --from-file log-es-kibana.yml -o yaml --dry-run=client | kubectl apply -f -
#kubectl get cm log-es-kibana-config -n log-es -o yaml > log-es-kibana.yaml && kubectl replace -f log-es-kibana.yaml -n log-es
#did not work when tested
Update method 2
kubectl edit configmap log-es-elastic-config -n log-es
kubectl edit configmap log-es-kibana-config -n log-es
List
kubectl get configmap -n log-es
Delete
kubectl delete cm log-es-elastic-config -n log-es
kubectl delete cm log-es-kibana-config -n log-es
kibana
Kibana here is a stateful, pinned Pod and needs no load balancing, so a headless Service can be used.
Such a Service is not assigned a VIP; instead it exposes the Pods it fronts as DNS records:
<pod-name>.<svc-name>.<namespace>.svc.cluster.local
$(podname)-$(ordinal).$(servicename).$(namespace).svc.cluster.local
log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local
cat > log-es-kibana-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
labels:
app: log-es-svc
name: es-kibana-svc
namespace: log-es
spec:
ports:
- name: 9200-9200
port: 9200
protocol: TCP
targetPort: 9200
nodePort: 9200
- name: 5601-5601
port: 5601
protocol: TCP
targetPort: 5601
nodePort: 5601
#clusterIP: None
selector:
app: log-es-elastic-sts
type: NodePort
#type: ClusterIP
EOF
StatefulSet manifest for es-kibana:
cat > log-es-kibana-sts.yaml <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: log-es-elastic-sts
name: log-es-elastic-sts
namespace: log-es
spec:
replicas: 1
selector:
matchLabels:
app: log-es-elastic-sts
serviceName: "es-kibana-svc" #关联svc名称
template:
metadata:
labels:
app: log-es-elastic-sts
spec:
#imagePullSecrets:
#- name: registry-pull-secret
containers:
- name: log-es-elasticsearch
image: repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
imagePullPolicy: IfNotPresent
# lifecycle:
# postStart:
# exec:
# command: [ "/bin/bash", "-c", touch /tmp/start" ] #sysctl -w vm.max_map_count=262144;ulimit -HSn 65535; 请在宿主机设定
#command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
#command: [ "/bin/bash", "-c", "--" ]
#args: [ "while true; do sleep 30; done;" ]
#command: [ "/bin/bash", "-c","ulimit -HSn 65535;" ]
resources:
requests:
memory: "800Mi"
cpu: "800m"
limits:
memory: "1.2Gi"
cpu: "2000m"
ports:
- containerPort: 9200
- containerPort: 9300
volumeMounts:
- name: log-es-elastic-config
mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
subPath: log-es-elasticsearch.yml #对应configmap log-es-elastic-config 中文件名称
- name: log-es-persistent-storage
mountPath: /usr/share/elasticsearch/data
env:
- name: TZ
value: Asia/Shanghai
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- image: repo.k8s.local/docker.elastic.co/kibana/kibana:8.11.0
imagePullPolicy: IfNotPresent
#command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
name: log-es-kibana
ports:
- containerPort: 5601
env:
- name: TZ
value: Asia/Shanghai
volumeMounts:
- name: log-es-kibana-config
mountPath: /usr/share/kibana/config/kibana.yml
subPath: log-es-kibana.yml #对应configmap log-es-kibana-config 中文件名称
volumes:
- name: log-es-elastic-config
configMap:
name: log-es-elastic-config
- name: log-es-kibana-config
configMap:
name: log-es-kibana-config
- name: log-es-persistent-storage
hostPath:
path: /localdata/es/data
type: DirectoryOrCreate
#hostNetwork: true
#dnsPolicy: ClusterFirstWithHostNet
nodeSelector:
kubernetes.io/hostname: node02.k8s.local
EOF
A standalone debug manifest for investigating the ulimit failure:
cat > log-es-kibana-sts.yaml <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: log-es-elastic-sts
name: log-es-elastic-sts
namespace: log-es
spec:
replicas: 1
selector:
matchLabels:
app: log-es-elastic-sts
serviceName: "es-kibana-svc" #关联svc名称
template:
metadata:
labels:
app: log-es-elastic-sts
spec:
#imagePullSecrets:
#- name: registry-pull-secret
# initContainers: # 初始化容器
# - name: init-vm-max-map
# image: repo.k8s.local/google_containers/busybox:9.9
# imagePullPolicy: IfNotPresent
# command: ["sysctl","-w","vm.max_map_count=262144"]
# securityContext:
# privileged: true
# - name: init-fd-ulimit
# image: repo.k8s.local/google_containers/busybox:9.9
# imagePullPolicy: IfNotPresent
# command: ["sh","-c","ulimit -HSn 65535;ulimit -n >/tmp/index/init.log"]
# securityContext:
# privileged: true
# volumeMounts:
# - name: init-test
# mountPath: /tmp/index
# terminationMessagePath: /dev/termination-log
# terminationMessagePolicy: File
containers:
- name: log-es-elasticsearch
image: repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
imagePullPolicy: IfNotPresent
# securityContext:
# privileged: true
# capabilities:
# add: ["SYS_RESOURCE"]
# lifecycle:
# postStart:
# exec:
# command: [ "/bin/bash", "-c", "sysctl -w vm.max_map_count=262144; ulimit -l unlimited;echo 'Container started';" ]
# command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
command: [ "/bin/bash", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
resources:
requests:
memory: "800Mi"
cpu: "800m"
limits:
memory: "1Gi"
cpu: "1000m"
ports:
- containerPort: 9200
- containerPort: 9300
volumeMounts:
# - name: ulimit-config
# mountPath: /etc/security/limits.conf
# #readOnly: true
# #subPath: limits.conf
- name: log-es-elastic-config
mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
subPath: log-es-elasticsearch.yml #对应configmap log-es-elastic-config 中文件名称
- name: log-es-persistent-storage
mountPath: /usr/share/elasticsearch/data
# - name: init-test
# mountPath: /tmp/index
env:
- name: TZ
value: Asia/Shanghai
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
volumes:
# - name: init-test
# emptyDir: {}
- name: log-es-elastic-config
configMap:
name: log-es-elastic-config
- name: log-es-persistent-storage
hostPath:
path: /localdata/es/data
type: DirectoryOrCreate
# - name: ulimit-config
# hostPath:
# path: /etc/security/limits.conf
#hostNetwork: true
#dnsPolicy: ClusterFirstWithHostNet
nodeSelector:
kubernetes.io/hostname: node02.k8s.local
EOF
elastic
Elasticsearch data directory
On node02; by default the Pod runs as uid 1000.
mkdir -p /localdata/es/data
chmod 777 /localdata/es/data
chown 1000:1000 /localdata/es/data
Create the filebeat registry directory on every node, including the master.
Running Filebeat as a DaemonSet requires mounting /usr/share/filebeat/data: it holds the registry file, which records how far Filebeat has read each log (file offset, source, timestamp, ...). If the Pod crashes and Kubernetes restarts it without this mount, the registry is reset and the logs are collected again from offset 0, leaving a duplicate copy of everything in ES. This matters a lot.
mkdir -p /localdata/filebeat/data
chown 1000:1000 /localdata/filebeat/data
chmod 777 /localdata/filebeat/data
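Once the DaemonSet further below has been running for a while, you can confirm on a node that the registry really is being persisted (the file layout shown is what Filebeat 8.x writes and is meant as an illustration):
```
ls -R /localdata/filebeat/data/registry/
tail -n 5 /localdata/filebeat/data/registry/filebeat/log.json   # one JSON entry per offset update
```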
kubectl apply -f log-es-kibana-sts.yaml
kubectl delete -f log-es-kibana-sts.yaml
kubectl apply -f log-es-kibana-svc.yaml
kubectl delete -f log-es-kibana-svc.yaml
kubectl get pods -o wide -n log-es
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
log-es-elastic-sts-0 0/2 ContainerCreating 0 66s <none> node02.k8s.local <none> <none>
#查看详情
kubectl -n log-es describe pod log-es-elastic-sts-0
kubectl -n log-es logs -f log-es-elastic-sts-0 -c log-es-elasticsearch
kubectl -n log-es logs -f log-es-elastic-sts-0 -c log-es-kibana
kubectl -n log-es logs log-es-elastic-sts-0 -c log-es-elasticsearch
kubectl -n log-es logs log-es-elastic-sts-0 -c log-es-kibana
kubectl -n log-es logs -f --tail=20 log-es-elastic-sts-0 -c log-es-elasticsearch
kubectl exec -it log-es-elasticsearch -n log-es -- /bin/sh
Exec into a specific container of the pod:
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-kibana -- /bin/sh
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'cat /etc/security/limits.conf'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'ulimit -HSn 65535'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'ulimit -n'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'cat /etc/security/limits.d/20-nproc.conf'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'cat /etc/pam.d/login|grep pam_limits'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'ls /etc/pam.d/sshd'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'cat /etc/profile'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c '/usr/share/elasticsearch/bin/elasticsearch'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'ls /tmp/'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'cat /dev/termination-log'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh -c 'ps -aux'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-kibana -- /bin/sh -c 'ps -aux'
Main reasons the startup fails
Not enough file handles
ulimit -HSn 65535
Maximum virtual memory too small
sysctl -w vm.max_map_count=262144
Wrong permissions on the data directory
/usr/share/elasticsearch/data
Wrong xpack security settings
xpack.security.enabled: "false"
To debug, keep the container alive so it does not exit and you can look around inside:
#command: [ "/bin/bash", "-c", "--" ]
#args: [ "while true; do sleep 30; done;" ]
Check the user the process runs as
id
uid=1000(elasticsearch) gid=1000(elasticsearch) groups=1000(elasticsearch),0(root)
Check permissions on the data directory
touch /usr/share/elasticsearch/data/test
ls /usr/share/elasticsearch/data/
Test-start elasticsearch
/usr/share/elasticsearch/bin/elasticsearch
{"@timestamp":"2023-11-09T05:20:55.298Z", "log.level":"ERROR", "message":"node validation exception\n[2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch. For more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.11/bootstrap-checks.html]\nbootstrap check failure [1] of [2]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]; for more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.11/_file_descriptor_check.html]\nbootstrap check failure [2] of [2]: Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]; for more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.11/bootstrap-checks-xpack.html#bootstrap-checks-tls]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"node-1","elasticsearch.cluster.name":"log-es"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/log-es.log
When starting elasticsearch, it reports: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least
sysctl: setting key "vm.max_map_count", ignoring: Read-only file system
ulimit: max locked memory: cannot modify limit: Operation not permitted
Error
master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes
No master node was discovered.
Add the master-eligible node names to /etc/elasticsearch/elasticsearch.yml: cluster.initial_master_nodes: ["master","node"]
-
Find the container ID of the Pod; on the node where the Pod runs:
crictl ps
CONTAINER       IMAGE          CREATED             STATE     NAME                   ATTEMPT  POD ID         POD
d3f8f373be7cb   7f03b6ec0c6f6  About an hour ago   Running   log-es-elasticsearch   0        cd3cebe50807f  log-es-elastic-sts-0
-
Find the PID of the container:
crictl inspect d3f8f373be7cb | grep -i pid
"pid": 8420,
"pid": 1
"type": "pid"
-
Run commands against the container from outside:
nsenter -t 8420 -n hostname
node02.k8s.local
cat /proc/8420/limits | grep "open files"
Max open files            4096                 4096                 files
References
https://imroc.cc/kubernetes/trick/deploy/set-sysctl/
https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
Fix:
Running these in postStart has no effect, so set them on the host instead; they then apply to the container.
On node02:
sysctl -w vm.max_map_count=262144
sysctl -a|grep vm.max_map_count
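To make the setting survive a reboot (a sketch, assuming a systemd-based host):
```
cat > /etc/sysctl.d/99-elasticsearch.conf <<EOF
vm.max_map_count=262144
EOF
sysctl --system
```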
cat /etc/security/limits.conf
cat > /etc/security/limits.d/20-nofile.conf <<EOF
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
EOF
cat > /etc/security/limits.d/20-nproc.conf <<EOF
*       -       nproc     65535
root    soft    nproc     unlimited
root    hard    nproc     unlimited
EOF
On CentOS 7 this file is /etc/security/limits.d/20-nproc.conf; on CentOS 6 it is /etc/security/limits.d/90-nproc.conf.
echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "andychu soft nofile 65535" >> /etc/security/limits.conf
echo "andychu hard nofile 65535" >> /etc/security/limits.conf
echo "ulimit -HSn 65535" >> /etc/rc.local
ulimit -a
sysctl -p
systemctl show sshd |grep LimitNOFILE
cat /etc/systemd/system.conf|grep DefaultLimitNOFILE
sed -n 's/#DefaultLimitNOFILE=/DefaultLimitNOFILE=65535/p' /etc/systemd/system.conf
sed -i 's/^#DefaultLimitNOFILE=/DefaultLimitNOFILE=65535/' /etc/systemd/system.conf
systemctl daemon-reexec
systemctl restart containerd
systemctl restart kubelet
crictl inspect c30a814bcf048 |grep -i pid
cat /proc/53657/limits |grep "open files"
Max open files 65335 65335 files
kubectl get pods -o wide -n log-es
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
log-es-elastic-sts-0 2/2 Running 0 2m19s 10.244.2.131 node02.k8s.local <none> <none>
[root@node01 nginx]# curl http://10.244.2.131:9200
{
"name" : "node-1",
"cluster_name" : "log-es",
"cluster_uuid" : "Agfoz8qmS3qob_R6bp2cAw",
"version" : {
"number" : "8.11.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "d9ec3fa628c7b0ba3d25692e277ba26814820b20",
"build_date" : "2023-11-04T10:04:57.184859352Z",
"build_snapshot" : false,
"lucene_version" : "9.8.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
kubectl get pod -n log-es
kubectl get pod -n test
Check the Service
kubectl get service -n log-es
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
es-kibana-svc ClusterIP None <none> 9200/TCP,5601/TCP 53s
kubectl apply -f log-es-kibana-svc.yaml
kubectl delete -f log-es-kibana-svc.yaml
kubectl exec -it pod/test-pod-1 -n test -- ping www.c1gstudio.com
kubectl exec -it pod/test-pod-1 -n test -- ping svc-openresty.test
kubectl exec -it pod/test-pod-1 -n test -- nslookup log-es-elastic-sts-0.es-kibana-svc.log-es
kubectl exec -it pod/test-pod-1 -n test -- ping log-es-elastic-sts-0.es-kibana-svc.log-es
kubectl exec -it pod/test-pod-1 -n test -- curl http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200
kubectl exec -it pod/test-pod-1 -n test -- curl -L http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:5601
cat > log-es-kibana-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
labels:
app: log-es-svc
name: es-kibana-svc
namespace: log-es
spec:
ports:
- name: 9200-9200
port: 9200
protocol: TCP
targetPort: 9200
#nodePort: 9200
- name: 5601-5601
port: 5601
protocol: TCP
targetPort: 5601
#nodePort: 5601
#clusterIP: None
selector:
app: log-es-elastic-sts
type: NodePort
#type: ClusterIP
EOF
kubectl get service -n log-es
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
es-kibana-svc NodePort 10.96.128.50 <none> 9200:30118/TCP,5601:31838/TCP 16m
Access it via node IP + port; here the assigned port is 31838.
curl -L http://192.168.244.7:31838
curl -L http://10.96.128.50:5601
Access after external NAT forwarding:
http://127.0.0.1:5601/
ingress
cat > log-es-kibana-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-kibana
namespace: log-es
labels:
app.kubernetes.io/name: nginx-ingress
app.kubernetes.io/part-of: kibana
annotations:
kubernetes.io/ingress.class: "nginx"
spec:
ingressClassName: nginx
rules:
- host: kibana.k8s.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: es-kibana-svc
port:
number: 5601
EOF
kubectl apply -f log-es-kibana-ingress.yaml
kubectl get ingress -n log-es
curl -L -H "Host:kibana.k8s.local" http://10.96.128.50:5601
filebeat
#https://www.elastic.co/guide/en/beats/filebeat/8.11/drop-fields.html
#https://raw.githubusercontent.com/elastic/beats/7.9/deploy/kubernetes/filebeat-kubernetes.yaml
cat > log-es-filebeat-configmap.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: log-es-filebeat-config
namespace: log-es
data:
filebeat.yml: |-
filebeat.inputs:
- type: container
containers.ids:
- "*"
id: 99
enabled: true
tail_files: true
paths:
- /var/log/containers/*.log
#- /var/lib/docker/containers/*/*.log
#- /var/log/pods/*/*/*.log
processors:
- add_kubernetes_metadata:
in_cluster: true
matchers:
- logs_path:
logs_path: "/var/log/containers/"
fields_under_root: true
exclude_files: ['\.gz$']
tags: ["k8s"]
fields:
source: "container"
- type: filestream
id: 100
enabled: true
tail_files: true
paths:
- /var/log/nginx/access*.log
processors:
- decode_json_fields:
fields: [ 'message' ]
target: "" # 指定日志字段message,头部以json标注,如果不要json标注则设置为空如:target: ""
overwrite_keys: false # 默认情况下,解码后的 JSON 位于输出文档中的“json”键下。如果启用此设置,则键将在输出文档中的顶层复制。默认值为 false
process_array: false
max_depth: 1
- drop_fields:
fields: ["agent","ecs.version"]
ignore_missing: true
fields_under_root: true
tags: ["ingress-nginx-access"]
fields:
source: "ingress-nginx-access"
- type: filestream
id: 101
enabled: true
tail_files: true
paths:
- /var/log/nginx/error.log
close_inactive: 5m
ignore_older: 24h
clean_inactive: 96h
clean_removed: true
fields_under_root: true
tags: ["ingress-nginx-error"]
fields:
source: "ingress-nginx-error"
- type: filestream
id: 102
enabled: true
tail_files: true
paths:
- /nginx/logs/*.log
exclude_files: ['\.gz$','error.log']
close_inactive: 5m
ignore_older: 24h
clean_inactive: 96h
clean_removed: true
fields_under_root: true
tags: ["web-log"]
fields:
source: "nginx-access"
output.logstash:
hosts: ["logstash.log-es.svc.cluster.local:5044"]
#output.elasticsearch:
#hosts: ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
#index: "log-%{[fields.tags]}-%{+yyyy.MM.dd}"
#indices:
#- index: "log-ingress-nginx-access-%{+yyyy.MM.dd}"
#when.contains:
#tags: "ingress-nginx-access"
#- index: "log-ingress-nginx-error-%{+yyyy.MM.dd}"
#when.contains:
#tags: "ingress-nginx-error"
#- index: "log-web-log-%{+yyyy.MM.dd}"
#when.contains:
#tags: "web-log"
#- index: "log-k8s-%{+yyyy.MM.dd}"
#when.contains:
#tags: "k8s"
json.keys_under_root: true # 默认情况下,解码后的 JSON 位于输出文档中的“json”键下。如果启用此设置,则键将在输出文档中的顶层复制。默认值为 false
json.overwrite_keys: true # 如果启用了此设置,则解码的 JSON 对象中的值将覆盖 Filebeat 在发生冲突时通常添加的字段(类型、源、偏移量等)
setup.template.enabled: false #false不使用默认的filebeat-%{[agent.version]}-%{+yyyy.MM.dd}索引
setup.template.overwrite: true #开启新设置的模板
setup.template.name: "log" #设置一个新的模板,模板的名称
setup.template.pattern: "log-*" #模板匹配那些索引
filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
#setup.template.settings:
# index.number_of_shards: 1
# index.number_of_replicas: 1
setup.ilm.enabled: false # 修改索引名称,要关闭索引生命周期管理ilm功能
logging.level: warning #debug、info、warning、error
logging.to_syslog: false
logging.metrics.period: 300s
logging.to_files: true
logging.files:
path: /tmp/
name: "filebeat.log"
rotateeverybytes: 10485760
keepfiles: 7
EOF
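Once the DaemonSet below is running, the rendered config can be sanity-checked in place (the pod name is an example; pick a real one from kubectl get pod -n log-es):
```
kubectl exec -n log-es -it filebeat-xxxxx -- filebeat test config -c /config/filebeat.yml
kubectl exec -n log-es -it filebeat-xxxxx -- filebeat test output -c /config/filebeat.yml
```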
cat > log-es-filebeat-daemonset.yaml <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: filebeat
namespace: log-es
spec:
#replicas: 1
selector:
matchLabels:
app: filebeat
template:
metadata:
labels:
app: filebeat
spec:
serviceAccount: filebeat
containers:
- name: filebeat
image: repo.k8s.local/docker.elastic.co/beats/filebeat:8.11.0
imagePullPolicy: IfNotPresent
env:
- name: TZ
value: "Asia/Shanghai"
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 100Mi
volumeMounts:
- name: filebeat-config
readOnly: true
mountPath: /config/filebeat.yml # Filebeat 配置
subPath: filebeat.yml
- name: fb-data
mountPath: /usr/share/filebeat/data
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
readOnly: true
mountPath: /var/lib/docker/containers
- name: varlogingress
readOnly: true
mountPath: /var/log/nginx
- name: varlogweb
readOnly: true
mountPath: /nginx/logs
args:
- -c
- /config/filebeat.yml
volumes:
- name: fb-data
hostPath:
path: /localdata/filebeat/data
type: DirectoryOrCreate
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: varlogingress
hostPath:
path: /var/log/nginx
- name: varlogweb
hostPath:
path: /nginx/logs
- name: filebeat-config
configMap:
name: log-es-filebeat-config
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
EOF
kubectl apply -f log-es-filebeat-configmap.yaml
kubectl delete -f log-es-filebeat-configmap.yaml
kubectl apply -f log-es-filebeat-daemonset.yaml
kubectl delete -f log-es-filebeat-daemonset.yaml
kubectl get pod -n log-es -o wide
kubectl get cm -n log-es
kubectl get ds -n log-es
kubectl edit configmap log-es-filebeat-config -n log-es
kubectl get service -n log-es
#After updating the ConfigMap, the Pods must be restarted manually
kubectl rollout restart ds/filebeat -n log-es
kubectl patch ds filebeat -n log-es --patch '{"spec": {"template": {"metadata": {"annotations": {"version/config": "202311141" }}}}}'
#Restart ES
kubectl rollout restart sts/log-es-elastic-sts -n log-es
#Delete the ConfigMaps
kubectl delete cm filebeat -n log-es
kubectl delete cm log-es-filebeat-config -n log-es
#Details
kubectl -n log-es describe pod filebeat-mdldl
kubectl -n log-es logs -f filebeat-4kgpl
#Inspect the pods
kubectl exec -n log-es -it filebeat-hgpnl -- /bin/sh
kubectl exec -n log-es -it filebeat-q69f5 -- /bin/sh -c 'ps aux'
kubectl exec -n log-es -it filebeat-wx4x2 -- /bin/sh -c 'cat /config/filebeat.yml'
kubectl exec -n log-es -it filebeat-4j2qd -- /bin/sh -c 'cat /tmp/filebeat*'
kubectl exec -n log-es -it filebeat-9qx6f -- /bin/sh -c 'cat /tmp/filebeat*'
kubectl exec -n log-es -it filebeat-9qx6f -- /bin/sh -c 'ls /usr/share/filebeat/data'
kubectl exec -n log-es -it filebeat-9qx6f -- /bin/sh -c 'filebeat modules list'
kubectl exec -n log-es -it filebeat-kmrcc -- /bin/sh -c 'curl http://localhost:5066/?pretty'
kubectl exec -n log-es -it filebeat-hqz9b -- /bin/sh -c 'curl http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200/_cat/indices?v'
curl -XGET 'http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200/_cat/indices?v'
curl http://10.96.128.50:9200/_cat/indices?v
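A couple of extra checks against the same endpoint can help when indices do not show up as expected:
```
curl 'http://10.96.128.50:9200/_cluster/health?pretty'
curl 'http://10.96.128.50:9200/_cat/shards?v'
```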
Error: serviceaccount permissions
Failed to watch v1.Node: failed to list v1.Node: nodes "node02.k8s.local" is forbidden: User "system:serviceaccount:log-es:default" cannot list resource "nodes" in API group "" at the cluster scope
When you create a namespace, a ServiceAccount named default is created in it automatically. The error means that a Pod using the namespace's default ServiceAccount has no permission to access that part of the Kubernetes API. Check with:
kubectl get sa -n log-es
Fix
Create a tiller-ServiceAccount.yaml and apply it with kubectl apply -f tiller-ServiceAccount.yaml to create the account and bind the role; replace the namespace (log-es here) with your own.
vi tiller-ServiceAccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: log-es
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tiller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: tiller
namespace: log-es
kubectl apply -f tiller-ServiceAccount.yaml
Associate it with the running Pods (and also update the YAML):
kubectl patch ds --namespace log-es filebeat -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
vi filebeat-ServiceAccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: filebeat
subjects:
- kind: ServiceAccount
name: filebeat
namespace: log-es
roleRef:
kind: ClusterRole
name: filebeat
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: filebeat
namespace: log-es
subjects:
- kind: ServiceAccount
name: filebeat
namespace: log-es
roleRef:
kind: Role
name: filebeat
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: filebeat-kubeadm-config
namespace: log-es
subjects:
- kind: ServiceAccount
name: filebeat
namespace: log-es
roleRef:
kind: Role
name: filebeat-kubeadm-config
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: filebeat
labels:
k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
resources:
- namespaces
- pods
- nodes
verbs:
- get
- watch
- list
- apiGroups: ["apps"]
resources:
- replicasets
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: filebeat
# should be the namespace where filebeat is running
namespace: log-es
labels:
k8s-app: filebeat
rules:
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: filebeat-kubeadm-config
namespace: log-es
labels:
k8s-app: filebeat
rules:
- apiGroups: [""]
resources:
- configmaps
resourceNames:
- kubeadm-config
verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: filebeat
namespace: log-es
labels:
k8s-app: filebeat
kubectl apply -f filebeat-ServiceAccount.yaml
kubectl get sa -n log-es
The nginx module is awkward to use
Enabling the module in filebeat.yml:
filebeat.modules:
- module: nginx
fails with:
Exiting: module nginx is configured but has no enabled filesets
The module also has to be enabled by file name:
./filebeat modules enable nginx
./filebeat --modules nginx
and the fileset settings have to be written in modules.d/nginx.yml.
Filebeat outputs
1. Elasticsearch output (ship collected data to ES; present in the default config file, also documented on the official site)
2. Logstash output (ship collected data to Logstash; present in the default config file, also documented on the official site)
3. Redis output (ship collected data to Redis; not in the default config file, see the official docs)
4. File output (write collected data to a file; present in the default config file, also documented on the official site)
5. Console output (write collected data to the console; present in the default config file, also documented on the official site)
https://www.elastic.co/guide/en/beats/filebeat/8.12/configuring-howto-filebeat.html
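When debugging the pipeline it can be useful to swap the Logstash/Elasticsearch output for the console output temporarily (a sketch; only one output section may be enabled in filebeat.yml at a time):
```
cat > filebeat-console-output-snippet.yml <<EOF
output.console:
  pretty: true
EOF
```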
### Add Logstash to clean the raw collected logs as the business requires
vi log-es-logstash-deploy.yaml
apiVersion: apps/v1
#kind: DaemonSet
#kind: StatefulSet
kind: Deployment
metadata:
name: logstash
namespace: log-es
labels:
app: logstash
spec:
selector:
matchLabels:
app: logstash
template:
metadata:
labels:
app: logstash
spec:
terminationGracePeriodSeconds: 30
hostNetwork: true
#dnsPolicy: ClusterFirstWithHostNet
containers:
- name: logstash
ports:
- containerPort: 5044
name: logstash
command:
- logstash
- '-f'
- '/etc/logstash_c/logstash.conf'
image: repo.k8s.local/docker.elastic.co/logstash/logstash:8.11.0
env:
- name: TZ
value: "Asia/Shanghai"
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumeMounts:
- name: config-volume
mountPath: /etc/logstash_c/
- name: config-yml-volume
mountPath: /usr/share/logstash/config/
resources: # always set resource limits on Logstash so it cannot starve other workloads
limits:
cpu: 1000m
memory: 2048Mi
requests:
cpu: 512m
memory: 512Mi
volumes:
- name: config-volume
configMap:
name: logstash-conf
items:
- key: logstash.conf
path: logstash.conf
- name: config-yml-volume
configMap:
name: logstash-yml
items:
- key: logstash.yml
path: logstash.yml
nodeSelector:
ingresstype: ingress-nginx
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
vi log-es-logstash-svc.yaml
apiVersion: v1
kind: Service
metadata:
name: logstash
annotations:
labels:
app: logstash
namespace: log-es
spec:
#type: NodePort
type: ClusterIP
ports:
- name: http
port: 5044
#nodePort: 30044
protocol: TCP
targetPort: 5044
clusterIP: None
selector:
app: logstash
vi log-es-logstash-ConfigMap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: logstash-conf
namespace: log-es
labels:
app: logstash
data:
logstash.conf: |-
input {
beats {
port => 5044
}
}
filter{
if [agent][type] == "filebeat" {
mutate{
remove_field => "[agent]"
remove_field => "[ecs]"
remove_field => "[log][offset]"
}
}
if [input][type] == "container" {
mutate{
remove_field => "[kubernetes][node][hostname]"
remove_field => "[kubernetes][labels]"
remove_field => "[kubernetes][namespace_labels]"
remove_field => "[kubernetes][node][labels]"
}
}
# Process ingress-nginx logs
if [kubernetes][container][name] == "nginx-ingress-controller" {
json {
source => "message"
target => "ingress_log"
}
if [ingress_log][requesttime] {
mutate {
convert => ["[ingress_log][requesttime]", "float"]
}
}
if [ingress_log][upstremtime] {
mutate {
convert => ["[ingress_log][upstremtime]", "float"]
}
}
if [ingress_log][status] {
mutate {
convert => ["[ingress_log][status]", "float"]
}
}
if [ingress_log][httphost] and [ingress_log][uri] {
mutate {
add_field => {"[ingress_log][entry]" => "%{[ingress_log][httphost]}%{[ingress_log][uri]}"}
}
mutate{
split => ["[ingress_log][entry]","/"]
}
if [ingress_log][entry][1] {
mutate{
add_field => {"[ingress_log][entrypoint]" => "%{[ingress_log][entry][0]}/%{[ingress_log][entry][1]}"}
remove_field => "[ingress_log][entry]"
}
}
else{
mutate{
add_field => {"[ingress_log][entrypoint]" => "%{[ingress_log][entry][0]}/"}
remove_field => "[ingress_log][entry]"
}
}
}
}
# Process logs of business services whose container names start with srv
if [kubernetes][container][name] =~ /^srv*/ {
json {
source => "message"
target => "tmp"
}
if [kubernetes][namespace] == "kube-system" {
drop{}
}
if [tmp][level] {
mutate{
add_field => {"[applog][level]" => "%{[tmp][level]}"}
}
if [applog][level] == "debug"{
drop{}
}
}
if [tmp][msg]{
mutate{
add_field => {"[applog][msg]" => "%{[tmp][msg]}"}
}
}
if [tmp][func]{
mutate{
add_field => {"[applog][func]" => "%{[tmp][func]}"}
}
}
if [tmp][cost]{
if "ms" in [tmp][cost]{
mutate{
split => ["[tmp][cost]","m"]
add_field => {"[applog][cost]" => "%{[tmp][cost][0]}"}
convert => ["[applog][cost]", "float"]
}
}
else{
mutate{
add_field => {"[applog][cost]" => "%{[tmp][cost]}"}
}
}
}
if [tmp][method]{
mutate{
add_field => {"[applog][method]" => "%{[tmp][method]}"}
}
}
if [tmp][request_url]{
mutate{
add_field => {"[applog][request_url]" => "%{[tmp][request_url]}"}
}
}
if [tmp][meta._id]{
mutate{
add_field => {"[applog][traceId]" => "%{[tmp][meta._id]}"}
}
}
if [tmp][project] {
mutate{
add_field => {"[applog][project]" => "%{[tmp][project]}"}
}
}
if [tmp][time] {
mutate{
add_field => {"[applog][time]" => "%{[tmp][time]}"}
}
}
if [tmp][status] {
mutate{
add_field => {"[applog][status]" => "%{[tmp][status]}"}
convert => ["[applog][status]", "float"]
}
}
}
mutate{
rename => ["kubernetes", "k8s"]
remove_field => "beat"
remove_field => "tmp"
remove_field => "[k8s][labels][app]"
remove_field => "[event][original]"
}
}
output{
if [source] == "container" {
elasticsearch {
hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
codec => json
index => "k8s-logstash-container-%{+YYYY.MM.dd}"
}
#stdout { codec => rubydebug }
}
if [source] == "ingress-nginx-access" {
elasticsearch {
hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
codec => json
index => "k8s-logstash-ingress-nginx-access-%{+YYYY.MM.dd}"
}
#stdout { codec => rubydebug }
}
if [source] == "ingress-nginx-error" {
elasticsearch {
hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
codec => json
index => "k8s-logstash-ingress-nginx-error-%{+YYYY.MM.dd}"
}
#stdout { codec => rubydebug }
}
if [source] == "nginx-access" {
elasticsearch {
hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
codec => json
index => "k8s-logstash-nginx-access-%{+YYYY.MM.dd}"
}
#stdout { codec => rubydebug }
}
#stdout { codec => rubydebug }
}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: logstash-yml
namespace: log-es
labels:
app: logstash
data:
logstash.yml: |-
http.host: "0.0.0.0"
xpack.monitoring.elasticsearch.hosts: http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200
kubectl apply -f log-es-logstash-ConfigMap.yaml
kubectl delete -f log-es-logstash-ConfigMap.yaml
kubectl apply -f log-es-logstash-deploy.yaml
kubectl delete -f log-es-logstash-deploy.yaml
kubectl apply -f log-es-logstash-svc.yaml
kubectl delete -f log-es-logstash-svc.yaml
kubectl apply -f log-es-filebeat-configmap.yaml
kubectl get pod -n log-es -o wide
kubectl get cm -n log-es
kubectl get ds -n log-es
kubectl edit configmap log-es-filebeat-config -n log-es
kubectl get service -n log-es
Details
kubectl -n log-es describe pod filebeat-97l85
kubectl -n log-es logs -f logstash-847d7f5b56-jv5jj
kubectl logs -n log-es $(kubectl get pod -n log-es -o jsonpath='{.items[3].metadata.name}') -f
kubectl exec -n log-es -it filebeat-97l85 -- /bin/sh -c 'cat /tmp/filebeat*'
After updating the ConfigMap, restart the Pods manually
kubectl rollout restart ds/filebeat -n log-es
kubectl rollout restart deploy/logstash -n log-es
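After a config change and restart, the pipeline syntax can be validated inside the pod (a sketch; the pod name is the example one used above):
```
kubectl exec -n log-es -it logstash-847d7f5b56-jv5jj -- \
  logstash -f /etc/logstash_c/logstash.conf --config.test_and_exit
```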
Force delete
kubectl delete pod filebeat-fncq9 -n log-es --force --grace-period=0
kubectl exec -it pod/test-pod-1 -n test -- ping logstash.log-es.svc.cluster.local
kubectl exec -it pod/test-pod-1 -n test -- curl http://logstash.log-es.svc.cluster.local:5044
```
#Stop the services
kubectl delete -f log-es-filebeat-daemonset.yaml
kubectl delete -f log-es-logstash-deploy.yaml
kubectl delete -f log-es-kibana-sts.yaml
kubectl delete -f log-es-kibana-svc.yaml
```
# Receiving logs in filebeat via syslog
log-es-filebeat-configmap.yaml
```
- type: syslog
  format: auto
  id: syslog-id
  enabled: true
  max_message_size: 20KiB
  timeout: 10
  keep_null: true
  processors:
    - drop_fields:
        fields: ["input","agent","ecs.version","log.offset","event","syslog"]
        ignore_missing: true
  protocol.udp:
    host: "0.0.0.0:33514"
  tags: ["web-access"]
  fields:
    source: "syslog-web-access"
```
On the node where the DaemonSet pod runs, check whether the hostPort is listening:
netstat -anup
lsof -i
## Installing ping in a pod for testing
```
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-6.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
sed -i 's/mirrors.aliyun.com/vault.centos.org/g' /etc/yum.repos.d/CentOS-Base.repo
sed -i 's/gpgcheck=1/gpgcheck=0/g' /etc/yum.repos.d/CentOS-Base.repo
yum clean all && yum makecache
yum install iputils
ping 192.168.244.4
```
https://nginx.org/en/docs/syslog.html
nginx can send syslog over UDP only; TCP is not supported.
nginx config file:
```
access_log syslog:server=192.168.244.7:33514,facility=local5,tag=data_c1gstudiodotnet,severity=info access;
```
Filebeat receives the messages.
```
{
“@timestamp”: “2024-02-20T02:35:41.000Z”,
“@metadata”: {
“beat”: “filebeat”,
“type”: “_doc”,
“version”: “8.11.0”,
“truncated”: false
},
“hostname”: “openresty-php5.2-6cbdff6bbd-7fjdc”,
“process”: {
“program”: “data_c1gstudiodotnet”
},
“host”: {
“name”: “node02.k8s.local”
},
“agent”: {
“id”: “cf964318-5fdc-493e-ae2c-d2acb0bc6ca8”,
“name”: “node02.k8s.local”,
“type”: “filebeat”,
“version”: “8.11.0”,
“ephemeral_id”: “42789eee-3658-4f0f-982e-cb96d18fd9a2”
},
“message”: “10.100.3.80 – – [20/Feb/2024:10:35:41 +0800] \”GET /admin/imgcode/imgcode.php HTTP/1.1\” 200 1271 \”https://data.c1gstudio.net:31443/admin/login.php?1\” \”Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36\” “,
“ecs”: {
“version”: “8.0.0”
},
“log”: {
“source”: {
“address”: “10.244.2.216:37255”
}
},
“tags”: [
“syslog-web-log”
],
“fields”: {
“source”: “syslog-nginx-access”
}
}
```
Test filebeat's syslog reception:
echo "hello" > /dev/udp/192.168.244.4/1514
A refinement that solves the problem of knowing the filebeat host IP:
At deploy time write the node's IP into the container's /etc/hosts, and have nginx use the hostname to communicate.
```
containers:
- name: openresty-php-fpm5-2-17
  env:
  - name: MY_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: MY_NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  command: ["/bin/sh", "-c", "echo \"$(MY_NODE_IP) MY_NODE_IP\" >> /etc/hosts;/opt/lemp start;cd /opt/init/ && ./inotify_reload.sh "]
```
nginx config:
```
access_log syslog:server=MY_NODE_IP:33514,facility=local5,tag=data_c1gstudiodotnet,severity=info access;
```
# Receiving logs in filebeat via a unix socket
Filebeat shares a socket with the host; nginx mounts the host socket and sends its messages to it.
nginx config file:
```
access_log syslog:server=unix:/usr/local/filebeat/filebeat.sock,facility=local5,tag=data_c1gstudiodotnet,severity=info access;
```
Create the shared directory on each node; if it is created automatically it is owned by root with mode 755 and the pod cannot write to it.
mkdir -m 0777 /localdata/filebeat/socket
chmod 0777 /localdata/filebeat/socket
filebeat config file:
```
- type: unix
  enabled: true
  id: unix-id
  max_message_size: 100KiB
  path: "/usr/share/filebeat/socket/filebeat.sock"
  socket_type: datagram
  #group: "website"
  processors:
    - syslog:
        field: message
    - drop_fields:
        fields: ["input","agent","ecs","log.syslog.severity","log.syslog.facility","log.syslog.priority"]
        ignore_missing: true
  tags: ["web-access"]
  fields:
    source: "unix-web-access"
```
filebeat DaemonSet:
```
volumeMounts:
- name: fb-socket
  mountPath: /usr/share/filebeat/socket
volumes:
- name: fb-socket
  hostPath:
    path: /localdata/filebeat/socket
    type: DirectoryOrCreate
```
nginx Deployment:
```
volumeMounts:
- name: host-filebeat-socket
  mountPath: "/usr/local/filebeat"
volumes:
- name: host-filebeat-socket
  hostPath:
    path: /localdata/filebeat/socket
    type: Directory
```
Example output:
```
{
“@timestamp”: “2024-02-20T06:17:08.000Z”,
“@metadata”: {
“beat”: “filebeat”,
“type”: “_doc”,
“version”: “8.11.0”
},
“agent”: {
“type”: “filebeat”,
“version”: “8.11.0”,
“ephemeral_id”: “4546cf71-5f33-4f5d-bc91-5f0a58c9b0fd”,
“id”: “cf964318-5fdc-493e-ae2c-d2acb0bc6ca8”,
“name”: “node02.k8s.local”
},
“message”: “10.100.3.80 – – [20/Feb/2024:14:17:08 +0800] \”GET /admin/imgcode/imgcode.php HTTP/1.1\” 200 1356 \”https://data.c1gstudio.net:31443/admin/login.php?1\” \”Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36\” “,
“tags”: [
“web-access”
],
“ecs”: {
“version”: “8.0.0”
},
“fields”: {
“source”: “unix-web-access”
},
“log”: {
“syslog”: {
“hostname”: “openresty-php5.2-78cb7cb54b-bsgt6”,
“appname”: “data_c1gstudiodotnet”
}
},
“host”: {
“name”: “node02.k8s.local”
}
}
```
# Configuring syslog for ingress-nginx
Syslog can be configured in the controller ConfigMap, but you still need to solve the node-IP problem and decide whether the external ingress shares the same target.
It does not support a tag, so the source cannot be labelled; all you get is process.program=nginx.
Neither http-snippet nor access-log-params helps here.
```
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: int-ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.5
  name: int-ingress-nginx-controller
  namespace: int-ingress-nginx
data:
  allow-snippet-annotations: "true"
  error-log-level: "warn"
  enable-syslog: "true"
  syslog-host: "192.168.244.7"
  syslog-port: "10514"
```
```
{
“@timestamp”: “2024-02-21T03:42:40.000Z”,
“@metadata”: {
“beat”: “filebeat”,
“type”: “_doc”,
“version”: “8.11.0”,
“truncated”: false
},
“process”: {
“program”: “nginx”
},
“upstream”: {
“status”: “200”,
“response_length”: “1281”,
“proxy_alternative”: “”,
“addr”: “10.244.2.228:80”,
“name”: “data-c1gstudio-net-svc-web-http”,
“response_time”: “0.004”
},
“timestamp”: “2024-02-21T11:42:40+08:00”,
“req_id”: “16b868da1aba50a72f32776b4a2f5cb2”,
“agent”: {
“ephemeral_id”: “64c4b6d1-3d5c-4079-8bda-18d1a0d063a5”,
“id”: “3bd77823-c801-4dd1-a3e5-1cf25874c09f”,
“name”: “master01.k8s.local”,
“type”: “filebeat”,
“version”: “8.11.0”
},
“log”: {
“source”: {
“address”: “192.168.244.4:49244”
}
},
“request”: {
“status”: 200,
“bytes_sent”: “1491”,
“request_time”: “0.004”,
“request_length”: “94”,
“referer”: “https://data.c1gstudio.net:31443/admin/login.php?1”,
“user_agent”: “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36”,
“request_method”: “GET”,
“request_uri”: “/admin/imgcode/imgcode.php”,
“remote_user”: “”,
“protocol”: “HTTP/2.0”,
“remote_port”: “”,
“real_port”: “11464”,
“x-forward-for”: “10.100.3.80”,
“remote_addr”: “10.100.3.80”,
“hostname”: “data.c1gstudio.net”,
“body_bytes_sent”: “1269”,
“real_ip”: “192.168.244.2”,
“server_name”: “data.c1gstudio.net”
},
“ingress”: {
“service_port”: “http”,
“hostname”: “master01.k8s.local”,
“addr”: “192.168.244.4”,
“port”: “443”,
“namespace”: “data-c1gstudio-net”,
“ingress_name”: “ingress-data-c1gstudio-net”,
“service_name”: “svc-web”
},
“message”: “{\”timestamp\”: \”2024-02-21T11:42:40+08:00\”, \”source\”: \”int-ingress\”, \”req_id\”: \”16b868da1aba50a72f32776b4a2f5cb2\”, \”ingress\”:{ \”hostname\”: \”master01.k8s.local\”, \”addr\”: \”192.168.244.4\”, \”port\”: \”443\”,\”namespace\”: \”data-c1gstudio-net\”,\”ingress_name\”: \”ingress-data-c1gstudio-net\”,\”service_name\”: \”svc-web\”,\”service_port\”: \”http\” }, \”upstream\”:{ \”addr\”: \”10.244.2.228:80\”, \”name\”: \”data-c1gstudio-net-svc-web-http\”, \”response_time\”: \”0.004\”, \”status\”: \”200\”, \”response_length\”: \”1281\”, \”proxy_alternative\”: \”\”}, \”request\”:{ \”remote_addr\”: \”10.100.3.80\”, \”real_ip\”: \”192.168.244.2\”, \”remote_port\”: \”\”, \”real_port\”: \”11464\”, \”remote_user\”: \”\”, \”request_method\”: \”GET\”, \”server_name\”: \”data.c1gstudio.net\”,\”hostname\”: \”data.c1gstudio.net\”, \”request_uri\”: \”/admin/imgcode/imgcode.php\”, \”status\”: 200, \”body_bytes_sent\”: \”1269\”, \”bytes_sent\”: \”1491\”, \”request_time\”: \”0.004\”, \”request_length\”: \”94\”, \”referer\”: \”https://data.c1gstudio.net:31443/admin/login.php?1\”, \”user_agent\”: \”Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36\”, \”x-forward-for\”: \”10.100.3.80\”, \”protocol\”: \”HTTP/2.0\”}}”,
“fields”: {
“source”: “syslog-web-access”
},
“source”: “int-ingress”,
“ecs”: {
“version”: “8.0.0”
},
“hostname”: “master01.k8s.local”,
“host”: {
“name”: “master01.k8s.local”
},
“tags”: [
“web-access”
]
}
```
## The filebeat @timestamp timezone problem
@timestamp is always UTC (8 hours behind local time here) and cannot be changed directly.
Option 1: format another field and use it as a replacement.
Option 2: add a locale field:
```
processors:
  - add_locale: ~
```
https://www.elastic.co/guide/en/beats/filebeat/current/processor-timestamp.html
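A sketch of option 1 using the timestamp processor documented at the link above; the field name and layout are placeholders for whatever your logs actually carry:
```
cat > filebeat-timestamp-snippet.yml <<EOF
processors:
  - add_locale: ~
  - timestamp:
      field: event_time              # hypothetical field holding local time, e.g. "2024-02-20 10:35:41"
      layouts:
        - '2006-01-02 15:04:05'      # Go reference-time layout used by Beats
      timezone: 'Asia/Shanghai'
EOF
```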