k8s_安装10_日志_elk

10. Logging (ELK)

Log collection scope

In day-to-day operation of the cluster, the logs that usually need to be collected fall into the following categories:

Server system logs:

/var/log/messages
/var/log/kube-xxx.log

Kubernetes component logs:

kube-apiserver logs
kube-controller-manager logs
kube-scheduler logs
kubelet logs
kube-proxy logs

Application logs

Cloud-native apps: console (stdout/stderr) logs
Non-cloud-native apps: log files inside the container
Gateway logs (e.g. ingress-nginx)

Call-chain logs between services

Log collection tooling

Common log collection stacks are ELK, EFK and Grafana+Loki.
ELK consists of Elasticsearch, Logstash and Kibana.
EFK consists of Elasticsearch, Fluentd and Kibana.
Filebeat+Kafka+Logstash+ES is another common combination.
With Grafana+Loki, Loki stores and queries the logs, Promtail collects the logs and ships them to Loki, and Grafana displays and queries them.

An ELK pipeline can be laid out in several ways (the components combine freely; pick what fits your workload). Common layouts:

Filebeat / Logstash / Fluentd (collect, process) -> Elasticsearch (store) -> Kibana (display)

Filebeat / Logstash / Fluentd (collect) -> Logstash (aggregate, process) -> Elasticsearch (store) -> Kibana (display)

Filebeat / Logstash / Fluentd (collect) -> Kafka/Redis (buffering, peak shaving) -> Logstash (aggregate, process) -> Elasticsearch (store) -> Kibana (display)

Logstash

Logstash is an open-source tool for collecting, processing and shipping data. It can pull data from many sources (log files, message queues, etc.), filter, parse and transform it, and send the result to a target store such as Elasticsearch.
Pros: a large plugin ecosystem
Cons: performance and resource consumption (the default JVM heap size is 1 GB)
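As a quick illustration of the resource point, the heap can be capped through the LS_JAVA_OPTS environment variable when running the official image; a minimal sketch (the values are examples, not part of this install):

docker run --rm -e LS_JAVA_OPTS="-Xms512m -Xmx512m" docker.elastic.co/logstash/logstash:8.11.0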

Fluentd/FluentBit

Language: Ruby + C
GitHub: https://github.com/fluent/fluentd-kubernetes-daemonset
Docs: https://docs.fluentd.org/
Because Logstash is fairly heavy and somewhat complex to configure, the EFK stack appeared as an alternative. Compared with Logstash in ELK, Fluentd is an all-in-one tool: it can ship the contents of certain log files directly into Elasticsearch, which is then queried through Kibana. In the usual setup, however, Fluentd only collects console logs (what kubectl logs shows), not log files inside containers, so it does not fully meet production needs. Applications that were not written along cloud-native lines tend to produce many log files inside the container; those cannot be collected unless every Pod gets a Sidecar that tail -f's the files into the console log, which is cumbersome.
Also, the Elasticsearch cluster that stores the logs is best not run inside the Kubernetes cluster itself, since that wastes a lot of cluster resources; in most cases Fluentd ships the collected logs to an external Elasticsearch cluster.
Pros: small resource footprint, simple syntax
Cons: no buffering before parsing, which can create back-pressure in the log pipeline; limited support for transforming data compared with Logstash's mutate filter or rsyslog's variables and templates.
It only collects console logs (what kubectl logs shows), not log files inside containers, so it does not fully meet production needs; it depends on Elasticsearch, and maintenance effort and resource usage are on the high side.
Like syslog-ng, its buffering exists only on the output side; the single-threaded core and the Ruby-GIL-bound plugins mean performance is limited on larger nodes.

Fluent-bit

Language: C
A slimmed-down Fluentd
Docs: https://docs.fluentbit.io/manual/about/fluentd-and-fluent-bit

Filebeat

Language: Golang
GitHub: https://github.com/elastic/beats/tree/master/filebeat
Docs: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-getting-started.html
Pros: a single binary with no dependencies; very low CPU and memory usage; can ship to Logstash, Elasticsearch, Kafka and Redis.
Cons: limited parsing and enrichment; a Logstash layer can be added behind it when that matters.
In early ELK architectures Logstash handled both collecting and parsing logs, which costs a lot of memory, CPU and IO, whereas Filebeat's CPU and memory usage is almost negligible.

Because Filebeat is so lightweight, it is often run as a Sidecar inside a Pod to pick up custom log files written by the application. It can just as well be deployed as a DaemonSet to collect system logs and container console logs. The reason Filebeat is deployed as a DaemonSet rather than a Deployment or StatefulSet comes down to the following:

Node-level log collection: Filebeat must be able to reach and read the log files on every node, including system logs and container logs. Deployment and STS are meant for running and managing application Pods, not node-level log collection, so a DaemonSet is the better fit.
Scaling: Deployment and STS manage replica counts and keep the desired number of Pods through failures and horizontal scaling. Filebeat does not need to scale with load or with application replicas; it needs exactly one instance per node, which is what a DaemonSet provides.
High availability: Deployment and STS give applications replica management and failure recovery. Filebeat is a log-collection agent with different recovery needs; a DaemonSet guarantees one running Filebeat instance per node, so log collection keeps going even if the Filebeat Pod on some node fails and is recreated.
Like Fluentd and Logstash, Filebeat can write the collected logs straight into Elasticsearch, and like Logstash it remembers the last read offset. In practice, to make later analysis easier and to take load off Elasticsearch, logs are usually shipped to Kafka first, lightly processed by Logstash, and only then written to Elasticsearch (a minimal Kafka output sketch follows below).
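A minimal sketch of the Filebeat -> Kafka leg as it would appear in filebeat.yml (the broker addresses and topic name are placeholders, not part of this install):

    output.kafka:
      # hypothetical brokers, replace with your own
      hosts: ["kafka-0.kafka:9092", "kafka-1.kafka:9092"]
      topic: "k8s-logs"
      required_acks: 1
      compression: gzip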

LogAgent:

Language: JS
GitHub: https://github.com/sematext/logagent-js
Docs: https://sematext.com/docs/logagent/
Pros: picks up everything under /var/log, parses many formats (Elasticsearch, Solr, MongoDB, Apache HTTPD, etc.), can mask sensitive data, and buffers locally, so unlike Logstash it does not lose logs when the destination is unavailable.
Cons: less flexible than Logstash

logtail:

The log shipper of Alibaba Cloud Log Service. It has run on machines across Alibaba for more than three years and now provides log collection for Alibaba public-cloud users.
Written in C++, with a lot of effort put into stability, resource control and manageability, and it performs well. Compared with the community backing of logstash and fluentd, logtail is narrower in scope and focuses purely on log collection.
Pros:
  logtail has the lowest CPU and memory usage, and the end-to-end experience with Alibaba Cloud Log Service is good.
Cons:
  parsing of specific log formats is still weak and needs to be filled in.

rsyslog

The default syslog daemon on the vast majority of Linux distributions.
Pros: the fastest shipper in the tests cited.
rsyslog suits very light setups (plain apps, small VMs, Docker containers). If further processing is needed in another tool (e.g. Logstash), it can forward JSON directly over TCP or go through a Kafka/Redis buffer.

syslog-ng

Pros: like rsyslog, a lightweight shipper with very good performance.

Grafana Loki

Loki and its ecosystem are an alternative to the ELK stack. Compared with ELK, ingestion is faster: less indexing and no merging.
Pros: small storage footprint (smaller indexes); data is written only once to long-term storage.
Cons: queries and analysis over long time ranges are slower than with ELK, and there are fewer log-shipper options (e.g. Promtail or Fluentd).

ElasticSearch

A healthy ES cluster has exactly one master node, which manages the whole cluster: creating and deleting indices, tracking which nodes are part of the cluster, and deciding which shards go to which nodes. All nodes in the cluster elect the same node as master.

Split-brain:

Split-brain happens when nodes disagree about which node should be master, so one cluster ends up with several masters and splits apart, leaving the cluster in an abnormal state. It typically occurs when the master node also carries the data role: under heavy data traffic the master may stop responding (appear dead).

Avoiding split-brain:

1. Network: raise discovery.zen.ping.timeout (the default is 3s).
2. Node load: separate the roles (dedicated master vs data nodes).
3. JVM garbage collection: set -Xms and -Xmx in config/jvm.options to half of the server's memory.

Example: with 5 master-eligible nodes, one active master and 4 standbys, the quorum ("split-brain factor") is set to 3.

Node types / roles

Before Elasticsearch 7.9 there were four main node types: data node, coordinating node, master-eligible node and ingest node.
From 7.9 onward, node types were generalized into node roles.

An ES cluster is made up of multiple nodes; each node's name is set with node.name.
A node can carry several roles or just one.

1. master node

A node with node.master: true in its configuration is eligible to be elected master.
The master node controls cluster-wide operations such as creating and deleting indices, managing the non-master nodes, and maintaining cluster metadata, node information and index metadata.
node.master: true
node.data: false

2. data node

A node with node.data: true stores the actual data, serves first-stage search and aggregation requests, and can also act as a coordinating node.
It mainly executes data-related operations.
node.master: false
node.data: true

3. client (coordinating-only) node

node.master and node.data are both false (neither master nor data).
It answers client requests and forwards them to the other nodes.
node.master: false
node.data: false

4. tribe node

A node configured with tribe.* is a special client that can connect to multiple clusters and run index and search operations across all of them.

Summary of the other roles
Abbreviation (7.9+)   Role
c                     cold node
d                     data node
f                     frozen node
h                     hot node
i                     ingest node
l                     machine learning node
m                     master-eligible node
r                     remote cluster client node
s                     content node
t                     transform node
v                     voting-only node
w                     warm node
(none)                coordinating node only

Newer versions define roles with node.roles:
node.roles: [data,master]
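A few common combinations, as a sketch (the role lists are illustrative, not from this install):

# dedicated master-eligible node
node.roles: [ master ]
# data + ingest node
node.roles: [ data, ingest ]
# coordinating-only node (empty role list)
node.roles: [ ]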

The relationship between node roles and hardware sizing is a frequent question; a rough guide:

Role               Description                          Resource notes
Data node          Stores and retrieves data            Storage: very high
Master node        Manages cluster state
Ingest node        Transforms incoming data
Machine learning   Machine learning jobs                Memory: very high, compute: very high
Coordinating node  Forwards requests, merges results

Cluster election

ES uses a master/standby architecture: a cluster has only one active management (master) node, the remaining master-eligible nodes are standbys, and their number is in principle unlimited. Many big-data products support only one active plus one standby management node, e.g. Greenplum, Hadoop, Prestodb.
The active master is elected automatically, and when it goes down a new election is triggered automatically, with no third-party component and no manual intervention. Many big-data products need a manual switchover or external software for this, e.g. Greenplum, Hadoop, Prestodb.
discovery.zen.minimum_master_nodes = (master_eligible_nodes / 2) + 1
With 1 active master + 4 candidates this works out to 3.
(Note: this zen-discovery setting only applies up to Elasticsearch 6.x; from 7.0 onward the voting quorum is managed automatically and only cluster.initial_master_nodes is needed for the initial bootstrap.)

conf/elasticsearch.yml:
    discovery.zen.minimum_master_nodes: 3

Coordination and routing

Any node in an Elasticsearch cluster can accept queries or writes; the nodes route and forward each request internally to the node that holds the relevant index shard. Cluster migrations rely on this: an application-side proxy switches external access from the old cluster's data nodes to the new cluster's data nodes.

Finding the master node

http://192.168.111.200:9200/_cat/nodes?v
The node marked with * is the current master
http://192.168.111.200:9200/_cat/master
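Two related _cat endpoints that are handy when checking the election state (same host as above):

curl -s "http://192.168.111.200:9200/_cat/health?v"
curl -s "http://192.168.111.200:9200/_cat/nodes?v&h=name,node.role,master,heap.percent"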

Troubleshooting
Cluster data rebalancing

Elasticsearch has a built-in shard-balancing mechanism: when data nodes join or leave the cluster, shard placement is rebalanced automatically.
Index shards drift between data nodes until they are evenly distributed. Frequent node joins and departures cause heavy cluster IO and hurt response times, so this should be avoided as much as possible; frequently stopping and restarting nodes easily leads to cluster problems.

#Disable these before a cluster migration, re-enable afterwards
#Disable shard allocation for new indices in the cluster
cluster.routing.allocation.enable: none
#Disable automatic rebalancing
cluster.routing.rebalance.enable: none
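The same switches can also be flipped at runtime through the cluster settings API instead of elasticsearch.yml, which avoids a restart; a sketch (the host is a placeholder):

curl -X PUT "http://<es-host>:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.rebalance.enable": "none",
    "cluster.routing.allocation.enable": "primaries"
  }
}'
# set both back to "all" after the migration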

Enable the ES slow query log
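Slow logs are per-index settings; a minimal sketch of enabling them on one index (the index name and thresholds are examples):

curl -X PUT "http://<es-host>:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "2s"
}'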

Switching cluster access

Hadoop

The Hadoop platform writes offline data into ES and pulls data back out of it; Elastic provides a direct Hadoop connector. Hive, for example, associates mapping tables with Elasticsearch indices, so after new data nodes come up, every existing Hive-ES mapping table has to be recreated with the new IP+PORT. Because Hive has many tables tied to Elastic, the switch cannot be completed quickly, so old and new data nodes have to coexist for a while and the old ones cannot be shut down right after the data migration finishes.

#Hive connection setting
es.nodes=<IP+PORT list of the data nodes>

Real-time queries from business applications

The Elastic cluster is exposed externally through a proxy.

Data ingestion

Kafka queue

Installation

ELK data flow
Data collected by Beats can either be pushed straight to Elasticsearch for indexing, or sent to Logstash for processing first and then on to Elasticsearch; in both cases Kibana is used for visualization.

Prepare the images

docker pull docker.io/fluent/fluentd-kubernetes-daemonset:v1.16.2-debian-elasticsearch8-amd64-1.1
wget https://raw.githubusercontent.com/fluent/fluentd-kubernetes-daemonset/master/fluentd-daemonset-elasticsearch.yaml

Check the available elasticsearch versions on hub.docker.com, e.g.:
elasticsearch 7.17.14
elasticsearch 8.11.0

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker pull docker.elastic.co/kibana/kibana:8.11.0
docker pull docker.elastic.co/logstash/logstash:8.11.0
docker pull docker.elastic.co/beats/filebeat:8.11.0

docker tag docker.elastic.co/elasticsearch/elasticsearch:8.11.0 repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker tag docker.elastic.co/kibana/kibana:8.11.0 repo.k8s.local/docker.elastic.co/kibana/kibana:8.11.0
docker tag docker.elastic.co/beats/filebeat:8.11.0 repo.k8s.local/docker.elastic.co/beats/filebeat:8.11.0
docker tag docker.elastic.co/logstash/logstash:8.11.0 repo.k8s.local/docker.elastic.co/logstash/logstash:8.11.0

docker push repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker push repo.k8s.local/docker.elastic.co/kibana/kibana:8.11.0
docker push repo.k8s.local/docker.elastic.co/beats/filebeat:8.11.0
docker push repo.k8s.local/docker.elastic.co/logstash/logstash:8.11.0

docker rmi docker.elastic.co/elasticsearch/elasticsearch:8.11.0
docker rmi docker.elastic.co/kibana/kibana:8.11.0
docker rmi docker.elastic.co/beats/filebeat:8.11.0
docker rmi docker.elastic.co/logstash/logstash:8.11.0

Deploying elasticsearch + kibana

node.name sets the node name; use the Pod's metadata.name, which must resolve in DNS.
cluster.initial_master_nodes is the metadata.name plus an ordinal, starting from 0.
elasticsearch configuration file:

cat > log-es-elasticsearch.yml <<EOF
cluster.name: log-es
node.name: "log-es-elastic-sts-0"
path.data: /usr/share/elasticsearch/data
#path.logs: /var/log/elasticsearch
bootstrap.memory_lock: false
network.host: 0.0.0.0
http.port: 9200
#transport.tcp.port: 9300
#discovery.seed_hosts: ["127.0.0.1", "[::1]"]
cluster.initial_master_nodes: ["log-es-elastic-sts-0"]
xpack.security.enabled: "false"
xpack.security.transport.ssl.enabled: "false"
#Extra settings so the elasticsearch-head plugin can access ES
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization,X-Requested-With,Content-Length,Content-Type
EOF

kibana configuration file:
Pods managed by a StatefulSet get ordered, stable names; when a Pod is deleted, the automatically recreated Pod keeps the same name.
A StatefulSet must reference a Service by name; if that Service has no cluster IP, DNS resolution returns the per-Pod domain names.
A StatefulSet can use volumeClaimTemplates so every Pod gets its own independent volume, unaffected by the others.
Pods created by a StatefulSet have their own stable domain names, so they can be addressed by name even though the IP changes (the name is built as pod-name.service-name.namespace.svc.cluster.local).

cat > log-es-kibana.yml <<EOF
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: "http://localhost:9200"
i18n.locale: "zh-CN"
EOF

Create the Kubernetes namespace

kubectl create namespace log-es
kubectl get namespace log-es
kubectl describe namespace  log-es

Create the ConfigMaps holding the elasticsearch and kibana configuration:

1. A ConfigMap passes configuration into Pods in plain text. Its size is limited to 1 MiB.

2. Files, directories and literals can all be turned into a ConfigMap and consumed inside the Pod through env vars or volumes.

3. It can be updated online without rebuilding the image or restarting the container, although the update takes a while to propagate.
When mounted with subPath, a ConfigMap update is not reflected in the container.

kubectl create configmap log-es-elastic-config -n log-es --from-file=log-es-elasticsearch.yml
kubectl create configmap log-es-kibana-config -n log-es --from-file=log-es-kibana.yml

Update method 1
#kubectl create configmap log-es-kibana-config --from-file log-es-kibana.yml -o yaml --dry-run=client | kubectl apply -f -
#kubectl get cm log-es-kibana-config -n log-es -o yaml > log-es-kibana.yaml  && kubectl replace -f log-es-kibana.yaml -n log-es
#did not work in my tests

Update method 2
kubectl edit configmap log-es-elastic-config -n log-es
kubectl edit configmap log-es-kibana-config -n log-es

List
kubectl get configmap  -n log-es 

Delete
kubectl delete cm log-es-elastic-config -n log-es
kubectl delete cm log-es-kibana-config -n log-es

kibana

Kibana here is a single, stable (stateful) node, so no load balancing is needed and a headless Service is enough.
Such a Service gets no VIP when created; instead it exposes the Pods it fronts as DNS records:

<pod-name>.<svc-name>.<namespace>.svc.cluster.local  
$(podname)-$(ordinal).$(servicename).$(namespace).svc.cluster.local  
log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local  
cat > log-es-kibana-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: log-es-svc
  name: es-kibana-svc
  namespace: log-es
spec:
  ports:
  - name: 9200-9200
    port: 9200
    protocol: TCP
    targetPort: 9200
    nodePort: 9200
  - name: 5601-5601
    port: 5601
    protocol: TCP
    targetPort: 5601
    nodePort: 5601
  #clusterIP: None 
  selector:
    app: log-es-elastic-sts
  type: NodePort
  #type: ClusterIP
EOF

Create the StatefulSet manifest for es-kibana:

cat > log-es-kibana-sts.yaml <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: log-es-elastic-sts
  name: log-es-elastic-sts
  namespace: log-es
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-es-elastic-sts
  serviceName: "es-kibana-svc"  #关联svc名称
  template:
    metadata:
      labels:
        app: log-es-elastic-sts
    spec:
      #imagePullSecrets:
      #- name: registry-pull-secret
      containers:
      - name: log-es-elasticsearch
        image: repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
        imagePullPolicy: IfNotPresent
#        lifecycle:
#          postStart:
#            exec:
#              command: [ "/bin/bash", "-c",  touch /tmp/start" ] #sysctl -w vm.max_map_count=262144;ulimit -HSn 65535; 请在宿主机设定
        #command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        #command: [ "/bin/bash", "-c", "--" ]
        #args: [ "while true; do sleep 30; done;" ]
        #command: [ "/bin/bash", "-c","ulimit -HSn 65535;" ]
        resources:
          requests:
            memory: "800Mi"
            cpu: "800m"
          limits:
            memory: "1.2Gi"
            cpu: "2000m"
        ports:
        - containerPort: 9200
        - containerPort: 9300
        volumeMounts:
        - name: log-es-elastic-config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: log-es-elasticsearch.yml  #对应configmap log-es-elastic-config 中文件名称
        - name: log-es-persistent-storage
          mountPath: /usr/share/elasticsearch/data
        env:
        - name: TZ
          value: Asia/Shanghai
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
      - image: repo.k8s.local/docker.elastic.co/kibana/kibana:8.11.0
        imagePullPolicy: IfNotPresent
        #command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
        name: log-es-kibana
        ports:
        - containerPort: 5601
        env:
        - name: TZ
          value: Asia/Shanghai
        volumeMounts:
        - name: log-es-kibana-config
          mountPath: /usr/share/kibana/config/kibana.yml
          subPath: log-es-kibana.yml   #对应configmap log-es-kibana-config 中文件名称
      volumes:
      - name: log-es-elastic-config
        configMap:
          name: log-es-elastic-config
      - name: log-es-kibana-config
        configMap:
          name: log-es-kibana-config
      - name: log-es-persistent-storage
        hostPath:
          path: /localdata/es/data
          type: DirectoryOrCreate
      #hostNetwork: true
      #dnsPolicy: ClusterFirstWithHostNet
      nodeSelector:
         kubernetes.io/hostname: node02.k8s.local
EOF

Standalone debug manifest for troubleshooting the ulimit failure

cat > log-es-kibana-sts.yaml <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: log-es-elastic-sts
  name: log-es-elastic-sts
  namespace: log-es
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-es-elastic-sts
  serviceName: "es-kibana-svc"  #关联svc名称
  template:
    metadata:
      labels:
        app: log-es-elastic-sts
    spec:
      #imagePullSecrets:
      #- name: registry-pull-secret
#      initContainers:        # 初始化容器
#      - name: init-vm-max-map
#        image: repo.k8s.local/google_containers/busybox:9.9 
#        imagePullPolicy: IfNotPresent
#        command: ["sysctl","-w","vm.max_map_count=262144"]
#        securityContext:
#          privileged: true
#      - name: init-fd-ulimit
#        image: repo.k8s.local/google_containers/busybox:9.9 
#        imagePullPolicy: IfNotPresent
#        command: ["sh","-c","ulimit -HSn 65535;ulimit -n >/tmp/index/init.log"]
#        securityContext:
#          privileged: true
#        volumeMounts:
#        - name: init-test
#          mountPath: /tmp/index
#        terminationMessagePath: /dev/termination-log
#        terminationMessagePolicy: File
      containers:
      - name: log-es-elasticsearch
        image: repo.k8s.local/docker.elastic.co/elasticsearch/elasticsearch:8.11.0
        imagePullPolicy: IfNotPresent
#        securityContext:
#          privileged: true
#          capabilities:
#          add: ["SYS_RESOURCE"]
#        lifecycle:
#          postStart:
#            exec:
#              command: [ "/bin/bash", "-c", "sysctl -w vm.max_map_count=262144; ulimit -l unlimited;echo 'Container started';" ]
#        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
        resources:
          requests:
            memory: "800Mi"
            cpu: "800m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        ports:
        - containerPort: 9200
        - containerPort: 9300
        volumeMounts:
#        - name: ulimit-config
#          mountPath: /etc/security/limits.conf
#          #readOnly: true
#          #subPath: limits.conf
        - name: log-es-elastic-config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          subPath: log-es-elasticsearch.yml  #对应configmap log-es-elastic-config 中文件名称
        - name: log-es-persistent-storage
          mountPath: /usr/share/elasticsearch/data
#        - name: init-test
#          mountPath: /tmp/index
        env:
        - name: TZ
          value: Asia/Shanghai
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
      volumes:
#      - name: init-test
#        emptyDir: {}
      - name: log-es-elastic-config
        configMap:
          name: log-es-elastic-config
      - name: log-es-persistent-storage
        hostPath:
          path: /localdata/es/data
          type: DirectoryOrCreate
#      - name: ulimit-config
#        hostPath:
#          path: /etc/security/limits.conf 
      #hostNetwork: true
      #dnsPolicy: ClusterFirstWithHostNet
      nodeSelector:
         kubernetes.io/hostname: node02.k8s.local
EOF

elastic

Elasticsearch data directory
On node02; the Pod runs as uid 1000 by default.

mkdir -p /localdata/es/data
chmod 777 /localdata/es/data
chown 1000:1000 /localdata/es/data

Create the filebeat registry directory on every node, including the master.
When Filebeat runs as a DaemonSet, /usr/share/filebeat/data must be mounted from the host: it holds the registry file recording where Filebeat last read each log (file offset, source, timestamp, ...). If the Pod crashes and Kubernetes restarts it without this mount, the registry is reset, every file is read again from offset 0, and the logs end up duplicated in ES. This point matters a lot.
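To see what the registry currently tracks, it can be inspected on the node; the exact file layout depends on the Filebeat version, but in 8.x it is roughly:

ls /localdata/filebeat/data/registry/filebeat/
head /localdata/filebeat/data/registry/filebeat/log.json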

mkdir -p /localdata/filebeat/data
chown 1000:1000 /localdata/filebeat/data
chmod 777 /localdata/filebeat/data
kubectl apply -f log-es-kibana-sts.yaml
kubectl delete -f log-es-kibana-sts.yaml
kubectl apply -f log-es-kibana-svc.yaml
kubectl delete -f log-es-kibana-svc.yaml

kubectl get pods -o wide -n log-es
NAME               READY   STATUS              RESTARTS   AGE   IP       NODE               NOMINATED NODE   READINESS GATES
log-es-elastic-sts-0   0/2     ContainerCreating   0          66s   <none>   node02.k8s.local   <none>           <none>

#Inspect the pod and tail the container logs
kubectl -n log-es describe pod log-es-elastic-sts-0 
kubectl -n log-es logs -f log-es-elastic-sts-0 -c log-es-elasticsearch
kubectl -n log-es logs -f log-es-elastic-sts-0 -c log-es-kibana

kubectl -n log-es logs log-es-elastic-sts-0 -c log-es-elasticsearch
kubectl -n log-es logs log-es-elastic-sts-0 -c log-es-kibana

kubectl -n log-es logs -f --tail=20 log-es-elastic-sts-0 -c log-es-elasticsearch

kubectl exec -it log-es-elasticsearch -n log-es -- /bin/sh 
Exec into a specific container of the pod
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh 
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-kibana  -- /bin/sh 

kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'cat /etc/security/limits.conf'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'ulimit -HSn 65535'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'ulimit -n'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'cat /etc/security/limits.d/20-nproc.conf'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'cat /etc/pam.d/login|grep  pam_limits'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'ls /etc/pam.d/sshd'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'cat /etc/profile'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c '/usr/share/elasticsearch/bin/elasticsearch'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'ls /tmp/'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'cat /dev/termination-log'

kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-elasticsearch -- /bin/sh  -c 'ps -aux'
kubectl exec -n log-es -it log-es-elastic-sts-0 -c log-es-kibana -- /bin/sh  -c 'ps -aux'

Main reasons the startup fails

Not enough file descriptors
ulimit -HSn 65535
Max virtual memory too small
sysctl -w vm.max_map_count=262144
Wrong permissions on the data directory
/usr/share/elasticsearch/data
xpack security mis-configured
xpack.security.enabled: "false"

For debugging, keep the container alive so it can be entered without exiting:

        #command: [ "/bin/bash", "-c", "--" ]
        #args: [ "while true; do sleep 30; done;" ]

Check which user the process runs as
id
uid=1000(elasticsearch) gid=1000(elasticsearch) groups=1000(elasticsearch),0(root)

Check data-directory permissions
touch /usr/share/elasticsearch/data/test
ls /usr/share/elasticsearch/data/

Test-start Elasticsearch by hand
/usr/share/elasticsearch/bin/elasticsearch

{"@timestamp":"2023-11-09T05:20:55.298Z", "log.level":"ERROR", "message":"node validation exception\n[2] bootstrap checks failed. You must address the points described in the following [2] lines before starting Elasticsearch. For more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.11/bootstrap-checks.html]\nbootstrap check failure [1] of [2]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]; for more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.11/_file_descriptor_check.html]\nbootstrap check failure [2] of [2]: Transport SSL must be enabled if security is enabled. Please set [xpack.security.transport.ssl.enabled] to [true] or disable security by setting [xpack.security.enabled] to [false]; for more information see [https://www.elastic.co/guide/en/elasticsearch/reference/8.11/bootstrap-checks-xpack.html#bootstrap-checks-tls]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"node-1","elasticsearch.cluster.name":"log-es"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/log-es.log

When starting elasticsearch: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least ...

sysctl: setting key "vm.max_map_count", ignoring: Read-only file system

ulimit: max locked memory: cannot modify limit: Operation not permitted

Error
master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes
No master node was discovered.
Add the master-eligible node names in /etc/elasticsearch/elasticsearch.yml, e.g.: cluster.initial_master_nodes: ["master","node"]

  1. Find the pod's CONTAINER id
    Run on the node hosting the pod:

    crictl ps
    CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
    d3f8f373be7cb       7f03b6ec0c6f6       About an hour ago   Running             log-es-elasticsearch                   0                   cd3cebe50807f       log-es-elastic-sts-0
  2. Find the container's pid

    crictl inspect d3f8f373be7cb |grep -i pid
    "pid": 8420,
            "pid": 1
            "type": "pid"
  3. Run a command inside the container's namespaces from the host

    nsenter -t 8420 -n hostname
    node02.k8s.local

cat /proc/8420/limits |grep "open files"
Max open files 4096 4096 files


References
https://imroc.cc/kubernetes/trick/deploy/set-sysctl/
https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
Fix:
Running these in postStart had no effect, so they are applied on the host instead and picked up by the container.
Run on node02:

sysctl -w vm.max_map_count=262144

sysctl -a|grep vm.max_map_count

cat /etc/security/limits.conf

cat > /etc/security/limits.d/20-nofile.conf <<EOF
root soft nofile 65535
root hard nofile 65535
* soft nofile 65535
* hard nofile 65535
EOF

cat > /etc/security/limits.d/20-nproc.conf <<EOF
* soft nproc 65535
root soft nproc unlimited
root hard nproc unlimited
EOF

In CentOS 7 the file is /etc/security/limits.d/20-nproc.conf; in CentOS 6 it is /etc/security/limits.d/90-nproc.conf.

echo "* soft nofile 65535" >> /etc/security/limits.conf
echo "* hard nofile 65535" >> /etc/security/limits.conf
echo "andychu soft nofile 65535" >> /etc/security/limits.conf
echo "andychu hard nofile 65535" >> /etc/security/limits.conf
echo "ulimit -HSn 65535" >> /etc/rc.local

ulimit -a
sysctl -p

systemctl show sshd |grep LimitNOFILE

cat /etc/systemd/system.conf|grep DefaultLimitNOFILE
sed -n 's/#DefaultLimitNOFILE=/DefaultLimitNOFILE=65535/p' /etc/systemd/system.conf
sed -i 's/^#DefaultLimitNOFILE=/DefaultLimitNOFILE=65535/' /etc/systemd/system.conf

systemctl daemon-reexec

systemctl restart containerd
systemctl restart kubelet

crictl inspect c30a814bcf048 |grep -i pid

cat /proc/53657/limits |grep "open files"
Max open files 65335 65335 files

kubectl get pods -o wide -n log-es
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
log-es-elastic-sts-0 2/2 Running 0 2m19s 10.244.2.131 node02.k8s.local <none> <none>

[root@node01 nginx]# curl http://10.244.2.131:9200
{
"name" : "node-1",
"cluster_name" : "log-es",
"cluster_uuid" : "Agfoz8qmS3qob_R6bp2cAw",
"version" : {
"number" : "8.11.0",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "d9ec3fa628c7b0ba3d25692e277ba26814820b20",
"build_date" : "2023-11-04T10:04:57.184859352Z",
"build_snapshot" : false,
"lucene_version" : "9.8.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}

kubectl get pod -n log-es
kubectl get pod -n test

查看service

kubectl get service -n log-es
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
es-kibana-svc ClusterIP None <none> 9200/TCP,5601/TCP 53s

kubectl apply -f log-es-kibana-svc.yaml
kubectl delete -f log-es-kibana-svc.yaml

kubectl exec -it pod/test-pod-1 -n test -- ping www.c1gstudio.com
kubectl exec -it pod/test-pod-1 -n test -- ping svc-openresty.test
kubectl exec -it pod/test-pod-1 -n test -- nslookup log-es-elastic-sts-0.es-kibana-svc.log-es
kubectl exec -it pod/test-pod-1 -n test -- ping log-es-elastic-sts-0.es-kibana-svc.log-es

kubectl exec -it pod/test-pod-1 -n test -- curl http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200
kubectl exec -it pod/test-pod-1 -n test -- curl -L http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:5601

cat > log-es-kibana-svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: log-es-svc
  name: es-kibana-svc
  namespace: log-es
spec:
  ports:
  - name: 9200-9200
    port: 9200
    protocol: TCP
    targetPort: 9200
    #nodePort: 9200
  - name: 5601-5601
    port: 5601
    protocol: TCP
    targetPort: 5601
    #nodePort: 5601
  #clusterIP: None
  selector:
    app: log-es-elastic-sts
  type: NodePort
  #type: ClusterIP
EOF

kubectl get service -n log-es
NAME            TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
es-kibana-svc   NodePort   10.96.128.50   <none>        9200:30118/TCP,5601:31838/TCP   16m

Access via node IP + NodePort; in this run the Kibana port is 31838
curl -L http://192.168.244.7:31838
curl -L http://10.96.128.50:5601

After external NAT forwarding:
http://127.0.0.1:5601/

ingress

cat > log-es-kibana-ingress.yaml  << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-kibana
  namespace: log-es
  labels:
    app.kubernetes.io/name: nginx-ingress
    app.kubernetes.io/part-of: kibana
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  ingressClassName: nginx
  rules:
  - host: kibana.k8s.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: es-kibana-svc
            port:
              number: 5601
EOF             

kubectl apply -f log-es-kibana-ingress.yaml

kubectl get ingress -n log-es
curl -L -H "Host:kibana.k8s.local" http://10.96.128.50:5601

filebeat

#https://www.elastic.co/guide/en/beats/filebeat/8.11/drop-fields.html
#https://raw.githubusercontent.com/elastic/beats/7.9/deploy/kubernetes/filebeat-kubernetes.yaml
cat > log-es-filebeat-configmap.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-es-filebeat-config
  namespace: log-es
data:
  filebeat.yml: |-
    filebeat.inputs:
      - type: container
        containers.ids:
          - "*"
        id: 99
        enabled: true
        tail_files: true
        paths:
          - /var/log/containers/*.log
          #- /var/lib/docker/containers/*/*.log
          #- /var/log/pods/*/*/*.log   
        processors:
        - add_kubernetes_metadata:
            in_cluster: true
            matchers:
              - logs_path:
                  logs_path: "/var/log/containers/"     
        fields_under_root: true
        exclude_files: ['\.gz$']
        tags: ["k8s"]  
        fields:
          source: "container"     
      - type: filestream
        id: 100
        enabled: true
        tail_files: true
        paths:
          - /var/log/nginx/access*.log
        processors:
          - decode_json_fields:
              fields: [ 'message' ]
              target: "" # 指定日志字段message,头部以json标注,如果不要json标注则设置为空如:target: ""
              overwrite_keys: false # 默认情况下,解码后的 JSON 位于输出文档中的“json”键下。如果启用此设置,则键将在输出文档中的顶层复制。默认值为 false
              process_array: false
              max_depth: 1
          - drop_fields: 
              fields: ["agent","ecs.version"]
              ignore_missing: true
        fields_under_root: true
        tags: ["ingress-nginx-access"]
        fields:
          source: "ingress-nginx-access"          
      - type: filestream
        id: 101
        enabled: true
        tail_files: true
        paths:
          - /var/log/nginx/error.log
        close_inactive: 5m
        ignore_older: 24h
        clean_inactive: 96h
        clean_removed: true
        fields_under_root: true
        tags: ["ingress-nginx-error"]   
        fields:
          source: "ingress-nginx-error"             
      - type: filestream
        id: 102
        enabled: true
        tail_files: true
        paths:
          - /nginx/logs/*.log
        exclude_files: ['\.gz$','error.log']
        close_inactive: 5m
        ignore_older: 24h
        clean_inactive: 96h
        clean_removed: true
        fields_under_root: true
        tags: ["web-log"]  
        fields:
          source: "nginx-access"            
    output.logstash:
      hosts: ["logstash.log-es.svc.cluster.local:5044"]                           
    #output.elasticsearch:
      #hosts: ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
      #index: "log-%{[fields.tags]}-%{+yyyy.MM.dd}"
      #indices:
        #- index: "log-ingress-nginx-access-%{+yyyy.MM.dd}"
          #when.contains:
            #tags: "ingress-nginx-access"
        #- index: "log-ingress-nginx-error-%{+yyyy.MM.dd}"
          #when.contains:
            #tags: "ingress-nginx-error" 
        #- index: "log-web-log-%{+yyyy.MM.dd}"
          #when.contains:
            #tags: "web-log" 
        #- index: "log-k8s-%{+yyyy.MM.dd}"
          #when.contains:
            #tags: "k8s"                                
    json.keys_under_root: true # 默认情况下,解码后的 JSON 位于输出文档中的“json”键下。如果启用此设置,则键将在输出文档中的顶层复制。默认值为 false
    json.overwrite_keys: true # 如果启用了此设置,则解码的 JSON 对象中的值将覆盖 Filebeat 在发生冲突时通常添加的字段(类型、源、偏移量等)
    setup.template.enabled: false  #false不使用默认的filebeat-%{[agent.version]}-%{+yyyy.MM.dd}索引
    setup.template.overwrite: true #开启新设置的模板
    setup.template.name: "log" #设置一个新的模板,模板的名称
    setup.template.pattern: "log-*" #模板匹配那些索引
    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false
    #setup.template.settings:
    #  index.number_of_shards: 1
    #  index.number_of_replicas: 1
    setup.ilm.enabled: false # 修改索引名称,要关闭索引生命周期管理ilm功能
    logging.level: warning #debug、info、warning、error
    logging.to_syslog: false
    logging.metrics.period: 300s
    logging.to_files: true
    logging.files:
      path: /tmp/
      name: "filebeat.log"
      rotateeverybytes: 10485760
      keepfiles: 7
EOF
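Once the DaemonSet below is running, the rendered configuration and the Logstash output can be sanity-checked from inside any filebeat pod (the pod name is a placeholder):

kubectl exec -n log-es -it filebeat-xxxxx -- filebeat test config -c /config/filebeat.yml
kubectl exec -n log-es -it filebeat-xxxxx -- filebeat test output -c /config/filebeat.yml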
cat > log-es-filebeat-daemonset.yaml <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: log-es
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      serviceAccount: filebeat
      containers:
      - name: filebeat
        image: repo.k8s.local/docker.elastic.co/beats/filebeat:8.11.0
        imagePullPolicy: IfNotPresent
        env:
        - name: TZ
          value: "Asia/Shanghai"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi              
        volumeMounts:
        - name: filebeat-config
          readOnly: true
          mountPath: /config/filebeat.yml    # Filebeat 配置
          subPath: filebeat.yml
        - name: fb-data
          mountPath: /usr/share/filebeat/data         
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          readOnly: true          
          mountPath: /var/lib/docker/containers
        - name: varlogingress
          readOnly: true          
          mountPath: /var/log/nginx      
        - name: varlogweb
          readOnly: true          
          mountPath: /nginx/logs              
        args:
        - -c
        - /config/filebeat.yml
      volumes:
      - name: fb-data
        hostPath:
          path: /localdata/filebeat/data 
          type: DirectoryOrCreate    
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: varlogingress
        hostPath:
          path: /var/log/nginx
      - name: varlogweb
        hostPath:
          path: /nginx/logs      
      - name: filebeat-config
        configMap:
          name: log-es-filebeat-config
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
EOF
kubectl apply -f log-es-filebeat-configmap.yaml
kubectl delete -f log-es-filebeat-configmap.yaml
kubectl apply -f log-es-filebeat-daemonset.yaml
kubectl delete -f log-es-filebeat-daemonset.yaml 
kubectl get pod -n log-es  -o wide 
kubectl get cm -n log-es
kubectl get ds -n log-es
kubectl edit configmap log-es-filebeat-config -n log-es
kubectl get service -n log-es

#After updating the configmap the pods must be restarted manually
kubectl rollout restart  ds/filebeat -n log-es
kubectl patch  ds filebeat  -n log-es   --patch '{"spec": {"template": {"metadata": {"annotations": {"version/config": "202311141" }}}}}'

#Restart ES
kubectl rollout restart  sts/log-es-elastic-sts -n log-es
#Delete the configmaps
kubectl delete cm filebeat -n log-es
kubectl delete cm log-es-filebeat-config -n log-es

#Show details
kubectl -n log-es describe pod filebeat-mdldl
kubectl -n log-es logs -f filebeat-4kgpl

#Look inside the pods
kubectl exec -n log-es -it filebeat-hgpnl  -- /bin/sh 
kubectl exec -n log-es -it filebeat-q69f5   -- /bin/sh  -c 'ps aux'
kubectl exec -n log-es -it filebeat-wx4x2  -- /bin/sh  -c 'cat  /config/filebeat.yml'
kubectl exec -n log-es -it filebeat-4j2qd -- /bin/sh  -c 'cat  /tmp/filebeat*'
kubectl exec -n log-es -it filebeat-9qx6f -- /bin/sh  -c 'cat  /tmp/filebeat*'
kubectl exec -n log-es -it filebeat-9qx6f -- /bin/sh  -c 'ls /usr/share/filebeat/data'
kubectl exec -n log-es -it filebeat-9qx6f -- /bin/sh  -c 'filebeat modules list'

kubectl exec -n log-es -it filebeat-kmrcc  -- /bin/sh -c 'curl http://localhost:5066/?pretty'

kubectl exec -n log-es -it filebeat-hqz9b  -- /bin/sh -c 'curl http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200/_cat/indices?v'
curl -XGET 'http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200/_cat/indices?v'

curl http://10.96.128.50:9200/_cat/indices?v

Error: serviceaccount permissions
Failed to watch v1.Node: failed to list v1.Node: nodes "node02.k8s.local" is forbidden: User "system:serviceaccount:log-es:default" cannot list resource "nodes" in API group "" at the cluster scope
When a namespace is created, a serviceaccount named default is created in it automatically. This error means the pod is running under that default serviceaccount, which has no permission to read those resources from the Kubernetes API. Check with:
kubectl get sa -n log-es
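The missing permission can also be confirmed directly with kubectl auth can-i, impersonating the serviceaccount (before and after the RBAC below is applied):

kubectl auth can-i list nodes --as=system:serviceaccount:log-es:default
kubectl auth can-i list nodes --as=system:serviceaccount:log-es:filebeat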

Workaround
Create a tiller-ServiceAccount.yaml and apply it with kubectl apply -f tiller-ServiceAccount.yaml to create the account and bind it to a role; replace the namespace with your own (here it is log-es).

vi tiller-ServiceAccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: log-es
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: log-es

kubectl apply -f tiller-ServiceAccount.yaml
Attach it to the running pods (and update the yaml as well):
kubectl patch ds --namespace log-es filebeat -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

vi filebeat-ServiceAccount.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: log-es
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: log-es
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: log-es
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: log-es
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: log-es
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
    - replicasets
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: log-es
  labels:
    k8s-app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: log-es
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: log-es
  labels:
    k8s-app: filebeat
kubectl apply -f filebeat-ServiceAccount.yaml 
kubectl get sa -n log-es

The nginx module is awkward to use
Enable the module in filebeat.yml:

    filebeat.modules:
      - module: nginx

This fails with:
Exiting: module nginx is configured but has no enabled filesets
The module file also has to be enabled/renamed:

./filebeat modules enable nginx

./filebeat --modules nginx
and the configuration has to live in modules.d/nginx.yml.

Filebeat outputs
  1. Elasticsearch Output   (ship collected data to ES; present in the default config, also documented on the website)
  2. Logstash Output        (ship collected data to Logstash; present in the default config, also documented on the website)
  3. Redis Output           (ship collected data to Redis; not in the default config, see the website)
  4. File Output            (ship collected data to a file; present in the default config, also documented on the website)
  5. Console Output         (ship collected data to the console; present in the default config, also documented on the website)

https://www.elastic.co/guide/en/beats/filebeat/8.12/configuring-howto-filebeat.html
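For quick debugging it can be simpler to dump events to stdout instead of Logstash; a sketch of what that toggle would look like in the Filebeat ConfigMap above (Beats only allow one enabled output at a time):

    #output.logstash:
    #  hosts: ["logstash.log-es.svc.cluster.local:5044"]
    output.console:
      pretty: true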

### Add Logstash to clean the collected raw logs as the business requires
vi log-es-logstash-deploy.yaml

apiVersion: apps/v1
#kind: DaemonSet
#kind: StatefulSet
kind: Deployment
metadata:
  name: logstash
  namespace: log-es
  labels:
    app: logstash
spec:
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      terminationGracePeriodSeconds: 30
      #hostNetwork: true
      #dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: logstash
        ports:
        - containerPort: 5044
          name: logstash
        command:
        - logstash
        - '-f'
        - '/etc/logstash_c/logstash.conf'
        image: repo.k8s.local/docker.elastic.co/logstash/logstash:8.11.0
        env:
        - name: TZ
          value: "Asia/Shanghai"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: config-volume
          mountPath: /etc/logstash_c/
        - name: config-yml-volume
          mountPath: /usr/share/logstash/config/
        resources: #Always set resource limits on Logstash so it cannot starve other workloads
          limits:
            cpu: 1000m
            memory: 2048Mi
          requests:
            cpu: 512m
            memory: 512Mi
      volumes:
      - name: config-volume
        configMap:
          name: logstash-conf
          items:
          - key: logstash.conf
            path: logstash.conf
      - name: config-yml-volume
        configMap:
          name: logstash-yml
          items:
          - key: logstash.yml
            path: logstash.yml
      nodeSelector:
        ingresstype: ingress-nginx
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

vi log-es-logstash-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: logstash
  annotations:
  labels:
    app: logstash
  namespace: log-es
spec:
  #type: NodePort
  type: ClusterIP
  ports:
  - name: http
    port: 5044
    #nodePort: 30044
    protocol: TCP
    targetPort: 5044
  clusterIP: None
  selector:
    app: logstash


vi log-es-logstash-ConfigMap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-conf
  namespace: log-es
  labels:
    app: logstash
data:
  logstash.conf: |-
input {
  beats {
    port => 5044
  }
}
filter{
  if [agent][type] == "filebeat" {
    mutate{
      remove_field => "[agent]"
      remove_field => "[ecs]"
      remove_field => "[log][offset]"
    }
  }
  if [input][type] == "container" {
    mutate{
      remove_field => "[kubernetes][node][hostname]"
      remove_field => "[kubernetes][labels]"
      remove_field => "[kubernetes][namespace_labels]"
      remove_field => "[kubernetes][node][labels]"
    }
  }
  # Handle ingress-nginx logs
  if [kubernetes][container][name] == "nginx-ingress-controller" {
    json {
      source => "message"
      target => "ingress_log"
    }
    if [ingress_log][requesttime] {
        mutate {
        convert => ["[ingress_log][requesttime]", "float"]
        }
    }
    if [ingress_log][upstremtime] {
        mutate {
        convert => ["[ingress_log][upstremtime]", "float"]
        }
    }
    if [ingress_log][status] {
        mutate {
        convert => ["[ingress_log][status]", "float"]
        }
    }
    if  [ingress_log][httphost] and [ingress_log][uri] {
        mutate {
          add_field => {"[ingress_log][entry]" => "%{[ingress_log][httphost]}%{[ingress_log][uri]}"}
        }
        mutate{
          split => ["[ingress_log][entry]","/"]
        }
        if [ingress_log][entry][1] {
          mutate{
          add_field => {"[ingress_log][entrypoint]" => "%{[ingress_log][entry][0]}/%{[ingress_log][entry][1]}"}
          remove_field => "[ingress_log][entry]"
          }
        }
        else{
          mutate{
          add_field => {"[ingress_log][entrypoint]" => "%{[ingress_log][entry][0]}/"}
          remove_field => "[ingress_log][entry]"
          }
        }
    }
  }
  # Handle logs from business services whose container name starts with srv
  if [kubernetes][container][name] =~ /^srv*/ {
    json {
      source => "message"
      target => "tmp"
    }
    if [kubernetes][namespace] == "kube-system" {
      drop{}
    }
    if [tmp][level] {
      mutate{
        add_field => {"[applog][level]" => "%{[tmp][level]}"}
      }
      if [applog][level] == "debug"{
        drop{}
      }
    }
    if [tmp][msg]{
      mutate{
        add_field => {"[applog][msg]" => "%{[tmp][msg]}"}
      }
    }
    if [tmp][func]{
      mutate{
      add_field => {"[applog][func]" => "%{[tmp][func]}"}
      }
    }
    if [tmp][cost]{
      if "ms" in [tmp][cost]{
        mutate{
          split => ["[tmp][cost]","m"]
          add_field => {"[applog][cost]" => "%{[tmp][cost][0]}"}
          convert => ["[applog][cost]", "float"]
        }
      }
      else{
        mutate{
        add_field => {"[applog][cost]" => "%{[tmp][cost]}"}
        }
      }
    }
    if [tmp][method]{
      mutate{
      add_field => {"[applog][method]" => "%{[tmp][method]}"}
      }
    }
    if [tmp][request_url]{
      mutate{
        add_field => {"[applog][request_url]" => "%{[tmp][request_url]}"}
      }
    }
    if [tmp][meta._id]{
      mutate{
        add_field => {"[applog][traceId]" => "%{[tmp][meta._id]}"}
      }
    }
    if [tmp][project] {
      mutate{
        add_field => {"[applog][project]" => "%{[tmp][project]}"}
      }
    }
    if [tmp][time] {
      mutate{
      add_field => {"[applog][time]" => "%{[tmp][time]}"}
      }
    }
    if [tmp][status] {
      mutate{
        add_field => {"[applog][status]" => "%{[tmp][status]}"}
      convert => ["[applog][status]", "float"]
      }
    }
  }
  mutate{
    rename => ["kubernetes", "k8s"]
    remove_field => "beat"
    remove_field => "tmp"
    remove_field => "[k8s][labels][app]"
    remove_field => "[event][original]"      
  }
}
output{
    if [source] == "container" {      
      elasticsearch {
        hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
        codec => json
        index => "k8s-logstash-container-%{+YYYY.MM.dd}"
      }
      #stdout { codec => rubydebug }
    }
    if [source] == "ingress-nginx-access" {
      elasticsearch {
        hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
        codec => json
        index => "k8s-logstash-ingress-nginx-access-%{+YYYY.MM.dd}"
      }
      #stdout { codec => rubydebug }
    }
    if [source] == "ingress-nginx-error" {
      elasticsearch {
        hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
        codec => json
        index => "k8s-logstash-ingress-nginx-error-%{+YYYY.MM.dd}"
      }
      #stdout { codec => rubydebug }
    }
    if [source] == "nginx-access" {
      elasticsearch {
        hosts => ["http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200"]
        codec => json
        index => "k8s-logstash-nginx-access-%{+YYYY.MM.dd}"
      }
      #stdout { codec => rubydebug }
    }                  
    #stdout { codec => rubydebug }
}

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-yml
  namespace: log-es
  labels:
    app: logstash
data:
  logstash.yml: |-
    http.host: "0.0.0.0"
    xpack.monitoring.elasticsearch.hosts: http://log-es-elastic-sts-0.es-kibana-svc.log-es.svc.cluster.local:9200

kubectl apply -f log-es-logstash-ConfigMap.yaml
kubectl delete -f log-es-logstash-ConfigMap.yaml
kubectl apply -f log-es-logstash-deploy.yaml
kubectl delete -f log-es-logstash-deploy.yaml
kubectl apply -f log-es-logstash-svc.yaml
kubectl delete -f log-es-logstash-svc.yaml

kubectl apply -f log-es-filebeat-configmap.yaml

kubectl get pod -n log-es -o wide
kubectl get cm -n log-es
kubectl get ds -n log-es
kubectl edit configmap log-es-filebeat-config -n log-es
kubectl get service -n log-es

Inspect details

kubectl -n log-es describe pod filebeat-97l85
kubectl -n log-es logs -f logstash-847d7f5b56-jv5jj
kubectl logs -n log-es $(kubectl get pod -n log-es -o jsonpath='{.items[3].metadata.name}') -f
kubectl exec -n log-es -it filebeat-97l85 -- /bin/sh -c 'cat /tmp/filebeat*'

After updating a configmap the pods must be restarted manually

kubectl rollout restart ds/filebeat -n log-es
kubectl rollout restart deploy/logstash -n log-es

Force-delete a pod

kubectl delete pod filebeat-fncq9 -n log-es --force --grace-period=0

kubectl exec -it pod/test-pod-1 -n test -- ping logstash.log-es.svc.cluster.local
kubectl exec -it pod/test-pod-1 -n test -- curl http://logstash.log-es.svc.cluster.local:5044
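If events do not show up in ES, the Logstash monitoring API (port 9600; http.host is set in logstash.yml above) shows whether Beats events are arriving at all; a sketch, assuming curl is available in the image:

kubectl -n log-es exec deploy/logstash -- curl -s http://127.0.0.1:9600/_node/stats/events?pretty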

#Stop the services
kubectl delete -f log-es-filebeat-daemonset.yaml
kubectl delete -f log-es-logstash-deploy.yaml
kubectl delete -f log-es-kibana-sts.yaml
kubectl delete -f log-es-kibana-svc.yaml

# Receiving logs in Filebeat via syslog
log-es-filebeat-configmap.yaml

      - type: syslog
        format: auto
        id: syslog-id
        enabled: true
        max_message_size: 20KiB
        timeout: 10
        keep_null: true
        processors:
          - drop_fields:
              fields: ["input","agent","ecs.version","log.offset","event","syslog"]
              ignore_missing: true
        protocol.udp:
          host: "0.0.0.0:33514"
        tags: ["web-access"]
        fields:
          source: "syslog-web-access"

On the node running the DaemonSet, check whether the hostPort is actually listening:
netstat -anup
lsof -i
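Note that the UDP listener runs inside the filebeat pod, so for nginx to reach it via the node IP the DaemonSet needs a hostPort (or hostNetwork). A sketch of adding one with a strategic merge patch (the port matches the input above):

kubectl -n log-es patch ds filebeat -p '{"spec":{"template":{"spec":{"containers":[{"name":"filebeat","ports":[{"containerPort":33514,"hostPort":33514,"protocol":"UDP"}]}]}}}}'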

## Install ping in a pod for testing

curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-6.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
sed -i 's/mirrors.aliyun.com/vault.centos.org/g' /etc/yum.repos.d/CentOS-Base.repo
sed -i 's/gpgcheck=1/gpgcheck=0/g' /etc/yum.repos.d/CentOS-Base.repo

yum clean all && yum makecache
yum install iputils

ping 192.168.244.4
https://nginx.org/en/docs/syslog.html
nginx can send syslog over UDP, but not TCP.
nginx configuration:

access_log syslog:server=192.168.244.7:33514,facility=local5,tag=data_c1gstudiodotnet,severity=info access;
Filebeat receives the message:

{
  "@timestamp": "2024-02-20T02:35:41.000Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0",
    "truncated": false
  },
  "hostname": "openresty-php5.2-6cbdff6bbd-7fjdc",
  "process": {
    "program": "data_c1gstudiodotnet"
  },
  "host": {
    "name": "node02.k8s.local"
  },
  "agent": {
    "id": "cf964318-5fdc-493e-ae2c-d2acb0bc6ca8",
    "name": "node02.k8s.local",
    "type": "filebeat",
    "version": "8.11.0",
    "ephemeral_id": "42789eee-3658-4f0f-982e-cb96d18fd9a2"
  },
  "message": "10.100.3.80 - - [20/Feb/2024:10:35:41 +0800] \"GET /admin/imgcode/imgcode.php HTTP/1.1\" 200 1271 \"https://data.c1gstudio.net:31443/admin/login.php?1\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36\" ",
  "ecs": {
    "version": "8.0.0"
  },
  "log": {
    "source": {
      "address": "10.244.2.216:37255"
    }
  },
  "tags": [
    "syslog-web-log"
  ],
  "fields": {
    "source": "syslog-nginx-access"
  }
}

Test sending to the Filebeat syslog input:
echo "hello" > /dev/udp/192.168.244.4/1514

A refinement to solve the "which node IP is Filebeat on" problem:
at deploy time the node IP is written into the pod's /etc/hosts, and nginx uses that hostname to reach it.

      containers:
      - name: openresty-php-fpm5-2-17
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: MY_NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP

        command: ["/bin/sh", "-c", "echo \"$(MY_NODE_IP) MY_NODE_IP\" >> /etc/hosts;/opt/lemp start;cd /opt/init/ && ./inotify_reload.sh "]

nginx configuration:

access_log syslog:server=MY_NODE_IP:33514,facility=local5,tag=data_c1gstudiodotnet,severity=info access;

# Receiving logs in Filebeat via a unix socket
Filebeat shares the socket with the host, nginx mounts the host socket and writes its messages to it.
nginx configuration:

access_log syslog:server=unix:/usr/local/filebeat/filebeat.sock,facility=local5,tag=data_c1gstudiodotnet,severity=info access;

Create the shared directory on every node; an auto-created one is root-owned with mode 755 and the pod cannot write to it:
mkdir -m 0777 /localdata/filebeat/socket
chmod 0777 /localdata/filebeat/socket
Filebeat configuration:

      - type: unix
        enabled: true
        id: unix-id
        max_message_size: 100KiB
        path: "/usr/share/filebeat/socket/filebeat.sock"
        socket_type: datagram
        #group: "website"
        processors:
          - syslog:
              field: message
          - drop_fields:
              fields: ["input","agent","ecs","log.syslog.severity","log.syslog.facility","log.syslog.priority"]
              ignore_missing: true
        tags: ["web-access"]
        fields:
          source: "unix-web-access"

In the Filebeat DaemonSet:

        volumeMounts:
        - name: fb-socket
          mountPath: /usr/share/filebeat/socket
      volumes:
      - name: fb-socket
        hostPath:
          path: /localdata/filebeat/socket
          type: DirectoryOrCreate

In the nginx Deployment:

        volumeMounts:
        - name: host-filebeat-socket
          mountPath: "/usr/local/filebeat"
      volumes:
      - name: host-filebeat-socket
        hostPath:
          path: /localdata/filebeat/socket
          type: Directory
Example event:

{
  "@timestamp": "2024-02-20T06:17:08.000Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0"
  },
  "agent": {
    "type": "filebeat",
    "version": "8.11.0",
    "ephemeral_id": "4546cf71-5f33-4f5d-bc91-5f0a58c9b0fd",
    "id": "cf964318-5fdc-493e-ae2c-d2acb0bc6ca8",
    "name": "node02.k8s.local"
  },
  "message": "10.100.3.80 - - [20/Feb/2024:14:17:08 +0800] \"GET /admin/imgcode/imgcode.php HTTP/1.1\" 200 1356 \"https://data.c1gstudio.net:31443/admin/login.php?1\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36\" ",
  "tags": [
    "web-access"
  ],
  "ecs": {
    "version": "8.0.0"
  },
  "fields": {
    "source": "unix-web-access"
  },
  "log": {
    "syslog": {
      "hostname": "openresty-php5.2-78cb7cb54b-bsgt6",
      "appname": "data_c1gstudiodotnet"
    }
  },
  "host": {
    "name": "node02.k8s.local"
  }
}

# Configuring syslog for ingress-nginx
It can be configured in the controller ConfigMap, but the node-IP problem still has to be solved, as does the question of whether the external ingress shares the same target.
It also does not support a tag, so the log source cannot be labelled; all you get is process.program=nginx.
Neither http-snippet nor access-log-params works for this.

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: int-ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.5
  name: int-ingress-nginx-controller
  namespace: int-ingress-nginx
data:
  allow-snippet-annotations: "true"
  error-log-level: "warn"
  enable-syslog: "true"
  syslog-host: "192.168.244.7"
  syslog-port: "10514"

The resulting event received by Filebeat:

{
  "@timestamp": "2024-02-21T03:42:40.000Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "8.11.0",
    "truncated": false
  },
  "process": {
    "program": "nginx"
  },
  "upstream": {
    "status": "200",
    "response_length": "1281",
    "proxy_alternative": "",
    "addr": "10.244.2.228:80",
    "name": "data-c1gstudio-net-svc-web-http",
    "response_time": "0.004"
  },
  "timestamp": "2024-02-21T11:42:40+08:00",
  "req_id": "16b868da1aba50a72f32776b4a2f5cb2",
  "agent": {
    "ephemeral_id": "64c4b6d1-3d5c-4079-8bda-18d1a0d063a5",
    "id": "3bd77823-c801-4dd1-a3e5-1cf25874c09f",
    "name": "master01.k8s.local",
    "type": "filebeat",
    "version": "8.11.0"
  },
  "log": {
    "source": {
      "address": "192.168.244.4:49244"
    }
  },
  "request": {
    "status": 200,
    "bytes_sent": "1491",
    "request_time": "0.004",
    "request_length": "94",
    "referer": "https://data.c1gstudio.net:31443/admin/login.php?1",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36",
    "request_method": "GET",
    "request_uri": "/admin/imgcode/imgcode.php",
    "remote_user": "",
    "protocol": "HTTP/2.0",
    "remote_port": "",
    "real_port": "11464",
    "x-forward-for": "10.100.3.80",
    "remote_addr": "10.100.3.80",
    "hostname": "data.c1gstudio.net",
    "body_bytes_sent": "1269",
    "real_ip": "192.168.244.2",
    "server_name": "data.c1gstudio.net"
  },
  "ingress": {
    "service_port": "http",
    "hostname": "master01.k8s.local",
    "addr": "192.168.244.4",
    "port": "443",
    "namespace": "data-c1gstudio-net",
    "ingress_name": "ingress-data-c1gstudio-net",
    "service_name": "svc-web"
  },
“message”: “{\”timestamp\”: \”2024-02-21T11:42:40+08:00\”, \”source\”: \”int-ingress\”, \”req_id\”: \”16b868da1aba50a72f32776b4a2f5cb2\”, \”ingress\”:{ \”hostname\”: \”master01.k8s.local\”, \”addr\”: \”192.168.244.4\”, \”port\”: \”443\”,\”namespace\”: \”data-c1gstudio-net\”,\”ingress_name\”: \”ingress-data-c1gstudio-net\”,\”service_name\”: \”svc-web\”,\”service_port\”: \”http\” }, \”upstream\”:{ \”addr\”: \”10.244.2.228:80\”, \”name\”: \”data-c1gstudio-net-svc-web-http\”, \”response_time\”: \”0.004\”, \”status\”: \”200\”, \”response_length\”: \”1281\”, \”proxy_alternative\”: \”\”}, \”request\”:{ \”remote_addr\”: \”10.100.3.80\”, \”real_ip\”: \”192.168.244.2\”, \”remote_port\”: \”\”, \”real_port\”: \”11464\”, \”remote_user\”: \”\”, \”request_method\”: \”GET\”, \”server_name\”: \”data.c1gstudio.net\”,\”hostname\”: \”data.c1gstudio.net\”, \”request_uri\”: \”/admin/imgcode/imgcode.php\”, \”status\”: 200, \”body_bytes_sent\”: \”1269\”, \”bytes_sent\”: \”1491\”, \”request_time\”: \”0.004\”, \”request_length\”: \”94\”, \”referer\”: \”https://data.c1gstudio.net:31443/admin/login.php?1\”, \”user_agent\”: \”Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36\”, \”x-forward-for\”: \”10.100.3.80\”, \”protocol\”: \”HTTP/2.0\”}}”,
  "fields": {
    "source": "syslog-web-access"
  },
  "source": "int-ingress",
  "ecs": {
    "version": "8.0.0"
  },
  "hostname": "master01.k8s.local",
  "host": {
    "name": "master01.k8s.local"
  },
  "tags": [
    "web-access"
  ]
}

## The Filebeat @timestamp timezone issue
@timestamp is always UTC, 8 hours off from local time, and cannot be changed directly.
Option 1: format another field and use it as a replacement.
Option 2: add a locale field via a processor:

    processors:
      - add_locale: ~

https://www.elastic.co/guide/en/beats/filebeat/current/processor-timestamp.html
