

k8s Installation 12: Prometheus + Grafana via the Operator

12. Installing Prometheus + Grafana with the Operator

Prometheus

Prometheus itself only supports single-node deployment: it has no built-in clustering, no high availability, and no horizontal scaling, and its storage is limited by the capacity of the local disk. As the amount of collected data grows, the number of time series a single Prometheus instance can handle hits a ceiling, at which point CPU and memory usage rise. Memory is usually the first bottleneck, mainly because:

  • Prometheus flushes a block of data to disk every 2 hours; until then all of that data sits in memory, so memory usage scales with the ingestion volume.
  • Loading historical data reads from disk into memory, so the wider the query range, the more memory is needed. There is some room for optimization here.
  • Unreasonable queries also inflate memory usage, for example large Group operations or rate() over wide ranges.
    When this happens, either add memory or shard the cluster so that each instance scrapes fewer metrics.
    Prometheus encourages splitting by function or service: if many services need to be scraped, configure each Prometheus instance to scrape and store only one service or a subset of services. Splitting Prometheus into multiple instances this way also achieves a degree of horizontal scaling (see the sketch below).
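
A minimal sketch of that split, using made-up job and target names, with each Prometheus instance given a scrape configuration for only its own service group:

#prometheus-shard-a.yml: instance A scrapes service group A only
scrape_configs:
  - job_name: 'service-group-a'
    static_configs:
      - targets: ['svc-a.default.svc:9100']

#prometheus-shard-b.yml: instance B scrapes service group B only
scrape_configs:
  - job_name: 'service-group-b'
    static_configs:
      - targets: ['svc-b.default.svc:9100']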

Choosing an installation method

  • Vanilla Prometheus
    Build everything yourself.
    If you already have the Prometheus components and their prerequisites ready, you can manually deploy the YAML manifests for every component (Prometheus, Alertmanager, Grafana, all of their Secrets, ConfigMaps and so on) in the correct order according to their dependencies. This approach is usually very time-consuming, takes a lot of effort to deploy and operate the Prometheus ecosystem, and also requires solid documentation so the setup can be reproduced in other environments.

  • prometheus-operator
    The Prometheus Operator is not an official Prometheus component; it was developed by CoreOS.
    It uses Kubernetes Custom Resources to simplify deploying and configuring Prometheus, Alertmanager and the related monitoring components.
    Official installation docs: https://prometheus-operator.dev/docs/user-guides/getting-started/
    Prometheus Operator requires Kubernetes v1.16.x or newer.
    Official GitHub repo: https://github.com/prometheus-operator/prometheus-operator

  • kube-prometheus
    kube-prometheus provides a complete example of cluster monitoring configuration based on Prometheus and the Prometheus Operator: multi-instance Prometheus and Alertmanager deployment and configuration, node-exporter metrics collection, scraping of various Prometheus target metrics endpoints, Grafana, and a set of example alerting rules that fire on potential cluster problems.
    Official installation docs: https://prometheus-operator.dev/docs/prologue/quick-start/
    Compatibility requirements: https://github.com/prometheus-operator/kube-prometheus#compatibility
    Official GitHub repo: https://github.com/prometheus-operator/kube-prometheus

  • Helm chart prometheus-community/kube-prometheus-stack
    Provides functionality similar to kube-prometheus, but this project is maintained by prometheus-community.
    See https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#kube-prometheus-stack for details.

Installing Prometheus + Grafana on k8s with the Operator

Deploying Prometheus + Grafana with the operator is a very simple and convenient approach.
Open the kube-prometheus GitHub page at https://github.com/prometheus-operator/kube-prometheus and first check which operator release matches your Kubernetes version.
My cluster runs Kubernetes 1.28, so the matching branch is release-0.13:
https://github.com/prometheus-operator/kube-prometheus/tree/release-0.13
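
To confirm the cluster version before picking a branch (a quick check, not part of the original steps), the VERSION column of the node list is enough:

kubectl get nodes -o wide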

Preparing resources

Download the installation package

wget --no-check-certificate https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.zip -O prometheus-0.13.0.zip
unzip prometheus-0.13.0.zip
cd kube-prometheus-0.13.0

Extract the image list

cat manifests/*.yaml|grep image:|sed -e 's/.*image: //'|sort|uniq
The extracted image addresses:

grafana/grafana:9.5.3
jimmidyson/configmap-reload:v0.5.0
quay.io/brancz/kube-rbac-proxy:v0.14.2
quay.io/prometheus/alertmanager:v0.26.0
quay.io/prometheus/blackbox-exporter:v0.24.0
quay.io/prometheus/node-exporter:v1.6.1
quay.io/prometheus-operator/prometheus-operator:v0.67.1
quay.io/prometheus/prometheus:v2.46.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1

Push to the private registry

If network access is poor, download the images manually and push them to the private registry repo.k8s.local.
Note: set up the private registry repo.k8s.local in advance and create the corresponding projects and permissions.
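
If the private registry requires authentication, log in first (the account is assumed to already exist):

docker login repo.k8s.local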

#images under registry.k8s.io
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1

docker pull k8s.dockerproxy.com/kube-state-metrics/kube-state-metrics:v2.9.2
docker pull k8s.dockerproxy.com/prometheus-adapter/prometheus-adapter:v0.11.1

docker tag k8s.dockerproxy.com/kube-state-metrics/kube-state-metrics:v2.9.2 repo.k8s.local/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
docker tag k8s.dockerproxy.com/prometheus-adapter/prometheus-adapter:v0.11.1 repo.k8s.local/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1

docker push repo.k8s.local/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
docker push repo.k8s.local/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
#re-tag the docker.io images
docker pull jimmidyson/configmap-reload:v0.5.0
docker pull grafana/grafana:9.5.3

docker tag jimmidyson/configmap-reload:v0.5.0 repo.k8s.local/docker.io/jimmidyson/configmap-reload:v0.5.0
docker tag grafana/grafana:9.5.3 repo.k8s.local/docker.io/grafana/grafana:9.5.3
docker push repo.k8s.local/docker.io/jimmidyson/configmap-reload:v0.5.0
docker push repo.k8s.local/docker.io/grafana/grafana:9.5.3
#kube-prometheus-0.13.0/manifests/prometheusOperator-deployment.yaml also references the config reloader as a command-line argument:
#     - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
#this quay.io image is handled on its own
docker pull  quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
docker tag quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1 repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
docker push repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
#batch-download the quay.io images with a script
vi images.txt
quay.io/prometheus/alertmanager:v0.26.0
quay.io/prometheus/blackbox-exporter:v0.24.0
quay.io/brancz/kube-rbac-proxy:v0.14.2
quay.io/prometheus/node-exporter:v1.6.1
quay.io/prometheus-operator/prometheus-operator:v0.67.1
quay.io/prometheus/prometheus:v2.46.0

vim auto-pull-and-push-images.sh

#!/bin/bash
#new image tag: defaults to the current timestamp (defined here but not used below)
imageNewTag=`date +%Y%m%d-%H%M%S`
#private registry address
registryAddr="repo.k8s.local/"

#read images.txt line by line (skipping comment lines) into a list
n=0

for line in $(cat images.txt | grep ^[^#])
do
    list[$n]=$line
    ((n+=1))
done

echo "Images to be pushed:"
for variable in ${list[@]}
do
    echo ${variable}
done

for variable in ${list[@]}
do
    #pull the image
    echo "Pulling image: $variable"
    docker pull $variable

    #get the ID of the pulled image
    imageId=`docker images -q $variable`
    echo "[$variable] image ID after pull: $imageId"

    #get the full image name
    imageFormatName=`docker images --format "{{.Repository}}:{{.Tag}}:{{.ID}}" |grep $variable`
    echo "imageFormatName:$imageFormatName"

    #leading registry host
    #e.g. quay.io/prometheus-operator/prometheus-operator:v0.67.1 -> quay.io
    repository=${imageFormatName}
    repositoryurl=${imageFormatName%%/*}
    echo "repositoryurl :$repositoryurl"

    #strip the last ':' and everything after it (the image ID)
    #e.g. quay.io/prometheus-operator/prometheus-operator:v0.67.1:b6ec194a1a0 -> quay.io/prometheus-operator/prometheus-operator:v0.67.1
    repository=${repository%:*}

    echo "New image address: $registryAddr$repository"

    #re-tag the image
    docker tag $imageId $registryAddr$repository

    #push the image
    docker push $registryAddr$repository
    echo -e "\n"
done

chmod 755 auto-pull-and-push-images.sh
./auto-pull-and-push-images.sh
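
An optional sanity check that the re-tagged images exist locally before relying on the registry copies:

docker images | grep '^repo.k8s.local/' | sort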

Replace the image addresses in the YAML with the private registry

#dry run: print what would be replaced
sed -n "/image:/{s/image: jimmidyson/image: repo.k8s.local\/docker.io\/jimmidyson/p}" `grep 'image: jimmidyson' ./manifests/ -rl`
sed -n "/image:/{s/image: grafana/image: repo.k8s.local\/docker.io\/grafana/p}" `grep 'image: grafana' ./manifests/ -rl`
sed -n "/image:/{s/image: registry.k8s.io/image: repo.k8s.local\/registry.k8s.io/p}" `grep 'image: registry.k8s.io' ./manifests/ -rl`
sed -n "/image:/{s/image: quay.io/image: repo.k8s.local\/quay.io/p}" `grep 'image: quay.io' ./manifests/ -rl`

#replace
sed -i "/image:/{s/image: jimmidyson/image: repo.k8s.local\/docker.io\/jimmidyson/}" `grep 'image: jimmidyson' ./manifests/ -rl`
sed -i "/image:/{s/image: grafana/image: repo.k8s.local\/docker.io\/grafana/}" `grep 'image: grafana' ./manifests/ -rl`
sed -i "/image:/{s/image: registry.k8s.io/image: repo.k8s.local\/registry.k8s.io/}" `grep 'image: registry.k8s.io' ./manifests/ -rl`
sed -i "/image:/{s/image: quay.io/image: repo.k8s.local\/quay.io/}" `grep 'image: quay.io' ./manifests/ -rl`

#verify again
cat manifests/*.yaml|grep image:|sed -e 's/.*image: //'
manifests/prometheusOperator-deployment.yaml
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
        image: repo.k8s.local/quay.io/prometheus-operator/prometheus-operator:v0.67.1
        name: prometheus-operator
Also change the prometheus-config-reloader argument to point at the private registry:
       - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
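
The same edit can be scripted; this sed is an assumption based on the file shown above rather than an upstream-documented step:

sed -i 's#--prometheus-config-reloader=quay.io#--prometheus-config-reloader=repo.k8s.local/quay.io#' manifests/prometheusOperator-deployment.yaml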

Install Prometheus + Grafana (install and start)

First, go back to the kube-prometheus-0.13.0 directory and run the following to start the installation:

kubectl apply --server-side -f manifests/setup

customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
namespace/monitoring serverside-applied
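
Optionally wait for the CRDs to become Established before applying the rest of the manifests (the upstream quick start recommends this step):

kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring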
kubectl apply -f manifests/
alertmanager.monitoring.coreos.com/main created
networkpolicy.networking.k8s.io/alertmanager-main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
networkpolicy.networking.k8s.io/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-grafana-overview created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-multicluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes-darwin created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
networkpolicy.networking.k8s.io/grafana created
prometheusrule.monitoring.coreos.com/grafana-rules created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
networkpolicy.networking.k8s.io/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
networkpolicy.networking.k8s.io/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
networkpolicy.networking.k8s.io/prometheus-k8s created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
networkpolicy.networking.k8s.io/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
networkpolicy.networking.k8s.io/prometheus-operator created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
kubectl get pods -o wide -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE    IP              NODE                 NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   0          82s    10.244.1.6      node01.k8s.local     <none>           <none>
alertmanager-main-1                    2/2     Running   0          82s    10.244.1.7      node01.k8s.local     <none>           <none>
alertmanager-main-2                    2/2     Running   0          82s    10.244.2.3      node02.k8s.local     <none>           <none>
blackbox-exporter-76847bbff-wt77c      3/3     Running   0          104s   10.244.2.252    node02.k8s.local     <none>           <none>
grafana-5955685bfd-shf4s               1/1     Running   0          103s   10.244.2.253    node02.k8s.local     <none>           <none>
kube-state-metrics-7dddfffd96-2ktrs    3/3     Running   0          103s   10.244.1.4      node01.k8s.local     <none>           <none>
node-exporter-g8d5k                    2/2     Running   0          102s   192.168.244.4   master01.k8s.local   <none>           <none>
node-exporter-mqqkc                    2/2     Running   0          102s   192.168.244.7   node02.k8s.local     <none>           <none>
node-exporter-zpfl2                    2/2     Running   0          102s   192.168.244.5   node01.k8s.local     <none>           <none>
prometheus-adapter-6db6c659d4-25lgm    1/1     Running   0          100s   10.244.1.5      node01.k8s.local     <none>           <none>
prometheus-adapter-6db6c659d4-ps5mz    1/1     Running   0          100s   10.244.2.254    node02.k8s.local     <none>           <none>
prometheus-k8s-0                       2/2     Running   0          81s    10.244.1.8      node01.k8s.local     <none>           <none>
prometheus-k8s-1                       2/2     Running   0          81s    10.244.2.4      node02.k8s.local     <none>           <none>
prometheus-operator-797d795d64-4wnw2   2/2     Running   0          99s    10.244.2.2      node02.k8s.local     <none>           <none>
kubectl get svc -n monitoring -o wide
NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
alertmanager-main       ClusterIP   10.96.71.121   <none>        9093/TCP,8080/TCP            2m10s   app.kubernetes.io/component=alert-router,app.kubernetes.io/instance=main,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=kube-prometheus
alertmanager-operated   ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   108s    app.kubernetes.io/name=alertmanager
blackbox-exporter       ClusterIP   10.96.33.150   <none>        9115/TCP,19115/TCP           2m10s   app.kubernetes.io/component=exporter,app.kubernetes.io/name=blackbox-exporter,app.kubernetes.io/part-of=kube-prometheus
grafana                 ClusterIP   10.96.12.88    <none>        3000/TCP                     2m9s    app.kubernetes.io/component=grafana,app.kubernetes.io/name=grafana,app.kubernetes.io/part-of=kube-prometheus
kube-state-metrics      ClusterIP   None           <none>        8443/TCP,9443/TCP            2m9s    app.kubernetes.io/component=exporter,app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/part-of=kube-prometheus
node-exporter           ClusterIP   None           <none>        9100/TCP                     2m8s    app.kubernetes.io/component=exporter,app.kubernetes.io/name=node-exporter,app.kubernetes.io/part-of=kube-prometheus
prometheus-adapter      ClusterIP   10.96.24.212   <none>        443/TCP                      2m7s    app.kubernetes.io/component=metrics-adapter,app.kubernetes.io/name=prometheus-adapter,app.kubernetes.io/part-of=kube-prometheus
prometheus-k8s          ClusterIP   10.96.57.42    <none>        9090/TCP,8080/TCP            2m8s    app.kubernetes.io/component=prometheus,app.kubernetes.io/instance=k8s,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=kube-prometheus
prometheus-operated     ClusterIP   None           <none>        9090/TCP                     107s    app.kubernetes.io/name=prometheus
prometheus-operator     ClusterIP   None           <none>        8443/TCP                     2m6s    app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator,app.kubernetes.io/part-of=kube-prometheus
kubectl  get svc  -n monitoring 
NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.96.71.121   <none>        9093/TCP,8080/TCP            93m
alertmanager-operated   ClusterIP   None           <none>        9093/TCP,9094/TCP,9094/UDP   93m
blackbox-exporter       ClusterIP   10.96.33.150   <none>        9115/TCP,19115/TCP           93m
grafana                 ClusterIP   10.96.12.88    <none>        3000/TCP                     93m
kube-state-metrics      ClusterIP   None           <none>        8443/TCP,9443/TCP            93m
node-exporter           ClusterIP   None           <none>        9100/TCP                     93m
prometheus-adapter      ClusterIP   10.96.24.212   <none>        443/TCP                      93m
prometheus-k8s          ClusterIP   10.96.57.42    <none>        9090/TCP,8080/TCP            93m
prometheus-operated     ClusterIP   None           <none>        9090/TCP                     93m
prometheus-operator     ClusterIP   None           <none>        8443/TCP                     93m

blackbox_exporter: an official Prometheus project for network probing: DNS, ping and HTTP checks.
node-exporter: the Prometheus exporter for node-level metrics such as CPU, memory and disk.
prometheus: the monitoring server; it pulls metrics from node-exporter and the other exporters and stores them as time series.
kube-state-metrics: collects the metadata of Kubernetes objects such as pods and deployments and exposes it as metrics that can be queried in Prometheus with PromQL.
prometheus-adapter: an aggregated API server, i.e. a custom-metrics-apiserver implementation, that plugs Prometheus metrics into the Kubernetes API server.
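
Before the Ingresses exist, the UIs can also be reached with a port-forward (the local port numbers here are arbitrary):

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093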

Create the Ingresses

This allows the UIs to be reached by domain name; an ingress controller must already be installed.

cat > prometheus-ingress.yaml  << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
  labels:
    app.kubernetes.io/name: nginx-ingress
    app.kubernetes.io/part-of: monitoring
  annotations:
    #kubernetes.io/ingress.class: "nginx"
    #nginx.ingress.kubernetes.io/rewrite-target: /  #rewrite
spec:
  ingressClassName: nginx
  rules:
  - host: prometheus.k8s.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              name: web
              #number: 9090
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-grafana
  namespace: monitoring
  labels:
    app.kubernetes.io/name: nginx-ingress
    app.kubernetes.io/part-of: monitoring
  annotations:
    #kubernetes.io/ingress.class: "nginx"
    #nginx.ingress.kubernetes.io/rewrite-target: /  #rewrite
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.k8s.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              name: http
              #number: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-alertmanager
  namespace: monitoring
  labels:
    app.kubernetes.io/name: nginx-ingress
    app.kubernetes.io/part-of: monitoring
  annotations:
    #kubernetes.io/ingress.class: "nginx"
    #nginx.ingress.kubernetes.io/rewrite-target: /  #rewrite
spec:
  ingressClassName: nginx
  rules:
  - host: alertmanager.k8s.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              name: web
              #number: 9093

EOF
kubectl delete -f  prometheus-ingress.yaml  
kubectl apply -f  prometheus-ingress.yaml  
kubectl get ingress -A

Add the domain names to the hosts file

127.0.0.1 prometheus.k8s.local
127.0.0.1 grafana.k8s.local
127.0.0.1 alertmanager.k8s.local
#test via ClusterIP
curl -k  -H "Host:prometheus.k8s.local"  http://10.96.57.42:9090/graph
curl -k  -H "Host:grafana.k8s.local"  http://10.96.12.88:3000/login
curl -k  -H "Host:alertmanager.k8s.local"  http://10.96.71.121:9093/
#test DNS
curl -k  http://prometheus-k8s.monitoring.svc:9090
#test from inside a test pod
kubectl exec -it pod/test-pod-1 -n test -- ping prometheus-k8s.monitoring

Access in a browser:
http://prometheus.k8s.local:30180/
http://grafana.k8s.local:30180/
Default Grafana login: admin/admin
http://alertmanager.k8s.local:30180/#/alerts

#restart the pods
kubectl get pods -n monitoring

kubectl rollout restart deployment/grafana -n monitoring
kubectl rollout restart sts/prometheus-k8s -n monitoring

Uninstall

kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

Changing the Prometheus display timezone

To avoid timezone confusion, Prometheus deliberately uses Unix time and UTC in all of its components. It does not support setting a timezone in the configuration file and does not read the local /etc/timezone either.

In practice this limitation does not really get in the way:

For visualization, Grafana can convert the timezone.

If you call the API, you get timestamps back and can process them however you like.

If the UTC-only display of Prometheus's built-in UI bothers you, the new web UI introduced in version 2.16 has a Local Timezone option.

Changing the Grafana display timezone

By default the dashboards show UTC, which is 8 hours behind Shanghai.
For the pre-imported dashboards, changing the timezone in the general settings or in your profile has no effect.

For a Helm install, change values.yaml:

   ##defaultDashboardsTimezone: utc
   defaultDashboardsTimezone: "Asia/Shanghai"
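
If the stack was installed with the community chart, the change can be rolled out with something like the following (the release name and namespace are assumptions):

helm upgrade -n monitoring kube-prometheus-stack prometheus-community/kube-prometheus-stack -f values.yaml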

Option 1
Change the timezone on every query each time.

Option 2
Export a separate copy of the dashboards with the timezone changed.

Option 3
Modify the timezone in the dashboards that get imported:
cat grafana-dashboardDefinitions.yaml|grep -C 2 timezone

              ]
          },
          "timezone": "utc",
          "title": "Alertmanager / Overview",
          "uid": "alertmanager-overview",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / API server",
          "uid": "09ec8aa1e996d6ffcd6817bbaff4db1b",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Networking / Cluster",
          "uid": "ff635a025bcfea7bc3dd4f508990a3e9",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Controller Manager",
          "uid": "72e0e05bef5099e5f049b05fdc429ed4",
--
              ]
          },
          "timezone": "",
          "title": "Grafana Overview",
          "uid": "6be0s85Mk",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources / Cluster",
          "uid": "efa86fd1d0c121a26444b636a3f509a8",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources /  Multi-Cluster",
          "uid": "b59e6c9f2fcbe2e16d77fc492374cc4f",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources / Namespace (Pods)",
          "uid": "85a562078cdf77779eaa1add43ccec1e",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources / Node (Pods)",
          "uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources / Pod",
          "uid": "6581e46e4e5c7ba40a07646395ef7b23",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources / Workload",
          "uid": "a164a7f0339f99e89cea5cb47e9be617",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Compute Resources / Namespace (Workloads)",
          "uid": "a87fb0d919ec0ea5f6543124e16c42a5",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Kubelet",
          "uid": "3138fa155d5915769fbded898ac09fd9",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Networking / Namespace (Pods)",
          "uid": "8b7a8b326d7a6f1f04244066368c67af",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Networking / Namespace (Workload)",
          "uid": "bbb2a765a623ae38130206c7d94a160f",
--
              ]
          },
          "timezone": "utc",
          "title": "Node Exporter / USE Method / Cluster",
          "version": 0
--
              ]
          },
          "timezone": "utc",
          "title": "Node Exporter / USE Method / Node",
          "version": 0
--
              ]
          },
          "timezone": "utc",
          "title": "Node Exporter / MacOS",
          "version": 0
--
              ]
          },
          "timezone": "utc",
          "title": "Node Exporter / Nodes",
          "version": 0
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Persistent Volumes",
          "uid": "919b92a8e8041bd567af9edab12c840c",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Networking / Pod",
          "uid": "7a18067ce943a40ae25454675c19ff5c",
--
              ]
          },
          "timezone": "browser",
          "title": "Prometheus / Remote Write",
          "version": 0
--
              ]
          },
          "timezone": "utc",
          "title": "Prometheus / Overview",
          "uid": "",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Proxy",
          "uid": "632e265de029684c40b21cb76bca4f94",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Scheduler",
          "uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
--
              ]
          },
          "timezone": "UTC",
          "title": "Kubernetes / Networking / Workload",
          "uid": "728bf77cc1166d2f3133bf25846876cc",

Remove the UTC timezone:
sed -rn '/"timezone":/{s/"timezone": ".*"/"timezone": ""/p}' grafana-dashboardDefinitions.yaml
sed -i '/"timezone":/{s/"timezone": ".*"/"timezone": ""/}' grafana-dashboardDefinitions.yaml

Data persistence

By default nothing is persisted, so the configuration is lost once the pods restart.

Prepare the PVC

Have a StorageClass ready in advance.
Note that the namespace must match the one the services run in.
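
Check that the StorageClass exists first:

kubectl get storageclass managed-nfs-storage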

cat > grafana-pvc.yaml  << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  storageClassName: managed-nfs-storage  
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
EOF

Modify the YAML

Grafana storage:
grafana-deployment.yaml

      serviceAccountName: grafana
      volumes:
      - emptyDir: {}
        name: grafana-storage

Change it to:

      serviceAccountName: grafana
      volumes:
      - persistentVolumeClaim:
          claimName: grafana-pvc
        name: grafana-storage

Add a storage section under spec: in
prometheus-prometheus.yaml

  namespace: monitoring
spec:
  storage:
      volumeClaimTemplate:
        spec:
          storageClassName: managed-nfs-storage
          resources:
            requests:
              storage: 10Gi

Add some extra permissions; the full content after modification is shown below (the original rules first, then the modified rules; the additions are mainly in resources and verbs):
prometheus-clusterRole.yaml

#original rules:
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

#modified rules:
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get

Then, optionally, run the following to grant admin permissions (this binds cluster-admin to the kube-state-metrics service account):

kubectl create clusterrolebinding kube-state-metrics-admin-binding \
--clusterrole=cluster-admin  \
--user=system:serviceaccount:monitoring:kube-state-metrics
kubectl apply -f grafana-pvc.yaml
kubectl apply -f prometheus-clusterRole.yaml

kubectl apply -f grafana-deployment.yaml
kubectl apply -f prometheus-prometheus.yaml

kubectl get pv,pvc -o wide
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                           STORAGECLASS          REASON   AGE   VOLUMEMODE
persistentvolume/pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19   10Gi       RWX            Delete           Bound      monitoring/grafana-pvc                          managed-nfs-storage            25h   Filesystem
persistentvolume/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            25h   Filesystem
persistentvolume/pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            25h   Filesystem

Change the reclaim policy of the dynamically provisioned PVs to Retain, otherwise the data is deleted together with the PV when the claim is removed.

kubectl edit pv -n default pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 
persistentVolumeReclaimPolicy: Retain

kubectl edit pv -n default pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
kubectl edit pv -n default pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
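
The same change can be made non-interactively (PVs are cluster-scoped, so no namespace is needed):

kubectl patch pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'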

kubectl get pods -n monitoring

Check on the NFS server whether data has been written:

ll /nfs/k8s/dpv/
total 0
drwxrwxrwx. 2 root root  6 Oct 24 18:19 default-test-pvc2-pvc-f9153444-5653-4684-a845-83bb313194d1
drwxrwxrwx. 2 root root  6 Nov 22 15:45 monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
drwxrwxrwx. 3 root root 27 Nov 22 15:52 monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
drwxrwxrwx. 3 root root 27 Nov 22 15:52 monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e

kubectl logs -f prometheus-k8s-0 prometheus -n monitoring

Custom pod/service auto-discovery configuration

Goal:
Services or pods started by users should be discovered by Prometheus automatically once the following annotations are added:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9121"
  1. Store the auto-discovery configuration in a Secret
    For those annotations to be picked up, add the following scrape configuration for Prometheus:
    prometheus-additional.yaml

cat > prometheus-additional.yaml << EOF
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: \$1:\$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
EOF
Since the heredoc contains escaped variables, check the generated file:
cat prometheus-additional.yaml

The configuration above keeps only the endpoints of Services annotated with prometheus.io/scrape: "true".

Add the following to any service that should be monitored (use lowercase "true" so it matches the keep regex above):

  annotations: 
     prometheus.io/scrape: "true"
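
A minimal, hypothetical Service showing where the annotations go (the name, namespace and port are examples only):

cat > demo-metrics-service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
  name: demo-exporter
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"
spec:
  selector:
    app: demo-exporter
  ports:
  - name: metrics
    port: 9121
    targetPort: 9121
EOF
kubectl apply -f demo-metrics-service.yaml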

Save prometheus-additional.yaml as a Secret:

kubectl delete secret additional-configs -n monitoring
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created
kubectl get secret additional-configs -n monitoring  -o yaml 
  2. Add the configuration to the Prometheus instance
    Edit the Prometheus CRD and reference the Secret created above:

vi prometheus-prometheus.yaml

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  ......
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.46.0

kubectl apply -f prometheus-prometheus.yaml

Once the Prometheus CRD has been updated, check in the Prometheus dashboard that the config was reloaded:
http://prometheus.k8s.local:30180/targets?search=#pool-kubernetes-service-endpoints
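
The loaded configuration can also be checked through the Prometheus API (run the port-forward in a second terminal):

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
curl -s http://127.0.0.1:9090/api/v1/status/config | grep kubernetes-service-endpoints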

kubectl get pods -n monitoring -o wide
kubectl rollout restart sts/prometheus-k8s -n monitoring
kubectl logs -f prometheus-k8s-0 prometheus -n monitoring

NFS restart: services return 503 and pods cannot be terminated

#df -h hangs and NFS is stuck; the client machine has to be rebooted
kubectl get pods -n monitoring

kubectl delete -f prometheus-prometheus.yaml
kubectl delete pod prometheus-k8s-1  -n monitoring
kubectl delete pod prometheus-k8s-1 --grace-period=0 --force --namespace monitoring

kubectl delete -f grafana-deployment.yaml
kubectl apply -f grafana-deployment.yaml

kubectl apply -f prometheus-prometheus.yaml
kubectl logs -n monitoring pod prometheus-k8s-0 
kubectl describe -n monitoring pod prometheus-k8s-0 
kubectl describe -n monitoring pod prometheus-k8s-1 
kubectl describe -n monitoring pod grafana-65fdddb9c7-xml6m  

kubectl get pv,pvc -o wide

persistentvolume/pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19   10Gi       RWX            Delete           Bound      monitoring/grafana-pvc                          managed-nfs-storage            25h   Filesystem
persistentvolume/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            25h   Filesystem
persistentvolume/pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWO            Delete           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            25h   Filesystem
persistentvolume/pvc-f9153444-5653-4684-a845-83bb313194d1   300Mi      RWX            Retain           Released   default/test-pvc2                               managed-nfs-storage            29d   Filesystem

#completely remove and reinstall
kubectl delete -f manifests/
kubectl apply -f manifests/

When NFS fails, processes that read the NFS mount block until their threads are exhausted, so the pod stops answering Kubernetes health checks. After a while Kubernetes restarts the pod, but because NFS is down the umount hangs during termination and the pod stays in Terminating forever.

On the node where the pod was originally scheduled, unmount the NFS mount points:
mount -l | grep nfs

sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e on /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volumes/kubernetes.io~nfs/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.244.7,local_lock=none,addr=192.168.244.6)
192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e/prometheus-db on /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volume-subpaths/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e/prometheus/2 type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.244.7,local_lock=none,addr=192.168.244.6)
umount -l -f /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volumes/kubernetes.io~nfs/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e

Change the default mount option to soft

vi /etc/nfsmount.conf
Soft=True

soft: with a soft mount, if the network or the server fails and the client can no longer exchange data with the server, the client keeps retrying until the timeout, then reports an error and stops. Data may be lost when the timeout hits, so soft mounts are generally not recommended.
hard: the default. The opposite of soft: the client retries forever; if the server answers, the operation continues, and if it never answers, the client keeps trying and cannot be umounted or killed, which is why hard is usually combined with intr.
intr: when a hard-mounted resource times out, intr allows the operation to be interrupted, which keeps NFS from locking up the whole system. Recommended.
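
Alternatively, mount options can be set per PersistentVolume so that only selected volumes use soft mounts; a sketch with an assumed PV name and export path:

cat > pv-soft-mount-example.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-soft-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: managed-nfs-storage
  mountOptions:
    - soft
    - timeo=600
    - retrans=2
  nfs:
    path: /nfs/k8s/dpv/example
    server: 192.168.244.6
EOF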

If a StatefulSet's PV is deleted, the recreated pod still looks for the original PV, so deleting PVs is not recommended.

kubectl get pv -o wide
pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWO            Retain           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            26h   Filesystem
pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWO            Retain           Bound      monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            26h   Filesystem

kubectl patch pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e -p '{"metadata":{"finalizers":null}}'
kubectl patch pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca -p '{"metadata":{"finalizers":null}}'
kubectl delete pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
kubectl delete pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca 

kubectl describe pvc pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 | grep Mounted
kubectl patch pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 -p '{"metadata":{"finalizers":null}}'
kubectl delete pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19

Recovering the PVs

Recover grafana-pvc: find the original mount directory monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 under the NFS dynamic-PV directory.

kubectl describe -n monitoring pod grafana-65fdddb9c7-xml6m
default-scheduler 0/3 nodes are available: persistentvolumeclaim "grafana-pvc" bound to non-existent persistentvolume "pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19". preemption: 0/3 nodes are available:
3 Preemption is not helpful for scheduling..

cat > rebuid-grafana-pvc.yaml  << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
  labels:
    pv: pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName:  managed-nfs-storage 
  nfs:
    path: /nfs/k8s/dpv/monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
    server: 192.168.244.6
EOF
kubectl apply -f ../k8s/rebuid-grafana-pvc.yaml 

恢复prometheus-k8s-0

kubectl describe -n monitoring pod prometheus-k8s-0
Warning FailedScheduling 14m (x3 over 24m) default-scheduler 0/3 nodes are available: persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-0" bound to non-existent persistentvolume "pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca". preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

cat > rebuid-prometheus-k8s-0-pv.yaml  << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
  labels:
    pv: pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName:  managed-nfs-storage 
  nfs:
    path: /nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
    server: 192.168.244.6
EOF

kubectl describe -n monitoring pod prometheus-k8s-1
Warning FailedScheduling 19m (x3 over 29m) default-scheduler 0/3 nodes are available: persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-1" bound to non-existent persistentvolume "pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e". preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

cat > rebuid-prometheus-k8s-1-pv.yaml  << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
  labels:
    pv: pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName:  managed-nfs-storage 
  nfs:
    path: /nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
    server: 192.168.244.6
EOF
kubectl apply -f rebuid-prometheus-k8s-0-pv.yaml 
kubectl apply -f rebuid-prometheus-k8s-1-pv.yaml 

kubectl get pv -o wide
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS          REASON   AGE     VOLUMEMODE
pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19   10Gi       RWX            Retain           Bound    monitoring/grafana-pvc                          managed-nfs-storage            9m17s   Filesystem
pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e   10Gi       RWX            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1   managed-nfs-storage            17s     Filesystem
pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca   10Gi       RWX            Retain           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0   managed-nfs-storage            2m37s   Filesystem
kubectl get pods -n monitoring
kubectl -n monitoring logs -f prometheus-k8s-1

Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-1" is waiting to start: PodInitializing
iowait is very high:
iostat -kx 1
There are many hung mount processes:
ps aux|grep mount

mount -t nfs 192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e ./tmp
showmount -e 192.168.244.6
Export list for 192.168.244.6:
/nfs/k8s/dpv     *
/nfs/k8s/spv_003 *
/nfs/k8s/spv_002 *
/nfs/k8s/spv_001 *
/nfs/k8s/web     *

mount -v -t nfs 192.168.244.6:/nfs/k8s/web ./tmp
mount.nfs: timeout set for Fri Nov 24 14:33:04 2023
mount.nfs: trying text-based options 'soft,vers=4.1,addr=192.168.244.6,clientaddr=192.168.244.5'

mount -v -t nfs -o vers=3  192.168.244.6:/nfs/k8s/web ./tmp
#NFS v3 mounts successfully

If the NFS server suddenly stops while clients still have it mounted, df -h on the client will hang.
You can kill the processes holding the mount, restart the NFS services on both client and server and remount, or reboot the machine.
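
Typical recovery steps, as a sketch (service names vary by distribution):

#on the NFS server
systemctl restart nfs-server
#on the affected client node: lazily unmount the dead mount points, then restart kubelet so volumes get remounted
mount -l | grep nfs4
umount -l -f /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/<pv-name>
systemctl restart kubelet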
