12. Installing Prometheus + Grafana with the Prometheus Operator
Prometheus
Prometheus itself only supports single-node deployment. It has no built-in clustering, high availability, or horizontal scaling, and its storage is limited by the capacity of the local disk. As the volume of scraped data grows, the number of time series a single Prometheus instance can handle hits a ceiling; CPU and memory usage both rise, and memory is usually the first bottleneck. The main reasons are:
- Prometheus flushes a block to disk every 2 hours; until then all data for that window is kept in memory, so memory usage scales with the scrape volume.
- Loading historical data reads from disk into memory; the larger the query range, the more memory is needed. There is some room for optimization here.
- Poorly chosen queries also increase memory usage, for example large group-by operations or rate() over wide ranges.
At that point you either add memory or shard the collection across multiple instances so that each instance scrapes fewer metrics.
Prometheus recommends splitting along functional or service lines: if there are many services to scrape, configure each Prometheus instance to scrape and store only one service (or a subset of services). Splitting Prometheus into multiple instances per service achieves a degree of horizontal scaling.
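A minimal sketch of what splitting by service can look like with plain Prometheus configs; the file names, job names, and targets below are illustrative assumptions, not part of this setup:
cat > prometheus-shard-a.yml << EOF
scrape_configs:
- job_name: 'service-a'              # instance A only scrapes service A
  static_configs:
  - targets: ['service-a:8080']
EOF
cat > prometheus-shard-b.yml << EOF
scrape_configs:
- job_name: 'service-b'              # instance B only scrapes service B
  static_configs:
  - targets: ['service-b:8080']
EOF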
Choosing an installation method
- Vanilla Prometheus (build everything yourself)
  If you already have the Prometheus components and their prerequisites, you can deploy the YAML manifests for every component by hand, in the correct order of their dependencies: Prometheus, Alertmanager, Grafana, and all of the Secrets, ConfigMaps, and so on. This approach is usually very time-consuming, takes a lot of effort to deploy and manage, and needs solid documentation before it can be reproduced in other environments.
- prometheus-operator
  The Prometheus Operator is not an official Prometheus component; it was originally developed by CoreOS.
  It uses Kubernetes Custom Resources to simplify deploying and configuring Prometheus, Alertmanager, and the related monitoring components (see the ServiceMonitor sketch after this list).
  Official installation docs: https://prometheus-operator.dev/docs/user-guides/getting-started/
  Prometheus Operator requires Kubernetes v1.16.x or later.
  Official GitHub repository: https://github.com/prometheus-operator/prometheus-operator
- kube-prometheus
  kube-prometheus provides a complete example configuration for cluster monitoring based on Prometheus and the Prometheus Operator: a multi-instance Prometheus and Alertmanager deployment, node-exporter metrics collection, scrape configuration for the various Prometheus target metrics endpoints, Grafana, and a set of example alerting rules that fire on potential cluster problems.
  Official installation docs: https://prometheus-operator.dev/docs/prologue/quick-start/
  Compatibility matrix: https://github.com/prometheus-operator/kube-prometheus#compatibility
  Official GitHub repository: https://github.com/prometheus-operator/kube-prometheus
- helm chart prometheus-community/kube-prometheus-stack
  Provides functionality similar to kube-prometheus, but the project is maintained by prometheus-community.
  Details: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#kube-prometheus-stack
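A minimal ServiceMonitor sketch showing how the operator's CRDs replace hand-written scrape_configs. The names used here (my-app, namespace test, port name metrics) are illustrative assumptions, not part of kube-prometheus:
cat > my-app-servicemonitor.yaml << EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app            # scrape Services carrying this label
  namespaceSelector:
    matchNames:
    - test
  endpoints:
  - port: metrics            # named port on the Service
    interval: 30s
EOF
kubectl apply -f my-app-servicemonitor.yaml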
Installing Prometheus + Grafana with the operator on Kubernetes
Deploying Prometheus + Grafana via the operator is a very simple and convenient approach.
Open the kube-prometheus GitHub page, https://github.com/prometheus-operator/kube-prometheus, and first check which operator release matches your Kubernetes version.
My Kubernetes version here is 1.28, so the matching operator branch is release-0.13:
https://github.com/prometheus-operator/kube-prometheus/tree/release-0.13
Resource preparation
Download the installation files
wget --no-check-certificate https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.13.0.zip -O prometheus-0.13.0.zip
unzip prometheus-0.13.0.zip
cd kube-prometheus-0.13.0
Extract the image list
cat manifests/*.yaml|grep image:|sed -e 's/.*image: //'|sort|uniq
The extracted image addresses:
grafana/grafana:9.5.3
jimmidyson/configmap-reload:v0.5.0
quay.io/brancz/kube-rbac-proxy:v0.14.2
quay.io/prometheus/alertmanager:v0.26.0
quay.io/prometheus/blackbox-exporter:v0.24.0
quay.io/prometheus/node-exporter:v1.6.1
quay.io/prometheus-operator/prometheus-operator:v0.67.1
quay.io/prometheus/prometheus:v2.46.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
Push to the private registry
Manually pull the images that are hard to download over the network, then push them to the private registry repo.k8s.local.
Note: set up the private registry repo.k8s.local in advance and create the corresponding projects and permissions.
#Images under registry.k8s.io
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
docker pull k8s.dockerproxy.com/kube-state-metrics/kube-state-metrics:v2.9.2
docker pull k8s.dockerproxy.com/prometheus-adapter/prometheus-adapter:v0.11.1
docker tag k8s.dockerproxy.com/kube-state-metrics/kube-state-metrics:v2.9.2 repo.k8s.local/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
docker tag k8s.dockerproxy.com/prometheus-adapter/prometheus-adapter:v0.11.1 repo.k8s.local/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
docker push repo.k8s.local/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
docker push repo.k8s.local/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.1
#Retag the docker.io images
docker pull jimmidyson/configmap-reload:v0.5.0
docker pull grafana/grafana:9.5.3
docker tag jimmidyson/configmap-reload:v0.5.0 repo.k8s.local/docker.io/jimmidyson/configmap-reload:v0.5.0
docker tag grafana/grafana:9.5.3 repo.k8s.local/docker.io/grafana/grafana:9.5.3
docker push repo.k8s.local/docker.io/jimmidyson/configmap-reload:v0.5.0
docker push repo.k8s.local/docker.io/grafana/grafana:9.5.3
The prometheus-config-reloader image does not appear on an image: line; it is passed as an argument in kube-prometheus-0.13.0/manifests/prometheusOperator-deployment.yaml:
# - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
#Handle this quay.io image separately
docker pull quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
docker tag quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1 repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
docker push repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
#Batch-pull the quay.io images with a script
vi images.txt
quay.io/prometheus/alertmanager:v0.26.0
quay.io/prometheus/blackbox-exporter:v0.24.0
quay.io/brancz/kube-rbac-proxy:v0.14.2
quay.io/prometheus/node-exporter:v1.6.1
quay.io/prometheus-operator/prometheus-operator:v0.67.1
quay.io/prometheus/prometheus:v2.46.0
vim auto-pull-and-push-images.sh
#!/bin/bash
#New image tag based on the current time (defined here but not used below)
imageNewTag=`date +%Y%m%d-%H%M%S`
#Private registry to push to
registryAddr="repo.k8s.local/"
#Read images.txt into an array, skipping comment lines
n=0
for line in $(cat images.txt | grep ^[^#])
do
  list[$n]=$line
  ((n+=1))
done
echo "Images to be mirrored:"
for variable in ${list[@]}
do
  echo ${variable}
done
for variable in ${list[@]}
do
  #Pull the image
  echo "Pulling image: $variable"
  docker pull $variable
  #Get the ID of the pulled image
  imageId=`docker images -q $variable`
  echo "[$variable] image ID after pull: $imageId"
  #Get the full image name in repository:tag:id form
  imageFormatName=`docker images --format "{{.Repository}}:{{.Tag}}:{{.ID}}" |grep $variable`
  echo "imageFormatName: $imageFormatName"
  #Leading registry host, e.g. quay.io/prometheus-operator/prometheus-operator:v0.67.1 -> quay.io
  repository=${imageFormatName}
  repositoryurl=${imageFormatName%%/*}
  echo "repositoryurl: $repositoryurl"
  #Strip the last ':' and everything after it (the image ID),
  #e.g. quay.io/prometheus-operator/prometheus-operator:v0.67.1:b6ec194a1a0 -> quay.io/prometheus-operator/prometheus-operator:v0.67.1
  repository=${repository%:*}
  echo "New image address: $registryAddr$repository"
  #Retag the image for the private registry
  docker tag $imageId $registryAddr$repository
  #Push the image
  docker push $registryAddr$repository
  echo -e "\n"
done
chmod 755 auto-pull-and-push-images.sh
./auto-pull-and-push-images.sh
Replace the image addresses in the YAML manifests with the private registry
#Preview the substitutions first
sed -n "/image:/{s/image: jimmidyson/image: repo.k8s.local\/docker.io\/jimmidyson/p}" `grep 'image: jimmidyson' ./manifests/ -rl`
sed -n "/image:/{s/image: grafana/image: repo.k8s.local\/docker.io\/grafana/p}" `grep 'image: grafana' ./manifests/ -rl`
sed -n "/image:/{s/image: registry.k8s.io/image: repo.k8s.local\/registry.k8s.io/p}" `grep 'image: registry.k8s.io' ./manifests/ -rl`
sed -n "/image:/{s/image: quay.io/image: repo.k8s.local\/quay.io/p}" `grep 'image: quay.io' ./manifests/ -rl`
#Apply the replacements in place
sed -i "/image:/{s/image: jimmidyson/image: repo.k8s.local\/docker.io\/jimmidyson/}" `grep 'image: jimmidyson' ./manifests/ -rl`
sed -i "/image:/{s/image: grafana/image: repo.k8s.local\/docker.io\/grafana/}" `grep 'image: grafana' ./manifests/ -rl`
sed -i "/image:/{s/image: registry.k8s.io/image: repo.k8s.local\/registry.k8s.io/}" `grep 'image: registry.k8s.io' ./manifests/ -rl`
sed -i "/image:/{s/image: quay.io/image: repo.k8s.local\/quay.io/}" `grep 'image: quay.io' ./manifests/ -rl`
#Verify again
cat manifests/*.yaml|grep image:|sed -e 's/.*image: //'
After replacement, manifests/prometheusOperator-deployment.yaml contains:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
        image: repo.k8s.local/quay.io/prometheus-operator/prometheus-operator:v0.67.1
        name: prometheus-operator
The prometheus-config-reloader argument is not on an image: line, so the sed replacements above do not touch it; change it manually:
- --prometheus-config-reloader=repo.k8s.local/quay.io/prometheus-operator/prometheus-config-reloader:v0.67.1
Installing Prometheus + Grafana (install and start)
First, go back to the kube-prometheus-0.13.0 directory and run the following command to install the CRDs and the monitoring namespace:
kubectl apply --server-side -f manifests/setup
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/scrapeconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
namespace/monitoring serverside-applied
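Before applying the rest, you can wait until the new CRDs report Established (the upstream quick start does the same); a hedged equivalent:
kubectl wait --for condition=Established --all CustomResourceDefinition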
kubectl apply -f manifests/
alertmanager.monitoring.coreos.com/main created
networkpolicy.networking.k8s.io/alertmanager-main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
networkpolicy.networking.k8s.io/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-grafana-overview created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-multicluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes-darwin created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
networkpolicy.networking.k8s.io/grafana created
prometheusrule.monitoring.coreos.com/grafana-rules created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
networkpolicy.networking.k8s.io/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
networkpolicy.networking.k8s.io/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
networkpolicy.networking.k8s.io/prometheus-k8s created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader configured
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
networkpolicy.networking.k8s.io/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
networkpolicy.networking.k8s.io/prometheus-operator created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
kubectl get pods -o wide -n monitoring
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 0 82s 10.244.1.6 node01.k8s.local <none> <none>
alertmanager-main-1 2/2 Running 0 82s 10.244.1.7 node01.k8s.local <none> <none>
alertmanager-main-2 2/2 Running 0 82s 10.244.2.3 node02.k8s.local <none> <none>
blackbox-exporter-76847bbff-wt77c 3/3 Running 0 104s 10.244.2.252 node02.k8s.local <none> <none>
grafana-5955685bfd-shf4s 1/1 Running 0 103s 10.244.2.253 node02.k8s.local <none> <none>
kube-state-metrics-7dddfffd96-2ktrs 3/3 Running 0 103s 10.244.1.4 node01.k8s.local <none> <none>
node-exporter-g8d5k 2/2 Running 0 102s 192.168.244.4 master01.k8s.local <none> <none>
node-exporter-mqqkc 2/2 Running 0 102s 192.168.244.7 node02.k8s.local <none> <none>
node-exporter-zpfl2 2/2 Running 0 102s 192.168.244.5 node01.k8s.local <none> <none>
prometheus-adapter-6db6c659d4-25lgm 1/1 Running 0 100s 10.244.1.5 node01.k8s.local <none> <none>
prometheus-adapter-6db6c659d4-ps5mz 1/1 Running 0 100s 10.244.2.254 node02.k8s.local <none> <none>
prometheus-k8s-0 2/2 Running 0 81s 10.244.1.8 node01.k8s.local <none> <none>
prometheus-k8s-1 2/2 Running 0 81s 10.244.2.4 node02.k8s.local <none> <none>
prometheus-operator-797d795d64-4wnw2 2/2 Running 0 99s 10.244.2.2 node02.k8s.local <none> <none>
kubectl get svc -n monitoring -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
alertmanager-main ClusterIP 10.96.71.121 <none> 9093/TCP,8080/TCP 2m10s app.kubernetes.io/component=alert-router,app.kubernetes.io/instance=main,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=kube-prometheus
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 108s app.kubernetes.io/name=alertmanager
blackbox-exporter ClusterIP 10.96.33.150 <none> 9115/TCP,19115/TCP 2m10s app.kubernetes.io/component=exporter,app.kubernetes.io/name=blackbox-exporter,app.kubernetes.io/part-of=kube-prometheus
grafana ClusterIP 10.96.12.88 <none> 3000/TCP 2m9s app.kubernetes.io/component=grafana,app.kubernetes.io/name=grafana,app.kubernetes.io/part-of=kube-prometheus
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 2m9s app.kubernetes.io/component=exporter,app.kubernetes.io/name=kube-state-metrics,app.kubernetes.io/part-of=kube-prometheus
node-exporter ClusterIP None <none> 9100/TCP 2m8s app.kubernetes.io/component=exporter,app.kubernetes.io/name=node-exporter,app.kubernetes.io/part-of=kube-prometheus
prometheus-adapter ClusterIP 10.96.24.212 <none> 443/TCP 2m7s app.kubernetes.io/component=metrics-adapter,app.kubernetes.io/name=prometheus-adapter,app.kubernetes.io/part-of=kube-prometheus
prometheus-k8s ClusterIP 10.96.57.42 <none> 9090/TCP,8080/TCP 2m8s app.kubernetes.io/component=prometheus,app.kubernetes.io/instance=k8s,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=kube-prometheus
prometheus-operated ClusterIP None <none> 9090/TCP 107s app.kubernetes.io/name=prometheus
prometheus-operator ClusterIP None <none> 8443/TCP 2m6s app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator,app.kubernetes.io/part-of=kube-prometheus
kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main ClusterIP 10.96.71.121 <none> 9093/TCP,8080/TCP 93m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 93m
blackbox-exporter ClusterIP 10.96.33.150 <none> 9115/TCP,19115/TCP 93m
grafana ClusterIP 10.96.12.88 <none> 3000/TCP 93m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 93m
node-exporter ClusterIP None <none> 9100/TCP 93m
prometheus-adapter ClusterIP 10.96.24.212 <none> 443/TCP 93m
prometheus-k8s ClusterIP 10.96.57.42 <none> 9090/TCP,8080/TCP 93m
prometheus-operated ClusterIP None <none> 9090/TCP 93m
prometheus-operator ClusterIP None <none> 8443/TCP 93m
blackbox-exporter: an official Prometheus project for network probing (DNS, ping, and HTTP checks).
node-exporter: a Prometheus exporter that collects node-level metrics such as CPU, memory, and disk usage.
prometheus: the monitoring server itself; it pulls metrics from the exporters and stores them as time series.
kube-state-metrics: exposes the state of Kubernetes objects (pods, deployments, and other resources) as metrics that Prometheus can scrape and query with PromQL.
prometheus-adapter: aggregated into the apiserver, i.e. an implementation of a custom/resource metrics API server.
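As a quick check that prometheus-adapter is actually serving the resource-metrics API (run once the pods are Ready; output details vary by version):
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl top nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | head -c 300; echo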
Creating Ingresses
This makes the UIs reachable by domain name; an ingress controller must already be installed.
cat > prometheus-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-prometheus
namespace: monitoring
labels:
app.kubernetes.io/name: nginx-ingress
app.kubernetes.io/part-of: monitoring
annotations:
#kubernetes.io/ingress.class: "nginx"
#nginx.ingress.kubernetes.io/rewrite-target: / #rewrite
spec:
ingressClassName: nginx
rules:
- host: prometheus.k8s.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: prometheus-k8s
port:
name: web
#number: 9090
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-grafana
namespace: monitoring
labels:
app.kubernetes.io/name: nginx-ingress
app.kubernetes.io/part-of: monitoring
annotations:
#kubernetes.io/ingress.class: "nginx"
#nginx.ingress.kubernetes.io/rewrite-target: / #rewrite
spec:
ingressClassName: nginx
rules:
- host: grafana.k8s.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: grafana
port:
name: http
#number: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-alertmanager
namespace: monitoring
labels:
app.kubernetes.io/name: nginx-ingress
app.kubernetes.io/part-of: monitoring
annotations:
#kubernetes.io/ingress.class: "nginx"
#nginx.ingress.kubernetes.io/rewrite-target: / #rewrite
spec:
ingressClassName: nginx
rules:
- host: alertmanager.k8s.local
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: alertmanager-main
port:
name: web
#number: 9093
EOF
kubectl delete -f prometheus-ingress.yaml
kubectl apply -f prometheus-ingress.yaml
kubectl get ingress -A
Add the domain names to the hosts file
127.0.0.1 prometheus.k8s.local
127.0.0.1 grafana.k8s.local
127.0.0.1 alertmanager.k8s.local
#Test via the service ClusterIPs
curl -k -H "Host:prometheus.k8s.local" http://10.96.57.42:9090/graph
curl -k -H "Host:grafana.k8s.local" http://10.96.12.88:3000/login
curl -k -H "Host:alertmanager.k8s.local" http://10.96.71.121:9093/
#Test cluster DNS
curl -k http://prometheus-k8s.monitoring.svc:9090
#Test from inside a test pod
kubectl exec -it pod/test-pod-1 -n test -- ping prometheus-k8s.monitoring
Access from a browser (30180 is the NodePort of the ingress controller in this environment):
http://prometheus.k8s.local:30180/
http://grafana.k8s.local:30180/
Default Grafana login: admin/admin
http://alertmanager.k8s.local:30180/#/alerts
#Restart the pods if needed
kubectl get pods -n monitoring
kubectl rollout restart deployment/grafana -n monitoring
kubectl rollout restart sts/prometheus-k8s -n monitoring
Uninstalling
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
Changing the Prometheus display timezone
To avoid timezone confusion, Prometheus deliberately uses Unix timestamps and UTC across all of its components. It does not support setting a timezone in the configuration file, nor does it read the host's /etc/timezone.
In practice this limitation does not get in the way:
- For visualization, Grafana can convert timezones.
- If you call the API, you receive raw timestamps and can process them however you like.
- If the UTC display in Prometheus's own UI bothers you, the redesigned web UI introduced in version 2.16 has a Local Timezone option.
Changing the Grafana display timezone
By default the bundled dashboards display UTC, which is 8 hours behind Shanghai time.
For dashboards that have already been imported, changing the timezone in the organization preferences or in your profile has no effect.
If you installed via Helm, change values.yaml:
#defaultDashboardsTimezone: utc
defaultDashboardsTimezone: "Asia/Shanghai"
Option 1
Change the timezone in the time-range settings every time you view a dashboard.
Option 2
Export a copy of each dashboard with the timezone changed and import that instead.
Option 3
Modify the timezone in the bundled dashboard definitions before importing them:
cat grafana-dashboardDefinitions.yaml|grep -C 2 timezone
]
},
"timezone": "utc",
"title": "Alertmanager / Overview",
"uid": "alertmanager-overview",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / API server",
"uid": "09ec8aa1e996d6ffcd6817bbaff4db1b",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Cluster",
"uid": "ff635a025bcfea7bc3dd4f508990a3e9",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Controller Manager",
"uid": "72e0e05bef5099e5f049b05fdc429ed4",
--
]
},
"timezone": "",
"title": "Grafana Overview",
"uid": "6be0s85Mk",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Cluster",
"uid": "efa86fd1d0c121a26444b636a3f509a8",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Multi-Cluster",
"uid": "b59e6c9f2fcbe2e16d77fc492374cc4f",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Pods)",
"uid": "85a562078cdf77779eaa1add43ccec1e",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Node (Pods)",
"uid": "200ac8fdbfbb74b39aff88118e4d1c2c",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Pod",
"uid": "6581e46e4e5c7ba40a07646395ef7b23",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Workload",
"uid": "a164a7f0339f99e89cea5cb47e9be617",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Compute Resources / Namespace (Workloads)",
"uid": "a87fb0d919ec0ea5f6543124e16c42a5",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Kubelet",
"uid": "3138fa155d5915769fbded898ac09fd9",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Namespace (Pods)",
"uid": "8b7a8b326d7a6f1f04244066368c67af",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Namespace (Workload)",
"uid": "bbb2a765a623ae38130206c7d94a160f",
--
]
},
"timezone": "utc",
"title": "Node Exporter / USE Method / Cluster",
"version": 0
--
]
},
"timezone": "utc",
"title": "Node Exporter / USE Method / Node",
"version": 0
--
]
},
"timezone": "utc",
"title": "Node Exporter / MacOS",
"version": 0
--
]
},
"timezone": "utc",
"title": "Node Exporter / Nodes",
"version": 0
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Persistent Volumes",
"uid": "919b92a8e8041bd567af9edab12c840c",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Pod",
"uid": "7a18067ce943a40ae25454675c19ff5c",
--
]
},
"timezone": "browser",
"title": "Prometheus / Remote Write",
"version": 0
--
]
},
"timezone": "utc",
"title": "Prometheus / Overview",
"uid": "",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Proxy",
"uid": "632e265de029684c40b21cb76bca4f94",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Scheduler",
"uid": "2e6b6a3b4bddf1427b3a55aa1311c656",
--
]
},
"timezone": "UTC",
"title": "Kubernetes / Networking / Workload",
"uid": "728bf77cc1166d2f3133bf25846876cc",
Remove the hard-coded UTC timezone (an empty string falls back to Grafana's default/browser timezone)
sed -rn '/"timezone":/{s/"timezone": ".*"/"timezone": ""/p}' grafana-dashboardDefinitions.yaml
sed -i '/"timezone":/{s/"timezone": ".*"/"timezone": ""/}' grafana-dashboardDefinitions.yaml
Data persistence
Nothing is persisted by default; configuration and data are lost when the pods restart.
Prepare the PVC
A StorageClass must be prepared in advance.
Note that the namespace must match the one the services run in (monitoring).
cat > grafana-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
namespace: monitoring
spec:
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteMany
resources:
requests:
storage: 10Gi
EOF
Modify the YAML
Grafana storage
grafana-deployment.yaml
      serviceAccountName: grafana
      volumes:
      - emptyDir: {}
        name: grafana-storage
Change it to:
      serviceAccountName: grafana
      volumes:
      - persistentVolumeClaim:
          claimName: grafana-pvc
        name: grafana-storage
Prometheus storage: add a storage section under spec: in prometheus-prometheus.yaml
  namespace: monitoring
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: managed-nfs-storage
        resources:
          requests:
            storage: 10Gi
Add some extra permissions to prometheus-clusterRole.yaml; the additions are mainly under resources and verbs. The original rules:
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
The rules after modification:
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
Then, if needed, grant the kube-state-metrics service account cluster-admin (use at your discretion):
kubectl create clusterrolebinding kube-state-metrics-admin-binding \
--clusterrole=cluster-admin \
--user=system:serviceaccount:monitoring:kube-state-metrics
kubectl apply -f grafana-pvc.yaml
kubectl apply -f prometheus-clusterRole.yaml
kubectl apply -f grafana-deployment.yaml
kubectl apply -f prometheus-prometheus.yaml
kubectl get pv,pvc -o wide
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE VOLUMEMODE
persistentvolume/pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 10Gi RWX Delete Bound monitoring/grafana-pvc managed-nfs-storage 25h Filesystem
persistentvolume/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 managed-nfs-storage 25h Filesystem
persistentvolume/pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 managed-nfs-storage 25h Filesystem
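A hedged sanity check that the pods are really using the NFS-backed volumes (mount paths as defined by the kube-prometheus manifests; adjust if your setup differs):
kubectl -n monitoring exec deploy/grafana -- df -h /var/lib/grafana
kubectl -n monitoring exec prometheus-k8s-0 -c prometheus -- df -h /prometheus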
Change the reclaim policy of the dynamically provisioned PVs to Retain, otherwise the data is deleted when the PV is released (PVs are cluster-scoped, so no namespace is needed):
kubectl edit pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
persistentVolumeReclaimPolicy: Retain
kubectl edit pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
kubectl edit pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
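An equivalent non-interactive form using kubectl patch (PV names from this environment):
for pv in pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 \
          pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e \
          pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca; do
  kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
done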
kubectl get pods -n monitoring
Check on the NFS server that the data directories have been created
ll /nfs/k8s/dpv/
total 0
drwxrwxrwx. 2 root root 6 Oct 24 18:19 default-test-pvc2-pvc-f9153444-5653-4684-a845-83bb313194d1
drwxrwxrwx. 2 root root 6 Nov 22 15:45 monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
drwxrwxrwx. 3 root root 27 Nov 22 15:52 monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
drwxrwxrwx. 3 root root 27 Nov 22 15:52 monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
Custom pod/service auto-discovery configuration
Goal:
Services or pods started by users are automatically discovered by Prometheus once the following annotations are added:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9121"
- Store the auto-discovery configuration in a Secret
For these annotations to be discovered, Prometheus needs the following additional scrape configuration (prometheus-additional.yaml):
cat > prometheus-additional.yaml << EOF
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: \$1:\$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
EOF
Since the heredoc contains escaped shell variables, review the generated file afterwards:
cat prometheus-additional.yaml
The configuration above keeps only endpoints whose Service carries the annotation prometheus.io/scrape with the value true (the keep regex is anchored and case-sensitive, so use lowercase "true").
Add the following to the Services you want monitored:
annotations:
  prometheus.io/scrape: "true"
Save prometheus-additional.yaml as a Secret:
kubectl delete secret additional-configs -n monitoring
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created
kubectl get secret additional-configs -n monitoring -o yaml
- Add the configuration to the Prometheus instance
Edit the Prometheus custom resource and reference the Secret created above:
vi prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  ......
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.46.0
kubectl apply -f prometheus-prometheus.yaml
Once the Prometheus CR has been updated, check in the Prometheus dashboard whether the configuration and targets have picked up the change:
http://prometheus.k8s.local:30180/targets?search=#pool-kubernetes-service-endpoints
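You can also verify that the extra job made it into the generated configuration inside the pod. The path below is the one written by the prometheus-operator config reloader in recent releases, and the command assumes the image ships a busybox shell with grep; adjust if your setup differs:
kubectl -n monitoring exec prometheus-k8s-0 -c prometheus -- \
  grep -c kubernetes-service-endpoints /etc/prometheus/config_out/prometheus.env.yaml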
kubectl get pods -n monitoring -o wide
kubectl rollout restart sts/prometheus-k8s -n monitoring
kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
Services return 503 after an NFS restart and the pods cannot be terminated
#df -h hangs and NFS is stuck; the client machine has to be rebooted
kubectl get pods -n monitoring
kubectl delete -f prometheus-prometheus.yaml
kubectl delete pod prometheus-k8s-1 -n monitoring
kubectl delete pod prometheus-k8s-1 --grace-period=0 --force --namespace monitoring
kubectl delete -f grafana-deployment.yaml
kubectl apply -f grafana-deployment.yaml
kubectl apply -f prometheus-prometheus.yaml
kubectl logs -n monitoring pod/prometheus-k8s-0
kubectl describe -n monitoring pod prometheus-k8s-0
kubectl describe -n monitoring pod prometheus-k8s-1
kubectl describe -n monitoring pod grafana-65fdddb9c7-xml6m
kubectl get pv,pvc -o wide
persistentvolume/pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 10Gi RWX Delete Bound monitoring/grafana-pvc managed-nfs-storage 25h Filesystem
persistentvolume/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 managed-nfs-storage 25h Filesystem
persistentvolume/pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca 10Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 managed-nfs-storage 25h Filesystem
persistentvolume/pvc-f9153444-5653-4684-a845-83bb313194d1 300Mi RWX Retain Released default/test-pvc2 managed-nfs-storage 29d Filesystem
#Completely remove and reinstall
kubectl delete -f manifests/
kubectl apply -f manifests/
When NFS fails, the processes reading the NFS mount block on I/O until their threads are exhausted, so the pod stops answering the Kubernetes health checks. After a while Kubernetes restarts the pod, but because NFS is still down the umount hangs during termination and the pod stays in Terminating.
Unmount the stale NFS mount points on the node where the pod was originally scheduled
mount -l | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e on /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volumes/kubernetes.io~nfs/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.244.7,local_lock=none,addr=192.168.244.6)
192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e/prometheus-db on /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volume-subpaths/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e/prometheus/2 type nfs4 (rw,relatime,vers=4.1,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.244.7,local_lock=none,addr=192.168.244.6)
umount -l -f /var/lib/kubelet/pods/67309a97-b69c-4423-9353-74863d55b3be/volumes/kubernetes.io~nfs/pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
Change the default mount mode to soft
vi /etc/nfsmount.conf
Soft=True
soft: when the client mounts the server with soft and the network or the server fails so that no data can be transferred, the client keeps retrying until the timeout, then reports an error and stops. Data may be lost when a timeout occurs, so soft mounts are generally not recommended for data volumes.
hard: the default, and the opposite of soft. The client retries forever until the server responds; while it is retrying, the mount cannot be umounted and the processes cannot be killed, so hard is usually combined with intr.
intr: when a hard-mounted resource times out, intr allows the operation to be interrupted, which prevents the whole system from being locked up by NFS; recommended.
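A hedged example of mounting with explicit soft options (the mount point /mnt/nfs-test is illustrative; on modern kernels intr is accepted but ignored):
mkdir -p /mnt/nfs-test
mount -t nfs -o soft,timeo=600,retrans=2 192.168.244.6:/nfs/k8s/dpv /mnt/nfs-test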
After a PV belonging to a StatefulSet is deleted, the restarted pod still looks for the original PV (the PVC keeps referencing it by volumeName), so deleting PVs is not recommended.
kubectl get pv -o wide
pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e 10Gi RWO Retain Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 managed-nfs-storage 26h Filesystem
pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca 10Gi RWO Retain Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 managed-nfs-storage 26h Filesystem
kubectl patch pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e -p '{"metadata":{"finalizers":null}}'
kubectl patch pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca -p '{"metadata":{"finalizers":null}}'
kubectl delete pv pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
kubectl delete pv pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
kubectl describe pvc pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 | grep Mounted
kubectl patch pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 -p '{"metadata":{"finalizers":null}}'
kubectl delete pv pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
Recovering the PVs
Recover grafana-pvc: locate the original export directory under the NFS dynamic-PV path, monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
kubectl describe -n monitoring pod grafana-65fdddb9c7-xml6m
default-scheduler 0/3 nodes are available: persistentvolumeclaim "grafana-pvc" bound to non-existent persistentvolume "pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19". preemption: 0/3 nodes are available:
3 Preemption is not helpful for scheduling..
cat > rebuid-grafana-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
name: pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
labels:
pv: pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: managed-nfs-storage
nfs:
path: /nfs/k8s/dpv/monitoring-grafana-pvc-pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19
server: 192.168.244.6
EOF
kubectl apply -f rebuid-grafana-pvc.yaml
Recover prometheus-k8s-0
kubectl describe -n monitoring pod prometheus-k8s-0
Warning FailedScheduling 14m (x3 over 24m) default-scheduler 0/3 nodes are available: persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-0" bound to non-existent persistentvolume "pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca". preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cat > rebuid-prometheus-k8s-0-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
name: pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
labels:
pv: pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: managed-nfs-storage
nfs:
path: /nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca
server: 192.168.244.6
EOF
Recover prometheus-k8s-1
kubectl describe -n monitoring pod prometheus-k8s-1
Warning FailedScheduling 19m (x3 over 29m) default-scheduler 0/3 nodes are available: persistentvolumeclaim "prometheus-k8s-db-prometheus-k8s-1" bound to non-existent persistentvolume "pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e". preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
cat > rebuid-prometheus-k8s-1-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
name: pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
labels:
pv: pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: managed-nfs-storage
nfs:
path: /nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e
server: 192.168.244.6
EOF
kubectl apply -f rebuid-prometheus-k8s-0-pv.yaml
kubectl apply -f rebuid-prometheus-k8s-1-pv.yaml
kubectl get pv -o wide
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE VOLUMEMODE
pvc-6dfcbb35-dd1a-4784-8c97-34affe78fe19 10Gi RWX Retain Bound monitoring/grafana-pvc managed-nfs-storage 9m17s Filesystem
pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e 10Gi RWX Retain Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 managed-nfs-storage 17s Filesystem
pvc-c57701e8-6ee1-48f0-b23c-a966fd8a18ca 10Gi RWX Retain Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 managed-nfs-storage 2m37s Filesystem
kubectl get pods -n monitoring
kubectl -n monitoring logs -f prometheus-k8s-1
Error from server (BadRequest): container "prometheus" in pod "prometheus-k8s-1" is waiting to start: PodInitializing
iowait is very high
iostat -kx 1
There are many hung mount processes
ps aux|grep mount
mount -t nfs 192.168.244.6:/nfs/k8s/dpv/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-bb00943e-c32c-4972-9fb8-e8862fb92d9e ./tmp
showmount -e 192.168.244.6
Export list for 192.168.244.6:
/nfs/k8s/dpv *
/nfs/k8s/spv_003 *
/nfs/k8s/spv_002 *
/nfs/k8s/spv_001 *
/nfs/k8s/web *
mount -v -t nfs 192.168.244.6:/nfs/k8s/web ./tmp
mount.nfs: timeout set for Fri Nov 24 14:33:04 2023
mount.nfs: trying text-based options 'soft,vers=4.1,addr=192.168.244.6,clientaddr=192.168.244.5'
mount -v -t nfs -o vers=3 192.168.244.6:/nfs/k8s/web ./tmp
#NFS v3 mounts succeed (the v4.1 attempt above hangs)
If the NFS service on the server stops while a client still has the export mounted, df -h on the client will hang.
Kill the processes holding the mount point, restart the NFS services on both client and server and remount, or reboot the machine.