개발하자

Kubernetes 메트릭 서버 API

Cuire 2022. 11. 12. 14:13
반응형

Kubernetes 메트릭 서버 API

나는 개발 환경에서 쿠브렌테스 클러스터를 운영하고 있다. 메트릭 서버에 대한 배포 파일을 실행했는데 오류 메시지 없이 포드가 실행 중입니다. 여기서 출력을 참조하십시오.

root@master:~/pre-release# kubectl get pod -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
metrics-server-568697d856-9jshp   1/1     Running   0          10m   10.244.1.5   worker-1   <none>           <none>

다음에 API 서비스 상태를 확인하니 아래와 같이 표시됩니다.


Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       k8s-app=metrics-server
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-03-29T17:32:16Z
  Resource Version:    39213
  UID:                 201f685d-9ef5-4f0a-9749-8004d4d529f4
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       pre-release
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-03-29T17:32:16Z
    Message:               failing or missing response from https://10.105.171.253:443/apis/metrics.k8s.io/v1beta1: Get "https://10.105.171.253:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

여기서 메트릭 서버 배포 코드

containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        - --kubelet-use-node-status-port
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2

여기 전체 코드가 있습니다.

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: pre-release
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: pre-release
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: pre-release
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: pre-release
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        - --kubelet-use-node-status-port
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
           httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          periodSeconds: 10
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
       - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: pre-release
  version: v1beta1
  versionPriority: 100

최근의 오류

I0330 09:02:31.705767       1 secure_serving.go:116] Serving securely on [::]:4443

E0330 09:04:01.718135       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:worker-2: unable to fetch metrics from Kubelet worker-2 (worker-2): Get https://worker-2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup worker-2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (master): Get https://master:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup master on 10.96.0.10:53: read udp 10.244.2.23:41419->10.96.0.10:53: i/o timeout, unable to fully scrape metrics from source kubelet_summary:worker-1: unable to fetch metrics from Kubelet worker-1 (worker-1): Get https://worker-1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: i/o timeout]

누가 이 문제를 해결하는 것을 도와줄 수 있나요?




다음 컨테이너 인수는 개발 클러스터에서 사용할 수 있습니다.

containers:
- args:
  - /metrics-server
  - --cert-dir=/tmp
  - --secure-port=443
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls

결과:

Status:
  Conditions:
    Last Transition Time:  2021-03-29T19:19:20Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available

한번 해보라구요.




설명 섹션에서 언급했듯이, 이 문제는 메트릭 서버 배포에 추가하여 해결할 수 있습니다.

쿠베르네테스에 따르면:

호스트 네트워크 - 포드에서 노드 네트워크 네임스페이스를 사용할 수 있는지 여부를 제어합니다. 이렇게 하면 포드는 루프백 장치, 서비스가 localhost에서 수신하는 루프백 장치에 액세스할 수 있으며, 동일한 노드에 있는 다른 포드의 네트워크 작업을 스누핑하는 데 사용될 수 있습니다.

spec:
  hostNetwork: true   <--- 
  containers:
  - args:
    - /metrics-server
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP
    - --kubelet-use-node-status-port
    - --kubelet-insecure-tls

메트릭 서버 배포에 포함하도록 배포를 편집할 수 있는 방법이 있습니다.

또한 관련이 있다.




EKS 노드 보안 그룹 규칙에 이 블록을 추가하면 다음과 같은 문제가 해결되었습니다.

node_security_group_additional_rules = {
  ...
  ingress_cluster_metricserver = {
    description                   = "Cluster to node 4443 (Metrics Server)"
    protocol                      = "tcp"
    from_port                     = 4443
    to_port                       = 4443
    type                          = "ingress"
    source_cluster_security_group = true 
  }
  ...
}

반응형