본문 바로가기

개발하자

Kubernetes 메트릭 서버 API

반응형

Kubernetes 메트릭 서버 API

나는 개발 환경에서 쿠브렌테스 클러스터를 운영하고 있다. 메트릭 서버에 대한 배포 파일을 실행했는데 오류 메시지 없이 포드가 실행 중입니다. 여기서 출력을 참조하십시오.

root@master:~/pre-release# kubectl get pod -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
metrics-server-568697d856-9jshp   1/1     Running   0          10m   10.244.1.5   worker-1   <none>           <none>

다음에 API 서비스 상태를 확인하니 아래와 같이 표시됩니다.


Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       k8s-app=metrics-server
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-03-29T17:32:16Z
  Resource Version:    39213
  UID:                 201f685d-9ef5-4f0a-9749-8004d4d529f4
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       pre-release
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-03-29T17:32:16Z
    Message:               failing or missing response from https://10.105.171.253:443/apis/metrics.k8s.io/v1beta1: Get "https://10.105.171.253:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

여기서 메트릭 서버 배포 코드

containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        - --kubelet-use-node-status-port
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2

여기 전체 코드가 있습니다.

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: pre-release
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: pre-release
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: pre-release
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: pre-release
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: pre-release
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        - --kubelet-use-node-status-port
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
           httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          periodSeconds: 10
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
       - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: pre-release
  version: v1beta1
  versionPriority: 100

최근의 오류

I0330 09:02:31.705767       1 secure_serving.go:116] Serving securely on [::]:4443

E0330 09:04:01.718135       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:worker-2: unable to fetch metrics from Kubelet worker-2 (worker-2): Get https://worker-2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup worker-2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (master): Get https://master:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup master on 10.96.0.10:53: read udp 10.244.2.23:41419->10.96.0.10:53: i/o timeout, unable to fully scrape metrics from source kubelet_summary:worker-1: unable to fetch metrics from Kubelet worker-1 (worker-1): Get https://worker-1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: i/o timeout]

누가 이 문제를 해결하는 것을 도와줄 수 있나요?




다음 컨테이너 인수는 개발 클러스터에서 사용할 수 있습니다.

containers:
- args:
  - /metrics-server
  - --cert-dir=/tmp
  - --secure-port=443
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls

결과:

Status:
  Conditions:
    Last Transition Time:  2021-03-29T19:19:20Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available

한번 해보라구요.




설명 섹션에서 언급했듯이, 이 문제는 메트릭 서버 배포에 추가하여 해결할 수 있습니다.

쿠베르네테스에 따르면:

호스트 네트워크 - 포드에서 노드 네트워크 네임스페이스를 사용할 수 있는지 여부를 제어합니다. 이렇게 하면 포드는 루프백 장치, 서비스가 localhost에서 수신하는 루프백 장치에 액세스할 수 있으며, 동일한 노드에 있는 다른 포드의 네트워크 작업을 스누핑하는 데 사용될 수 있습니다.

spec:
  hostNetwork: true   <--- 
  containers:
  - args:
    - /metrics-server
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP
    - --kubelet-use-node-status-port
    - --kubelet-insecure-tls

메트릭 서버 배포에 포함하도록 배포를 편집할 수 있는 방법이 있습니다.

또한 관련이 있다.




EKS 노드 보안 그룹 규칙에 이 블록을 추가하면 다음과 같은 문제가 해결되었습니다.

node_security_group_additional_rules = {
  ...
  ingress_cluster_metricserver = {
    description                   = "Cluster to node 4443 (Metrics Server)"
    protocol                      = "tcp"
    from_port                     = 4443
    to_port                       = 4443
    type                          = "ingress"
    source_cluster_security_group = true 
  }
  ...
}

반응형