Heapster unable to get container stats from the Kubelet on a Kubernetes cluster

I have set up a Kubernetes cluster on Ubuntu (Trusty) by following the Running Kubernetes Locally via Docker guide, deployed a DNS, and am running Heapster with an InfluxDB backend and a Grafana UI.

Everything seems to run smoothly except for Grafana, which does not show any graphs, only the message No datapoints in its diagrams: Screenshot

After checking the Docker container logs, I found that Heapster is unable to access the kubelet API(?), and therefore no metrics are stored in InfluxDB:

    user@host:~$ docker logs e490a3ac10a8
    I0701 07:07:30.829745 1 heapster.go:65] /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb:8086
    I0701 07:07:30.830082 1 heapster.go:66] Heapster version 1.2.0-beta.0
    I0701 07:07:30.830809 1 configs.go:60] Using Kubernetes client with master "https://kubernetes.default" and version v1
    I0701 07:07:30.831284 1 configs.go:61] Using kubelet port 10255
    E0701 07:09:38.196674 1 influxdb.go:209] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp 10.0.0.223:8086: getsockopt: connection timed out, will retry on use
    I0701 07:09:38.196919 1 influxdb.go:223] created influxdb sink with options: host:monitoring-influxdb:8086 user:root db:k8s
    I0701 07:09:38.197048 1 heapster.go:92] Starting with InfluxDB Sink
    I0701 07:09:38.197154 1 heapster.go:92] Starting with Metric Sink
    I0701 07:09:38.228046 1 heapster.go:171] Starting heapster on port 8082
    I0701 07:10:05.000370 1 manager.go:79] Scraping metrics start: 2016-07-01 07:09:00 +0000 UTC, end: 2016-07-01 07:10:00 +0000 UTC
    E0701 07:10:05.008785 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://127.0.0.1:10255/stats/container/": Post http://127.0.0.1:10255/stats/container/: dial tcp 127.0.0.1:10255: getsockopt: connection refused
    I0701 07:10:05.009119 1 manager.go:152] ScrapeMetrics: time: 8.013178ms size: 0
    I0701 07:11:05.001185 1 manager.go:79] Scraping metrics start: 2016-07-01 07:10:00 +0000 UTC, end: 2016-07-01 07:11:00 +0000 UTC
    E0701 07:11:05.007130 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://127.0.0.1:10255/stats/container/": Post http://127.0.0.1:10255/stats/container/: dial tcp 127.0.0.1:10255: getsockopt: connection refused
    I0701 07:11:05.007686 1 manager.go:152] ScrapeMetrics: time: 5.945236ms size: 0
    W0701 07:11:25.010298 1 manager.go:119] Failed to push data to sink: InfluxDB Sink
    I0701 07:12:05.000420 1 manager.go:79] Scraping metrics start: 2016-07-01 07:11:00 +0000 UTC, end: 2016-07-01 07:12:00 +0000 UTC
    E0701 07:12:05.002413 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://127.0.0.1:10255/stats/container/": Post http://127.0.0.1:10255/stats/container/: dial tcp 127.0.0.1:10255: getsockopt: connection refused
    I0701 07:12:05.002467 1 manager.go:152] ScrapeMetrics: time: 1.93825ms size: 0
    E0701 07:12:12.309151 1 influxdb.go:150] Failed to create infuxdb: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp 10.0.0.223:8086: getsockopt: connection timed out
    I0701 07:12:12.351348 1 influxdb.go:201] Created database "k8s" on influxDB server at "monitoring-influxdb:8086"
    I0701 07:13:05.001052 1 manager.go:79] Scraping metrics start: 2016-07-01 07:12:00 +0000 UTC, end: 2016-07-01 07:13:00 +0000 UTC
    E0701 07:13:05.015947 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://127.0.0.1:10255/stats/container/": Post http://127.0.0.1:10255/stats/container/: dial tcp 127.0.0.1:10255: getsockopt: connection refused
    ...

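I found a few similar issues on GitHub which made me understand that Heapster does not reach the kubelet via the node's loopback but via the container's own loopback. However, I have not been able to reproduce their solutions: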

github.com/kubernetes/heapster/issues/1183

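You should either use host networking for the Heapster pod or configure your cluster so that the node has a regular name that is not 127.0.0.1. The current problem is that the node name resolves to Heapster's localhost. Please reopen if you run into more problems.

– @piosz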

  • How do I enable "host networking" for my Heapster pod?
  • How do I configure my cluster/node to use a regular name that is not 127.0.0.1?

github.com/kubernetes/heapster/issues/744

Fixed by using better options in hyperkube, thanks for the help!

– @ddispaltro

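  • Is there a way to solve this by adding or modifying the kubelet's option flags in the docker run command?
    I tried setting --hostname-override=<host's eth0 IP> and --address=127.0.0.1 (as suggested in the last answer of this GitHub issue), but Heapster's container log then says: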

        I0701 08:23:05.000566 1 manager.go:79] Scraping metrics start: 2016-07-01 08:22:00 +0000 UTC, end: 2016-07-01 08:23:00 +0000 UTC
        E0701 08:23:05.000962 1 kubelet.go:279] Node 127.0.0.1 is not ready
        E0701 08:23:05.003018 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "http://<host's eth0 IP>:10255/stats/container/": Post http://<host's eth0 IP>/stats/container/: dial tcp <host's eth0 IP>:10255: getsockopt: connection refused

Namespace issue

Could this problem be caused by the fact that I am running the Kubernetes API in the default namespace and Heapster in kube-system?

    user@host:~$ kubectl get --all-namespaces pods
    NAMESPACE     NAME                     READY     STATUS    RESTARTS   AGE
    default       k8s-etcd-127.0.0.1       1/1       Running   0          18h
    default       k8s-master-127.0.0.1     4/4       Running   1          18h
    default       k8s-proxy-127.0.0.1      1/1       Running   0          18h
    kube-system   heapster-lizks           1/1       Running   0          18h
    kube-system   influxdb-grafana-e0pk2   2/2       Running   0          18h
    kube-system   kube-dns-v10-4vjhm       4/4       Running   0          18h

OS: Ubuntu 14.04.4 LTS (Trusty) | Kubernetes: v1.2.5 | Docker: v1.11.2

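Heapster has obtained the list of nodes from Kubernetes and is now trying to pull stats from the kubelet process on each node (its built-in cAdvisor collects the stats on the node). In this case there is only one node, and Kubernetes knows it as 127.0.0.1. That is the problem: the Heapster container tries to reach the node at 127.0.0.1, which is the Heapster container itself, and of course finds no kubelet process there to query.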
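To confirm this diagnosis on your own cluster, you can check whether any node is registered under a loopback name. The following is a minimal sketch, not part of Heapster or Kubernetes: the helper names `is_loopback` and `check_node_names` are made up for illustration, and the pattern match is deliberately simple (anything starting with `127.` or equal to `localhost`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a node name looks like a loopback address.
is_loopback() {
  case "$1" in
    127.*|localhost) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: read node names from stdin, one per line, and flag
# the ones a pod would resolve to itself rather than to the node.
check_node_names() {
  while read -r name; do
    [ -z "$name" ] && continue
    if is_loopback "$name"; then
      echo "WARN: $name resolves to the pod itself"
    else
      echo "OK: $name"
    fi
  done
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get nodes --no-headers | awk '{print $1}' | check_node_names
```

On the setup described in the question, the single node is named 127.0.0.1, so it would be flagged.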

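Two things need to happen to resolve this.

  1. We need to reference the kubelet worker node (our host running Kubernetes) by something other than the loopback network address 127.0.0.1.
  2. The kubelet process needs to accept traffic from the new network interface/address.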

Assuming you are following the local installation guide and starting Kubernetes with

 hack/local-up-cluster.sh 

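Changing the hostname by which the kubelet is referenced is pretty simple. You can take more elaborate approaches, but setting it to my eth0 IP worked fine for me (see ifconfig eth0). The downside is that you need an eth0 interface, and its address is subject to DHCP, so your mileage may vary as to how convenient this is.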

 export HOSTNAME_OVERRIDE=10.0.2.15 
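If you would rather not hard-code the address, it can be read from the interface at startup. This is only a sketch under a few assumptions: the interface is called eth0, the `ip` tool from iproute2 is installed, and the helper name `first_ipv4` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the first IPv4 address from the one-line
# output of `ip -4 -o addr show`, read on stdin. In that format the third
# field is "inet" and the fourth is ADDRESS/PREFIX.
first_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage (assumes the interface is eth0):
#   export HOSTNAME_OVERRIDE="$(ip -4 -o addr show dev eth0 | first_ipv4)"
```

Because the address still comes from DHCP, the cluster has to be restarted with the new value whenever the lease changes.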

Getting the kubelet process to accept traffic from any network interface is just as simple.

 export KUBELET_HOST=0.0.0.0
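After restarting the cluster with both variables exported, it is worth checking that the kubelet port is actually reachable at the new address before looking at Heapster again. The sketch below is an assumption-laden illustration: the port 10255 and the stats path are taken from the Heapster log in the question, and the helper names `kubelet_stats_url` and `port_open` are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the kubelet stats URL that Heapster requests.
# The default port and the path come from the log output in the question.
kubelet_stats_url() {
  local host="$1" port="${2:-10255}"
  printf 'http://%s:%s/stats/container/\n' "$host" "$port"
}

# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 2 seconds. Relies on bash's /dev/tcp pseudo-device.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage, once the cluster is back up:
#   port_open "$HOSTNAME_OVERRIDE" 10255 \
#     && echo "kubelet reachable at $(kubelet_stats_url "$HOSTNAME_OVERRIDE")"
```

If the port check fails, the kubelet is probably still bound to 127.0.0.1 only, which matches the "connection refused" error seen after setting only the hostname override.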