Kubeadm Cluster Setup

The traditional way of installing a cluster is fairly cumbersome: adding a new node, for example, means installing kubelet and kube-proxy by hand and configuring them. kubeadm aims to simplify these tedious steps.

Environment Preparation

Docker version: 1.12.6
kubeadm version: v1.7.5

Host IP        Hostname     Memory
192.168.10.6   k8s-master   1024 MB
192.168.10.7   k8s-node1    1024 MB
192.168.10.8   k8s-node2    1024 MB

System Tuning

sed -i 's;SELINUX=.*;SELINUX=disabled;' /etc/selinux/config
setenforce 0
getenforce

#LANG="en_US.UTF-8"
sed -i 's;LANG=.*;LANG="zh_CN.UTF-8";' /etc/locale.conf

cat /etc/NetworkManager/NetworkManager.conf|grep "dns=none" > /dev/null
if [[ $? != 0 ]]; then
echo "dns=none" >> /etc/NetworkManager/NetworkManager.conf
systemctl restart NetworkManager.service
fi

systemctl disable iptables
systemctl stop iptables
systemctl disable firewalld
systemctl stop firewalld

#ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
timedatectl set-timezone Asia/Shanghai

#login limits
cat /etc/security/limits.conf|grep 100000 > /dev/null
if [[ $? != 0 ]]; then
cat >> /etc/security/limits.conf << EOF
* - nofile 100000
* - nproc 100000
EOF
fi

sed -i 's;4096;100000;g' /etc/security/limits.d/20-nproc.conf

#systemd service limit
cat /etc/systemd/system.conf|egrep '^DefaultLimitCORE' > /dev/null
if [[ $? != 0 ]]; then
cat >> /etc/systemd/system.conf << EOF
DefaultLimitCORE=infinity
DefaultLimitNOFILE=100000
DefaultLimitNPROC=100000
EOF
fi

cat /etc/sysctl.conf|grep "net.ipv4.ip_local_port_range" > /dev/null
if [[ $? != 0 ]]; then
cat >> /etc/sysctl.conf << EOF
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.ip_forward = 1
EOF
sysctl -p
fi

su - root -c "ulimit -a"

# sync time via NTP
yum -y install ntp
systemctl start ntpd
systemctl enable ntpd

Set the Hostnames

#192.168.10.6
hostnamectl --static set-hostname k8s-master
sysctl kernel.hostname=k8s-master

echo '192.168.10.6 k8s-master
192.168.10.7 k8s-node1
192.168.10.8 k8s-node2' >> /etc/hosts

#192.168.10.7
hostnamectl --static set-hostname k8s-node1
sysctl kernel.hostname=k8s-node1

echo '192.168.10.6 k8s-master
192.168.10.7 k8s-node1
192.168.10.8 k8s-node2' >> /etc/hosts

#192.168.10.8
hostnamectl --static set-hostname k8s-node2
sysctl kernel.hostname=k8s-node2

echo '192.168.10.6 k8s-master
192.168.10.7 k8s-node1
192.168.10.8 k8s-node2' >> /etc/hosts

Adjust Kernel Parameters

cat >> /etc/sysctl.d/k8s.conf  << EOF
#k8s
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sysctl -p /etc/sysctl.d/k8s.conf

Installation

Install via yum

Add the yum repositories on every machine:

# docker repo
tee /etc/yum.repos.d/docker.repo <<-'EOF'
[docker-repo]
name=Docker Repository
baseurl=https://mirrors.aliyun.com/docker-engine/yum/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-engine/yum/gpg
EOF

# k8s repo
tee /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes-repo]
name=Kubernetes Repository
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF

Install the packages:

yum install -y docker-engine-1.12.6-1.el7.centos.x86_64
#yum install -y kubelet kubectl kubernetes-cni kubeadm
yum install -y kubernetes-cni-0.5.1-0.x86_64 kubelet-1.7.5-0.x86_64 kubectl-1.7.5-0.x86_64 kubeadm-1.7.5-0.x86_64
systemctl enable kubelet

Binary installation (alternative):

wget https://dl.k8s.io/v1.7.5/kubernetes-server-linux-amd64.tar.gz

Downloading Images

Image list

docker pull gcr.io/google_containers/kube-proxy-amd64:v1.7.5
docker pull gcr.io/google_containers/kube-apiserver-amd64:v1.7.5
docker pull gcr.io/google_containers/kube-controller-manager-amd64:v1.7.5
docker pull gcr.io/google_containers/kube-scheduler-amd64:v1.7.5
docker pull gcr.io/google_containers/etcd-amd64:3.0.17
docker pull gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
docker pull gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
docker pull gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
docker pull gcr.io/google_containers/pause-amd64:3.0

Note: the images above are pulled automatically during cluster creation, so pulling them by hand is optional. Reaching gcr.io, however, requires a proxy/VPN from mainland China.
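
If the nodes cannot reach gcr.io directly, a common workaround is to pull equivalent images from a domestic mirror and retag them so they appear under the gcr.io names kubeadm expects. This is only a sketch: the mirror namespace below (registry.cn-hangzhou.aliyuncs.com/google_containers) is an assumption and may need to be replaced with a mirror you actually trust.

# Sketch: pull from a mirror (hypothetical namespace) and retag as the gcr.io names kubeadm expects
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers    # assumption: adjust to your mirror
for img in kube-proxy-amd64:v1.7.5 kube-apiserver-amd64:v1.7.5 \
           kube-controller-manager-amd64:v1.7.5 kube-scheduler-amd64:v1.7.5 \
           etcd-amd64:3.0.17 pause-amd64:3.0; do
  docker pull ${MIRROR}/${img}
  docker tag  ${MIRROR}/${img} gcr.io/google_containers/${img}
done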

Master Initialization

Initialization

Unify the cgroup driver used by Docker and the kubelet.
You can apply the change directly with the following command:

sed -i 's;systemd;cgroupfs;g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Reload and restart:

systemctl daemon-reload
systemctl restart kubelet

The etcd created by kubeadm is a single node and is not recommended; use an external etcd cluster instead (see the etcd cluster installation guide).
Since v1.6.x the --external-etcd-endpoints flag is no longer available, so the external endpoints have to be supplied through a --config file, kubeadm-config.yml:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: 192.168.10.6
networking:
  #dnsDomain: myk8s.com
  podSubnet: 10.244.0.0/16
etcd:
  endpoints:
  - http://192.168.10.6:2379
  - http://192.168.10.7:2379
  - http://192.168.10.8:2379
kubernetesVersion: v1.7.5

See kubeadm-config.yml for the full file.
Note: it is best not to change dnsDomain, otherwise some odd problems can appear.
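
Before running kubeadm init against the external etcd, it is worth confirming that the cluster is actually healthy. A minimal sketch, assuming etcdctl (v2 API) is installed on the master:

# all three members should report as healthy
etcdctl --endpoints=http://192.168.10.6:2379,http://192.168.10.7:2379,http://192.168.10.8:2379 cluster-health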

Initialize.
Run on the master:

kubeadm init --config kubeadm-config.yml

Another way to initialize

#export KUBE_COMPONENT_LOGLEVEL='--v=0'
kubeadm init --kubernetes-version=v1.7.5 --apiserver-advertise-address=192.168.10.6

#If you use the flannel network, add --pod-network-cidr 10.244.0.0/16
kubeadm init --kubernetes-version=v1.7.5 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.10.6

Troubleshooting

If the initialization appears to hang, check the system logs for the error:

journalctl -f -u kubelet.service
# tail -n100 -f /var/log/messages

failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

This is a change in Kubernetes v1.6.x: the kubelet's cgroup driver does not match the one Docker uses, so the containers cannot start.
You can change the kubelet's cgroup driver as described at http://www.jianshu.com/p/02dc13d2f651.
First confirm Docker's Cgroup Driver:

[root@k8s-node1 ~]# docker info
......
Cgroup Driver: cgroupfs

Change systemd to cgroupfs in 10-kubeadm.conf. Note: this has to be done on every machine:

# open the kubelet drop-in configuration
vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf


Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"

and replace it with:

Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"

Or apply the change directly with:

sed -i 's;systemd;cgroupfs;g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Reload and restart:

systemctl daemon-reload
systemctl restart kubelet

Then re-initialize.
On the master, run:

kubeadm reset
#kubeadm init --kubernetes-version=v1.7.5 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.10.6
kubeadm init --config kubeadm-config.yml

kubeadm init keeps printing:

[apiclient] Temporarily unable to list nodes (will retry)
[apiclient] Temporarily unable to list nodes (will retry)
[apiclient] Temporarily unable to list nodes (will retry)

This usually means the DNS server resolves localhost to some other address. You can verify it with nslookup:
[root@master ~]# nslookup localhost
The problem went away after adjusting the search entries in /etc/resolv.conf.
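
A quick way to check and fix this (the offending search domain is hypothetical; remove whichever entry hijacks localhost on your system):

nslookup localhost          # should resolve to 127.0.0.1 / ::1
cat /etc/resolv.conf        # look for a 'search' line that rewrites bare names
# e.g. if 'search example.internal' makes localhost.example.internal resolve elsewhere,
# remove that domain from /etc/resolv.conf (or its NetworkManager/ifcfg source) and retest:
nslookup localhost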

configmaps “cluster-info” already exists

The etcd data needs to be cleaned up:

systemctl stop etcd
rm -fr /var/lib/etcd/etcd/*
systemctl restart etcd

On success, the output should look like this:

kubeadm init --kubernetes-version=v1.7.5 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.10.6
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.5
[init] Using Authorization mode: RBAC
[preflight] Running pre-flight checks
[preflight] WARNING: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Starting the kubelet service
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.10.6]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 126.837475 seconds
[apiclient] Waiting for at least one node to register
[apiclient] First node has registered after 6.516528 seconds
[token] Using token: 67a477.959aa53030fd8444
[apiconfig] Created RBAC rules
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

kubeadm join --token 67a477.959aa53030fd8444 192.168.10.6:6443

In earlier versions the token was not kept after a successful init; if you forgot to record it, no other node could join. The kubeadm token command has since been added to help with this:

kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION
67a477.959aa53030fd8444 <forever> <never> authentication,signing The default bootstrap token generated by 'kubeadm init'.

By default the master does not schedule pods, so with only one machine no pod can start. If you really have just one machine, the following command allows the master to schedule pods (it differs from the 1.5.x command):

kubectl taint nodes --all node-role.kubernetes.io/master-

Check the nodes

kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes

The kubectl command

kubectl is used for almost every Kubernetes operation, yet right after installation it fails with an error.
The most direct workaround is to pass the --kubeconfig flag:

kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes

Note:
Only the master has admin.conf; the nodes only have kubelet.conf, so commands such as logs/exec can only be run on the master.
You can copy admin.conf to the other node machines:

scp /etc/kubernetes/admin.conf root@192.168.10.7:/etc/kubernetes/
scp /etc/kubernetes/admin.conf root@192.168.10.8:/etc/kubernetes/

If you do not want to pass the flag every time, set an environment variable:

# append to ~/.bash_profile
tee >> ~/.bash_profile << EOF
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF

Then run:

source ~/.bash_profile

After that, kubectl no longer needs the --kubeconfig flag:

kubectl get nodes

Joining Nodes

Join a node to the cluster

Remember the join command printed by kubeadm init (similar to the one below, but with a different token); every node that joins the cluster must run it:

kubeadm join --token 67a477.959aa53030fd8444 192.168.10.6:6443

[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "192.168.10.6:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.10.6:6443"
[discovery] Cluster info signature and contents are valid, will use API Server "https://192.168.10.6:6443"
[discovery] Successfully established connection with API Server "192.168.10.6:6443"
[bootstrap] Detected server version: v1.7.5
[bootstrap] The server supports the Certificates API (certificates.k8s.io/v1beta1)
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
[csr] Received signed certificate from the API server, generating KubeConfig...
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response
received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.

Check the nodes

 kubectl get no
NAME STATUS AGE VERSION
k8s-master NotReady 39m v1.7.5
k8s-node1 NotReady 32s v1.7.5
k8s-node2 NotReady 53s v1.7.5

Troubleshooting

A node's STATUS is NotReady after joining

Check the kubelet logs:

systemctl status -l kubelet
kubeadm network plugin is not ready: cni config uninitialized

If flannel is deployed the non-CNI way, the CNI flags have to be removed from the kubelet; see https://github.com/kubernetes/kubernetes/issues/43815:

you need to edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_EXTRA_ARGS

remove $KUBELET_NETWORK_ARGS, and then restart kubelet after that kubeadm init should work.
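
A one-line equivalent, assuming the stock kubeadm drop-in path used throughout this post:

sed -i 's;$KUBELET_NETWORK_ARGS ;;g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf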

After restarting kubelet the node becomes Ready:

systemctl daemon-reload
systemctl restart kubelet

Check the pods:

[root@k8s-master v1.7.5]# kubectl get pod -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
kube-apiserver-k8s-master 1/1 Running 0 10m 192.168.10.6 k8s-master
kube-controller-manager-k8s-master 1/1 Running 0 10m 192.168.10.6 k8s-master
kube-dns-1783747724-bh3g3 3/3 Running 0 2h 10.244.3.2 k8s-node2
kube-proxy-3l1wf 1/1 Running 0 2h 192.168.10.7 k8s-node1
kube-proxy-5mwcl 1/1 Running 0 2h 192.168.10.8 k8s-node2
kube-proxy-s02wm 1/1 Running 0 2h 192.168.10.6 k8s-master
kube-scheduler-k8s-master 1/1 Running 0 10m 192.168.10.6 k8s-master

Cannot view pod logs

The following error is reported:

kubectl logs kube-dns-1783747724-bh3g3 -n kube-system                
Error from server (BadRequest): a container name must be specified for pod kube-dns-1783747724-bh3g3, choose one of: [kubedns dnsmasq sidecar]

The message itself means the pod contains several containers, so kubectl logs needs a container name passed with -c. In addition, make sure kubectl runs with admin rights by keeping this in ~/.bash_profile:

export KUBECONFIG=/etc/kubernetes/admin.conf

Then run:

source ~/.bash_profile
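
With a container name supplied via -c (the names appear in the error message above), the logs can then be read:

kubectl logs kube-dns-1783747724-bh3g3 -c kubedns -n kube-system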

kube-dns does not run; its STATUS is Pending

That is because no pod network has been deployed yet; see the Deploying the Network section below.

[root@k8s-master ~]# kubectl get pod -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
etcd-k8s-master 1/1 Running 0 37m 192.168.10.6 k8s-master
kube-apiserver-k8s-master 1/1 Running 0 37m 192.168.10.6 k8s-master
kube-controller-manager-k8s-master 1/1 Running 0 37m 192.168.10.6 k8s-master
kube-dns-3913472980-gx5zn 0/3 Pending 0 42m <none>
kube-proxy-3970g 1/1 Running 0 4m 192.168.10.8 k8s-node2
kube-proxy-t8zhh 1/1 Running 0 42m 192.168.10.6 k8s-master
kube-proxy-xvsdk 1/1 Running 0 3m 192.168.10.7 k8s-node1
kube-scheduler-k8s-master 1/1 Running 0 37m 192.168.10.6 k8s-master

Deploying the Network

CNI network

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

If you are using Vagrant, kube-flannel.yml needs a change:

command: [ "/opt/bin/flanneld", "--iface=eth1", "--ip-masq", "--kube-subnet-mgr" ]

See the modified file: kube-flannel.yml

Create the network:

kubectl create -f kube-flannel.yml

Flannel network (non-CNI)

Flannel can also be deployed by hand as a non-CNI network; see the post on building a Docker network with Flannel.

The CNI flags then need to be removed from the kubelet:

sed -i 's;$KUBELET_NETWORK_ARGS ;;g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
systemctl restart kubelet

Testing DNS

#https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
Create a file named busybox.yaml with the following contents:

tee busybox.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
EOF

Then create a pod using this file:

kubectl create -f busybox.yaml

Test:

#kubectl run busybox --rm -ti --image=busybox --restart=Never -- nslookup -type=srv kubernetes
#Because the pod is removed on exit, the image is downloaded again on every run, which makes each run a bit slow.

[root@k8s-master config]# kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

Dashboard

Installation

Latest method

https://forums.docker.com/t/docker-for-mac-kubernetes-dashboard/44116/6

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/alternative/kubernetes-dashboard.yaml
kubectl get pod --namespace=kube-system | grep dashboard
kubectl port-forward kubernetes-dashboard-57b79cdfb5-5bj6m 9090:9090 --namespace=kube-system

Then open your browser at http://127.0.0.1:9090 and the dashboard should work without any authentication!

Or:

kubectl proxy

Then open: http://localhost:8001/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/#!/overview?namespace=default

If it is not reachable, check that the path says http:kubernetes-dashboard; with https:kubernetes-dashboard you get the error 'tls: oversized record received with length 20527'.

see: https://github.com/kubernetes/dashboard/wiki/Accessing-Dashboard---1.7.X-and-above

Other methods

#the 1.6.3 version can no longer be downloaded
#wget https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml

#the 1.7.x version
#https://github.com/kubernetes/dashboard/tree/v1.7.1/src/deploy/recommended
wget https://github.com/kubernetes/dashboard/raw/v1.7.1/src/deploy/recommended/kubernetes-dashboard.yaml

kubectl create -f kubernetes-dashboard.yaml

For 1.6 see: kubernetes-dashboard.yaml
For 1.7 see: kubernetes-dashboard.yaml

Check the deployment

kubectl get pod,svc -n kube-system -l k8s-app=kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
po/kubernetes-dashboard-2315583659-qt0vm 1/1 Running 1 21h

NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kubernetes-dashboard 10.106.184.36 <none> 80/TCP 21h

Access

Access method 1

https://192.168.10.6:6443/ui
https://192.168.10.6:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy

Note: since 1.7, https://192.168.10.6:6443/ui can no longer be accessed; it returns:
Error: 'malformed HTTP response "\x15\x03\x01\x00\x02\x02"'
Trying to reach: 'http://10.244.2.6:8443/'
See http://tonybai.com/2017/09/26/some-notes-about-deploying-kubernetes-dashboard-1-7-0/

Opening the proxy URL shows the following login page:

dashboard_login.png

Dashboard v1.7 supports two login methods by default: kubeconfig and token.
For the token method, select the Token radio button and paste a token, which can be obtained as follows:

[root@k8s-master ~]# kubectl  get secret -n kube-system|grep dashboard
kubernetes-dashboard-token-wlzb0 kubernetes.io/service-account-token 3 3m

[root@k8s-master ~]# kubectl describe secret/kubernetes-dashboard-token-wlzb0 -n kube-system
Name: kubernetes-dashboard-token-wlzb0
Namespace: kube-system
Labels: <none>
Annotations: kubernetes.io/service-account.name=kubernetes-dashboard
kubernetes.io/service-account.uid=07dd79c6-a58d-11e7-985e-080027a8da33

Type: kubernetes.io/service-account-token

Data
====
ca.crt: 1025 bytes
namespace: 11 bytes
token: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi13bHpiMCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjA3ZGQ3OWM2LWE1OGQtMTFlNy05ODVlLTA4MDAyN2E4ZGEzMyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlLXN5c3RlbTprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.a81oQ8XkR87luB6MeClqq5OTKXgfK_Dn8ku4uQdh4Y03aMHNwLYRdCzHZ67d_1sFndeX5FaKHr-hCxVz5eLYMVyexcQoegHvJtOA5tOyX1RRF55LCotrfsigG_4IGU9caOBmODV1HSPpQGuVcGtky3-9KIUR1r8JlEsxuFl4aaBp9YmJg_TJx7sDwF5Io_S4M21JrXOP_6Wly-hJnOW5_KF0eUyyUSHPN1HCBx-4l2CANbrK9xONAFUl9cgisfLjNZDhbBvQBZi-Ru6Ugxxkxxok1fADWkJMiwILsW9gV724OfWfnTNM8PUDQYZX1tegRGfm8vnjxEdyKXuuERaLRg

After logging in you see:
dashboard_error1.png

This is because since 1.7 the dashboard ships with minimal privileges by default, so additional permissions have to be granted.
See the official-release notes.

tee dashboard-admin.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system
EOF

kubectl create -f dashboard-admin.yaml

Enabling basic auth login for Dashboard v1.7.0:

To log in to the dashboard with basic auth, kubernetes-dashboard.yaml needs the following change:

args:
  - --tls-key-file=/certs/dashboard.key
  - --tls-cert-file=/certs/dashboard.crt
  - --authentication-mode=basic    # <---- add this line

Then recreate the dashboard.
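
A minimal sketch of that recreate step, assuming the edited manifest is the kubernetes-dashboard.yaml downloaded earlier:

kubectl delete -f kubernetes-dashboard.yaml
kubectl create -f kubernetes-dashboard.yaml
kubectl get pod -n kube-system | grep dashboard    # wait until it is Running again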

Access method 2

http://192.168.10.6:9090/ui
http://192.168.10.6:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy

You can also go through a kubectl proxy:

[root@k8s-master ~]# kubectl get ep -n kube-system   
NAME ENDPOINTS AGE
kubernetes-dashboard 10.244.1.5:8443 19m

#kubectl proxy --address 192.168.10.6 --port=8443 --accept-hosts='^*$'
kubectl proxy --address='0.0.0.0' --port=8443 --accept-hosts='^*$'

Now the dashboard is reachable. This is the insecure path; if a password or token is requested, just click SKIP.

Pods may occasionally end up in an error state; recreating all the dashboard YAML fixes it.
Note: for version 1.6 the address is http://192.168.10.6:9090/ui

Version 1.6 can also be exposed through a NodePort, but 1.7 cannot. See:
https://github.com/qianlei90/Blog/issues/28
https://github.com/kubernetes/dashboard/issues/692
Because a kubeadm-installed cluster carries no authentication, the dashboard cannot be reached directly via https://<master>/ui; a NodePort can be added instead:

spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 9090
    nodePort: 30000

Troubleshooting

User “system:anonymous” cannot get at the cluster scope

See http://www.tongtongxue.com/archives/16338.html
Edit /etc/kubernetes/manifests/kube-apiserver.yaml and add --anonymous-auth=false:

spec:
  containers:
  - command:
    - kube-apiserver
    - --anonymous-auth=false

kube-apiserver restarts periodically:

kubectl describe pod kube-apiserver-k8s-master -n kube-system|grep health
Liveness: http-get https://127.0.0.1:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8

The liveness check has failed 8 times; 8 is the failureThreshold for kube-apiserver, as /etc/kubernetes/manifests/kube-apiserver.yaml shows:

livenessProbe:
  failureThreshold: 8
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 6443
    scheme: HTTPS
  initialDelaySeconds: 15
  timeoutSeconds: 15

Once the failure count exceeds that threshold, kubelet restarts kube-apiserver, and that is exactly what happens here. Why does the liveness check fail at all? Because anonymous authentication was disabled. Looking at the livenessProbe section of /etc/kubernetes/manifests/kube-apiserver.yaml again: kubelet checks https://127.0.0.1:6443/healthz, and it does so with anonymous requests. With --anonymous-auth=false those requests are rejected, so kubelet can never reach the /healthz endpoint, concludes that kube-apiserver is dead, and keeps restarting it.

Point the /healthz probe at the insecure port instead:

spec:
  containers:
  - command:
    - kube-apiserver
    - --anonymous-auth=false
    ... ...
    - --insecure-bind-address=127.0.0.1
    - --insecure-port=8080

    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    ... ...

Anonymous requests are no longer usable here, but we can take advantage of --insecure-bind-address and --insecure-port: point kubelet's probe at the insecure port instead of the secure one. Traffic on the insecure port is not subject to authentication or authorization, so the liveness probe succeeds and kubelet stops restarting kube-apiserver.

Restart kubelet:

systemctl restart kubelet

Unauthorized

See http://tonybai.com/2017/07/20/fix-cannot-access-dashboard-in-k8s-1-6-4/
A big difference between Kubernetes 1.6.x and 1.5.x is that 1.6.x enables the RBAC authorization mode, while we still authenticate to the apiserver with basic auth instead of client certificates or other methods:

spec:
  containers:
  - command:
    - kube-apiserver
    ... ...
    - --basic-auth-file=/etc/kubernetes/basic_auth_file
    ... ...

Create the basic_auth_file:

echo "admin,admin,2017" > /etc/kubernetes/basic_auth_file

The basic_auth_file format is:

password,username,uid

For reference, the complete /etc/kubernetes/manifests/kube-apiserver.yaml.
The main additions are:

- --anonymous-auth=false
- --basic-auth-file=/etc/kubernetes/basic_auth_file
- --insecure-bind-address=127.0.0.1
- --insecure-port=8080

host: 127.0.0.1
path: /healthz
port: 8080
scheme: HTTP

The full file:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver

    - --anonymous-auth=false
    - --basic-auth-file=/etc/kubernetes/basic_auth_file
    - --insecure-bind-address=127.0.0.1
    - --insecure-port=8080
    - --secure-port=6443
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --allow-privileged=true
    - --storage-backend=etcd3
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds
    - --requestheader-allowed-names=front-proxy-client
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --experimental-bootstrap-token-auth=true
    - --requestheader-username-headers=X-Remote-User
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --authorization-mode=RBAC
    - --advertise-address=192.168.10.6
    - --etcd-servers=http://192.168.10.6:2379,http://192.168.10.7:2379,http://192.168.10.8:2379
    image: gcr.io/google_containers/kube-apiserver-amd64:v1.7.5
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-apiserver
    resources:
      requests:
        cpu: 250m
    volumeMounts:
    - mountPath: /etc/kubernetes/
      name: k8s
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: certs
    - mountPath: /etc/pki
      name: pki
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes
    name: k8s
  - hostPath:
      path: /etc/ssl/certs
    name: certs
  - hostPath:
      path: /etc/pki
    name: pki
status: {}
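
Once the static pod has restarted with these flags, the basic auth credentials can be sanity-checked from the master. This is only a quick probe: a 403 Forbidden at this stage simply means the admin user has no RBAC binding yet (handled in the next section), while a 401 would mean the basic auth file was not picked up.

curl -k -u admin:admin https://192.168.10.6:6443/api/v1/nodes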

User “admin” cannot get at the cluster scope

The user admin has not been granted enough authorization yet. What we need to do is bind admin to a suitable clusterrole. kubectl cannot show user information directly, and checking the many initial clusterrolebindings one by one is tedious. Since the cluster-admin clusterrole has full permissions, simply bind the user admin to it:

kubectl create clusterrolebinding login-on-dashboard-with-cluster-admin --clusterrole=cluster-admin --user=admin

Restart kubelet and the problem is gone:

systemctl restart kubelet

Now https://192.168.10.6:6443/ui accepts the admin/admin login.

Deploying the Heapster Add-on

Installation

Next, install Heapster to add usage statistics and monitoring to the cluster and charts to the Dashboard, using InfluxDB as Heapster's backend store:

mkdir -p ~/k8s/heapster
cd ~/k8s/heapster
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana.yaml
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
wget https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml

kubectl create -f ./

Or:

wget https://github.com/kubernetes/heapster/archive/v1.4.2.zip
unzip v1.4.2.zip
cd heapster-1.4.2/deploy/kube-config/influxdb
kubectl create -f ./

cd heapster-1.4.2/deploy/kube-config/rbac
kubectl create -f ./

See heapster.zip for the full set of files.

If you create the resources from heapster.zip, some files have been modified, so run:

kubectl create configmap influxdb-config --from-file=config.toml  -n kube-system
kubectl create -f ./

Access URLs:
cAdvisor:
http://192.168.10.6:4194/
http://192.168.10.7:4194/
http://192.168.10.8:4194/
Note: in 1.7 cAdvisor is no longer reachable this way; it can be run manually on every machine instead:

docker run --restart=always \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=4194:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest

grafana:
https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/
http://192.168.10.6:30015/

influxdb:
https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb
http://192.168.10.6:8083/

Finally, confirm that every pod is Running, then open the Dashboard; the cluster usage statistics are shown as charts.
heapster
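
A quick way to confirm the monitoring pods are up before opening the Dashboard:

kubectl get pods -n kube-system -o wide | grep -E 'heapster|influxdb|grafana'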

Troubleshooting

ErrImagePull

Check the pod status:

[root@k8s-master temp]# kubectl get pod -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE
heapster-1528902802-hjjh8 1/1 Running 0 13s 10.244.66.3 k8s-node2
kube-apiserver-k8s-master 1/1 Running 3 20h 192.168.10.6 k8s-master
kube-controller-manager-k8s-master 1/1 Running 27 21h 192.168.10.6 k8s-master
kube-dns-1783747724-3jb47 3/3 Running 3 4h 10.244.100.2 k8s-node1
kube-proxy-3l1wf 1/1 Running 8 2d 192.168.10.7 k8s-node1
kube-proxy-5mwcl 1/1 Running 10 2d 192.168.10.8 k8s-node2
kube-proxy-s02wm 1/1 Running 7 2d 192.168.10.6 k8s-master
kube-scheduler-k8s-master 1/1 Running 27 21h 192.168.10.6 k8s-master
kubernetes-dashboard-2315583659-qt0vm 1/1 Running 13 1d 10.244.100.3 k8s-node1
monitoring-grafana-973508798-wg055 0/1 ErrImagePull 0 13s 10.244.66.2 k8s-node2
monitoring-influxdb-3871661022-jvpt9 0/1 ErrImagePull 0 12s 10.244.100.4 k8s-node1

grafana and influxdb are stuck in ErrImagePull. Check the events of monitoring-grafana-973508798-wg055 and monitoring-influxdb-3871661022-jvpt9:

kubectl describe pod monitoring-influxdb-3871661022-jvpt9 -n kube-system
1m 10s 4 kubelet, k8s-node1 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "influxdb" with ErrImagePull: "rpc error: code = 2 desc = Error: Status 405 trying to pull repository google_containers/heapster-influxdb-amd64: \"v1 Registry API is disabled. If you are not explicitly using the v1 Registry API, it is possible your v2 image could not be found. Verify that your image is available, or retry with `dockerd --disable-legacy-registry`. See https://cloud.google.com/container-registry/docs/support/deprecation-notices\""

The image tags cannot be found in the registry. Change the image paths in grafana.yaml and influxdb.yaml:

#grafana.yaml
#image: gcr.io/google_containers/heapster-grafana-amd64:v4.4.3
image: gcr.io/google_containers/heapster-grafana-amd64:v4.0.2

#influxdb.yaml
#image: gcr.io/google_containers/heapster-influxdb-amd64:v1.3.3
image: gcr.io/google_containers/heapster-influxdb-amd64:v1.1.1

Delete and recreate, and the pods run normally:

kubectl delete -f ./
kubectl create -f ./

cannot list nodes at the cluster scope

Check the logs:

kubectl logs -f --tail 100 heapster-1528902802-6kzfk -n kube-system
E0903 07:09:35.016005 1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: User "system:serviceaccount:kube-system:heapster" cannot list nodes at the cluster scope. (get nodes)

This is an RBAC problem; run:

kubectl create -f heapster-rbac.yaml

ServiceUnavailable

Just wait for Kubernetes to finish creating the resources.

grafana: Problem! the server could not find the requested resource

Visiting https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/ returns 'Problem! the server could not find the requested resource'. There are two ways to fix it.
Option 1:
Edit grafana.yaml:

#value: /
value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/

Recreate grafana:

kubectl delete -f grafana.yaml
kubectl create -f grafana.yaml

Visiting https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana now works.

Option 2:
Edit grafana.yaml:

spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  # type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    k8s-app: grafana
  #externalIPs:
  #- 192.168.10.6

Visiting http://192.168.10.6:3000 works.

Or:

spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  type: NodePort
  ports:
  - port: 80
    targetPort: 3000
    nodePort: 30015
  selector:
    k8s-app: grafana

Visiting http://192.168.10.6:30015 works.

influxdb 404 page not found

Visiting https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb returns '404 page not found'. InfluxDB officially recommends querying through the CLI or the HTTP API; the admin UI has been disabled by default since v1.1.0 and will be removed in a later release. The workaround is as follows.

Download config.toml:

wget https://raw.githubusercontent.com/kubernetes/heapster/master/influxdb/config.toml

# append the following:
[admin]
enabled = true
bind-address = ":8083"
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"

# write the modified config into a ConfigMap
#kubectl delete configmap influxdb-config -n kube-system
kubectl create configmap influxdb-config --from-file=config.toml -n kube-system

Edit influxdb.yaml:

spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        #image: gcr.io/google_containers/heapster-influxdb-amd64:v1.3.3
        image: gcr.io/google_containers/heapster-influxdb-amd64:v1.1.1
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
        - mountPath: /etc/
          name: influxdb-config
      volumes:
      - name: influxdb-storage
        emptyDir: {}
      - name: influxdb-config
        configMap:
          name: influxdb-config
...
---
...
spec:
  ports:
  - name: http
    port: 8083
    targetPort: 8083
  - name: api
    port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb
  externalIPs:
  - 192.168.10.6

Recreate influxdb:

kubectl delete -f influxdb.yaml
kubectl create -f influxdb.yaml

Visiting http://192.168.10.6:8083 works.

Or:

spec:
  replicas: 1
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        #image: gcr.io/google_containers/heapster-influxdb-amd64:v1.3.3
        image: gcr.io/google_containers/heapster-influxdb-amd64:v1.1.1
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
        - mountPath: /etc/
          name: influxdb-config
      volumes:
      - name: influxdb-storage
        emptyDir: {}
      - name: influxdb-config
        configMap:
          name: influxdb-config
...
---
...
spec:
  type: "NodePort"
  ports:
  - name: http
    port: 8083
    targetPort: 8083
    nodePort: 30016
  - name: api
    port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb

Recreate influxdb:

kubectl delete -f influxdb.yaml
kubectl create -f influxdb.yaml

Visiting http://192.168.10.6:30016 works.


Deploying the EFK Add-on

Installation

git clone https://github.com/kubernetes/kubernetes.git
git checkout v1.7.5

Go to cluster/addons/fluentd-elasticsearch in the source tree. Role and RoleBinding definitions are needed for elasticsearch and fluentd, so add two files, es-rbac.yaml and fluentd-es-rbac.yaml.
es-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch
  namespace: kube-system

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: elasticsearch
subjects:
- kind: ServiceAccount
  name: elasticsearch
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

fluentd-es-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1alpha1
metadata:
  name: fluentd
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io

Edit es-controller.yaml and add serviceAccountName:

spec:
  serviceAccountName: elasticsearch
  containers:
  - image: gcr.io/google_containers/elasticsearch:v2.4.1-2

Edit fluentd-es-ds.yaml and add serviceAccountName:

spec:
  serviceAccountName: fluentd
  containers:
  - name: fluentd-es
    image: gcr.io/google_containers/fluentd-elasticsearch:1.22

Label the nodes:
The DaemonSet fluentd-es-v1.22 is only scheduled onto nodes labelled beta.kubernetes.io/fluentd-ds-ready=true, so set that label on every node that should run fluentd:

$ kubectl get nodes
NAME STATUS AGE VERSION
k8s-node1 Ready 1d v1.7.5

$ kubectl label nodes k8s-node1 beta.kubernetes.io/fluentd-ds-ready=true
$ kubectl label nodes k8s-node2 beta.kubernetes.io/fluentd-ds-ready=true

Create everything:

kubectl create -f ./

See EFK.zip for the complete files.
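
To confirm the fluentd DaemonSet actually landed on the labelled nodes, a quick check along these lines should be enough:

kubectl get ds -n kube-system
kubectl get pods -n kube-system -o wide | grep fluentd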

Access Kibana at:
https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/kibana-logging

If you get the error 'dial tcp 10.244.2.5:5601: connection refused':
Check the kibana-logging logs:

[root@k8s-master EFK]# kubectl logs kibana-logging-3757371098-bjrlh -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-09-04T09:28:51Z","tags":["info","optimize"],"pid":5,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}

Just wait patiently for Kibana to finish its startup optimization.

Prometheus Monitoring

Installation

See kubernetes-prometheus:

wget https://raw.githubusercontent.com/giantswarm/kubernetes-prometheus/master/manifests-all.yaml
kubectl create -f manifests-all.yaml

See prometheus.zip for the configuration files.

More Dashboards

The following has already been added automatically; no manual steps are needed.

See grafana.net for some example dashboards and plugins.

Configure Prometheus data source for Grafana.
Grafana UI / Data Sources / Add data source

Name: prometheus
Type: Prometheus
Url: http://prometheus:9090
Add
Import Prometheus Stats:
Grafana UI / Dashboards / Import

Grafana.net Dashboard: https://grafana.net/dashboards/2
Load
Prometheus: prometheus
Save & Open
Import Kubernetes cluster monitoring:
Grafana UI / Dashboards / Import

Grafana.net Dashboard: https://grafana.net/dashboards/162
Load
Prometheus: prometheus
Save & Open

Access addresses:

[root@k8s-node1 ~]# kubectl get svc,ep -n monitoring
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/alertmanager 10.111.106.237 <nodes> 9093:30582/TCP 14h
svc/grafana 10.108.245.18 <nodes> 3000:30718/TCP 14h
svc/kube-state-metrics 10.109.29.182 <none> 8080/TCP 14h
svc/prometheus 10.101.186.72 <nodes> 9090:32617/TCP 14h
svc/prometheus-node-exporter None <none> 9100/TCP 14h

NAME ENDPOINTS AGE
ep/alertmanager 10.244.100.6:9093 14h
ep/grafana 10.244.100.8:3000 14h
ep/kube-state-metrics 10.244.100.9:8080,10.244.15.4:8080 14h
ep/prometheus 10.244.15.5:9090 14h
ep/prometheus-node-exporter 192.168.10.6:9100,192.168.10.7:9100 14h

From the output above:
prometheus: http://192.168.10.6:32617
grafana: http://192.168.10.6:30718

Status Queries

Check cluster status

kubectl cluster-info

Kubernetes master is running at https://192.168.10.6:6443
Heapster is running at https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/heapster
KubeDNS is running at https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/kube-dns
monitoring-grafana is running at https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
monitoring-influxdb is running at https://192.168.10.6:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

View everything:

#kubectl get pod,svc,ep -o wide -n kube-system
#kubectl get no,pod,svc,ep,deploy -o wide -n kube-system
[root@k8s-master prometheus]# kubectl get pod,svc,ep -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system po/elasticsearch-logging-v1-0wlf2 1/1 Running 0 36m 10.244.96.4 k8s-node2
kube-system po/elasticsearch-logging-v1-v7sg7 1/1 Running 0 36m 10.244.3.5 k8s-node1
kube-system po/fluentd-es-v1.22-h8f5d 1/1 Running 0 36m 10.244.96.5 k8s-node2
kube-system po/fluentd-es-v1.22-mrs8k 1/1 Running 0 36m 10.244.3.6 k8s-node1
kube-system po/heapster-1528902802-h8grp 1/1 Running 0 44m 10.244.3.3 k8s-node1
kube-system po/kibana-logging-3757371098-x5k1r 1/1 Running 0 36m 10.244.3.4 k8s-node1
kube-system po/kube-apiserver-k8s-master 1/1 Running 0 43m 192.168.10.6 k8s-master
kube-system po/kube-controller-manager-k8s-master 1/1 Running 1 1h 192.168.10.6 k8s-master
kube-system po/kube-dns-3913472980-sr3sh 3/3 Running 0 1h 10.244.18.2 k8s-master
kube-system po/kube-proxy-dkmd4 1/1 Running 0 1h 192.168.10.7 k8s-node1
kube-system po/kube-proxy-lbx6h 1/1 Running 0 1h 192.168.10.6 k8s-master
kube-system po/kube-proxy-lfw5w 1/1 Running 0 1h 192.168.10.8 k8s-node2
kube-system po/kube-scheduler-k8s-master 1/1 Running 1 1h 192.168.10.6 k8s-master
kube-system po/kubernetes-dashboard-2315583659-rjt21 1/1 Running 0 1h 10.244.3.2 k8s-node1
kube-system po/monitoring-grafana-241275065-n1wkt 1/1 Running 0 44m 10.244.96.2 k8s-node2
kube-system po/monitoring-influxdb-2075516717-td73m 1/1 Running 0 44m 10.244.96.3 k8s-node2
monitoring po/alertmanager-1970416631-v460d 1/1 Running 0 7m 10.244.96.6 k8s-node2
monitoring po/grafana-core-2777256714-lvx7p 1/1 Running 0 7m 10.244.3.7 k8s-node1
monitoring po/kube-state-metrics-2949788559-1ks16 1/1 Running 0 7m 10.244.96.8 k8s-node2
monitoring po/kube-state-metrics-2949788559-2bx02 1/1 Running 0 7m 10.244.3.9 k8s-node1
monitoring po/node-directory-size-metrics-9722j 2/2 Running 0 7m 10.244.3.8 k8s-node1
monitoring po/node-directory-size-metrics-z1sqn 2/2 Running 0 7m 10.244.96.7 k8s-node2
monitoring po/prometheus-core-466509865-3qt1c 1/1 Running 0 7m 10.244.96.9 k8s-node2
monitoring po/prometheus-node-exporter-7l4bf 1/1 Running 0 7m 192.168.10.7 k8s-node1
monitoring po/prometheus-node-exporter-b9v2w 1/1 Running 0 7m 192.168.10.8 k8s-node2

NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default svc/kubernetes 10.96.0.1 <none> 443/TCP 1h <none>
kube-system svc/elasticsearch-logging 10.110.232.165 <none> 9200/TCP 37m k8s-app=elasticsearch-logging
kube-system svc/heapster 10.99.36.89 <none> 80/TCP 44m k8s-app=heapster
kube-system svc/kibana-logging 10.101.178.124 <none> 5601/TCP 36m k8s-app=kibana-logging
kube-system svc/kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 1h k8s-app=kube-dns
kube-system svc/kubernetes-dashboard 10.99.64.238 <none> 80/TCP 1h k8s-app=kubernetes-dashboard
kube-system svc/monitoring-grafana 10.99.186.236 <nodes> 80:30015/TCP 44m k8s-app=grafana
kube-system svc/monitoring-influxdb 10.98.97.215 192.168.10.6 8083/TCP,8086/TCP 44m k8s-app=influxdb
monitoring svc/alertmanager 10.101.35.35 <nodes> 9093:32099/TCP 7m app=alertmanager
monitoring svc/grafana 10.97.63.144 <nodes> 3000:31989/TCP 7m app=grafana,component=core
monitoring svc/kube-state-metrics 10.101.180.83 <none> 8080/TCP 7m app=kube-state-metrics
monitoring svc/prometheus 10.110.91.88 <nodes> 9090:30059/TCP 7m app=prometheus,component=core
monitoring svc/prometheus-node-exporter None <none> 9100/TCP 7m app=prometheus,component=node-exporter

NAMESPACE NAME ENDPOINTS AGE
default ep/kubernetes 192.168.10.6:6443 1h
kube-system ep/elasticsearch-logging 10.244.3.5:9200,10.244.96.4:9200 37m
kube-system ep/heapster 10.244.3.3:8082 44m
kube-system ep/kibana-logging 10.244.3.4:5601 36m
kube-system ep/kube-controller-manager <none> 39m
kube-system ep/kube-dns 10.244.18.2:53,10.244.18.2:53 1h
kube-system ep/kube-scheduler <none> 39m
kube-system ep/kubernetes-dashboard 10.244.3.2:9090 1h
kube-system ep/monitoring-grafana 10.244.96.2:3000 44m
kube-system ep/monitoring-influxdb 10.244.96.3:8086,10.244.96.3:8083 44m
monitoring ep/alertmanager 10.244.96.6:9093 7m
monitoring ep/grafana 10.244.3.7:3000 7m
monitoring ep/kube-state-metrics 10.244.3.9:8080,10.244.96.8:8080 7m
monitoring ep/prometheus 10.244.96.9:9090 7m
monitoring ep/prometheus-node-exporter 192.168.10.7:9100,192.168.10.8:9100 7m

References

http://zerosre.com/2017/05/11/k8s%E6%96%B0%E7%89%88%E6%9C%AC%E5%AE%89%E8%A3%85/
http://www.tongtongxue.com/archives/16338.html
http://tonybai.com/2017/07/20/fix-cannot-access-dashboard-in-k8s-1-6-4/
http://blog.frognew.com/2017/04/install-ha-kubernetes-1.6-cluster.html
http://blog.frognew.com/2017/04/kubeadm-install-kubernetes-1.6.html
http://blog.frognew.com/2017/07/kubeadm-install-kubernetes-1.7.html
https://cloudnil.com/2017/07/10/Deploy-kubernetes1.6.7-with-kubeadm/
https://www.centos.bz/2017/05/centos-7-kubeadm-install-k8s-kubernetes/
http://leonlibraries.github.io/2017/06/15/Kubeadm%E6%90%AD%E5%BB%BAKubernetes%E9%9B%86%E7%BE%A4/
http://www.jianshu.com/p/60069089c981
https://github.com/opsnull/follow-me-install-kubernetes-cluster/blob/master/10-%E9%83%A8%E7%BD%B2Heapster%E6%8F%92%E4%BB%B6.md
http://jimmysong.io/blogs/kubernetes-installation-on-centos/
http://jimmysong.io/blogs/kubernetes-ha-master-installation/
http://c.isme.pub/2016/11/22/docker-kubernetes/