Deployment environment
IP configuration of the two nodes:
# cat /etc/hosts
192.168.1.5 vmnote0
192.168.1.12 vmnote1
Deployment guide:
1. etcd cluster fails to start
etcd configuration on the first of the two nodes:
[root@k8s-master ~]# cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/local/bin/etcd \
  --name vmnode0 \
  --cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --peer-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls https://192.168.1.5:2380 \
  --listen-peer-urls https://192.168.1.5:2380 \
  --listen-client-urls https://192.168.1.5:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.1.5:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster vmnode0=https://192.168.1.5:2380,vmnode1=https://192.168.1.12:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
etcd configuration on the second node:
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/usr/local/bin/etcd \
  --name vmnode1 \
  --cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --peer-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls https://192.168.1.12:2380 \
  --listen-peer-urls https://192.168.1.12:2380 \
  --listen-client-urls https://192.168.1.12:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.1.12:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster vmnode0=https://192.168.1.5:2380,vmnode1=https://192.168.1.12:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Problem:
Dec 14 12:30:30 k8s-node2.localdomain etcd[2560]: warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
Dec 14 12:30:30 k8s-node2.localdomain etcd[2560]: member 218d8bfb33a29c6 has already been bootstrapped
Dec 14 12:30:30 k8s-node2.localdomain systemd[1]: etcd.service: main process exited, code=exited, status=1/FAILURE
Dec 14 12:30:30 k8s-node2.localdomain systemd[1]: Failed to start Etcd Server.
Solution
The key message is "member 218d8bfb33a29c6 has already been bootstrapped". The cause:
One of the member was bootstrapped via discovery service. You must remove the previous data-dir to clean up the member information. Or the member will ignore the new configuration and start with the old configuration. That is why you see the mismatch.
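In practice this means clearing the stale data directory (/var/lib/etcd in the units above) on the affected node before starting etcd again. The following is a minimal sketch rather than an official procedure: the function name reset_data_dir is made up for illustration, and it moves the directory aside instead of deleting it so that the old member data can still be inspected. It discards the node's etcd state, so it is only appropriate for a cluster that is being bootstrapped from scratch.

```shell
# Move a stale etcd data directory aside and recreate it empty.
# reset_data_dir is a hypothetical helper, not part of etcd.
reset_data_dir() {
    dir=${1%/}                                  # drop a trailing slash, if any
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"     # timestamped backup name
    mv "$dir" "$backup" || return 1
    mkdir -p "$dir" || return 1                 # WorkingDirectory must exist again
    chmod 700 "$dir"                            # assumption: keep the directory private
    echo "$backup"                              # print where the old data went
}
```

A plausible sequence on each node would be: systemctl stop etcd, then reset_data_dir /var/lib/etcd, then systemctl start etcd.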
2. etcd health check fails
[root@k8s-master ~]# etcdctl \
> --ca-file=/etc/kubernetes/ssl/ca.pem \
> --cert-file=/etc/kubernetes/ssl/kubernetes.pem \
> --key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
> cluster-health
2018-12-14 13:13:15.280712 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
2018-12-14 13:13:15.281964 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
failed to check the health of member 218d8bfb33a29c6 on https://192.168.1.12:2379: Get https://192.168.1.12:2379/health: net/http: TLS handshake timeout
member 218d8bfb33a29c6 is unreachable: [https://192.168.1.12:2379] are all unreachable
failed to check the health of member 499bc1bc6765950c on https://192.168.1.5:2379: Get https://192.168.1.5:2379/health: net/http: TLS handshake timeout
member 499bc1bc6765950c is unreachable: [https://192.168.1.5:2379] are all unreachable
cluster is unhealthy
Solution
The key error is "https://192.168.1.12:2379/health: net/http: TLS handshake timeout": the request timed out while waiting for the TLS handshake. This usually happens because http_proxy/https_proxy are set (here, for a proxy used to reach blocked sites), so requests to the host itself and to the internal network are sent to the proxy and never arrive. The fix is to configure no_proxy:
PROXY_HOST=127.0.0.1
export all_proxy=http://$PROXY_HOST:8118
export ftp_proxy=http://$PROXY_HOST:8118
export http_proxy=http://$PROXY_HOST:8118
export https_proxy=http://$PROXY_HOST:8118
export no_proxy='localhost,192.168.1.5,192.168.1.12'
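Before retrying etcdctl, it can help to confirm that each node address is really covered by no_proxy in the current shell. The check below is a minimal sketch: the function name in_no_proxy is made up for illustration, and it only does exact matching on the comma-separated list, whereas real clients differ in how they interpret no_proxy (domain suffixes, ports, address ranges).

```shell
# Return 0 if the given host appears as an exact entry in $no_proxy.
# in_no_proxy is a hypothetical helper; real clients may match more loosely.
in_no_proxy() {
    case ",${no_proxy:-}," in
        *,"$1",*) return 0 ;;   # found as a whole comma-separated entry
        *)        return 1 ;;
    esac
}

# Report how each etcd endpoint from this setup would be treated.
for host in 192.168.1.5 192.168.1.12; do
    if in_no_proxy "$host"; then
        echo "$host: listed in no_proxy"
    else
        echo "$host: NOT listed in no_proxy, requests may go through the proxy"
    fi
done
```

Wrapping the list in commas on both sides is what keeps 192.168.1.1 from matching 192.168.1.12.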
3. kube-apiserver fails to start
[root@k8s-master kubernetes]# systemctl status kube-apiserver
● kube-apiserver.service - Kubernetes API Service
   Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Sat 2018-12-15 03:07:12 UTC; 4s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 3659 ExecStart=/usr/local/bin/kube-apiserver $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBE_ETCD_SERVERS $KUBE_API_ADDRESS $KUBE_API_PORT $KUBELET_PORT $KUBE_ALLOW_PRIV $KUBE_SERVICE_ADDRESSES $KUBE_ADMISSION_CONTROL $KUBE_API_ARGS (code=exited, status=1/FAILURE)
 Main PID: 3659 (code=exited, status=1/FAILURE)
Dec 15 03:07:11 k8s-master.localdomain systemd[1]: kube-apiserver.service: main process exited, code=exited, status=1/FAILURE
Dec 15 03:07:11 k8s-master.localdomain systemd[1]: Failed to start Kubernetes API Service.
Dec 15 03:07:11 k8s-master.localdomain systemd[1]: Unit kube-apiserver.service entered failed state.
Dec 15 03:07:11 k8s-master.localdomain systemd[1]: kube-apiserver.service failed.
Dec 15 03:07:12 k8s-master.localdomain systemd[1]: kube-apiserver.service holdoff time over, scheduling restart.
Dec 15 03:07:12 k8s-master.localdomain systemd[1]: start request repeated too quickly for kube-apiserver.service
Dec 15 03:07:12 k8s-master.localdomain systemd[1]: Failed to start Kubernetes API Service.
Dec 15 03:07:12 k8s-master.localdomain systemd[1]: Unit kube-apiserver.service entered failed state.
Dec 15 03:07:12 k8s-master.localdomain systemd[1]: kube-apiserver.service failed.
Nothing useful shows up here, so go back and look at the systemd unit for the service:
[Unit]
Description=Kubernetes API Service
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
After=etcd.service
[Service]
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/apiserver
ExecStart=/usr/local/bin/kube-apiserver \
  $KUBE_LOGTOSTDERR \
  $KUBE_LOG_LEVEL \
  $KUBE_ETCD_SERVERS \
  $KUBE_API_ADDRESS \
  $KUBE_API_PORT \
  $KUBELET_PORT \
  $KUBE_ALLOW_PRIV \
  $KUBE_SERVICE_ADDRESSES \
  $KUBE_ADMISSION_CONTROL \
  $KUBE_API_ARGS
Restart=on-failure
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Assemble the ExecStart command line by hand and run it directly:
[root@k8s-master ~]# /usr/local/bin/kube-apiserver --logtostderr=true --v=0 --advertise-address=192.168.1.5 --bind-address=192.168.1.5 --insecure-bind-address=192.168.1.5 --insecure-port=8080 --insecure-bind-address=127.0.0.1 --etcd-servers=https://192.168.1.5:2379,https://192.168.1.12:2379 --port=8080 --kubelet-port=10250 --allow-privileged=true --service-cluster-ip-range=10.254.0.0/16 --admission-control=ServiceAccount,NamespaceLifecycle,NamespaceExists,LimitRanger,ResourceQuota --authorization-mode=RBAC --runtime-config=rbac.authorization.k8s.io/v1beta1 --kubelet-https=true --experimental-bootstrap-token-auth --token-auth-file=/etc/kubernetes/token.csv --service-node-port-range=30000-32767 --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem --client-ca-file=/etc/kubernetes/ssl/ca.pem --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem --etcd-cafile=/etc/kubernetes/ssl/ca.pem --etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem --etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem --enable-swagger-ui=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/lib/audit.log --event-ttl=1h
/usr/local/bin/kube-apiserver: line 30: /lib/lsb/init-functions: No such file or directory
Now the real problem shows up: "/usr/local/bin/kube-apiserver: line 30: /lib/lsb/init-functions: No such file or directory".
Install the redhat-lsb package:
yum install redhat-lsb -y
Running the command again still fails:
[root@k8s-master ~]# /usr/local/bin/kube-apiserver --logtostderr=true --v=0 --advertise-address=192.168.1.5 --bind-address=192.168.1.5 --insecure-bind-address=192.168.1.5 --insecure-port=8080 --insecure-bind-address=127.0.0.1 --etcd-servers=https://192.168.1.5:2379,https://192.168.1.12:2379 --port=8080 --kubelet-port=10250 --allow-privileged=true --service-cluster-ip-range=10.254.0.0/16 --admission-control=ServiceAccount,NamespaceLifecycle,NamespaceExists,LimitRanger,ResourceQuota --authorization-mode=RBAC --runtime-config=rbac.authorization.k8s.io/v1beta1 --kubelet-https=true --experimental-bootstrap-token-auth --token-auth-file=/etc/kubernetes/token.csv --service-node-port-range=30000-32767 --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem --client-ca-file=/etc/kubernetes/ssl/ca.pem --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem --etcd-cafile=/etc/kubernetes/ssl/ca.pem --etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem --etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem --enable-swagger-ui=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/lib/audit.log --event-ttl=1h
/opt/bin/kube-apiserver not present or not executable [FAILED]
Still failing, and at this point I was going around in circles. It turned out that I had pulled an Ubuntu kube-apiserver executable onto CentOS, so many of the things it depends on do not exist. Both error messages (a line number inside the file, and a lookup of /opt/bin/kube-apiserver) suggest that the file was a shell init script rather than the server binary itself. In the end I decided to download the executables again.
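A quick look at the first bytes of the file would have exposed this much earlier. The sketch below is only an illustration (the function name file_kind is made up): it classifies a file as an ELF binary or a "#!" script by its magic bytes, which is a rough stand-in for what the file(1) command reports.

```shell
# Classify a file by its first bytes: ELF binary, "#!" script, or unknown.
# file_kind is a hypothetical helper for illustration.
file_kind() {
    if [ ! -r "$1" ]; then
        echo "missing"
        return 1
    fi
    # First four bytes as lowercase hex, e.g. "7f454c46" for ELF.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        7f454c46) echo "elf" ;;      # 0x7f 'E' 'L' 'F'
        2321*)    echo "script" ;;   # '#' '!'
        *)        echo "unknown" ;;
    esac
}
```

Running file_kind /usr/local/bin/kube-apiserver should print elf for a real server binary; a result of script would point to a distribution wrapper like the one encountered here.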
Solution
$ wget https://dl.k8s.io/v1.6.0/kubernetes-server-linux-amd64.tar.gz
$ tar -xzvf kubernetes-server-linux-amd64.tar.gz
$ cd kubernetes
$ tar -xzvf kubernetes-src.tar.gz
$ cp -r server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl,kube-proxy,kubelet} /usr/local/bin/
4. kubelet fails to start
[root@k8s-master opt]# systemctl status kubelet
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Sat 2018-12-15 08:09:53 UTC; 1s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 12585 ExecStart=/usr/local/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBELET_API_SERVER $KUBELET_ADDRESS $KUBELET_PORT $KUBELET_HOSTNAME $KUBE_ALLOW_PRIV $KUBELET_POD_INFRA_CONTAINER $KUBELET_ARGS (code=exited, status=200/CHDIR)
 Main PID: 12585 (code=exited, status=200/CHDIR)
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: kubelet.service failed.
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: start request repeated too quickly for kubelet.service
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: Failed to start Kubernetes Kubelet Server.
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:09:53 k8s-master.localdomain systemd[1]: kubelet.service failed.
Watching the log output:
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: Started Kubernetes Kubelet Server.
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=200/CHDIR
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: kubelet.service failed.
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: Started Kubernetes Kubelet Server.
Dec 15 08:25:30 k8s-master.localdomain systemd[13491]: Failed at step CHDIR spawning /usr/local/bin/kubelet: No such file or directory
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=200/CHDIR
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:25:30 k8s-master.localdomain systemd[1]: kubelet.service failed.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Started Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=200/CHDIR
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service failed.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Started Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=200/CHDIR
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service failed.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Started Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=200/CHDIR
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service failed.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Stopped Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: start request repeated too quickly for kubelet.service
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Failed to start Kubernetes Kubelet Server.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: Unit kubelet.service entered failed state.
Dec 15 08:25:31 k8s-master.localdomain systemd[1]: kubelet.service failed.
The line "Failed at step CHDIR spawning /usr/local/bin/kubelet: No such file or directory" looks very suspicious, so check whether that file path is correct. Oddly, the path is correct, so keep reading the systemd unit file:
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/local/bin/kubelet \
  $KUBE_LOGTOSTDERR \
  $KUBE_LOG_LEVEL \
  $KUBELET_API_SERVER \
  $KUBELET_ADDRESS \
  $KUBELET_PORT \
  $KUBELET_HOSTNAME \
  $KUBE_ALLOW_PRIV \
  $KUBELET_POD_INFRA_CONTAINER \
  $KUBELET_ARGS
Restart=on-failure
[Install]
WantedBy=multi-user.target
The unit references one more path: WorkingDirectory=/var/lib/kubelet. The CHDIR step is where systemd changes into the working directory before starting the process, and a quick check shows that this directory indeed does not exist.
Solution
mkdir /var/lib/kubelet
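Because status=200/CHDIR names the executable rather than the missing directory, it may be worth checking the WorkingDirectory of a unit before starting it. The helpers below are a minimal sketch with made-up names (unit_workdir, check_workdir); they read the unit file directly with sed, accept an optional leading "-" on the value, and do not handle drop-in overrides or special values such as "~".

```shell
# Print the WorkingDirectory value of a systemd unit file (first match only).
# unit_workdir and check_workdir are hypothetical helpers for illustration.
unit_workdir() {
    # Strip the key and an optional leading "-" from the value.
    sed -n 's/^WorkingDirectory=-\{0,1\}//p' "$1" | head -n 1
}

# Report whether the unit's working directory exists; fail if it is missing.
check_workdir() {
    dir=$(unit_workdir "$1")
    if [ -z "$dir" ]; then
        echo "no WorkingDirectory set"
    elif [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

For the unit above, check_workdir /usr/lib/systemd/system/kubelet.service would be expected to report the directory as missing until the mkdir has been run.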
5. No nodes are listed after approving the CSRs
[root@k8s-master ~]# kubectl get csr
NAME        AGE       REQUESTOR           CONDITION
csr-bv37w   19m       kubelet-bootstrap   Approved
csr-bwlxd   1m        kubelet-bootstrap   Approved
csr-f8w38   33m       kubelet-bootstrap   Approved
csr-g8927   47m       kubelet-bootstrap   Approved
csr-h7wph   4m        kubelet-bootstrap   Approved
csr-hpl81   50m       kubelet-bootstrap   Approved
csr-qxxsh   40m       kubelet-bootstrap   Approved
csr-r7vzl   51m       kubelet-bootstrap   Approved
csr-w1ccb   21m       kubelet-bootstrap   Approved
[root@k8s-master ~]# kubectl get nodes
No resources found.
Check the logs:
[root@k8s-master ~]# journalctl -xe -u kube* | grep error
Dec 15 08:50:15 k8s-master.localdomain kube-scheduler[14917]: E1215 08:50:15.200937 14917 leaderelection.go:229] error retrieving resource lock kube-system/kube-scheduler: Get http://192.168.1.5:8080/api/v1/namespaces/kube-system/endpoints/kube-scheduler: dial tcp 192.168.1.5:8080: getsockopt: connection refused
The log shows kube-scheduler being refused when it connects to kube-apiserver at 192.168.1.5:8080; any other component that uses this address, such as kube-controller-manager, would fail the same way. So check which address the port is listening on:
[root@k8s-master ~]# netstat -lpntu
...
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN 14885/kube-apiserve
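Port 8080 is bound to 127.0.0.1 rather than 192.168.1.5. Perhaps the configuration file is wrong? Take a look:
KUBE_API_ADDRESS="--advertise-address=192.168.1.5 --bind-address=192.168.1.5 --insecure-bind-address=192.168.1.5 --insecure-port=8080 --insecure-bind-address=127.0.0.1"
It is indeed bound to 127.0.0.1: --insecure-bind-address appears twice, and the last value, 127.0.0.1, is the one that takes effect.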
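Repeated flags like this are easy to miss in a long argument string. As a rough aid, the sketch below prints the value from the last occurrence of a flag, on the assumption (consistent with the netstat output above) that the last occurrence is the effective one. The function name last_flag_value is made up for illustration, and it only understands the --flag=value form.

```shell
# Print the value of the last "--flag=value" occurrence among the arguments.
# last_flag_value is a hypothetical helper; it ignores the "--flag value" form.
last_flag_value() {
    flag=$1
    shift
    value=
    for arg in "$@"; do
        case "$arg" in
            "$flag"=*) value=${arg#"$flag"=} ;;   # later matches overwrite earlier ones
        esac
    done
    printf '%s\n' "$value"
}

# The value from /etc/kubernetes/apiserver shown in this article.
KUBE_API_ADDRESS="--advertise-address=192.168.1.5 --bind-address=192.168.1.5 --insecure-bind-address=192.168.1.5 --insecure-port=8080 --insecure-bind-address=127.0.0.1"

# Deliberately unquoted so the string is split into separate arguments.
last_flag_value --insecure-bind-address $KUBE_API_ADDRESS   # prints 127.0.0.1
```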
Solution
Edit the /etc/kubernetes/apiserver configuration file:
KUBE_API_ADDRESS="--advertise-address=192.168.1.5 --bind-address=192.168.1.5 --insecure-bind-address=192.168.1.5 --insecure-port=8080 --insecure-bind-address=192.168.1.5"
Restart the service, and the problem is solved.
[root@k8s-master kubernetes]# kubectl get csr
NAME        AGE       REQUESTOR           CONDITION
csr-96lj4   37s       kubelet-bootstrap   Approved,Issued
csr-bv37w   23m       kubelet-bootstrap   Approved,Issued
csr-bwlxd   5m        kubelet-bootstrap   Approved,Issued
csr-dpqgm   37s       kubelet-bootstrap   Approved,Issued
csr-f8w38   37m       kubelet-bootstrap   Approved,Issued
csr-g8927   52m       kubelet-bootstrap   Approved,Issued
csr-h7wph   9m        kubelet-bootstrap   Approved,Issued
csr-hpl81   55m       kubelet-bootstrap   Approved,Issued
csr-qxxsh   44m       kubelet-bootstrap   Approved,Issued
csr-r7vzl   56m       kubelet-bootstrap   Approved,Issued
csr-w1ccb   26m       kubelet-bootstrap   Approved,Issued
[root@k8s-master kubernetes]# kubectl get nodes
NAME           STATUS     AGE       VERSION
192.168.1.12   NotReady   10s       v1.6.0
192.168.1.5    Ready      7s        v1.6.0
6. Accessing the pod application fails
[root@k8s-master ~]# kubectl run nginx --replicas=2 --labels="run=load-balancer-example" --image=nginx --port=80
deployment "nginx" created
[root@k8s-master ~]# kubectl expose deployment nginx --type=NodePort --name=example-service
service "example-service" exposed
[root@k8s-master ~]# kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
nginx     2         2         2            0           27s
[root@k8s-master ~]# kubectl describe svc example-service
Name: example-service
Namespace: default
Labels: run=load-balancer-example
Annotations:
Selector: run=load-balancer-example
Type: NodePort
IP: 10.254.109.60
Port: 80/TCP
NodePort: 30019/TCP
Endpoints: 172.17.0.2:80
Session Affinity: None
Events:
[root@k8s-master ~]# curl "10.254.109.60:80"
^C
[root@k8s-master ~]# curl "172.17.0.2:80"
^C
Solution
My first guess was that this is the no_proxy problem again, so give that a try.
export no_proxy='localhost,192.168.1.5,192.168.1.12,10.254.109.60,172.17.0.2'
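Editing the list by hand each time a service or pod address changes is error-prone. One option is a small helper that appends entries only when they are not already present. This is a sketch with a made-up function name (add_no_proxy), not a standard tool; some clients also accept address ranges in no_proxy, but support varies by tool and version, so exact addresses are used here.

```shell
# Append hosts to $no_proxy, skipping entries that are already listed.
# add_no_proxy is a hypothetical helper for illustration.
add_no_proxy() {
    for host in "$@"; do
        case ",${no_proxy:-}," in
            *,"$host",*) ;;                                    # already present
            *) no_proxy=${no_proxy:+$no_proxy,}$host ;;        # append with a comma if needed
        esac
    done
    export no_proxy
}
```

With the earlier value of no_proxy in place, add_no_proxy 10.254.109.60 172.17.0.2 would produce the same list as the export line above.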
Try again and it works.
[root@k8s-master kubernetes]# source /etc/profile
[root@k8s-master kubernetes]# curl "10.254.109.60:80"
Welcome to nginx!
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.
Thank you for using nginx.
[root@k8s-master ~]# curl "172.17.0.2:80"
Welcome to nginx!
Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.
For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.
Thank you for using nginx.