Author: 山河已无恙 · 2023-01-10 09:31
Development Engineer, 浩鲸科技

Notes on scaling out, isolating, and recovering worker nodes in Kubernetes


Scaling Out

When the existing nodes can no longer support the workload, for example because multiple instances cause port conflicts or resource pressure leads to evictions, it is time to scale out and add worker nodes to the cluster.

Adding a new node to a Kubernetes cluster with kubeadm looks much like the original node setup: on a worker node the only services running directly on the host are docker and the kubelet, while everything else, such as kube-proxy and the networking components, runs as containers.

The current environment is shown below; 192.168.26.156 was added earlier as the first scaling test.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl get nodes -o wide  
NAME                          STATUS   ROLES                  AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME  
vms156.liruilongs.github.io   Ready    <none>                 35m    v1.22.2   192.168.26.156   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms81.liruilongs.github.io    Ready    control-plane,master   324d   v1.22.2   192.168.26.81    <none>        CentOS Linux 7 (Core)   3.10.0-1160.76.1.el7.x86_64   docker://20.10.9
vms82.liruilongs.github.io    Ready    <none>                 324d   v1.22.2   192.168.26.82    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
vms83.liruilongs.github.io    Ready    <none>                 324d   v1.22.2   192.168.26.83    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
┌──[root@vms81.liruilongs.github.io]-[~]  
└─$

Now the machine 192.168.26.155 is to be added to this cluster, which takes the following steps.

Set up passwordless SSH first. This is not strictly required, but Ansible is used for the rest of the setup, so it is configured here:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$ssh-copy-id root@192.168.26.155
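
Since Ansible drives the rest of the setup, it can be worth confirming connectivity right away. A minimal check, assuming the node-host inventory shown further below:

ansible all -m ping -i node-host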

Then run a few initialization steps on the node through Ansible:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$cat init_k8s_node.yaml  
- name: init k8s
  hosts: 192.168.26.155
  tasks:
    # disable the firewall (set the default zone to trusted)
    - shell: firewall-cmd --set-default-zone=trusted
    # disable SELinux
    - shell: getenforce
      register: out
    - debug: msg="{{out}}"
    - shell: setenforce 0
      when: out.stdout != "Disabled"
    - replace:
        path: /etc/selinux/config
        regexp: "SELINUX=enforcing"
        replace: "SELINUX=disabled"
    - shell: cat /etc/selinux/config
      register: out
    - debug: msg="{{out}}"
    # disable swap
    - shell: swapoff -a
    - shell: sed -i '/swap/d' /etc/fstab
    - shell: cat /etc/fstab
      register: out
    - debug: msg="{{out}}"
    # install docker-ce
    - yum:
        name: docker-ce
        state: present
    # configure the Docker registry mirror
    - shell: mkdir /etc/docker
    - copy:
        src: ./daemon.json
        dest: /etc/docker/daemon.json
    - shell: systemctl daemon-reload
    - shell: systemctl restart docker
    # kernel parameters to be adjusted
    - copy:
        src: ./k8s.conf
        dest: /etc/sysctl.d/k8s.conf
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$

A couple of configuration files copied by the playbook above deserve a closer look.

The kernel parameters to be set:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$cat k8s.conf  
net.bridge.bridge-nf-call-ip6tables = 1  
net.bridge.bridge-nf-call-iptables = 1  
net.ipv4.ip_forward = 1
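
The two bridge settings only take effect when the br_netfilter module is loaded, and a file dropped into /etc/sysctl.d/ is picked up at boot or on an explicit reload. A minimal sketch of applying them immediately on the new node (not part of the original playbook):

modprobe br_netfilter
sysctl --system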

The registry mirror configuration:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$cat daemon.json  
{  
  "registry-mirrors": ["https://2tefyfv7.mirror.aliyuncs.com"]  
}  
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$

The inventory file; if you are scaling out more than one node, list all of them here:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$cat node-host  
192.168.26.155
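
For illustration only, an inventory covering several new nodes would simply list more addresses (the extra entries here are hypothetical):

192.168.26.155
192.168.26.157
192.168.26.158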

Then just run the playbook:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$ansible-playbook init_k8s_node.yaml -i node-host

After the initialization completes, configure the Kubernetes yum repository on the new node; this, too, could be put into the playbook.

┌──[root@vms155.liruilongs.github.io]-[/etc/yum.repos.d]  
└─$cat k8s.repo  
[kubernetes]  
name=Kubernetes  
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/  
enabled=1  
gpgcheck=0  
repo_gpgcheck=0  
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
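
As noted above, this could also live in the playbook. A sketch of a task that pushes the same repo file out, assuming k8s.repo sits next to the playbook:

    # hypothetical task: distribute the Kubernetes yum repo definition
    - copy:
        src: ./k8s.repo
        dest: /etc/yum.repos.d/k8s.repo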

Install the required packages. The versions must match the cluster exactly; this step could also go into the playbook:

┌──[root@vms156.liruilongs.github.io]-[/etc/yum.repos.d]  
└─$yum install -y kubelet-1.22.2-0 kubeadm-1.22.2-0 kubectl-1.22.2-0 --disableexcludes=kubernetes
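
To fold this step into the playbook as well, a sketch using the yum module with the versions pinned to the cluster's v1.22.2 (disable_excludes assumes a reasonably recent Ansible):

    # hypothetical task: install kubelet/kubeadm/kubectl matching the cluster version
    - yum:
        name:
          - kubelet-1.22.2-0
          - kubeadm-1.22.2-0
          - kubectl-1.22.2-0
        state: present
        disable_excludes: kubernetes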

Docker's cgroup driver needs to be switched to systemd, otherwise the kubelet fails with the error below (this change could also be written into the playbook):

err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$ansible all -m shell -a "sed  -i '3i ,\\"exec-opts\\": [\\"native.cgroupdriver=systemd\\"]' /etc/docker/daemon.json" -i node-host  

192.168.26.155 | CHANGED | rc=0 >>
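
After that ad-hoc edit, /etc/docker/daemon.json on the new node should be roughly equivalent to the following (reconstructed here, not captured from the machine):

{
  "registry-mirrors": ["https://2tefyfv7.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}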

Restart Docker (a daemon-reload followed by start works just as well):

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$ansible all -m shell -a 'systemctl restart docker' -i node-host  
192.168.26.155 | CHANGED | rc=0 >>

Before joining the node, create a token on the master:

┌──[root@vms81.liruilongs.github.io]-[~]  
└─$kubeadm token create --print-join-command  
kubeadm join 192.168.26.81:6443 --token vmya1o.xprnhn8ub6wzzb2e --discovery-token-ca-cert-hash sha256:2e17952177d9c633254e6941849885fc8e0e16dde805425effa22ed04415e7d4
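
Tokens created this way expire after 24 hours by default, so generate them shortly before use; existing tokens can be listed on the control plane with:

kubeadm token list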

Copy the printed command and run it on the node being added. kubeadm generates the configuration files the kubelet needs on the new node and registers the node with the master; if you skip this step and start the kubelet directly, it will not come up.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$ansible all -m shell -a 'kubeadm join 192.168.26.81:6443 --token vmya1o.xprnhn8ub6wzzb2e --discovery-token-ca-cert-hash sha256:2e17952177d9c633254e6941849885fc8e0e16dde805425effa22ed04415e7d4' -i node-host  

192.168.26.155 | CHANGED | rc=0 >>  
[preflight] Running pre-flight checks  
[preflight] Reading configuration from the cluster...  
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'  
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"  
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"  
[kubelet-start] Starting the kubelet  
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...  

This node has joined the cluster:  
* Certificate signing request was sent to apiserver and a response was received.  
* The Kubelet was informed of the new secure connection details.  

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

        [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'

Handle the warnings as the output suggests:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$ansible all -m shell -a 'systemctl enable docker.service --now;systemctl enable kubelet.service --now' -i node-host  
192.168.26.155 | CHANGED | rc=0 >>  
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.  
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.  
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$

Check the node status on the master: vms155.liruilongs.github.io is now Ready.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl get nodes -o wide  
NAME                          STATUS   ROLES                  AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME  
vms155.liruilongs.github.io   Ready    <none>                 7m19s   v1.22.2   192.168.26.155   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms156.liruilongs.github.io   Ready    <none>                 66m     v1.22.2   192.168.26.156   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms81.liruilongs.github.io    Ready    control-plane,master   324d    v1.22.2   192.168.26.81    <none>        CentOS Linux 7 (Core)   3.10.0-1160.76.1.el7.x86_64   docker://20.10.9
vms82.liruilongs.github.io    Ready    <none>                 324d    v1.22.2   192.168.26.82    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
vms83.liruilongs.github.io    Ready    <none>                 324d    v1.22.2   192.168.26.83    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$

You can compare the new node against an existing one to confirm that the same DaemonSet-managed pods are running on both:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl get pods -A  -owide  | grep '192.168.26.83'  
kube-system                calico-node-fv458                                     1/1     Running            91 (2d23h ago)    324d   192.168.26.83    vms83.liruilongs.github.io    <none>           <none>
kube-system                kube-proxy-xccmp                                      1/1     Running            23 (2d23h ago)    324d   192.168.26.83    vms83.liruilongs.github.io    <none>           <none>
metallb-system             speaker-bbl94                                         1/1     Running            66 (2d23h ago)    315d   192.168.26.83    vms83.liruilongs.github.io    <none>           <none>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl get pods -A  -owide  | grep '192.168.26.155'  
kube-system                calico-node-vxpxt                                     1/1     Running            0                 117m   192.168.26.155   vms155.liruilongs.github.io   <none>           <none>
kube-system                kube-proxy-htg7t                                      1/1     Running            0                 117m   192.168.26.155   vms155.liruilongs.github.io   <none>           <none>
metallb-system             speaker-6mwfj                                         0/1     CrashLoopBackOff   27 (3m10s ago)    117m   192.168.26.155   vms155.liruilongs.github.io   <none>           <none>
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$
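
Another quick sanity check is to confirm that each DaemonSet reports the expected number of ready pods after the node joins:

kubectl get daemonsets -A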

Isolation and Recovery

Sometimes a node needs to be taken offline, for example for hardware maintenance. In that case the node has to be isolated first.

Isolation in Kubernetes is done with drain: once a node is drained it no longer receives new pods, and the pods already running on it are evicted to other nodes. DaemonSet pods are the exception; evicting them would be pointless, because the DaemonSet controller would simply recreate them on the same node.

A drain combines two actions: cordon, which marks the node unschedulable, and eviction of every pod currently running on it.

The --ignore-daemonsets flag tells kubectl to skip DaemonSet-managed pods.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl drain vms155.liruilongs.github.io --ignore-daemonsets  
node/vms155.liruilongs.github.io cordoned  
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-vxpxt, kube-system/kube-proxy-htg7t, metallb-system/speaker-6mwfj  
node/vms155.liruilongs.github.io drained
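
If you only want to stop new pods from being scheduled, without evicting anything yet, a cordon on its own is enough; the drain can follow later:

kubectl cordon vms155.liruilongs.github.io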

Check the node status: it now shows SchedulingDisabled, meaning the node has been drained.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl get nodes -o wide  
NAME                          STATUS                     ROLES                  AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME  
vms155.liruilongs.github.io   Ready,SchedulingDisabled   <none>                 19m    v1.22.2   192.168.26.155   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms156.liruilongs.github.io   Ready                      <none>                 78m    v1.22.2   192.168.26.156   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms81.liruilongs.github.io    Ready                      control-plane,master   324d   v1.22.2   192.168.26.81    <none>        CentOS Linux 7 (Core)   3.10.0-1160.76.1.el7.x86_64   docker://20.10.9
vms82.liruilongs.github.io    Ready                      <none>                 324d   v1.22.2   192.168.26.82    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
vms83.liruilongs.github.io    Ready                      <none>                 324d   v1.22.2   192.168.26.83    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9

Once the maintenance work is done, the node can be restored with uncordon.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl uncordon vms155.liruilongs.github.io  
node/vms155.liruilongs.github.io uncordoned  
┌──[root@vms81.liruilongs.github.io]-[~/ansible]  
└─$kubectl get nodes -o wide  
NAME                          STATUS   ROLES                  AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME  
vms155.liruilongs.github.io   Ready    <none>                 20m    v1.22.2   192.168.26.155   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms156.liruilongs.github.io   Ready    <none>                 79m    v1.22.2   192.168.26.156   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.21
vms81.liruilongs.github.io    Ready    control-plane,master   324d   v1.22.2   192.168.26.81    <none>        CentOS Linux 7 (Core)   3.10.0-1160.76.1.el7.x86_64   docker://20.10.9
vms82.liruilongs.github.io    Ready    <none>                 324d   v1.22.2   192.168.26.82    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
vms83.liruilongs.github.io    Ready    <none>                 324d   v1.22.2   192.168.26.83    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64         docker://20.10.9
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$
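
Note that uncordon only makes the node schedulable again; the pods evicted during the drain are not moved back automatically. To watch new workloads land on the node, pods can be filtered by node (spec.nodeName is a supported field selector):

kubectl get pods -A -o wide --field-selector spec.nodeName=vms155.liruilongs.github.io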
