[TOC]
Kubernetes 1.27 Installation Notes
Installation reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
Pre-installation Preparation
Cluster Planning
- OS: Ubuntu 22.04
root@k8s-master01:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy
- Installation tool: kubeadm
- Cluster layout: 3 control-plane (master) nodes + 2 worker nodes
- Container runtime: Docker
  - Kubernetes 1.24 removed dockershim from the main tree and handed its maintenance over to Docker/Mirantis, where it was renamed cri-dockerd. We therefore need to install both:
    - Docker
    - cri-dockerd
We build the high-availability cluster with stacked control planes (etcd runs on the same nodes as the other control-plane components):
Cluster topology diagram:
High-availability reference: https://developer.aliyun.com/article/853054
Prerequisites
- No two nodes may share a hostname, MAC address, or product_uuid. See the official documentation for details.
- Hostname verification
# Show the current hostname
hostname
# Change it (replace k8s-master with each node's own hostname, e.g. k8s-master01)
hostnamectl hostname k8s-master
- MAC address verification
Use ip link or ifconfig -a to read the MAC addresses of the network interfaces.
- product_uuid verification
sudo cat /sys/class/dmi/id/product_uuid
- Configure hostname mappings
Configuration file: /etc/hosts
192.168.0.18 k8s-master
192.168.0.19 k8s-master01
192.168.0.20 k8s-master02
192.168.0.21 k8s-master03
192.168.0.22 k8s-slave01
192.168.0.23 k8s-slave02
The 192.168.0.18 k8s-master entry is there for the high-availability setup (it maps to the virtual IP configured later).
- Disable the firewall
ufw disable
- Forward IPv4 and let iptables see bridged traffic
Run the following commands:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl parameters required by the setup; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply the sysctl parameters without rebooting
sudo sysctl --system
Confirm that the br_netfilter and overlay modules are loaded:
lsmod | grep br_netfilter
lsmod | grep overlay
Confirm that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables and net.ipv4.ip_forward sysctl variables are set to 1:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
- A compatible Linux host. The Kubernetes project provides generic instructions for Debian- and Red Hat-based distributions as well as distributions without a package manager.
- 2 GB or more of RAM per machine (less will leave little room for your applications).
- 2 or more CPU cores.
- Full network connectivity between all machines in the cluster (public or private network is fine).
- Certain ports must be open on the machines. See the official documentation for details; see also the port check sketch below.
nc 127.0.0.1 6443
No output means the port behaves as expected.
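As a complementary check, the sketch below scans the standard control-plane ports to make sure nothing is already listening on them before the installation (the port list follows the upstream kubeadm requirements):

```bash
# Control-plane ports: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet),
# 10257 (controller-manager), 10259 (scheduler). Nothing should be listening yet.
ss -tlnp | grep -E ':(6443|2379|2380|10250|10257|10259)\b' \
  && echo "WARNING: some required ports are already in use" \
  || echo "all required ports are free"
```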
- Disable swap. You must disable swap for the kubelet to work properly.
  - For example, sudo swapoff -a disables swap temporarily. To make the change persist across reboots, disable swap in /etc/fstab, systemd.swap, or wherever your system configures it.
# Disable swap
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
- Check the host's cgroup version
stat -fc %T /sys/fs/cgroup/
For cgroup v1 the output is tmpfs; for cgroup v2 it is cgroup2fs. cgroup v2 has the following requirements:
- The OS distribution enables cgroup v2
- Linux kernel 5.8 or newer
- The container runtime supports cgroup v2, for example:
  - containerd v1.4 and later
  - cri-o v1.20 and later
- The kubelet and the container runtime are configured to use the systemd cgroup driver
Reference: https://kubernetes.io/zh-cn/docs/concepts/architecture/cgroups/
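A small sketch to confirm the cgroup v2 prerequisites on each node (kernel version and cgroup filesystem type):

```bash
# Kernel must be 5.8 or newer for cgroup v2
uname -r
# Expect "cgroup2fs" on a cgroup v2 host, "tmpfs" on cgroup v1
stat -fc %T /sys/fs/cgroup/
```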
Start the Installation
1. Install the container runtime (run on all nodes)
https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#docker
Preparation
Update apt and install the required packages
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
Set up the apt repository
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Install Docker
Install Docker version 24.0.0
sudo apt-get update
VERSION_STRING=5:24.0.0-1~ubuntu.22.04~jammy
sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin -y
Verify that the installation succeeded
root@k8s-master01:~# docker version
Client: Docker Engine - Community
Version: 24.0.0
API version: 1.43
Go version: go1.20.4
Git commit: 98fdcd7
Built: Mon May 15 18:49:22 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.0
API version: 1.43 (minimum version 1.12)
Go version: go1.20.4
Git commit: 1331b8c
Built: Mon May 15 18:49:22 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Manually configure Docker to use the systemd cgroup driver
Edit the /etc/docker/daemon.json file, add the following content, then save and restart Docker:
cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://84bkfzte.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload && systemctl restart docker
Start Docker and enable it at boot
systemctl enable docker --now
Check that Docker's cgroup driver is systemd
root@k8s-master01:~# docker info | grep -i cgroup
Cgroup Driver: systemd
Cgroup Version: 2
cgroupns
Install cri-dockerd
Download cri-dockerd version 0.3.2:
https://github.com/Mirantis/cri-dockerd/releases/tag/v0.3.2
Install the deb package
sudo dpkg -i ./cri-dockerd_0.3.2.3-0.ubuntu-jammy_amd64.deb
Add the infra (pause) container image configuration
Point the pause image at a domestic (China) mirror, otherwise the kubelet cannot pull it and fails to start.
sudo sed -i 's|ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd://|ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9|' /usr/lib/systemd/system/cri-docker.service
After the change, the ExecStart line looks like this:
cat /usr/lib/systemd/system/cri-docker.service
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
Start cri-docker and enable it at boot
systemctl daemon-reload
systemctl enable cri-docker --now
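As a sanity check, you can verify that the service and its socket are active and that the CRI socket file exists (the socket path below is the one the deb package installs; adjust if yours differs):

```bash
# Both units should report "active"
systemctl is-active cri-docker.service cri-docker.socket
# This is the socket passed later via --cri-socket to kubeadm
ls -l /var/run/cri-dockerd.sock
```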
2. Install kubeadm, kubelet and kubectl (run on all nodes)
You need to install the following packages on every machine:
- kubeadm: the command used to bootstrap the cluster.
- kubelet: the component that runs on every node in the cluster and starts Pods and containers.
- kubectl: the command-line tool used to talk to the cluster.
- Update the apt package index, install the packages needed to use the Kubernetes apt repository, and configure the Aliyun apt source:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
- Update the apt package index, install kubelet, kubeadm and kubectl, and pin their versions:
sudo apt-get update
sudo apt-get install -y kubelet=1.27.1-00 kubeadm=1.27.1-00 kubectl=1.27.1-00
sudo apt-mark hold kubelet kubeadm kubectl
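If you are unsure which patch versions the Aliyun repository offers, you can list them before pinning (a sketch using the standard apt tooling):

```bash
# Show the kubeadm versions available in the configured apt repositories
apt-cache madison kubeadm | head -n 10
```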
- Check the installed versions
root@k8s-master01:~# kubectl version -o yaml
clientVersion:
buildDate: "2023-04-14T13:21:19Z"
compiler: gc
gitCommit: 4c9411232e10168d7b050c49a1b59f6df9d7ea4b
gitTreeState: clean
gitVersion: v1.27.1
goVersion: go1.20.3
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
root@k8s-master01:~# kubelet --version
Kubernetes v1.27.1
root@k8s-master01:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:20:04Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
root@k8s-master01:~#
Note: the kubelet now restarts every few seconds, because it is stuck in a crash loop waiting for kubeadm to tell it what to do (this is normal; it resolves once the control-plane node is initialized).
3. Configure the cgroup driver (run on all nodes)
Configure the kubelet cgroup driver
Warning:
You must make sure the container runtime and the kubelet use the same cgroup driver, otherwise the kubelet process will fail.
Since both use systemd here, create the file and write the following content:
root@k8s-master01:~# cat << EOF >> /etc/default/kubelet
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
EOF
root@k8s-master01:~# cat /etc/default/kubelet
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
root@k8s-master01:~# systemctl restart kubelet
4. Configure the load balancer (run on the control-plane nodes)
Notes
We use Keepalived and HAProxy to provide a highly available load balancer.
Install Keepalived (as a Kubernetes static Pod)
Create the /etc/keepalived/keepalived.conf configuration file (MASTER: the k8s-master01 node)
sudo mkdir -p /etc/keepalived
sudo cat << EOF > /etc/keepalived/keepalived.conf
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
# 指定验证KeepAlive是否存活脚本位置
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
! 指定MASTER 或者 BACKUP, 这里我们使用k8s-master01节点作为MASTER
state MASTER
! 网卡名称
interface enp0s3
! 指定router_id,集群该值必须相同,这里指定为:51
virtual_router_id 51
! 优先级: MASTER节点优先级要高于BACKUP节点,我们MASTER节点配置:101, BACKUP节点设置:100
priority 101
authentication {
auth_type PASS
! 验证密码: 集群该值必须相同,这里指定为:42
auth_pass 42
}
virtual_ipaddress {
! Virtual IP address: this is the address Keepalived exposes to the outside. It must be an IP that is not already in use on the cluster's network; here we use 192.168.0.18.
! The same address will be passed to kubeadm init via --control-plane-endpoint; the port is configured in HAProxy.
192.168.0.18
}
track_script {
check_apiserver
}
}
EOF
cat /etc/keepalived/keepalived.conf
Create the /etc/keepalived/keepalived.conf configuration file (BACKUP: the k8s-master02 and k8s-master03 nodes)
sudo mkdir -p /etc/keepalived
sudo cat << EOF > /etc/keepalived/keepalived.conf
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
# 指定验证KeepAlive是否存活脚本位置
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
! 指定MASTER 或者 BACKUP, 这里我们使用k8s-master01节点作为MASTER
state BACKUP
! 网卡名称
interface enp0s3
! 指定router_id,集群该值必须相同,这里指定为:51
virtual_router_id 51
! 优先级: MASTER节点优先级要高于BACKUP节点,我们MASTER节点配置:101, BACKUP节点设置:100
priority 100
authentication {
auth_type PASS
! 验证密码: 集群该值必须相同,这里指定为:42
auth_pass 42
}
virtual_ipaddress {
! Virtual IP address: this is the address Keepalived exposes to the outside. It must be an IP that is not already in use on the cluster's network; here we use 192.168.0.18.
! The same address will be passed to kubeadm init via --control-plane-endpoint; the port is configured in HAProxy.
192.168.0.18
}
track_script {
check_apiserver
}
}
EOF
cat /etc/keepalived/keepalived.conf
Create the health-check script (run on MASTER and BACKUPs: k8s-master01, k8s-master02, k8s-master03)
sudo mkdir -p /etc/keepalived
# Quote EOF so that $* and $1 below are written literally instead of being expanded by the shell
sudo cat << 'EOF' > /etc/keepalived/check_apiserver.sh
#!/bin/sh
# /etc/keepalived/check_apiserver.sh
errorExit() {
echo "*** $*" 1>&2
exit 1
}
curl --silent --max-time 2 --insecure https://localhost:8443/ -o /dev/null || errorExit "Error GET https://localhost:8443/"
if ip addr | grep -q 192.168.0.18; then
curl --silent --max-time 2 --insecure https://192.168.0.18:8443/ -o /dev/null || errorExit "Error GET https://192.168.0.18:8443/"
fi
EOF
cat /etc/keepalived/check_apiserver.sh
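Depending on how the keepalived image invokes the track_script, the file may need to be executable; making it so does no harm (a small precaution, not required by every image):

```bash
sudo chmod +x /etc/keepalived/check_apiserver.sh
```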
Create the Keepalived static Pod manifest (run on MASTER and BACKUPs: k8s-master01, k8s-master02, k8s-master03)
File: /etc/kubernetes/manifests/keepalived.yaml
sudo cat << EOF > /etc/kubernetes/manifests/keepalived.yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
name: keepalived
namespace: kube-system
spec:
containers:
- image: osixia/keepalived:2.0.17
name: keepalived
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
- NET_BROADCAST
- NET_RAW
volumeMounts:
- mountPath: /usr/local/etc/keepalived/keepalived.conf
name: config
- mountPath: /etc/keepalived/check_apiserver.sh
name: check
hostNetwork: true
volumes:
- hostPath:
path: /etc/keepalived/keepalived.conf
name: config
- hostPath:
path: /etc/keepalived/check_apiserver.sh
name: check
status: {}
EOF
cat /etc/kubernetes/manifests/keepalived.yaml
Install HAProxy (as a Kubernetes static Pod)
Note:
Since kubeadm init has not been run yet, the kubelet cannot start the static Pods; you will only see the effect after kubeadm init has completed.
Create the HAProxy configuration file (run on k8s-master01, k8s-master02, k8s-master03)
sudo mkdir -p /etc/haproxy
sudo cat << EOF > /etc/haproxy/haproxy.cfg
# /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log /dev/log local0
log /dev/log local1 notice
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 20s
timeout server 20s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the control plane nodes
#---------------------------------------------------------------------
frontend apiserver
# 指定负载均衡绑定的地址和端口,这里的端口需要和/etc/keepalived/check_apiserver.sh文件中监控的端口相同
bind *:8443
mode tcp
option tcplog
default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
# HAProxy负载均衡器代理的后端节点
server k8s-master01 192.168.0.19:6443 check
server k8s-master02 192.168.0.20:6443 check
server k8s-master03 192.168.0.21:6443 check
# [...]
EOF
cat /etc/haproxy/haproxy.cfg
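Before relying on the static Pod, you can syntax-check the configuration with the same image the Pod will use; this is a sketch (the mount path inside the container is arbitrary, and the entrypoint is overridden so only the check runs):

```bash
# haproxy -c only validates the configuration file and exits
sudo docker run --rm \
  -v /etc/haproxy/haproxy.cfg:/tmp/haproxy.cfg:ro \
  --entrypoint haproxy \
  haproxy:2.1.4 -c -f /tmp/haproxy.cfg
```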
Create the HAProxy static Pod manifest (run on k8s-master01, k8s-master02, k8s-master03)
sudo cat << EOF > /etc/kubernetes/manifests/haproxy.yaml
apiVersion: v1
kind: Pod
metadata:
name: haproxy
namespace: kube-system
spec:
containers:
- image: haproxy:2.1.4
name: haproxy
livenessProbe:
failureThreshold: 8
httpGet:
host: localhost
path: /healthz
# 指定HAProxy代理的端口,该端口必须和/etc/haproxy/haproxy.cfg配置的端口相同
port: 8443
scheme: HTTPS
volumeMounts:
- mountPath: /usr/local/etc/haproxy/haproxy.cfg
name: haproxyconf
readOnly: true
hostNetwork: true
volumes:
- hostPath:
path: /etc/haproxy/haproxy.cfg
type: FileOrCreate
name: haproxyconf
status: {}
EOF
cat /etc/kubernetes/manifests/haproxy.yaml
Notes on Keepalived and HAProxy
- We configured the load balancer on port 8443 instead of the default 6443 because Keepalived and HAProxy run on the control-plane nodes, where the Kubernetes api-server component already occupies port 6443; hence we use 8443 here.
- Keepalived and HAProxy together form a reliable architecture providing both high availability and load balancing. Below is a simplified diagram of the combination:
        +----------------------+
        |    Load Balancer     |
        |      (HAProxy)       |
        +----------------------+
             |            |
             |            |
    +--------+            +--------+
    |                              |
+--------------+          +--------------+
|  Backend 1   |          |  Backend 2   |
+--------------+          +--------------+
The architecture above contains the following components:
- Load Balancer (HAProxy): receives client requests and forwards them to the backend servers. HAProxy is a high-performance load balancer that distributes requests according to the configured balancing algorithm, providing load balancing and high availability.
- Backend 1 and Backend 2: the real backend servers that handle client requests. There can be any number of backends; they can be application servers, database servers, and so on.
- Keepalived: the component that provides high availability. It monitors the availability of the load-balancer node and fails over to a backup node when the primary fails, using a virtual IP address (VIP) for seamless failover.
Client requests first reach the Load Balancer (HAProxy), which picks a backend according to the balancing algorithm. If the load-balancer node fails, Keepalived detects it automatically and moves the primary's VIP to a backup node, keeping the service available.
- The entry point to HAProxy is the virtual IP (VIP) provided by Keepalived. Keepalived binds the VIP to the current primary node so that clients can reach the load balancer through it.
In this high-availability setup, clients never connect to a single load-balancer node directly; they send requests to the VIP, which Keepalived manages and binds to whichever node is currently primary. Clients therefore do not need to care about primary/backup switches.
When Keepalived detects a primary failure, it automatically migrates the VIP to a backup node, so clients keep using the same VIP without noticing the switch.
In short, the entry point to HAProxy is the VIP provided by Keepalived: clients connect to the VIP and the load balancer forwards their requests to the backend servers.
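Once kubeadm init has completed (next section), a quick way to confirm the VIP/HAProxy path is to check that the VIP is bound on the current MASTER and that the API server answers through port 8443 (a sketch; it will fail until the control plane is up):

```bash
# The VIP should appear on the MASTER node's interface
ip addr show enp0s3 | grep 192.168.0.18
# The API server health endpoint reached through Keepalived + HAProxy
curl -k https://192.168.0.18:8443/healthz
```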
5. Initialize the first master node (run on the control-plane node)
References
How kubeadm init works:
https://kubernetes.io/zh-cn/docs/reference/setup-tools/kubeadm/kubeadm-init/#custom-images
kubeadm config print shows the defaults used by kubeadm join and kubeadm init:
https://kubernetes.io/zh-cn/docs/reference/setup-tools/kubeadm/kubeadm-config/#cmd-config-print
View the default kubeadm init configuration (for reference)
The YAML printed below loses some formatting; for the correct format see:
https://kubernetes.io/docs/reference/config-api/kubeadm-config.v1beta3/
root@k8s-master01:~# kubeadm config print init-defaults | tee kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 1.2.3.4
bindPort: 6443
nodeRegistration:
criSocket: unix:///var/run/containerd/containerd.sock
imagePullPolicy: IfNotPresent
name: node
taints: null
---
apiServer:
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.27.0
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12
scheduler: {}
Download the images
Manually pull the images required by version 1.27.1
# Note: the EOF delimiter is quoted with single quotes; without them the $-expressions below would be expanded while writing the file, producing a broken script
sudo cat << 'EOF' > download_image.sh
#!/bin/bash
# Kubernetes 安装的版本
KUBERNETES_VERSION=$(kubeadm version | grep -oP 'GitVersion:"v\K[^"]+')
# 阿里Kubernetes官方镜像库
AILI_KUBERNETES_REGISTRY="registry.cn-hangzhou.aliyuncs.com/google_containers"
echo "KUBERNETES_VERSION => ${KUBERNETES_VERSION}"
echo "AILI_KUBERNETES_REGISTRY => ${AILI_KUBERNETES_REGISTRY}"
# 下载并重命名镜像
function download_and_tag_image() {
# 官方镜像全称: registry.k8s.io/xxx/xxx:xxx
# 比如: registry.k8s.io/kube-proxy:v1.27.1
local full_official_image=$1
local ali_image
ali_image=$(echo "$full_official_image" | sed -E "s|(.*/)(.*)|$AILI_KUBERNETES_REGISTRY/\2|")
echo "downloading image => $ali_image"
echo "downloading image => $ali_image"
sudo docker pull "$ali_image"
# 重命名镜像
echo "rename image $ali_image to $full_official_image"
sudo docker tag "$ali_image" "$full_official_image"
}
# Official Kubernetes registry (used to filter the image list)
OFFICIAL_KUBERNETES_REGISTRY="registry.k8s.io"
# List of official images
OFFICIAL_IMAGE_LIST=$(kubeadm config images list --kubernetes-version "$KUBERNETES_VERSION" 2>/dev/null | grep "$OFFICIAL_KUBERNETES_REGISTRY")
for official_image in $OFFICIAL_IMAGE_LIST; do
download_and_tag_image "$official_image"
done
EOF
cat download_image.sh
sudo chmod u+x ./download_image.sh && ./download_image.sh
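After the script finishes, you can verify that the re-tagged images exist locally under their official names, so kubeadm does not need to reach registry.k8s.io (a sketch):

```bash
# Locally available images, under both the Aliyun and the official names
sudo docker images | grep -E 'registry.k8s.io|google_containers'
# Compare against the list kubeadm expects for this version
kubeadm config images list --kubernetes-version v1.27.1
```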
Run the initialization
root@k8s-master01:~# kubeadm init \
--apiserver-advertise-address=192.168.0.19 \
--kubernetes-version v1.27.1 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--cri-socket=unix:///var/run/cri-dockerd.sock \
--control-plane-endpoint=k8s-master:8443 \
--upload-certs
[init] Using Kubernetes version: v1.27.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0520 20:12:40.038258 30744 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
W0520 20:12:40.293262 30744 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master k8s-master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.0.19]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master01 localhost] and IPs [192.168.0.19 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master01 localhost] and IPs [192.168.0.19 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
W0520 20:12:42.465414 30744 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
W0520 20:12:42.555482 30744 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
W0520 20:12:42.934781 30744 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
W0520 20:12:43.058171 30744 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
W0520 20:12:43.628841 30744 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 9.011713 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
3636bc7d84515aeb36ca79597792b07cc64a888ebdea9221ab68a5bae93ac947
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: th5i1f.fnzc9v0yb6z3aok8
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
W0520 20:12:54.340208 30744 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join k8s-master:8443 --token th5i1f.fnzc9v0yb6z3aok8 \
--discovery-token-ca-cert-hash sha256:25357bff7f44a787886222dc9439916ab271dc5af5d5bbef274288fdd8e245b4 \
--control-plane --certificate-key 3636bc7d84515aeb36ca79597792b07cc64a888ebdea9221ab68a5bae93ac947
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join k8s-master:8443 --token th5i1f.fnzc9v0yb6z3aok8 \
--discovery-token-ca-cert-hash sha256:25357bff7f44a787886222dc9439916ab271dc5af5d5bbef274288fdd8e245b4
Explanation of the kubeadm init flags used above:
- --apiserver-advertise-address: the address the API server on this control-plane node advertises (must be the node's own IP, e.g. 192.168.0.19 on k8s-master01).
- --cri-socket=unix:///var/run/cri-dockerd.sock: the CRI socket path. Two runtimes are present here (containerd and cri-dockerd), so we explicitly select cri-dockerd. Note: the same flag must also be passed to kubeadm join.
- --control-plane-endpoint=k8s-master:8443: the shared endpoint for all control-plane nodes (required for a high-availability setup). Every node needs an /etc/hosts entry mapping k8s-master to the virtual IP configured in Keepalived; port 8443 is the one configured in HAProxy.
kubeadm does not support converting a single control-plane cluster created without --control-plane-endpoint into a highly available cluster later.
- --service-cidr=10.96.0.0/12: the Service network CIDR.
- --pod-network-cidr=10.244.0.0/16: the Pod network CIDR.
- --kubernetes-version v1.27.1: the Kubernetes version.
- --upload-certs: upload the control-plane certificates; recommended for HA clusters.
  - If you do not pass it, you can also distribute the certificates manually; see the certificate distribution section of the documentation.
  - Or re-upload the certificates after the cluster is up: sudo kubeadm init phase upload-certs --upload-certs
This step may fail; if it does, manually pull this image: registry.k8s.io/pause:3.6
cat << 'EOF' > download.sh
#!/bin/bash
# 官方镜像列表
OFFICIAL_IMAGE_LIST=("$@")
# Aliyun mirror of the official Kubernetes image registry
AILI_KUBERNETES_REGISTRY="registry.cn-hangzhou.aliyuncs.com/google_containers"
echo "AILI_KUBERNETES_REGISTRY => ${AILI_KUBERNETES_REGISTRY}"
# 下载并重命名镜像
function download_and_tag_image() {
# 官方镜像全称: registry.k8s.io/xxx/xxx:xxx
# 比如: registry.k8s.io/kube-proxy:v1.27.1
local full_official_image=$1
local ali_image
ali_image=$(echo "$full_official_image" | sed -E "s|(.*/)(.*)|$AILI_KUBERNETES_REGISTRY/\2|")
echo "downloading image => $ali_image"
sudo docker pull "$ali_image"
# 重命名镜像
echo "rename image $ali_image to $full_official_image"
sudo docker tag "$ali_image" "$full_official_image"
}
for official_image in "${OFFICIAL_IMAGE_LIST[@]}"; do
download_and_tag_image "$official_image"
done
EOF
sudo chmod u+x ./download.sh && ./download.sh registry.k8s.io/pause:3.6
You may also hit the error: Nameserver limits exceeded
root@k8s-master01:/etc/kubernetes# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2023-05-30 00:34:59 CST; 3s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 7672 (kubelet)
Tasks: 13 (limit: 13832)
Memory: 25.7M
CPU: 588ms
CGroup: /system.slice/kubelet.service
└─7672 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --containe>
May 30 00:35:01 k8s-master01 kubelet[7672]: E0530 00:35:01.033186 7672 dns.go:158] "Nameserver limits exceeded" err="Nameserver limits were exceeded, some nameservers have been omitted, >
Fixing "Nameserver limits exceeded"
Reference: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues
root@k8s-master01:~# cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
## Note: this file only contains the local stub resolver entries (a single nameserver), so it is not the source of the error; it is not the file we modify
nameserver 127.0.0.53
options edns0 trust-ad
search .
root@k8s-master01:~# cat /etc/systemd/resolved.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it under the
# terms of the GNU Lesser General Public License as published by the Free
# Software Foundation; either version 2.1 of the License, or (at your option)
# any later version.
#
# Entries in this file show the compile time defaults. Local configuration
# should be created by either modifying this file, or by creating "drop-ins" in
# the resolved.conf.d/ subdirectory. The latter is generally recommended.
# Defaults can be restored by simply deleting this file and all drop-ins.
#
# Use 'systemd-analyze cat-config systemd/resolved.conf' to display the full config.
#
# See resolved.conf(5) for details.
## Note: this file defines no nameservers; we do not modify it
[Resolve]
# Some examples of DNS servers which may be used for DNS= and FallbackDNS=:
# Cloudflare: 1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com 2606:4700:4700::1111#cloudflare-dns.com 2606:4700:4700::1001#cloudflare-dns.com
# Google: 8.8.8.8#dns.google 8.8.4.4#dns.google 2001:4860:4860::8888#dns.google 2001:4860:4860::8844#dns.google
# Quad9: 9.9.9.9#dns.quad9.net 149.112.112.112#dns.quad9.net 2620:fe::fe#dns.quad9.net 2620:fe::9#dns.quad9.net
#DNS=
#FallbackDNS=
#Domains=
#DNSSEC=no
#DNSOverTLS=no
#MulticastDNS=no
#LLMNR=no
#Cache=no-negative
#CacheFromLocalhost=no
#DNSStubListener=yes
#DNSStubListenerExtra=
#ReadEtcHosts=yes
#ResolveUnicastSingleLabel=no
root@k8s-master01:~# cat /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
# This file has multiple nameserver entries; we comment out two of them. Since we do not use IPv6, the IPv6 resolvers are the ones commented out.
nameserver 192.168.0.1
nameserver 192.168.1.1
# commented out
# nameserver fe80::1%2
# Too many DNS servers configured, the following entries may be ignored.
# commented out
# nameserver 240c::6666
search .
root@k8s-master01:~# vim /run/systemd/resolve/resolv.conf
# Restart the kubelet
root@k8s-master01:~# systemctl restart kubelet
# After resetting kubeadm, re-run the kubeadm init command above and the problem is resolved
root@k8s-master01:~# kubeadm reset \
--cri-socket=unix:///var/run/cri-dockerd.sock
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0530 00:45:44.483361 37269 reset.go:106] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: configmaps "kubeadm-config" not found
W0530 00:45:44.483476 37269 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0530 00:45:46.625805 37269 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
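The reset output above lists what kubeadm does not clean up. A minimal cleanup sketch along those lines (run only if you really want to wipe the node; the iptables flush clears all rules, not just the Kubernetes ones):

```bash
# CNI configuration left behind by the network plugin
sudo rm -rf /etc/cni/net.d
# Flush iptables rules created by kube-proxy / CNI (this clears ALL rules)
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
# Stale kubeconfig from the previous cluster
rm -f $HOME/.kube/config
```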
Important notes:
To reconfigure a cluster that has already been created, see "Reconfiguring a kubeadm cluster".
To run kubeadm init again, you must first tear down the cluster.
Configure a user to use the kubectl command
Run as a non-root user (root works too):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
For the root user (not needed if you already ran the commands above):
export KUBECONFIG=/etc/kubernetes/admin.conf
Verify that the kubectl command works
root@k8s-master01:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master NotReady control-plane 21m v1.27.1
Install a container network (CNI) plugin
A Pod network add-on is required so that Pods can reach each other.
Note: only one Pod network (CNI plugin) can be installed per cluster.
You must deploy a Container Network Interface (CNI) based Pod network add-on so that your Pods can communicate with each other. Cluster DNS (CoreDNS) will not start until a network is installed.
We choose Calico:
root@k8s-master01:~# kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.25/manifests/calico.yaml
poddisruptionbudget.policy/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
deployment.apps/calico-kube-controllers created
Check whether the installation is complete:
root@k8s-master01:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6c99c8747f-dmwg7 0/1 Pending 0 118s
kube-system calico-node-t5bc7 0/1 Init:0/3 0 118s
kube-system coredns-5d78c9869d-gm5vt 0/1 Pending 0 33m
kube-system coredns-5d78c9869d-xgkbj 0/1 Pending 0 33m
kube-system etcd-k8s-master 1/1 Running 0 34m
kube-system kube-apiserver-k8s-master 1/1 Running 0 34m
kube-system kube-controller-manager-k8s-master 1/1 Running 0 34m
kube-system kube-proxy-d26m7 1/1 Running 0 33m
kube-system kube-scheduler-k8s-master 1/1 Running 0 34m
root@k8s-master01:~# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6c99c8747f-cfqg9 1/1 Running 0 41s
kube-system calico-node-rczss 1/1 Running 0 41s
kube-system coredns-5d78c9869d-gm5vt 1/1 Running 0 80m
kube-system coredns-5d78c9869d-xgkbj 1/1 Running 0 80m
kube-system etcd-k8s-master 1/1 Running 1 (10m ago) 80m
kube-system kube-apiserver-k8s-master 1/1 Running 1 (10m ago) 80m
kube-system kube-controller-manager-k8s-master 1/1 Running 1 (10m ago) 80m
kube-system kube-proxy-d26m7 1/1 Running 1 (10m ago) 80m
kube-system kube-scheduler-k8s-master 1/1 Running 1 (10m ago) 80m
This can take ten minutes or more. Eventually all calico Pods are Running and READY 1/1, which means the installation is complete; you can also see that once calico is up, the coredns Pods move to Running as well.
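Instead of polling kubectl get pods by hand, you can wait for the calico DaemonSet Pods to become Ready (a sketch assuming the manifest's default k8s-app=calico-node label):

```bash
# Blocks until all calico-node Pods are Ready, or times out after 10 minutes
kubectl -n kube-system wait pod -l k8s-app=calico-node \
  --for=condition=Ready --timeout=600s
```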
Remove the control-plane taint
By default, control-plane nodes are tainted and Pods will not be scheduled onto them. To allow Pods to be scheduled on the control plane, remove the taint:
root@k8s-master01:~# kubectl taint nodes --all node-role.kubernetes.io/control-plane-
node/k8s-master untainted
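If you later want to restore the default behaviour and keep workloads off the control plane, the taint can be re-added per node (hypothetical example for k8s-master01):

```bash
kubectl taint nodes k8s-master01 node-role.kubernetes.io/control-plane=:NoSchedule
```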
7. Join the worker nodes to the cluster (run on all worker nodes, must be executed as root)
In this setup that means running the following on the two machines k8s-slave01 and k8s-slave02:
root@k8s-slave01:~# kubeadm join k8s-master:8443 --token th5i1f.fnzc9v0yb6z3aok8 \
--discovery-token-ca-cert-hash sha256:25357bff7f44a787886222dc9439916ab271dc5af5d5bbef274288fdd8e245b4 \
--cri-socket=unix:///var/run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
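Bootstrap tokens expire after 24 hours by default. If you join a worker later and the token printed by kubeadm init is gone, you can generate a fresh join command on a control-plane node (remember to append the --cri-socket flag when running it on the worker):

```bash
# Prints a ready-to-use "kubeadm join ..." command with a new token
kubeadm token create --print-join-command
# On the worker, append: --cri-socket=unix:///var/run/cri-dockerd.sock
```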
If, after waiting a while, the nodes are still in NotReady state, run the following commands on a master node to investigate:
root@k8s-master01:~# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 27m v1.27.1
k8s-master02 Ready control-plane 10m v1.27.1
k8s-master03 Ready control-plane 9m45s v1.27.1
k8s-slave01 NotReady <none> 49s v1.27.1
k8s-slave02 NotReady <none> 44s v1.27.1
root@k8s-master01:~# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6c99c8747f-fslqk 1/1 Running 0 23m
kube-system calico-node-6wgk4 0/1 Init:Error 4 (76s ago) 2m37s
kube-system calico-node-grb97 1/1 Running 0 11m
kube-system calico-node-ltczv 0/1 Init:CrashLoopBackOff 4 (56s ago) 2m42s
kube-system calico-node-pffcg 1/1 Running 0 12m
kube-system calico-node-vtcqg 1/1 Running 0 23m
kube-system coredns-5d78c9869d-m5zgd 1/1 Running 0 28m
kube-system coredns-5d78c9869d-mnxzj 1/1 Running 0 28m
kube-system etcd-k8s-master01 1/1 Running 0 29m
kube-system etcd-k8s-master02 1/1 Running 0 12m
kube-system etcd-k8s-master03 1/1 Running 0 11m
kube-system haproxy-k8s-master01 1/1 Running 0 29m
kube-system haproxy-k8s-master02 1/1 Running 0 12m
kube-system haproxy-k8s-master03 1/1 Running 0 11m
kube-system keepalived-k8s-master01 1/1 Running 0 29m
kube-system keepalived-k8s-master02 1/1 Running 0 12m
kube-system keepalived-k8s-master03 1/1 Running 0 10m
kube-system kube-apiserver-k8s-master01 1/1 Running 0 29m
kube-system kube-apiserver-k8s-master02 1/1 Running 0 12m
kube-system kube-apiserver-k8s-master03 1/1 Running 1 (11m ago) 11m
kube-system kube-controller-manager-k8s-master01 1/1 Running 1 (12m ago) 29m
kube-system kube-controller-manager-k8s-master02 1/1 Running 0 12m
kube-system kube-controller-manager-k8s-master03 1/1 Running 0 10m
kube-system kube-proxy-lmw7g 1/1 Running 0 11m
kube-system kube-proxy-mb8hx 0/1 ErrImagePull 0 2m42s
kube-system kube-proxy-nvx8b 0/1 ImagePullBackOff 0 2m37s
kube-system kube-proxy-phvcm 1/1 Running 0 28m
kube-system kube-proxy-psst7 1/1 Running 0 12m
kube-system kube-scheduler-k8s-master01 1/1 Running 1 (12m ago) 29m
kube-system kube-scheduler-k8s-master02 1/1 Running 0 12m
kube-system kube-scheduler-k8s-master03 1/1 Running 0 10m
# We can see the kube-proxy Pods are failing to pull their image; let's look at the Pod details
root@k8s-master01:~# kubectl describe pod kube-proxy-mb8hx -n kube-system
# The Events show the image could not be pulled; manually download the image on the two worker nodes
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m57s default-scheduler Successfully assigned kube-system/kube-proxy-mb8hx to k8s-slave01
Warning Failed 3m24s kubelet Failed to pull image "registry.k8s.io/kube-proxy:v1.27.1": rpc error: code = Unknown desc = Error response from daemon: Head "https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/kube-proxy/manifests/v1.27.1": dial tcp 64.233.187.82:443: i/o timeout
Normal Pulling 62s (x4 over 3m54s) kubelet Pulling image "registry.k8s.io/kube-proxy:v1.27.1"
Warning Failed 31s (x4 over 3m24s) kubelet Error: ErrImagePull
Warning Failed 31s (x3 over 2m42s) kubelet Failed to pull image "registry.k8s.io/kube-proxy:v1.27.1": rpc error: code = Unknown desc = Error response from daemon: Head "https://asia-east1-docker.pkg.dev/v2/k8s-artifacts-prod/images/kube-proxy/manifests/v1.27.1": dial tcp 64.233.188.82:443: i/o timeout
Normal BackOff 5s (x6 over 3m23s) kubelet Back-off pulling image "registry.k8s.io/kube-proxy:v1.27.1"
Warning Failed 5s (x6 over 3m23s) kubelet Error: ImagePullBackOff
# k8s-slave01节点下载镜像
root@k8s-slave01:~# bash download.sh registry.k8s.io/kube-proxy:v1.27.1
AILI_KUBERNETES_REGISTRY => registry.cn-hangzhou.aliyuncs.com/google_containers
downloading image => registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1
v1.27.1: Pulling from google_containers/kube-proxy
b6425c1785a5: Pull complete
5730c7a042b6: Pull complete
Digest: sha256:958ddb03a4d4d7a567d3563c759a05f3e95aa42ca8af2964aa76867aafc43610
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1
rename image registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1 to registry.k8s.io/kube-proxy:v1.27.1
# k8s-slave02节点下载镜像
root@k8s-slave02:~# bash download.sh registry.k8s.io/kube-proxy:v1.27.1
AILI_KUBERNETES_REGISTRY => registry.cn-hangzhou.aliyuncs.com/google_containers
downloading image => registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1
v1.27.1: Pulling from google_containers/kube-proxy
b6425c1785a5: Pull complete
5730c7a042b6: Pull complete
Digest: sha256:958ddb03a4d4d7a567d3563c759a05f3e95aa42ca8af2964aa76867aafc43610
Status: Downloaded newer image for registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1
rename image registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.27.1 to registry.k8s.io/kube-proxy:v1.27.
# 看到kube-proxy已经Running并且READY 了
root@k8s-master01:~# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6c99c8747f-fslqk 1/1 Running 0 28m
kube-system calico-node-6wgk4 0/1 Init:CrashLoopBackOff 6 (57s ago) 7m38s
kube-system calico-node-grb97 1/1 Running 0 16m
kube-system calico-node-ltczv 0/1 Init:CrashLoopBackOff 6 (79s ago) 7m43s
kube-system calico-node-pffcg 1/1 Running 0 17m
kube-system calico-node-vtcqg 1/1 Running 0 28m
kube-system coredns-5d78c9869d-m5zgd 1/1 Running 0 33m
kube-system coredns-5d78c9869d-mnxzj 1/1 Running 0 33m
kube-system etcd-k8s-master01 1/1 Running 0 34m
kube-system etcd-k8s-master02 1/1 Running 0 17m
kube-system etcd-k8s-master03 1/1 Running 0 16m
kube-system haproxy-k8s-master01 1/1 Running 0 34m
kube-system haproxy-k8s-master02 1/1 Running 0 17m
kube-system haproxy-k8s-master03 1/1 Running 0 16m
kube-system keepalived-k8s-master01 1/1 Running 0 34m
kube-system keepalived-k8s-master02 1/1 Running 0 17m
kube-system keepalived-k8s-master03 1/1 Running 0 15m
kube-system kube-apiserver-k8s-master01 1/1 Running 0 34m
kube-system kube-apiserver-k8s-master02 1/1 Running 0 17m
kube-system kube-apiserver-k8s-master03 1/1 Running 1 (16m ago) 16m
kube-system kube-controller-manager-k8s-master01 1/1 Running 1 (17m ago) 34m
kube-system kube-controller-manager-k8s-master02 1/1 Running 0 17m
kube-system kube-controller-manager-k8s-master03 1/1 Running 0 15m
kube-system kube-proxy-lmw7g 1/1 Running 0 16m
kube-system kube-proxy-mb8hx 1/1 Running 0 7m43s
kube-system kube-proxy-nvx8b 1/1 Running 0 7m38s
kube-system kube-proxy-phvcm 1/1 Running 0 33m
kube-system kube-proxy-psst7 1/1 Running 0 17m
kube-system kube-scheduler-k8s-master01 1/1 Running 1 (17m ago) 34m
kube-system kube-scheduler-k8s-master02 1/1 Running 0 17m
kube-system kube-scheduler-k8s-master03 1/1 Running 0 15m
# 但是现在还有Calico网络插件没有Running, 我们看一下原因
root@k8s-master01:~# kubectl describe pod calico-node-ltczv -n kube-system
# Events看到镜像已经下载成功了, 但是Back-off了,这里我们重新启动一下
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m38s default-scheduler Successfully assigned kube-system/calico-node-ltczv to k8s-slave01
Normal Pulled 9m35s kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
Normal Created 9m35s kubelet Created container upgrade-ipam
Normal Started 9m35s kubelet Started container upgrade-ipam
Normal Pulled 7m55s (x5 over 9m35s) kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
Normal Created 7m55s (x5 over 9m35s) kubelet Created container install-cni
Normal Started 7m54s (x5 over 9m34s) kubelet Started container install-cni
Warning BackOff 4m32s (x22 over 9m27s) kubelet Back-off restarting failed container install-cni in pod calico-node-ltczv_kube-system(c89e2e76-5045-4474-af93-9b839e1d2206)
# 重启一下Calico DaemonSet控制器控制的Pod
root@k8s-master01:~# kubectl -n kube-system rollout restart DaemonSet/calico-node
daemonset.apps/calico-node restarted
root@k8s-master01:~# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6c99c8747f-fslqk 1/1 Running 0 33m
kube-system calico-node-bbfx2 1/1 Running 0 15s
kube-system calico-node-cf55q 0/1 Init:2/3 0 4s
kube-system calico-node-ltczv 1/1 Running 0 12m
kube-system calico-node-pffcg 1/1 Running 0 22m
kube-system calico-node-vtcqg 1/1 Running 0 33m
kube-system coredns-5d78c9869d-m5zgd 1/1 Running 0 38m
kube-system coredns-5d78c9869d-mnxzj 1/1 Running 0 38m
kube-system etcd-k8s-master01 1/1 Running 0 39m
kube-system etcd-k8s-master02 1/1 Running 0 22m
kube-system etcd-k8s-master03 1/1 Running 0 21m
kube-system haproxy-k8s-master01 1/1 Running 0 39m
kube-system haproxy-k8s-master02 1/1 Running 0 22m
kube-system haproxy-k8s-master03 1/1 Running 0 21m
kube-system keepalived-k8s-master01 1/1 Running 0 39m
kube-system keepalived-k8s-master02 1/1 Running 0 22m
kube-system keepalived-k8s-master03 1/1 Running 0 20m
kube-system kube-apiserver-k8s-master01 1/1 Running 0 39m
kube-system kube-apiserver-k8s-master02 1/1 Running 0 22m
kube-system kube-apiserver-k8s-master03 1/1 Running 1 (21m ago) 21m
kube-system kube-controller-manager-k8s-master01 1/1 Running 1 (22m ago) 39m
kube-system kube-controller-manager-k8s-master02 1/1 Running 0 22m
kube-system kube-controller-manager-k8s-master03 1/1 Running 0 20m
kube-system kube-proxy-lmw7g 1/1 Running 0 21m
kube-system kube-proxy-mb8hx 1/1 Running 0 12m
kube-system kube-proxy-nvx8b 1/1 Running 0 12m
kube-system kube-proxy-phvcm 1/1 Running 0 38m
kube-system kube-proxy-psst7 1/1 Running 0 22m
kube-system kube-scheduler-k8s-master01 1/1 Running 1 (22m ago) 39m
kube-system kube-scheduler-k8s-master02 1/1 Running 0 22m
kube-system kube-scheduler-k8s-master03 1/1 Running 0 20m
# 过一会就都启动了
root@k8s-master01:~# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6c99c8747f-fslqk 1/1 Running 0 34m
kube-system calico-node-82c5z 1/1 Running 0 48s
kube-system calico-node-9vrzk 1/1 Running 0 37s
kube-system calico-node-bbfx2 1/1 Running 0 69s
kube-system calico-node-cf55q 1/1 Running 0 58s
kube-system calico-node-scrp4 1/1 Running 0 26s
kube-system coredns-5d78c9869d-m5zgd 1/1 Running 0 39m
kube-system coredns-5d78c9869d-mnxzj 1/1 Running 0 39m
kube-system etcd-k8s-master01 1/1 Running 0 40m
kube-system etcd-k8s-master02 1/1 Running 0 23m
kube-system etcd-k8s-master03 1/1 Running 0 22m
kube-system haproxy-k8s-master01 1/1 Running 0 40m
kube-system haproxy-k8s-master02 1/1 Running 0 23m
kube-system haproxy-k8s-master03 1/1 Running 0 22m
kube-system keepalived-k8s-master01 1/1 Running 0 40m
kube-system keepalived-k8s-master02 1/1 Running 0 23m
kube-system keepalived-k8s-master03 1/1 Running 0 20m
kube-system kube-apiserver-k8s-master01 1/1 Running 0 40m
kube-system kube-apiserver-k8s-master02 1/1 Running 0 23m
kube-system kube-apiserver-k8s-master03 1/1 Running 1 (22m ago) 22m
kube-system kube-controller-manager-k8s-master01 1/1 Running 1 (23m ago) 40m
kube-system kube-controller-manager-k8s-master02 1/1 Running 0 23m
kube-system kube-controller-manager-k8s-master03 1/1 Running 0 21m
kube-system kube-proxy-lmw7g 1/1 Running 0 22m
kube-system kube-proxy-mb8hx 1/1 Running 0 13m
kube-system kube-proxy-nvx8b 1/1 Running 0 13m
kube-system kube-proxy-phvcm 1/1 Running 0 39m
kube-system kube-proxy-psst7 1/1 Running 0 23m
kube-system kube-scheduler-k8s-master01 1/1 Running 1 (23m ago) 40m
kube-system kube-scheduler-k8s-master02 1/1 Running 0 23m
kube-system kube-scheduler-k8s-master03 1/1 Running 0 21m
# 节点都是 Ready 状态
root@k8s-master01:~# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 41m v1.27.1
k8s-master02 Ready control-plane 24m v1.27.1
k8s-master03 Ready control-plane 24m v1.27.1
k8s-slave01 Ready <none> 15m v1.27.1
k8s-slave02 Ready <none> 15m v1.27.1
You may run into image pull failures and similar issues along the way.
8. Join the remaining high-availability master nodes to the cluster (run as root)
参考: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/high-availability/
If --upload-certs was not specified when running kubeadm init, you can re-run the certificate upload phase now:
root@k8s-master01:~# kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
c82900d92a026aa6f6498b41ea70c9602e052c88eaca3e019d99b297af43230e
- Do not forget that by default the decryption key used with --certificate-key expires after two hours.
In our case the remaining control-plane nodes are root@k8s-master02 and root@k8s-master03; join them using the certificate key produced by the command above (c82900d92a026aa6f6498b41ea70c9602e052c88eaca3e019d99b297af43230e):
root@k8s-master02:~# kubeadm join k8s-master:8443 --token th5i1f.fnzc9v0yb6z3aok8 \
--discovery-token-ca-cert-hash sha256:25357bff7f44a787886222dc9439916ab271dc5af5d5bbef274288fdd8e245b4 \
--control-plane \
--certificate-key 3636bc7d84515aeb36ca79597792b07cc64a888ebdea9221ab68a5bae93ac947 \
--cri-socket=unix:///var/run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0520 20:29:27.062790 14892 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
W0520 20:29:27.337990 14892 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[download-certs] Saving the certificates to the folder: "/etc/kubernetes/pki"
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master k8s-master02 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.0.20]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master02 localhost] and IPs [192.168.0.20 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master02 localhost] and IPs [192.168.0.20 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
W0520 20:29:29.131836 14892 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
W0520 20:29:29.206366 14892 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
W0520 20:29:29.479200 14892 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
W0520 20:29:31.931154 14892 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node k8s-master02 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master02 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
Configure the kubectl command on this node
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Check that the nodes are healthy (at this point the basic cluster installation is complete)
root@k8s-master03:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane 18m v1.27.1
k8s-master02 Ready control-plane 90s v1.27.1
k8s-master03 Ready control-plane 37s v1.27.1
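If both the bootstrap token and the certificate key have expired by the time you add another control-plane node, both can be regenerated on an existing control-plane node; this sketch simply combines the two commands already shown above:

```bash
# New worker join command (token valid for 24h by default)
kubeadm token create --print-join-command
# New certificate key for --certificate-key (valid for 2h)
sudo kubeadm init phase upload-certs --upload-certs
# Control-plane join = the printed join command
#   + --control-plane --certificate-key <new key>
#   + --cri-socket=unix:///var/run/cri-dockerd.sock
```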
9. Optional steps
(Optional) Controlling the cluster from machines other than the control-plane nodes
To get kubectl on another machine (for example a laptop) to talk to your cluster, copy the administrator kubeconfig file from a control-plane node to your workstation like this:
scp root@<control-plane-host>:/etc/kubernetes/admin.conf .
kubectl --kubeconfig ./admin.conf get nodes
Note:
The example above assumes SSH access is enabled for root. If that is not the case, you can use scp to copy the admin.conf file to another user that is allowed access.
The admin.conf file gives its holder superuser privileges over the cluster, so it should be handled with care. For regular users it is recommended to generate a unique credential with only the privileges you grant them; you can do this with kubeadm kubeconfig user --client-name <CN> (formerly kubeadm alpha kubeconfig user). The command prints the kubeconfig to STDOUT; save it to a file and hand it to the user. Afterwards, grant privileges with kubectl create (cluster)rolebinding.
(Optional) Proxying the API server to localhost
If you want to connect to the API server from outside the cluster, you can use kubectl proxy:
scp root@<control-plane-host>:/etc/kubernetes/admin.conf .
kubectl --kubeconfig ./admin.conf proxy
You can now access the API server locally at http://localhost:8001/api/v1.
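With the proxy running, you can exercise the API from the same machine, for example (a sketch):

```bash
# Cluster version through the local proxy
curl http://localhost:8001/version
# List kube-system Pods through the proxy
curl -s http://localhost:8001/api/v1/namespaces/kube-system/pods | head -n 20
```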