
Pitfalls We Stepped In Over the Years: K8s Binary Deployment of Etcd (1)

nanyue  2024-10-12

Preface: this installment is dedicated to all the K8s pitfalls we have stepped in over the years.

I have recently been organizing my Kubernetes fundamentals and am publishing some tutorials so we can learn from and discuss with each other.

The series is split into several parts, which I will publish periodically:

  • K8s binary deployment
  • Batch-deploying K8s with Ansible; adding worker nodes
  • K8s Pods, Services, and storage
  • K8s security framework and certificates
  • K8s autoscaling
  • K8s cluster networking
  • Releasing K8s microservice projects with Jenkins
  • Collecting K8s logs with an EFK logging stack
  • Monitoring K8s with Prometheus and visualizing it in Grafana
  • Migrating Spring Cloud microservices into containers

I look forward to completing this material and exploring it together with you.

In this installment I cover the binary deployment of etcd for K8s, including etcd cluster deployment and etcd data backup & snapshot restore.

Before deploying, here is the base K8s environment used in this series.


Software environment:

Software   | Version
---------- | --------------
OS         | CentOS 7.6_x64
Docker     | 19.03.9-CE
Kubernetes | 1.18.5
etcd       | 3.4.9

服务器资源规划:

角色

IP 地址

组件

机器配置

k8s-node1

1.1.1.20

kubelet, kube-proxy, docker

4c8G100g

k8s-node2

1.1.1.22

kubelet, kube-proxy, docker

4c8G100g

k8s-node3

1.1.1.24

kubelet,kube-proxy,docker

4c8G100g

etcd1

1.1.1.28

etcd

4c8G100g

k8s-master,k8s-node-4,etcd2

1.1.1.30

kube-apiserver,kube-controller-manager,kube-scheduler

kubelet,kube-proxy,docker

etcd

4c8G100g

Single-master topology diagram (image omitted):

Software packages involved:

Download link: https://pan.baidu.com/s/1jou9A-0qXo4E8KVexz_SzA

Extraction code: message the author and reply "K8S"

1. Operating system initialization

# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config  # permanent (after reboot)
setenforce 0  # temporary (this boot)

# Disable swap
swapoff -a  # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab    # permanent

# Set the hostname according to the plan
hostnamectl set-hostname <hostname>

# Add hosts entries on the master
cat >> /etc/hosts << EOF
# node entries
1.1.1.20	node1
1.1.1.22	node2
1.1.1.24	node3
# etcd entries
1.1.1.28	etcd1
1.1.1.30	etcd2
# master entry
1.1.1.30	master
EOF

# Pass bridged IPv4 traffic to the iptables chains
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
modprobe br_netfilter  # load the bridge module first, or sysctl reports these keys as missing
sysctl --system  # apply


# Install common utilities
yum install -y vim nc wget lrzsz telnet net-tools epel-release bind-utils tree cifs-utils ntpdate iptables*

# Remove packages we do not need
yum remove -y mariadb* firewalld*

# Sync the clock
ntpdate time.windows.com

# Raise the max open-file limit for all users
cat >> /etc/security/limits.conf <<eof
*       soft    nofile  65535
*       hard    nofile  65535
eof

# Raise the max process limit for regular users
sed -i 's#^\*[[:space:]]\+soft[[:space:]]\+nproc.*#*          soft    nproc     65535#' /etc/security/limits.d/20-nproc.conf


2. Deploying the etcd cluster

Etcd is a distributed key-value store, and Kubernetes uses it as its data store, so we set up the etcd database first. To avoid a single point of failure, etcd should be deployed as a cluster; here I form the cluster from two machines, but in real production you should deploy an odd number of members, such as 3, 5, and so on.

etcd cluster node | IP address | Deployment path
----------------- | ---------- | ----------------
etcd1             | 1.1.1.28   | /opt/etcd
etcd2             | 1.1.1.30   | /opt/etcd

Note: to save machines, etcd is co-located with the K8s node machines here. It can also be deployed entirely outside the K8s cluster, as long as the apiserver can reach it.
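A quick note on the cluster sizes recommended above: etcd needs a quorum of floor(n/2)+1 members to keep accepting writes, which is why odd member counts are preferred. A minimal sketch of the arithmetic:

```shell
# Quorum and fault tolerance for common etcd cluster sizes:
# quorum = n/2 + 1 (integer division), tolerated failures = n - quorum
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerates=$(( n - quorum ))"
done
```

Note that a 2-member cluster, like the one used here to save machines, tolerates zero failures, and 4 members tolerate no more than 3 do; 3 or 5 members is the sweet spot.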


2.1 Self-signing certificates with cfssl

We use the cfssl R1.2 certificate-signing toolkit, which consists of three binaries: cfssl, cfssljson, and cfssl-certinfo.

Download the cfssl tools into /usr/local/bin/ on the master (1.1.1.30); all later certificate-signing work is done on the master. (If pkg.cfssl.org is unreachable, the same binaries are published on the cloudflare/cfssl GitHub releases page.)

 [root@master1 ~]# curl -L https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 -o /usr/local/bin/cfssl
 [root@master1 ~]# curl -L https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 -o /usr/local/bin/cfssljson
 [root@master1 ~]# curl -L https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64 -o /usr/local/bin/cfssl-certinfo
 [root@master1 ~]# chmod +x /usr/local/bin/cfssl*


Create the etcd certificate directory on the master (1.1.1.30):

 [root@master1 ~]# mkdir /cert/etcd -p
 [root@master1 ~]# cd /cert/etcd


(1) First build a local CA and generate the CA certificate. Prepare the config file:

[root@master1 etcd]# vim ca-csr.json 
{
  "CN": "etcd CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Nanjing",
      "ST": "Nanjing"
    }
  ]
}

# Set the certificate expiry to 10 years

[root@master1 etcd]# vim ca-config.json 
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "www": {
         "expiry": "87600h",
         "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ]
      }
    }
  }
}
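The 87600h above is simply 10 years expressed in hours, which you can sanity-check:

```shell
# 24 h/day * 365 days/year * 10 years
echo $(( 24 * 365 * 10 ))   # prints 87600
```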

# Generate the CA files

 [root@master1 etcd]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca

This generates ca.csr, ca-key.pem, and ca.pem.

(2) Issue the etcd server certificate

# Create the etcd certificate signing request. The hosts field must include the IPs of all etcd nodes; the file also carries the certificate metadata (common name, region, and so on).

[root@master etcd]# cat server-csr.json 
{
  "CN": "etcd",
  "hosts": [
    "1.1.1.28",
    "1.1.1.30",
    "1.1.1.40",
    "1.1.1.42",
    "1.1.1.44",
    "1.1.1.46"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Nanjing",
      "ST": "Nanjing"
    }
  ]
}

Note: the IPs in the hosts field above are the internal cluster-communication IPs of all etcd nodes, and not one of them may be missing! To make later expansion easier, you can list a few spare IPs in advance (as done here with 1.1.1.40-1.1.1.46).


# Issue the certificate from the request file

[root@master1 etcd]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=www server-csr.json | cfssljson -bare server
2020/07/09 15:19:25 [INFO] generate received request
2020/07/09 15:19:25 [INFO] received CSR
2020/07/09 15:19:25 [INFO] generating key: rsa-2048
2020/07/09 15:19:25 [INFO] encoded CSR
2020/07/09 15:19:25 [INFO] signed certificate with serial number 674413723888927965027203222224747804504806490190
2020/07/09 15:19:25 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for
websites. For more information see the Baseline Requirements for the Issuance and Management
of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org);
specifically, section 10.2.3 ("Information Requirements").
[root@master1 etcd]# ls
ca-config.json ca.csr ca-csr.json ca-key.pem ca.pem server.csr server-csr.json server-key.pem server.pem

This produced server.csr, server-key.pem (the private key), and server.pem (the certificate).

# Check

[root@master1 etcd]# ls *pem

ca-key.pem ca.pem server-key.pem server.pem
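Beyond checking that the files exist, it is worth confirming that the server certificate's SAN list really contains every etcd node IP before distributing it; cfssl-certinfo -cert server.pem will print the details, and plain openssl works too. The sketch below illustrates the openssl check against a throwaway self-signed certificate (requires OpenSSL 1.1.1+ for -addext / -ext); on the master you would point the second command at /cert/etcd/server.pem instead:

```shell
# Generate a disposable cert carrying the two etcd IPs as SANs
# (a stand-in for the real server.pem, just to demonstrate the check)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo-key.pem -out /tmp/demo.pem -subj "/CN=etcd" \
  -addext "subjectAltName=IP:1.1.1.28,IP:1.1.1.30" 2>/dev/null

# Print only the SAN extension; every etcd node IP must appear here
openssl x509 -in /tmp/demo.pem -noout -ext subjectAltName
```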


2.2 Download the etcd package

v3.4.9:

https://github.com/etcd-io/etcd/releases/download/v3.4.9/etcd-v3.4.9-linux-amd64.tar.gz


2.3 Install etcd and create its working directories

Perform the following on node etcd1 (1.1.1.28). To keep things simple, we will later copy all the files generated on etcd1 over to etcd2 (1.1.1.30).

1. Create the working directories and unpack the binary package

mkdir /opt/etcd/{bin,cfg,ssl} -p
tar zxvf etcd-v3.4.9-linux-amd64.tar.gz
mv etcd-v3.4.9-linux-amd64/{etcd,etcdctl} /opt/etcd/bin/

2. Create the etcd configuration file

[root@etcd1 cfg]# cat etcd.conf 
#[Member]   (lines starting with # are plain comments, not configuration)
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://1.1.1.28:2380"
ETCD_LISTEN_CLIENT_URLS="https://1.1.1.28:2379"

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://1.1.1.28:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://1.1.1.28:2379"
ETCD_INITIAL_CLUSTER="etcd-1=https://1.1.1.28:2380,etcd-2=https://1.1.1.30:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
  • ETCD_NAME: node name, unique within the cluster
  • ETCD_DATA_DIR: data directory
  • ETCD_LISTEN_PEER_URLS: listen address for cluster (peer) communication
  • ETCD_LISTEN_CLIENT_URLS: listen address for client access
  • ETCD_INITIAL_ADVERTISE_PEER_URLS: peer address advertised to the cluster
  • ETCD_ADVERTISE_CLIENT_URLS: client address advertised to clients
  • ETCD_INITIAL_CLUSTER: addresses of all cluster members
  • ETCD_INITIAL_CLUSTER_TOKEN: cluster token
  • ETCD_INITIAL_CLUSTER_STATE: state when joining the cluster; "new" for a new cluster, "existing" to join one that already exists
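One detail worth knowing: etcd reads any ETCD_* environment variable as if the corresponding command-line flag had been passed, which is why the systemd unit in the next step needs only EnvironmentFile= plus the TLS flags, with no --name, --data-dir, and so on. A stand-in process (not the real daemon) demonstrates the mechanism of passing configuration through the environment:

```shell
# etcd maps ETCD_NAME -> --name, ETCD_DATA_DIR -> --data-dir, etc.
# This echo is a stand-in, just to show the child process inherits the values:
ETCD_NAME="etcd-1" ETCD_DATA_DIR="/var/lib/etcd/default.etcd" \
  sh -c 'echo "etcd would start with --name=$ETCD_NAME --data-dir=$ETCD_DATA_DIR"'
```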


3. Manage the etcd service with systemd

[root@etcd1 cfg]# vim /usr/lib/systemd/system/etcd.service 
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \
        --cert-file=/opt/etcd/ssl/server.pem \
        --key-file=/opt/etcd/ssl/server-key.pem \
        --peer-cert-file=/opt/etcd/ssl/server.pem \
        --peer-key-file=/opt/etcd/ssl/server-key.pem \
        --trusted-ca-file=/opt/etcd/ssl/ca.pem \
        --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

4. Copy the certificates generated earlier in /cert/etcd on master1

 [root@master etcd]# scp /cert/etcd/ca*pem server*pem etcd1:/opt/etcd/ssl/


5. Start etcd and enable it at boot (etcd1 will appear to hang or log connection errors until etcd2 comes up; that is expected for a brand-new cluster)

systemctl daemon-reload
systemctl start etcd
systemctl enable etcd

6. Copy all the files generated on etcd1 above to etcd2 (1.1.1.30)

scp -r /opt/etcd/ etcd2:/opt/
scp /usr/lib/systemd/system/etcd.service etcd2:/usr/lib/systemd/system/

On etcd2 (1.1.1.30), change the node name and the server IPs in etcd.conf:

[root@master cfg]# vim etcd.conf 
#[Member]
ETCD_NAME="etcd-2"
# node name: must not repeat within the cluster
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
# peer (cluster) communication
ETCD_LISTEN_PEER_URLS="https://1.1.1.30:2380" 
# communication with external clients
ETCD_LISTEN_CLIENT_URLS="https://1.1.1.30:2379"

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://1.1.1.30:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://1.1.1.30:2379"
# list of all cluster members
ETCD_INITIAL_CLUSTER="etcd-1=https://1.1.1.28:2380,etcd-2=https://1.1.1.30:2380"
# cluster token
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"

7. Start etcd on etcd2 and enable it at boot

systemctl daemon-reload
systemctl start etcd
systemctl enable etcd

8. Check the etcd cluster status

[root@master etcd]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://1.1.1.28:2379,https://1.1.1.30:2379" endpoint health

https://1.1.1.28:2379 is healthy: successfully committed proposal: took = 17.119749ms
https://1.1.1.30:2379 is healthy: successfully committed proposal: took = 18.545429ms

If you see the output above, the cluster was deployed successfully. If there is a problem, look at the logs first: /var/log/messages, or journalctl -u etcd.

3. Etcd backup and restore

When backing up and restoring etcd data, mind which API version you use: there is a v2 and a v3 API. I use the etcd v3 API here.

First, why back up at all? My etcd is a two-node cluster, which you might call highly available already. Everyone knows etcd is the configuration store at the heart of a K8s cluster: it communicates with the apiserver, and every write ultimately lands in etcd, which shows how important etcd is to the cluster. But teammates do occasionally make slips inside K8s, so the etcd data should still be backed up to improve data safety.


Environment:

K8s cluster: binary deployment
etcd version: v3.4.9
etcd1: 1.1.1.28
etcd2: 1.1.1.30


3.1 Backup

# Create the backup directory on every node

[root@master-1 ~]# mkdir /opt/etcd/bak

# Run the backup command on node etcd1:

# ETCDCTL_API=3           use version 3 of the etcd API
# /opt/etcd/bak/snap.db   where to save the snapshot
# --cacert/--cert/--key   the CA certificate and the etcd certificate/private key

[root@master-1 bin]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot save /opt/etcd/bak/snap.db \
--endpoints=https://1.1.1.28:2379 \
--cacert=/opt/etcd/ssl/ca.pem \
--cert=/opt/etcd/ssl/server.pem \
--key=/opt/etcd/ssl/server-key.pem

# Inspect the saved snapshot

[root@etcd1 ~]# ll /opt/etcd/bak/
total 2840
-rw------- 1 root root 2904096 Jul 26 14:35 snap.db
[root@etcd1 ~]# du -sh /opt/etcd/bak/
2.8M	/opt/etcd/bak/

# Check the snapshot status (the fields are hash, revision, total keys, total size)

[root@etcd1 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot status /opt/etcd/bak/snap.db 
2584e00c, 4099502, 1534, 2.9 MB

# Copy the backup over to etcd2 as well, for the snapshot restore there

[root@etcd1 ~]#  scp -rp /opt/etcd/bak  root@1.1.1.30:/opt/etcd


3.2 Simulating a failure and restoring from the snapshot

3.2.1 Simulate a failure

# Look at the current Deployment and Pods

[root@master ~]# kubectl get pods,deployment
NAME                       READY   STATUS    RESTARTS   AGE
pod/web-5dcb957ccc-mnlsq   1/1     Running   0          14d
pod/web-5dcb957ccc-tl7fm   1/1     Running   0          14d

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   2/2     2            2           14d


# Delete the Deployment

[root@master ~]# kubectl delete deployments.apps web
deployment.apps "web" deleted

# Check the Pods again: there are no Pods left in the default namespace

[root@master ~]# kubectl get pods,deployment

No resources found in default namespace.


3.2.2 Restore from the snapshot

1. Stop kube-apiserver and etcd on every node

# Note: how you stop them depends on how the cluster was deployed:

With a kubeadm deployment, you have to move the etcd and kube-apiserver static-pod YAML manifests away; merely deleting a Deployment does nothing here.

With a binary deployment, it is enough to stop the kube-apiserver process on the master and the etcd process on each etcd node.

# My cluster here is a binary deployment

Master node:

[root@master-1 bak]# systemctl stop kube-apiserver


etcd nodes:

[root@master-1 bak]# systemctl stop etcd


2. Move each node's etcd data directory aside (rename it rather than delete it outright, so you keep a fallback)

[root@master-1 bak]# mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd-2021-0726-bak


3. Restore on each node

(Mind the etcd node names, IP addresses, and cluster token; every environment is different.)

# Don't forget to point at the backup snapshot

etcd1:

[root@etcd1 ~]#  ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap.db \
> --name etcd-1 \
> --initial-cluster="etcd-1=https://1.1.1.28:2380,etcd-2=https://1.1.1.30:2380" \
> --initial-cluster-token=etcd-cluster \
> --initial-advertise-peer-urls=https://1.1.1.28:2380 \
> --data-dir=/var/lib/etcd/default.etcd
{"level":"info","ts":1627282348.6668184,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
{"level":"info","ts":1627282348.7638705,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":4098223}
{"level":"info","ts":1627282348.7978258,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"dc76f537f56989d7","local-member-id":"0","added-peer-id":"4a514157b56e2ed0","added-peer-peer-urls":["https://1.1.1.30:2380"]}
{"level":"info","ts":1627282348.7979114,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"dc76f537f56989d7","local-member-id":"0","added-peer-id":"a8bdbbc86e3e09cf","added-peer-peer-urls":["https://1.1.1.28:2380"]}
{"level":"info","ts":1627282348.8819501,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}

etcd2:

[root@master bak]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /opt/etcd/bak/snap.db \
> --name etcd-2 \
> --initial-cluster="etcd-1=https://1.1.1.28:2380,etcd-2=https://1.1.1.30:2380" \
> --initial-cluster-token=etcd-cluster \
> --initial-advertise-peer-urls=https://1.1.1.30:2380 \
> --data-dir=/var/lib/etcd/default.etcd
{"level":"info","ts":1627282555.1442761,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/opt/etcd/bak/snap.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
{"level":"info","ts":1627282555.4099832,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":4098223}
{"level":"info","ts":1627282555.5169249,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"dc76f537f56989d7","local-member-id":"0","added-peer-id":"4a514157b56e2ed0","added-peer-peer-urls":["https://1.1.1.30:2380"]}
{"level":"info","ts":1627282555.517036,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"dc76f537f56989d7","local-member-id":"0","added-peer-id":"a8bdbbc86e3e09cf","added-peer-peer-urls":["https://1.1.1.28:2380"]}
{"level":"info","ts":1627282555.642511,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/opt/etcd/bak/snap.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}



# Start kube-apiserver and etcd back up on their respective nodes

Master node:

[root@master-1 bak]# systemctl start kube-apiserver


etcd nodes:

[root@etcd1 ~]# systemctl start etcd


# Verify the data has been restored:

[root@master bak]# kubectl get pods,deployment
NAME                       READY   STATUS    RESTARTS   AGE
pod/web-5dcb957ccc-mnlsq   1/1     Running   0          14d
pod/web-5dcb957ccc-tl7fm   1/1     Running   0          14d

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   2/2     2            2           14d

You can see the deleted Pods have come back.

Finally, here is a scheduled etcd backup script:

[root@master-1 /]# vim /opt/etcd/back_etcd.sh

#!/bin/bash
set -e
exec >> /var/log/backup_etcd.log

Date=$(date +%Y-%m-%d-%H-%M)
EtcdEndpoints="https://1.1.1.30:2379"
EtcdCmd="/opt/etcd/bin/etcdctl"
BackupDir="/opt/etcd/bak"
BackupFile="snapshot.db.$Date"
cacertfile="/opt/etcd/ssl/ca.pem"
certfile="/opt/etcd/ssl/server.pem"
keyfile="/opt/etcd/ssl/server-key.pem"

echo "$(date) backup etcd..."

export ETCDCTL_API=3
mkdir -p "$BackupDir"
"$EtcdCmd" snapshot save "$BackupDir/$BackupFile" --endpoints="$EtcdEndpoints" --cacert="$cacertfile" --cert="$certfile" --key="$keyfile"

echo "$(date) backup done!"

The resulting log output is shown in the screenshot (image omitted).

A few last words from me:

Don't keep etcd backups on only one node. Even though etcd runs as a cluster, you may one day find that a node simply has no backup just when you need to restore, and then you are stuck. So back up on at least two nodes, and monitor the backup files continuously; don't back up dutifully every day only to discover, when you finally need one, that every backup is broken!
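To act on that advice, you can pair the backup script with a cron entry and a simple retention sweep so old snapshots do not fill the disk. A sketch; the schedule and the 7-snapshot retention are my own choices, not part of the script above, and the scratch directory below stands in for /opt/etcd/bak:

```shell
# Suggested crontab entry (nightly backup at 02:00):
#   0 2 * * * /opt/etcd/back_etcd.sh

# Retention sweep: keep only the newest 7 snapshots.
# Demonstrated against a scratch directory; on a real node use /opt/etcd/bak.
BackupDir="/tmp/etcd-bak-demo"
mkdir -p "$BackupDir"
for d in 01 02 03 04 05 06 07 08 09 10; do
  touch "$BackupDir/snapshot.db.2021-07-$d"
done

# List newest-first, skip the first 7, delete the rest
ls -1t "$BackupDir" | tail -n +8 | while read -r f; do
  rm -f "$BackupDir/$f"
done

ls -1 "$BackupDir" | wc -l   # 7 snapshots remain
```

Running etcdctl snapshot status over each retained file from monitoring is a cheap way to catch corrupt backups before you need them.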
