Longhorn_create_block_flow
31 May 2023 - joy717
CSI overall flow:
-> CreateVolume (after it finishes, the PVC becomes Bound)
-> ControllerPublishVolume (the VolumeAttachment object is created first, then the call is made; after it finishes, the corresponding attachment shows up in the Node's .status.volumesAttached)
-> NodeStageVolume (formats the volume and mounts it to the staging directory. This extra step exists because Kubernetes allows one volume to be mounted into multiple pods: the staging directory is a node-global directory, the volume is mounted there first, and NodePublishVolume then mounts it into each pod's directory.)
(In Longhorn, however, for block volumes the stage step does almost nothing; the mount logic all lives in NodePublishVolume.)
-> NodePublishVolume (mounts the staging directory into the pod's directory)
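For orientation, a minimal Go skeleton of the four CSI gRPC methods above (an illustrative sketch built on the CSI spec bindings, not Longhorn's actual csi-plugin code):

    package main

    import (
        "context"

        "github.com/container-storage-interface/spec/lib/go/csi"
    )

    // driver is a bare skeleton showing which call happens when; the real
    // longhorn-csi-plugin implements much more (capabilities, validation, etc.).
    type driver struct{}

    // Called by the external-provisioner sidecar; once it succeeds the PV is
    // created and the PVC becomes Bound.
    func (d *driver) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
        // In Longhorn this forwards to the longhorn-manager volumeCreate HTTP API.
        return &csi.CreateVolumeResponse{Volume: &csi.Volume{VolumeId: req.GetName()}}, nil
    }

    // Called by the external-attacher sidecar after the VolumeAttachment object exists.
    func (d *driver) ControllerPublishVolume(ctx context.Context, req *csi.ControllerPublishVolumeRequest) (*csi.ControllerPublishVolumeResponse, error) {
        return &csi.ControllerPublishVolumeResponse{}, nil
    }

    // Called by kubelet once per node: prepare the volume in the node-global staging dir.
    func (d *driver) NodeStageVolume(ctx context.Context, req *csi.NodeStageVolumeRequest) (*csi.NodeStageVolumeResponse, error) {
        return &csi.NodeStageVolumeResponse{}, nil
    }

    // Called by kubelet once per pod: expose the staged volume at the pod's target path.
    func (d *driver) NodePublishVolume(ctx context.Context, req *csi.NodePublishVolumeRequest) (*csi.NodePublishVolumeResponse, error) {
        return &csi.NodePublishVolumeResponse{}, nil
    }

    func main() {
        _ = &driver{}
    }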
Version: longhorn v1.2.2
csi createVolume:
csi-plugin CreateVolume -> volumeCreate in the longhorn-manager daemon's HTTP API (longhorn-manager router.go)
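The plugin essentially turns the gRPC CreateVolume into a REST call against the manager. A rough sketch of that call (the /v1/volumes path and the field names follow the manager's Rancher-style API but are assumptions here, not verified against v1.2.2):

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // volumeCreateInput approximates the body accepted by the volumeCreate
    // handler registered in router.go; field names are illustrative.
    type volumeCreateInput struct {
        Name             string `json:"name"`
        Size             string `json:"size"`
        NumberOfReplicas int    `json:"numberOfReplicas"`
        Frontend         string `json:"frontend"`
    }

    func createVolume(managerURL, name, size string, replicas int) error {
        body, err := json.Marshal(volumeCreateInput{
            Name:             name,
            Size:             size,
            NumberOfReplicas: replicas,
            Frontend:         "blockdev", // Longhorn's block-device frontend
        })
        if err != nil {
            return err
        }
        resp, err := http.Post(managerURL+"/v1/volumes", "application/json", bytes.NewReader(body))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
            return fmt.Errorf("volumeCreate failed: %s", resp.Status)
        }
        return nil
    }

    func main() {
        _ = createVolume("http://longhorn-backend:9500", "pvc-xxxx", "2147483648", 3)
    }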
The various controllers in the longhorn-manager daemon are list-and-watching:
volume_controller sync
-> v.ownerID = handyops-1
-> v.status.CurrentImage = v.spec.engineImage
-> v.status.CurrentNodeID = v.spec.nodeID (since v.spec.nodeID is "", this stays empty)
-> create the engine CR (desireState stopped, e.spec.engineImage = v.status.currentImage)
-> create the replica CRs (desireState stopped, r.spec.engineImage = v.status.currentImage, r.spec.active = true)
-> schedule the replicas onto disks (r.spec.nodeID = disk.NodeID; r.spec.DataDirectoryName, r.spec.DiskID and r.spec.DiskPath are set)
-> v.status.condition.scheduled = true
-> v.status.state=creating
-> v.Status.Robustness=unknown
-------> volume_controller:1299 (waits for the engine to stop)
engine_controller sync
-> e.status.ownerID = handyops-1
-> e.status.started = false
-> e.status.currentState = stopped, e.status.currentImage = "", e.status.ip = "", e.status.port = 0
--------> engine_controller, instance_handler:206 (waits for e.spec.nodeID to become non-empty)
replica_controller sync
-> r.status.ownerID = handyops-1
-> r.status.started = false
-> r.status.currentState = stopped, r.status.currentImage = "", r.status.ip = "", r.status.port = 0
volume_controller sync
-> v.status.state = detaching (if any engine's or replica's status.currentState is not stopped yet, return, keep this state and wait for the next sync; this wait-and-requeue pattern is sketched after the summary below)
-> v.status.state = detached
Summary: at this point v.status.state = detached, e.status.currentState = stopped, r.status.currentState = stopped, v.spec.nodeID = "", v.status.currentNodeId = "", e.spec.nodeID = "", r.spec.nodeID = "handyops-1"
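All the "-------> file:line (waits for ...)" markers above are the same wait-and-requeue pattern: if a dependent object has not reached the expected state yet, the sync returns without changing anything and a later watch event re-runs it. A minimal sketch with stand-in types (not Longhorn's actual code):

    package main

    // Minimal stand-ins for the Longhorn CRD types; the real ones live in
    // longhorn-manager's types package.
    type Engine struct{ Status struct{ CurrentState string } }
    type Replica struct{ Status struct{ CurrentState string } }
    type Volume struct{ Status struct{ State string } }

    // reconcileDetach: the volume only moves from detaching to detached once every
    // engine and replica has been observed as stopped; otherwise bail out and wait
    // for the next sync.
    func reconcileDetach(v *Volume, engines []*Engine, replicas []*Replica) {
        for _, e := range engines {
            if e.Status.CurrentState != "stopped" {
                return // engine still stopping; keep the current state
            }
        }
        for _, r := range replicas {
            if r.Status.CurrentState != "stopped" {
                return // replica still stopping
            }
        }
        v.Status.State = "detached"
    }

    func main() {
        v, e, r := &Volume{}, &Engine{}, &Replica{}
        e.Status.CurrentState, r.Status.CurrentState = "stopped", "stopped"
        reconcileDetach(v, []*Engine{e}, []*Replica{r})
        println(v.Status.State) // detached
    }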
controllerPublishVolume:
manager/volume.go Attach
v.spec.nodeID = handyops-1; v.spec.DisableFrontend is set; v.spec.LastAttachedBy is set (it is actually "" because the CSI side does not pass it)
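So ControllerPublishVolume boils down to calling the manager's attach action on the volume with the target node. A rough sketch of that call (the action URL and the attachInput field names are assumptions, not verified against v1.2.2):

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // attachInput approximates the body of the manager's "attach" volume action
    // (manager/volume.go Attach); the field names are assumptions.
    type attachInput struct {
        HostID          string `json:"hostId"`
        DisableFrontend bool   `json:"disableFrontend"`
        AttachedBy      string `json:"attachedBy"` // CSI does not pass this, so it stays ""
    }

    // attachVolume tells the manager which node the volume should attach to; the
    // manager then sets v.spec.nodeID and the volume_controller does the rest.
    func attachVolume(managerURL, volume, nodeID string) error {
        body, _ := json.Marshal(attachInput{HostID: nodeID})
        url := fmt.Sprintf("%s/v1/volumes/%s?action=attach", managerURL, volume)
        resp, err := http.Post(url, "application/json", bytes.NewReader(body))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
            return fmt.Errorf("attach failed: %s", resp.Status)
        }
        return nil
    }

    func main() {
        _ = attachVolume("http://longhorn-backend:9500", "pvc-xxxx", "handyops-1")
    }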
volume_controller sync:
-> v.Status.CurrentNodeID = v.Spec.NodeID
-> v.Status.State = attaching
-> e.Spec.UpgradedReplicaAddressMap is initialized as an empty map
-> r.Spec.DesireState = running
-------> volume_controller:1393 (waits for the replicas to reach the running state)
replica_controller sync:
-> createInstance (creates the replica process)
next sync (waits on the process state; normally it is starting or running here, depending on whether the process is already running and has been synced back by the instance-manager at the time of this sync)
-> r.status.InstanceManagerName = im.name(instanceManagerName)
-> r.status.CurrentState = starting, r.status.CurrentImage = "", r.status.IP = "", r.status.Port = 0
next sync (process is running)
-> r.status.started = true
-> r.status.CurrentState = running, r.status.CurrentImage = r.spec.EngineImage, r.status.IP = im.status.IP, r.status.Port = int(instance.Status.PortStart)
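The two syncs above amount to copying the instance-manager's view of the replica process into the replica CR status. A sketch with stand-in types (illustrative only, not the real replica_controller code):

    package main

    // Stand-ins for the instance-manager process info and the replica CR status;
    // the real types live in longhorn-manager.
    type process struct {
        State     string // "starting", "running", ...
        PortStart int32
    }

    type instanceManager struct {
        Name  string
        IP    string
        Procs map[string]process
    }

    type replicaStatus struct {
        InstanceManagerName string
        CurrentState        string
        CurrentImage        string
        IP                  string
        Port                int
        Started             bool
    }

    // syncReplicaStatus mirrors the two syncs described above: while the process is
    // "starting" no image/IP/port is filled in yet; once it is "running" the status
    // is completed from the instance-manager and the replica spec.
    func syncReplicaStatus(status *replicaStatus, specEngineImage, replicaName string, im instanceManager) {
        p, ok := im.Procs[replicaName]
        if !ok {
            return // process not reported yet; wait for the next sync
        }
        status.InstanceManagerName = im.Name
        switch p.State {
        case "starting":
            status.CurrentState, status.CurrentImage, status.IP, status.Port = "starting", "", "", 0
        case "running":
            status.Started = true
            status.CurrentState = "running"
            status.CurrentImage = specEngineImage
            status.IP = im.IP
            status.Port = int(p.PortStart)
        }
    }

    func main() {
        im := instanceManager{Name: "instance-manager-r-xxxx", IP: "10.42.0.15",
            Procs: map[string]process{"pvc-xxxx-r-1": {State: "running", PortStart: 10000}}}
        st := &replicaStatus{}
        syncReplicaStatus(st, "longhornio/longhorn-engine:v1.2.2", "pvc-xxxx-r-1", im)
        println(st.CurrentState, st.IP, st.Port)
    }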
volume_controller sync:
-> e.Spec.NodeID = v.Status.CurrentNodeID, e.Spec.ReplicaAddressMap = replicaAddressMap (map of the replicas' addresses), e.Spec.DesireState = types.InstanceStateRunning, e.Spec.DisableFrontend = v.Status.FrontendDisabled, e.Spec.Frontend = v.Spec.Frontend
--------> volume_controller:1434 (waits for the engine to be running)
engine_controller sync:
e.Status.CurrentReplicaAddressMap = e.Spec.ReplicaAddressMap
-> createInstance (creates the engine process)
next sync (waits on the process state; normally it is starting or running here, depending on whether the process is already running and has been synced back by the instance-manager at the time of this sync)
-> e.status.InstanceManagerName = im.name(instanceManagerName)
-> e.status.CurrentState = starting, e.status.CurrentImage = "", e.status.IP = "", e.status.Port = 0
下一次sync(process running状态)
-> e.status.started = true
-> e.status.CurrentState = running, e.status.CurrentImage = e.spec.EngineImage, e.status.IP = im.status.IP, e.status.Port = int(instance.Status.PortStart)
-> (starts the engine monitor, which syncs backup/restore/snapshot/clone/expand related information; see the sketch after this block) ec.engineMonitorMap[e.name] = stopCh
-> e.Status.Endpoint = endpoint
-> e.Status.CurrentSize, e.Status.IsExpanding, e.Status.LastExpansionError, e.Status.LastExpansionFailedAt are set
-> e.Status.RebuildStatus = rebuildStatus (empty {})
-> e.Status.BackupStatus = backupStatusList (empty {})
-> e.Status.PurgeStatus = purgeStatus
-> e.Status.ReplicaModeMap = currentReplicaModeMap
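The "ec.engineMonitorMap[e.name] = stopCh" step mentioned above is the usual one-goroutine-per-engine monitor pattern: start a polling goroutine for the engine, remember its stop channel, close it when monitoring should end. A hedged sketch of that pattern (not the actual engine_controller code; the poll interval and callback are placeholders):

    package main

    import (
        "sync"
        "time"
    )

    type engineController struct {
        lock             sync.Mutex
        engineMonitorMap map[string]chan struct{}
    }

    func (ec *engineController) startMonitoring(engineName string, poll func()) {
        ec.lock.Lock()
        defer ec.lock.Unlock()
        if _, exists := ec.engineMonitorMap[engineName]; exists {
            return // already monitoring this engine
        }
        stopCh := make(chan struct{})
        ec.engineMonitorMap[engineName] = stopCh
        go func() {
            ticker := time.NewTicker(time.Second)
            defer ticker.Stop()
            for {
                select {
                case <-stopCh:
                    return
                case <-ticker.C:
                    // in Longhorn this queries the engine process and writes
                    // snapshot/backup/rebuild/expand info back into e.Status
                    poll()
                }
            }
        }()
    }

    func (ec *engineController) stopMonitoring(engineName string) {
        ec.lock.Lock()
        defer ec.lock.Unlock()
        if stopCh, exists := ec.engineMonitorMap[engineName]; exists {
            close(stopCh)
            delete(ec.engineMonitorMap, engineName)
        }
    }

    func main() {
        ec := &engineController{engineMonitorMap: map[string]chan struct{}{}}
        ec.startMonitoring("pvc-xxxx-e-0", func() {})
        time.Sleep(2 * time.Second)
        ec.stopMonitoring("pvc-xxxx-e-0")
    }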
volume_controller sync:
-> r.Spec.HealthyAt = vc.nowHandler(), r.Spec.RebuildRetryCount = 0
-> v.Status.Robustness = healthy
-> v.Status.State = attached
Summary: everything is healthy at this point, v.Status.State = attached, e.status.CurrentState = running, r.status.CurrentState = running, and the related nodeID fields all have values.
nodeStageVolume:
For block devices this does almost nothing.
nodePublishVolume:
Format /dev/longhorn/pvc-xxxx and mount it to /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevice/pvc-xxxx (sketched below).
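A sketch of that format-and-mount step using SafeFormatAndMount from k8s.io/mount-utils (whether Longhorn v1.2.2 uses exactly this package is not checked here; the device path, target path and ext4 choice are illustrative):

    package main

    import (
        "os"

        mount "k8s.io/mount-utils"
        utilexec "k8s.io/utils/exec"
    )

    func main() {
        devicePath := "/dev/longhorn/pvc-xxxx"
        targetPath := "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevice/pvc-xxxx"

        // Make sure the mount point exists before mounting.
        if err := os.MkdirAll(targetPath, 0o750); err != nil {
            panic(err)
        }

        // SafeFormatAndMount only runs mkfs when the device has no filesystem yet,
        // then mounts it; this is the essence of what NodePublishVolume does in
        // this flow (error handling trimmed for brevity).
        mounter := &mount.SafeFormatAndMount{
            Interface: mount.New(""),
            Exec:      utilexec.New(),
        }
        if err := mounter.FormatAndMount(devicePath, targetPath, "ext4", nil); err != nil {
            panic(err)
        }
    }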
Other
instance_manager sync polls once per second: it queries the processList inside the instance-manager pod and writes it into the status.
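A minimal sketch of that one-second polling loop; the processList call (a gRPC call into the instance-manager pod in the real code) is stubbed out here:

    package main

    import (
        "log"
        "time"
    )

    // listProcesses stands in for the ProcessList call made against the
    // instance-manager pod; the real client lives in longhorn-instance-manager.
    func listProcesses() map[string]string {
        return map[string]string{"pvc-xxxx-r-1": "running", "pvc-xxxx-e-0": "running"}
    }

    // pollInstanceManager reflects the described behaviour: query the process list
    // once per second and push the result into the InstanceManager CR status
    // (represented here by a plain callback).
    func pollInstanceManager(stopCh <-chan struct{}, updateStatus func(map[string]string)) {
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-stopCh:
                return
            case <-ticker.C:
                updateStatus(listProcesses())
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go pollInstanceManager(stop, func(procs map[string]string) {
            log.Printf("instance-manager processes: %v", procs)
        })
        time.Sleep(3 * time.Second)
        close(stop)
    }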