Windows failover clustering in Red Hat OpenShift Virtualization using SCSI-3 persistent reservation

Set up clustered storage for your Windows virtual machines (VMs) running in a Red Hat OpenShift cluster with the OpenShift Virtualization Operator.

Before performing any of the tasks in Windows, you must prepare the OpenShift cluster for Windows Failover Clustering. Follow the steps below to implement the required configuration.

To get the full benefit from this lesson, you need to:

  • Have a functioning Red Hat OpenShift Container Platform (RHOCP) cluster with back-end iSCSI LUNs presented to the worker nodes.

In this lesson, you will:

  • Enable the persistentReservation feature gate.
  • Create the NetworkAttachmentDefinition (NAD).
  • Create the iSCSI PVCs.

Environment

Here are some details about the OpenShift environment setup for this Windows Failover Clustering learning path:

  • The environment consists of a bare-metal OpenShift cluster running Red Hat OpenShift Container Platform (RHOCP) version 4.14.1 with three control plane nodes and three worker nodes.
  • Red Hat OpenShift Data Foundation provides storage using the worker nodes' local disks.
  • iSCSI LUNs are presented to the cluster's worker nodes and used as shared storage by the VMs running the Windows Server Failover Cluster (WSFC). WSFC requires SCSI-3 Persistent Reservations, so the shared LUNs must support them.
  • The VMs for our example each run Windows Server 2019. Further details regarding the VMs are as follows:
    • One VM is used as both an Active Directory Domain Controller (AD) and a NAT gateway to route between networks. The gateway is needed because the VMs running WSFC services are not connected to the default pod network and therefore cannot download updates without it.
    • Two VMs form the WSFC, which functions as a file server and is not connected to the default pod network. These VMs share two PVCs that are backed by iSCSI LUNs, which are in turn backed by a NetApp appliance. They act as a clustered file share for the VM running the front-end application server.
    • All the VMs connect to a NetworkAttachmentDefinition (NAD) that does not route outside the cluster. The AD VM is also connected to the default pod network.

Enable the persistentReservation feature gate

SCSI-3 persistent reservation pass-through requires version 4.14.1 of the OpenShift Virtualization Operator. It also requires the persistentReservation feature gate to be enabled in the HyperConverged deployment. We can do this by running the oc edit command and setting persistentReservation to true:

oc edit -n openshift-cnv hco
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  featureGates:
    persistentReservation: true
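
If you prefer not to edit the resource interactively, the feature gate can also be set and verified with oc patch and oc get. The following commands are a minimal sketch, using the HyperConverged resource named kubevirt-hyperconverged shown above:

# Enable the persistentReservation feature gate with a merge patch
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
  --type merge -p '{"spec":{"featureGates":{"persistentReservation":true}}}'

# Confirm the feature gate is now set to true
oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv \
  -o jsonpath='{.spec.featureGates.persistentReservation}{"\n"}'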

Create a NetworkAttachmentDefinition (NAD)

Let’s start by creating a NetworkAttachmentDefinition (NAD) that defines a layer 2 network overlay the WSFC VMs use to communicate with other VMs. A separate network is needed because all VMs on the default pod network get the same IP address, 10.0.2.2. This causes WSFC validation to fail, since validation checks whether any interfaces on the servers in the WSFC share the same IP address. Setting a different address statically does not fix the issue, because traffic from an address not provided by Dynamic Host Configuration Protocol (DHCP) is blocked by default.

To avoid this, the VMs are not attached to the default pod network; instead, we connect them to an internal, cluster-wide logical switch. This allows the VMs to communicate with each other within the Red Hat OpenShift Container Platform (RHOCP) cluster, but not outside it. Since the WSFC nodes serve as a backend for another VM, they do not need to accept incoming connections from outside the cluster.

The following YAML creates a NAD called l2-cluster-net when it is applied to the configured RHOCP cluster:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-cluster-net
  namespace: winding
spec:
  config: |
    {
            "cniVersion": "0.3.1",
            "name": "l2-cluster-net",
            "type": "ovn-k8s-cni-overlay",
            "topology":"layer2",
            "netAttachDefName": "winding/l2-cluster-net"
    }

It is important to note the following:

  • The .metadata.name, .spec.config.{name}, and the name portion of .spec.config.{netAttachDefName} should be the same.
  • Make sure the correct namespace is specified, since NADs are namespaced objects.
  • The namespace is specified in two fields: .metadata.namespace and the namespace prefix of .spec.config.{netAttachDefName}.

This NAD configuration does not specify a subnet, nor does it provide DHCP to any VM connected to it. This learning path will use an arbitrary, non-routable subnet of 192.168.33.0/24 for connections to this NAD. We are only using a few VMs and the WSFC VMs use statically set addresses to pass validation, so no DHCP service will be defined. If DHCP is desired, a VM that provides a DHCP service can be connected to this NAD.
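
Assuming the YAML above is saved to a file named l2-cluster-net.yaml (the file name is only an example), apply it and confirm the NAD exists in the winding namespace:

# Create the NAD
oc apply -f l2-cluster-net.yaml

# Verify the NAD was created in the winding namespace
oc get network-attachment-definitions -n winding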

Create iSCSI PVCs

Two iSCSI LUNs need to be presented to each of the worker nodes in the OpenShift cluster. These LUNs will be used by the WSFC for shared storage.

It is beyond the scope of this learning path to configure a device or server to act as an iSCSI Target that is SCSI-3 compliant. Please view the Red Hat Enterprise Linux (RHEL) documentation for information on how to configure a RHEL server to provide iSCSI targets or view the documentation of your current storage device for how to configure it to provide the iSCSI targets.

The current initiator name of each worker node can be viewed by connecting to the node and reading the /etc/iscsi/initiatorname.iscsi file. The initiator name can be changed to something more specific if desired, but it should conform to IETF RFC 3721. See the OpenShift documentation for information on creating a ConfigMap to change the initiator name, if desired.
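
For example, the initiator name can be read without opening an SSH session by running a debug pod on the node; the node name worker-0 below is a placeholder for one of your worker nodes:

# Read the iSCSI initiator name on a worker node (node name is an example)
oc debug node/worker-0 -- chroot /host cat /etc/iscsi/initiatorname.iscsi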

The following iscsiadm command can be issued on the worker nodes to view the paths to the iSCSI storage. The command requires the IP address of the iSCSI Target to query:

iscsiadm --mode discovery --op update --type sendtargets --portal 172.31.131.45
172.31.131.45:3260,1042 iqn.1992-08.com.netapp:sn.1bef662c782411eebba7d039ea98c4c8:vs.26
172.31.130.45:3260,1043 iqn.1992-08.com.netapp:sn.1bef662c782411eebba7d039ea98c4c8:vs.26

The above output shows two paths to the iSCSI target and its LUNs. Two paths are presented for redundancy. Make note of this information, as it is used when creating the PVs.
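
Optionally, you can verify from a worker node that the LUNs support SCSI-3 persistent reservations. The following check is a sketch that assumes the sg3_utils package is available on the node (or in a debug container) and that /dev/sdX is replaced with the device corresponding to one of the iSCSI LUNs:

# Report the persistent reservation capabilities of the device (device path is a placeholder)
sg_persist --in --report-capabilities /dev/sdX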

The following YAML file creates two PVs that are backed by iSCSI LUNs. For this example, we saved the file as pv-iscsi-share.yaml. The first PV uses a 50Gi LUN and will be used to store data. The second PV uses a 30Gi LUN and will be used for the WSFC quorum disk:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data-share
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Gi
  iscsi:
    iqn: iqn.1992-08.com.netapp:sn.1bef662c782411eebba7d039ea98c4c8:vs.26
    iscsiInterface: default
    lun: 0
    targetPortal: 172.31.130.45:3260
    portals: ['172.31.131.45:3260']
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Block
  storageClassName: local-scsi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-cluster-share
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 30Gi
  iscsi:
    iqn: iqn.1992-08.com.netapp:sn.1bef662c782411eebba7d039ea98c4c8:vs.26
    iscsiInterface: default
    lun: 1
    targetPortal: 172.31.130.45:3260
    portals: ['172.31.131.45:3260']
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Block
  storageClassName: local-scsi

There are a few items to note in the above YAML file:

  • .metadata.name  
    • This is the name to give to the PV when it is created.
  • .spec.accessModes  
    • This should be ReadWriteMany so multiple VMs can access the PV.
  • .spec.capacity
    • This is the size of the PV being created.
  • .spec.iscsi.iqn
    • This is the IQN of the iSCSI Target. It is provided by the iscsiadm command.
  • .spec.iscsi.lun  
    • This is the iSCSI Target’s LUN number. The iSCSI Target device can provide this information.
  • .spec.iscsi.targetPortal  
    • This is the IP address and port of the iSCSI Target’s portal provided by the iscsiadm command.
  • .spec.iscsi.portals  
    • This is not needed unless multipathing is desired. This is a list of other iSCSI Target portals.
  • .spec.storageClassName
    • The specified storageClass does not exist on the cluster. It is set explicitly so that the PV is not matched to an existing storageClass.

Apply the YAML to the cluster to create the PV:

oc create -f pv-iscsi-share.yaml 
persistentvolume/pv-data-share created
persistentvolume/pv-cluster-share created
oc get pv
NAME                CAPACITY        ACCESS MODES               RECLAIM POLICY       STATUS      CLAIM    STORAGECLASS 
pv-cluster-share    30Gi            RWX                        Retain               Available            local-scsi        
pv-data-share       50Gi            RWX                        Retain               Available            local-scsi         

[... Output Truncated ...]

We can see the PVs are ready to use as indicated by the STATUS column. Now we can create two PVCs that use the new PVs with the following YAML: 

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data-share
  namespace: winding
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  volumeMode: Block
  volumeName: pv-data-share
  storageClassName: local-scsi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-cluster-share
  namespace: winding
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  volumeMode: Block
  volumeName: pv-cluster-share
  storageClassName: local-scsi

The .spec.volumeName field specifies the PV that the PVC should bind to. It is not required, but it helps ensure the PVC binds to the correct PV. Also note that .spec.storageClassName matches the value used in the PV definitions; again, this storageClass does not exist on the OpenShift cluster. For this example, we saved the file as pvc-iscsi-share.yaml. Apply it to create the PVCs:

oc create -f pvc-iscsi-share.yaml 
persistentvolumeclaim/pvc-data-share created
persistentvolumeclaim/pvc-cluster-share created
oc get pvc
NAME                    STATUS     VOLUME              CAPACITY         ACCESS MODES              STORAGECLASS                AGE
pvc-cluster-share       Bound      pv-cluster-share    30Gi             RWX                       local-scsi                  3s
pvc-data-share          Bound      pv-data-share       50Gi             RWX                       local-scsi                  3s

We can see the PVCs are created and Bound to the PVs. We can now use them with our VMs, so let's configure the virtual machines.
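
As a preview of the next resource, the VMs will consume these PVCs as LUN-type disks with persistent reservation enabled. The fragment below is a minimal sketch of the relevant portion of a VirtualMachine spec; the disk and volume names are examples, and the complete VM configuration is covered next:

spec:
  template:
    spec:
      domain:
        devices:
          disks:
          # Attach the shared PVC as a LUN with persistent reservation enabled
          - name: data-share
            lun:
              reservation: true
      volumes:
      - name: data-share
        persistentVolumeClaim:
          claimName: pvc-data-share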
