OpenShift Workstation with Single GPU passthrough#

Feb 27, 2023

25 min read

Introduction#

This article describes how to run OpenShift as a workstation with GPU PCI passthrough and Container Native Virtualization (CNV) to provide a virtualized desktop experience on a single OpenShift node. This makes it possible to run a virtual desktop with a single GPU, and is used here to run Microsoft Flight Simulator in a Windows VM with performance close to a bare-metal Windows installation.

Hardware description#

The workstation used for this demo has the following hardware:

  • AMD Ryzen 9 3950X 16-Core 32-Threads

  • 64GB DDR4 3200MHz

  • Nvidia RTX 3080 FE 10GB

  • 2x 2TB NVMe Disks (guests)

  • 1x 500GB SSD Disk (root system)

  • 10Gbase-CX4 Mellanox Ethernet

Backup of existing system partitions#

To avoid boot order conflicts, the OpenShift assisted installer will format the first 512 bytes of any disk that contains a bootable partition. Therefore, it is important to back up any existing partition table that you would like to preserve, then remove it.
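
For example, assuming the partition table to preserve is on /dev/nvme0n1 (adapt the device names to your system), a minimal sketch using sfdisk and wipefs:

sfdisk --dump /dev/nvme0n1 > nvme0n1-partition-table.backup
wipefs --all /dev/nvme0n1
# To restore the partition table later:
# sfdisk /dev/nvme0n1 < nvme0n1-partition-table.backup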

Installing OpenShift SNO#

Once any existing file system is backed up and no bootable partitions remain, we can proceed with the OpenShift Single Node install.

It is important to note that CoreOS, the underlying operating system, requires an entire disk for installation.

Here, we will keep the two NVMe disks for the persistent volumes, as LVM Physical Volumes belonging to the same Volume Group, and use the SSD disk for the OpenShift operating system.
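
If the Volume Group does not exist yet, it can be prepared in advance. This is a sketch assuming the guest disks are /dev/nvme0n1 and /dev/nvme1n1; the fedora_da2 Volume Group and fedora35 Logical Volume names match the paths used later in this article:

pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate fedora_da2 /dev/nvme0n1 /dev/nvme1n1
lvcreate -L 100G -n fedora35 fedora_da2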

#!/bin/bash

OCP_VERSION=latest-4.10

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-client-linux.tar.gz > oc.tar.gz
tar zxf oc.tar.gz
chmod +x oc && mv oc ~/.local/bin/

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-install-linux.tar.gz > openshift-install-linux.tar.gz
tar zxvf openshift-install-linux.tar.gz
chmod +x openshift-install && mv openshift-install ~/.local/bin/

curl $(openshift-install coreos print-stream-json | grep location | grep x86_64 | grep iso | cut -d\" -f4) > rhcos-live.x86_64.iso
install-config.yaml#
# This file contains the configuration for an OpenShift cluster installation.

apiVersion: v1

# The base domain for the cluster.
baseDomain: epheo.eu

# Configuration for the compute nodes.
compute:
- name: worker
  replicas: 0

# Configuration for the control plane nodes.
controlPlane:
  name: master
  replicas: 1

# Metadata for the cluster.
metadata:
  name: da2

# Networking configuration for the cluster.
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16

# Platform configuration for the cluster.
platform:
  none: {}

# Configuration for bootstrapping the cluster.
bootstrapInPlace:
  installationDisk: /dev/sda

# Pull secret for accessing the OpenShift registry.
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"XXXXXXXX"}}}'

# SSH key for accessing the cluster nodes.
sshKey: |
  ssh-rsa AAAAB3XXXXXXXXXXXXXXXXXXXXXXXXX
Generate OpenShift Container Platform assets#
mkdir ocp && cp install-config.yaml ocp
openshift-install --dir=ocp create single-node-ignition-config
Embed the ignition data into the RHCOS ISO#
alias coreos-installer='podman run --privileged --rm \
      -v /dev:/dev -v /run/udev:/run/udev -v $PWD:/data \
      -w /data quay.io/coreos/coreos-installer:release'
cp ocp/bootstrap-in-place-for-live-iso.ign iso.ign
coreos-installer iso ignition embed -fi iso.ign rhcos-live.x86_64.iso
dd if=rhcos-live.x86_64.iso of=/dev/usbkey status=progress

Once the ISO is copied to the USB drive, you can use the USB drive to boot your workstation node and install OpenShift Container Platform.
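
You can then follow the installation progress from your desktop using the openshift-install binary downloaded earlier:

openshift-install --dir=ocp wait-for install-complete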

Install CNV Operator#

Activate Intel VT or AMD-V hardware virtualization extensions in BIOS or UEFI.

cnv-resources.yaml#
# This YAML file contains Kubernetes resources for installing the KubeVirt Hyperconverged Operator (HCO) on the OpenShift Container Platform.
# It creates a namespace named "openshift-cnv", an operator group named "kubevirt-hyperconverged-group" in the "openshift-cnv" namespace, and a subscription named "hco-operatorhub" in the "openshift-cnv" namespace.
# The subscription specifies the source, source namespace, name, starting CSV, and channel for the KubeVirt Hyperconverged Operator.

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.10.0
  channel: "stable"
oc apply -f cnv-resources.yaml
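
The operator installation can be monitored until the ClusterServiceVersion reaches the Succeeded phase:

oc get csv -n openshift-cnv -w
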
Installing the virtctl client on your desktop#
subscription-manager repos --enable cnv-4.10-for-rhel-8-x86_64-rpms
dnf install kubevirt-virtctl
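
Once installed, virtctl can manage the virtual machines defined later in this article, for example:

virtctl start fedora
virtctl console fedora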

Remove Local Storage operator (if installed)#

As we do not need to manage LVM volumes automatically, we want to avoid automatically formatting Logical Volumes once they are deleted from OpenShift.

While this could lead to data leaks in a multi-tenant environment, removing the Local Storage Operator also avoids losing your Virtual Machine partitions once you delete the VM.
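
A minimal sketch of the removal, assuming the operator was installed in the default openshift-local-storage namespace with its default subscription name:

oc delete subscription local-storage-operator -n openshift-local-storage
oc get csv -n openshift-local-storage    # note the CSV name
oc delete csv <local-storage-csv-name> -n openshift-local-storage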

Configure OpenShift for single GPU passthrough#

As our GPU is the only one attached to the node, a few additional steps are required.

We will use MachineConfig to configure our node accordingly.

All MachineConfigs are applied to the master role because this is a single-node OpenShift cluster. On a multi-node cluster, they would be applied to the workers instead.

Passing kernel arguments at boot time#

Several kernel arguments have to be passed at boot time in order to configure our node for GPU passthrough. This can be done using the Machine Config Operator.

  • amd_iommu=on: Enables IOMMU (Input/Output Memory Management Unit) support for AMD platforms, allowing for direct memory access (DMA) by PCI devices without going through the CPU. This improves performance and reduces overhead.

  • vga=off: Disables VGA (Video Graphics Array) console output during boot time.

  • rdblacklist=nouveau: Blacklists the Nouveau open-source NVIDIA driver.

  • video=efifb:off: Disables the EFI (Extensible Firmware Interface) framebuffer console output during boot time.

Setting kernel arguments at boot time#
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rdblacklist=nouveau
    - 'video=efifb:off'
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml

Note

If you’re using an Intel CPU, you’ll have to set intel_iommu=on instead.

Installing and configuring the Nvidia GPU Operator#

Install the GPU Operator using OLM / OpenShift Marketplace.

When deploying the operator’s ClusterPolicy we have to set sandboxWorkloads.enabled to true to enable the sandbox-device-plugin and vfio-manager.

sandboxWorkloadsEnabled.yaml#
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  sandboxWorkloads:
    defaultWorkload: container
    enabled: true
oc patch ClusterPolicy gpu-cluster-policy --type=merge --patch-file sandboxWorkloadsEnabled.yaml

As the Nvidia GPU Operator does not support consumer-grade GPUs, it does not take the audio device into consideration and therefore does not bind it to the vfio-pci driver. This has to be done manually, but can be automated at boot time using the following machine config.

vfio-prepare.bu#
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /usr/local/bin/vfio-prepare
    mode: 0755
    overwrite: true
    contents:
      local: ./vfio-prepare.sh
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
       [Unit]
       Description=Prepare vfio devices
       After=ignition-firstboot-complete.service
       Before=kubelet.service crio.service

       [Service]
       Type=oneshot
       ExecStart=/usr/local/bin/vfio-prepare

       [Install]
       WantedBy=kubelet.service
vfio-prepare.sh#
#!/bin/bash

vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
  echo $name > /sys/bus/pci/drivers/vfio-pci/new_id || true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
name="10de 1aef"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml

Changing the driver bound to the GPU#

  • This workstation only has a single GPU.

  • I’d like to use it for both Virtual Machines and AI/ML workloads.

  • Containers require the GPU device to be bound to the Nvidia driver.

  • Virtual machines require the GPU device to be bound to the VFIO-PCI driver.

  • I’d like an efficient way to bind / unbind the GPU to a driver without rebooting.

We can label the node to configure it with the GPU bound to the Nvidia kernel driver, satisfying container workloads.

oc label node da2 --overwrite nvidia.com/gpu.workload.config=container

Or bind the GPU to the vfio-pci driver to satisfy Virtual Machine workloads with PCI passthrough.

oc label node da2 --overwrite nvidia.com/gpu.workload.config=vm-passthrough

The whole operation takes a few minutes.
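
You can verify which driver currently claims the GPU from a debug shell on the node (0a:00.0 is the GPU address used throughout this article):

oc debug node/da2 -- chroot /host lspci -nnks 0a:00.0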

Add GPU as Hardware Device of your node#

We identify the Vendor and Product ID of the GPU:

lspci -nnk |grep VGA

We identify the device name provided by gpu-feature-discovery:

oc get nodes da2 -ojson |jq .status.capacity |grep nvidia
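
The relevant capacity entry should look similar to the following; the device name is derived from the GPU model by gpu-feature-discovery:

"nvidia.com/GA102_GEFORCE_RTX_3080": "1"
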
hyperconverged.yaml#
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
    - externalResourceProvider: true
      pciDeviceSelector: 10DE:2206
      resourceName: nvidia.com/GA102_GEFORCE_RTX_3080
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml

The pciDeviceSelector field specifies the vendor ID and device ID of the PCI device, while the resourceName field specifies the name of the resource that will be created in Kubernetes/OpenShift.

Passthrough the USB Host Controllers to the VM#

In order to connect a mouse, keyboard, audio device, etc. directly to the VM, we pass one of the USB controllers through to the VM.

Identify a USB Controller and its IOMMU group#

https://docs.openshift.com/container-platform/4.8/virt/virtual_machines/advanced_vm_management/virt-configuring-pci-passthrough.html

We first need to identify it using pciutils.

lspci -nnk

After selecting the USB Controller we want to dedicate to the Virtual Machine, we should verify that it is the only PCI device in its IOMMU group. We first look for the PCI address in the iommu_groups folder structure, then list the PCI addresses belonging to this IOMMU group.

find /sys/kernel/iommu_groups/ -iname "*0b:00.3*"
ls /sys/kernel/iommu_groups/27/devices/
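
To review all IOMMU groups at once, a small loop over sysfs can help:

for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    echo -n "  "; lspci -nns "${d##*/}"
  done
done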

Add the USB Controller as Hardware Device of your node#

Once identified, we add its Vendor and Product IDs to the list of permitted Host Devices.

Currently, KubeVirt does not allow selecting a specific PCI address, so the pciDeviceSelector will match all similar USB Host Controllers on the node. However, as we will only bind the one we are interested in to the VFIO-PCI driver, the other ones will not be available for PCI passthrough.

hyperconverged.yaml#
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - pciDeviceSelector: 1022:149C
        resourceName: devices.kubevirt.io/USB3_Controller
      - pciDeviceSelector: 8086:2723
        resourceName: intel.com/WIFI_Controller
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml
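
Once the node has reconciled, the new resources should appear among the node’s allocatable devices:

oc get node da2 -o json | jq '.status.allocatable'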

Binding the USB Controller to VFIO-PCI driver at boot time#

vfio-prepare.bu#
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /usr/local/bin/vfio-prepare
    mode: 0755
    overwrite: true
    contents:
      local: ./vfio-prepare.sh
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci
  - path: /etc/modprobe.d/vfio.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        options vfio-pci ids=8086:2723,1022:149c
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
       [Unit]
       Description=Prepare vfio devices
       After=ignition-firstboot-complete.service
       Before=kubelet.service crio.service

       [Service]
       Type=oneshot
       ExecStart=/usr/local/bin/vfio-prepare

       [Install]
       WantedBy=kubelet.service
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rdblacklist=nouveau
    - 'video=efifb:off'

Create a bash script to unbind specific PCI devices and bind them to the VFIO-PCI driver.

vfio-prepare.sh#
#!/bin/bash

vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
  echo $name > /sys/bus/pci/drivers/vfio-pci/new_id || true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
name="10de 1aef"
vfio_attach

# Bind "useless" device to vfio-pci to satisfy IOMMU group
# 07:00.0 Non-Essential Instrumentation [1022:1485]
address=0000:07:00.0
path=/sys/bus/pci/devices/0000\:07\:00.0
name="1022 1485"
vfio_attach

# Unbind USB controller and handle via vfio-pci kernel driver
address=0000:07:00.1
path=/sys/bus/pci/devices/0000\:07\:00.1
name="1022 149c"
vfio_attach

# Unbind USB controller and handle via vfio-pci kernel driver
address=0000:07:00.3
path=/sys/bus/pci/devices/0000\:07\:00.3
name="1022 149c"
vfio_attach

# Unbind USB controller and handle via vfio-pci kernel driver
address=0000:0c:00.3
path=/sys/bus/pci/devices/0000\:0c\:00.3
name="1022 148c"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml

Creating a Virtual Machine#

The virtual machine will use existing LVM Logical Volumes; here we assume the operating system is already installed on the Logical Volume with UEFI boot.

Create PV and PV Claim out of local LVM disks#

Binding PV and PVC by label: https://docs.openshift.com/container-platform/3.3/install_config/storage_examples/binding_pv_by_label.html

fedora35.yaml#
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: fedora35
  labels:
    vol: fedora35
spec:
  capacity:
    storage: 100Gi
  local:
    path: /dev/fedora_da2/fedora35
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  volumeMode: Block
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - da2
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fedora35
  labels:
    vol: fedora35
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block
  selector:
    matchLabels:
      vol: fedora35
  resources:
    requests:
      storage: 100Gi
  storageClassName: local-storage

Defining the Virtual Machine#

The virtual machines we will use as desktops come with a few specificities: the emulated graphics device is disabled (autoattachGraphicsDevice: false), the GPU and USB controllers are passed through as gpus and hostDevices, and they boot with UEFI firmware.

fedora.yaml#
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora
  namespace: epheo
spec:
  runStrategy: Halted
  template:
    metadata:
      labels:
        kubevirt.io/domain: fedora
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 8
          model: host-passthrough
          sockets: 2
          threads: 1
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: false # For Nvidia Driver...
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: pvdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          autoattachGraphicsDevice: false
          gpus:
          - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
            name: gpuvideo
          hostDevices:
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller2
          - deviceName: intel.com/WIFI_Controller
            name: wificontroller
          interfaces:
          - masquerade: {}
            name: default
          - bridge: {}
            model: virtio
            name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
        machine:
          type: q35
        resources:
          requests:
            memory: 16G
      hostname: fedora
      networks:
      - name: default
        pod: {}
      - multus:
          networkName: br1
        name: nic-0
      terminationGracePeriodSeconds: 0
      volumes:
        - persistentVolumeClaim:
            claimName: 'fedora35'
          name: pvdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
          name: cloudinitdisk
windows.yaml#
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    vm.kubevirt.io/os: windows10
    vm.kubevirt.io/workload: desktop
  name: windows
spec:
  runStrategy: Manual
  template:
    metadata:
      labels:
        kubevirt.io/domain: windows
    spec:
      architecture: amd64
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
          utc: {}
        cpu:
          cores: 8
          dedicatedCpuPlacement: true
          sockets: 2
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          disks:
          - cdrom:
              bus: sata
            name: windows-guest-tools
          - bootOrder: 1
            disk:
              bus: virtio
            name: pvdisk
          - disk:
              bus: virtio
            name: pvdisk1
          gpus:
          - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
            name: gpuvideo
          hostDevices:
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller2
          - deviceName: intel.com/WIFI_Controller
            name: wificontroller
          interfaces:
          - bridge: {}
            model: virtio
            name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
          tpm: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            frequencies: {}
            ipi: {}
            reenlightenment: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer:
              direct: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
          smm: {}
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 32Gi
      evictionStrategy: None
      hostname: windows
      networks:
      - multus:
          networkName: br1
        name: nic-0
      terminationGracePeriodSeconds: 3600
      volumes:
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:0c536c7aba76eb9c1e75a8f2dc2bbfa017e90314d55b242599ea41f42ba4434f
        name: windows-guest-tools
      - name: pvdisk
        persistentVolumeClaim:
          claimName: windows
      - name: pvdisk1
        persistentVolumeClaim:
          claimName: windowsdata
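
As the Windows VM uses runStrategy: Manual, apply the definition and start it explicitly:

oc apply -f windows.yaml
virtctl start windows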

No longer used, for reference only#

Binding GPU to VFIO Driver at boot time#

We first gather the PCI Vendor and product IDs from pciutils.

lspci -nn |grep VGA
100-sno-vfiopci.bu#
variant: openshift
version: 4.10.0
metadata:
  name: 100-sno-vfiopci
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /etc/modprobe.d/vfio.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        options vfio-pci ids=10de:2206,10de:1aef
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci
dnf install butane
butane 100-sno-vfiopci.bu -o 100-sno-vfiopci.yaml
oc apply -f 100-sno-vfiopci.yaml
98-sno-xhci-unbind.yaml#
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-xhci-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - contents: |
         [Unit]
         Description=Unbind USB Host Controller Driver
         After=ignition-firstboot-complete.service
         Before=kubelet.service crio.service

         [Service]
         Type=oneshot
         ExecStart=/bin/bash -c "/bin/echo 0000:0b:00.3 > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver/unbind"
         ExecStart=/bin/bash -c "/bin/echo vfio-pci > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver_override"
         ExecStart=/bin/bash -c "/bin/echo 1043 87c0 > /sys/bus/pci/drivers/vfio-pci/new_id"

         [Install]
         WantedBy=kubelet.service
        enabled: true
        name: unbindusbcontroller.service

Unbinding VTConsole at boot time#

98-sno-vtconsole-unbind.yaml#
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-vtconsole-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - contents: |
         [Unit]
         Description=Detach GPU VT Console
         After=ignition-firstboot-complete.service
         Before=kubelet.service crio.service

         [Service]
         Type=oneshot
         ExecStart=/bin/bash -c "/bin/echo 0 > /sys/class/vtconsole/vtcon0/bind"

         [Install]
         WantedBy=kubelet.service
        enabled: true
        name: detachvtconsole.service

What’s next#

This chapter is kept as a reference for possible future improvements.

  • Reducing the Control Plane footprint by relying on MicroShift instead.

  • Using the GPU from containers instead of virtual machines for a Linux Desktop.

Replace node prep by qemu hooks#

Enabling dedicated resources for virtual machines#

Using MicroShift and RHEL for Edge#

Troubleshooting#

If any device in a passthrough IOMMU group is still bound to its original host driver, the VM fails to start with a “group is not viable” error similar to the following:
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"windows","namespace":"epheo","pos":"manager.go:1027","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2024-02-08T13:10:06.726594Z qemu-kvm: -device {\"driver\":\"vfio-pci\",\"host\":\"0000:07:00.1\",\"id\":\"ua-hostdevice-usbcontroller\",\"bus\":\"pci.9\",\"addr\":\"0x0\"}: vfio 0000:07:00.1: group 19 is not viable\nPlease ensure all devices within the iommu_group are bound to their vfio bus driver.')","timestamp":"2024-02-08T13:10:07.353704Z","uid":"cc6fa39c-db31-4f2e-bba1-42dfc4b6efad"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"windows","namespace":"epheo","pos":"server.go:202","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2024-02-08T13:10:06.726594Z qemu-kvm: -device {\"driver\":\"vfio-pci\",\"host\":\"0000:07:00.1\",\"id\":\"ua-hostdevice-usbcontroller\",\"bus\":\"pci.9\",\"addr\":\"0x0\"}: vfio 0000:07:00.1: group 19 is not viable\nPlease ensure all devices within the iommu_group are bound to their vfio bus driver.')","timestamp":"2024-02-08T13:10:07.353770Z","uid":"cc6fa39c-db31-4f2e-bba1-42dfc4b6efad"}

[core@da2 ~]$ ls /sys/kernel/iommu_groups/19/devices/
0000:03:08.0  0000:07:00.0  0000:07:00.1  0000:07:00.3

[core@da2 ~]$ lspci -nnks 07:00.0
07:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
    Subsystem: ASUSTeK Computer Inc. Device [1043:87c0]

Here, IOMMU group 19 contains 0000:07:00.0, 0000:07:00.1 and 0000:07:00.3 in addition to the 0000:03:08.0 PCI bridge, which does not need rebinding. Binding all three endpoint devices to vfio-pci, as done in vfio-prepare.sh above, makes the group viable again.