Setting up a virtual workstation in OpenShift with VFIO passthrough

Feb 27, 2023

25 min read

This article provides a detailed guide on how to configure OpenShift as a workstation with GPU PCI passthrough and Container Native Virtualization (CNV) on a Single Node OpenShift (SNO) deployment.

This setup allows you to leverage Kubernetes orchestration capabilities while still enjoying near-native performance for GPU-intensive applications.

Why this approach?

  • Run both containerized workloads and virtual machines on the same hardware

  • Use a single GPU for both Kubernetes pods and virtual machines by switching the driver binding

  • Achieve near-native performance for gaming and professional applications in VMs

  • Maintain the flexibility and power of Kubernetes/OpenShift for other workloads

In testing, this configuration successfully ran Microsoft Flight Simulator in a Windows VM with performance similar to a bare metal Windows installation.

The workstation used for this demo has the following hardware:

Component   Specification
CPU         AMD Ryzen 9 3950X 16-Core 32-Threads
Memory      64GB DDR4 3200MHz
GPU         Nvidia RTX 3080 FE 10GB
Storage     2x 2TB NVMe Disks (for virtual machine storage)
            1x 500GB SSD Disk (for OpenShift root system)
Network     10Gbase-CX4 Mellanox Ethernet

Similar configurations with equivalent Intel CPUs should work with minor adjustments noted throughout the guide.

Installing OpenShift SNO

Before proceeding with the installation, ensure you’ve completed the backup steps for any existing partitions.

Backup of existing system partitions

To avoid boot order conflicts, the OpenShift assisted installer overwrites the first 512 bytes of any disk that contains a bootable partition. It is therefore important to back up, and then remove, any existing partition table that you would like to preserve.
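A minimal sketch of that backup, assuming a GPT-partitioned data disk at /dev/nvme0n1 (adjust device names to your system): sgdisk can save the partition table to a file and restore it once the installation is done.

# Save the partition table of a data disk before installing
sgdisk --backup=nvme0n1-gpt.backup /dev/nvme0n1

# Remove the partition table so the installer no longer sees a bootable partition
sgdisk --zap-all /dev/nvme0n1

# Restore it later, once the installation is complete:
# sgdisk --load-backup=nvme0n1-gpt.backup /dev/nvme0n1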

OpenShift Installation

Once any existing file system is backed up and there are no more bootable partitions, we can proceed with the OpenShift Single Node installation.

It is important to note that CoreOS, the underlying operating system, requires an entire disk for installation. For this workstation setup:

  1. We’ll use the 500GB SSD disk for the OpenShift operating system

  2. The two 2TB NVMe disks will be reserved for persistent volumes as LVM Physical volumes belonging to the same Volume Group

  3. This configuration allows for flexible VM storage management while keeping the system installation separate

#!/bin/bash

OCP_VERSION=latest-4.10

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-client-linux.tar.gz > oc.tar.gz
tar zxf oc.tar.gz
chmod +x oc && mv oc ~/.local/bin/

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-install-linux.tar.gz > openshift-install-linux.tar.gz
tar zxvf openshift-install-linux.tar.gz
chmod +x openshift-install && mv openshift-install ~/.local/bin/

curl $(openshift-install coreos print-stream-json | grep location | grep x86_64 | grep iso | cut -d\" -f4) > rhcos-live.x86_64.iso
install-config.yaml
# This file contains the configuration for an OpenShift cluster installation.

apiVersion: v1

# The base domain for the cluster.
baseDomain: epheo.eu

# Configuration for the compute nodes.
compute:
- name: worker
  replicas: 0

# Configuration for the control plane nodes.
controlPlane:
  name: master
  replicas: 1

# Metadata for the cluster.
metadata:
  name: da2

# Networking configuration for the cluster.
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16

# Platform configuration for the cluster.
platform:
  none: {}

# Configuration for bootstrapping the cluster.
bootstrapInPlace:
  installationDisk: /dev/sda

# Pull secret for accessing the OpenShift registry.
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"XXXXXXXX"}}}'

# SSH key for accessing the cluster nodes.
sshKey: |
  ssh-rsa AAAAB3XXXXXXXXXXXXXXXXXXXXXXXXX
Generate OpenShift Container Platform assets
mkdir ocp && cp install-config.yaml ocp
openshift-install --dir=ocp create single-node-ignition-config
Embed the ignition data into the RHCOS ISO:
alias coreos-installer='podman run --privileged --rm \
      -v /dev:/dev -v /run/udev:/run/udev -v $PWD:/data \
      -w /data quay.io/coreos/coreos-installer:release'
cp ocp/bootstrap-in-place-for-live-iso.ign iso.ign
coreos-installer iso ignition embed -fi iso.ign rhcos-live.x86_64.iso
dd if=rhcos-live.x86_64.iso of=/dev/usbkey status=progress

Once the ISO is copied to the USB drive, you can use the USB drive to boot your workstation node and install OpenShift Container Platform.
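Once the node boots from the USB drive, the installation runs unattended. A simple way to follow its progress from your desktop, assuming the assets were generated in the ocp directory as above:

openshift-install --dir=ocp wait-for install-complete

# When the installation completes, use the generated kubeconfig to reach the cluster
export KUBECONFIG=ocp/auth/kubeconfig
oc get nodes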

Install CNV Operator

Activate Intel VT or AMD-V hardware virtualization extensions in BIOS or UEFI.

cnv-resources.yaml
# This YAML file contains Kubernetes resources for installing the KubeVirt Hyperconverged Operator (HCO) on the OpenShift Container Platform.
# It creates a namespace named "openshift-cnv", an operator group named "kubevirt-hyperconverged-group" in the "openshift-cnv" namespace, and a subscription named "hco-operatorhub" in the "openshift-cnv" namespace.
# The subscription specifies the source, source namespace, name, starting CSV, and channel for the KubeVirt Hyperconverged Operator.

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.10.0
  channel: "stable"
oc apply -f cnv-resources.yaml
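To confirm the operator deployment succeeded before moving on (a quick check, not part of the original manifests), wait for the CSV to reach the Succeeded phase:

oc get csv -n openshift-cnv
# kubevirt-hyperconverged-operator.v4.10.0 should eventually report PHASE: Succeeded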
Installing the virtctl client on your desktop:
subscription-manager repos --enable cnv-4.10-for-rhel-8-x86_64-rpms
dnf install kubevirt-virtctl
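A quick sanity check that the client is installed and can talk to the cluster (assuming you are already logged in with oc):

virtctl version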

Configure OpenShift for single GPU passthrough

As our GPU is the only one attached to the node, a few additional steps are required.

We will use MachineConfig to configure our node accordingly.

All MachineConfigs are applied with the master role because this is a Single Node OpenShift deployment. On a multi-node cluster, they would be applied to the workers instead.

Passing kernel arguments at boot time

Multiple kernel arguments have to be passed at boot time in order to configure our node for GPU passthrough. This can be done using the Machine Config Operator.

  • amd_iommu=on: Enables the IOMMU (Input/Output Memory Management Unit) on AMD platforms, providing the DMA remapping and device isolation required to assign PCI devices directly to virtual machines.

  • vga=off: Disables VGA (Video Graphics Array) console output during boot time.

  • rdblacklist=nouveau: Blacklists the Nouveau open-source NVIDIA driver so it does not claim the GPU.

  • video=efifb:off: Disables the EFI (Extensible Firmware Interface) framebuffer console output during boot time.

Setting Kernel Arguments at boot time.
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rdblacklist=nouveau
    - 'video=efifb:off'
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml

Note

If you’re using an Intel CPU you’ll have to set intel_iommu=on instead.
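After the node reboots with these kernel arguments, you can confirm the IOMMU is active by checking that IOMMU groups were populated (an illustrative check using oc debug; da2 is this article's node name):

oc debug node/da2 -- chroot /host bash -c 'ls /sys/kernel/iommu_groups | wc -l'
# A non-zero count means the IOMMU is enabled and grouping devices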

Installing and configuring the NVIDIA GPU Operator

The NVIDIA GPU Operator automates the management of NVIDIA GPUs in Kubernetes environments.

Step 1: Install the GPU Operator

  1. Navigate to the OpenShift web console

  2. Go to OperatorsOperatorHub

  3. Search for “NVIDIA GPU Operator”

  4. Select the operator and click Install

  5. Keep the default installation settings and click Install again

Alternatively, you can install it through the CLI using the following commands:

oc create -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/git/operator-namespace.yaml
oc create -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/git/operator-source.yaml

Step 2: Configure the ClusterPolicy

When deploying the operator’s ClusterPolicy, we need to set sandboxWorkloads.enabled to true to enable the sandbox-device-plugin and vfio-manager components, which are essential for GPU passthrough.

sandboxWorkloadsEnabled.yaml
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  sandboxWorkloads:
    defaultWorkload: container
    enabled: true
oc patch ClusterPolicy gpu-cluster-policy --type=merge --patch-file sandboxWorkloadsEnabled.yaml
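Once the ClusterPolicy is updated, the operator rolls out the additional components; a quick way to check they are running (pod names may vary slightly between operator versions):

oc get pods -n nvidia-gpu-operator | grep -E 'sandbox|vfio'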

Because the NVIDIA GPU Operator does not officially support consumer-grade GPUs, it does not take the GPU's audio function into consideration and therefore does not bind it to the vfio-pci driver. This has to be done manually, which can be achieved at boot time using the following MachineConfig.

vfio-prepare.bu
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /usr/local/bin/vfio-prepare
    mode: 0755
    overwrite: true
    contents:
      local: ./vfio-prepare.sh
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
       [Unit]
       Description=Prepare vfio devices
       After=ignition-firstboot-complete.service
       Before=kubelet.service crio.service

       [Service]
       Type=oneshot
       ExecStart=/usr/local/bin/vfio-prepare

       [Install]
       WantedBy=kubelet.service
vfio-prepare.sh
#!/bin/bash

# Unbind the device at $address from its current driver and bind it to vfio-pci
vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
  echo $name > /sys/bus/pci/drivers/vfio-pci/new_id ||true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
name="10de 1467"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml
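After the node has rebooted with this MachineConfig, you can verify that the GPU's audio function is now owned by vfio-pci (the PCI address matches this article's hardware; adjust it to yours):

lspci -nnk -s 0a:00.1
# Look for "Kernel driver in use: vfio-pci" in the output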

Dynamically Switching GPU Drivers

One of the key advantages of this setup is the ability to use a single GPU for both container workloads and virtual machines without rebooting the system.

Use Case Scenario

  • Our workstation has a single NVIDIA GPU

  • Container workloads (such as AI/ML applications) require the NVIDIA kernel driver

  • Virtual machines with GPU passthrough require the VFIO-PCI driver

  • We need to switch between these modes without system reboots

Driver Switching Using Node Labels

The NVIDIA GPU Operator with sandbox workloads enabled provides a convenient way to switch driver bindings using node labels:

For container workloads (NVIDIA driver):

# Replace 'da2' with your node name
oc label node da2 --overwrite nvidia.com/gpu.workload.config=container

For VM passthrough (VFIO-PCI driver):

# Replace 'da2' with your node name
oc label node da2 --overwrite nvidia.com/gpu.workload.config=vm-passthrough

Notes on Driver Switching

  • The driver switching process takes a few minutes to complete

  • You can verify the current driver binding with lspci -nnk | grep -A3 NVIDIA

  • All GPU workloads must be stopped before switching drivers

  • No system reboot is usually required for the switch to take effect

  • This has proved to be a bit unreliable, however, and may occasionally require a reboot
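For example, to check which mode the node is currently in after relabeling (an illustrative check; da2 is this article's node name and 0a:00.0 its GPU address):

# Which driver currently owns the GPU
lspci -nnk -s 0a:00.0

# Which GPU-related extended resources the node advertises
oc get node da2 -o json | jq '.status.allocatable' | grep -i nvidia
# Typically nvidia.com/gpu in container mode, and the passthrough resource
# (e.g. nvidia.com/GA102_GEFORCE_RTX_3080) in vm-passthrough mode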

Add the GPU as a Hardware Device of your node

We identify the vendor and product ID of the GPU:

lspci -nnk |grep VGA

We then identify the device name provided by gpu-feature-discovery:

oc get nodes da2 -ojson |jq .status.capacity |grep nvidia
hyperconverged.yaml
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
    - externalResourceProvider: true
      pciDeviceSelector: 10DE:2206
      resourceName: nvidia.com/GA102_GEFORCE_RTX_3080
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml

The pciDeviceSelector field specifies the vendor ID and device ID of the PCI device, while the resourceName field specifies the name of the resource that will be created in Kubernetes/OpenShift.

Passthrough USB Host Controllers to the VM

For a complete desktop experience, you’ll want to connect input devices (mouse, keyboard) and audio devices directly to your virtual machine. Instead of passing through individual USB devices, we’ll pass through an entire USB controller to the VM for better performance and flexibility.

Step 1: Identify a Suitable USB Controller

First, we need to identify an appropriate USB controller that we can dedicate to the virtual machine:

  1. List all PCI devices on your system:

    lspci -nnk | grep -i usb
    

    Example output: ` 0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] `

  2. Note the PCI address (e.g., 0b:00.3) and the device ID (1022:149c in the example).

  3. Verify the IOMMU group of the controller to ensure it can be safely passed through:

    find /sys/kernel/iommu_groups/ -iname "*0b:00.3*"
    # Shows which IOMMU group contains this device
    
    ls /sys/kernel/iommu_groups/27/devices/
    # Lists all devices in the same IOMMU group
    
  4. Important: For clean passthrough, the USB controller should ideally be alone in its IOMMU group. If other devices are in the same group, you’ll need to pass those through as well.
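If you prefer an overview of every IOMMU group at once, a small helper loop over sysfs can be handy (a sketch; run it on the node directly or through oc debug node/<name>):

#!/bin/bash
# Print each IOMMU group and the PCI devices it contains
for group in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group $(basename "$group"):"
  for device in "$group"/devices/*; do
    lspci -nns "$(basename "$device")"
  done
done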

Add the USB Controller as a Hardware Device of your node

Once identified, we add its vendor and product IDs to the list of permitted host devices.

Currently, KubeVirt does not allow selecting a specific PCI address, so the pciDeviceSelector will match every identical USB host controller on the node. However, as we will only bind the one we are interested in to the VFIO-PCI driver, the others will not be available for PCI passthrough.

hyperconverged.yaml
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - pciDeviceSelector: 1022:149C
        resourceName: devices.kubevirt.io/USB3_Controller
      - pciDeviceSelector: 8086:2723
        resourceName: intel.com/WIFI_Controller
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml
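The new resource names should then appear among the node's allocatable resources (an illustrative check; the reported count stays at zero until the controller is actually bound to vfio-pci in the next step):

oc get node da2 -o json | jq '.status.allocatable' | grep -E 'USB3_Controller|WIFI_Controller'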

Binding the USB Controller to the VFIO-PCI driver at boot time

vfio-prepare.bu
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /usr/local/bin/vfio-prepare
    mode: 0755
    overwrite: true
    contents:
      local: ./vfio-prepare.sh
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci
  - path: /etc/modprobe.d/vfio.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        options vfio-pci ids=8086:2723,1022:149c
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
       [Unit]
       Description=Prepare vfio devices
       After=ignition-firstboot-complete.service
       Before=kubelet.service crio.service

       [Service]
       Type=oneshot
       ExecStart=/usr/local/bin/vfio-prepare

       [Install]
       WantedBy=kubelet.service
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rdblacklist=nouveau
    - 'video=efifb:off'

Create a bash script to unbind specific PCI devices and bind them to the VFIO-PCI driver.

vfio-prepare.sh
#!/bin/bash

# Unbind the device at $address from its current driver and bind it to vfio-pci
vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
  echo $name > /sys/bus/pci/drivers/vfio-pci/new_id ||true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
name="10de 1467"
vfio_attach

# Bind "useless" device to vfio-pci to satisfy IOMMU group
address=0000:07:00.0
path=/sys/bus/pci/devices/0000\:07\:00.0
name="1043 87c0"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:07:00.1
path=/sys/bus/pci/devices/0000\:07\:00.1
name="1043 87c0"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:07:00.3
path=/sys/bus/pci/devices/0000\:07\:00.3
name="1022 149c"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:0c:00.3
path=/sys/bus/pci/devices/0000\:0c\:00.3
name="1022 148c"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml
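After a reboot you can verify that every device handled by vfio-prepare.sh is owned by vfio-pci (a small check; the addresses match this article's hardware):

for addr in 0000:0a:00.1 0000:07:00.0 0000:07:00.1 0000:07:00.3 0000:0c:00.3; do
  echo -n "$addr -> "
  basename "$(readlink /sys/bus/pci/devices/$addr/driver)"
done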

Creating a Virtual Machine with GPU Passthrough

This section guides you through creating virtual machines that can utilize the GPU via PCI passthrough. We’ll use existing LVM Logical Volumes where the operating system is already installed with UEFI boot.

Step 1: Create Persistent Volumes from LVM Disks

First, we need to make our LVM volumes available to OpenShift by creating Persistent Volume Claims (PVCs). This assumes you have the LVM Storage operator installed and running, providing the lvms-vg1 StorageClass.

  1. Create a YAML file for each VM disk. Here’s an example for a Fedora 35 VM:

fedora_pvc.yaml
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fedora35
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 100Gi
  storageClassName: lvms-vg1
  2. Apply the YAML to create the PVC:

oc apply -f fedora_pvc.yaml
  3. Verify the PV and PVC are created and bound:

oc get pv
oc get pvc -n <your-namespace>

Step 2: Defining the Virtual Machine with GPU Passthrough

When creating virtual machines for desktop use with GPU passthrough, several important configurations need to be applied:

Key Configuration Elements

  1. GPU Passthrough: Pass the entire physical GPU to the VM

  2. Disable Virtual VGA: Remove the default emulated VGA device since we’re using the physical GPU

  3. USB Controller Passthrough: Include the USB controller for connecting peripherals directly

  4. UEFI Boot: Use UEFI boot mode for compatibility with modern operating systems and GPU drivers

  5. CPU/Memory Configuration: Allocate appropriate resources based on workload requirements

fedora.yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora
  namespace: epheo
spec:
  runStrategy: Halted
  template:
    metadata:
      labels:
        kubevirt.io/domain: fedora
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 8
          model: host-passthrough
          sockets: 2
          threads: 1
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: false # For Nvidia Driver...
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: pvdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          autoattachGraphicsDevice: false
          gpus:
          - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
            name: gpuvideo
          hostDevices:
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller2
          - deviceName: intel.com/WIFI_Controller
            name: wificontroller
          interfaces:
          - masquerade: {}
            name: default
          - bridge: {}
            model: virtio
            name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
        machine:
          type: q35
        resources:
          requests:
            memory: 16G
      hostname: fedora
      networks:
      - name: default
        pod: {}
      - multus:
          networkName: br1
        name: nic-0
      terminationGracePeriodSeconds: 0
      volumes:
        - persistentVolumeClaim:
            claimName: 'fedora35'
          name: pvdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
          name: cloudinitdisk
windows.yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    vm.kubevirt.io/os: windows10
    vm.kubevirt.io/workload: desktop
  name: windows
spec:
  runStrategy: Manual
  template:
    metadata:
      labels:
        kubevirt.io/domain: windows
    spec:
      architecture: amd64
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
          utc: {}
        cpu:
          cores: 8
          dedicatedCpuPlacement: true
          sockets: 2
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          disks:
          - cdrom:
              bus: sata
            name: windows-guest-tools
          - bootOrder: 1
            disk:
              bus: virtio
            name: pvdisk
          - disk:
              bus: virtio
            name: pvdisk1
          gpus:
          - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
            name: gpuvideo
          hostDevices:
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller
          - deviceName: devices.kubevirt.io/USB3_Controller
            name: usbcontroller2
          - deviceName: intel.com/WIFI_Controller
            name: wificontroller
          interfaces:
          - bridge: {}
            model: virtio
            name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
          tpm: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            frequencies: {}
            ipi: {}
            reenlightenment: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer:
              direct: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
          smm: {}
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 32Gi
      evictionStrategy: None
      hostname: windows
      networks:
      - multus:
          networkName: br1
        name: nic-0
      terminationGracePeriodSeconds: 3600
      volumes:
      - containerDisk:
          image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:0c536c7aba76eb9c1e75a8f2dc2bbfa017e90314d55b242599ea41f42ba4434f
        name: windows-guest-tools
      - name: pvdisk
        persistentVolumeClaim:
          claimName: windows
      - name: pvdisk1
        persistentVolumeClaim:
          claimName: windowsdata
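Once these VirtualMachine definitions are applied with oc apply, the VMs can be started and inspected with virtctl and oc (a brief usage sketch; the namespace follows this article's examples):

oc apply -f fedora.yaml
virtctl start fedora -n epheo
oc get vmi -n epheo

# Serial console access; the graphical output goes to the physical GPU
virtctl console fedora -n epheo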

No longer used, kept for reference only

Binding GPU to VFIO Driver at boot time

We first gather the PCI vendor and product IDs using pciutils.

lspci -nn |grep VGA
100-sno-vfiopci.bu
variant: openshift
version: 4.10.0
metadata:
  name: 100-sno-vfiopci
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
  - path: /etc/modprobe.d/vfio.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        options vfio-pci ids=10de:2206,10de:1aef
  - path: /etc/modules-load.d/vfio-pci.conf
    mode: 0644
    overwrite: true
    contents:
      inline: vfio-pci
dnf install butane
butane 100-sno-vfiopci.bu -o 100-sno-vfiopci.yaml
oc apply -f 100-sno-vfiopci.yaml
98-sno-xhci-unbind.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-xhci-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - contents: |
         [Unit]
         Description=Unbind USB Host Controller Driver
         After=ignition-firstboot-complete.service
         Before=kubelet.service crio.service

         [Service]
         Type=oneshot
         ExecStart=/bin/bash -c "/bin/echo 0000:0b:00.3 > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver/unbind"
         ExecStart=/bin/bash -c "/bin/echo vfio-pci > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver_override"
         ExecStart=/bin/bash -c "/bin/echo 1043 87c0 > /sys/bus/pci/drivers/vfio-pci/new_id"

         [Install]
         WantedBy=kubelet.service
        enabled: true
        name: unbindusbcontroller.service

Unbinding VTConsole at boot time

98-sno-vtconsole-unbind.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-vtconsole-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - contents: |
         [Unit]
         Description=Detach GPU VT Console
         After=ignition-firstboot-complete.service
         Before=kubelet.service crio.service

         [Service]
         Type=oneshot
         ExecStart=/bin/bash -c "/bin/echo 0 > /sys/class/vtconsole/vtcon0/bind"

         [Install]
         WantedBy=kubelet.service
        enabled: true
        name: dettachvtconsole.service

What’s next

This chapter is kept as a reference for future possible improvements.

  • Reducing the control plane footprint by relying on MicroShift instead.

  • Using GPU from containers instead of virtual machines for Linux Desktop.

  • Replacing node preparation with QEMU hooks

  • Enabling dedicated resources for virtual machines

  • Using MicroShift and RHEL for Edge

Troubleshooting

This section covers common issues you might encounter when setting up GPU passthrough with OpenShift and their solutions.

IOMMU Group Viability Issues

Problem: Virtual machine fails to start with an error similar to:

{"component":"virt-launcher","level":"error","msg":"Failed to start VirtualMachineInstance",
"reason":"virError... vfio 0000:07:00.1: group 19 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver."}

Diagnosis: This error occurs when not all devices in the same IOMMU group are bound to the vfio-pci driver. To check the IOMMU group:

# Check which devices are in the same IOMMU group
ls /sys/kernel/iommu_groups/19/devices/
# Output shows multiple devices in the group:
# 0000:03:08.0  0000:07:00.0  0000:07:00.1  0000:07:00.3

# Check what one of these devices is
lspci -nnks 07:00.0
# Output: AMD Starship/Matisse Reserved SPP [1022:1485]

Solution: All devices in the same IOMMU group need to be bound to the vfio-pci driver. Modify your vfio-prepare.sh script to include all devices in the IOMMU group:

# Add these lines to your vfio-prepare.sh script
echo "vfio-pci" > /sys/bus/pci/devices/0000:03:08.0/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:07:00.0/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:07:00.1/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:07:00.3/driver_override

# Make sure to unbind from current drivers first and then bind to vfio-pci
# as shown in the vfio-prepare.sh script example

Other Common Issues

No display output after GPU passthrough:

  • Ensure you’ve disabled the virtual VGA device in the VM specification

  • Check that you’ve passed through both the GPU and its audio device

  • Install the appropriate GPU drivers inside the virtual machine

Performance issues in Windows VM:

  • Ensure CPU pinning is configured correctly

  • Consider enabling huge pages for memory performance

  • Install the latest NVIDIA drivers from within the VM

  • Disable the Windows Game Bar and other overlay software

GPU driver switching fails:

  • Verify all GPU workloads are stopped before switching

  • Check the GPU operator pod logs: oc logs -n nvidia-gpu-operator <pod-name>

  • Verify IOMMU is properly enabled in BIOS/UEFI settings

For more troubleshooting help, check the logs of the following components:

  • virt-handler: oc logs -n openshift-cnv virt-handler-<hash>

  • virt-launcher: oc logs -n <namespace> virt-launcher-<vm-name>-<hash>

  • nvidia-driver-daemonset: oc logs -n nvidia-gpu-operator nvidia-driver-daemonset-<hash>