Setting up a virtual workstation in OpenShift with VFIO passthrough¶
Feb 27, 2023
25 min read
This article provides a detailed guide on how to configure OpenShift as a workstation with GPU PCI passthrough and Container Native Virtualization (CNV) on a single OpenShift node (SNO).
This setup allows you to leverage Kubernetes orchestration capabilities while still enjoying near-native performance for GPU-intensive applications.
Why this approach?
Run both containerized workloads and virtual machines on the same hardware
Use a single GPU for both Kubernetes pods and virtual machines by switching the driver binding
Achieve near-native performance for gaming and professional applications in VMs
Maintain the flexibility and power of Kubernetes/OpenShift for other workloads
In testing, this configuration successfully ran Microsoft Flight Simulator in a Windows VM with performance similar to a bare-metal Windows installation.
The workstation used for this demo has the following hardware:
| Component | Specification |
|---|---|
| CPU | AMD Ryzen 9 3950X 16-Core 32-Threads |
| Memory | 64GB DDR4 3200MHz |
| GPU | Nvidia RTX 3080 FE 10GB |
| Storage | 2x 2TB NVMe Disks (for virtual machine storage), 1x 500GB SSD Disk (for OpenShift root system) |
| Network | 10Gbase-CX4 Mellanox Ethernet |
Similar configurations with equivalent Intel CPUs should work with minor adjustments noted throughout the guide.
Installing OpenShift SNO¶
Before proceeding with the installation, ensure you’ve completed the backup steps for any existing partitions.
Backup of existing system partitions¶
To avoid boot order conflicts, the OpenShift assisted installer will format the first 512 bytes of any disk that contains a bootable partition. It is therefore important to back up and remove any existing partition table that you would like to preserve.
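If you want to keep a copy of an existing partition layout before wiping it, a minimal sketch using standard sfdisk and wipefs tooling could look like this (/dev/nvme0n1 is a placeholder for your own disk):
# Dump the partition table to a file that can later be restored with sfdisk
sudo sfdisk --dump /dev/nvme0n1 > nvme0n1-partition-table.backup
# Once the data is safely backed up, remove all partition and filesystem signatures
# so the installer no longer detects a bootable partition on this disk
sudo wipefs --all /dev/nvme0n1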
OpenShift Installation¶
Once any existing file system is backed up and there are no more bootable partitions, we can proceed with the OpenShift Single Node installation.
It is important to note that CoreOS, the underlying operating system, requires an entire disk for installation. For this workstation setup:
We’ll use the 500GB SSD disk for the OpenShift operating system
The two 2TB NVMe disks will be reserved for persistent volumes as LVM Physical Volumes belonging to the same Volume Group (see the sketch below)
This configuration allows for flexible VM storage management while keeping the system installation separate
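As a sketch, assuming the Volume Group is managed manually rather than created by a storage operator, preparing the two NVMe disks could look like this (the device names and the vg1 name are placeholders to adapt to your hardware):
# Initialize both NVMe disks as LVM Physical Volumes
pvcreate /dev/nvme0n1 /dev/nvme1n1
# Group them into a single Volume Group that will back the VM Logical Volumes
vgcreate vg1 /dev/nvme0n1 /dev/nvme1n1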
#!/bin/bash

OCP_VERSION=latest-4.10

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-client-linux.tar.gz > oc.tar.gz
tar zxf oc.tar.gz
chmod +x oc && mv oc ~/.local/bin/

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-install-linux.tar.gz > openshift-install-linux.tar.gz
tar zxvf openshift-install-linux.tar.gz
chmod +x openshift-install && mv openshift-install ~/.local/bin/

curl $(openshift-install coreos print-stream-json | grep location | grep x86_64 | grep iso | cut -d\" -f4) > rhcos-live.x86_64.iso
# This file contains the configuration for an OpenShift cluster installation.

apiVersion: v1

# The base domain for the cluster.
baseDomain: epheo.eu

# Configuration for the compute nodes.
compute:
- name: worker
  replicas: 0

# Configuration for the control plane nodes.
controlPlane:
  name: master
  replicas: 1

# Metadata for the cluster.
metadata:
  name: da2

# Networking configuration for the cluster.
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16

# Platform configuration for the cluster.
platform:
  none: {}

# Configuration for bootstrapping the cluster.
bootstrapInPlace:
  installationDisk: /dev/sda

# Pull secret for accessing the OpenShift registry.
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"XXXXXXXX"}}}'

# SSH key for accessing the cluster nodes.
sshKey: |
  ssh-rsa AAAAB3XXXXXXXXXXXXXXXXXXXXXXXXX
mkdir ocp && cp install-config.yaml ocp
openshift-install --dir=ocp create single-node-ignition-config
alias coreos-installer='podman run --privileged --rm \
-v /dev:/dev -v /run/udev:/run/udev -v $PWD:/data \
-w /data quay.io/coreos/coreos-installer:release'
cp ocp/bootstrap-in-place-for-live-iso.ign iso.ign
coreos-installer iso ignition embed -fi iso.ign rhcos-live.x86_64.iso
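Before writing the ISO to the USB key, you can sanity-check that the Ignition config was embedded correctly:
coreos-installer iso ignition show rhcos-live.x86_64.iso | head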
dd if=rhcos-live.x86_64.iso of=/dev/usbkey status=progress
Once the ISO is copied to the USB drive, you can use the USB drive to boot your workstation node and install OpenShift Container Platform.
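From the machine where you generated the Ignition config, you can then monitor the installation until it completes (standard openshift-install commands, reusing the ocp assets directory created above):
openshift-install wait-for install-complete --dir=ocp
# Once complete, the kubeconfig is available in the assets directory
export KUBECONFIG=$PWD/ocp/auth/kubeconfig
oc get nodes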
Install CNV Operator¶
Activate the Intel VT-x/VT-d or AMD-V/AMD-Vi (IOMMU) hardware virtualization extensions in the BIOS or UEFI.
# This YAML file contains the Kubernetes resources for installing the KubeVirt Hyperconverged Operator (HCO) on OpenShift Container Platform.
# It creates a namespace named "openshift-cnv", an operator group named "kubevirt-hyperconverged-group" in the "openshift-cnv" namespace, and a subscription named "hco-operatorhub" in the "openshift-cnv" namespace.
# The subscription specifies the source, source namespace, name, starting CSV, and channel for the KubeVirt Hyperconverged Operator.

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
  - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.10.0
  channel: "stable"
oc apply -f cnv-resources.yaml
subscription-manager repos --enable cnv-4.10-for-rhel-8-x86_64-rpms
dnf install kubevirt-virtctl
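Before moving on, you can verify that the operator rolled out correctly (the CSV should eventually report Succeeded and the virt-* pods should be Running):
oc get csv -n openshift-cnv
oc get pods -n openshift-cnv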
Configure OpenShift for single GPU passthrough¶
As our GPU is the only one attached to the node, a few additional steps are required.
We will use MachineConfigs to configure our node accordingly.
All MachineConfigs are applied to the master role because this is a single-node OpenShift cluster. On a multi-node cluster they would be applied to the workers instead.
See also
https://github.com/openshift/machine-config-operator/blob/master/docs/SingleNodeOpenShift.md
Passing kernel arguments at boot time¶
Several kernel arguments have to be passed at boot time to configure the node for GPU passthrough. This can be done using the Machine Config Operator.
amd_iommu=on: Enables IOMMU (Input/Output Memory Management Unit) support on AMD platforms, providing the DMA isolation and remapping required to safely assign PCI devices to virtual machines.
vga=off: Disables VGA (Video Graphics Array) console output during boot time.
rd.driver.blacklist=nouveau: Blacklists the Nouveau open-source NVIDIA driver so it never claims the GPU.
video=efifb:off: Disables the EFI (Extensible Firmware Interface) framebuffer console output during boot time.
See also
https://www.reddit.com/r/VFIO/comments/cktnhv/bar_0_cant_reserve/ https://www.reddit.com/r/VFIO/comments/mx5td8/bar_3_cant_reserve_mem_0xc00000000xc1ffffff_64bit/
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rd.driver.blacklist=nouveau
    - 'video=efifb:off'
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml
Note
If you’re using an Intel CPU you’ll have to set intel_iommu=on instead.
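Once the node has rebooted with these arguments, you can confirm that the IOMMU is active and inspect how devices are grouped. A small helper loop such as the following, run from a shell on the node, lists each IOMMU group with its devices:
# List every IOMMU group and the PCI devices it contains
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}"
  done
done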
Installing and configuring the NVIDIA GPU Operator¶
The NVIDIA GPU Operator automates the management of NVIDIA GPUs in Kubernetes environments.
Step 1: Install the GPU Operator¶
Navigate to the OpenShift web console
Go to Operators → OperatorHub
Search for “NVIDIA GPU Operator”
Select the operator and click Install
Keep the default installation settings and click Install again
Alternatively, you can install it through the CLI using the following commands:
oc create -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/git/operator-namespace.yaml
oc create -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/git/operator-source.yaml
Step 2: Configure the ClusterPolicy¶
When deploying the operator’s ClusterPolicy, we need to set sandboxWorkloads.enabled to true to enable the sandbox-device-plugin and vfio-manager components, which are essential for GPU passthrough.
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  sandboxWorkloads:
    defaultWorkload: container
    enabled: true
oc patch ClusterPolicy gpu-cluster-policy --type=merge --patch-file sandboxWorkloadsEnabled.yaml
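You can check that the operator rolled out the sandbox components (exact pod names vary by operator version; vfio-manager and sandbox-device-plugin pods should appear alongside the driver pods):
oc get pods -n nvidia-gpu-operator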
As the NVIDIA GPU Operator does not officially support consumer-grade GPUs, it does not take the audio device into consideration and therefore does not bind it to the vfio-pci driver. This has to be done manually, but can be achieved once at boot time using the following MachineConfig.
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /usr/local/bin/vfio-prepare
      mode: 0755
      overwrite: true
      contents:
        local: ./vfio-prepare.sh
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
        [Unit]
        Description=Prepare vfio devices
        After=ignition-firstboot-complete.service
        Before=kubelet.service crio.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/vfio-prepare

        [Install]
        WantedBy=kubelet.service
#!/bin/bash

vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
    echo $name > /sys/bus/pci/drivers/vfio-pci/new_id || true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
15name="10de 1467"
16vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml
Dynamically Switching GPU Drivers¶
One of the key advantages of this setup is the ability to use a single GPU for both container workloads and virtual machines without rebooting the system.
Use Case Scenario¶
Our workstation has a single NVIDIA GPU
Container workloads (such as AI/ML applications) require the NVIDIA kernel driver
Virtual machines with GPU passthrough require the VFIO-PCI driver
We need to switch between these modes without system reboots
Driver Switching Using Node Labels¶
The NVIDIA GPU Operator with sandbox workloads enabled provides a convenient way to switch driver bindings using node labels:
For container workloads (NVIDIA driver):
# Replace 'da2' with your node name
oc label node da2 --overwrite nvidia.com/gpu.workload.config=container
For VM passthrough (VFIO-PCI driver):
# Replace 'da2' with your node name
oc label node da2 --overwrite nvidia.com/gpu.workload.config=vm-passthrough
Notes on Driver Switching¶
The driver switching process takes a few minutes to complete
You can verify the current driver binding with
lspci -nnk | grep -A3 NVIDIA
All GPU workloads must be stopped before switching drivers
No system reboot is usually required for the switch to take effect
In practice this has proven somewhat unreliable and may occasionally require a reboot; see the verification sketch below
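A switch can be driven and verified end to end with something like the following (da2 is this demo’s node name; oc debug opens a host shell so lspci can be run on the node):
# Switch the GPU to VM passthrough mode
oc label node da2 --overwrite nvidia.com/gpu.workload.config=vm-passthrough
# After a few minutes, check which driver owns the NVIDIA devices;
# "Kernel driver in use: vfio-pci" means the GPU is ready for passthrough
oc debug node/da2 -- chroot /host lspci -nnk -d 10de: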
Add GPU as Hardware Device of your node¶
See also
https://github.com/kubevirt/kubevirt/blob/main/docs/devel/host-devices-and-device-plugins.md
We identify the Vendor and Product IDs of the GPU:
lspci -nnk |grep VGA
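Example output for the RTX 3080 used in this demo (the vendor:device pair 10de:2206 is what we need below):
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3080] [10de:2206] (rev a1)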
We identify the device name provided by gpu-feature-discovery:
oc get nodes da2 -ojson |jq .status.capacity |grep nvidia
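The GPU should be exposed as an allocatable resource named after the device, for example:
"nvidia.com/GA102_GEFORCE_RTX_3080": "1"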
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - externalResourceProvider: true
        pciDeviceSelector: 10DE:2206
        resourceName: nvidia.com/GA102_GEFORCE_RTX_3080
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml
The pciDeviceSelector field specifies the vendor ID and device ID of the PCI device, while the resourceName field specifies the name of the resource that will be created in Kubernetes/OpenShift.
Passthrough USB Host Controllers to the VM¶
For a complete desktop experience, you’ll want to connect input devices (mouse, keyboard) and audio devices directly to your virtual machine. Instead of passing through individual USB devices, we’ll pass through an entire USB controller to the VM for better performance and flexibility.
Step 1: Identify a Suitable USB Controller¶
First, we need to identify an appropriate USB controller that we can dedicate to the virtual machine:
List all PCI devices on your system:
lspci -nnk | grep -i usb
Example output:
0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
Note the PCI address (e.g., 0b:00.3) and the device ID (1022:149c in the example).
Verify the IOMMU group of the controller to ensure it can be safely passed through:
find /sys/kernel/iommu_groups/ -iname "*0b:00.3*"  # Shows which IOMMU group contains this device
ls /sys/kernel/iommu_groups/27/devices/            # Lists all devices in the same IOMMU group
Important: For clean passthrough, the USB controller should ideally be alone in its IOMMU group. If other devices are in the same group, you’ll need to pass those through as well.
Add the USB Controller as Hardware Device of your node¶
Once identified, we add its Vendor and Product IDs to the list of permitted host devices.
Currently, KubeVirt does not allow selecting a device by its PCI address, so the pciDeviceSelector will match every similar USB host controller on the node. However, as we will only bind the one we are interested in to the VFIO-PCI driver, the others will not be available for PCI passthrough.
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - pciDeviceSelector: 1022:149C
        resourceName: devices.kubevirt.io/USB3_Controller
      - pciDeviceSelector: 8086:2723
        resourceName: intel.com/WIFI_Controller
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml
Binding the USB Controller to VFIO-PCI driver at boot time¶
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /usr/local/bin/vfio-prepare
      mode: 0755
      overwrite: true
      contents:
        local: ./vfio-prepare.sh
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
    - path: /etc/modprobe.d/vfio.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          options vfio-pci ids=8086:2723,1022:149c
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
        [Unit]
        Description=Prepare vfio devices
        After=ignition-firstboot-complete.service
        Before=kubelet.service crio.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/vfio-prepare

        [Install]
        WantedBy=kubelet.service
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rd.driver.blacklist=nouveau
    - 'video=efifb:off'
Create a bash script to unbind specific PCI devices and bind them to the VFIO-PCI driver.
#!/bin/bash

vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
    echo $name > /sys/bus/pci/drivers/vfio-pci/new_id || true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
15name="10de 1467"
vfio_attach

# Bind "useless" device to vfio-pci to satisfy IOMMU group
address=0000:07:00.0
path=/sys/bus/pci/devices/0000\:07\:00.0
name="1043 87c0"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:07:00.1
path=/sys/bus/pci/devices/0000\:07\:00.1
name="1043 87c0"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:07:00.3
path=/sys/bus/pci/devices/0000\:07\:00.3
name="1022 149c"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:0c:00.3
path=/sys/bus/pci/devices/0000\:0c\:00.3
name="1022 148c"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc apply -f ../vfio-prepare.yaml
Creating a Virtual Machine with GPU Passthrough¶
This section guides you through creating virtual machines that can utilize the GPU via PCI passthrough. We’ll use existing LVM Logical Volumes where the operating system is already installed with UEFI boot.
Step 1: Create Persistent Volumes from LVM Disks¶
First, we need to make our LVM volumes available to OpenShift by creating Persistent Volume Claims (PVCs). This assumes a storage operator providing the lvms-vg1 StorageClass (such as the LVM Storage Operator) is installed and running.
Create a YAML file for each VM disk. Here’s an example for a Fedora 35 VM:
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fedora35
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 100Gi
  storageClassName: lvms-vg1
Apply the YAML to create the PVC:
oc apply -f fedora35.yaml
Verify the PV and PVC are created and bound:
oc get pv
oc get pvc -n <your-namespace>
Step 2: Defining the Virtual Machine with GPU Passthrough¶
When creating virtual machines for desktop use with GPU passthrough, several important configurations need to be applied:
Key Configuration Elements¶
GPU Passthrough: Pass the entire physical GPU to the VM
Disable Virtual VGA: Remove the default emulated VGA device since we’re using the physical GPU
USB Controller Passthrough: Include the USB controller for connecting peripherals directly
UEFI Boot: Use UEFI boot mode for compatibility with modern operating systems and GPU drivers
CPU/Memory Configuration: Allocate appropriate resources based on workload requirements
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora
  namespace: epheo
spec:
  runStrategy: Halted
  template:
    metadata:
      labels:
        kubevirt.io/domain: fedora
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 8
          model: host-passthrough
          sockets: 2
          threads: 1
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: false # For Nvidia Driver...
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: pvdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          autoattachGraphicsDevice: false
          gpus:
            - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
              name: gpuvideo
          hostDevices:
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller2
            - deviceName: intel.com/WIFI_Controller
              name: wificontroller
          interfaces:
            - masquerade: {}
              name: default
            - bridge: {}
              model: virtio
              name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
        machine:
          type: q35
        resources:
          requests:
            memory: 16G
      hostname: fedora
      networks:
        - name: default
          pod: {}
        - multus:
            networkName: br1
          name: nic-0
      terminationGracePeriodSeconds: 0
      volumes:
        - persistentVolumeClaim:
            claimName: 'fedora35'
          name: pvdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
          name: cloudinitdisk
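Since the VM is defined with runStrategy: Halted, it must be started explicitly. As a sketch, virtctl (installed earlier) can drive the lifecycle; the epheo namespace comes from the manifest above, and because autoattachGraphicsDevice is false the graphical output goes to the passed-through GPU, so only the serial console is reachable remotely:
# Start the VM and watch the instance come up
virtctl start fedora -n epheo
oc get vmi -n epheo -w
# Serial console access
virtctl console fedora -n epheo
A second example below defines a Windows desktop VM, adding Hyper-V enlightenments, a virtual TPM, dedicated CPU placement, and 1Gi huge pages.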
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    vm.kubevirt.io/os: windows10
    vm.kubevirt.io/workload: desktop
  name: windows
spec:
  runStrategy: Manual
  template:
    metadata:
      labels:
        kubevirt.io/domain: windows
    spec:
      architecture: amd64
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
          utc: {}
        cpu:
          cores: 8
          dedicatedCpuPlacement: true
          sockets: 2
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          disks:
            - cdrom:
                bus: sata
              name: windows-guest-tools
            - bootOrder: 1
              disk:
                bus: virtio
              name: pvdisk
            - disk:
                bus: virtio
              name: pvdisk1
          gpus:
            - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
              name: gpuvideo
          hostDevices:
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller2
            - deviceName: intel.com/WIFI_Controller
              name: wificontroller
          interfaces:
            - bridge: {}
              model: virtio
              name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
          tpm: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            frequencies: {}
            ipi: {}
            reenlightenment: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer:
              direct: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
          smm: {}
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 32Gi
      evictionStrategy: None
      hostname: windows
      networks:
        - multus:
            networkName: br1
          name: nic-0
      terminationGracePeriodSeconds: 3600
      volumes:
        - containerDisk:
            image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:0c536c7aba76eb9c1e75a8f2dc2bbfa017e90314d55b242599ea41f42ba4434f
          name: windows-guest-tools
        - name: pvdisk
          persistentVolumeClaim:
            claimName: windows
        - name: pvdisk1
          persistentVolumeClaim:
            claimName: windowsdata
No longer used, kept for reference only¶
Binding GPU to VFIO Driver at boot time¶
We first gather the PCI Vendor and product IDs from pciutils.
lspci -nn |grep VGA
variant: openshift
version: 4.10.0
metadata:
  name: 100-sno-vfiopci
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /etc/modprobe.d/vfio.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          options vfio-pci ids=10de:2206,10de:1aef
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
dnf install butane
butane 100-sno-vfiopci.bu -o 100-sno-vfiopci.yaml
oc apply -f 100-sno-vfiopci.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-xhci-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Unbind USB Host Controller Driver
            After=ignition-firstboot-complete.service
            Before=kubelet.service crio.service

            [Service]
            Type=oneshot
            ExecStart=/bin/bash -c "/bin/echo 0000:0b:00.3 > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver/unbind"
            ExecStart=/bin/bash -c "/bin/echo vfio-pci > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver_override"
            ExecStart=/bin/bash -c "/bin/echo 1043 87c0 > /sys/bus/pci/drivers/vfio-pci/new_id"

            [Install]
            WantedBy=kubelet.service
          enabled: true
          name: unbindusbcontroller.service
Unbinding VTConsole at boot time¶
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-vtconsole-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Detach GPU VT Console
            After=ignition-firstboot-complete.service
            Before=kubelet.service crio.service

            [Service]
            Type=oneshot
            ExecStart=/bin/bash -c "/bin/echo 0 > /sys/class/vtconsole/vtcon0/bind"

            [Install]
            WantedBy=kubelet.service
          enabled: true
          name: detachvtconsole.service
What’s next¶
This chapter is kept as a reference for possible future improvements.
Reducing the control plane footprint by relying on MicroShift instead.
Using GPU from containers instead of virtual machines for Linux Desktop.
Replacing node prep with QEMU hooks¶
Enabling dedicated resources for virtual machines¶
Using MicroShift and RHEL for Edge¶
Troubleshooting¶
This section covers common issues you might encounter when setting up GPU passthrough with OpenShift and their solutions.
IOMMU Group Viability Issues¶
Problem: Virtual machine fails to start with an error similar to:
{"component":"virt-launcher","level":"error","msg":"Failed to start VirtualMachineInstance",
"reason":"virError... vfio 0000:07:00.1: group 19 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver."}
Diagnosis: This error occurs when not all devices in the same IOMMU group are bound to the vfio-pci driver. To check the IOMMU group:
# Check which devices are in the same IOMMU group
ls /sys/kernel/iommu_groups/19/devices/
# Output shows multiple devices in the group:
# 0000:03:08.0 0000:07:00.0 0000:07:00.1 0000:07:00.3
# Check what one of these devices is
lspci -nnks 07:00.0
# Output: AMD Starship/Matisse Reserved SPP [1022:1485]
Solution: All devices in the same IOMMU group need to be bound to the vfio-pci driver. Modify your vfio-prepare.sh script to include all devices in the IOMMU group:
# Add these lines to your vfio-prepare.sh script
echo "vfio-pci" > /sys/bus/pci/devices/0000:03:08.0/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:07:00.0/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:07:00.1/driver_override
echo "vfio-pci" > /sys/bus/pci/devices/0000:07:00.3/driver_override
# Make sure to unbind from current drivers first and then bind to vfio-pci
# as shown in the vfio-prepare.sh script example
Other Common Issues¶
No display output after GPU passthrough:
Ensure you’ve disabled the virtual VGA device in the VM specification
Check that you’ve passed through both the GPU and its audio device
Install the appropriate GPU drivers inside the virtual machine
Performance issues in Windows VM:
Ensure CPU pinning is configured correctly
Consider enabling huge pages for memory performance
Install the latest NVIDIA drivers from within the VM
Disable the Windows Game Bar and other overlay software
GPU driver switching fails:
Verify all GPU workloads are stopped before switching
Check the GPU operator pod logs:
oc logs -n nvidia-gpu-operator <pod-name>
Verify IOMMU is properly enabled in BIOS/UEFI settings
For more troubleshooting help, check the logs of the following components:
virt-handler:
oc logs -n openshift-cnv virt-handler-<hash>
virt-launcher:
oc logs -n <namespace> virt-launcher-<vm-name>-<hash>
nvidia-driver-daemonset:
oc logs -n nvidia-gpu-operator nvidia-driver-daemonset-<hash>