OpenShift Workstation with Single GPU passthrough#
Feb 27, 2023
25 min read
Introduction#
This article describes how to run OpenShift as a workstation with GPU PCI passthrough and Container Native Virtualization (CNV) to provide a virtualized desktop experience on a single OpenShift node. This makes it possible to run a virtual desktop with a single GPU; here it is used to run Microsoft Flight Simulator in a Windows VM with performance close to a bare-metal Windows installation.
Hardware description#
The workstation used for this demo has the following hardware:
AMD Ryzen 9 3950X 16-Core 32-Threads
64GB DDR4 3200MHz
Nvidia RTX 3080 FE 10GB
2x 2TB NVMe Disks (guests)
1x 500GB SSD Disk (root system)
10Gbase-CX4 Mellanox Ethernet
Backup of existing system partitions#
To avoid boot order conflicts, the OpenShift assisted installer formats the first 512 bytes of any disk that contains a bootable partition. It is therefore important to back up any existing partition table you would like to preserve, then remove it from the disk.
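For example, assuming GPT-partitioned disks (the device names below are illustrative and must be adapted to your system), the partition table can be saved with sgdisk before wiping the signatures from the future install disk:
sgdisk --backup=/root/nvme0n1-gpt.backup /dev/nvme0n1
wipefs --all /dev/sda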
Installing OpenShift SNO#
Once any existing file system is backed up and no bootable partitions remain, we can proceed with the OpenShift Single Node install.
It is important to note that CoreOS, the underlying operating system, requires an entire disk for its installation.
Here, we will keep the two NVMe disks for persistent volumes, as LVM Physical Volumes belonging to the same Volume Group, and use the SSD disk for the OpenShift operating system.
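For reference, the Volume Group layout assumed later in this article (/dev/fedora_da2/fedora35) could be prepared as follows; this is a sketch, to be run wherever the NVMe disks are visible:
pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate fedora_da2 /dev/nvme0n1 /dev/nvme1n1
lvcreate -L 100G -n fedora35 fedora_da2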
#!/bin/bash

OCP_VERSION=latest-4.10

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-client-linux.tar.gz > oc.tar.gz
tar zxf oc.tar.gz
chmod +x oc && mv oc ~/.local/bin/

curl -k https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$OCP_VERSION/openshift-install-linux.tar.gz > openshift-install-linux.tar.gz
tar zxvf openshift-install-linux.tar.gz
chmod +x openshift-install && mv openshift-install ~/.local/bin/

curl $(openshift-install coreos print-stream-json | grep location | grep x86_64 | grep iso | cut -d\" -f4) > rhcos-live.x86_64.iso
# This file contains the configuration for an OpenShift cluster installation.

apiVersion: v1

# The base domain for the cluster.
baseDomain: epheo.eu

# Configuration for the compute nodes.
compute:
- name: worker
  replicas: 0

# Configuration for the control plane nodes.
controlPlane:
  name: master
  replicas: 1

# Metadata for the cluster.
metadata:
  name: da2

# Networking configuration for the cluster.
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16

# Platform configuration for the cluster.
platform:
  none: {}

# Configuration for bootstrapping the cluster.
bootstrapInPlace:
  installationDisk: /dev/sda

# Pull secret for accessing the OpenShift registry.
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"XXXXXXXX"}}}'

# SSH key for accessing the cluster nodes.
sshKey: |
  ssh-rsa AAAAB3XXXXXXXXXXXXXXXXXXXXXXXXX
mkdir ocp && cp install-config.yaml ocp
openshift-install --dir=ocp create single-node-ignition-config
alias coreos-installer='podman run --privileged --rm \
-v /dev:/dev -v /run/udev:/run/udev -v $PWD:/data \
-w /data quay.io/coreos/coreos-installer:release'
cp ocp/bootstrap-in-place-for-live-iso.ign iso.ign
coreos-installer iso ignition embed -fi iso.ign rhcos-live.x86_64.iso
dd if=rhcos-live.x86_64.iso of=/dev/usbkey status=progress
Once the ISO is copied to the USB drive, boot your workstation node from it to install OpenShift Container Platform.
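You can follow the installation from another machine and, once it completes, use the generated kubeconfig to reach the cluster:
openshift-install --dir=ocp wait-for install-complete
export KUBECONFIG=ocp/auth/kubeconfig
oc get nodes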
Install CNV Operator#
Activate Intel VT or AMD-V hardware virtualization extensions in BIOS or UEFI.
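A quick sanity check, assuming the node is named da2 as in the rest of this article: the CPU virtualization flag and the kvm device should both be visible on the host.
oc debug node/da2 -- chroot /host bash -c "grep -m1 -oE 'svm|vmx' /proc/cpuinfo; ls /dev/kvm"
The following resources then install the operator: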
# Kubernetes resources for installing the KubeVirt Hyperconverged Operator (HCO) on OpenShift.
# It creates the "openshift-cnv" namespace, an OperatorGroup named "kubevirt-hyperconverged-group"
# in that namespace, and a Subscription named "hco-operatorhub" specifying the source,
# source namespace, name, starting CSV, and channel for the operator.

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cnv
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
  - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged
  startingCSV: kubevirt-hyperconverged-operator.v4.10.0
  channel: "stable"
oc apply -f cnv-resources.yaml
subscription-manager repos --enable cnv-4.10-for-rhel-8-x86_64-rpms
dnf install kubevirt-virtctl
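Before going further, it is worth verifying that the operator deployed correctly:
oc get csv -n openshift-cnv
oc get pods -n openshift-cnv
virtctl version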
Remove Local Storage operator (if installed)#
As we do not need to manage LVM volumes automatically, we want to avoid Logical Volumes being automatically formatted once they are deleted from OpenShift.
While this could lead to data leaks in a multi-tenant environment, removing the Local Storage Operator also avoids losing your Virtual Machine partitions when you delete the corresponding resources.
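A sketch of the removal, assuming the operator lives in its default openshift-local-storage namespace (the CSV name below is an example and depends on the installed version):
oc delete subscription local-storage-operator -n openshift-local-storage
oc get csv -n openshift-local-storage
oc delete csv local-storage-operator.4.10.0 -n openshift-local-storage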
Configure OpenShift for single GPU passthrough#
As our GPU is the only one attached to the node, a few additional steps are required.
We will use MachineConfigs to configure our node accordingly.
All MachineConfigs are applied to the master role because this is a single-node OpenShift; on a multi-node cluster they would be applied to the workers instead.
See also
https://github.com/openshift/machine-config-operator/blob/master/docs/SingleNodeOpenShift.md
Passing kernel arguments at boot time#
Multiple kernel arguments have to be passed at boot time in order to configure our node for GPU passthrough. This can be done using the Machine Config Operator.
amd_iommu=on: Enables IOMMU (Input/Output Memory Management Unit) support on AMD platforms, providing the DMA (Direct Memory Access) remapping and device isolation that PCI passthrough relies on.
vga=off: Disables VGA (Video Graphics Array) console output during boot time.
rd.driver.blacklist=nouveau: Blacklists the open-source Nouveau NVIDIA driver so that it never claims the GPU.
video=efifb:off: Disables the EFI (Extensible Firmware Interface) framebuffer console output during boot time.
See also
https://www.reddit.com/r/VFIO/comments/cktnhv/bar_0_cant_reserve/ https://www.reddit.com/r/VFIO/comments/mx5td8/bar_3_cant_reserve_mem_0xc00000000xc1ffffff_64bit/
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rd.driver.blacklist=nouveau
    - 'video=efifb:off'
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc patch MachineConfig 100-vfio --type=merge --patch-file ../vfio-prepare.yaml
Note
If you’re using an Intel CPU you’ll have to set intel_iommu=on instead.
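Once the MachineConfig has rolled out and the node has rebooted, you can verify that the arguments took effect:
oc debug node/da2 -- chroot /host cat /proc/cmdline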
Installing and configuring the Nvidia GPU Operator#
Install the GPU Operator using OLM / OpenShift Marketplace.
When deploying the operator’s ClusterPolicy, we have to set sandboxWorkloads.enabled to true to enable the sandbox-device-plugin and vfio-manager.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  sandboxWorkloads:
    defaultWorkload: container
    enabled: true
oc patch ClusterPolicy gpu-cluster-policy --type=merge --patch-file sandboxWorkloadsEnabled.yaml
As the Nvidia GPU Operator does not support consumer-grade GPUs, it does not take the audio device into consideration and therefore doesn’t bind it to the vfio-pci driver. This has to be done manually, but can be achieved once at boot time using the following machine config.
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /usr/local/bin/vfio-prepare
      mode: 0755
      overwrite: true
      contents:
        local: ./vfio-prepare.sh
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
        [Unit]
        Description=Prepare vfio devices
        After=ignition-firstboot-complete.service
        Before=kubelet.service crio.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/vfio-prepare

        [Install]
        WantedBy=kubelet.service
#!/bin/bash

# Unbind a PCI device from its current driver and bind it to vfio-pci.
# Expects $address, $path and $name to be set by the caller.
vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
    echo $name > /sys/bus/pci/drivers/vfio-pci/new_id || true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
name="10de 1aef"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc patch MachineConfig 100-vfio --type=merge --patch-file ../vfio-prepare.yaml
Changing the driver bound to the GPU#
This workstation only has a single GPU.
I’d like to use it for both Virtual Machines and AI/ML workloads.
Containers require the GPU device to be bound to the Nvidia driver.
Virtual machines require the GPU device to be bound to the VFIO-PCI driver.
I’d like an efficient way to bind / unbind the GPU from a driver without reboot.
We can label the node to configure it with the GPU bound to the Nvidia kernel driver, satisfying container workloads.
oc label node da2 --overwrite nvidia.com/gpu.workload.config=container
Or we can bind the GPU to the vfio-pci driver to satisfy Virtual Machine workloads using PCI passthrough.
oc label node da2 --overwrite nvidia.com/gpu.workload.config=vm-passthrough
The whole operation takes a few minutes.
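You can watch the driver stacks being swapped and confirm which driver currently owns the GPU (0a:00.0 is the GPU PCI address on this workstation; adapt it to yours):
oc get pods -n nvidia-gpu-operator -w
oc debug node/da2 -- chroot /host lspci -nnk -s 0a:00.0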
Add GPU as Hardware Device of your node#
See also
https://github.com/kubevirt/kubevirt/blob/main/docs/devel/host-devices-and-device-plugins.md
We identify the Vendor and Product ID of the GPU
lspci -nnk |grep VGA
We identify the device name provided by the gpu-feature-discovery.
oc get nodes da2 -ojson |jq .status.capacity |grep nvidia
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - externalResourceProvider: true
        pciDeviceSelector: 10DE:2206
        resourceName: nvidia.com/GA102_GEFORCE_RTX_3080
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml
The pciDeviceSelector field specifies the vendor and device IDs of the PCI device, while the resourceName field sets the name under which the device is exposed as an allocatable resource in Kubernetes/OpenShift.
Passthrough the USB Host Controllers to the VM#
In order to connect a mouse, keyboard, audio device, etc. directly to the VM, we pass through one of the USB controllers to the VM.
Identify a USB Controller and its IOMMU group#
We first need to identify it using pciutils.
lspci -nnk
After selecting the USB controller we want to dedicate to the Virtual Machine, we should verify that it is the only PCI device in its IOMMU group. We first look up the PCI address in the iommu_groups folder structure, then list the PCI addresses belonging to that IOMMU group.
find /sys/kernel/iommu_groups/ -iname "*0b:00.3*"
ls /sys/kernel/iommu_groups/27/devices/
Add the USB Controller as Hardware Device of your node#
Once identified, we add its Vendor and Product IDs to the list of permitted host devices.
Currently, KubeVirt does not allow providing a specific PCI address, so the pciDeviceSelector will match every similar USB Host Controller on the node. However, as we will only bind the one we are interested in to the VFIO-PCI driver, the others will not be available for PCI passthrough.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - pciDeviceSelector: 1022:149C
        resourceName: devices.kubevirt.io/USB3_Controller
      - pciDeviceSelector: 8086:2723
        resourceName: intel.com/WIFI_Controller
oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=merge --patch-file hyperconverged.yaml
Binding the USB Controller to VFIO-PCI driver at boot time#
variant: openshift
version: 4.10.0
metadata:
  name: 100-vfio
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /usr/local/bin/vfio-prepare
      mode: 0755
      overwrite: true
      contents:
        local: ./vfio-prepare.sh
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
    - path: /etc/modprobe.d/vfio.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          options vfio-pci ids=8086:2723,1022:149c
systemd:
  units:
    - name: vfioprepare.service
      enabled: true
      contents: |
        [Unit]
        Description=Prepare vfio devices
        After=ignition-firstboot-complete.service
        Before=kubelet.service crio.service

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/vfio-prepare

        [Install]
        WantedBy=kubelet.service
openshift:
  kernel_arguments:
    - amd_iommu=on
    - vga=off
    - rd.driver.blacklist=nouveau
    - 'video=efifb:off'
Create a bash script to unbind specific PCI devices and bind them to the VFIO-PCI driver.
#!/bin/bash

# Unbind a PCI device from its current driver and bind it to vfio-pci.
# Expects $address, $path and $name to be set by the caller.
vfio_attach () {
  if [ -f "${path}/driver/unbind" ]; then
    echo $address > ${path}/driver/unbind
  fi
  echo vfio-pci > ${path}/driver_override
  echo $address > /sys/bus/pci/drivers/vfio-pci/bind || \
    echo $name > /sys/bus/pci/drivers/vfio-pci/new_id || true
}

# 0a:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
address=0000:0a:00.1
path=/sys/bus/pci/devices/0000\:0a\:00.1
name="10de 1aef"
vfio_attach

# Bind "useless" device to vfio-pci to satisfy IOMMU group
address=0000:07:00.0
path=/sys/bus/pci/devices/0000\:07\:00.0
name="1043 87c0"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:07:00.1
path=/sys/bus/pci/devices/0000\:07\:00.1
name="1043 87c0"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:07:00.3
path=/sys/bus/pci/devices/0000\:07\:00.3
name="1022 149c"
vfio_attach

# Unbind USB switch and handle via vfio-pci kernel driver
address=0000:0c:00.3
path=/sys/bus/pci/devices/0000\:0c\:00.3
name="1022 148c"
vfio_attach
cd articles/openshift-workstation/machineconfig/build
butane -d . vfio-prepare.bu -o ../vfio-prepare.yaml
oc patch MachineConfig 100-vfio --type=merge --patch-file ../vfio-prepare.yaml
Creating a Virtual Machine#
The virtual machine will use existing LVM Logical Volumes; here we assume the operating system is already installed on the Logical Volume with UEFI boot.
Create PV and PV Claim out of local LVM disks#
See also
Binding PV and PVC by label: https://docs.openshift.com/container-platform/3.3/install_config/storage_examples/binding_pv_by_label.html
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: fedora35
  labels:
    vol: fedora35
spec:
  capacity:
    storage: 100Gi
  local:
    path: /dev/fedora_da2/fedora35
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  volumeMode: Block
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - da2
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: fedora35
  labels:
    vol: fedora35
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  selector:
    matchLabels:
      vol: fedora35
  resources:
    requests:
      storage: 100Gi
  storageClassName: local-storage
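We apply both resources and check that the claim binds to the volume (the file name is illustrative):
oc apply -f fedora35-pv-pvc.yaml
oc get pv,pvc -l vol=fedora35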
Defining the Virtual Machine#
The virtual machines we will use as desktops come with a few specificities:
We will pass through the entire GPU | Ref: https://kubevirt.io/2021/intel-vgpu-kubevirt.html
We will remove the existing default virtual VGA device | Ref: https://kubevirt.io/api-reference/master/definitions.html#_v1_devices
We will pass through an entire USB controller
We will use UEFI boot to stay closer to a typical bare-metal setup | Ref: https://docs.openshift.com/container-platform/4.10/virt/virtual_machines/advanced_vm_management/virt-efi-mode-for-vms.html
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora
  namespace: epheo
spec:
  runStrategy: Halted
  template:
    metadata:
      labels:
        kubevirt.io/domain: fedora
    spec:
      architecture: amd64
      domain:
        cpu:
          cores: 8
          model: host-passthrough
          sockets: 2
          threads: 1
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: false # For Nvidia Driver...
        devices:
          disks:
            - bootOrder: 1
              disk:
                bus: virtio
              name: pvdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          autoattachGraphicsDevice: false
          gpus:
            - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
              name: gpuvideo
          hostDevices:
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller2
            - deviceName: intel.com/WIFI_Controller
              name: wificontroller
          interfaces:
            - masquerade: {}
              name: default
            - bridge: {}
              model: virtio
              name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
        machine:
          type: q35
        resources:
          requests:
            memory: 16G
      hostname: fedora
      networks:
        - name: default
          pod: {}
        - multus:
            networkName: br1
          name: nic-0
      terminationGracePeriodSeconds: 0
      volumes:
        - persistentVolumeClaim:
            claimName: 'fedora35'
          name: pvdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
          name: cloudinitdisk
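The VM is created with runStrategy: Halted, so it has to be started explicitly, for example with virtctl (the file name is illustrative):
oc apply -f fedora-vm.yaml
virtctl start fedora -n epheo
virtctl console fedora -n epheo
The serial console is only useful for debugging; the actual display comes out of the passed-through GPU. A similar definition follows for a Windows 10 desktop, adding Hyper-V enlightenments, a TPM device and 1 GiB hugepages: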
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    vm.kubevirt.io/os: windows10
    vm.kubevirt.io/workload: desktop
  name: windows
spec:
  runStrategy: Manual
  template:
    metadata:
      labels:
        kubevirt.io/domain: windows
    spec:
      architecture: amd64
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
          utc: {}
        cpu:
          cores: 8
          dedicatedCpuPlacement: true
          sockets: 2
          threads: 1
        devices:
          autoattachGraphicsDevice: false
          disks:
            - cdrom:
                bus: sata
              name: windows-guest-tools
            - bootOrder: 1
              disk:
                bus: virtio
              name: pvdisk
            - disk:
                bus: virtio
              name: pvdisk1
          gpus:
            - deviceName: nvidia.com/GA102_GEFORCE_RTX_3080
              name: gpuvideo
          hostDevices:
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller
            - deviceName: devices.kubevirt.io/USB3_Controller
              name: usbcontroller2
            - deviceName: intel.com/WIFI_Controller
              name: wificontroller
          interfaces:
            - bridge: {}
              model: virtio
              name: nic-0
          networkInterfaceMultiqueue: true
          rng: {}
          tpm: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            frequencies: {}
            ipi: {}
            reenlightenment: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer:
              direct: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
          smm: {}
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        memory:
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 32Gi
      evictionStrategy: None
      hostname: windows
      networks:
        - multus:
            networkName: br1
          name: nic-0
      terminationGracePeriodSeconds: 3600
      volumes:
        - containerDisk:
            image: registry.redhat.io/container-native-virtualization/virtio-win-rhel9@sha256:0c536c7aba76eb9c1e75a8f2dc2bbfa017e90314d55b242599ea41f42ba4434f
          name: windows-guest-tools
        - name: pvdisk
          persistentVolumeClaim:
            claimName: windows
        - name: pvdisk1
          persistentVolumeClaim:
            claimName: windowsdata
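Note that this Windows VM requests 1 GiB hugepages, which must be reserved on the host at boot time. This is not covered by the MachineConfigs above; a sketch of the extra kernel arguments, following the same butane convention (34 pages is an example sized for the 32 GiB guest plus headroom):
openshift:
  kernel_arguments:
    - default_hugepagesz=1G
    - hugepagesz=1G
    - hugepages=34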
No longer used, kept for reference only#
Binding GPU to VFIO Driver at boot time#
We first gather the PCI Vendor and product IDs from pciutils.
lspci -nn |grep VGA
variant: openshift
version: 4.10.0
metadata:
  name: 100-sno-vfiopci
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /etc/modprobe.d/vfio.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          options vfio-pci ids=10de:2206,10de:1aef
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
dnf install butane
butane 100-sno-vfiopci.bu -o 100-sno-vfiopci.yaml
oc apply -f 100-sno-vfiopci.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-xhci-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Unbind USB Host Controller Driver
            After=ignition-firstboot-complete.service
            Before=kubelet.service crio.service

            [Service]
            Type=oneshot
            ExecStart=/bin/bash -c "/bin/echo 0000:0b:00.3 > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver/unbind"
            ExecStart=/bin/bash -c "/bin/echo vfio-pci > /sys/bus/pci/devices/0000\\:0b\\:00.3/driver_override"
            ExecStart=/bin/bash -c "/bin/echo 1043 87c0 > /sys/bus/pci/drivers/vfio-pci/new_id"

            [Install]
            WantedBy=kubelet.service
          enabled: true
          name: unbindusbcontroller.service
Unbinding VTConsole at boot time#
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-sno-vtconsole-unbind
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Detach GPU VT Console
            After=ignition-firstboot-complete.service
            Before=kubelet.service crio.service

            [Service]
            Type=oneshot
            ExecStart=/bin/bash -c "/bin/echo 0 > /sys/class/vtconsole/vtcon0/bind"

            [Install]
            WantedBy=kubelet.service
          enabled: true
          name: detachvtconsole.service
What’s next#
This chapter is kept as a reference for possible future improvements.
Reducing the control plane footprint by relying on MicroShift instead.
Using the GPU from containers instead of virtual machines for a Linux desktop.
Replace node prep by qemu hooks#
Enabling dedicated resources for virtual machines#
Using MicroShift and RHEL for Edge#
Troubleshooting#
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"windows","namespace":"epheo","pos":"manager.go:1027","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2024-02-08T13:10:06.726594Z qemu-kvm: -device {\"driver\":\"vfio-pci\",\"host\":\"0000:07:00.1\",\"id\":\"ua-hostdevice-usbcontroller\",\"bus\":\"pci.9\",\"addr\":\"0x0\"}: vfio 0000:07:00.1: group 19 is not viable\nPlease ensure all devices within the iommu_group are bound to their vfio bus driver.')","timestamp":"2024-02-08T13:10:07.353704Z","uid":"cc6fa39c-db31-4f2e-bba1-42dfc4b6efad"}
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"windows","namespace":"epheo","pos":"server.go:202","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2024-02-08T13:10:06.726594Z qemu-kvm: -device {\"driver\":\"vfio-pci\",\"host\":\"0000:07:00.1\",\"id\":\"ua-hostdevice-usbcontroller\",\"bus\":\"pci.9\",\"addr\":\"0x0\"}: vfio 0000:07:00.1: group 19 is not viable\nPlease ensure all devices within the iommu_group are bound to their vfio bus driver.')","timestamp":"2024-02-08T13:10:07.353770Z","uid":"cc6fa39c-db31-4f2e-bba1-42dfc4b6efad"}
The "group 19 is not viable" error means that not every device in IOMMU group 19 is bound to the vfio-pci driver. Listing the group members shows which devices are involved:
[core@da2 ~]$ ls /sys/kernel/iommu_groups/19/devices/
0000:03:08.0 0000:07:00.0 0000:07:00.1 0000:07:00.3
[core@da2 ~]$ lspci -nnks 07:00.0
07:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Subsystem: ASUSTeK Computer Inc. Device [1043:87c0]
Here the Starship/Matisse Reserved SPP function shares the IOMMU group with the USB controllers; binding it to vfio-pci as well, as done in the vfio-prepare script above, resolves the error.