lfcode.ca notes compiled for future reference


Hyper-V Manager throws obscure errors if the target computer calls itself something other than what you call it

I started testing Server 2019 as a Hyper-V host a few days ago, but getting the GUI manager to connect was a bit challenging. This article is as much documentation for me to set this machine up again as it is instruction for anyone else.

This machine is not domain-joined.

First, name the computer what you want its final DNS name to be with Rename-Computer, then reboot so you avoid the issue described in the second half of this post.
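
In my case that looked something like this (LF-HV02 is this lab's name; use your own):

# rename to the final name and reboot immediately
Rename-Computer -NewName 'LF-HV02' -Restart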

Secondly, get a remote shell into it: run Enable-PSRemoting, and ensure the firewall rules allow connections from the subnets you're OK with remote connections from, using Get-NetFirewallRule piped to Get-NetFirewallAddressFilter and Set-NetFirewallAddressFilter.
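
Scoping those WinRM rules to trusted subnets looks roughly like this ("Windows Remote Management" is the built-in display group; the subnet is just an example):

# restrict the built-in WinRM inbound rules to one management subnet
Get-NetFirewallRule -DisplayGroup 'Windows Remote Management' |
    Get-NetFirewallAddressFilter |
    Set-NetFirewallAddressFilter -RemoteAddress '10.10.0.0/16'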

Next, enable CredSSP with Enable-WSManCredSSP -Role Server and ensure that the appropriate fresh-credential delegation, trusted hosts, and permit-CredSSP GPOs are applied on the client. Check also that the WinRM service is running on the client, and if you still get errors about lacking "permission to complete this task" when connecting with the manager, run Enable-WSManCredSSP with the client role as well, delegating to the appropriate host.
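
As a sketch, the non-GPO version of both sides looks like this (the host name is an example):

# on the Hyper-V host
Enable-WSManCredSSP -Role Server

# on the management client: delegate fresh credentials to that host
Enable-WSManCredSSP -Role Client -DelegateComputer 'lf-hv02.lab.example.com'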

Then, hopefully, the Hyper-V manager will just connect.


Now, for the problem I had, and as many details as feasible so the next person Googling for it will find this post.

The error that appeared was:

"Hyper-V encountered an error trying to access an object on computer 'LF-HV02' because the object was not found. The object might have been deleted. Verify that the Virtual Machine Management service on the computer is running".

Object not found error

I then investigated the event logs on the target system. In the WMI-Activity/Operational log, I found an error with event ID 5858, and result code 0x80041002:

Id = {8FA5E5DB-34E0-0001-31E6-A58FE034D401}; ClientMachine = WIN-QKHK3OGNV1V; User = WIN-QKHK3OGNV1V\Administrator; ClientProcessId = 2532; Component = Unknown; Operation = Start IWbemServices::GetObject - root\virtualization\v2 : Msvm_VirtualSystemManagementService.CreationClassName="Msvm_VirtualSystemManagementService",Name="vmms",SystemCreationClassName="Msvm_ComputerSystem",SystemName="LF-HV02"; ResultCode = 0x80041002; PossibleCause = Unknown

event5858

When poking around at the mentioned CIM object with Get-CimInstance -ClassName 'Msvm_VirtualSystemManagementService' -Namespace 'root\virtualization\v2', I found that the system name was some randomized name starting with WIN-. So, I renamed it to what it was supposed to be called with Rename-Computer, rebooted, and that fixed the issue.
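
To check for the same mismatch, compare the SystemName that VMMS recorded against what the machine actually calls itself:

# SystemName showed the randomized WIN-... name instead of the real hostname
(Get-CimInstance -ClassName 'Msvm_VirtualSystemManagementService' -Namespace 'root\virtualization\v2').SystemName
$env:COMPUTERNAME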

Tags: hyper-v, Windows Server, PowerShell, Server 2019

Dell XPS 15: "I can't understand why some people _still_ think ACPI is a good idea.." -Linus Torvalds

I got my new machine in the mail, an XPS 15 bought on one of the numerous sales which happen pretty much every couple of days, and while most of the hardware is amazing compared to my previous machine (a beat-up X220), there are some significant hardware issues that need to be worked around. Besides, of course, the fact that the keyboard and the lack of a trackpoint are objectively inferior to the previous machine.

The first thing that many people may do after booting up a new machine on any operating system is to make sure they got what they paid for and check the detected hardware. So, naturally, I run lspci... and it hangs. I could still switch virtual consoles, but the kernel said something about a watchdog catching a stalled CPU core. Fun! Off to Google, which says it's the NVidia driver, specifically related to Optimus (which, by the way, this video remains an excellent description of). So I blacklist it, and lspci seems to work fine. Next, I install X and all the other applications I want to use, and, being a sensible Arch user, I read the Arch wiki page for the hardware, which states that the dedicated graphics card will use a lot of power if it isn't turned off.

So, I turn it off. For this, I use acpi_call with a systemd-tmpfiles rule to turn it off at boot. The setup is as follows:

~ » cat /etc/tmpfiles.d/acpi_call.conf
w /proc/acpi/call - - - - \\_SB.PCI0.PEG0.PEGP._OFF
~ » cat /etc/modules-load.d/acpi_call.conf
acpi_call

Next, I get to work doing some programming on it. It was a massive improvement on the previous hardware on account of having a 1080p screen instead of a 1366x768 device-usability-eliminator. However, my terminal-based vim sessions kept getting disturbed by messages such as the following:

kernel: pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e0(Transmitter ID)
kernel: pcieport 0000:00:1c.0:   device [8086:a110] error status/mask=00001000/00002000

After looking in the wiki again, I set pci=nommconf in the kernel options. At this point I was entirely unconvinced that the acpi_rev_override=1 stuff was necessary, since I had gotten rid of any NVidia software that could possibly break my machine.

Satisfied with my handiwork, I put the machine into service and took it to school. Naturally, one may want to put a machine into sleep mode when it is not in use. Unfortunately, doing so caused it to lock up on any attempt at waking it. Another strange behaviour I had started to notice at this point was that Xorg could not be started more than once per boot, due to the same hard-lock issue.

As it turns out, the Xorg problem was the same issue as the sleep lockups, and both are fixed by acpi_rev_override=1 in the kernel parameters. I had been dissuaded from trying it because the Arch developers had disabled CONFIG_ACPI_REV_OVERRIDE_POSSIBLE at some point in the past, which is what an outdated forum post suggested (lesson learned: do more research on things which could easily change), but they re-enabled it recently.
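
Both parameters end up on the kernel command line; assuming GRUB (the bootloader and the existing options here are illustrative), that's something like:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet pci=nommconf acpi_rev_override=1"
# then regenerate the config
grub-mkconfig -o /boot/grub/grub.cfg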

So, finally, the situation:

  • Power management appears to work correctly
  • Battery life is incredible (but could probably be hugely improved to "ridiculous")
  • The touchpad is a touchpad, which means it sucks, although it is one of the better ones
  • There is a significant and very annoying key-repeatt isssuee which happens on occasion, some users have reported it also occurs on Windows. It has happened at least 5 times while writing this post.
  • I hadn't noticed this earlier, but the keyboard has a tendency to scratch the screen while the laptop is closed. Since this is a thoroughly modern machine, there isn't really space to just shove a microfiber cloth between the screen and keyboard like I had done with my X220 with missing rubber standoffs.

Would I recommend buying one?

Maybe. For my use case, it made sense since I want to have a dedicated GPU which can be used in Windows for CAD work. The hardware with the exception of the keyboard and trackpad is very nice, especially for the price (a bit more than half what Apple charges for a similarly specced MacBook Pro 15"). If you don't need or want a dedicated GPU, buy another machine. NVidia still has awful Linux problems.

Which machine? Probably a ThinkPad, since they have very good Linux support right out of the box. That being said, I acknowledge that Dell has a group dedicated to Linux support on their hardware, and both companies show a similar complete lack of desire to lift a finger when it comes to pressuring their fingerprint reader vendor (the same one for both companies!) to release the driver spec.

Since Linus Torvalds provides such excellent material to quote,

The thing is, you have two choices:
 - define interfaces in hardware
 - not doing so, and then trying to paper it over with idiotic tables.

Sadly, Intel decided that they should do the latter, and invented ACPI.

There are two kinds of interfaces: the simple ones, and the broken ones.

<...>

The broken ones are the ones where hardware people know what they want to
do, but they think the interface is sucky and complicated, so they make it
_doubly_ sucky by then saying "we'll describe it in the BIOS tables", so
that now there is another (incompetent) group that can _also_ screw things
up. Yeehaa!

Tags: linux, arch-linux, hardware, laptop, dell-xps-15

Meshmixer: Turn Off Smooth Display

The default display for meshes in Meshmixer is just a bad idea, especially for people who use it as an STL viewer for technical models.

The setting responsible for this silliness is called "Mesh Normal Mode", which, as we all know, should be completely obvious to anyone and everyone. Set it to "Face Normals" and it will display the model without making it look like an amorphous blob. Alternatively, hold spacebar and select the sphere that has vertices showing, as in the picture below.

Setting in the "Hotbox"

meshmixer-setting-fix

Default

meshmixer-defaults

Face Normals

meshmixer-fixed

Tags: 3dprinting, meshmixer

SELinux notes

ausearch -m avc to find denials. If there are none, that's probably because some distro maintainer decided that the denial should be silent:

semodule -DB disables dontaudit rules so those silenced denials show up in the log; semodule -B rebuilds the policy and silences them again.

When trying to get things to work with audit2allow, skip the 15 minutes of triggering denials one at a time and repeatedly running audit2allow -M mymodule < fails; semodule -i mymodule.pp: just do a quick setenforce 0 first and perform the action once. All of the AVCs involved in, say, creating a file will then show up in the log in one shot. Obviously turn enforcing mode back on afterwards.
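
A sketch of that workflow (the module name and scratch file are arbitrary):

setenforce 0                          # permissive: every AVC gets logged in one pass
# ...perform the failing action once...
ausearch -m avc -ts recent > fails    # collect the denials
audit2allow -M mymodule < fails       # build a local policy module from them
semodule -i mymodule.pp               # install it
setenforce 1                          # back to enforcing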

When in doubt, consult the colouring book. Yes, that's real.

Tags: linux, selinux

MS Documentation sucks (or how I got my VM hostnames to be set automatically from kickstart)

I wanted to automate my Linux VM deployment on my Hyper-V based lab infrastructure. One small flaw: while DHCP does automatically update DNS, that doesn't help much when your VM is named "localhost". I wanted to make the Fedora deployment completely automated... which it is after I wrote a kickstart, except that you can't get into the new box because you can't find its IP address.

I wrote a small tool to deal with this issue:
https://github.com/lf-/kvputil

You want the variable VirtualMachineName in /var/lib/hyperv/.kvp_pool_3.
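
If you just want a quick look without the tool, the pool file is a series of fixed-size, NUL-padded key/value records (per the documentation linked below), so something hacky like this works in a pinch:

# quick and dirty: squash the NUL padding to newlines and grab the value following the key
tr -s '\0' '\n' < /var/lib/hyperv/.kvp_pool_3 | grep -A1 '^VirtualMachineName$' | tail -n1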

Documentation that took way too long to find:
https://technet.microsoft.com/en-us/library/dn798287.aspx

Tags: hyper-v, linux

Launching PowerShell using the Win32 API

I was working on a personal project in C on Windows when I stumbled upon a really strange roadblock: a PowerShell instance would not actually run the script given to it when started via the Windows API, but it would when launched manually from cmd.exe.

Eventually the realisation came to me: PowerShell doesn't like the DETACHED_PROCESS option for CreateProcess(). I have no idea what it was doing with it there, but it didn't involve actually working.

I changed it to CREATE_NO_WINDOW and all is fine in the world.
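
A minimal sketch of the working version (not the actual project code; the script path is made up):

#include <windows.h>

int main(void)
{
    // the command line must be a writable buffer for CreateProcessW
    wchar_t cmd[] = L"powershell.exe -NoProfile -ExecutionPolicy Bypass "
                    L"-File C:\\scripts\\task.ps1";
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    // CREATE_NO_WINDOW: no console window, but the script actually runs;
    // DETACHED_PROCESS was the flag that silently broke it
    if (!CreateProcessW(NULL, cmd, NULL, NULL, FALSE,
                        CREATE_NO_WINDOW, NULL, NULL, &si, &pi))
        return 1;

    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}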

Tags: windows, PowerShell, win32

Setting up client certs for secure remote access to home lab services

Because I have some masochistic tendencies at times, I decided that it was a totally good idea™ to set up client certificate authentication to secure remote access to my lab services such as Grafana or Guacamole.

Unsurprisingly, since it's a rather uncommonly used finicky authentication method, there were problems. There were quite a few.

I'm writing this post mostly just for myself if I ever do this again, because it felt like it took too long to accomplish.

First, the list of requirements:

  • Should allow access without certs on the local network

  • Should use nginx

The latter was pretty easy, since I'm most familiar with nginx; the former was rather more interesting. I realized that, to implement it, I needed to make certificate verification optional and then enforce it manually, which meant either modifying the back ends (meaning maintaining patches, nope!) or doing it within nginx.

One issue is that nginx if statements are rather strange, presumably due to a simplistic grammar for parsing the configuration. There is no way to express an AND condition without hacks. The hack I chose was variable concatenation (which cannot be done inline in the if condition; it has to happen in its own separate if block). Here's how I enforce certs for non-LAN hosts:

if ( $ssl_client_verify != "SUCCESS" ) {
    set $clientfail "F";
}
if ( $client_loc = "OUT" ) {
    set $clientfail "${clientfail}F";
}
if ( $clientfail = "FF" ) {
    return 401;
}

$client_loc is defined in a geo block:

geo $client_loc {
    default OUT;
    10.10.0.0/16 IN;
    10.11.0.0/16 IN;
}

But defining ssl_client_certificate and setting up the clients would be too easy. In setting this up, I learned that nginx has an error message: "The SSL certificate error". Yes. That's an error message. It's so bad that it could have been written by Microsoft. Fortunately, it's very simple to add an error_log logs/debug.log debug; directive and get some slightly less cryptic details.

The big thing that tripped me up with the server setup was that ssl_verify_depth is set by default such that with a Root→Intermediate→Client hierarchy, clients fail to be verified. Set it to something like 3 and it will work.

Next, for the certificate setup:

The server directive ssl_client_certificate needs to point to a chain certificate file, or else it will fail with an error that suggests problems with the server certificate (thankfully).
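
Putting the server-side pieces together, the relevant directives end up looking something like this (the path is illustrative):

ssl_verify_client       optional;
ssl_client_certificate  /etc/nginx/certs/ca-chain.cert.pem;
ssl_verify_depth        3;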

The clients (for now, Chrome on Linux) need a PKCS#12 file with the certificate chain in it. Generate one with something like:

openssl pkcs12 -in client-chain.cert.pem -out client.pfx -inkey client.key.pem -export

where client-chain.cert.pem is a full chain from client to root CA and client.key.pem is a key file.

The other issue with the clients was that they didn't trust the CA imported as part of the pfx file for authenticating servers. This was quickly solved with a trip to the CA tab in Chrome's certificate settings.

The client certs used in this were from my CA and have the Client Authentication extended key usage enabled.

Tags: homelab, nginx, tls

NUT not finding my UPS + fix

I use a CyberPower CP1500AVRLCD as a UPS in my lab. I'm just now getting enough stuff running on it that I want automatic shutdown (because it won't run for long with the higher power draw of more equipment). So, I plugged it into the Pi that was running as a cups-cloud-print server and sitting on a shelf with my network equipment. The problem was that the NUT driver for it didn't want to load. As is frighteningly common, it's a permissions problem:

Here's the log showing the issue:

Jul 09 16:49:58 print_demon upsdrvctl[8816]: USB communication driver 0.33
Jul 09 16:49:58 print_demon upsdrvctl[8816]: No matching HID UPS found
Jul 09 16:49:58 print_demon upsdrvctl[8816]: Driver failed to start (exit status=1)

Here's the udev rule that fixes it:

ACTION=="ADD",SUBSYSTEM=="usb",ATTR{idProduct}=="0501",ATTR{idVendor}=="0764",MODE="0660",GROUP="nut"

When udev sees a device with USB product ID 0501 and vendor ID 0764 being added to the system, this rule changes the permissions on its device files (think /dev/bus/usb/001/004 and /devices/platform/soc/20980000.usb/usb1/1-1/1-1.3) so that the nut group can read and write them, allowing comms between the NUT driver and the device.
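
After dropping the rule into /etc/udev/rules.d/, reload and re-trigger (or just replug the UPS) so it takes effect:

udevadm control --reload-rules
udevadm trigger --subsystem-match=usb --action=add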

Tags: homelab, linux, raspberry-pi, udev

Windows folder extremely slow to load

Due to some weirdness, presumably thumbnail rendering, if a folder is set to "Optimize for Pictures", it takes 10+ times as long as it should to load. This was happening for my Downloads folder. It seems to only apply when the folder is accessed through "This PC".

Anyway, to fix it: in the properties of the folder in question, under the Customize tab, change "Optimize this folder for:" to "General items" and it will load much faster.

slowfolder

Tags: windows, small-fixes

Human error is the root of all problems, especially with network bridges

When in doubt, the problem is directly caused by one's own stupidity.

I was trying to run an LXD host in a Hyper-V VM and went to set up bridged networking (in this case, notworking). Twice. The good old rule that it's caused by my own stupidity rang very true. The problem was that the VM's network adapter wasn't allowed to change the MAC address on its packets. The toggle ("Enable MAC address spoofing") is in the VM settings, under the Advanced Features child node of the network adapter.
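
The same thing can be done from PowerShell on the host, something like (the VM name is an example):

Set-VMNetworkAdapter -VMName 'lxd-host' -MacAddressSpoofing On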

This is why you should have a routed network.

Tags: homelab, hyper-v, lxd, containers, networking