April 12, 2020 | by bgarmon
What did I do this weekend? Thank you for asking.
The #homelab is getting a substantial upgrade, moving from Netgate pfSense appliances and HP hardware to a full Ubiquiti stack with the UniFi Dream Machine Pro, the UniFi Gen2 Switch Pro and a 3-pack of the UniFi nanoHD Access Points. The switch and wireless access points arrived on Friday evening, which meant Saturday was new hardware unboxing day. Due to Covid-19 shipping delays, the UDM Pro won’t arrive until Monday.
During the network installation on Saturday, my vCenter 6.7 virtual appliance decided it was time to die. It was running on local storage on one of my Intel NUC devices. A few hours of troubleshooting eventually led to formatting the hard drive of the host server and doing a fresh install. Since I was going with a fresh install anyway, why not go with the hot-off-the-press vSphere 7.0, both ESXi 7 and vCenter 7.0? I also decided it was time to go with shared storage for the virtual appliance. I went to bed Saturday feeling good after everything went smoothly with the install.
I woke up Sunday morning to find my whole house network running slow, and when my daughter complained she was unable to watch Plex, I logged into vCenter Server to find a pink screen of death waiting for me.
The ESXi host is a Dell PowerEdge T640, so I logged into the iDRAC and found a second set of hardware errors that pointed to a problem with one of the sticks of RAM in the server. It had only been a week since the server in question received a RAM upgrade, so I powered everything off, reseated the offending RAM stick and rebooted. All was well, except for vCenter Server 7. It died just like its predecessor. And hard. No big deal: delete the VM, install a new one just like the day before, and back to normal, right? Nope. Let me share the nightmare with you.
I use 7th Gen Intel NUCs in the #homelab because they take up very little space. The one running vCenter decided it no longer wanted to boot. The vCenter Server 7 appliance had started its life on the NUC but migrated to the Dell, which unfortunately is a one-way street: the Dell is running a newer CPU, and while EVC mode is enabled for the NUC cluster, the Dell is a standalone server that is not part of a cluster, so EVC mode is not enabled on it. I’m also using a 5GbE USB-C adapter on these NUCs because the #homelab is 10GbE, so VMware’s migration wizard would only migrate vCenter to the Dell if it got to keep using a 10GbE connection. That is fine because the Dell has 10GbE as well, but the Dell is ultimately not where I wanted vCenter to live.
So when the vCenter 7 server died Sunday morning, rather than bother troubleshooting, I just deleted the appliance and tried to set up a new one on the NUC. During Phase 1 of the vCenter Server setup, I would make it to 99% and then the install would die. The installer.log was full of:
error: Request timed out after 30000 ms
when trying to connect to the newly deployed appliance.
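For anyone chasing the same symptom, here is a minimal Python sketch of how those timeout lines could be tallied instead of eyeballed. It only assumes that each failure shows up as a line containing the message quoted above; the function name `count_timeouts` and the sample lines are made up for illustration, and the rest of the installer.log layout is not something I'm claiming to know.

```python
import re
from collections import Counter

# Matches the timeout message quoted above. Everything else about the
# log line (timestamps, prefixes) is ignored, since that layout is an assumption.
TIMEOUT_RE = re.compile(r"Request timed out after (\d+) ms")


def count_timeouts(lines):
    """Return a Counter mapping timeout length in ms to how often it appears."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines; in practice, pass an open file handle instead.
sample = [
    "error: Request timed out after 30000 ms",
    "info: some unrelated message",
    "error: Request timed out after 30000 ms",
]
print(count_timeouts(sample))
```

Because an open file iterates line by line, passing the result of `open()` on the real log works the same way as the sample list.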
It turns out the NUC is not at all happy with the VMware Fling 7.0 USB-C driver. As long as the USB-C adapter was physically connected, the install would always fail at that 99% mark. It took 6 attempts at the install to figure this out, as the error logs would point to failures at different points and, well, I’m dense. So I disconnected the USB-C 5GbE adapter and thought I would be home free. But no, instead the NUC decided to stop POSTing and was DOA.
It’s Easter Sunday, mind you, and while my girlfriend, my son and my daughter are here in “shelter in place” with me, begging me to play board games and Fortnite, I’m instead screwdriver-deep in the guts of a NUC, trying to find the CMOS battery to yank it so I can hopefully get this thing to boot. Frustrated, I ended up breaking one of the plastic brackets that connect the SATA hard drive to the motherboard, but figured that as long as one side was in and the pins were not damaged, I would be OK. I decided that while I was in there I would go ahead and add an M.2 drive to the system as well, so I did that. I get the thing reassembled and boom, POST. Finally! I format the two hard drives, install ESXi 7, configure ESXi 7, and start the vCenter Server 7 installation. I make it past Phase 1. I’m 59% into the install when I glance over at the NUC and notice it suddenly no longer has power.
I have to hand it to the VMware installer logic, though: that thing waits a full 5 minutes before it finally figures out that something is wrong and dies.
By this point I’ve lost half the day to troubleshooting and can no longer ignore the pleas of the kids. Defeated (for now) by the NUC that no longer boots (again), I pull on the gaming headphones and our family of 4 proceeds to unleash hell on unsuspecting 12-year-olds the world over in Fortnite. We finished 1st in Fortnite several times… a Happy Easter indeed.