It feels good to be able to type up a post! It’s been a bit, and most of my free time has been allocated to the VCDX process. Shout out to my study group friends Jesper, Kyle, and Gio! Soon, I’ll start writing up the experiences that we have all encountered so that I can share them with all of you. It’s quite a journey, but that’s a story for another time. I also want to give a shout out to my buddy, Sandeep! He was in the weeds with me on this one. Anyways, let’s jump right into the meat of it, because this could really be a time saver.
What’s the deally?
Here’s the scoop: a customer of ours wants to upgrade to VCF 4.x. Through their partnership with VMware they have acquired services to assist with the upgrade, but since VMware still provides no direct upgrade path from 3.x to 4.x, I have to assume that they are still perfecting the process. VMware has told them straight up that they need to be at least at version 126.96.36.199, which makes me believe that this is going to be a hard requirement when the upgrade path becomes available. Anyhow, they are still sitting on VCF 188.8.131.52. We’ve got some upgrading to do. That’s where my team comes into play.
If you’re not familiar with VCF or version 3.x, in its most basic form we have a management domain with its own VCSA and PSC, backed by NSX-V on the network side and vSAN on the storage end. The workload domain also has its own VCSA and PSC, but it has NSX-T providing the network infrastructure, again with vSAN on the storage side. The entire instance is overseen by the SDDC Manager. What happens when you patch? Patching is executed through the SDDC Manager via Life Cycle Manager. First, the SDDC Manager applies its own patches. Next comes the management domain’s NSX-V, followed by the PSC, the VCSA, and finally the hosts. Rinse and repeat for the workload domain (minus the SDDC Manager, of course).
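To make that ordering concrete, here is a minimal Python sketch that just prints the patch sequence described above. It does not talk to SDDC Manager at all; the function name `patch_sequence` and the labels are made up for illustration, and putting NSX-T first in the workload domain is my reading of "rinse and repeat", not something taken from VMware documentation.

```python
from typing import List, Tuple

# Component order inside a single domain, as described in the post.
# "NSX" is a placeholder that is swapped for the domain's NSX flavor.
DOMAIN_ORDER = ["NSX", "PSC", "VCSA", "ESXi hosts"]


def patch_sequence(domains: List[Tuple[str, str]]) -> List[str]:
    """Return the patch steps in order.

    `domains` is a list of (domain name, NSX flavor) pairs, listed in the
    order the domains are patched.
    """
    steps = ["SDDC Manager"]  # patched once, before any domain
    for name, nsx_flavor in domains:
        for component in DOMAIN_ORDER:
            label = nsx_flavor if component == "NSX" else component
            steps.append(f"{name}: {label}")
    return steps


if __name__ == "__main__":
    sequence = patch_sequence([("management", "NSX-V"), ("workload", "NSX-T")])
    for number, step in enumerate(sequence, start=1):
        print(f"{number}. {step}")
```

With one management domain and one workload domain, that works out to nine steps, with the SDDC Manager always first.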
To try to make life easier, I went for the gusto and decided to do a skip-level upgrade from 184.108.40.206 to 220.127.116.11. When VCF says “skip-level upgrade,” they really mean “apply every patch from 18.104.22.168+ to 22.214.171.124” instead of what you would think a skip-level upgrade really means. So I downloaded the Bundle Transfer Utility and the Skip-Level Upgrade Tool from My VMware and got to work. Transferred the updates successfully, verified them, and we’re good to go. Simple. Let’s upgrade this bad boy.
Right out of the gate, we hit a snag -_- . The updates for the SDDC Manager didn’t just fail to complete; they failed to even start.
“Failed to run date command”. So naturally, the first thing we do is check the log file (trimmed to save you some time)…
2021-09-28 21:19:54.590 [main] INFO [console-logger]
2021-09-28 21:20:10.786 [main] ERROR [com.vmware.evo.sddc.lcm.common.utils.SshCommandRunner]
Command '[date, +%s]' execution failed:
com.jcraft.jsch.JSchException: Auth fail
2021-09-28 21:20:10.786 [main] ERROR [com.vmware.evo.sddc.lcm.tools.slu.util.RemoteFileOperations]
2021-09-28 21:20:12.895 [main] ERROR [com.vmware.evo.sddc.lcm.tools.slu.SkipLevelUpgradeApplication]
Skip Level Upgrade Tool error
com.vmware.evo.sddc.lcm.tools.slu.error.SkipLevelUpgradeException: Failed to run date command.
Caused by: java.lang.NullPointerException: null
... 4 common frames omitted
2021-09-28 21:20:12.911 [main] ERROR [console-logger]
SDDC Manager Skip Level Upgrade Tool failed with error: Failed to run date command.
Caused by: java.lang.NullPointerException
Well, the thing that stands out the most is the glaring “Auth fail” message. So, naturally, the first thing we did was make sure that the credentials for the Primary User, the Basic authentication user, and root on the SDDC Manager were all what we thought they were. Sure enough, after testing them we deemed them valid. Valid credentials? Check! What else?
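If you would rather not eyeball a long log for that one line, a small script can pull it out. This is a rough sketch, assuming the log keeps the `timestamp [thread] LEVEL [logger]` header format shown in the excerpt above; the helper names `error_blocks` and `first_exception` are my own, and `SAMPLE` is just a few lines copied from that excerpt.

```python
import re
from typing import List, Optional

# Matches header lines like:
# 2021-09-28 21:20:10.786 [main] ERROR [some.logger.Name]
HEADER = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \[[^\]]+\] (\w+) \[([^\]]+)\]"
)

SAMPLE = """\
2021-09-28 21:19:54.590 [main] INFO [console-logger]
2021-09-28 21:20:10.786 [main] ERROR [com.vmware.evo.sddc.lcm.common.utils.SshCommandRunner]
Command '[date, +%s]' execution failed:
com.jcraft.jsch.JSchException: Auth fail
2021-09-28 21:20:12.895 [main] ERROR [com.vmware.evo.sddc.lcm.tools.slu.SkipLevelUpgradeApplication]
Skip Level Upgrade Tool error
"""


def error_blocks(log_text: str) -> List[str]:
    """Group each ERROR header with the continuation lines that follow it."""
    blocks: List[str] = []
    current: Optional[List[str]] = None
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header closes whatever block was open.
            if current is not None:
                blocks.append("\n".join(current))
            current = [line] if match.group(1) == "ERROR" else None
        elif current is not None:
            current.append(line)
    if current is not None:
        blocks.append("\n".join(current))
    return blocks


def first_exception(log_text: str) -> Optional[str]:
    """Return the first continuation line that names an exception, if any."""
    for block in error_blocks(log_text):
        for line in block.splitlines()[1:]:  # skip the header line itself
            if "Exception" in line:
                return line.strip()
    return None


if __name__ == "__main__":
    print(first_exception(SAMPLE))
```

The idea is simply that the earliest exception is usually the interesting one; the later "Failed to run date command" and NullPointerException lines are fallout from the failed SSH login.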
We tried everything. I mean everything. Well, everything except one thing, clearly. We checked everything from DNS resolution on the utility VM (to make sure we were in fact hitting the SDDC Manager) to SSH versions, and we even tried running the upgrade from a RHEL 8 box instead of a Windows one. I’ll save you all dozens of minutes of reading and myself hours of typing by leaving out the rest of the troubleshooting details. Eventually, a white flag was raised and a support case with VMware was opened. After troubleshooting with support for a few hours, the ticket was escalated to engineering, who then escalated it internally. The response finally came back, and it wasn’t an actual resolution so much as one of those things that makes the lightbulb go off. The engineer mentioned that the SLU CLI tool “makes direct reference to the Primary User, Basic authentication user, and root passwords that are entered.” Hmmm.
I had an idea. I took a look at the passwords that they were using. They were complex with a myriad of letters, numbers, and…special characters. The root password, in particular, caught my eye because it had a ‘#’ in it. On a hunch, we changed the root password to something basic and re-ran the SLU CLI upgrade process…
BOOM! The upgrade is finally underway! I don’t want to hold you in suspense, so I’ll tell you that the SDDC Manager wound up completing successfully after this!
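Since a ‘#’ in the root password was what tripped us up, a quick pre-flight check before kicking off the tool could save you the support case. This is only a sketch: the helper names are hypothetical, and `SUSPECT_CHARS` contains just ‘#’ because that is the only character we actually saw cause trouble. Whether other special characters are affected is unknown to me, so extend the set at your own discretion.

```python
from typing import Dict, List

# Assumption: '#' is the only character observed (in this post) to break
# the SLU CLI tool. Add others here if you find them.
SUSPECT_CHARS = "#"


def suspect_positions(password: str, suspects: str = SUSPECT_CHARS) -> List[int]:
    """Return the zero-based positions of suspect characters in a password."""
    return [index for index, char in enumerate(password) if char in suspects]


def preflight(passwords: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each account label to its suspect positions, listing only accounts with hits."""
    hits: Dict[str, List[int]] = {}
    for account, password in passwords.items():
        positions = suspect_positions(password)
        if positions:
            hits[account] = positions
    return hits


if __name__ == "__main__":
    # Example values only; never hard-code real credentials.
    report = preflight({"root": "Vmw@re#1", "primary user": "Simple123"})
    for account, positions in report.items():
        # Report positions rather than echoing the password itself.
        print(f"{account}: suspect character(s) at position(s) {positions}")
```

If an account shows up in the report, temporarily switch it to a simpler password for the upgrade and rotate it back afterwards, which is exactly what we ended up doing.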
There’s apparently a bug in 126.96.36.199 as it relates to password complexity. A ‘#’ in the password seems to cause a problem for the SLU CLI tool. We reset the root password to something basic to initiate the upgrade, and once the upgrade completed we set it right back to a complex password. Thanks for reading. If you enjoyed the post, make sure you check us out at dirmann.tech and follow us on LinkedIn, Twitter, Instagram, and Facebook!
Paul Dirmann (vExpert*, VCIX-DCV, VCAP-DCV Design, VCAP-DCV Deploy, VCP-DCV, VCA-DBT, C|EH, MCSA, MCTS, MCP, CIOS, Network+, A+) is the owner and current Lead Consultant at Dirmann Technology Consultants. A technology evangelist, Dirmann has held both leadership and technical positions, architecting and engineering solutions for multiple multi-million dollar enterprises. While knowledgeable in most facets of information technology, Dirmann has honed his expertise in VMware’s line of solutions, with a primary focus on hyper-converged infrastructure (HCI) and software-defined data centers (SDDC), server infrastructure, and automation. Read more about Paul Dirmann here, or visit his LinkedIn profile.