UCS Director provisioning fails: “The computer restarted unexpectedly” message in the VMware console

This is one of those “no kidding” moments when you find out what went wrong, but it can be incredibly frustrating when it happens. Check out this screenshot I took when recently trying out a new Server 2012 R2 template:

UnexpectedRestart

“The computer restarted unexpectedly or encountered an unexpected error. Windows installation cannot proceed. To install Windows, click “OK” to restart the computer, and then restart the installation.”

I also got a second error saying the unattend file was invalid, but I didn’t capture the exact text.

It turns out this was the result of reusing the same Standard Catalog item I had used earlier to test a Server 2008 R2 template:

ModifyCatalog


I had changed the template but not the Windows License Pool, so UCS Director was trying to use a Server 2008 R2 key for a Server 2012 R2 installation, which naturally doesn’t work.
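One cheap way to catch this before provisioning is to compare the OS of each catalog item's template against the OS its license pool was built for. The sketch below assumes you can export that pairing, either from UCS Director or from your own inventory, into plain records. `CatalogItem` and `find_mismatches` are hypothetical names and are not part of any UCS Director API.

```python
from dataclasses import dataclass


@dataclass
class CatalogItem:
    """Hypothetical record pairing a catalog item with its template and license pool."""
    name: str
    template_os: str      # OS of the VM template, e.g. "Windows Server 2012 R2"
    license_pool_os: str  # OS that the keys in the Windows License Pool are for


def _normalize(os_name: str) -> str:
    # Compare case-insensitively and ignore extra whitespace.
    return " ".join(os_name.lower().split())


def find_mismatches(items: list[CatalogItem]) -> list[str]:
    """Return one warning per catalog item whose template and license pool disagree."""
    return [
        f"{item.name}: template is {item.template_os!r}, "
        f"license pool is {item.license_pool_os!r}"
        for item in items
        if _normalize(item.template_os) != _normalize(item.license_pool_os)
    ]


if __name__ == "__main__":
    catalog = [
        CatalogItem("Standard", "Windows Server 2012 R2", "Windows Server 2008 R2"),
        CatalogItem("Legacy", "Windows Server 2008 R2", "Windows Server 2008 R2"),
    ]
    for warning in find_mismatches(catalog):
        print(warning)
```

Only the "Standard" item is flagged. This matches the situation above, where the template was updated and the pool was not.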

UCS Director: Waiting for network status of VM…

So here is something that has been bugging me since we deployed Cisco’s UCS Director to manage our VMware infrastructure:

Max Wait Time 10

That message appears in the service request log whenever you provision a VMware VM. It comes from the standard “VMWare VM Provision” task that ships with UCSD.

Why? What is it doing?

We originally thought the UCS Director virtual appliance was pinging the new VM’s IP address, getting no reply because ICMP is blocked on our network, and then giving up after the timeout and assuming (correctly, in our case) that the VM was up anyway. Even after we allowed ICMP traffic, however, the problem persisted.

I contacted Cisco’s UCS Director engineering group and was basically told that yes, it’s pinging, and no, there’s no way to lower the 10-minute threshold. That was on 5.0; we’re on 5.2.0.0 today.
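Cisco's description amounts to a poll-until-reachable loop with a deadline. Below is a generic sketch of that pattern. It is not UCS Director's code. `wait_for_vm` and `tcp_probe` are hypothetical names, and the probe replaces ICMP ping with a TCP connect to RDP port 3389. That swap is an assumption, made because raw ICMP needs elevated privileges from Python. The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int = 3389, timeout_s: float = 3.0) -> Callable[[], bool]:
    """Return a probe that succeeds when `host` accepts a TCP connection on `port`."""
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:  # refused, unreachable, or timed out
            return False
    return probe


def wait_for_vm(probe: Callable[[], bool],
                max_wait_minutes: float,
                poll_interval_s: float = 15.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it succeeds or `max_wait_minutes` elapses.

    Returns True if the VM answered, False if the deadline passed first.
    """
    deadline = clock() + max_wait_minutes * 60
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

For example, `wait_for_vm(tcp_probe("10.0.0.25"), max_wait_minutes=4)` (the address is hypothetical) returns as soon as the VM accepts a connection. A VM that is already up therefore costs one probe rather than the full timeout. A probe that never succeeds, such as a ping that a firewall silently drops, uses up the entire max wait before the function gives up.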

I inherited the Director installation from another engineer and a couple of weeks ago I finally had the opportunity to start reading the administration guide. As I was going through the guide, I was looking at our settings to see how everything had been configured. When I got to the System Policies section, I noticed a new (AND UNDOCUMENTED!!!) field between “WINS Server List” and the “Auto Logon” checkbox*:

Max Wait Time 10 Setting

Ooooohhh!!! I wonder what that does!

Surprised Batman

So naturally, I changed it on one of my policies and deployed a new VM:

Max Wait Time 4 Setting

And the results:

Max Wait Time 4

In my environment, there is no reason to wait even 4 minutes. By the time Director gets to the “Waiting for status of VM” step, the VM has already booted with all customizations complete. The 10-minute wait was completely unnecessary.

Now if only I can figure out why it waits ten minutes in the “Waiting for VM list discovery” step…

*Incidentally, there is another undocumented field between the “WINS Server List” and the “Auto Logon” checkbox: a checkbox for “Create a unique SID”. I’m slightly afraid to touch that box, since everything is working today.

Virtual Machine Manager Error ID 2607

I’m setting up an instance of Virtual Machine Manager 2012 R2 in our multitenant environment. The MT domain is separate from the CORP domain. Most of our initial users will be in CORP, but over time we will bring on more and more users who use ADFS to log in to the MT domain.

There is a one-way trust between MT and CORP: MT trusts CORP, but not the other way around. When we set up SQL, we used an account on MT to run SQL Server and VMM. Users in MT can log in to VMM just fine, but users in CORP get the following error:

VMM_SQL_Error_2607

The SQL Server service account does not have permission to access Active Directory Domain Services (AD DS).

Ensure that the SQL Server service is running under a domain account or a computer account that has permission to access AD DS. For more information, see “Some applications and APIs require access to authorization information on account objects” in the Microsoft Knowledge Base at http://go.microsoft.com/fwlink/?LinkId=121054.

ID: 2607

What this error means is that the SQL Server Service Account cannot authenticate the user on CORP, because the CORP Active Directory server is telling it “I don’t trust you, go away!”

Since MT trusts CORP, an easy way to fix this is to change the SQL Server service account to one on the CORP domain, but that’s ugly and makes me uncomfortable.

I’m not actually sure what the answer is here. We are also deploying federation, so perhaps I will try to federate an MT account to CORP and try that.

The connection cannot be completed because the remote computer that was reached is not the one you specified.

I was doing some maintenance on some Citrix Provisioning Services servers, which are VMs running on VMware. After a reboot, I got this error when trying to RDP back into one of the servers:

RDP_DNS_Error

The connection cannot be completed because the remote computer that was reached is not the one you specified. This could be caused by an outdated entry in the DNS cache. Try using the IP address of the computer instead of the name.

In the Windows System Event Log, we see this error:

Log Name: System
Source: Microsoft-Windows-Security-Kerberos
Date: 12/10/2013 10:14:28 AM
Event ID: 4
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ClientName.DomainName
Description:
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server servername$. The target name used was TERMSRV/SERVERNAME. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (DomainName) is different from the client domain (DomainName), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

Initially I ran ipconfig /flushdns on the client because that’s kind of what the RDP error told me to do. When that didn’t fix it, a coworker mentioned that he had seen it before.

When this VM came back up, its time was all jacked up in vCenter: it was reporting a time in GMT when the server is actually in EST. That, apparently, was enough to set off alarm bells. Kerberos logged the error above, and RDP decided the server was being impersonated!

We fixed the time in vCenter and rebooted the VM. Success!
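Kerberos is time-sensitive. Active Directory's default "Maximum tolerance for computer clock synchronization" policy allows 5 minutes of skew. If the VM's clock was actually off by the GMT/EST offset, that is five hours in December, far outside the window. The helpers below are a hypothetical sketch for measuring the skew between two timezone-aware timestamps, for example a VM's reported time against a domain controller's. How you collect those timestamps is left out, and the EST constant is a fixed UTC-5 offset that ignores daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Default value of the AD Kerberos policy
# "Maximum tolerance for computer clock synchronization".
DEFAULT_MAX_SKEW = timedelta(minutes=5)

# Fixed UTC-5 offset; fine for a December example, ignores daylight saving.
EST = timezone(timedelta(hours=-5), "EST")


def clock_skew(host_time: datetime, reference_time: datetime) -> timedelta:
    """Absolute difference between two timezone-aware timestamps."""
    return abs(host_time - reference_time)


def within_kerberos_tolerance(host_time: datetime,
                              reference_time: datetime,
                              max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the skew fits inside the (default) Kerberos tolerance."""
    return clock_skew(host_time, reference_time) <= max_skew


if __name__ == "__main__":
    # Real wall-clock time on the domain (EST) ...
    actual = datetime(2013, 12, 10, 10, 14, tzinfo=EST)
    # ... versus a VM clock showing the same digits but treating them as GMT/UTC.
    vm_clock = actual.replace(tzinfo=timezone.utc)
    print(clock_skew(vm_clock, actual))                 # 5:00:00
    print(within_kerberos_tolerance(vm_clock, actual))  # False
```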

.NET Runtime version 2.0.50727.5420 – Fatal Execution Engine Error (000007FEF081AF0E) (80131506)

We had a problem on Citrix XenApp Shared Desktops where certain .NET apps were crashing: in our case, an in-house app, a vendor app, and a Microsoft Office app.

In the Windows Application Event Log, we were getting several errors:

Log Name: Application
Source: .NET Runtime
Date: 12/5/2013 6:56:31 AM
Event ID: 1023
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: ServerName
Description:
.NET Runtime version 2.0.50727.5420 – Fatal Execution Engine Error (000007FEF081AF0E) (80131506)

Log Name: Application
Source: Application Error
Date: 12/5/2013 6:56:32 AM
Event ID: 1000
Task Category: (100)
Level: Error
Keywords: Classic
User: N/A
Computer: ServerName
Description:
Faulting application name: ApplicationName, version: 1.0.0.0, time stamp: 0x524c8c08
Faulting module name: mscorwks.dll, version: 2.0.50727.5420, time stamp: 0x4ca2b7e1
Exception code: 0xc0000005
Fault offset: 0x00000000001d1908
Faulting process id: 0x%9
Faulting application start time: 0x%10
Faulting application path: %11
Faulting module path: %12
Report Id: %13

Log Name: Application
Source: RISH
Date: 12/5/2013 6:56:32 AM
Event ID: 1307
Task Category: (2)
Level: Error
Keywords: Classic
User: N/A
Computer: ServerName
Description:
Failed to get format message from C:\Windows\System32\wer.dll with Message ID 3e8.
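When several apps crash at once, it helps to tally which module the Application Error entries name. In the entry above, the faulting module is mscorwks.dll, the .NET 2.0 runtime. The sketch below assumes the events were copied out of Event Viewer as text in the layout shown above: one `Key: value` per line, a `Description:` line, and a blank line before the next `Log Name:`. `parse_events` and `faulting_modules` are hypothetical helpers.

```python
import re
from collections import Counter


def parse_events(text: str) -> list[dict[str, str]]:
    """Split a text dump of event-log entries into one dict per event."""
    events = []
    # A new event starts at a blank line followed by "Log Name:".
    for chunk in re.split(r"\n\s*\n(?=Log Name:)", text.strip()):
        fields: dict[str, str] = {}
        description: list[str] = []
        in_description = False
        for line in chunk.splitlines():
            if in_description:
                description.append(line)  # everything after "Description:"
            elif line.startswith("Description:"):
                in_description = True
            elif ":" in line:
                key, _, value = line.partition(":")  # split on the first colon only
                fields[key.strip()] = value.strip()
        fields["Description"] = "\n".join(description).strip()
        events.append(fields)
    return events


def faulting_modules(events: list[dict[str, str]]) -> Counter:
    """Count the 'Faulting module name' values found in event descriptions."""
    counts: Counter = Counter()
    for event in events:
        match = re.search(r"Faulting module name:\s*([^,]+)", event.get("Description", ""))
        if match:
            counts[match.group(1).strip()] += 1
    return counts
```

Filtering the result on `Source == ".NET Runtime"` and `Event ID == "1023"` picks out the Fatal Execution Engine Errors. Running `faulting_modules` over the same dump shows whether the crashes all point at the runtime rather than at the individual apps.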

What we determined was that the Citrix Virtual Memory Optimization service was messing with .NET DLLs, so we disabled it. As long as that service isn’t running, we don’t get those errors anymore.

Note that there may also be a scheduled task called “Memory Optimization Schedule” that needs to be disabled. This task re-enables and starts the service at reboot and at 3 AM.
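If you need to repeat this on several XenApp servers, both changes map onto standard Windows tools. `schtasks /Change /TN <name> /DISABLE` handles the task, and `sc config <service> start= disabled` plus `sc stop <service>` handle the service. The Python sketch below only builds those commands and, optionally, runs them from an elevated prompt. It disables the task first so the task cannot turn the service back on. `SERVICE_NAME` is a placeholder; we are not asserting the service's real short name, so look it up in services.msc first. If the task lives in a Task Scheduler subfolder, `/TN` needs its full path. By default the script does a dry run that just prints the commands.

```python
import subprocess

# PLACEHOLDER: not the verified short name of the Citrix Virtual Memory
# Optimization service; check services.msc on your server.
SERVICE_NAME = "<CitrixMemoryOptimizationService>"
# Task name as seen in our environment; prefix it with its folder path if needed.
TASK_NAME = "Memory Optimization Schedule"


def build_commands(service: str, task: str) -> list[list[str]]:
    """Commands that keep the memory optimization service off, in order."""
    return [
        # Disable the task first so it cannot re-enable the service at reboot/3 AM.
        ["schtasks.exe", "/Change", "/TN", task, "/DISABLE"],
        # sc.exe expects "start=" and its value as separate tokens.
        ["sc.exe", "config", service, "start=", "disabled"],
        # Stop the running instance; returns non-zero if it is already stopped.
        ["sc.exe", "stop", service],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it and report its exit code."""
    for cmd in commands:
        if dry_run:
            print("would run:", subprocess.list2cmdline(cmd))
            continue
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(subprocess.list2cmdline(cmd), "->", result.returncode)


if __name__ == "__main__":
    run(build_commands(SERVICE_NAME, TASK_NAME))  # pass dry_run=False to apply
```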