The black screen of death
The logon process for users accessing a XenApp/Virtual App-environments is not completely simple to explain or understand in its entirety. There are several processes and services that need to work together, to let a user log on and begin to work in a virtual session. An issue that is not especially uncommon with regards to the logon process is what I would like to call the Black screen of death, BSOD. This should not be confused by the other BSOD! 🙂 When an environment has black screen issues I know that the troubleshooting and eventually finding a solution could most likely be long and challenging.
There have been several discussions regarding black screens at logon lately, especially when looking at Virtual Apps (i.e. XenApp) and published desktops. There are some obvious, and quite straight forward reasons why users get a black screen at logon. I’m not going to get into those in this blog post, apart from mentioning two really good articles from Citrix on the subject; XenApp/XenDesktop : Black Screen Is Displayed While Launching A Published Applications From Windows Server 2016 VDA  and XA/XD – Black or Blue Screen Connecting to Published Desktop .
I would also like to shed some light on a second ”Black Screen-issue” also currently discussed, the Windows-service AppReadiness and black screen at logon. Funnily enough, it seems like that issue is also introduced with VDAs newer than 7.15 CU1. If there’s an interest in diving into that issue too, I’m happy to do so in another blog post. My explanation of that issue can be found on the Citrix Discussion forum .
Last but not least, the latest of all ”Black Screen-issues” I have encountered, and the topic of today’s blog post.
Users log on to a published desktop where the VDA is newer than 7.15 CU1, in my case i tried them all, 7.16, 7.17, and the newly released version 7.18. The session went black at logon and explorer.exe did not start. Even after waiting for more than 30 minutes. It did not matter if it was new profile or existing, in this case Citrix User Profile Mgmt, nor did it matter if the VDA was newly installed or updated from 7.15 CU1. Sending CTRL+ALT+DEL did not do a thing.
Everything worked fine on VDA 7.15 CU1 and previous versions, the only change I did to the MCS image when this occured was updating the VDA.
I did some initial trial and error without any luck, so I decided to use my favorite troubleshooting tool, Process Monitor (aka Procmon). Within a couple of minutes I noticed that there was a process stuck in some kind of never-ending loop when a user tried to log on to the VDA. The process stuck was the ”Citrix Profile management message utility”, upmEvent.exe .
What I also could see was that the process upmEvent.exe was the last process during the logon before the login process got stuck, and the user got the BSOD. I could not at the time identify exactly why, other than I knew which process broke the attempted login. It didn’t matter if it was a new or existing profile.
After having identified the culprit process I forcefully terminated it, and boom, the login process progressed as we are used to. Explorer.exe and all the other processes eventually started like nothing was wrong. From a user perspective, everything began to work and the desktop was shown as soon as the process upmEvent.exe was terminated.
From experience I knew that this was not the first time that specific process have have had different kind of issues. If you do a quick Google search on “upmEvent.exe” you will see that there have been some interesting issues with it over the past. The last change I know of, were when customers needed help because Citrix made a change in how it should be configured to upload data to Citrix Director. In short that change was needed because we hade to change from using UpmUserMsg.exe to upmEvent.exe. I also knew that the startup of the process had been changed previously, from the Run-key to the Userinit-key. From this I had reason to believe that this scenario might not be very different from last time  .
I knew that upmEvent.exe by default has moved from the legacy Run-key to Userinit starting the process in user context. I also knew that the way the process needs to be configured has historically changed depending on what VDA-version is used. What I finally knew was that the configuration of the process is usually controlled in one way or another, for example with a scheduled task, GPP, GPO, registry, or something completely else.
I did a quick check to verify that the Key changed between my two VDA-versions.
Indeed, there’s a difference! Closer to the solution, great!
In this specific environment I found out that the user-context startup of the upmEvent.exe-process was made with a GPO. When looking at the configuration I could see that it was configured in the old way of using upmEvent.exe. Not the new way of doing it!
When the VDA was updated to a newer version than 7.15 CU1 the GPO was reconfigured at the same time. In this case we removed the logon script and let the VDA configure the Userinit registry value. When the MCS machine was rolled out everything worked as it should, even though the VDA was updated!
I didn’t do more digging than needed, as I could see that everything started to work after the reconfiguration. It seems like newer versions of the VDA, and the move to Userinit, collide with the GPO configuration. Because of the collide the users gets a black screen at logon. A deadlock occurs when the script and Userinit is configured to run the process at the same time.
Hope this helps someone out there!