Thursday, October 16, 2008

WPF Threading Model via Windbg

In this post, I will show you how to discover the threading model of WPF via Windbg. This post is not a complete description of WPF threading model, but just listed what I've found after my efforts.

Windbg is a powerful debugger, though may be not the most powerful one. It can really help you to find root causes for some though bugs. Download Windbg here.

To dive into WPF, first, create a simple WPF application. Just use Visual Studio to create a default WPF application project. And build the application to get executable generated. Then, launch Windbg, and load it into Windbg. Don't forget to load Windbg extensions such as sos or psscor2, to help your debugging.

OK, let the application run. When you see the application window, break the application by clicking the 'break' button on the Windbg's toolbar. Let's go to thread 0 and see its stack by these two commands:

~0s
!clrstack

The first command switches current thread to 0, and the second will display CLR stack. It's very clear that thread 0 is our UI thread.

0030e824 56d5dfe0 MS.Win32.HwndSubclass.SubclassWndProc(IntPtr, Int32, IntPtr, IntPtr)
0030ec5c 00a20bd3 [InlinedCallFrame: 0030ec5c] MS.Win32.UnsafeNativeMethods.CallWindowProc(IntPtr, IntPtr, Int32, IntPtr, IntPtr)
0030ec58 56d5f0fe MS.Win32.HwndSubclass.DefWndProcWrapper(IntPtr, Int32, IntPtr, IntPtr)
0030ede8 00a20bd3 [NDirectMethodFrameStandalone: 0030ede8] MS.Win32.UnsafeNativeMethods.CallWindowProc(IntPtr, IntPtr, Int32, IntPtr, IntPtr)
0030ee04 56d5dfe0 MS.Win32.HwndSubclass.SubclassWndProc(IntPtr, Int32, IntPtr, IntPtr)
0030ef90 00a20bd3 [NDirectMethodFrameStandalone: 0030ef90] MS.Win32.UnsafeNativeMethods.DispatchMessage(System.Windows.Interop.MSG ByRef)
0030efa0 56d61824 System.Windows.Threading.Dispatcher.PushFrameImpl(System.Windows.Threading.DispatcherFrame)
0030eff0 56d616c7 System.Windows.Threading.Dispatcher.PushFrame(System.Windows.Threading.DispatcherFrame)
0030f00c 56d6162d System.Windows.Threading.Dispatcher.Run()
0030f018 5533ddb0 System.Windows.Application.RunInternal(System.Windows.Window)
0030f044 5533dbe5 System.Windows.Application.Run(System.Windows.Window)
0030f054 5533d836 System.Windows.Application.Run()
0030f05c 003500ad SimpleWPFApp.App.Main()
0030f284 79e7c74b [GCFrame: 0030f284]

You may want to try other threads, and will find they are not CLR threads. And next step, let's try to find which thread is render thread. Use command 'kb' repeatedly and you will find one thread looks interesting. It is trying to GetWork and is waiting for something.

0:005> kb
ChildEBP RetAddr Args to Child
0491fab0 77719254 7679c244 00000234 00000000 ntdll!KiFastSystemCallRet [d:\rtm\base\ntos\rtl\i386\userdisp.asm @ 645]
0491fab4 7679c244 00000234 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc [o:\rtm.obj.x86fre\base\ntdll\daytona\objfre\i386\usrstubs.asm @ 2939]
0491fb24 7679c1b2 00000234 ffffffff 00000000 KERNEL32!WaitForSingleObjectEx+0xbe [d:\rtm\base\win32\client\synch.c @ 1524]
0491fb38 72790b19 00000234 ffffffff 004dd410 KERNEL32!WaitForSingleObject+0x12 [d:\rtm\base\win32\client\synch.c @ 1429]
0491fb68 7278aadc 0491fb7c 004dd410 004dd4a0 MilCore!CPartitionManager::GetWork+0x1a5 [d:\rtm\windows\mil\core\uce\partitionmanager.cpp @ 1111]
0491fb80 7278aaa4 00000000 00000000 004dd410 MilCore!CPartitionThread::Run+0x1a [d:\rtm\windows\mil\core\uce\partitionthread.cpp @ 222]
0491fb98 76794911 004dd410 0491fbe4 776fe4b6 MilCore!CPartitionThread::ThreadMain+0x1e [d:\rtm\windows\mil\core\uce\partitionthread.cpp @ 46]
0491fba4 776fe4b6 004dd410 73edafdb 00000000 KERNEL32!BaseThreadInitThunk+0xe [d:\rtm\base\win32\client\thread.c @ 66]
0491fbe4 776fe489 7278aa86 004dd410 00000000 ntdll!__RtlUserThreadStart+0x23 [d:\rtm\base\ntos\rtl\rtlexec.c @ 2740]
0491fbfc 00000000 7278aa86 004dd410 00000000 ntdll!_RtlUserThreadStart+0x1b [d:\rtm\base\ntos\rtl\rtlexec.c @ 2672]

By checking the call stack, we find there's an argument passed to WaitForSingleObject and WaitForSingleObjectEx. The argument is 00000234. It looks like some synchronization objects. Let's check what is it:

0:000> !handle 0x234 ff
Handle 234
Type Event
Attributes 0
GrantedAccess 0x1f0003:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState,ModifyState
HandleCount 2
PointerCount 4
Name <none>
Object Specific Information
Event Type Auto Reset
Event is Waiting

Windbg tells us it's an event handle. Hmmm, that's reasonable. If you continue to check other thread's call stack, and you will find there's only one thread is waiting for some synchronization events. This thread must be render thread, because communication between render thread and UI thread must be based on synchronization object. So, next task is to find where the event been set. Set breakpoint on KERNEL32!SetEvent won't work for us because there are so many calls to SetEvent everywhere in the application. Windbg offers a kind of conditional breakpoint which can help us to break the application at the right position.

bp kernel32!SetEvent "j (edx=234) '.echo foo';'gc'"

Here 234 is the event handle, remember that we found this handle in render thread. You should replace it with the handle you find in your situation. Let the application go. The breakpoint is so great though, it seems that it can't be hit every time when the code goes to the right position. I don't know why. But if you try to resize the application's window repeatedly, you will have greater chance to hit the breakpoint. When the application got hit the breakpoint, you can see the call stack of thread 0 like this:

0:000> kb
ChildEBP RetAddr Args to Child
0030d95c 72748a36 00000234 004de51c 004b3a18 KERNEL32!SetEvent+0x14 [d:\rtm\base\win32\client\synch.c @ 506]
0030d978 727489d1 004b3a18 004de2f0 00000000 MilCore!CPartitionManager::ScheduleBatchProcessing+0x54 [d:\rtm\windows\mil\core\uce\partitionmanager.cpp @ 482]
0030d9d0 7274974e 0030d9ec 0030da24 004e7188 MilCore!CComposition::SubmitBatch+0x96 [d:\rtm\windows\mil\core\uce\composition.cpp @ 359]
0030d9e4 727488ba 004b3a18 0030da34 004de1d0 MilCore!CConnectionContext::SendBatchToChannel+0x30 [d:\rtm\windows\mil\core\uce\connectioncontext.cpp @ 568]
0030d9f8 72748868 00000004 004e7294 0030da3c MilCore!CConnectionContext::SendPacketToConnection+0xee [d:\rtm\windows\mil\core\uce\connectioncontext.cpp @ 450]
0030da08 727491cd 0030da24 01d47b28 00000003 MilCore!CMilCrossThreadTransport::PostPacket+0x3b [d:\rtm\windows\mil\core\uce\crossthreadtransport.cpp @ 145]
0030da3c 727491ff 004e7294 004e7208 72742c35 MilCore!CMilConnection::SubmitBatch+0xa9 [d:\rtm\windows\mil\core\uce\connection.cpp @ 619]
0030da48 72742c35 01d47b28 72742c9f 004e7208 MilCore!CMilChannel::FlushCommands+0x21 [d:\rtm\windows\mil\core\uce\clientchannel.cpp @ 721]
0030da50 72742c9f 004e7208 0030daa0 5395bc2d MilCore!CMilChannel::Commit+0x20 [d:\rtm\windows\mil\core\uce\clientchannel.cpp @ 626]
0030da5c 5395bc2d 004e7208 a1590ca7 79e73940 MilCore!MilChannel_CommitChannel+0x21 [d:\rtm\windows\mil\core\uce\apifunc.cpp @ 523]
0030daa0 539bb1b3 0000004f 00000003 00000000 PresentationCore_ni!`dynamic atexit destructor for '<CrtImplementationDetails>::AtExitLock::_lock'' <PERF> (PresentationCore_ni+0xbbc2d)
0030db10 53977fea 00000000 0000020f 0000024e PresentationCore_ni!`dynamic atexit destructor for '<CrtImplementationDetails>::AtExitLock::_lock'' <PERF> (PresentationCore_ni+0x11b1b3)
0030db98 79e75519 56d61428 00000000 01d116b0 PresentationCore_ni!`dynamic atexit destructor for '<CrtImplementationDetails>::AtExitLock::_lock'' <PERF> (PresentationCore_ni+0xd7fea)
0030db9c 56d61428 00000000 01d116b0 01d6c134 mscorwks!ThreadNative::FastGetCurrentThread+0x6 [f:\redbits\ndp\clr\src\vm\comsynchronizable.cpp @ 1004]
00000000 00000000 00000000 00000000 00000000 WindowsBase_ni!IsEncryptedPackageEnvelope+0x53 [d:\SP1\windows\wcp\Base\System\IO\Packaging\EncryptedPackage.cs @ 595]

It's interesting. Isn't it? Check the handle 234 right now and you will see the event is still in waiting state. But if you step over KERNEL32!SetEvent, you will find the event turned to 'set state':

0:000> !handle 0x234 ff
Handle 234
Type Event
Attributes 0
GrantedAccess 0x1f0003:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState,ModifyState
HandleCount 2
PointerCount 4
Name <none>
Object Specific Information
Event Type Auto Reset
Event is Set

From current UI thread's call stack you can find the data submission and synchronization secrete. UI thread and render thread are working in an asynchronous mode. It means that even if you completed some invocations of drawing objects, these drawing commands may be pending in the queue and won't be executed by render thread. I remember that once I am developing a program that drawing some animation, I found that even if I received the animation completion event, the animation was still going on on the screen.