OK, you’re still here. You are a brave soul if you’ve stuck around after a title like that, or else you are desperate! That is exactly where I found myself over the last two days.
A product we are currently working on has a process that, well, processes a lot. It goes through several different data gathering, manipulation, saving and printing operations. The end result of this process is a print job that takes about an hour and produces about 1000 printed pages.
During development we normally sent the jobs to a PDF printer, or simply had the process stop after printing 20 or 30 pages. Finally the time came to give this a real test: a complete dry run!
I know what you’re thinking, “you should have done that in Dev at least once!” You are right of course, however sometimes we let things slip due to schedules and pressure. Lesson learned, I hope!
It appeared as though there was a memory leak causing the application to crash. Monitoring the memory usage with Process Explorer confirmed this to be the case. Now to the task of tracking it down.
I must admit that I have never had a leak like this one. After some initial code reviews we found a few places with the potential for problems, but implementing code to fix these “phantom menaces” was not successful. Now it was time to really dig in. The downside was, I did not know how to dig, and I didn’t have a shovel. 😦
After some searching around on Google, I ran across Finding .NET Memory Leaks by Phil Write. It was not the easiest thing to find, but it was well worth the time. Phil goes step by step through using the SOS (Son of Strike) debugging extension, explaining the basics of how to track down a suspected problem. Unfortunately my problem was not that easy to find.

I ended up comparing the output of !dumpheap -stat from very early in the process against another dump from much later on down the line. It was a tedious exercise, but a necessary one. Finally I happened upon an object that had a large jump in its count between the two samples. Now I had a place to start! Using Phil’s instructions again, I was able to find out what was holding on to a reference and implement a fix. It also led me to a second leak that we did not know existed and that had been around for quite a while. It turned out that the first leak we fixed would not have been a problem if the other one had been behaving properly.
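For anyone who wants to try the same approach, the rough shape of such a WinDbg/SOS session looks like the following. This is a sketch, not a transcript of my session: the load command differs by runtime version (mscorwks for .NET 2.0/3.5, clr for .NET 4), and the method table and object address are placeholders you fill in from your own output.

```
$$ Load the SOS extension for the .NET 2.0/3.5 runtime
$$ (on .NET 4 it would be: .loadby sos clr)
.loadby sos mscorwks

$$ First heap snapshot: per-type method table, instance count, total size
!dumpheap -stat

$$ ...let the process run for a while, break back in, and take a second snapshot...
!dumpheap -stat

$$ For a type whose count keeps climbing between snapshots,
$$ list its instances (the MT value comes from the -stat output)
!dumpheap -mt <methodTable>

$$ Find what is still holding a reference to one of those instances
!gcroot <objectAddress>
```

Comparing the two !dumpheap -stat outputs shows which type counts keep growing; !gcroot then walks the reference chain that is keeping a given instance alive, which points you at the code holding on to it.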
This is a good example of why bugs can be good. The second memory leak will be taken care of within the next day or two and the product will be that much better for it.
Thanks Phil for such a wonderful and simple to understand article!
EDIT: 8/18/2010 – updated link to Phil’s article. Thanks Aaron D for pointing it out!