17 Nov
2010

The Black Art of P/Invoke and Marshaling in .NET

Category:UncategorizedTag: :

Last week I finally managed to hunt down and resolve a bug that I had been chasing for quite some time. A couple of years ago I built an ASP.NET web service that makes use of a native library to provide most of its functionality. This native DLL exposes a C API, much like the Win32 API, that we’ve been using for integrating a highly-expensive product from a certain vendor into our system.

I didn’t notice any issues during the initial development of this web service, in fact, I was very pleased with how easy it was to use and integrate this API. Until this very day, I still consider this as one of the nicest and most stable C API’s I’ve ever come across. After I finished most of the features for the web service, I ran a couple of load tests in order to determine whether the respective product could withstand a fair amount of requests and also to detect any glitches that might come up in the interop layer. Nothing major turned up, so we decided to bring this service into production and go on with out merry lives.

Aft first we didn’t receive any complaints. Everything worked out as expected, until earlier this year, we noticed some failures of the application pool for our web service in IIS. The event log showed some random crashes of the CLR (= very bad) with some highly obscure error message. These issues occurred completely random. One day, there were about five to six application pool crashes after which it behaved very stable again, sometimes for months in a row.

After doing some investigation on my part, which also involved stretching my shallow knowledge of WinDbg, I found out that .NET runtime was doing a complete shutdown after an AccessViolationException being thrown. This exception reported the following issue:

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

The first thing I considered was a memory leak turning up somewhere. It’s unmanaged code after all, right? After reviewing the code that uses the imported native functions, I discovered two potential memory leaks that might come up during some edge case scenarios. So I fixed those, but unfortunately it didn’t resolve the problem.

I tried a couple of other things and also approached the issue from a couple of different angles. I’m not going to bore you with the details here, but it clearly didn’t solve anything. I started to become a bit frustrated with this problem. At some point, I was even convinced that this was all caused by the native library itself and not in our code, which is a classic mistake in most long debugging sessions. The good thing was that I was able to reproduce the problem on my machine, by running the load tests with a particular high load. I noticed that there was also a certain pattern in the number of requests between each crash of the application pool.

Vastly determined to fix this, I decided to read up on some information about P/Invoke and marshaling in .NET. I ended up reading this blog post on P/Invoke and memory related issues. While it didn’t provide a clear solution, reading that blog post certainly guided me in the right direction. I started to turn off feature by feature until I was able to isolate the cause of the crash to a single function call. The native function in question has the following signature:

char* FunctionThatReturnsAString();

As it turned out, I created the following P/Invoke prototype for this native function in C#:

[DllImport("SomeProduct.dll")]
public static extern String FunctionThatReturnsAString();

At first sight (and even after a number of subsequent reviews), there seems to be nothing wrong with the way the char * return value is marshaled to a String. I started reading the excellent documentation for this function again, and it explicitly mentioned that the caller should not cleanup the memory for the return value. The allocated memory for the return value is always cleaned up by the native function itself.

Doing some more research on marshaling strings, I found out that in case the native function does its own cleanup, an IntPtr must be returned by the C# prototype declaration instead of a String.

Most c-style strings returned by native functions must be cleaned up by the calling code. For this common scenario, when the return value is marshaled to a String, the interop marshaler assumes that it has to free the unmanaged memory returned by the function. In our case, this actually means that the interop marshaler tried to cleanup memory that at some point was already freed up by the native function or vice versa.  

So I changed the return value for the P/Invoke prototype declaration to an IntPtr. This way, the interop marshaler does not automatically free the memory that is referenced by the IntPtr.

[DllImport("SomeProduct.dll")]
public static extern IntPtr FunctionThatReturnsAString();

Marshaling the data referenced by the IntPtr to a string is quite easy. This can be done using the Marshal.PtrToStringAuto method.

The lesson I learned here is that we need to watch out for these kind of issues when using interop with unmanaged code. On the surface everything seems to work out just fine, but one can still run into some nasty issues afterwards. Carefully considering how to correctly marshal data from and to unmanaged code is an essential technique and sometimes feels like a black art that is not suited for the faint of heart, like myself :-).

6 thoughts on “The Black Art of P/Invoke and Marshaling in .NET

  1. I think I could make a case that it _was_ a problem with the C API that was causing the crashes. Every good C API I’ve come across, including all the Windows APIs I can think of, would use the signature:

    int FunctionThatGetsAString(char* szBuffer, int cBuffer)

    This way if it fails you get a return code and there’s no confusion about who’s responsible for allocating the memory. In my opinion they’re asking for trouble if users have to carefully read the documentation for their function to find out how to use it.

    It is opinion though, and I’m sure there are arguments for doing things the other way. Sometimes I just have to be a smarty pants. 🙂

  2. Some C experience goes a *long* way to playing nice with P/Invoke.

    Also, if I start allocating and managing a lot of unmanaged buffers from C#, that’s a point where I consider doing that work in C directly, and then surfacing a nice clean interface for my managed code to use instead 🙂

  3. @Mike C

    “Every good C API I’ve come across, including all the Windows APIs I can think of, would use the signature”

    Not all Windows API calls are like that. They are mostly conform, but I came across one function that exposed the same behavior as the function described in the post.

    Also there’s no problem with exposing an API like that. The general scenario here is that you have to do the cleanup yourself in most cases. This is also a place where a lot of memory leaks find their origin. The API designers wanted to relieve their users from having to deal with this. It’s just that for interop scenarios (which the API designers probably didn’t take into account when they were designing the API), you as the developer need to watch for this issue.

  4. @Jan Van Ryswyck
    I agree that there’s no technical problem with exposing the API like that, and you could argue that it’s even slightly more efficient because the method will only allocate exactly the amount of memory it needs instead of using part a larger buffer. However, it fails the test I first saw described in Scott Meyers’s Effective C : Make interfaces easy to use correctly and hard to use incorrectly. More specifically “Any interface that requires that clients remember to do something is prone to incorrect use, because clients can forget to do it.” It’s no different if it’s requiring the client remember not to do something. Just looking at the function signature you can’t tell who’s responsible for that pointer, and if there’s a way to remove that ambiguity then that way is the better design in my opinion.

Comments are closed.