Retrying Operations
Here’s something I hacked together last night: I’m writing an app that involves a lot of web requests to a very unreliable server. Maybe the site is down, maybe we only get part of a message back because the stream was interrupted, maybe the network cable is loose and being chewed on by gremlins. Who knows. The application is unattended though, so we need it to wait a little while and then retry the operation a few times before ultimately giving up and terminating, leading to other actions.
Also, it would be best if this retrying could happen out of sight of the code making the web request – I want to hide all the network activity behind various facades, for testability and to make the logic using the results of the call much easier to write. So, last night the “RetryOperation” class was written.
The code is pretty simple, and I have a feeling that this is already implemented in the .NET Framework somewhere… I’ve attached the code and an example program below.
Basically here’s what happens: you call retryOperation.Try(() => DoSomethingUnreliable()). Either that returns (synchronously) with a result, or you get an exception when retryOperation gives up.
public TResult Try<TResult>(Func<TResult> action)
{
    int tries = 0;
    while (tries < MaxTries)
    {
        try
        {
            // go do it.
            return action();
        }
        catch (Exception ex)
        {
            // action failed.
            // log about our failure, and sleep for "a while" and then try again,
            // if we're out of retries then give up and send the exception
            // back up the call stack.
            tries++;
            string logMsg = "Retry Attempt " + tries;
            log.Warn(logMsg, ex);
            if (tries >= MaxTries)
            {
                // YOU FAIL!
                throw;
            }
            // note: this could be configurable, pick your favorite
            // timeout-waiting-strategy!
            int timeout = 10 * 1000 * tries;
            log.Warn("Sleeping for " + timeout + " ms");
            Thread.Sleep(timeout);
        }
    }
    // this point should "never happen.."
    // either we get a successful result, or we go through our maximum number
    // of retries and throw an exception above.
    throw new RetryException("Error condition in Try() escaped from custody!");
}
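The comment in the code above notes that the wait between attempts could be configurable. One way that might look, as a rough sketch only: the caller passes a function that maps the attempt number to a delay. The class name RetryWithDelayStrategy and the delayFor parameter are made up for illustration and are not part of the original RetryOperation class; logging is left out to keep it short.

```csharp
using System;
using System.Threading;

// Hypothetical variant of the retry helper; all names here are illustrative.
public class RetryWithDelayStrategy
{
    private readonly int maxTries;
    private readonly Func<int, TimeSpan> delayFor;

    public RetryWithDelayStrategy(int maxTries, Func<int, TimeSpan> delayFor)
    {
        if (maxTries < 1) throw new ArgumentOutOfRangeException("maxTries");
        if (delayFor == null) throw new ArgumentNullException("delayFor");
        this.maxTries = maxTries;
        this.delayFor = delayFor;
    }

    public TResult Try<TResult>(Func<TResult> action)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception)
            {
                // Out of retries: rethrow the last failure unchanged.
                if (attempt >= maxTries) throw;
            }

            // Ask the caller-supplied strategy how long to wait
            // before the next attempt.
            Thread.Sleep(delayFor(attempt));
        }
    }
}
```

With this shape, the linear wait from the original code would be new RetryWithDelayStrategy(MaxTries, attempt => TimeSpan.FromSeconds(10 * attempt)), and a unit test can pass attempt => TimeSpan.Zero so it does not actually sleep.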
That’s cool, was just thinking of a few places I could use this!
Not sure about treating every exception as needing a retry though, what if there’s a real exception trying to get through? You might think about optionally passing in a Func type filter which would allow the caller to determine what it considers to be a retry-able exception.
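A minimal sketch of what such a caller-supplied filter could look like, assuming a predicate of type Func<Exception, bool>. The FilteredRetry class and the isRetryable parameter are hypothetical names, not something from the post's code.

```csharp
using System;
using System.Threading;

// Hypothetical helper: only retries when the caller's predicate says
// the exception is worth retrying.
public static class FilteredRetry
{
    public static TResult Try<TResult>(
        Func<TResult> action,
        int maxTries,
        TimeSpan delay,
        Func<Exception, bool> isRetryable)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex)
            {
                // Rethrow immediately if we are out of attempts or the
                // caller does not consider this exception retry-able.
                if (attempt >= maxTries || !isRetryable(ex)) throw;
            }

            Thread.Sleep(delay);
        }
    }
}
```

A caller who only wants to retry network failures could pass ex => ex is System.Net.WebException as the predicate; anything else would then surface on the first failure.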
You’re not alone! I had to write something similar for this scenario, that I posted here: http://anydiem.com/2008/02/27/some-functional-programming-in-c-30/
It’s not in the framework, as far as I can tell, though a search for it on Google seems to indicate that such a thing has been suggested! 🙂
@Jon: Excellent point. In my original (non-sample) code I was only retrying on WebExceptions and other network-related exceptions, while anything else would just get thrown upwards without a second attempt. But I like the idea of a caller-supplied type filter better.
@Sean: Very cool, thanks for the link!
I’ve also done something similar – in addition to the error filter that Jon suggested I also used a retry filter to see if retrying again was appropriate. I immediately added a MaxAttempts and a Timeout filter, and ended up adding ones like “ProcessIsRunning” and “MachineIsNotPingable”.
I wrote something similar about a month ago. Glad to see that the solution I came up with matches the one developed by a thought leader. :-). Good stuff.
I remember something similar that Davy Brion did to implement the circuit breaker pattern some time ago. Obviously a slightly different use case, but still close enough to bear looking at: http://davybrion.com/blog/2008/05/the-circuit-breaker/
Hi, why not use a Timer, instead of Thread.Sleep? It’d be non-blocking and the timing would be much more reliable. cheers
It sounds like you’re trying to implement MSMQ.
@Stefano: Because I needed the caller to have synchronous semantics, so I do need the calling thread to be blocked until we succeed or fail. In this case the timing can be off by hundreds of ms and it doesn’t make much of a difference – I’m just saying wait ‘for a bit.’ I could have used a Timer, and then a ManualResetEvent and a callback or something to block until the timer fires, I guess.
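For what it’s worth, the Timer plus ManualResetEvent idea might look roughly like the sketch below. TimerWait and WaitFor are invented names, and it assumes .NET 4.5 or later for Timeout.InfiniteTimeSpan. From the caller’s point of view it still blocks the thread for the whole delay, just like Thread.Sleep.

```csharp
using System;
using System.Threading;

// Hypothetical helper: a blocking wait built from a one-shot timer.
public static class TimerWait
{
    public static void WaitFor(TimeSpan delay)
    {
        using (var fired = new ManualResetEvent(false))
        // One-shot timer: fires once after 'delay', never repeats.
        using (var timer = new Timer(_ => fired.Set(), null, delay, Timeout.InfiniteTimeSpan))
        {
            // Block the calling thread until the timer callback signals.
            fired.WaitOne();
        }
    }
}
```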
@udi: not really. I think using msmq for this (and only this) would be overkill.
I have a small issue with this – the error thrown after the last retry is the last Exception. I would think about throwing an ExceptionWithExceptions thingie – it might allow invokers to better know what went wrong along the way.
Sorry for nitpicking. 🙂
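One possible shape for that idea, as a sketch under assumptions: on .NET 4 or later the built-in System.AggregateException can stand in for the “ExceptionWithExceptions thingie”. CollectingRetry is a hypothetical name, and this is not how the Try method in the post behaves (it rethrows only the last exception).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: remembers every failure and reports them together.
public static class CollectingRetry
{
    public static TResult Try<TResult>(Func<TResult> action, int maxTries, TimeSpan delay)
    {
        var failures = new List<Exception>();

        for (int attempt = 1; attempt <= maxTries; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex)
            {
                // Keep each failure so the caller can inspect all of them.
                failures.Add(ex);
            }

            // No point sleeping after the final attempt.
            if (attempt < maxTries) Thread.Sleep(delay);
        }

        throw new AggregateException("All " + maxTries + " attempts failed.", failures);
    }
}
```

The caller can then catch AggregateException and walk its InnerExceptions collection to see what went wrong on each attempt, in order.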
Retrying actions is a real life-saver, indeed.
I’m using a similar approach to transfer data between application boundaries (i.e.: from repository to DB, while resolving dead-locks, or between client applications and web services, while handling communication exceptions). Ability to inject the specific policy via the IoC from a single place makes it even more easy to
If you are interested, there is a production-quality open-source library that leverages this concept (with a configuration syntax for retry policies). Here’s the article introducing the retry aspect:
http://abdullin.com/journal/2008/12/1/net-exception-handling-action-policies-application-block.html
BTW, these policies are compatible with policies used in Windows Azure.