Batching Remote Operations Is Not Premature Optimization
I spent a day at Devoxx (formerly known as Javapolis) this week, and one of the presentations I saw was about common performance anti-patterns, given by Alois Reitbauer from DynaTrace. While I didn’t really hear anything new during the presentation, I did kinda like how the speaker stressed that while premature optimization is indeed evil, certain things simply don’t belong in that category and are things you should definitely keep an eye on throughout the development of your projects.
Those of you who’ve been reading this blog for a while know that I’ve often stressed the importance of reducing the number of remote calls. Yeah, that’s right, I just linked to 7 of my own posts in a single sentence :p
Anyways, there were a few people who thought that my preference for batching queries/service calls was actually a case of premature optimization, and that it was therefore evil and not something you should be doing until it was actually necessary. The speaker explained that there is a difference between premature optimization and pro-active performance management. Performance and scalability simply do not come for free; you have to keep certain things in mind if you want your system to have those qualities.
Now, before I go further, I would like to state that I do believe that clean, simple and reusable code is something developers should always strive for. I also believe that you should try to limit the number of times you hit the database, or the number of remote service calls you make, in a single business transaction. Those goals often seem to contradict each other. There aren’t too many data access layers that allow you to easily perform multiple queries in a single roundtrip while still making sure that each query is reusable in a different context. It gets even worse when it comes to remote services. As you undoubtedly know, a lot of industry experts will recommend that you provide coarse-grained service interfaces instead of fine-grained ones. The upside of coarse-grained interfaces is that they often offer better performance due to less chattiness in communication. Unfortunately, it also often leads to services that are implicitly coupled to the clients that are known to be using them. By that I mean that many of those coarse-grained services are designed with certain client characteristics in mind. And shouldn’t services be independent of the clients that use them? This approach typically hurts the reusability of those services for clients which are developed after the service has already been deployed.
So how do we solve these issues? It’s pretty simple. I want each database query to be reusable in whatever way I need: combined with other queries in a single roundtrip, or executed separately. I also want a fine-grained service interface where I can execute precisely the ‘remote action’ I need, yet avoid the chattiness when I need to execute several of those ‘remote actions’ in a single business transaction. The answer is of course: batching of remote operations, whether they are database queries, remote service calls, or anything else that essentially boils down to an out-of-process call.
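To make the idea concrete, here is a minimal sketch of what such batching could look like. All of the names (`Query`, `Batch`, the fake transport) are hypothetical, invented for illustration; the point is simply that each operation stays independently reusable, while the batch collects any number of them and ships them across the wire in one roundtrip:

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Query:
    """A single, reusable remote operation (hypothetical example type)."""
    sql: str
    params: tuple = ()

class Batch:
    """Collects queries and executes them all in ONE out-of-process call."""
    def __init__(self, execute_roundtrip: Callable[[List[Query]], List[Any]]):
        self._execute_roundtrip = execute_roundtrip
        self._queries: List[Query] = []

    def add(self, query: Query) -> int:
        # Returns the index so the caller can find its result afterwards.
        self._queries.append(query)
        return len(self._queries) - 1

    def execute(self) -> List[Any]:
        # One roundtrip, no matter how many operations were queued.
        return self._execute_roundtrip(self._queries)

# Fake in-memory 'server' so the sketch is runnable: in reality this
# would be a single network hop carrying all queued operations.
def fake_server(queries: List[Query]) -> List[Any]:
    return [f"result of: {q.sql}" for q in queries]

batch = Batch(fake_server)
i = batch.add(Query("SELECT * FROM customers WHERE id = ?", (1,)))
j = batch.add(Query("SELECT * FROM orders WHERE customer_id = ?", (1,)))
results = batch.execute()
```

The same shape works for remote service calls: the individual operations remain as fine-grained and reusable as you want, and the chattiness is dealt with at the transport level rather than by distorting the interface.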
I really think that every kind of architecture should at least make this reasonably easy to do. It really doesn’t take that much effort (or, in some cases, I’d even call it imagination) to make all of this possible. I’m actually pretty lazy, so if I can manage to get it done, there’s no reason in the world why you shouldn’t. It can be done pretty easily, and it really doesn’t come with that much of a cost. Of course, writing your code this way takes a little more work than doing it the ‘easy way’ (‘a little more’ being minutes versus hours, though). But then again, if the ‘easy way’ were the right way, we wouldn’t even be talking about performance anti-patterns, right?
For me personally, I’ve gotten to the point where I really couldn’t care less in advance about possibly slow-performing code, as long as that code is executed in-process. No matter how good you are, you will practically always guess wrong about which in-process code will be slow. Just write clean and readable code, and if some parts of it turn out to be slow, use a profiler: it will quickly tell you which parts are causing the slowdowns. It’s pretty much always the last place you suspect, so why bother writing difficult code for parts that probably weren’t going to be a problem anyway? But for out-of-process stuff: be vigilant, because any performance problem related to it could easily have been avoided from the start.
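For completeness, the "just use a profiler" step really is this cheap. A quick sketch using Python's standard `cProfile` module (the `slow_part`/`fast_part` functions are made-up stand-ins for your own code) shows how the profiler names the culprits instead of you guessing:

```python
import cProfile
import io
import pstats

def slow_part():
    # Stand-in for the code that actually eats the time.
    return sum(x * x for x in range(100_000))

def fast_part():
    return 42

def business_transaction():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
business_transaction()
profiler.disable()

# Dump the top entries by cumulative time; the report names the
# functions where the time actually went.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
```

Inspecting `report` will show `slow_part` near the top, which is exactly the evidence you want before rewriting anything.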