Abstractions and performance
When it comes down to performance, enterprise developers are lazy, especially the managed (Java/.NET) kind. After all, when performance problems pop up experience tells you it’s always a question of database, disk I/O or network latency issue in the end, right ? Well not always.
One of the main objectives in the creation of managed platforms was to increase overall developer productivity, and they are immensely successful at that…but this comes at a cost: performance. When you think about it, it’s no surprise that in cases where performance per cycle still matters, native development is still very dominant. Think cases like games, database engines, operating systems, or apps for mobile platforms. Although current desktop CPUs are so immensely powerful enterprise code can most of the time perform despite it being managed and despite it being developed by a person without any sense of what his code will compile too, you will sometimes encounter cases where it’s just not enough.
Despite knowing the above, and having a passion for native/performance coding besides .NET, I recently almost fell into this very trap.
For the next iteration of our software asset management solution I had to, based on a set of selected assets, compile a manager’s view of the employees who these assets could be ordered for. Moreover, I had to take into account employees who already had the asset, and whether they could order the asset at all based on their place in the organization. While the User Interface was new, the established - and time tested - business queries were already there, so instead of making a new one I used 2 existing queries to get lists of employees and assets from the database independently. Next I joined them together in memory (LINQ to objects), and was baffled to see the overall process take 43 seconds. With my enterprise managed developer hat on, I immediately assumed this must be a problem of the database, but some simple timing showed the 2 initial queries returned almost instantly. It was undeniable: the problem was really my LINQ query itself.
How could that be? Aren’t in memory queries much faster than database queries? Well…yes and no. Compared to a CPU cycle, the overhead involved in a roundtrip to the database is so astronomical, that even when the database returns instantly, you can still do a lot of work in that time. So when you have a small in memory dataset doing an in memory query is indeed much faster. This changes when the dataset gets larger, and it changes fast. To understand why, you have to be able to look under the covers of the LINQ abstraction and understand what really happens.
In the case of LINQ to entities (or LINQ2SQL), the concatenation of select, join, where and group by clauses is ultimately compiled into a SQL query which is sent to the database. Next, the database engine is responsible for analysing the query and establishing a query plan which makes it execute fast. In the case of a LINQ to objects query however, the result is just a series of operations on your dataset as you specified them: there is no engine optimizing it.
‘But wait! Isn’t the compiler that engine in the case of in memory operations ?’ Not in this case. As good as the compiler is, it won’t optimize a full dataset scan away and replace it with an index scan (something the database engine would do). It’s even worse: the LINQ abstraction will even prevent the compiler from unrolling inner loops for us. This is because the LINQ syntax is a concatenation of methods (closures even), while the compiler optimizations in .NET are limited to method scope: there is no global scope optimization like in native C++ (templating).
So what now? The solution I commonly encounter in enterprise development is: if it doesn’t perform, push the code into the database. In other words: implement the same thing as a stored procedure, and the database will do the heavy lifting for you (a.k.a. lazy enterprise developer plan B). This approach can work, but it has some disadvantages. Over time, complex business code which is pushed inside the database becomes sort of a black box for other developers: somehow there is this magical thing which gives us the right answers, and before someone else dares touch that thing again, we have to rule out everything else. Another problem is fragmentation: with parts of the logic in code and parts in the database you easily lose overview.
Either way, considering we had 2 in memory datasets already, the idea to push the whole thing into a stored procedure really didn’t sit well with me. I refused to accept I couldn’t make it perform in code. So with the knowledge above I took a minute to sit down and really look at the query. The first thing I did was move one subquery outside the main query. This already shaved off a factor of 3. Nice, but still: 15 seconds is not nearly good enough for a query used to populate a website view. Next I realized one of the where clauses was doing a full scan of the dataset while it could be easily grouped in sets per employee. So I transformed the List of simple objects with (employeeId and assetId in them) into a Dictionary of Lists of assetIds indexed by employeeId, and the query immediately executed in 0.3 seconds. Over 100 times faster than the original, and now acceptable for a webpage.
Abstractions are nice, but sometimes you have to lift the covers.