C# MultiMap data structure

During my quest to improve performance of my home build ORM (DynamicModel) , I noticed that querying with eager loading enabled was quite slow compared to other ORM frameworks.

The lazy loading strategy used in DynamicModel is the same as in many ORMs. Simple but effective:

  • Load all parent objects (‘one’ end of a one-to-many association)
  • Load all child objects (‘many’ end of a one-to-many association)
  • For every parent entity, link associated child entities to it

After retrieving entities from the database and storing them in memory, I used an in-memory Linq join query to attach childs entity to their parents. Thanks to some Linq magic, this resulted in compact and readable code. But this code also proved to be the root cause of my performance issue…

The solution lies in finding an efficient data structure to store key/value pairs that provides fast key lookup time. I hear you think: that sounds like… a Dictionary<T>!

Well, almost. For every key in our datastructure (primary key in the parent table), we want to return a collection of value objects (child rows with a foreign key value equal its parent primary key), instead of just one.

Because I focused too much on standard .NET data types, I simply forgot about a very useful, but not build in, data structure which is perfect for this task: a MultiMap.

A MultiMap is a generalization of a map (or Dictionary class C#) in which more than one value may be associated with and returned for a given key. The base .NET class library does not provide a MultiMap. But I found a simple base implementation for C# on dotnetperls. It uses a dictionary as underlying data structure, so lookups are close to O(1) (see MSDN).

The MultiMap class is declared as:

public class MultiMap<T, V>

You can download the full source here.

Using a MultiMap we can efficiently store child rows indexed by their foreign key value. Than, for every parent we can do a fast lookup in the map with primary key of parent row as dictionary key. The result is a list of children for that parent.

Let’s look at an example with the Chinook database.
Using the old (group join) method, an eager loading query might look something like this:

[Test]
public void GroupJoin()
{
    DmContext context = DmTest.mscontext;
    var q = from a in context.FindAll<Album>()
            join t in context.FindAll<Track>()
            on a.AlbumId equals t.AlbumId into tracks
            select new Album {
                Title = a.Title,
                Tracks = tracks.ToList()
            };
    foreach (var album in q)
    {
        Console.WriteLine(album.Title);
        foreach (Track track in album.Tracks)
        {
            Console.WriteLine(track.Name);
        }
    }
}

The new (multimap) method:

[Test]
public void Multimap()
{
    DmContext context = DmTest.mscontext;
    MultiMap<int, Track> map = new MultiMap<int, Track>();
    foreach (var t in context.FindAll<Track>())
        map.Add(t.AlbumId, t);
    var q = from a in context.FindAll<Album>()
            select new Album
            {
                Title = a.Title,
                Tracks = map[a.AlbumId]
            };
    foreach (var album in q)
    {
        Console.WriteLine(album.Title);
        foreach (Track track in album.Tracks)
        {
            Console.WriteLine(track.Name);
        }
    }                                          
}

I tested both methods using 5000 entities generated in memory (so I took database access out of the equation), and only measured the time it took to fill the MultiMap and to execute the Linq query that links parents to children (in the examples above that would be q). The performance improvement is significant!

Method Time
MultiMap ~ 8 ms.
Group join ~ 124 ms.

5 Responses to “C# MultiMap data structure”

  1. James Curran says:

    However, by remving the database access from the equation, you also removed the database query engine, and replaced it with a less optimized in-memory search.

    Database engines highly optimize queries like this.

    1. admin says:

      Yes, I left database access out of the equation; because that was not the bottleneck.
      Of course, I could have used a join query to retrieve all rows in one SQL statement, but I still would have to build the object graph in memory for my ORM from the returned rows. And that (building object graph) was the part I was trying to optimize.

      Unless of course you’re talking about object-relational layers on top of database engines in general. That’s a whole different discussion….

  2. Excellent site you have here.. It’s difficult to find high-quality writing like yours these days. I honestly appreciate people like you! Take care!!

  3. Liberty says:

    Hey, that’s the grttaese! So with ll this brain power AWHFY?

  1. C# MultiMap data structure…

    Thank you for submitting this cool story – Trackback from DotNetShoutout…

Leave a Reply





Human Verification