Unintuitive Optimization For Performing Paths Union

The Task At Hand

I once had the task of processing whatever vector graphics the user would input into the software, and compute the contour of all shapes from that. That contour would then be sent to a printer. Most of the time users would upload vector art they made in something like Illustrator, and that art could get very complex. In this blog post I’ll take one real SVG I had to work with which consists of about 1981 paths.

Computing the contour comes down to performing union on all shapes. Here’s one basic example that illustrates what union is:

So as you can see, it quite literally means getting the contour of everything, without “cutting” inside any shape. Computing union on paths for the general case is extremely difficult to do robustly. The paths uploaded by users are arbitrary: they can self-intersect, they can have overlapping edges, they can have holes (even nested holes), and so on. Fortunately there are existing solutions for this already, like Skia’s PathOps module and Paper.js. I’ve used Skia for this particular task.

Here is the SVG that I had to deal with:

Coming Up With A Solution

To solve this, Skia gives us two ways to perform path union:

The path builder claims it’s even optimized for doing union on everything, so it seems like it’s exactly what I needed. At first I tried the naive version:

def perform_union_naive(paths):
    if len(paths) == 0:
        return None
    if len(paths) == 1:
        return paths[0]

    result = paths[0]

    for path in paths[1:]:
        new_path = pathops.op(result, path, pathops.PathOp.UNION)
        result = new_path

    return result

So I just take the first path, compute the union with the next path, store the result, and then repeating this with all other paths. I’ve ran this on an M2 Pro chip, and while I expected it to be slow, I didn’t expect it to take more than a minute. I thought maybe I was doing something stupid Python-side but no, things were fast until a certain point (approximately the first 1500 paths), but then everything started getting slower and slower. So it was definitely related to the actual union step.

Next I’ve tried the smarter version which is supposed to be very fast:

builder = pathops.OpBuilder()
for path in paths:
    builder.add(path, pathops.PathOp.UNION)
result = builder.resolve()

Running this took about 23 seconds. While the number of seconds can vary greatly because of the environment, the relevant thing is that now the function is roughly three times faster. While that’s great, it still feels way too slow. There was the possibility of giving up: after all, I’m using the fastest version directly from Skia and it’s still slow. Considering that it’s difficult to do union in general and that Skia has had years of engineering put into it for achieving great performance everywhere, it’s understandable one might conclude that this is just the way it is. But I still wanted to at least try something different, switch things up, see if I could change anything.

Building An Intuition For Why It’s So Slow

Keeping in mind that the union operation is commutative, one trivial “fix” could have been just distributing the work across multiple threads, and then combining the results. That was something I wanted to avoid entirely. It might have improved things a bit, but it wouldn’t have solved the core problem and there’s not even any guarantee it would have been that much faster. Not to mention adding threading where there’s no obvious need for it is already like opening a can of worms.

So what I tried to do next was to better understand how Skia’s path ops work under the hood. There is a great presentation by Cary Clark himself on how everything works. I also read a bit of source code and I have extracted some interesting bits:

Details might be wrong here, I don’t know if operations are actually O(n2). But what I do know is that the performance of the operations depends a lot on the number of curves/segments that each path has. While I can’t change the paths that I receive, what I can change is how exactly I perform the union.

There is one thing about union that is very relevant for us:

In this example the resulting contour is only the square, because the star is fully contained within that. This means that union can reduce the number of segments in the result. It can of course result in more curves in other cases, but the core idea is that when two paths don’t overlap, there is no way for the number of curves to get smaller. And since we’ve seen that the runtime performance of path ops depends a lot on the number of segments, this gives a very good hint for why it’s so slow in our case. When we do union over 1900 times in a row, even just a few segments that haven’t been eliminated early can add a lot of overhead.

How To Make It Faster?

So now we have at least a clue for why everything takes so long to complete. But how to improve this?

Remember that union is commutative. It doesn’t matter in which order we do it. So I thought maybe I can localize paths that don’t reduce the number of segments so that they don’t affect too many other unions. I tried a very simple “divide and conquer” approach where I just perform the union for every N paths (where N=100, chosen arbitrarily), and then combine the results (possibly doing divide and conquer again).

def perform_union_intervals(paths, interval_length=100):
    if len(paths) == 0:
        return None
    if len(paths) == 1:
        return paths[0]

    if len(paths) > interval_length:
        results = []
        iterations = int(len(paths) / interval_length)
        for i in range(iterations + 1):
            start_index = i * interval_length
            end_index = (i + 1) * interval_length
            results.append(perform_union_naive(paths[start_index:end_index]))
        return perform_union_intervals(results)

    return perform_union_naive(paths)

If I use the slowest version for perform_union_naive (the first one presented in this post), the whole thing takes under six seconds to complete. While it’s still somewhat slow, this time it is acceptable and it’s about four times faster than the best result we’ve had until now. And I remind you again that it uses the slowest version, just doing union one-by-one. If I instead use the path builder version, everything takes just a bit over two seconds. That is amazing. But it is unintuitive: I am performing more union operations than before, and yet it’s faster.

Can We Make It Even Faster?

While the result is great as it is, I was still very curious to see if I could make it even faster. The default interval length of 100 was chosen completely arbitrarily, so the first thing I did was to try and test multiple interval lengths. Notice that when I call the function recursively, I don’t pass it the same interval length and I just leave the default on. This is because, when using the path builder, the script would segfault. I didn’t bother debugging it so I have no idea why it does this. But I have tried passing the interval length recursively too (with the slowest version) and it doesn’t really change much in terms of performance.

Trying with different interval lengths lead me to a surprising result: if I choose the length to be 2, the runtime cost is now just over one second. That is even better, but I still wanted to try and see if I can make it even faster.

I tried using a heuristic to sort the paths before performing the actual union. I tried sorting the paths by their number of segments, in reverse order. The theory is that the more complex the paths, the more likely it is they overlap with something else. It’s not always true, of course, but it’s just a heuristic. This failed spectacularly, and the runtime cost is even bigger than just doing the naive union for everything as it is. Note that I only measure the union step, not including the sorting time.

I also had another theory: maybe I can see which paths overlap the most, and sort by that criterion. The paths that overlap the most with others are handled first, and that might eliminate lots of segments very early. Computing path overlaps is not easy, but there is one trick that we can do. While working with paths mathematically is very difficult, we can render them to a buffer a lot more easily (and robustly). So I created a basic renderer that would know, for each pixel, which paths overlap there. Summing every overlap would give the overall overlap for every path. This also failed.

This one is even more unintuitive because this was my understanding for why my optimizations even worked in the first place. Perhaps I’m doing the sorting incorrectly, or maybe I need to pair paths in a smarter way. But it’s getting to the point where this is more of a research project, so I’m not diving deeper into it for now.

Reproducible Example

You are free to try everything I’ve shown here. I’ve created a repo where you can see all the things I’ve implemented.

Conclusion

This was one of the most unintuitive problems I’ve had to work on. I was proven wrong on a lot of intuitions that I had, continuously. While I’m glad I was able to come up with a solution to make everything fast, I am still surprised that I was able to perform union on paths faster than Skia’s default way. I also tried to explain this problem to others, people that don’t necessarily work with graphics stuff, and it was considerably difficult to explain. It still is, but this blog post hopefully helps in giving more context, while being an example of real-world graphics problems that are extremely niche.