How we trick Bundler and cause interesting bugs
“So, uhm, there’s this process on our server that always starts to leak memory like no tomorrow and then gets restarted because it blows the process limits. Any idea?”
I want you to try something right now:
- Start your favourite process monitor (htop works great, but Activity Monitor on a Mac would work as well) and sort the processes by used memory.
- Open up irb and type the following code in and excute it (you don’t need to really understand what’s going on right now, I’ll explain later):
( * 26).combination(13).to_a
- Watch your process monitor (and make sure you CTRL-C the Ruby process before you run out of memory)
What would you say if I told you that before Bundler 1.16, it wasn’t actually that hard to trigger similar behavior by running a slightly odd bundle update?
Let’s do some math
Fundamentally, this is a math problem.
Array#combination returns, as the name suggests, all combinations of all elements of the incoming array for a given length. I’m really bad at explaining this, I know, so let’s take a look at an example:
[1,2,3,4].combination(2).to_a => [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
I’m sure you’re furiously calculating in your head right now how many elements the code from above would yield. No need to break out the old trusty TI though,
Array#combination returns an iterator and we can ask it for its size:
( * 26).combination(13).size => 10400600
Yes, that’s about 10 million arrays. Each with 13 entries. Plus the array content. Actually, scratch that. For one, we’re dealing with Fixnum’s here which don’t take up additional space on top of the array structure. And second, even if we would store full blown, heavy Ruby objects in the array, even after all the combination, the millions of arrays would still point to the same 26 objects. In the grand scheme of things, we can actually ignore the contents.
So, how much space do the arrays consume? Arrays in Ruby are not the same as C arrays which are in essence a block of memory - Ruby arrays are a much more flexible data structure and therefore have vastly different characteristics when it comes to memory consumption. In fact, still ignoring the actual contents of the arrays, we can come up with a simple formula (valid for all arrays with n > 3):
(n - 4) * 8 + 72
If you wonder how I came with that formula: If you use
ObjectSpace.memsize_of, you can see that Arrays up to
n = 3 consume 40 bytes of memory (meaning, they are held in one Ruby object record) and for
n = 4 the amount of memory is 72 bytes, after which each additional item adds 8 bytes.
That would be 144 bytes for each of the combination arrays, which sums up to a little less than 1.5 GB for those arrays. Let’s add 80 MB (!) for the array of arrays, but that’s hardly noticeable. Of course this is only for the end result - when I finally let the code run until it finished, the Ruby process claimed about 2,8 GB of memory in total. Wether that’s due to the way the combination algorithm works or anything else, I can’t say.
That’s a pretty massive amount of memory for lots and lots of references to a humble Fixnum.
How about objects?
If we store real objects instead, like I said, memory wise nothing much changes. Interestingly, though, the performance characteristics change. On my machine, the following code takes more than double the CPU time to finish than the initial example:
To sum this up:
Array#combination with large arrays ➡️ bad idea.
Okay, but remind me again what this was all about? Ah, right, I remember. Process restarts.
The real issue
If you’ve ever run stuff on platforms like Heroku, sooner or later you probably ran into some of their limitations. One of them is memory. A Heroku Dyno in their cheapest offering has a whopping 512 MB of memory to run in. (The nostalgic 80’s kid in me chuckles hysterically - I believe my maxed out Amiga A1200 had 6 MB on board) It start’s swapping if you use more than 512 MB, but at 1 GB Heroku will kill your process.
What was happening was that one of our processes that runs the actual bundle update for Depfu frequently got shot down by the Heroku sheriff. Looking at the logs, we quickly figured out that it was one specific update causing these issues. This made it easy to reproduce locally with Bundler. And sure enough, the process went up to several gigabytes. By then stopping the process and looking at the stack trace at the time of the interruption I was able to find the rough area of the problem in the Bundler source code. Mind: this was only possible because the problem actually lies in a single method call, otherwise the place you come out at with CTRL-C is rather random.
The issue was, you guessed it: a call to
Array#combination. With an input array of 26 elements, all of which can be found out easily by just adding a couple of
putses to the source file.
What is the code about?
Here it gets really interesting. This piece of code tries to reduce down the list of dependency conflicts to something readable after Bundler fails to resolve the given dependencies. And normally, that works really well. It’s probably not a very efficient (in terms of memory usage and runtime) algorithm, but usually, the number of dependency conflicts is rather small when running bundle update.
And here our own Depfu magic comes into play. Because if you run bundle update, you can’t tell Bundler which version of the gem you want to install. It simply takes the latest version available that both matches your version specification and is resolvable with all the other dependencies.
Since we want to make sure that the update will contain that specific version, we use a dirty little trick: We patch the Gemfile to use a specific version, run the
bundle update, so that Bundler is forced to try to use that version and then, after the Gemfile.lock has been updated, revert that change on the Gemfile.
Before and after:
gem 'arbitrary', '~> 1.2'
During our Bundler run:
gem 'arbitrary', '= 1.2.5'
In the case of the specific update mentioned above, we tried to update a dependency to a version that had ActiveSupport as dependency, in a higher version than the Rails version currently in use. If you’ve ever looked at the full dependency tree of a large Rails application you can probably see where this is going: Lots and lots of dependency conflicts between the updated gem and almost every other vaguely Rails related gem.
In total, in this case, 26 dependency conflicts arrived at the doorstep of our neat little
Array#combination engine. And blew everything up.
Let’s fix it
When looking at that data, I noticed that it contained what looked like a lot of duplicates. A very simple run over the array to make all elements unique already brought these 26 down to 12, which brings down the maximum number of arrays after
Array#combination to a much more manageable 1716.
The first thing I did was to monkey patch Bundler on our own systems to do the deduplication and also to add a bail out clause that would simply skip the reduction code if the incoming array is larger than a certain size. The result would be a slightly longer, slightly more awkward to read error message, but that’s an acceptable compromise when the alternative is to blow all limits.
I also filed a bug with Bundler, of course, and basically my two fixes are also implemented now, with a patch to the Bundler dependency resolver Molinillo making sure that the list of dependency conflicts is already deduplicated when fed into the reduction code and my patch to the reduction code that will bail out.