Ruby — Hashes and Mutable Default Values

Ruby’s Hash object is an associative data structure used to store key-value pairs. Many languages have objects that serve a similar purpose: Python has dictionaries, JavaScript has Maps, Java has HashMaps, and so on. Ruby’s Hash object is pretty simple to use.

my_hash = {}
my_hash[0] = 'Hello!'
puts my_hash[0]
#=> Hello!

This works just fine. We can also change the value associated with a key that already exists in the hash.

my_hash[0] << " I've been mutated!"
puts my_hash[0]
#=> Hello! I've been mutated!

The [] operator is just syntactic sugar for the #[] method defined in the Hash class. Here is the description from the Ruby Docs:

hsh[key] → value

Retrieves the value object corresponding to the key object. If not found, returns the default value.

The last sentence in this description deserves a bit more explanation.

my_hash = {}
my_hash[:exists] = 0
my_hash[:exists] += 1
puts my_hash[:exists]
#=> 1
my_hash[:does_not_exist] += 1
#=> NoMethodError (undefined method `+' for nil:NilClass)

Hashes have a default value, which is the value returned when accessing keys that do not exist in the hash. In the code above, we are attempting to access the key :does_not_exist and then call the #+ method on the value associated with that key. The error message tells us that there is no #+ method defined for the value that is returned by accessing this key. This is because hashes initialized using the code hash_name = {} will have their default value set to nil.

p my_hash[:does_not_exist]
#=> nil
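
Because a missing key and a key that stores nil both return nil from #[], it can help to check for the key explicitly. Here is a minimal sketch using Hash#key?, Hash#default, and Hash#fetch; the describe_lookup helper and the settings hash are made up for this illustration and are not part of Ruby.

```ruby
# describe_lookup is a hypothetical helper, not part of Ruby's Hash API.
def describe_lookup(hash, key)
  if hash.key?(key)
    "#{key.inspect} is stored with value #{hash[key].inspect}"
  else
    "#{key.inspect} is missing; [] returns the default #{hash.default.inspect}"
  end
end

settings = { verbose: nil }
puts describe_lookup(settings, :verbose)
puts describe_lookup(settings, :missing)

# Hash#fetch raises KeyError for a missing key instead of returning the default.
begin
  settings.fetch(:missing)
rescue KeyError => e
  puts "KeyError: #{e.message}"
end
```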

We can set the default value ourselves by using the Hash::new method:

my_hash = Hash.new(0)
my_hash[:does_not_exist] += 1
puts my_hash[:does_not_exist]
#=> 1
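
A common use of an integer default is counting. The sketch below assumes a hypothetical count_words helper written only for this example. It is safe because integers are immutable: += builds a new integer and assigns it to the key, so the default itself is never changed.

```ruby
# count_words is a hypothetical helper written for this example.
def count_words(text)
  counts = Hash.new(0)                          # missing words start at 0
  text.split.each { |word| counts[word] += 1 }  # counts[word] = counts[word] + 1
  counts
end

counts = count_words('the quick fox jumps over the lazy dog the end')
p counts['the']      # 3
p counts['quick']    # 1
p counts['missing']  # 0 (the default; nothing is stored by this read)
p counts.default     # 0 (still untouched)
```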

None of this is particularly mind-blowing, but what happens if we try to run this code?

my_hash = Hash.new([])
my_hash[:does_not_exist] << 'Uh oh.'
p my_hash
#=> {}
p my_hash[:does_not_exist]
#=> ["Uh oh."]

Wait a minute. The p method uses the #inspect method to print a human-readable representation of the object passed in, so you may have expected to see {:does_not_exist=>["Uh oh."]}. Instead, the hash still looks empty. Yet when we retrieve the value for the :does_not_exist key (which, as its name implies, was never stored), we get the array we appended to. It gets stranger when we look up a key we have never touched:

p my_hash[:does_not_exist_either]
#=> ["Uh oh."]

Alright, this is weird. It looks like the hash has key-value pairs that we didn’t even specify.

There is actually a reasonable explanation for this, but it requires careful attention to the description of the Hash::new method in the Ruby Docs.

new → new_hash

new(obj) → new_hash

Returns a new, empty hash. If this hash is subsequently accessed by a key that doesn’t correspond to a hash entry, the value returned depends on the style of new used to create the hash. In the first form, the access returns nil. If obj is specified, this single object will be used for all default values.

The last sentence is key. When we initialize a hash using Hash.new([]), we are passing in an array object. This single object will be used as the default value for the hash. We can confirm this by printing a few object ids (the exact numbers will differ from run to run).

my_hash = Hash.new([])
my_hash[0] << 1
puts my_hash[0].object_id
#=> 180
puts my_hash[:does_not_exist].object_id
#=> 180

When we use the << operator on the second line of this code, we aren’t mutating the hash at all. In fact, we are mutating the default value that the Hash#[] method returns when we attempt to access a key that doesn’t exist. Think carefully about what is happening in this code:

Line 1: We invoke the Hash::new method, passing in an empty array object as the argument. This returns a new hash object which will use the empty array object as the default value.

Line 2: We use the [] operator on the hash, which is syntactic sugar for the Hash#[] method. We pass in the integer 0 as the argument for this method. Because there is no key with the value 0 in our hash, the method returns the default value (our single empty array object). We then use the << operator to invoke the Array#<< method to mutate that empty array object, not the hash! We pass an integer with value 1 into the #<< method, which appends that integer onto the end of our formerly empty array. The key here is that our hash is left unchanged by this code.
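
The walkthrough above can also be checked without comparing object ids by hand. This is a small sketch using Hash#default and Object#equal? (which tests object identity); the same_default? helper is hypothetical and exists only for this illustration.

```ruby
# same_default? is a hypothetical helper for this illustration.
# It returns true when every lookup hands back the hash's one default object.
def same_default?(hash, *keys)
  keys.all? { |key| hash[key].equal?(hash.default) }
end

my_hash = Hash.new([])
my_hash[0] << 1

p my_hash.default   # [1]  (the default object was mutated)
p my_hash.empty?    # true (the hash itself holds no keys)
p same_default?(my_hash, 0, :does_not_exist, 'anything')  # true
```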

Consider this code:

my_array = []
my_hash = Hash.new(my_array)
my_hash[0] << 1
p my_hash
#=> {}
p my_array
#=> [1]
p my_array.object_id
#=> 200
p my_hash[:does_not_exist].object_id
#=> 200

This code is fundamentally the same as the previous code snippet, except this should make it clear that it is our default value that is being mutated, not the hash.

What we’ve discovered here is not unique to arrays. The same behavior applies to any mutable object used as a default value.

my_string = ''
my_hash = Hash.new(my_string)
my_hash[0] << 'nope.'
p my_hash
#=> {}
p my_string
#=> "nope."
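
One way to catch this mistake early is to freeze the default object, so that any attempt to mutate it raises an error instead of silently changing the shared default. This is only a sketch: frozen_default_hash is a hypothetical helper, and rescuing FrozenError assumes Ruby 2.5 or later.

```ruby
# frozen_default_hash is a hypothetical helper for this illustration.
def frozen_default_hash
  Hash.new([].freeze)
end

my_hash = frozen_default_hash

begin
  my_hash[:does_not_exist] << 'Uh oh.'   # tries to mutate the frozen default
rescue FrozenError => e
  puts "FrozenError: #{e.message}"
end

# Non-mutating operations still work: Array#+ returns a new, unfrozen array.
my_hash[:does_not_exist] += ['fine']
p my_hash[:does_not_exist]   # ["fine"]
p my_hash.default            # []
```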

So, is there another way to make this work? Can we use a mutable object as a default value and have it work as we originally expected? To answer this question, we have to take another look at the Ruby Docs for the Hash::new method.

…If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.

Alright, so we can pass a block into the Hash::new method. That means we can change our code to look like this:

my_hash = Hash.new { |hash, key| hash[key] = [] }
my_hash[0] << 'This works!'
my_hash[1] << 'This works too!'
p my_hash
#=> {0=>["This works!"], 1=>["This works too!"]}
p my_hash[0].object_id
#=> 220
p my_hash[1].object_id
#=> 240

So what is happening here? On line 1, we are again invoking the Hash::new method. But this time, we are passing in a block instead of an array object. As the docs state, this block is called whenever we attempt to access a key that does not exist, and the hash object and the key we are trying to access are passed into the block as arguments. Within the block, we create a new, unique array object and store it in the hash under that key. An assignment expression such as hash[key] = [] evaluates to the value that was assigned, so our block also returns that new array. This fulfills the obligation to return the default value from the block, as stated in the description of the Hash::new method.
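
One consequence of storing the value inside the block is worth seeing: even a plain read of a missing key now adds that key to the hash. The sketch below shows this, along with Hash#key? and Hash#fetch, which do not trigger the block. The new_list_hash helper is hypothetical and only wraps the constructor call from above.

```ruby
# new_list_hash is a hypothetical helper wrapping the block form shown above.
def new_list_hash
  Hash.new { |hash, key| hash[key] = [] }
end

lists = new_list_hash
lists[:peek]                # only a read, but the block runs and stores []

p lists.key?(:peek)         # true
p lists.key?(:other)        # false (key? does not call the block)
p lists.fetch(:other, nil)  # nil   (fetch does not call the block either)
p lists.size                # 1
```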

There is an arguably simpler solution. We don’t have to pass a block into the Hash::new method if we just don’t try to mutate the default value.

my_hash = Hash.new([])
my_hash[:does_not_exist] += ['Uh oh.']
p my_hash
#=> {:does_not_exist=>["Uh oh."]}

Pay close attention to the operators used on line 2. Instead of mutating the array with the << operator, we use the += operator, which is shorthand for my_hash[:does_not_exist] = my_hash[:does_not_exist] + ['Uh oh.']. The + operator constructs a new array, and the assignment stores it under the key :does_not_exist. Our default value is left alone and our hash is correctly mutated.
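
To confirm that the default really is left alone, we can inspect it after a few appends. This is a minimal sketch; append_copy is a hypothetical helper that just wraps the += expression. One trade-off to keep in mind is that every append builds a new array rather than growing the existing one.

```ruby
# append_copy is a hypothetical helper wrapping the += expression.
def append_copy(hash, key, item)
  hash[key] += [item]   # same as: hash[key] = hash[key] + [item]
end

my_hash = Hash.new([])
append_copy(my_hash, :a, 'first')
append_copy(my_hash, :a, 'second')
append_copy(my_hash, :b, 'other')

p my_hash[:a]      # ["first", "second"]
p my_hash[:b]      # ["other"]
p my_hash.default  # [] (the shared default was never mutated)
```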

I hope that you found this helpful, or at least somewhat interesting. I ran into this problem and was pretty confused about what was going on. Figuring out why the code wasn’t doing what I expected was a great learning opportunity and reasoning through it was a great opportunity to describe, in detail, exactly what the code I had written was doing. I would like to credit Andrew Marshall for a very helpful explanation on this topic posted on Stack Overflow a few years ago.