Memoization: What, Why, and How
Recently, I spent an hour with one of my protégés demonstrating some common advanced techniques we use at Webflow. One of the techniques I showed them was memoization. Let’s learn what memoization is, why you might use it, and how do we write it from scratch.
Memoization is a technique that enhances a function by creating and using a cache
to store and retrieve results of that function. Memoization uses the arguments of a function to create a key
for the cache
. The first time a memoized function is called with a set of arguments, those arguments are turned into a key
and the result of the function is stored as the value for that key
in the cache
. Then, on subsequent calls of the function with the same arguments, we can retrieve that previously calculated value from the cache
and return it before we ever do the work of calculating the result. This improves the performance of our function, since hitting the cache
is a constant time operation (O(1)
in big O notation).
We can use memoization on any pure function and it’s not too difficult to implement either. Let’s create a higher order function that will allow us to create a memoized version of a pure function.
Typically, we want to use memoization on functions with expensive calculations and we’ll talk about why later, but for the sake of writing our memoize
function, I’m going to use a very simple pure function.
function add(x, y) {
return x + y
}
Can’t get much simpler than an add
function! Since it’s a pure function, we can guarantee that the same inputs always lead to the same output. So if we’re able to create a cache that will use the function arguments as a key and store the results of the function as a value, we can retrieve that value from the cache knowing it’s the same as the result of running the function.
function memoize(fn) {
// We create a cache inside the closure of the `memoize` function
// This creates a new cache for each function we pass
// to `memoize`. We'll discuss other cache strategies later
const cache = {}
// The goal is to return a function with an identical
// signature as the one passed in
return (...args) => {
// We need to create a key for our cache. A simple way to do this is
// to join the args with a delimiter that clearly separates arguments
// This way the arguments (12, 3) don't create the same key as (1, 23)
const key = args.map(JSON.stringify).join('---')
if (cache[key] !== undefined) {
// Just for our example, let's make it clear we returned a value
// from the cache
console.log('from cache')
return cache[key]
}
const result = fn(...args)
cache[key] = result
return result
}
}
Let’s break that down. memoize
receives a function as an argument. It then creates a cache in closure. This will store the results of our function in memory to be retrieved later.
Next, we return a function that can receive whatever args
the original function receives. Inside of the function we are returning, we create a key
for our cache. In this case, I chose to stringify and join the arguments with a delimiter. This delimiter prevents similar arguments from accidentally being the same key. Without a delimiter, calling add(2, 34)
and add(23, 4)
would result in the key
234
. We would end up with a mistaken cache hit for the second call, and even worse, our result would be incorrect.
Next, if there is a defined value at the cache[key]
, we return it and skip calculating the result. Otherwise, we calculate the result
, store it in the cache
, and then return it.
Now, let’s create a memoized add
function and use it.
const memoizedAdd = memoize(add)
Seriously. Adding memoization to a pure function is that simple.
Let’s try it out. I’ve made a component below that allows you to give it two numbers to add together and it will tell you every time the result comes from the cache instead of being calculated. Try swapping the X
and Y
values and see that it doesn’t hit the cache
because the key
s generated are different.
Why Use Memoization?
Memoization is a tradeoff exchanging space (memory) for time. The purpose of this tradeoff is to gain a performance benefit by skipping the calculation of the function’s result, i.e. save time. Thus, it only makes sense to use memoization where memory is not a concern and the original calculation is both expensive and often called with the same arguments. Our add
function was a bad example. It’s not an expensive calculation at all. Let’s create a slightly better example.
I’ve used this example before in my writing, but a factorial
function might be a good candidate for memoization. It’s recursive, which means that factorial
will be called with the same argument frequently. In fact, if it’s ever been called with a number greater than the current argument, it will simply return the value from the cache, since it’s already been calculated before. Let’s write the factorial function.
const factorial = memoize(n => {
if (n === 0) {
return 1
}
return n * factorial(n - 1)
})
Below, I’ve added a component that demonstrates how this works. With this component, every result of factorial
is cached and I display the cache as it grows. Notice the cache doesn’t change when you call a number less than a previous argument because no new values were added to the cache and the value was immediately retrieved and returned.
Memoization and React
I recognized a few years ago that React function components could be memoized. Of course, I wasn’t forward thinking enough to recognize that this was a feature coming to React in the future. That said, memoizing in React is quite a bit different than memoizing a function like we have thus far. Rather than creating a large cache of many results, we can think of the caches in React’s memoization as a cache of a single result.
React now ships with two memoization mechanisms. The first, React.memo
, is for memoizing function components as I just described. If the props
don’t change, the component won’t rerender. This is the function equivalent of using the PureComponent
class or implementing a shallow props check of this.props === prevProps
in the shouldComponentUpdate
lifecycle. I’d like to discuss React.memo
further in a future post, so I’ll save that conversation for there.
React.useMemo
, the hook, is the other memoizing function. We give React.useMemo
a “create” function that it will call and cache the returned value. However, we don’t give this function any arguments. Instead, we use values within the scope of the component, various props
and state
to derive our value. Then, our useMemo
hook only recalculates the value if any of these dependencies change.
Both of these memoization mechanisms are quite different than the technique I presented earlier. Think of them the differences this way. Our memoize
function is designed to have a cache that continues to grow as more results are generated. We haven’t limited our cache in anyway (though we could). React, on the other hand, has given you a cache, but it’s precisely the size of one result. When the values that achieved that result are changed, then the function is run again and the cache of one is set with a new value. This is perfect for preventing extraneous rerenders and it’s good to understand these differences.
Memoization and Other Caches
In my memoize
function, I used a plain old JavaScript object to create key/value pairs. This required that I turn the arguments into a string to act as a key. There might be occasions where this is impractical or unnecessary. It might be useful to make your cache using a Map
or a WeakMap
instead.
A Map
can essentially use any value as a key, including a function. A WeakMap
can only use objects as keys, but allows for garbage collection of those objects. Both can be useful in the right situation.
We use a memoizing function at Webflow that makes use of a WeakMap
cache. The gist of it is that because we make updates to objects immutably, we can guarantee that two objects that are referentially equal are also deeply equal. We use these objects as keys in a WeakMap
cache, and use their reference to retrieve the value. This requires understanding mutations and immutability, but greatly improves performance in some of the areas of our app. I encourage you to do some further research on your own to understand different caching strategies for memoization.
Conclusion
Memoization is a trade off of space for time. It’s used to cache the result of a function so that the next time it’s called with the same arguments, we can just return the answer. We don’t need to calculate it again. This technique is useful when we have an expensive calculation that will be called frequently with the same arguments. It’s used more often than you might realize and is a useful tool to have in your tool belt.