June 11, 2020

Memoization: What, Why, and How

Trading Space for Time
edit

Recently, I spent an hour with one of my protégés demonstrating some common advanced techniques we use at Webflow. One of the techniques I showed them was memoization. Let’s learn what memoization is, why you might use it, and how do we write it from scratch.

Memoization is a technique that enhances a function by creating and using a cache to store and retrieve results of that function. Memoization uses the arguments of a function to create a key for the cache. The first time a memoized function is called with a set of arguments, those arguments are turned into a key and the result of the function is stored as the value for that key in the cache. Then, on subsequent calls of the function with the same arguments, we can retrieve that previously calculated value from the cache and return it before we ever do the work of calculating the result. This improves the performance of our function, since hitting the cache is a constant time operation (O(1) in big O notation).

We can use memoization on any pure function and it’s not too difficult to implement either. Let’s create a higher order function that will allow us to create a memoized version of a pure function.

Typically, we want to use memoization on functions with expensive calculations and we’ll talk about why later, but for the sake of writing our memoize function, I’m going to use a very simple pure function.

function add(x, y) {
  return x + y
}

Can’t get much simpler than an add function! Since it’s a pure function, we can guarantee that the same inputs always lead to the same output. So if we’re able to create a cache that will use the function arguments as a key and store the results of the function as a value, we can retrieve that value from the cache knowing it’s the same as the result of running the function.

function memoize(fn) {
  // We create a cache inside the closure of the `memoize` function
  // This creates a new cache for each function we pass
  // to `memoize`. We'll discuss other cache strategies later
  const cache = {}

  // The goal is to return a function with an identical
  // signature as the one passed in
  return (...args) => {
    // We need to create a key for our cache. A simple way to do this is
    // to join the args with a delimiter that clearly separates arguments
    // This way the arguments (12, 3) don't create the same key as (1, 23)
    const key = args.map(JSON.stringify).join('---')

    if (cache[key] !== undefined) {
      // Just for our example, let's make it clear we returned a value
      // from the cache
      console.log('from cache')
      return cache[key]
    }

    const result = fn(...args)
    cache[key] = result

    return result
  }
}

Let’s break that down. memoize receives a function as an argument. It then creates a cache in closure. This will store the results of our function in memory to be retrieved later.

Next, we return a function that can receive whatever args the original function receives. Inside of the function we are returning, we create a key for our cache. In this case, I chose to stringify and join the arguments with a delimiter. This delimiter prevents similar arguments from accidentally being the same key. Without a delimiter, calling add(2, 34) and add(23, 4) would result in the key 234. We would end up with a mistaken cache hit for the second call, and even worse, our result would be incorrect.

Next, if there is a defined value at the cache[key], we return it and skip calculating the result. Otherwise, we calculate the result, store it in the cache, and then return it.

Now, let’s create a memoized add function and use it.

const memoizedAdd = memoize(add)

Seriously. Adding memoization to a pure function is that simple.

Let’s try it out. I’ve made a component below that allows you to give it two numbers to add together and it will tell you every time the result comes from the cache instead of being calculated. Try swapping the X and Y values and see that it doesn’t hit the cache because the keys generated are different.

Why Use Memoization?

Memoization is a tradeoff exchanging space (memory) for time. The purpose of this tradeoff is to gain a performance benefit by skipping the calculation of the function’s result, i.e. save time. Thus, it only makes sense to use memoization where memory is not a concern and the original calculation is both expensive and often called with the same arguments. Our add function was a bad example. It’s not an expensive calculation at all. Let’s create a slightly better example.

I’ve used this example before in my writing, but a factorial function might be a good candidate for memoization. It’s recursive, which means that factorial will be called with the same argument frequently. In fact, if it’s ever been called with a number greater than the current argument, it will simply return the value from the cache, since it’s already been calculated before. Let’s write the factorial function.

const factorial = memoize(n => {
  if (n === 0) {
    return 1
  }

  return n * factorial(n - 1)
})

Below, I’ve added a component that demonstrates how this works. With this component, every result of factorial is cached and I display the cache as it grows. Notice the cache doesn’t change when you call a number less than a previous argument because no new values were added to the cache and the value was immediately retrieved and returned.

Memoization and React

I recognized a few years ago that React function components could be memoized. Of course, I wasn’t forward thinking enough to recognize that this was a feature coming to React in the future. That said, memoizing in React is quite a bit different than memoizing a function like we have thus far. Rather than creating a large cache of many results, we can think of the caches in React’s memoization as a cache of a single result.

React now ships with two memoization mechanisms. The first, React.memo, is for memoizing function components as I just described. If the props don’t change, the component won’t rerender. This is the function equivalent of using the PureComponent class or implementing a shallow props check of this.props === prevProps in the shouldComponentUpdate lifecycle. I’d like to discuss React.memo further in a future post, so I’ll save that conversation for there.

React.useMemo, the hook, is the other memoizing function. We give React.useMemo a “create” function that it will call and cache the returned value. However, we don’t give this function any arguments. Instead, we use values within the scope of the component, various props and state to derive our value. Then, our useMemo hook only recalculates the value if any of these dependencies change.

Both of these memoization mechanisms are quite different than the technique I presented earlier. Think of them the differences this way. Our memoize function is designed to have a cache that continues to grow as more results are generated. We haven’t limited our cache in anyway (though we could). React, on the other hand, has given you a cache, but it’s precisely the size of one result. When the values that achieved that result are changed, then the function is run again and the cache of one is set with a new value. This is perfect for preventing extraneous rerenders and it’s good to understand these differences.

Memoization and Other Caches

In my memoize function, I used a plain old JavaScript object to create key/value pairs. This required that I turn the arguments into a string to act as a key. There might be occasions where this is impractical or unnecessary. It might be useful to make your cache using a Map or a WeakMap instead.

A Map can essentially use any value as a key, including a function. A WeakMap can only use objects as keys, but allows for garbage collection of those objects. Both can be useful in the right situation.

We use a memoizing function at Webflow that makes use of a WeakMap cache. The gist of it is that because we make updates to objects immutably, we can guarantee that two objects that are referentially equal are also deeply equal. We use these objects as keys in a WeakMap cache, and use their reference to retrieve the value. This requires understanding mutations and immutability, but greatly improves performance in some of the areas of our app. I encourage you to do some further research on your own to understand different caching strategies for memoization.

Conclusion

Memoization is a trade off of space for time. It’s used to cache the result of a function so that the next time it’s called with the same arguments, we can just return the answer. We don’t need to calculate it again. This technique is useful when we have an expensive calculation that will be called frequently with the same arguments. It’s used more often than you might realize and is a useful tool to have in your tool belt.


Liked the post?
Give the author a dopamine boost with a few "beard strokes". Click the beard up to 50 times to show your appreciation.
Need help with your software problems?

My team and I are ready to help you. Hire Agathist to build your next great project or to improve one of your existing ones.

Get in touch
Kyle Shevlin's face, which is mostly a beard with eyes

Kyle Shevlin is the founder & lead software engineer of Agathist, a software development firm with a mission to build good software with good people.

Agathist
Good software by good people.
Visit https://agath.ist to learn more
Sign up for my newsletter
Let's chat some more about TypeScript, React, and frontend web development. Unsubscribe at any time.
Logo for Just Enough Functional Programming
Just Enough Functional Programming
Check out my courses!
If you enjoy my posts, you might enjoy my courses, too. Click the button to view the course or go to Courses for more information.