Basic Category Theory for Programmers, Part 6: Monads

Jan 04, 2024

Let's go back to a question we asked before we got all wrapped up in algebra. What's the deal with flatMap? And what does it have to do with monoids?

We saw that, given an Optional, if we use map with a function which returns an Optional, we end up with a nested Optional. We started with an Optional<String> and ended up with an Optional<Optional<Number>>. If you squint you might be able to see that in a certain sense we're "adding" Optionals.

We used Number and the tuple (Number,Number) as examples when we discussed monoidal categories. But we also mentioned that the theory is very general. The product category holds objects from two other categories, but there is no requirement as to how these categories are related. If Optional<T> is a category, could Optional<Optional<T>> be a product category?

A monoid's multiplication arrow goes from the product category (C,C) to the category C. If we replace C with Optional, multiplication would look like this.

\(Optional\langle Optional\langle T\rangle\rangle → Optional\langle T\rangle\)

What does this look like in real life? An Optional<T> holds either a value of T or a value called none. For Optional<Number> we could represent these two options as Optional(3) and Optional(none). If we embed them in another Optional, the two different cases would look like this, Optional(Optional(3)) and Optional(Optional(none)). But the outer Optional could also contain none, so for a nested Optional, we have a third option, Optional(none).

How can we "flatten" a nested Optional to a single level? What do we have to do to have a function which takes a nested Optional and returns a single level Optional? We first have to see what's inside the outer Optional. If it contains none we we can simply return Optional(none). Otherwise it will contain another Optional. This Optional is already a single level Optional, so we can simply return it. We don't even have to check to see if it has a Number or none. Let's call this function flatten.

Optional<T> func flatten(Optional<Optional<T>> optional) {
  switch(optional) {
    case some(value): return value;
    case none: return Optional(none);
  }
}

We have a similar situation with Array.

\(Array\langle Array\langle T\rangle\rangle → Array\langle T\rangle\)

A nested Array looks very different from a nested Optional. It is an Array of Arrays, such as [[1,2,3],[4,5,6]]. We can't flatten it in the same way that an Optional is flattened. In fact we have many options for how to do this. For example, we could take the first element of each embedded Array and create a new Array with just those values. [[1,2,3],[4, 5, 6]]→[1,4]. We want to preserve as much information as possible so let's do something more sensible. To flatten a nested Array, we'll concatenate all of its embedded Arrays.

\([[1, 2, 3], [4, 5, 6]] → [1, 2, 3, 4, 5, 6]\)

We'll also call this function flatten.

Array<T> func flatten(Array<Array<T>> array) {
  Array<T> newArray = [];
  foreach nestedArray in array {
    newArray = newArray + nestedArray;
  }
  return newArray;
}

We now have two flatten functions, but they behave very differently. They have to take into account the particular way in which the different types are nested. Notice that flattening an Array made use of its monoidal properties. We "added" the nested Arrays to create a new single level Array. This approach wasn't available for Optional because it is not a monoid. There's no way to "add" Optionals and it doesn't have an identity element.

We don't have to stop there. We could take any functor which can be "nested" in some way and provide it with its own flatten function. For example, The Result type could hold a nested Result. This is similar to a nested Optional and it would be helpful to be able to flatten it.

To be able to nest functors, we need to be able to go from one category back to the same category. When we nested an Optional, we went from Optional<T> to Optional<Optional<T>>. We went from one kind of Optional to another kind of Optional. A functor such as this which maps a category back to itself is called an endofunctor.

Let's generalize this discussion to cover any nestable functor. Using angle brackets is getting harder to read, let’s use curly brackets instead. We'll use T for functors, and X for the basic non-functor type it contains, T{X}. Two layers of functors will look like this T{T{X}}. If the basic type isn't important, we could also just write T{T}. Now our monoid multiplication arrow looks like this.

\(T\{T\} → T\)

For Array, this would be equivalent to

\(Array\{Array\} → Array\)

If a functor can be nested and if it has a flatten function, we will be able to apply monoid multiplication to it. Does this mean we have a monoid? If so, it would be a special kind of categorical monoid. A monoid where the elements are endofunctors is called a monad.

An endofunctor with the multiplication function does not guarantee that we have a monad. Remember that a categorical monoid requires two morphisms, 𝛍 and 𝜂, multiplication and unit. So far we have only been considering the multiplication morphism.

Let's look at 𝜂 now. It looks like this, I→M. In our current discussion, M corresponds to T. But what is I? We mentioned that it somehow corresponds to the identity element of regular monoids. When we discussed Array as a monoid we found that the empty Array served as its identity element. But that doesn't help us here because we are no longer talking about Array’s monoidal operation. We're talking about endofunctor monoids in general.

Let's take a closer look at the monoid morphisms from the perspective of endofunctors. First, let's replace all Ms with Ts.

\(𝛍: (T, T) → T\)

\(𝜂: I → T\)

T is an endofunctor. 𝛍 goes from the product category of T to T. Notice that 𝛍 is now going from one functor to another. In the context of monads, 𝛍 is a natural transformation. Since T is a functor, it must mean that I is also a functor. I is called the identity functor.

The identity functor is simply the mapping of a functor back to itself. We can think of it as a functor which remains in place and changes nothing. 𝜂 then takes this functor and maps it to the functor T. In our case this amounts to taking the first functor and nesting it in another layer of the same kind of functor. Let's look at this using our new notation.

\(𝛍: T\{T\} → T\)

\(𝜂: T → T\{T\}\)

We can think of 𝜂 and 𝛍 as natural transformations which allow us to nest and flatten functors.

Keep in mind that 𝜂 isn't just any morphism from T to T{T}. It is the morphism that takes some T and embeds it in another layer of T without changing the original T. Likewise, 𝛍 isn't just any morphism from T{T} to T. It is a morphism which needs to satisfy the monoid rules which we will discuss next.

Applying monadic multiplication looks like this.

\(𝛍( \ T\{T\{X\}\} \ ) = T\{X\}\)

Applying the unit natural transformation looks like this.

\(𝜂( \ X \ ) = T\{X\} \hspace{12pt} or \hspace{12pt} 𝜂( \ T\{X\} \ ) = T\{T\{X\}\}\)

To qualify as a monad, the functor has to obey two rules. The first looks like this.

\(T\{T\{T\}\} \ldots \ 𝛍( \ T\{T\{T\}\} \ ) \ldots \ 𝛍( \ T\{T\} \ ) \ldots \ T\)

\(T\{T\{T\}\} \ldots \ T\{ \ 𝛍( \ T\{T\} \ ) \} \ldots \ 𝛍( \ T\{T\} \ ) \ldots \ T\)

Given a three-layered functor we can flatten it with 𝛍 down to a single layer by either starting at the outer layer or by starting one layer in. Either way we should get the same result.

The second rule says the following must be true.

\(T\{X\} \ldots \ 𝜂( \ T\{X\} \ ) \ldots \ 𝛍( \ T\{T\{X\}\} \ ) \ldots \ T\{X\}\)

\(T\{X\} \ldots \ T\{ \ 𝜂( \ X \ ) \} \ldots \ 𝛍( \ T\{T\{X\}\} \ ) \ldots \ T\{X\}\)

We can appy 𝜂 to T{X} and get T{T{X}} or we apply 𝜂 to the X in T{X} and again we get T{T{X}}. We can then apply 𝛍 to each and this brings us back to a T{X} which is equivalent to the T{X} we started with.

It’s a little hard to see but these rules correspond to the rules for regular monoids.

\((x + y) + z = x + (y + z)\)

\(x + i = x = i + x\)

The first monad rule is a form of associativity. It says that it doesn't matter in what order we flatten functors, similar to how it doesn't matter in what order we add or multiply.

The second rule concerns the identity element. It says that when we add an identity layer to a functor, it doesn't matter whether we do it from the outside or from the inside. The value of the original functor is preserved when we flatten the new functor. This is similar to how it doesn't matter whether the identity element in addition or multiplication comes before or after the second value. In either case it doesn't change the value.

But what's the deal with flatMap? It should be easy to see what's going on now. If we start with an Optional and use map on it with a function that returns another Optional, we end up with a nested Optional. We can use the flatten function to get back to a single-level Optional, and that's exactly what flatMap does. Behind the scenes flatMap applies the map function and then applies flatten.

Notice that flatMap takes a function that goes from some type to some type wrapped in Optional.

\(A → Optional\langle B\rangle\)

This sounds like a job for the unit function! In PL, functors come pre-equipped with the unit function since we need some way to go from a type in PL to the functor type. As we mentioned before, this is usually the functor's constructor or initializer. A function which goes from A to Optional<B> must first transform the A to a B and then put the B in an Optional using the unit function.

If we have a type that somehow contains another type and it comes equipped with a map function, we probably have a functor. To be a functor, the map function must have the behavior we described in the section on functors. If a functor can be nested and it has a flatten function, we might have a monad. To be a monad, the flatten function must also follow the rules for monads.

And what about flatMap? It is built out of monad functions, but in itself it is not part of the definition of a monad. It just offers a convenient way to deal with monads.

This essay covered some of the basic ideas in category theory and examined how they apply to a programming language. But we only touched the surface. We covered just enough material to get a basic understanding of monads. We left out a significant amount of basic concepts in category theory and only hinted at how they can apply to programming. We skipped mathematical rigor for the sake of clarity, hopefully without making too many errors in the process. We hope it shed some light for curious minds and served as an inspiration for further study. Now go forth and find more monoids!

For anyone interested in digging deeper into category theory, the work of Bartosz Milewski1 is an excellent introduction with a focus on programming. For a general introduction I recommend the work of Eugenia Cheng2. In addition to their books, both authors have published extensive and valuable material online. The relevant Wikipedia3 articles were an important resource when putting this essay together.

Many thanks to my colleagues for their feedback on previous versions of this essay. Farangis Akhmedova, Erk Ekin, Denys Garashchenko, Marco Meschini, Vladislav Zagorodnyuk

Bartosz Milewsky. Category Theory for Programmers, 2019.

Eugenia Cheng. The Joy of Abstraction: An Exploration of Math, Category Theory, and Life, 2023.

https://en.wikipedia.org/wiki/Monad_(category_theory)

Jose Villegas

Discussion about this post