Defunctionalization
Or converting recursive stack-unsafe code into stack-safe code
See these resources:
- Don't Think, Just Defunctionalize
- The reasonable effectivness of the ConT monad
- The best refactoring you've never heard of
The rest of this page will do two things:
- Provide a few examples showing the simple but stack-unsafe code and its corresponding stack-safe but complex code.
- Provide general principles to follow to help you figure out how to write stack-safe code for any given situation.
Stack-Safety: Opening Example
The below examples move from simple to more complex. Start with the first one to get a general idea, make sure you understand it well, and only then move on to the next example.
Mapping a list from the last element to the first
Let's say one had to change every element in a List a
from type a
to b
. You're thinking, "That's easy. We'll just use map
." Correct.
But, let's add a new constraint: you must change the elements in a specific order. The first element you need to change is the last one in the list and the last element you should change is the first element in the list. In other words, a list like 4 : 3 : 2 : 1 : Nil
should have 1
changed first and 4
changed last.
"Phwah!" you say. "I'll just use reverse <<< map f <<< reverse
and call it a day!" As easy as that is to write, that will iterate through the list 3 times. Not the most performant thing in the world.
So, you might write something like the following stack-unsafe code which iterates through a list twice, once as it descends down the list, and once as it ascends back up the list while constructing the returned value:
mapLastElemFirst_unsafe :: forall a b. (a -> b) -> List a -> List b
mapLastElemFirst_unsafe f = case _ of
Nil -> Nil
Cons h tail -> do
let newTail = mapLastElemFirst_unsafe f tail -- stack unsafe!
Cons (f h) newTail
While the above code is short and easy to read, it'll also cause a stack-overflow error if you give it a large enough list.
A stack-safe version of the above code might look like this. Unfortunately, it's not as easy to read if you're not familiar with this style of writing. However, it only iterates through the list twice, similar to the stack-unsafe version, and it will never throw a stack-overflow error:
mapLastElemFirst_safe :: forall a b. (a -> b) -> List a -> List b
mapLastElemFirst_safe f ls = go Nil (Left ls)
where
-- To keep track of "where" we are in the data structure, we'll
-- maintain our own stack of what else still needs to be done.
--
-- `Left` values represent items in the tree we haven't yet examined
-- and/or changed.
-- `Right` values represent either the final list or the current state
-- of the final list as we are constructing it.
go :: List a
-> Either (List a) (List b)
-> List b
go stack = case _ of
Left (Cons h tail) ->
-- we've hit the next element in the list
-- remember, we can't modify the value of type `a`
-- represented by `h` by calling `f h` because
-- this might not be the last element.
go (h : stack) (Left tail)
Left Nil ->
-- we've hit the end of the list
-- we can now start consuming the stack we've created
go stack $ Right Nil
Right val ->
-- `val` is either the final list or a portion of that final list
-- because we're still constructing it.
case stack of
-- `a` is the element next closest to the end of the original list.
-- We've already changed all elements after it
-- so we can now map it's type from `a` to `b`.
Cons a rest -> do
let b = f a
go rest $ Right (b : val)
-- the stack is now empty; there's no more elements to map.
-- So, we return the final value
Nil -> val
To see the unsafe version fail and the safe version succeed with the same large input, see the Writing Stack-Safe Code - Part 1.
Zipping two lists together with one in reverse
While the previous example is not the most realistic thing ever, it does set us up for the next twist. Let's say you need to zip two lists together where one has values in the opposite order (e.g. 4 : 3 : 2 : 1 : Nil
) and the other has values in the normal order (e.g. 1 : 2 : 3 : 4 : Nil
) . Your goal is to get the following (Tuple 1 1) : (Tuple 2 2) : (Tuple 3 3) : (Tuple 4 4) : Nil
.
You're right. Calling something like zipOpposingOrder revList normList = zip (reverse revList) normList
might be the easier and better thing to do. Still, you might have some circumstances where a variation of this idea forces you to do things differently. Fortunately, our example above only needs to change slightly.
The stack-unsafe code:
zipOpposingOrder_unsafe :: forall a b. List a -> List b -> Maybe (List (Tuple a b))
zipOpposingOrder_unsafe revList normalList = map snd $ go revList normalList
where
go :: List a -> List b -> Maybe (Tuple (List a) (List (Tuple a b)))
go revRemaining normalRemaining = case normalRemaining of
Nil ->
Just $ Tuple revRemaining Nil
Cons headB tail -> do
-- We're using the Maybe monad here to make this easier to read.
-- A `Nothing` will be produced if the normal list had more elements
-- than reversed one did at this particular level or
-- if either one of the lists did at a deeper level
-- finish zipping the tail first and return the remaining
-- elements from the `revRemaining` list
Tuple newRevRemaining newTail <- go revRemaining tail
-- if the `newRevRemaining` has a value, zip it with this
-- level's value, and return the tail for the parent's
-- computation (if any)
{ head: headA, tail: revTail } <- uncons newRevRemaining
pure $ Tuple revTail $ Cons (Tuple headA headB) newTail
Again, the above code is straight forward, but it'll cause a stack-overflow error if you give it a large enough input.
A stack-safe version of the above code might look like this:
zipOpposingOrder_safe :: forall a b. List a -> List b -> Maybe (List (Tuple a b))
zipOpposingOrder_safe reveredList normalList = go reveredList Nil (Left normalList)
where
-- To keep track of "where" we are in the data structure, we'll
-- maintain our own stack of what else still needs to be done.
--
-- `Left` values represent items in the normal list we haven't yet changed.
-- They will appear in a reversed order when we start consuming them.
--
-- `Right` values will either be the final list or the current state
-- of the final list as it is being built.
go :: List a
-> List b
-> Either (List b) (List (Tuple a b))
-> Maybe (List (Tuple a b))
go revList stack = case _ of
Left (Cons head tail) ->
-- we've hit the next element in the list
-- remember, we can't merge the head value yet
-- because it might not be the last element.
go revList (head : stack) (Left tail)
Left Nil ->
-- we've hit the end of the normal list
-- we can now start consuming the stack we've created
go revList stack $ Right Nil
Right val ->
-- `val` is either the end of the final list (i.e. `Nil`)
-- or the current state of the final list since we're
-- still in the process of creating it.
-- We need to look at the stack to see what to do
case stack of
-- `b` is the element next closest to the end of the normal list
-- so we can now zip it together with the revList's next element
-- (if it exists)
Cons b restB -> case revList of
Cons a restA ->
go restA restB $ Right $ (Tuple a b) : val
Nil ->
Nothing -- more elems in normalList than in reversedList
-- the stack is now empty, so we return the final list
-- (assuming there wasn't any other values left in the reverse list).
Nil -> case revList of
Nil -> Just val
Cons _ _ -> Nothing -- more elems in reversedList than in normalList
To see the unsafe version fail and the safe version succeed with the same large input, see the Writing Stack-Safe Code - Part 2.
Topologically sorting a graph
See topologicalSort
from purescript-graphs
. We'll cover the types first and then explain each step in the computation.
First, SortState
stores 1) the final result
that will be returned, which is either the final list or the current state of that still-being-constructed final list, and 2) the map of not yet visited keys. For this version of an FP graph, the map keys are directed edges that point towards a vertex and all other edges to which the vertex is connected and points. These keys will be removed from the map after they have been visited.
Second, SortStep
indicates two actions via defunctionalization:
Visit
means add the current key (i.e. edge) to the returned list and then add all of the edges coming out of the vertex to the stack, so that they can be visited, too.Emit
means the key has already been visited and doesn't need to be checked again. Rather, add it to the final list that will be returned
Below is a general explanation of the control flow:
- Visit the first smallest key. Indicate that the key should be added to the final list via
Emit
and indicate that all of its relationships should be examined viaVisit
, remove the key from the keys map, then loop. - Visit the first key's first relationship. If it hasn't already been visited, do the same as step 1. If it has been visited, just loop.
- At some point, the first key and all of its relationships will have been visited, and the stack will be a topologically-sorted list in reverse. All of its values will be
Emit key
. For example, it may store,Emit 5 : Emit 4 : Emit 3 : Emit 2 : Emit 1 : Nil
, when the final list should be1 : 2 : 3 : 4 : 5 : Nil
. - Now the stack is reversed, producing the final topologically-sorted list and storing it in
result
. - Since the stack is empty, the
visit
loop stops. We return to thego
loop and find the next smallest key, repeating steps 1 - 5. - At some point, there are no more smallest keys because all keys have been visited. Thus, we get a
Nothing
in thego
loop and we return theresult
, which is now a topologically-sorted list.
Principles
- Write a simpler stack-safe function that isn't performant (e.g.
reverse <<< map f <<< reverse
) - Once you know you need the performance, write a stack-unsafe function first that solves the problem
- Make the function tail-recursive by calling the same function on every possible path except one
- Use defunctionalization to indicate what should happen in each loop.
- Is it easier to read if you use explicit data constructors (e.g.
Visit
andEmit
in the above graph example) or do you just need to distinguish between two states (e.g.Either
in the abovemapLastElemFirs
List examples)?
- Is it easier to read if you use explicit data constructors (e.g.
- Write the code "in order" of how it would execute. First X occurs, then Y, then Z.
- Figure out what the stack should look like.
- Is it a simple
List a
or something more complicated likeList (Either a b)
?
- Is it a simple