A functional interpretation of state

The challenge

Interpreting state in Scheme from a purely functional perspective:

We will illustrate how imperative algorithms in languages like Scheme may be trivially translated into equivalent, purely functional code. In this sense, Scheme and ML are no less "pure" than Haskell, which provides the IO monad for achieving the same goal.

A solution

Mutable state can be simulated in a functional language by using a state transformer monad. We will use a state transformer monad to provide a functional interpretation of existing imperative features of Scheme. The state in this case includes a store for modeling assignable variables.

In principle, *all* variables in Scheme can be thought of using a store model. However, since usage of the store imposes a sequential order on the execution of a program, this needlessly complicates compiler optimization and equational reasoning. A better solution is to model only *assignable* variables in the spirit of Felleisen and Hieb (The Revised Report on the Syntactic Theories of Sequential Control and State). This provides a more economical representation affecting only the imperative part and leaving the functional fragment alone.

The translation

Using the monad operations defined below, the Scheme code fragment
(define (f x) 
  (set! x 10) 
  x)

(f 4)     
may be regarded as syntactic sugar for the purely functional
(define (f x)
  (do (!x <- (bind x)) 
      (set !x 10)
      (get !x)))

(run (f 4))
which is an abbreviation for
(define (f x)
  (lambda (env label)
    (let*-values (((env*  label*  !x)    ((bind x) env label))
                  ((env** label** dummy) ((set !x 10) env* label*)))
      ((get !x) env** label**))))
     
((f 4) '() 0)
Here we can explicitly see how the store is linearly threaded through the calculation (the store is modeled as an association list and the label argument is a counter used for assigning unique locations to assignable variables).

Note that now the code is purely functional and referentially transparent. Note also that this fact alone does not necessarily make the code easier to reason about.

As another example, consider the simple program

(define (counter n)
  (lambda ()
    (set! n (+ n 1))
    n))

(let ((count (counter 1)))
  (count)
  (count))                     
This may be transformed to the purely functional program
(define (counter n)
  (do (!n <- (bind n))               
      (return (lambda ()
                (do (n <- (get !n))
                    (set !n (+ n 1))
                    (get !n))))))

(run (do (count <- (counter 1))
         (count)
         (count)))
The translation involves: In the above, the store is hidden in the monad abstraction defined by do, return and run. Also, note that we could not have written (set !n (+ (get !n) 1)) because (get !n) does not evaluate to an integer value but rather to a state transformer in the monad. This forces us to write the transformed procedure in monadic style instead of the direct style of the original fragment.

However, if we impose the convention that the evaluation order in the monadic fragment should be what call-by-value evaluation would give in the original fragment, the translation is unambiguous. For example,

(let ((count (counter 9)))
  (count)
  (add1 (count)))
becomes
(run (do (count  <- (counter 9))
         (count)
         (c <- (count))
         (return (add1 c)))) 
In this example, note that the purely functional procedure add1 is used without modification, since its arguments are not assignable. It can therefore be separately understood without any reference to the monad.

Implementation

Monadic primitives for manipulating the store are defined as follows:
; Type env = list ((label . value))
; Type label = int

; Environment transformer.  The second argument is the label
; to be used for the next assignable variable.

; Type env-transformer a = env label -> env label a  

; return : a -> env-transformer a

(define (return a) (lambda (env label) (values env label a)))

(define-syntax do
  (syntax-rules (<-)
    ((do result) result)
    ((do (a <- m) exp)
     (lambda (env label) (let-values (((env* label* a) (m env label)))
                           (exp env* label*))))
    ((do m exp)
     (lambda (env label) (let-values (((env* label* dummy) (m env label)))
                           (exp env* label*))))
    ((do exp1 exp2 ...) 
     (do exp1 (do exp2 ...)))))

; run : env-transformer -> value

(define (run m)
  (let-values (((env label res) (m '() 0)))
    res))

; bind : value -> env-transformer

(define (bind x)
  (lambda (env label)
    (values (cons (cons label x) env) 
            (+ label 1)
            label)))
  
; get : label -> env-transformer

(define (get label)
  (lambda (env label*)
    (cond ((assq label env) => (lambda (pair) (values env label* (cdr pair))))
          (else 'error-get-unbound))))

; set : label value -> env-transformer

(define (set x y)
  (lambda (env label)
    (match env
      (() (error "error-set-unbound"))
      ((pair . pairs*)  
       (cond ((eq? x (car pair)) (values (cons (cons x y) pairs*) 
                                         label 
                                         'void))
             (else (let-values (((env* label* dummy) ((set x y) pairs* label)))
                     (values (cons pair env*) 
                             label* 
                             'void))))))))
The above code does a trivial form of "garbage collection" (the run procedure discards the store and returns just the result). Implementing a more sophisticated algorithm would be nontrivial, and is not necessary for reasoning about the evaluation semantics. Note that it would be easy to add a facility for explicitly discarding references as in C.

Some examples:

(define (counter n)
  (lambda ()
    (set! n (+ n 1))
    n))
translates to
(define (counter n)
  (do (!n <- (bind n))
      (return (lambda ()
                (do (a <- (get !n))
                    (set !n (+ a 1))
                    (get !n))))))
Then
(let ((count (counter 7)))
  (count)
  (count))                     
translates to
(run (do (count <- (counter 7))
         (count)
         (count)))
                     ;===> answer : 9
                     ;     env    : ((0 . 9))
                     ;     label  : 1
Similarly
(let* ((count  (counter 7))
       (count* (counter 7)))
  (count)
  (count*))
translates to
(run (do (count  <- (counter 7))
         (count* <- (counter 7))
         (count)
         (count*)))
                     ;===> answer : 8
                     ;     env    : ((1 . 8) (0 . 8))
                     ;     label  : 2

What about mutable pairs?

Mutable pairs can be accomodated in the above framework by implementing (mutable-cons x y) in analogy with the above counter example as follows:
(define (mutable-cons x y)
  (lambda (msg) 
    (case (msg)
      ((set-car!) (lambda (x*) (set! x x*)))
      ((car)      (lambda () x))
      ...


(define (set-car! c x) ((c 'set!) x))
...
which may then be translated to the monadic representation as above.

Unfortunately Scheme does not explicitly distinguish between mutable and immutable pairs. This makes it impossible to guarantee in general that the result of a purely functional procedure using cons will not be mutated by its caller. As a consequence, the built-in cons has to be mutable by default, and reasoning about cons in the functional interpretation always involves something equivalent to the above monad.

However, an immutable version of cons, which does not involve the store for its interpretation (its arguments are not assignable parameters), can easily be defined as

(define (immutable-cons x y)
  (lambda (matcher) 
    (matcher x y)))
      
(define (immutable-car c) (c (lambda (x y) x)))
(define (immutable-cdr c) (c (lambda (x y) y)))

Dependencies

The above code relies on the following macro for pattern matching on lists:
  (define-syntax match
    (syntax-rules ()
      ((match exp 
         (()      exp1)
         ((h . t) exp2))
       (let ((lst exp))
         (cond ((null? lst) exp1)
               ((pair? lst) (let ((h (car lst))
                                  (t (cdr lst)))
                              exp2))
               (else 'match-error))))))
It also relies on the let-values macro from SRFI-11:
(define-syntax let-values
  (syntax-rules ()
    ((let-values (?binding ...) ?body0 ?body1 ...)
     (let-values "bind" (?binding ...) () (begin ?body0 ?body1 ...)))
    
    ((let-values "bind" () ?tmps ?body)
     (let ?tmps ?body))
    
    ((let-values "bind" ((?b0 ?e0) ?binding ...) ?tmps ?body)
     (let-values "mktmp" ?b0 ?e0 () (?binding ...) ?tmps ?body))
    
    ((let-values "mktmp" () ?e0 ?args ?bindings ?tmps ?body)
     (call-with-values 
       (lambda () ?e0)
       (lambda ?args
         (let-values "bind" ?bindings ?tmps ?body))))
    
    ((let-values "mktmp" (?a . ?b) ?e0 (?arg ...) ?bindings (?tmp ...) ?body)
     (let-values "mktmp" ?b ?e0 (?arg ... x) ?bindings (?tmp ... (?a x)) ?body))
    
    ((let-values "mktmp" ?a ?e0 (?arg ...) ?bindings (?tmp ...) ?body)
     (call-with-values
       (lambda () ?e0)
       (lambda (?arg ... . x)
         (let-values "bind" ?bindings (?tmp ... (?a x)) ?body))))))

(define-syntax let*-values
  (syntax-rules ()
    ((let*-values () ?body0 ?body1 ...)
     (begin ?body0 ?body1 ...))

    ((let*-values (?binding0 ?binding1 ...) ?body0 ?body1 ...)
     (let-values (?binding0)
       (let*-values (?binding1 ...) ?body0 ?body1 ...)))))