Library FingerTree.DependentFingerTreeModule

We shall now define the Finger Trees over some monoid and measure. We will see in section how particular instantiations of these will give different structures.

Ltac program_simpl ::= program_simplify ; auto with ×.

Module DependentFingerTree (M : Monoid).
  Import M.

  Module Digits := DigitMeasure M.
  Import Digits.

  Hint Unfold digit_measure option_digit_measure option_measure.


Nodes

Finger Trees are implemented using 2-3 nodes as in 2-3 trees, with the addition of a cached value that stores the combined measure of the subobjects.

  Section Nodes.
    Variables (A : Type) (measure : A v).
    Notation " 'lparr' x 'rparr' " := (measure x).

Intuitively, the measure represents an abstraction of the sequence of objects in the node (later, in the tree) considered from left to right. Taking the list monoid previously defined and the singleton measure gives the most concrete abstraction of this sequence of elements: the sequence itself. The measure function is denoted by in the following, when possible (notations local to a function are not supported). We now define 2-3 nodes with cached measures:

    Inductive node : Type :=
    | Node2 : x y, { s : v | s = lparr x rparr cdot lparr y rparr } node
    | Node3 : x y z, { s : v | s = lparr x rparr cdot lparr y rparr cdot lparr z rparr } node.

We use a subset type here to specify the invariant on the cached value. This invariant could not be checked using simple types, because it is dependent on the values carried by the node. Moreover, it uses the measure as a so it would require an additional abstraction mechanism like type classes or functors to achieve it in a regular ML-based functional language. A regular dependent product is used here instead.
We have smart constructors node2 and node3 which compute the measure, e.g:

    Program Definition node2 (x y : A) : node :=
      @Node2 x y (lparr x rparr cdot lparr y rparr).

The generated obligations are trivially true as they are of the form . Correspondingly, node_measure pulls the cached measure.

    Program Definition node_measure (n : node) : v :=
      match n with Node2 _ _ ss | Node3 _ _ _ ss end.
  End Nodes.

Although it may seem that the node_measure function is independent of the measure function, it is not. Witness its type after we closed the section: The node datatype itself is parameterized by the measure function, hence all operations on nodes take it as an implicit parameter. Had we not added the invariant, we would not need this parameter, but we could not prove much about node measures either, because they might be any element of type v. We could define an inductive invariant on nodes asserting that their measures were built using the measure function coherently, but we would need to show that this invariant is preserved in each of the functions, while in our case we get it for free by typing.


The fingertree datatype

Before presenting the definition of fingertree in Coq, we introduce the original Haskell datatype and justify why the direct translation would be unsatisfactory. In Haskell, the FingerTree algebraic datatype is defined as:
The Empty constructor represents the empty tree, the Single constructor builds a singleton tree from an element of a and the Deep constructor is the branching node. It builds a FingerTree v a from a cached measure of type v, two digits of a and a middle FingerTree v (Node v a) of of a with measures of type v. This is where the nesting occurs; it is best described by a picture (figure ).
At the first level we have a Deep node with a 2-digit of a at the left and a 3-digit at the right. The middle tree is a FingerTree v (Node v a) consisting of a Deep node with an Empty middle tree. It has a 2-digit of Node v a at its left and a 1-digit at its right. The circled nodes represent elements of type a and the shaded ones indicate loci where cached measures are stored
We could directly define the Finger Tree data structure in Coq by translating the Haskell definition. However doing so would cause a rather subtle problem: the cached measures in two Finger Trees could be computed by two measure functions but these trees would still have the same type because the FingerTree datatype is parameterized only by the type of the computed measures and not by the measure function. Again, we could live with it and have a predicate formalizing the fact that the measures appearing in a FingerTree were built coherently with some measure function, but preservation of this invariant would have to be proved for each new definition on Finger Trees. We prefer to view the measure function as an additional parameter of the Finger Tree datatype. In this case coherence is ensured by the type system because two Finger Trees will have different types if they are built from different measure functions. Hence the following definition:
A fingertree inductive is parameterized by a type A and a measure function measure : A v on this type. Each fingertree object is also indexed by its corresponding measure:
  • Empty builds the empty tree of measure ;
  • Single x builds the tree with sole element x of measure ;
  • Deep pr ms m sf builds the tree of prefix pr, subtree m of measure ms and suffix sf.
Its measure is given by combining the measures of its subtrees. The argument ms will be implicit when constructing Deep nodes, as it can be infered from the type of m. We place this argument ms just before the middle Finger Tree contrary to the original datatype in which the cached measure is the first field and stores the measure of the tree whereas for us the latter is given by the index.
We present the definition using inference rules for enhanced readability, omitting A and measure in the premises:
This inductive family is indexed by values of type v. Why we need a dependent type follows from a simple observation. If we want the cached measure on the Deep node and the invariant that the measure is really the one of the middle tree, we must have a way to refer to the measure of the middle tree, but we are actually defining trees, so we cannot recurse on them at this time. Note also that we need recursion to go into the middle subtrees of Deep nodes.

Adding elements

We can add an element a to the left of a tree t of measure s to get a tree with measure . Due to polymorphic recursion, our functions will now always have A and measure arguments as they are arguments which will change during recursive calls. If t is empty, a singleton or a tree with a left prefix which is not full, then we simply put a at the leftmost position of the tree. Otherwise, we need to reorganize the tree to make space at the left for a, and this requires recursing polymorphically to add a node a to the left of the middle subtree in the Deep node.

    Program Fixpoint add_left (A : Type) (measure : A v)
      (a : A) (s : v) (t : fingertree measure s) {struct t} :
      fingertree measure (measure a cdot s) :=
      match t in fingertree _ s
      return @fingertree A measure (measure a cdot s) with
        | EmptySingle a
        | Single bDeep (One a) Empty (One b)
        | Deep pr st' t' sf
          match pr with
            | Four b c d e
              let sub := add_left (node3 measure c d e) t' in
                Deep (Two a b) sub sf
            | xDeep (add_digit_left a pr) t' sf
          end
      end.

The first match expression uses dependent elimination. Its meaning is that from a particular fingertree of measure s each branch will build a fingertree of measure measure a cdot s where s has been substituted by the measure of the matched pattern. For example in the first branch we must build an object of type fingertree measure (measure a cdot epsilon). However, the right hand side Single a has type fingertree measure (measure a), hence we must use the inductive equivalence rule , which generates an obligation , easily solved using the right identity of the monoid. The in and return clauses are mandatory in Coq due to undecidability of higher-order unification, but they can be dropped in Russell, in which case substitution is replaced by equational reasoning. Had we dropped the clauses, we would have the equation s = epsilon in the context of the first branch hence the generated obligation . This one would be solved by first substituting s by in the goal then applying the right identity ; we just postponed the substitution.
The nested match expression uses an alias x for capturing prefixes which are not Four and applies the partial function add_digit_left defined earlier. There is an application of the subset equivalence rule here, which generates an obligation to show that pr is not full. It can be solved because we have in the context.
It is an essential property of this function that the measure is preserved. To see that, let us instantiate the Finger Tree by the list monoid and measure defined earlier. You can check here that adding to the left is prepending the measure of the element to the measure of the Finger Tree, hence consing to the sequence of elements of the tree with the list monoid interpretation. For each of the following operations, this correspondence with the list instanciation of the measure will hold.

The view from the left... of a fingertree

We will now construct views on the Finger Trees which decompose a tree into its leftmost element and a remaining Finger Tree (View_L) (or dually, the rightmost one and the "preceding" Finger Tree). It serves to abstract from the implementation and give a list-like view of fingertree, which is used to write functions which recurse on fingertrees without having to deal with the intricacies of the structure (c.f. figure ). The View_L inductive is a little less polymorphic, as it does not need carry the measure function which views ignore. However View_L still stores the measure s of the rest of the tree in the cons_L constructor (s will be implicit), hence we need to abstract by the sequence's type seq which is indexed by v. It will be instanciated by fingertree measure.

    Inductive View_L (A : Type) (seq : v Type) : Type :=
    | nil_L : View_L A seq
    | cons_L : A s, seq s View_L A seq.


Such a view will be constructed by the view_L function, by structural (polymorphic) recursion on the fingertree. We can seamlessly use the digit_tail function which is partial (it accepts only digits which are not One) and need not add any type annotations when calling view_L recursively on t'. Note that we use a partial type application (fingertree measure) in the return type, which is perfectly legal.

    Program Fixpoint view_L A (measure : A v)
      (s : v) (t : fingertree measure s) {struct t} :
      View_L A (fingertree measure) :=
      match t with
        | Emptynil_L
        | Single xcons_L x Empty
        | Deep pr st' t' sf
          match pr with
            | One x
              match view_L t' with
                | nil_Lcons_L x (digit_to_tree _ sf)
                | cons_L a st' t'
                  cons_L x (Deep (node_to_digit a) t' sf)
              end
            | ycons_L (digit_head y) (Deep (digit_tail y) t' sf)
          end
      end.


We can show that view_L preserves the measure of the input tree; had we indexed View_L by the measure of the input tree, these generation lemmas would have appeared as obligations.

    Lemma view_L_nil : A (measure : A v) s
      (t : fingertree measure s), view_L t = nil_L s = epsilon.

    Lemma view_L_cons : A (measure : A v) s
      (t : fingertree measure s) x st' t',
      view_L t = cons_L x (s:=st') t' s = measure x cdot st'.

Dependence hell

Our views are useful to give an high-level interfaces to Finger Trees, but in their current state they are very limited as we can write only non-recursive definitions on views. That is, we will not be able to convince Coq that functions defined by recursion on the view of a tree are just as valid as those defined by recursion on the tree itself. To do that, we must have a measure on our Finger Trees, e.g. their number of elements (named tree_size), from which we can trivially build a measure on the views View_L_size. Then it is only a matter of showing that for all t, view_L t returns a view of size tree_size t to prove that a recursive call on the tail of a view is correct (c.f. ).
However, doing such a thing is not as easy as it looks because view_L manipulates objects of dependent types and reasoning about them is somewhat tricky. An essential difficulty is that the usual Leibniz equality is not large enough to compare objects in dependent types as they may be comparable but in different types. A simple example is that the proposition vnil = vcons x n v is not well-typed because vnil is of type vector 0 and vcons x n v of type vector (S n) which are not convertible.
In our case, we want to say that an arbitrary tree t of measure s with view nil_L must be the Empty tree, but those two trees do not have the same type. We apply the usual trick of heterogeneous equality : prove they must be in the same type. The inductive JMeq defines an heterogeneous equality (previously denoted by ) in Coq. It is used to compare objects which are not of the same type. Its sole constructor is JMeq_refl of type . The point of this admittedly strange notion of equality is to postpone the moment at which we need to prove that the types of the compared objects are equal. When we arrive at this point, we can apply the JMeq_eq axiom of type to get back a regular equality between the objects.
In the first lemma, we compare t of measure s with Empty of measure ; clearly, replacing JMeq by regular equality would yield a type error here.

    Lemma view_L_nil_JMeq_Empty : A (measure : A v) s
      (t : fingertree measure s), view_L t = nil_L
      JMeq t (Empty (measure:=measure)).

Now that we have shown the equality for a general s index, we can instantiate it with a particular one, i.e. . Given that is now of measure , we can use the regular equality with Empty.

    Lemma view_L_nil_Empty : A (measure : A v)
      (t : fingertree measure epsilon), view_L t = nil_L t = Empty.
    Proof.
      intros ; apply JMeq_eq.
      apply (view_L_nil_JMeq_Empty _ H).
    Qed.

These technical lemmas about nil_L and Empty will help us build our measure. Indeed, they are needed to prove the following:

      Lemma view_L_size : A (measure : A v) (s : v)
      (t : fingertree measure s), View_L_size (view_L t) = tree_size t.
      Proof.
        intros.
        unfold View_L_size, tree_size ; apply view_L_size_gen.
      Qed.

This gives us a decreasing measure on view_L results. We will use it when building the instances.

      Lemma view_L_size_measure : A (measure : A v) (s : v)
        (t : fingertree measure s) x st' (t' : fingertree measure st'),
        view_L t = cons_L x t' tree_size t' < tree_size t.

We can also define deep_L, the smart constructor which rearranges a tree given a potentially empty digit for the prefix. It is a dependent version of the internal function for the Deep case of view_L, which is used when splitting Finger Trees. The code has not changed at all from the non-dependent version hence we do not show it. However, the specification would greatly benefit from an overloading mechanism to factor the different measure specializations.

Back to normal

We are now ready to implement regular deque viewing operations on our Finger Trees. We do not need recursion on the trees here, so we can define the operations for a fixed element type, measure function and measure index.

  Section View.
    Variables (A : Type) (measure : A v) (s : v).

We define an isEmpty predicate for some operations will be partial, i.e. defined only for non-empty fingertree objects. We do not match directly on the object to maintain the abstraction from the implementation.

    Definition isEmpty (t : fingertree measure s) :=
      match view_L t with nil_LTrue | _False end.

Emptiness is decidable.

    Definition isEmpty_dec (t : fingertree measure s) :
      { isEmpty t } + { ¬ isEmpty t }.

The obvious operations are definable, we show the liat operation dual to tail. Here we return the index along with the fingertree in a dependent pair of type {s : v & fingertree measure s}, which corresponds more closely to the view that the cached measure is really data, not only an index used to refine a datatype. The pairing of a tree t with its measure m is denoted m :| t and reads "m measures t".

    Program Definition liat (t : fingertree measure s |
      ¬ isEmpty t) : { s' : v & fingertree measure s' } :=
      match view_R t with
        | nil_R! | cons_R st' t' lastst' :| t'
      end.
  End View.

Catenation & splitting, dependently

We can also define catenation using a dependent type which clearly states the function's specification. The implementation is the same as the one of Hinze & Paterson, except we proved the 100 obligations generated by concerning measures. The five mutually recursive functions hidden here which define app have the particularity of being quite big because they implement a specialization of concatenation for each combination of the digits at the right of the left tree and the left of the right tree. Hinze & Paterson even recommend to produce the 200 lines long definition automatically to prevent mistakes. Here the definition is checked directly: the type itself expresses that we did not mess up the obvious property of app on measures.

  Definition app : (A : Type) (measure : A v)
    (xs : v) (x : fingertree measure xs)
    (ys : v) (y : fingertree measure ys),
    fingertree measure (xs cdot ys).
  Notation " x +:+ y " := (app x y) (at level 20).

The last and most interesting operation from the specifier's point of view, is splitting a tree by a predicate on its measure. We begin by splitting a node.

    Section Nodes.
      Variables (A : Type) (measure : A v).
      Variables (p : v bool) (i : v).

We split a node n by finding where some predicate p on measures turns to true, starting with an initial measure i and accumulating the measures of subtrees from left to right. We simply copy the invariant given by here, there is nothing to change, we only add the corresponding property on the measure which clearly generalizes the to_list equation. The code is also a direct translation from Haskell to Russell except we use a regular tuple instead of a custom split datatype. We have also defined a split_digit function with the same specification.
    End Nodes.

The interesting case is for trees, as usual. Instead of returning a tuple, we use a dependent tree_split inductive which directly captures the invariant that a split is a decomposition of a fingertree preserving measures. We also put the invariants on the left and right trees, which can be used by clients to prove properties about their code. This is one of the most distinctive parts of this work: we derive not only reusable code but also reusable proofs using rich types. Indeed, split_tree can be seen as a proof combinator, turning a property on a measure and monoid to a property on words on this monoid.

    Section Trees.
      Variable p : v bool.

      Inductive tree_split (A : Type) (measure : A v)
        (i : v) : v Type :=
        mkTreeSplit : (xsm : v) (xs : fingertree measure xsm |
          isEmpty xs p (i cdot xsm) = false)
        (x : A) (ysm : v) (ys : fingertree measure ysm |
          isEmpty ys p (i cdot xsm cdot measure x) = true),
        tree_split measure i (xsm cdot measure x cdot ysm).

This inductive combines both subsets and dependent inductive types, but we can still write our code as usual. Approximately 100 lines of proof are necessary to discharge the generated obligations.
    End Trees.

This concludes our implementation of (dependent) Finger Trees with . To sum up, we have proven that:
  • All functions are total, once properly annotated with their preconditions.
  • All functions respect the measure semantics and use them coherently.
  • All functions respect the invariants given in the paper by Hinze & Paterson.