This post is part of the Lambda Calculus series.
Algorithms have been an essential part of mathematics since its earliest days. Arithmetic algorithms such as long multiplication and long division are likely to be familiar to you, and some of them date back millennia, to as early as 2500 BC. In the 3rd century BC, the Greeks developed the ‘sieve of Eratosthenes’, an algorithm for finding prime numbers. With the development of computers, which are essentially just machines that carry out algorithms, the world has been transformed.
Despite their presence throughout human history, algorithms themselves were never really studied until recently. However, with the development of computers and the increasing formalisation of maths, a rigorous discussion of algorithms became inevitable. What’s more, as with many new developments, several different mathematicians attempted this formalisation at more or less the same time. The two approaches that are most widely used are Turing Machines (developed, but not named, by Turing) and the Lambda Calculus (developed and named by Church). In this series, we will focus on the Lambda Calculus—while it appears much more detached from modern computer hardware than Turing Machines, it is, in my opinion, far more elegant and beautiful.
The Foundation of Computation
One important question that should be answered by a theory of algorithms and computation is what the fundamental operations are. In other words, what is the simplest language you need to describe any algorithm? This may seem like a purely academic question—it is likely to be beyond inconvenient to actually describe an algorithm using these fundamental operations—but it has important practical implications. When designing computer hardware and programming languages, you should, at a bare minimum, make sure they can handle these fundamental operations. If you fail to do so, there will be some algorithms that they will be unable to perform!
In the ‘What is Maths’ series, we used addition by 1 as the fundamental operation in Number Theory. When discussing algorithms, however, there isn’t such a natural-feeling fundamental operation—Church and Turing took two totally different approaches.
In Turing Machines, the algorithm and the data it’s working with are kept separate. The data is a list of 0s and 1s, and the algorithm is always focused on a single item in that list. At each step, it may update the currently focused item, shift its focus left or right, and decide the next step to perform based on whether the currently focused item is a 0 or a 1.
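To make this concrete, here is a minimal sketch of such a machine in Python. All the names here (`run_turing_machine`, the rule format, the `"end"` marker) are hypothetical simplifications for illustration, not standard definitions—in particular, a real Turing Machine tape holds only 0s and 1s, while this sketch uses an end marker for brevity.

```python
def run_turing_machine(rules, tape, state="start", pos=0, max_steps=100):
    """Run a tiny Turing machine sketch. `rules` maps (state, symbol) to
    (new_symbol, move, new_state), where move is -1 (left) or +1 (right)."""
    tape = list(tape)
    for _ in range(max_steps):
        if state == "halt":
            break
        # Read the focused item, update it, move, and pick the next state.
        new_symbol, move, state = rules[(state, tape[pos])]
        tape[pos] = new_symbol
        pos += move
    return tape

# A hypothetical machine that inverts every bit, moving right until the end marker.
invert = {
    ("start", 0): (1, +1, "start"),
    ("start", 1): (0, +1, "start"),
    ("start", "end"): ("end", 0, "halt"),
}
assert run_turing_machine(invert, [1, 0, 1, "end"]) == [0, 1, 0, "end"]
```

Note how every step only looks at the single focused item—the machine never sees the whole tape at once.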
In the Lambda Calculus, there is no fundamental distinction between algorithm and data. Any data can be interpreted as an algorithm and any algorithm can be treated as data, although it may not always make sense to do so. The fundamental operation of Lambda Calculus is substitution: locating a symbol and replacing it with some other sequence of symbols.
Despite the radical differences between these two approaches, they were proven to be equivalent—Turing himself published such a proof. The idea behind the proof is remarkably simple: provide an algorithm described in the language of the Lambda Calculus that can simulate any Turing Machine, and an algorithm described in the language of Turing Machines that can simulate the Lambda Calculus. The details of the proof, however, are anything but simple. Building on this equivalence, the seminal Church-Turing thesis argues that the Lambda Calculus and Turing Machines are a good definition of computation. While definitions cannot be proven to be true or untrue, this proposal has been, as far as I’m aware, universally accepted. Furthermore, the argument for this definition has only been strengthened in the years since—every subsequent attempt at defining computation has proven to be either equivalent to the two above or less powerful.
The Terms in Lambda Calculus
Everything in the Lambda Calculus is made up of ‘terms’ and, as mentioned above, these are used to represent both algorithms and data. There are three rules by which we can construct terms and, furthermore, if something cannot be constructed by these rules, it is not a term of Lambda Calculus.
The first rule is that ‘variables’ are terms. We will represent variables with lower-case letters from the English alphabet; for example \( a \), \( x \), and \( z \) are all variables and, therefore, also terms. It should be noted that variables have no meaning by themselves; they are given meaning by the terms in which they are used. In addition, we are allowed to use as many variables as we need—if we run out of letters, we can get more variables by adding dashes to letters, such as \( a^\prime \).
The second rule is that of application: if \( X \) and \( Y \) are terms, then so is \( (XY) \). We will later see that many terms can be thought of as functions and this rule should be thought of as allowing us to represent a function being applied to some data. In this example, \( X \) represents the function and \( Y \) the data that \( X \) is acting upon. Combining this rule with the first, we can produce a few more terms: \( (xy) \), \( ((xy)z) \), \( (x(yz)) \), and \( (xx) \) for example. To avoid the tedious writing of far too many brackets, I will often cheat a bit and leave them out. When the brackets are left out, they should be added ‘from the left’, so when I write something like \( xyz \), I really mean \( ((xy)z) \) and not \( (x(yz)) \)—the difference is important!
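The ‘from the left’ convention can be sketched in a few lines of Python. Representing terms as nested tuples is a hypothetical encoding chosen for this illustration, not standard notation.

```python
# A term as nested tuples: ("var", name) or ("app", fn, arg).
def app(*terms):
    """Combine terms 'from the left', so app(x, y, z) means ((xy)z)."""
    result = terms[0]
    for t in terms[1:]:
        result = ("app", result, t)
    return result

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
assert app(x, y, z) == ("app", ("app", x, y), z)  # xyz is ((xy)z)...
assert app(x, y, z) != ("app", x, ("app", y, z))  # ...not (x(yz))
```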
The third and final rule is that of lambda abstraction, and the namesake of the Lambda Calculus itself. It states that if \( X \) is a variable and \( Y \) is a term, then \( ( \lambda X . Y ) \) is a term. This can loosely be thought of as defining functions and we will explore this idea more in the following section. Some example terms, using all three rules are \( ( \lambda x . x ) \), \( ( \lambda x . ( \lambda y . x ) ) \), and \( ( \lambda x . (xx) ) \). I will also introduce another shorthand for this rule: I will write \( \lambda xy . x \) instead of \( ( \lambda x . ( \lambda y . x ) ) \) and include the enclosing brackets only when necessary.
Before we move on, I would like to make a subtle point about these shorthands clear: something like \( \lambda xy . xxy \) is not a true term of the Lambda Calculus, but instead represents the term \( ( \lambda x . ( \lambda y . ((xx)y))) \).
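This desugaring is entirely mechanical, and can be sketched in Python using a hypothetical tuple encoding of terms (where \( ( \lambda x . Y ) \) becomes `("lam", "x", Y)`).

```python
def lam(params, body):
    """Expand the shorthand λxy.B into nested abstractions (λx.(λy.B))."""
    for p in reversed(params):
        body = ("lam", p, body)
    return body

# λxy.xxy really means (λx.(λy.((xx)y))):
xxy = ("app", ("app", ("var", "x"), ("var", "x")), ("var", "y"))
assert lam("xy", xxy) == ("lam", "x", ("lam", "y", xxy))
```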
Substitution
The heart of computation in the Lambda Calculus is substitution. There are two kinds of substitution we can perform, which are called \( \alpha \) (alpha) and \( \beta \) (beta) substitution. I think Church must have been really fascinated by the Greek alphabet when he designed this system!
The simpler substitution rule is \( \alpha \)-substitution, which is essentially the codification of the idea that the letters we choose to represent variables are arbitrary. To perform an \( \alpha \)-substitution on a term, we select an original variable that is used in our term and a new variable that is not used in our term. We then replace every occurrence of the original variable with the new one. For example, \( \lambda x . x \to_\alpha \lambda y . y \). The \( \to_\alpha \) symbol indicates that we are applying \( \alpha \)-substitution. Another slightly more complex example is:
\[ \lambda xy . xxy \to_\alpha \lambda xb . xxb \to_\alpha \lambda ab . aab \]
For cases such as the above where several steps are needed to transform one term into another, we introduce the \( =_\alpha \) symbol and the idea of \( \alpha \)-equivalence. For example, we can write \( \lambda xy . xxy =_\alpha \lambda ab . aab \).
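Whole-term \( \alpha \)-substitution is just a recursive replacement of one letter by another. Here is a sketch in Python, using a hypothetical tuple encoding of terms (`("var", name)`, `("app", fn, arg)`, `("lam", param, body)`).

```python
def rename(term, old, new):
    """Replace every occurrence of variable `old` with `new` in a term."""
    kind = term[0]
    if kind == "var":
        return ("var", new) if term[1] == old else term
    if kind == "app":
        return ("app", rename(term[1], old, new), rename(term[2], old, new))
    # kind == "lam": rename the bound variable and recurse into the body.
    param = new if term[1] == old else term[1]
    return ("lam", param, rename(term[2], old, new))

# λxy.xxy  →α  λxb.xxb  →α  λab.aab
t = ("lam", "x", ("lam", "y",
     ("app", ("app", ("var", "x"), ("var", "x")), ("var", "y"))))
t = rename(rename(t, "y", "b"), "x", "a")
assert t == ("lam", "a", ("lam", "b",
     ("app", ("app", ("var", "a"), ("var", "a")), ("var", "b"))))
```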
Instead of applying \( \alpha \)-substitution to an entire term, we may also apply it to subterms of the form \( \lambda X . Y \) so long as the variable we swap out is \( X \). This is done to encode the idea that when a variable is bound by a lambda abstraction, it is a ‘new’ version of that variable that is separate from any uses of the same symbol outside the abstraction. For example:
\[ ( \lambda x . x ) x \to_\alpha ( \lambda y . y ) x \]
The other substitution rule, \( \beta \)-substitution, is where the magic happens. This substitution rule justifies the meanings I gave for all three of the construction rules. The rule is applied to terms of the form \( ( \lambda X . Y ) Z \); note that \( X \) must be a variable, while \( Y \) and \( Z \) are arbitrary terms. To apply this rule, we also require that the variable \( X \) is not rebound by any lambda abstraction in \( Y \), and that no variable occurring in \( Z \) is bound by a lambda abstraction in \( Y \)—otherwise that variable would be ‘captured’ and change its meaning. If either condition fails, \( \alpha \)-substitution should first be used to resolve it. The result of the application is the term \( Y \), but with every occurrence of the variable \( X \) replaced by the term \( Z \). This is probably best explained by a few examples:
\begin{align*} ( \lambda x . x ) z &\to_\beta z \\ ( \lambda x . xx ) ( \lambda y . y ) &\to_\beta ( \lambda y . y ) ( \lambda y . y ) \\ ( \lambda y . y ) ( \lambda y . y ) &\to_\beta \lambda y . y \\ ( \lambda xy . xxy ) ( \lambda z . z ) a &\to_\beta ( \lambda z . z ) ( \lambda z . z ) a \\ ( \lambda z . z ) ( \lambda z . z ) a &\to_\beta ( \lambda z . z ) a \end{align*}
When using \( \beta \)-substitution, we must be careful about the shorthands we introduced earlier. We have that \( ( \lambda z . z ) a \to_\beta a \), but it is not the case that \( ( \lambda xy . xxy ) ( \lambda z . z ) a \to_\beta ( \lambda xy . xxy ) a \).
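A single \( \beta \)-step can likewise be sketched in Python, again with a hypothetical tuple encoding of terms. The sketch assumes the side conditions above already hold (no rebinding or capture), so no \( \alpha \)-substitution is needed first.

```python
def substitute(term, var, value):
    """Replace every occurrence of `var` in `term` with the term `value`
    (assumes `var` is not rebound and no capture occurs, per the side conditions)."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == var else term
    if kind == "app":
        return ("app", substitute(term[1], var, value),
                       substitute(term[2], var, value))
    return ("lam", term[1], substitute(term[2], var, value))

def beta(term):
    """One β-step on a redex ((λX.Y) Z): replace X with Z throughout Y."""
    assert term[0] == "app" and term[1][0] == "lam"
    _, (_, param, body), arg = term
    return substitute(body, param, arg)

# (λx.x) z  →β  z
identity = ("lam", "x", ("var", "x"))
assert beta(("app", identity, ("var", "z"))) == ("var", "z")

# (λx.xx)(λy.y)  →β  (λy.y)(λy.y)
id_y = ("lam", "y", ("var", "y"))
self_app = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
assert beta(("app", self_app, id_y)) == ("app", id_y, id_y)
```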
Finally, we introduce the \( =_\beta \) symbol, similarly to \( \alpha \)-equivalence, though with a few subtle differences. Firstly, \( \alpha \)-substitutions are allowed as steps in the sequence of substitutions. Secondly, \( \beta \)-substitution has a clear directionality, which \( \alpha \)-substitution does not; when establishing \( \beta \)-equivalence, we may apply \( \beta \)-substitution in either direction.
I imagine that, at this point, this may all seem like pointless symbol shuffling. However, in the next part, we’ll explore how we can represent data in this system and start to develop useful functions.