In our journey of formalising everything that we know, let’s reinspect something that we’ve taken for granted: Set sizes. Thus far we’ve sort of been eyeballing it for finitely sized sets. E.g. so we say that . So how do we say ? We’d probably say with . Then if , then we know .

But here, there are many things that actually have been left unformalised, and also unsolved. For example:

  1. Okay let’s say we accept that we can compare two natural numbers for free (which actually can be formalised, but story for another day).
  2. How do we rigorously say that the size of a set ? You might say “I can just list them out and count.” but sometimes that’s not enough for mathematicians.
  3. Even more worryingly, we’ve only talked about finite sets. What about infinite sets? You could count until the cows come home and never be done.
  4. Is there only one infinity? We all know that all behave a little different. seems to only be infinite in one way, seems to be infinite in two ways, seems to be so infinite, you can take two items and find infinitely many items between them (where “between” is based on the relation). seems to be so infinite it even contains infinitely many more items that does not. But are their “sizes” different?

So let’s talk about how to formalise these things and study not just counting the sizes of finite sets, but the sizes of infinite sets, and see how infinity behaves.

Question

Isn’t this a little too abstract? Why do I need to know this in computer science?

I’m glad you asked! So later on when we understand diagonalisation and the different infinities better, I’m going to show you how we can formally prove that there exists problems out there that computers cannot solve.

In fact, as newcomers (or not) to programming, you might wonder: Can my compiler/runtime environment ever look at my code and tell me it will loop forever, and then I can go fix it. The answer is nope. To be clear, it might be able to spot certain patterns and warn you, but there isn’t a perfect algorithm that has 100% accuracy on this. We’ll see this at the end of the notes.

Disclaimer

The following notes are not meant to be a formal foray into set theory and the cardinals. So we will be glossing over a lot of details (for convenience) and taking a lot of very hardcore theorems for granted. So to be clear we’re restricting ourselves to certain statements we want to make. And the focus of these notes is to just give intuition on higher level ideas about stuff between sets.

Let’s begin!

Our starting point is picking up where we let off with functions. So recall:

Pigeonhole Principle and Why Use Functions to Count

So our starting point is to start with how to “formalise” comparing set sizes. We will use functions. Recall:

A function is:

  1. Injective, if and only if,
  2. Surjective, if and only if,
  3. Bijective, if and only if, is both injective and surjective.

Important Disclaimer

Full disclaimer: I had to debate with myself for over weeks what exactly I should show you in these notes. I either have a natural and usable sequence that shows you how this all naturally fits in together but run the risk of you using theorems out of syllabus (which would not be accepted) or I stick with the very limited set of theorems that they’ve given you in CS1231S. Then our hands are tied behind our back and the exposition makes no sense. So for this section (the one titled Pigeonhole Principle and Why Use Functions to Count) I will do the following: I will mark the theorems and statements that are out of syllabus. My rationale for mentioning it is that they’re insanely intuitive anyway, and I want to appeal to your sense of intuition on how to navigate this tricky topic. The tradeoff is: You are not allowed to take the out-of-syllabus theorems and statements for use in CS1231S. If you quote Schroder-Bernstein etc, we have tutors that will understand you, but there’s no guarantee it will be accepted. But it’s not in the slides.

Let’s say I had two sets, and I wanted to say one was “at most the size” of the other. How should I do that? You see if we had finite sizes we can kind of count it. If there was only one kind of infinite size, we could then also just have happily declared “ok all infinite sizes are the same”, and finite sizes can be compared. Except.. the world doesn’t work that way. There are many infinite sizes, and also at the same time, when the world of mathematics started being formal around the mid 1800s, the community was finding ways to construct numbers from sets. It was easy to believe that sets are an intuitive concept we’re happy to accept exist (as a philosophical choice) and it was actually quite easy to construct numbers from sets. On the other hand, it was very hard to construct sets from assuming numbers exist. This sort of led us to actually build the foundation of mathematics based on the existence of sets. It’s very trippy that we can assume sets exist and then say “therefore numbers exist”. But it’s true!

So put yourself in the shoes of an 1850’s mathematician. Right now, you’re not allowed to assume numbers exist, and you have to construct them if need be. How should you argue that ?

Aside

Notice here I am saying that the here cannot actually yet mean a number. We must take two sets and then surround them by , and then this thing can sandwich a symbol if and only if there exists an injection .

So at least in the current formalisation: We can only compare set sizes, and we can only argue set cardinalities relatively between and . We cannot say for some .

The most natural solution (hah get it? Natural) was to just give a function that was injective. What’s the idea? You know that functions are actually just sets. Okay so they exist. Moreover, there’s a philosophical interpretation going on here:

Since for every element in , I can find a unique “item” for it in , this means that the number of items in are at most the number of items in . Otherwise, my function would have had to make two distinct elements of share the same item in .

Running with that same idea, philosophically speaking, can’t we also say the same if we had a surjection? Why indeed yes we can.

So if you gave a function that is surjective instead, you can also conclude .

Since for every element in , I can find at least one for it in (could be more than one), this means that the number of items in are at least the number of items in . Otherwise, my function should sort of fail to cover at least one element in .

To be clear, to show the statement , you can either give an injection or a surjection . Either is good.

Now lastly, to argue two sets have the same size, we can give a bijection to argue that . The philosophical interpretation works the same.

Out of Syllabus

But wait. If I gave two injections , and can’t we also say ? Because says that .

Yes. Yes you can.

…So you’re telling me, if I have , and both injective functions, then there exists a bijection ?

Yes! This is called the Cantor-Schroder-Bernstein theorem.

The below theorem is out of syllabus.

Theorem

(Cantor-Schoder-Bernstein) If there exists injective functions , between sets . Then there exists bijective function .

We’ve also not talked about how set cardinalities form a total order. That is to say: any two set cardinalities can be compared. In fact (for your intuition but it’s out of syllabus to use this):

Theorem

Let be two sets. Then either:

  1. ; or
  2. ; or
  3. .

And as a shorthand, we can write to mean . That is to say, there is an injection from to be, but no bijection. You can do something similarly for .

Using functions to talk about set cardinalities

So now hopefully you get why we’re starting with functions.

Finite Sets

So the first thing we should do is talk about finiteness. Now we can be (slightly more) formal. From this point on, we will use as a shorthand (this is not standard notation) to denote the set . Also, is defined to be . 1

Then, we will say a set is finite if:

There exists an and a bijection function (or equivalently, a bijection ).

Intuitively, a finite set either has 0 elements, or it has elements for some . The function here is basically just telling you how to show it has elements by literally saying: “This is element 1… this is element 2…” and so on.

Infinite Sets

Countably Infinite Sets

Okay let’s move onto infinite sets because here’s where things get very interesting. And we’ll talk about the “smallest” infinity: Countable infinity. So cardinality gets a bit weirder when infinite sets are a thing. For example, . In fact, let . We can also say . That seems weird, clearly there are more natural numbers than odd numbers. And perhaps you can say that. That’s why we say: the cardinalities are the same. Cardinalities are our way of talking about “size” between sets.

Technique

Let’s begin with the fundamental techinque of proving statements like:

To prove , in CS1231S, you need to give a bijection .

Two things to note about the additional methods (in fact, in CS1231S, they’ve mentioned this holds for finite sets): When there exists a surjection , you’re proving . Does this mean you’re proving that there exists an injection from ? Yes!

Just because there is an injection , and surjection does not mean is a surjection, and it does not mean is a surjection. What it means is that if you prove exist, then there exists a bijection .

Summary

To summarise, the following are equivalent (in CS12321S):

  1. such that is injective.
  2. such that is surjective.

Also, these follow are equivalent (in CS12321S):

  1. such that is bijective.
  2. and , such that is injective, and is surjective. (Out of syllabus)
  3. and , such that are both injective. (Out of syllabus)
  4. (Out of syllabus)

Lastly, these are equivalent (in CS1231S):

  1. Exists an injective function
  2. Exists a surjective function

Technique

Another technique given in CS1231S that was not made entirely explicit is the following: To show , one can give a surjective function .

They’ve stated in in the form of sequences, but is basically giving you your sequence here.

Why is this simpler? Because now instead of giving a bijection, you only have to give a surjection.

For the remainder of this subsection, let’s demonstrate some of these to show some of the equalities.

And for fun, we’ll prove some of the equivalences. The proofs will be quite long but they’re collapsible so hopefully the page isn’t too large. The important point for this portion is you getting the intuition behind countably infinite. To do this I will show you how a bunch of commonly known sets are countably infinite.

Techinque

This hopefully frees you up to also use these sets “interchangeably” when showing some other set is countably infinite. The other takeaway I want you to get is notice how some of these constructions are done, because the ideas might be re-usable in other ways. So, don’t read this for the sake of “gaining knowledge” or “knowing what is true”. I need you to be able to do is observe how some of these functions are made. Then you’ll have to port some of these ideas for your own later on.

First of all, let’s ask something a little simple: What happens if I remove from ? (I do get ) Can we show that removing from still something that is the same cardinality?

Theorem

Okay, how about if we had almost twice as many elements as ? For example, what happens if we added all the negative numbers too?

Theorem

Okay, how about instead of being infinite in 1 dimension, we were also infinite in another dimension?

Theorem

This is one of the more useful statements. We will use what is called Cantor’s Pairing Function Here’s a proof:

Proof

Here’s a picture that sort of illustrates the what the below function is doing:

Let’s think about it this way, we’re making triangles. How do I make a -tall, and -wide triangle? What’s the size? It’s actually the triangle number . Try it!

One more thing, given , we can actually see that it falls on the diagonal. We are going to refer to the . So the diagonal contains all pairs such that .

Combine the two ideas we had, and you can see that basically on the diagonal starts with the triangle number. Furthermore, the diagonal actually has elements on its diagonal. So what numbers are on the diagonal?

Any number of the form , for any (there are in total such numbers).

Okay we are in place to begin the proof. Here’s the intuition. Given , we map it to the diagonal (where , and its offset into the diagonal is . Consider function .

We first show that is surjective. The idea is the following: We first find the largest triangle number that is at most , then use that to figure out which diagonal sits on, then figure out what ‘s offset is on that diagonal.

  1. Let , arbitrarily chosen.
  2. Let be the largest natural number such that .
  3. We first claim that .
  4. Suppose not. Then .
  5. Then we obtain a contradiction because now , and .
  6. Now consider . Since , .
  7. Also since , .
  8. Now let . Since , .
  9. Now both , since .
  10. Now
  11. Therefore, .

Remark: To be clear, in case the argument looks circular, the important idea is the following: when we are given , we need to figure out which diagonal it’s on. To do this, we needed to find the largest triangle number that is less than or equals to . Let that be the triangle number. Then we know was due to some such that is . Then, to figure out , we know is exactly away from . So . We then had to check that and were natural numbers. I.e. , and I skipped a few steps on closure. E.g. is a natural number. But that’s the idea.

Next, we show that is injective. To do this, we again use the fact that we can compare two outputs using which diagonal they are sitting on. If it’s different, they we know they’re already different. If they’re the same, then their offsets into the diagonal must be different. To do this, we will actually prove the contrapositive statement.

  1. Let . Further assume that .
  2. Either , or .
  3. Case 1:
  4. Letting :
  5. Assume that .
    1. Then .
    2. Then and . Contradiction against line 1.
  6. Therefore
  7. Therefore,
  8. is injective (Contrapositivity of line 4).

We will save vs for a little later on.

Within Countable Infinity

It turns out we have a name for the “size” or cardinality of . We call it (pronounced “aleph null”). This is the first transfinite cardinal number. Think of cardinal numbers as “numbers” that represent the size of sets. So we write .

Definition

We will say a set is countably infinite if has a bijection to .

Furthermore, we will say a set is countable if either:

  1. is finite; Or
  2. is countably infinite.

It’s important to note the difference between countable and countably infinite.

That said, there is a different formulation that I’ve previously mentioned. Basically:

Remark

A set is countable if there exists a surjective function .

The CS1231S treatment is via sequences. But such a function that is surjective basically provides a sequence that satisfies their lemma 9.2.

There are a few useful facts (provable theorems, in fact) that you should know. Let’s list some out now:

  1. Let be disjoint, countable sets. Then is countable.
  2. Let where is countably infinite. Then is countable.
  3. If are countably infinite sets, then .
  4. If are countable sets. Then is countable. Also, is countable.
  5. Let be a sequence of countable sets. Define . Then is countable.

We’re going to try proving these statements. What I want you to think of it is more of us developing the tooling to argue about even more theorems. Also, I think some of these statements might be useful as lemmas. Also it would be good if you at least think about how to prove these statements first before comparing it against my proof to see if there’s any subtlety you’ve left out.

Before we do this, it’s going to be helpful to be able to union two sets. Now normally I would just show you how to do it via a bijection (just in case). There’s many variations of these, we’ll work up to them slowly because we can bootstrap our previous results to build bigger and bigger theorems.

Mini Tooling:

Let’s begin!

We’re going to union two countably sets, and show that they’re countable. There are many variants, and because we’re insisting on bijections (rather than surjections), the proofs all look a little different.

Okay, so we’ve done the simplest case when is countably infinite, is finite, and they are disjoint. Let’s try something a few more involved ones. From now on I’ll give you the functions, and you need to prove it yourself that they’re bijections.

Let be disjoint sets. Both are countably infinite. Then is countably infinite.

  1. Since is countably infinite, then there exists a bijection .
  2. Since is countably infinite, then there exists a bijection .
  3. Consider function :

Claim: is a bijection.

Okay we’ve dealt with the two cases but with the nice assumption that they’re disjoint. What about if they’re not disjoint? Then things get a bit more annoying. But remember we’re free now to use these 2 above statements as lemmas to help us prove the non-disjoint cases. Again I’ll give you guys a proof idea, and try to finish the proof by giving a bijection .

It’s going to be helpful to have this lemma:

Lemma

  1. Let be countably infinite. Then any subset of is countable.
  2. Let be finite, then any subset of is finite.
  3. Let be countably infinite, and be finite. Then is countably infinite.

What if we changed the above statement to either or being finite?

To summarise, I think knowing these facts hopefully give you an intuition on what is countable. For example, is pretty much just “two copies” of (but only one ). So it’s almost “two natural sets”. In fact, instead of using a bijection, let’s try using a surjection to show that. And maybe you’ll see how surjections are a lot easier to use. So let’s prove this:

is countably infinite.

We’ll basically be giving a surjection from . Why? Because we can re-use the previous parts to argue that has a bijection from . So we can compose those functions to get which is surjective but not injective.

Again, we’ll just give the surjection.

  1. Consider function .
  2. Define

To prove it’s surjective intuitively, every value is output at least once. Any positive output is mapped to by , and is mapped to by . But why is it not injective?

Now the surjection is not enough. That only proves it’s countable. But I mean we all know that is infinite so we’ll just say it’s countably infinite. (We could prove it but please don’t make me prove it. Please.)

Using “countably infinite” and bijection to interchangeably.

So to be clear, you’re going to see it quite often that when we say a set is countably infinite, then the following are synonymous:

That’s why you’ll see us sometimes use bijection when convenient, and other times using bijection instead.

But why? Is there a reason for this? Is and special?

Technically not, if you know there are two sets both countably infinite, then of course you now have and . They’re both bijections, so now you can do consider . We know is a bijection also, and the composition of two bijections forms a bijection so now is a bijection between the two of them. After all they’re both countably infinite, this means they have the same cardinality, so you should be able to find a bijection between them.

The “nice” thing about bijections and is that they tell you how to list out the elements in some kind of sequence. To be very clear: that sequence will and must list out all elements of .

The idea is that if you fail to do so, then the set is uncountably infinite.

Moving out of Countably Infinite

Okay, is there more than one “infinite size”? After all, countably infinite implies existence of uncountably infinite amirite?

So what’s the best candidate we have for an uncountably infinite set? The set of reals . After all, think about it: Can we construct a bijection from to ? We’re going to show that if one exists, we can create a contradiction.

Now to be clear, if we show that there is no bijection, we are only showing . Intuitively, we also want to show that . To do this, we can just give some injective function . But that’s easy! . Et Voila. Let’s move onto showing there’s no bijection.

Cantor’s Diagonalisation Argument

We’ll actually use , but we know that .

Aside

I don’t know if they told you this but you’re about to see diagonalisation in a bit. And this is one of the most powerful techniques in math and CS combined. Some of the biggest and most famous theorems about logic, the naturals vs the reals, about the nature of computation, about computational complexity theory, all use diagonalisation.

I’ll show you how to do it for the reals, then also show you how to do it for computation. But that one will be extra handwavy.

Let be the set or just .

We will show the two following claims:

Claim 1: There exists an injective function . Therefore Claim 2: There exists no bijection function . Therefore Claim 3: There exists an injective function . Therefore

Using all 3 claims, we get that:

(We actually haven’t shown on set cardinalities are transitive. Perhaps that’s a good exercise. We need to show it’s transitive to show that claims 1-3 imply .)

Claims 1 and 3 are actually somewhat straightforward. We’ll give the functions here, and it’ll be a mini-exercise to prove they are injective, and that their co-domains are correct.

We need the fact that every has a “decimal expansion”.

Lemma: For all such that , there exists a function such that, letting , then Or equivalently: For , let be the function for . In other words, given a real value , is what we use to list out the decimal points.

To prove the above lemma is pretty annoying unless you have a background in Real Analysis. So we’ll just take this as fact (I’m sure we can all believe it). Here’s the strategy:

  1. We want to assume that a bijection exists. Remember this means will list out all elements in somehow (surjection, injection).
  2. We will use to create another element so that , but is somehow not a possible output of .
  3. That’s a contradiction because was supposed to be bijective, and therefore surjective. But line 2 shows that is not surjective.

Remember, this means that on a positive integer gives us a value from set . Furthermore, any value in can be written in decimal representation using function . In particular, the gives us the decimal place of .

So how do we create the value ? We’ll build it decimal place by decimal place. The decimal place of , will be done based on decimal place of the output of . Yes, that is not a typo: We take the number, and look at its decimal place. Then what do we do? If that decimal place happens to be , we set the decimal place of to be . If that decimal place happens to be anything but , we’ll set the decimal place of to be . Why does this work?

Let’s ask ourselves: is definitely a number in the interval so it has to be output by , since it was assumed to be bijective (and therefore surjective).

Okay, so here comes the kicker: What is the decimal place of ? Remember, to set it, we said we looked at the output of and we purposely set the decimal place of so that it would be different from it.

But wait! That means (because they disagree on the decimal point). Since we let be arbitrarily chosen, that means . Oh no! That means is not surjective. But how can that be! It was supposed to be bijective (and therefore surjective).

Here’s the idea in proof form.

Proof by contradiction.

  1. Assume for a contradiction that there exists a bijection .

  2. For all , , therefore exists.

  3. Now define a new function in the following way:

    Thus, let .

  4. Now notice, by definition of :

    1. Let , arbitrarily chosen.
    2. .
    3. Since and differ on the decimal place, .
    4. Therefore, by universal generalisation .
  5. By line , this means is not in the range of .

  6. But note that and . Since only has decimal places that are either or .

  7. Therefore . But this means that is not a surjection on .

  8. Line 7 contradicts line 1. Therefore there does not exist bijections from .

This proof is basically an informal version of Cantor’s Diagonalisation Argument.

I’ll drawn some diagrams that might help explain the idea:

Diagrammatically:

The idea is that any number between 0 and 1 and can be written out in the following way:

Center

Where all of the happens to be a number between 0 and 9. So in particular:

So what happens if we took every number on the diagonal and collected them. Then did a bunch of changing? Like so:

Center

As long as we make all the values a value between 1 and 8 (inclusive) then

is a value greater than , and strictly less 1. The bigger question is whether can be an output of . Well we claim the answer is no. Why? Look at the output of . That happens to be . Then look at the decimal place of , that happens to be . Now we made the decimal place of (which is ) different from . So that means cannot be equals to (since they differ on the decimal point). So the above formal proof is basically just carrying that idea out.

But to be clear, is infinite, and it’s uncountable. So therefore, it is uncountably infinite.

Another generic way to create uncountable sets

There’s another way you can do this:

Theorem

Let be countably infinite. Then is uncountably infinite.

So the idea is to show two things: is infinite, we can do that with a simple function , where . This function is injective, and is infinite, so .

Now we need to show there is no bijection. So here’s what I’m going to do. I’ll draw the diagonalisation diagram. But you need to write out the proof formally. Okay here goes:

For you to write your proof, what should you assume for the sake of contradiction? How should you create set ? What two properties do you want about to derive a contradiction?

Once you are done with this, since is infinite, has no bijection to a countable set, therefore is uncountably infinite.

Aside

So have we created the next cardinality after ?

Well.. that’s complicated. We know that the reals indeed has a cardinality larger than the naturals. It turns out we actually cannot prove whether or not . Like not that humans are not smart enough. No, more like the proof system we all use is unable to. At least not with the current axioms we like to take for granted.

Tl;dr: The next cardinality after happens to be , but we cannot prove whether has cardinality or not.

Working with Uncountable Infinity

Now working with uncountable infinity is a lot more wacky. We’ll try to build our way up to at least prove the following theorem at the end of this section:

Theorem

Let be an uncountably infinite set and be a finite set. Explicitly give a bijection .

In the meantime, let’s prove some useful theorems about uncountable sets.

Theorem

Let be uncountably infinite, and let be finite. Then is uncountably infinite.

Here’s the idea, if is not uncountably infinite, then must be countable. In both cases, we show that it is pretty absurd.

You can try doing something similar when is countably infinite. Give is a try and see what changes compared to the current proof.

Here’s another idea, if , and is uncountably infinite. Then is uncountably infinite. So there are a few ways to prove this. But here’s the idea, in CS1231S I think they’ve bottomed-out the formalism and end at just saying “Look a set is either finite, countably infinite, or uncountably infinite.” If you can believe that countably infinite happens to be the smallest “infinite” size, and you also can believe that if and only if , then this means that any infinitely sized set is at least as large as . That is to say, you now have an injection . You can take this injection, and ask: What is the range of ? Not the co-domain, that happens to be . The range is exactly the set of elements that maps to. This set is countable! In fact, is precisely a bijection from to its output set — the range of .

Now also, the range of happens to be a subset of . So, if we know that is infinitely sized, then we know that there must exist a countable subset of . This just somewhat follows from our intuition that is the smallest possible infinitely sized set.

Can we actually prove it? So what I gave you was a sketch. I used the fact that:

  1. is the smallest possible infinitely sized set.
  2. Therefore, , since is infinitely sized.
  3. Therefore, exists injective function .
  4. Then consider the range of , i.e. .
  5. is also a bijection .
  6. Also, , since .
  7. Therefore is a countably infinite subset of .

We could be more formal than this, but not in CS1231S.

Also we can also say the following:

Theorem

If there exists sets , such that and is uncountably infinite, then is uncountably infinite.

This follows from the fact that if is countable, then any subset of is either finite, or uncountably infinite.

Theorem

Let be a finite set. Then if is a set such that , then is also finite. Let be countably infinite. Then if is a set such that , then is countable.

Okay! Let’s try proving the theorem we started the subsection with. Restating it here:

Theorem

Let be an uncountably infinite set and be a finite set. Explicitly give a bijection .

Proof

  1. Let be sets such that is finite, and is uncountably infinite.
  2. Now consider .
  3. Since , and is finite, is also finite.
  4. Furthermore, since is infinite, exists that is a countably infinite subset of .
  5. Since is countably infinite, we have a bijection .
  6. Since finite, and countably infinite, then is countably infinite. (We proved this earlier on a few sections ago.)
  7. So this means that there exists a bijection function .
  8. Composing , we get a bijection .
  9. Furthermore, note that and are disjoint. (Since ).
  10. So now define our bijection

So what’s the idea? What is it trying to do? Here’s a picture:

Here’s the idea: We take set and remove the elements already in . In my picture let’s say that means elements remain. Also, we take set which I stress: is countably infinite from set . Then we re-use the idea that we can union a countably infinite set with a finite one to get a countably infinite set.

Now we need to “slot it back it”. In some sense, we’ve shifted the over to the right by positions in the diagram. We’ve made holes in the original set that can now be mapped to .

So to create the function as a bijection, the idea is: On input , you ask yourself: is ? Because if it is, we can use the function to ask what was the original value. Otherwise, , in which case you should just forward the value along. It wasn’t involved in being shifted around.

How is this relevant to computer science?

So this proof idea actually extends to computer programs and program analysis. (Also further beyond that). I have two theorems to offer you, both saying roughly the same thing. Also I will only make informal statements because to formalise these concepts, you’ll need a full semester so the notions here are only pseudo-formal but should convey the key idea nonetheless.

Claim 1: There exist unsolve-able problems by computer programs.

Claim 2: Let be the following problem: Let be the set of computer programs, let be the set of program inputs. Let be a computer program, let be a program input. Then , if program on input will terminate. Otherwise, outputs .

Let’s begin with the first one. We first need a simple formalisation of a computer program. To simplify things, let’s consider problems as functions . So is a problem, because it takes a program input, and specifies for each input whether you need to say yes, or no. Here’s an example:

Example

Is Sorted Problem: Let be the set of all lists of numbers. Then if and only if the list is sorted. outputs otherwise.

Bear in mind, is not a program, it’s a problem. That means to solve the sorted problem, you need to give a computer program , maybe in Source, or in Javascript or Python, such that for all inputs , . Why are they different? Because you can write many programs that solve the same problem. After all, I’m sure you’ve seen at least your classmates write different code to solve the same problem. Here’s another example:

Example

Is Palindromic Problem: Let be the set of all strings. Then if and only if is a palindromic string.

Okay, so here’s a few observations we need to make in our formalisation. We think of our input as finite length objects. Our input lists or strings should be finite in length. And furthermore, there are countably infinitely many of such inputs. This means that our is basically a countably infinite set.

I already hear the 1% of you protesting: “But what about infinite lists as our input or uncountably infinitely sized input sets ?“. Well do you want your program to take forever to just read the inputs? Also uncountably large is technically not an issue, but let’s table that for another day.

Okay, now you know what a problem is. I’ll tell you up front, if is countably infinite, then as a set is also countably infinite. I won’t prove this here because it’s beside the point.

Let be the set of all problems . Yes we can formalise set using first order logic, but I’ll skip that, we’re doing big picture right now.

Now, what is the cardinality of ? I’m actually going to show you that . How do we do this? Here’s the trick: We will construct an injection . And we know this interval itself has cardinality strictly larger than any countably infinite set. Thanks Dr. Georg Cantor.

The overall idea is the following:

  1. Take .
  2. Write out in binary expansion as , where , for all .
  3. Then when is , we map it to , and is , we map it to .
  4. Then this sequence of and defines a problem .
  5. The last thing to do is to map the sequence back into .

Pseudo-formally: We want to create a mapping where is injective.

  1. Since is a countably infinite set, there exists a bijective function .
  2. Let . Then can be written as , where , for all .
    1. Alternatively, one can say there is a corresponding function for which .
  3. Create a mapping , where:
    1. if ;
    2. if
    3. Note: is a bijection.
  4. Since , exists a bijective function .
  5. We now move to create a problem . This means has to be a function from .
    1. On input , note that since is a function, maps to a unique value .
    2. Since , , are functions, then:
    3. is a well-defined function.
    4. Define . Note, the choice of depends on . The choice of and are fixed beforehand. Furthermore, and are bijections.
  6. Then is a function .
  7. Thus, for an arbitrary , we have constructed a function . Thus by universal generalisation, , . In other words, on any input , there exists an output .
  8. Furthermore, let be arbitrarily chosen such that was created (using lines 1-5) using , and was created using . And further assume .
  9. Then, there exists such that . (They differ when given input .)
  10. Since and , and is a bijection:
  11. imply that .
  12. And because is a bijection, .
  13. Therefore and differ in their binary representation at location .
  14. Therefore .
  15. By contrapositivity: .
  16. Therefore , where is created from and is created from
  17. Therefore define to be a mapping where on input , it outputs by using lines 5.1 through 5.4.
  18. By lines 7, and 15, is a well-defined function since on every input, it has an output. Furthermore, on inputs , .

We’re not done! We need to argue that is injective.

Pseudo-proof:

  1. Let . Assume that .
  2. Let , let .
  3. Since , .
  4. Now, .
  5. Since bijective, this means: .
  6. Assume for the sake of contradiction that . Then for which .
    1. Since is surjective, exists for which . Now: .
    2. Contradiction, since .
  7. Therefore, . [Negation of assumption on line 6.]

This means that is an injection from . So now we say “the cardinality of is less than or equals to the cardinality of , the set of all computer problems”.

Ok what about the cardinality of the set of all computer problems? What is ? I’m going to appeal to your intuition here. Computer programs have to be finite in length. Regardless of whether you’re writing Source, or Python, or Javascript. The program description must always be finite in length. And for the sake of simplicity, let’s say the program code can only take on 256 possible characters. 2 So a code is a finite sequence of , effectively. As a shorthand, is basically a length sequence where each element in the sequence is one of possible values. Now . By the way, is countably infinite. So, is countably infinite.

So that is to say:

So what does this show? This shows there are no possible bijections between the set of computer programs , and the set of computer problems . What does no bijection mean? We know that since , there is an injection from to but no surjection (or else a bijection would exist). Intuitively, this means there are way more problems than there are solutions.

You might think: That’s it? We only know they exist? Can we at least solve everything that we care about? Well…

The Halting Problem

Consider the following problem:

Example

Let be a computer program, and denote the code of computer program . For simplicity, we will think of computer programs as functions that take any string as input, and outputs , if it terminates.

The halting problem, when set of inputs is the set of all computer program code and input pairs . I.e. . On input , decide if terminates.

Now you might think. Oh this is easy. Just take , run the program , then when it outputs something, just say , because it halted. But what if it doesn’t halt? How would we know?

The answer is that we actually can’t solve the halting problem in general.

Theorem

The halting problem is undecidable/uncomputable.

The idea is really similar to Cantor’s Diagonalisation Argument, we will assume for the sake of contradiction there is a computer program that solves this problem, we will make use of the fact that computer programs have a bijection to and therefore it is a fact that every program can be enumerated in a sequence. And then assuming the existence of then we will construct a computer program that is not enumerated in this sequence. This gives us a contraction, leading us to conclude that must not be able to exist.

Again, we will just use this very useful lemma:

Lemma

The set of all computer programs has a bijection to . The set of all possible computer inputs has a bijection to .

Intuitively, this is true, because every program code can be written as a finite sequence of characters from a finite set of characters (like ASCII). You can use the previous theorems and lemmas earlier on in the notes to basically argue that if we let be the set of programs of length , is countably infinite, because each is actually finite (It’s a finite length from finite set of characters).

Okay so again we will want to diagonalise. We want to show that if the algorithm exists, we can use it to create a new program such that is somehow not listed in the bijection .

Proof by contradiction

  1. Assume that exists a program that solves the halting problem.
  2. This means always terminates, and always outputs a correct answer when given input and , it says if terminates, and otherwise outputs .
  3. Create a new program in the following way:
    1. On input , since we know is a bijection, (handwaving a bit) we can find the number for which . This step is non-trivial but it can be done.
    2. Now, must output some program .
    3. We will now use our assumed-to-exist halting decider to see whether halts on input or not.
    4. If says , then we define program to loop forever and never terminate.
    5. Otherwise, says , in which case we define program to just immediately terminate.

We now move to show that cannot be in the output of despite being a valid program.

  1. Let’s consider the program and the input . And we want to compare ‘s behaviour against . Notice here that if were to ever terminate, then will loop forever. On the other hand, if were to not terminate, would immediately terminate.
  2. So cannot be the program. But since was arbitrarily chosen, this means cannot be the program, for all . So . But is a valid program, we just used if-else, function-call to , and looping forever can be done. And yet is clearly not surjective now.
  3. So where’s the contradiction? We know for a fact that must exist (it can be proven). So it was the function-call to that was problematic. i.e. Our initial assumption in this proof.

Thus, cannot exist. We cannot have a program/algorithm that looks at computer programs and see if they halt or not.

Here it is in picture form:


Footnotes

  1. In this part we’re basically assume exists already. There are axioms that we can use where does not exist, and we can construct it.

  2. Yes I know unicode is a thing. I’m too lazy to mention how many codepoints there are etc.