2022/05/10 – 2022/07/20
If you have a little background in classical constructions of the real numbers, you might have noticed that all of the usual constructions start with the rational numbers as a base, and then build up to real numbers. For example, Cauchy real numbers are constructed as sequences of rational numbers satisfying the Cauchy condition on the usual rational metric. Dedekind real numbers are constructed as sets of rational numbers satisfying the conditions to make them Dedekind cuts with the usual ordering.
This is a post on constructing real numbers without constructing rational numbers. Along the way the rationals will sort of be constructed, or at least heavily implied, but they donʼt directly figure into the definition! Instead all we need is functions from integers to integers.
The construction weʼll be talking about is my favorite esoteric construction of real numbers: the Eudoxus real numbers.
I want to explore why this definition works, not merely recite its definition and properties. So letʼs go on a journey together through various properties of linear and almost linear functions, and learn how to leverage “wrong” definitions of slope to create the real numbers!
If you donʼt have any background in real analysis or constructions of the real numbers, thatʼs okay. Bring along your intuitions nonetheless, they will aid you greatly! We will go over the necessary details as we get to them, although you may miss some of the depth of the discussion when I reference those other things. Just know that thereʼs a lot of strange and marvelous things going on that make constructing the real numbers an essential part of mathematics. Are these marvelous properties of the real numbers all coincidences? Surely not 😉.
Like all constructions of fundamental objects, we need to begin with our existing intuition about what real numbers are in order to construct them out of more basic building blocks. Just like the integers are built from natural numbers, and rational numbers are built from integers, real numbers need to be constructed from natural numbers, integers, and/or rationals. In the case of real numbers, we will need two ingredients: approximations, and a lot of them – infinitely many, in fact.
The key property of approximations is that they need to be easier to grasp than the thing we are approximating, otherwise it doesnʼt buy us any leverage! In our circumstance in particular, each individual approximation should contain a finite amount of data, and then we stitch together an infinite number of approximations to represent the complicated object we are constructing: a real number.
A convenient way to say that something has a finite amount of data is to say it is countable. In programmer terms, we could say we want it to be serializable into a string – itʼs equivalent. Why? Well each natural number or string has a finite amount of data, it is easy to understand or describe all at once. So if some set we are studying is countable/serializable, meaning it can be understood as if it was a set of natural numbers, then it can serve as an approximation for our purposes.
Thereʼs much more to say about this kind of process, indeed I hope to write a whole nother blog post about it, but for now know that there are two main questions to ask:

- When does a candidate actually work as an approximation of some real number?
- When do two such approximations agree, representing the same real number?
For example, letʼs think about approximation generators, functions that generate approximations by taking in an error bound and returning a rational number that lies within that error bound of the real number we are trying to approximate. (Rational numbers are countable, so they qualify as a finite approximation for us!) Because we can ask for smaller and smaller error bounds, we in fact get an infinite number of approximations that probe closer and closer to the real number. This is good: when approximating an irrational number, no single rational approximation will suffice to capture it! But not all functions will be well-behaved approximation generators, and even of those that are, there will be many ways of approximating the same number. Thatʼs why we ask those two questions: when do the approximations work, and when do they agree with each other.
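As a concrete sketch of what an approximation generator could look like, here is a little Python function (the name `approx_sqrt2` and the choice of the target number are my own illustration, not part of any formal construction):

```python
import math
from fractions import Fraction

def approx_sqrt2(eps: Fraction) -> Fraction:
    """Return a rational number within eps of the real number sqrt(2)."""
    # Exact integer arithmetic: isqrt(2*n^2)/n is within 1/n of sqrt(2),
    # so choosing n >= 1/eps guarantees the requested error bound.
    n = math.ceil(1 / eps)
    return Fraction(math.isqrt(2 * n * n), n)

# Asking for a smaller error bound probes closer to sqrt(2):
q = approx_sqrt2(Fraction(1, 10**6))
```

Each output is a finite, serializable piece of data (a fraction), yet the family of outputs over all error bounds pins down the irrational number exactly.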
Answering these two questions will let us invent real numbers. However, the process to get there wonʼt quite be linear: the notion of approximation will not be as clear as it was in the example above!
Letʼs dive in!
The idea of Eudoxus real numbers is that they represent a real number indirectly, via a functionʼs slope. We will cover why this works later, but first letʼs agree on what slope is so we can proceed with the construction and then analyze its properties that make it all work.
Letʼs say we have a function that takes in numbers and outputs numbers. (We donʼt need to be formal about what kind of numbers yet.)
What definitely has a slope? Well our favorite function from school mathematics does: $f(x) = ax + b$. Weʼve memorized that this has slope $a$!
Why do we say it has slope $a$? What is slope? Well, weʼve also memorized that slope is “Rise Over Run”. If we take two points at $x$ and $y$, the distance we run from the first to the second point is $y - x$, and the distance we had to rise to keep up with the function is $f(y) - f(x)$. Slope as “rise over run” is therefore
$$\frac{f(y) - f(x)}{y - x}.$$
The beautiful thing about $f(x) = ax + b$ is that functions of this form have constant slope, no matter what $x$ and $y$ are!
$$\frac{f(y) - f(x)}{y - x} = \frac{(ay + b) - (ax + b)}{y - x} = \frac{a(y - x)}{y - x} = a$$
The inputs $x$ and $y$ disappear, meaning it traces out a line with a constant slope – itʼs why we call it a linear function. So weʼre pretty happy saying that this function has slope $a$. Tada! 🎉
This is where we have to ask what kind of numbers we are using, because that determines what $a$ can be. If the function takes in real numbers and outputs real numbers, $a$ can surely be any real number too. But tough luck – we canʼt use that to construct real numbers, no circular definitions allowed!
Maybe $a$ can be any rational number. Thatʼs fine – slope works in the same way. But we run into another roadblock: the function still only has one well-defined slope, and itʼs stuck being a rational number. Thereʼs no magic of infinity happening.
What if we say that $f$ is an integer function? Here we have to be careful: integer division isnʼt necessarily well-defined, but if we know $a$ is an integer, then it happens to work out in this case: the denominator $y - x$ will always evenly divide the numerator $a(y - x)$, and out pops $a$. This seems like an even worse situation, getting stuck with integers! But wait …
We need a little wiggle room. Having a constant slope is too restrictive! Can we loosen up and find something that is still like slope, but only approximately?
Having hit these roadblocks, we need some inspiration. Letʼs revisit the idea of integer functions but in a new way.
Hereʼs the thing: consider a graphing calculator, or a graphing app on a computer. What happens when we ask it to graph $y = \sqrt{2}\,x$? (Pick your favorite irrational-but-computable real number there.)
It does some math, and shows us a line on a screen … but wait, that line isnʼt a perfect mathematical function from real numbers to real numbers, itʼs a pixelated entity: itʼs an integer function that approximates the real function. (Okay, thereʼs some slight details here, like antialiasing and the fact it could draw several pixels stacked vertically. But for the sake of simplification letʼs assume that it only renders one pixel in each column, making it an honest-to-goodness integer function.)
What happens as we keep scrolling the screen? We could keep sampling snapshots of the screen at each moment, and see how they vary.
If we had picked an integer slope, every snapshot would appear to have the same slope. For a slope of $0$, there is never any change. For a slope of $1$, it steadily moves up pixel by pixel. For a slope of $2$, it skips a pixel as it goes. Et cetera. This isnʼt so interesting, yet!
But if we had picked a rational slope, we start to see something interesting happen: it doesnʼt always step by the same amount; sometimes it jumps more than it does other times.
For instance, a slope of $\frac{1}{3}$ would only jump up a single pixel every $3$ pixels. A slope of $\frac{4}{3}$ would jump up $2$ pixels one time, then jump up a single pixel twice more, making visible groups of three.
More complicated fractions will look more complicated and repeat over longer distances. Can you figure out what the third line graphed in this figure is? (Hint: how long does it take to repeat?)
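We can simulate these pixel jumps ourselves. Here is a small Python sketch (my own illustration, using floor to pick the single pixel in each column) showing the repeating jump patterns for two example slopes:

```python
# Pixel column n gets the pixel floor(q * n); the deltas are the visible jumps.
def deltas(num, den, count):
    f = lambda n: (num * n) // den  # floor of (num/den)*n in exact integer arithmetic
    return [f(n + 1) - f(n) for n in range(count)]

jumps_third = deltas(1, 3, 9)        # slope 1/3: a one-pixel jump every 3 columns
jumps_four_thirds = deltas(4, 3, 9)  # slope 4/3: a 2-pixel jump, then two 1-pixel jumps
```

The pattern for a rational slope repeats with period equal to the denominator, which is exactly the recurrence the snapshots exhibit.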
However, this same phenomenon occurs for irrational slopes too: how do we tell the difference now??
The key is not how the jumps repeat in individual snapshots, but how they repeat across the whole domain of the function.
Do the snapshots recur in a nice pattern? If so, that must be graphing a line with rational slope! They will recur when the window has shifted by the denominator of the slope.
If the snapshots recur irregularly, without fitting a nice pattern, it is graphing an irrational number.
(Since there are a finite number of snapshots that will be generated for a particular line, they must recur – thatʼs not the point. The point is whether they recur on a fixed schedule.)
This is where we cross over from computation to mathematics: We canʼt hope to decide whether this is the case from a finite number of snapshots! Instead we leave it to the realm of mathematical proof to definitively establish whether a slope is rational or irrational. (We might not know!)
Okay, we are ready to take the knowledge we learned from graphing calculators and translate it back into mathematics.
We agreed that linear functions of all types have slope, but only linear functions using real numbers contained enough data to represent real numbers. However, we saw that calculators approximate real functions with integer functions – this is our hope of salvation, if we can connect the two ideas.
What, then, will tell us whether some integer function approximates a real linear function?
We said that integer-function approximations to lines with rational slopes had a key property: they recur in a consistent pattern!
If we call the period of recurrence $p$, this property is that $f(x + p) = f(x) + k$ for all $x$, for some fixed $k$.
What is this mysterious $k$ though? Itʼs the rise for the run $p$. Weʼre saying that the function repeats every time we shift by $p$ steps of the $x$-axis, but it has shifted by $k$ in the $y$-axis in the meantime.
Thatʼs right, weʼve circled back to good olʼ slope, but in a new way: instead of saying that we want the function to be linear by having rise over run being constant for all $x$ and $y$ we choose, we just want it to be constant when the run is a known period $p$.
This is much less restrictive: in particular, it lets us get rational slopes back out of integer functions! If we define the slope to be $\frac{k}{p}$, we see that for integer approximations it matches the slope of the original rational linear function, which we said would recur with period of the denominator, and of course it must travel vertically by the numerator in that time.
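In code, the recurrence and the recovered slope look like this (again with a floor-based representative of my own choosing, here for slope $k/p = 4/3$):

```python
p, k = 3, 4                 # period p, rise k; the slope should be k/p = 4/3
f = lambda n: (4 * n) // 3  # integer approximation of the line y = (4/3) x

# f recurs: shifting the input by the period p shifts the output by exactly k
recurs = all(f(x + p) == f(x) + k for x in range(-100, 100))
```

Note that `//` is floor division in Python, so the recurrence holds on negative inputs too.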
With how they wave around, these two functions donʼt look linear anymore, but they do follow a nice recurrence. Their shapes keep repeating every $p$ pixels, while they climb or fall by $\frac{p}{2}$ each time. That is, they have slope $\frac{1}{2}$ and $-\frac{1}{2}$ in our new system, even though they are not linear.
We hit a roadblock with irrational slopes, though. If we donʼt have a period to work with, how can we get our hands on a slope from an integer function??
The answer is … [drumroll] … we donʼt!!!!
Huh?!?! 🤨😧😱
What. Donʼt give me that look. 😄😌😇
Look here: we said linear functions were boooring. Linear functions $f(x) = ax + b$ were just their slope $a$! pluuus a second term $b$ that merely shifts it vertically, as if that matters. Yawn 😴😪.
However, now weʼre going to use this correspondence as leverage to do something truly exciting: represent slopes as the functions weʼre given!
We donʼt need an independent notion of slope if we can isolate the properties that arenʼt slope in order to distill the functions down into their slopiest essence. 🍾✨
As we just said, shifting by a constant amount vertically will not impact slope. However, we can do infinitely better than this.
Recall that weʼre looking at integer functions through their whole domain, not just at a single slice anymore.
Say the calculator made an error and calculated a pixel incorrectly. (The horror! Quick – letʼs blame it on cosmic radiation.)
Should this single error affect the slope? Nope! Itʼll quickly get washed away in the infinite other data that informs us what the slope is.
Should another error affect the slope? Nahhh. Should any finite amount of errors affect the slope? Not even!
Thus we can say that shifting vertically doesnʼt influence the slope and any finite number of errors wonʼt influence it either.
Putting these together, we can wrap them up as a single property, and with a little extra sprinkle on top: any two functions that differ by a bounded amount represent the same slope! (This accounts for the two above properties as well as infinite differences that are not merely a shift but are still bounded. In fact, this allows us to get rid of horizontal shifts, as we will see later.)
Finally we get to identify what slope actually is. Took us long enough!
Hereʼs the key property: we want to ask that the quantity $f(x + y) - f(x) - f(y)$ is bounded for the integer functions $f$ we are considering.
The nLab calls this property “almost linear”. Letʼs also call $f(x + y) - f(x) - f(y)$ the “wiggle” of the function between $x$ and $y$.
As a first check, letʼs see that this happens for actually linear functions $f(x) = ax + b$:
$$f(x + y) - f(x) - f(y) = (a(x + y) + b) - (ax + b) - (ay + b) = -b$$
Whoa, itʼs not only bounded, itʼs actually constant! And the constant $-b$ is just (minus) the vertical shift we said we didnʼt care about, interesting.
What about for integer approximations of linear functions?
First we can look at the linearly-periodic approximations of rational linear functions: saying the period is $p$, we mean that $f(x + p) = f(x) + k$ for all $x$. So as one example, if we pick $y = p$, then for any $x$, the quantity weʼre looking at is just
$$f(x + p) - f(x) - f(p) = k - f(p).$$
Constant, thus bounded. (Note how nicely this matches up with the perfectly linear case, since $k - f(p) = ap - (ap + b) = -b$ then!)
We can play this game again: $f(x + 2p) - f(x) - f(2p) = 2k - f(2p)$ is constant as well.
But what about other values of $y$ that donʼt fit into the period? The key is that the wiggle is not just linearly-periodic, but actually periodic in both $x$ and $y$ now, having the same period of course:
$$f(x + y + p) - f(x) - f(y + p) = (f(x + y) + k) - f(x) - (f(y) + k) = f(x + y) - f(x) - f(y)$$
The $k$ cancels out from the two terms that changed.
The upshot is that we donʼt need to worry about all values of $x$ and $y$, just one period worth of them, from $0$ to $p - 1$. Explicitly, the wiggle of the function is bounded by the quantity
$$\max_{0 \le x, y < p} |f(x + y) - f(x) - f(y)|.$$
We didnʼt identify a property for abstractly looking at approximations of linear functions with irrational slopes, like we did with approximations of rational slopes. However, if we look at a typical approximation function, we know that, with whatever integer rounding it is using, its integer value will be within $1$ of the actual real value at any given point. So when you look at the three terms $f(x + y)$, $f(x)$, and $f(y)$, it is clear that the wiggle will be within $3$ of the value of the wiggle of the underlying linear function, which we know is constantly $-b$. So the wiggle is clearly bounded.
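We can check this numerically with a quick Python sketch (the choice of $\sqrt{2}$ as the slope and of floor as the rounding are my own): the wiggle of $\lfloor \sqrt{2}\,x \rfloor$ never leaves a tiny range.

```python
import math

# integer approximation of the irrational-slope line y = sqrt(2) x
f = lambda n: math.floor(math.sqrt(2) * n)

# the "wiggle" f(x+y) - f(x) - f(y), sampled over many pairs of inputs
wiggles = {f(x + y) - f(x) - f(y) for x in range(200) for y in range(200)}
```

For a floor-based approximation the wiggle can in fact only ever be $0$ or $1$, since $\lfloor u + v \rfloor - \lfloor u \rfloor - \lfloor v \rfloor \in \{0, 1\}$ for any reals $u, v$.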
Having satisfied our curiosity that we have identified the necessary ingredients, letʼs finally define the actual mathematical object and the operations on it.
The Eudoxus real numbers are almost linear functions but where two functions are considered the same when they have a bounded difference. (In technical terms, this form of definition is known as a subquotient in type theory/category theory: it is a quotient (= set of equivalence classes) of a subobject (= subset).)
We can add slopes in the obvious way, by adding the functions together (pointwise!): $(f + g)(x) = f(x) + g(x)$. And we can negate pointwise, $(-f)(x) = -f(x)$, and the zero for this is of course the constantly zero function: $0(x) = 0$. For example, we can add a positive slope to a negative slope to get their sum:
Huh, that middle line doesnʼt look like the graph of a line with the summed slope! Itʼs very flakey, like it almost canʼt decide whether it wants to go up or down. Still, it looks like it does have the right trend overall, and we will prove this later: addition works out just fine, within the bounded wiggle we expect!
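To see that pointwise addition really does land in the equivalence class of the summed slope, here is a numerical sketch (Python; the slopes $\frac{1}{2}$ and $-\frac{1}{3}$ and the floor-based representatives are my own illustrative choices):

```python
import math
from fractions import Fraction

def rep(q):
    # a representative almost-linear function for the rational slope q
    return lambda n: math.floor(q * n)

f, g = rep(Fraction(1, 2)), rep(Fraction(-1, 3))
h = rep(Fraction(1, 2) - Fraction(1, 3))  # direct representative of slope 1/6

# f + g and h differ only by a bounded amount: they are the same Eudoxus real
diffs = [f(n) + g(n) - h(n) for n in range(-300, 300)]
```

The sum is flakey pointwise, but its difference from the honest slope-$\frac16$ representative stays bounded, which is all the equivalence relation asks for.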
Now we get to my favorite part: how to multiply slopes.
Multiplying the functions pointwise is the wrong move: that would produce something like quadratic functions, or worse.
That is a sad looking parabola indeed ☹️.
Hmm … oh wait! What about composing functions?
If you compose two functions, you multiply their slopes! So we have $\operatorname{slope}(f \circ g) = \operatorname{slope}(f) \cdot \operatorname{slope}(g)$. This suggests that the identity function acts as the identity for this multiplication, $\operatorname{id}(x) = x$ (of course it has slope $1$!).
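A quick sanity check that composition multiplies slopes, in the same floor-based Python sketch ($\sqrt{2}$ and $\sqrt{3}$ are my own example slopes):

```python
import math

f = lambda n: math.floor(math.sqrt(2) * n)  # slope ~ sqrt(2)
g = lambda n: math.floor(math.sqrt(3) * n)  # slope ~ sqrt(3)
h = lambda n: math.floor(math.sqrt(6) * n)  # slope ~ sqrt(6) = sqrt(2)*sqrt(3)

# f∘g stays within a bounded distance of the direct slope-sqrt(6) representative
diffs = [f(g(n)) - h(n) for n in range(-500, 500)]
```

The composite is not equal to the direct representative pointwise, but it is the same Eudoxus real: their difference is bounded.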
The goal is to come up with an ordering on almost-linear functions that is consistent with the arithmetic and the equivalence relation. So we can take a shortcut: we only need to define what it means to be positive, then $f < g$ is defined by the difference $g - f$ being positive.
Before we ask what it means to be positive or negative, what does it mean to be zero? We said that zero was the constant function $0$ – but wait, itʼs actually the equivalence class of the constant function, which consists of all bounded functions (since that means they have bounded difference from zero).
Positive and negative, then, are going to be unbounded functions. How do we distinguish them?
The basic idea is that positive will grow unboundedly positive to the right and negative will grow unboundedly negative to the right. In formal terms, we say that $f$ is positive if, upon picking some arbitrary height $M$, it is eventually always above $M$ as it goes towards the right: i.e. there is some $N$ such that for all $x > N$, $f(x) > M$.
There are some properties to verify.
First off, why is it sufficient to just consider the behavior of the function to the right (i.e. for $x > 0$)? We would need to know that growing unboundedly positive to the right is the same as growing unboundedly negative to the left. This conforms with our intuition of how almost linear functions have slope, but it requires a proof.
The other properties of how the ordering interacts with arithmetic are straightforward, however: for example, the sum of two positive functions is positive, and so is their composition.
The million dollar question: does it really have a slope?
To determine this, weʼll basically use Cauchy real numbers. Recall that those are equivalence classes of sequences of rational numbers. Can we construct a Cauchy real number from our definition of real numbers, to capture the slope of our almost-linear functions?
Recall our definition of slope as rise over run. We should be able to pick two arbitrary points and compute their slope, right? It will be a rational number, since weʼre dealing with integer functions. And then presumably as those points get further and further away from each other, the approximation will get better and better.
We might as well pick $0$ as our basepoint and move out to positive numbers, like so:
$$\lim_{n \to \infty} \frac{f(n) - f(0)}{n - 0}$$
Except notice that $f(0)$ is a constant in the numerator, and since the denominator grows to $\infty$, some basic properties of limits tell us that it is irrelevant, so we can instead use this limit:
$$\lim_{n \to \infty} \frac{f(n)}{n}$$
In fact this realization is the key step that informs us that the limit is well-defined on equivalence classes: any bounded difference between two functions will not affect this limit. So we could also assume that $f(0) = 0$ without loss of generality, since that will be in the same equivalence class.
Now we just need to show that it satisfies the Cauchy condition, assuming $f$ is almost linear: for every $\epsilon > 0$, there is some $N$ such that $\left|\frac{f(m)}{m} - \frac{f(n)}{n}\right| < \epsilon$ for all $m, n \ge N$.
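Numerically, the convergence of $f(n)/n$ to the slope is easy to watch (a Python sketch, again with my floor-of-$\sqrt{2}\,n$ representative):

```python
import math

f = lambda n: math.floor(math.sqrt(2) * n)

# f(n)/n is within 1/n of sqrt(2), since floor drops less than 1
errors = [abs(f(n) / n - math.sqrt(2)) for n in (10, 1000, 100000)]
```

The error shrinks like $1/n$, which is exactly the behavior the Cauchy condition formalizes.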
We will go through what this means including a proof later, since it requires more machinery.
But weʼve half-answered our question already: we have done what we can to isolate slope from the other data the functions carry, and it just remains to confirm that we can in fact define slope from it.
Thereʼs another obvious property that we need slopes to be invariant under:
Weʼve covered vertical shifting, but what about horizontal shifting? Do bounded (vertical) differences suffice to cover bounded (horizontal) differences? Obviously not for arbitrary functions, but hopefully for the almost-linear functions we are covering?
It turns out yes, being almost-linear is exactly what it means that horizontal shifts of the function require just bounded vertical shifts to compensate: we need $f(x + d) - f(x)$ bounded (for fixed $d$), but being almost linear means that $f(x + d) - f(x) - f(d)$ is bounded, which differs just by the constant $f(d)$.
For example, these two functions with slope $\frac{1}{2}$ are shifted by $1$ relative to each other, so their difference forms a pretty checkerboard-like pattern, which is clearly bounded, only taking the values $0$ and $1$ as the functions grow in close proportion.
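The checkerboard is easy to reproduce (Python; slope $\frac12$ and a shift of $1$ are my choices to mirror the figure):

```python
f = lambda n: n // 2        # slope 1/2
g = lambda n: (n + 1) // 2  # the same function, shifted horizontally by 1

# the difference alternates 1, 0, 1, 0, ...: bounded, as promised
diffs = [g(n) - f(n) for n in range(-10, 10)]
```

So a horizontal shift costs only a bounded vertical correction, and the two functions name the same Eudoxus real.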
Thereʼs one property we have not mentioned yet: monotonicity.
All of the linear approximation functions we would come up with are monotonic: either for all $x \le y$, $f(x) \le f(y)$ (weakly monotonically increasing), or for all $x \le y$, $f(x) \ge f(y)$ (weakly monotonically decreasing). But weʼve never mentioned this. Why not include it as a property we care about having or respecting?
The first clue is what we saw above with addition: when we added two monotonic representative functions (one of positive slope and one of negative slope), the result wasnʼt monotonic anymore, it went up and down and up and down, although in a nicely repeating pattern. So our naïve definition of addition did not preserve monotonicity. (Note that the plain definition of multiplication does preserve monotonicity!)
However, you can in fact take any representative function and make it monotonic by only a bounded difference – but with one big catch: you have to know up front whether the slope is zero, positive, or negative – and that property is undecidable.
So it just seems more fussy and complicated to require the representative functions be monotonic, even though it could be done without resulting in a different theory.
All of the lemmas involving addition are straightforward, so I wonʼt go through them here, but feel free to ask me if you have questions!
This is my favorite theorem, and the reason I like this construction so much.
Distributivity is what lets you pull out common factors, like so: $xz + yz = (x + y)z$.
Watch what happens when we use our definitions of these operators (recall that multiplication is function composition):
$$((f + g) \cdot h)(x) = (f + g)(h(x)) = f(h(x)) + g(h(x)) = (f \cdot h)(x) + (g \cdot h)(x)$$
Thatʼs right, it falls out for free based on how we defined multiplication and addition, just because of how functions work! Isnʼt that so cool?
But … thereʼs a catch.
Proving that multiplication on the left distributes over addition is not so straightforward: $(f \cdot (g + h))(x) = f(g(x) + h(x))$, which is not obviously the same as $f(g(x)) + f(h(x))$.
There doesnʼt seem to be a direct way to attack this. However, once we prove that multiplication is commutative, it doesnʼt matter anymore. (In fact, I suspect that proving left-distributivity directly requires similar arguments to the commutativity of multiplication anyways.)
Thereʼs two aspects to show that the multiplication is well-defined: since weʼre dealing with subquotients, we need to show that the result satisfies the necessary property, and also that it respects the quotienting relation.
I think this first aspect requires one of the more tricky proofs: $f \circ g$ is almost linear if $f$ and $g$ are. (Remember we defined multiplication as the composition of those functions, not pointwise multiplication!)
Assume $f(x + y) - f(x) - f(y)$ and $g(x + y) - g(x) - g(y)$ are bounded, then show that $f(g(x + y)) - f(g(x)) - f(g(y))$ is bounded as well.
For the set-up, we observe that $g(x + y) - g(x) - g(y)$ is bounded (by assumption), and we choose strategic quantities $u$ and $v$ to feed into the wiggle of $f$.
First, a useful fact: if a function $e$ is bounded by $E$, then its image lies in the finite interval $[-E, E]$, thus $f \circ e$ is also bounded – namely, by the maximum of $|f|$ on the interval $[-E, E]$.
What quantity do we know is bounded? Letʼs choose $u = g(x + y) - g(x) - g(y)$.
For the other we choose $v = g(x) + g(y)$, which makes their sum $u + v = g(x + y)$.
Plugging these in, the wiggle of $f$ tells us that $f(g(x + y)) - f(u) - f(g(x) + g(y))$ is bounded. But $f(u)$ is also bounded, by the fact above, and $f(g(x) + g(y)) - f(g(x)) - f(g(y))$ is bounded, being another wiggle of $f$. Adding these three bounded quantities shows that $f(g(x + y)) - f(g(x)) - f(g(y))$ is bounded. Q.E.D.
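The proof can be spot-checked numerically too – here is a Python sketch with two floor-based almost-linear functions (the example slopes $\sqrt{2}$ and $\sqrt{3}$ are my own):

```python
import math

f = lambda n: math.floor(math.sqrt(2) * n)
g = lambda n: math.floor(math.sqrt(3) * n)
h = lambda n: f(g(n))  # the product, as composition

# the wiggle of the composite stays bounded, as the proof promises
wiggle = [h(x + y) - h(x) - h(y) for x in range(-50, 51) for y in range(-50, 51)]
```

The composite is an honest almost-linear function, so it represents a Eudoxus real (with slope $\sqrt{6}$, in this example).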
We also need that if $f_1 = f_2$ and $g_1 = g_2$ (that is, bounded differences) then $f_1 \circ g_1 = f_2 \circ g_2$. We can decompose this into two steps, varying one side at a time: $f_1 \circ g_1 = f_2 \circ g_1$ and $f_2 \circ g_1 = f_2 \circ g_2$.
The first is trivial: if $f_1 - f_2$ is bounded, of course $f_1 \circ g_1 - f_2 \circ g_1 = (f_1 - f_2) \circ g_1$ is also bounded, just by properties of functions!
The second step also makes sense, but is trickier to formalize: if $g_1 - g_2$ is bounded, then $f_2$ being almost linear should preserve this bounded difference. Itʼs not like $f_2$ can stray too far away so as to make an unbounded difference on closely bounded inputs.
So how do we knead the almost-linear condition into a form that tells us about $f_2 \circ g_1 - f_2 \circ g_2$? Well, what it looks like now is: $f_2(x + y) - f_2(x) - f_2(y)$ is bounded. To ask about the difference of $f_2$ at our two inputs, we want to pick $x = g_2(n)$ and $y = g_1(n) - g_2(n)$, which makes $x + y = g_1(n)$, giving us: $f_2(g_1(n)) - f_2(g_2(n)) - f_2(g_1(n) - g_2(n))$ is bounded. But weʼre done now, since $g_1 - g_2$ is bounded, making its image under $f_2$ bounded, so using the above fact, $f_2(g_1(n)) - f_2(g_2(n))$ is really bounded as we wanted.
Following Arthanʼs exposition, we will need some more machinery before we tackle the rest of the proofs. Using the tools of mathematical analysis we will establish further properties that capture the almost-linearity of functions, using the existing property of bounded wiggle.
Let $C$ be a bound for the wiggle of the function; i.e. $|f(x + y) - f(x) - f(y)| \le C$ for all $x$ and $y$.
The first lemma (part of Lemma 7 in Arthan) we want to establish is the following (exercise: why is this related to $f$ being an almost linear function? what if $f$ was actually a linear function?):
$$|f(nx) - nf(x)| \le (|n| + 1)C$$
We need to use induction on . This wasnʼt obvious to me at first, but it makes sense in retrospect!
For the base case $n = 0$, it is easy: $|f(0)| \le C$ (plug $x = y = 0$ into the wiggle). However, letʼs rewrite it using the point $0 \cdot x$, as $|f(0 \cdot x) - 0 \cdot f(x)| \le C$, to make the next steps of induction clearer.
Now we need to establish that if it holds for $n$, then it holds for $n + 1$: that is, $|f((n + 1)x) - (n + 1)f(x)| \le (n + 2)C$.
How do we get from $f(nx) - nf(x)$ to $f((n + 1)x) - (n + 1)f(x)$? The difference is $f((n + 1)x) - f(nx) - f(x)$: but this is actually the wiggle of $f$ between $nx$ and $x$, so it is also at most $C$ in absolute value.
So this means that each step we take changes it by at most $C$, and weʼre done by induction. (Technically we also need to cover the cases where $n$ is negative, but thatʼs easy.) We can write it all out in one step like this:
$$|f(nx) - nf(x)| \le |f(0)| + \sum_{j=0}^{n-1} |f((j + 1)x) - f(jx) - f(x)| \le (n + 1)C$$
Using that lemma, we get to the next lemma (Lemma 7 in Arthan) in a very slick manner:
$$|mf(n) - nf(m)| \le (|m| + |n| + 2)C$$
Our task here is to compare $mf(n)$ and $nf(m)$. The magic of the above lemma is that it compared these values to a common value: $f(mn)$. So we have what we need now:
$$|mf(n) - nf(m)| \le |f(mn) - mf(n)| + |f(mn) - nf(m)| \le (|m| + 1)C + (|n| + 1)C$$
While weʼre here, we need one more lemma (Lemma 8 in Arthan) that certifies that $f$ acts like a linear function, approximately. We show that there are some constants $A$ and $B$ such that the following holds for all $n$:
$$|f(n)| \le A|n| + B$$
Notice that this also says that $f$ behaves like a linear function: it cannot grow outside the confines of the linear function $A|n| + B$ (although those $A$ and $B$ may be large numbers, thus providing only loose bounds on $f$).
It comes immediately from the above lemma: take $m = 1$ to get $|f(n) - nf(1)| \le (|n| + 3)C$, so with some rearranging we get $|f(n)|$ by itself and a coefficient for $|n|$:
$$|f(n)| \le (|f(1)| + C)|n| + 3C$$
Thus we can take $A = |f(1)| + C$ and $B = 3C$ to accomplish the task.
Now we finally have the tools we need to prove that we can get a slope out of these almost linear functions! Recall that we were going to define slope as
$$\lim_{n \to \infty} \frac{f(n)}{n}$$
And to show that it converges, we show that it is a Cauchy sequence, meaning that terms in the sequence get arbitrarily close beyond a certain point: for every $\epsilon > 0$ there is some $N$ such that $\left|\frac{f(m)}{m} - \frac{f(n)}{n}\right| < \epsilon$ for all $m, n \ge N$.
Well, the lemma above gives us exactly what we need, once we divide through by $mn$:
$$\left|\frac{f(m)}{m} - \frac{f(n)}{n}\right| = \frac{|nf(m) - mf(n)|}{mn} \le \frac{(m + n + 2)C}{mn}$$
Notice how the numerator is like $m + n$ and the denominator is like $mn$, clearly the denominator will win as $m$ and $n$ get large!
To give more details, notice how we can bound the fraction like so
$$\frac{(m + n + 2)C}{mn} = \left(\frac{1}{n} + \frac{1}{m} + \frac{2}{mn}\right)C \le \frac{4C}{N}$$
since $m \ge N$ and $n \ge N$.
So now to get $\frac{4C}{N}$ to be less than $\epsilon$, we need $N > \frac{4C}{\epsilon}$. So as $\epsilon$ gets really tiny and close to $0$, $N$ has to grow very large in response. But this is fine: it exists by the Archimedean property, completing our proof.
Let $f$, $g$ be two almost-linear functions, we need to show that $f \cdot g = g \cdot f$ by showing that $f(g(n)) - g(f(n))$ is bounded.
We will use the same trick as above, comparing $f(g(n))$ and $g(f(n))$ to a common term. In fact itʼs a titch more subtle still: we will add extra factors of $n$ first, comparing $nf(g(n))$ and $ng(f(n))$ to the common term $f(n)g(n)$. Do you see where this is going? We get to use our lemmas from above!
Take $m = g(n)$ in our favorite lemma (and then $m = f(n)$ with the roles of $f$ and $g$ swapped), to give us these two inequalities: $|g(n)f(n) - nf(g(n))| \le (|g(n)| + |n| + 2)C_f$ and $|f(n)g(n) - ng(f(n))| \le (|f(n)| + |n| + 2)C_g$
Now to get $|nf(g(n)) - ng(f(n))|$ we take the sum, getting a bound of
$$(|g(n)| + |n| + 2)C_f + (|f(n)| + |n| + 2)C_g$$
That kind of looks awful, Iʼm not going to lie, but weʼre sooo close. Remember the other lemma that told us that $f(n)$ and $g(n)$ behaved roughly like $|n|$? We can use that here to say that it all behaves like $|n|$:
We can squash all that nonsense behind two new constants $A'$ and $B'$, reaching our next conclusion:
$$|n| \cdot |f(g(n)) - g(f(n))| \le A'|n| + B'$$
Take a breath … One last step.
Both sides now behave like $|n|$. If you look really closely, this actually implies that $f(g(n)) - g(f(n))$ is bounded: it must sort of behave like $A'$ (the slope of the bound $A'|n| + B'$) on the other side. (It certainly canʼt behave like $|n|$ or $|n|^2$ or anything else that grows faster!)
More specifically, if we have $|n| \cdot |f(g(n)) - g(f(n))| \le A'|n| + B'$, we can divide through by $|n|$ to get
$$|f(g(n)) - g(f(n))| \le A' + \frac{B'}{|n|}$$
This is only valid when $n \ne 0$, however! Since $|n|$ is then a positive integer, it has to be at least $1$, so $|f(g(n)) - g(f(n))| \le A' + B'$ in that case.
And dealing with the case $n = 0$ on its own poses no problem (it bounds itself), so overall we have
$$|f(g(n)) - g(f(n))| \le \max\left(A' + B',\ |f(g(0)) - g(f(0))|\right)$$
This establishes that $f \circ g - g \circ f$ is bounded, and thus multiplication commutes!
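And here is the theorem in action, numerically (a Python sketch with my usual example representatives):

```python
import math

f = lambda n: math.floor(math.sqrt(2) * n)
g = lambda n: math.floor(math.sqrt(3) * n)

# f∘g - g∘f stays bounded even though f(g(n)) and g(f(n)) both grow like sqrt(6)*n
diffs = [f(g(n)) - g(f(n)) for n in range(-1000, 1001)]
```

Both composites track the same line of slope $\sqrt{6}$, so only their bounded rounding errors can differ.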
Whew! Hopefully you enjoyed our journey through almost-linear integer functions to get to Eudoxus real numbers.
Here are some things I did my best to convey in this article:
Thereʼs a lot still to talk about – like “what does it mean to construct a mathematical entity and what tools do we use to do so?”, and we didnʼt even prove that multiplicative inverses exist (what would that look like visually? I bet you have the intuition!) or that the Eudoxus reals are sequentially complete. But this is more than enough for now – the intuition is more important than the details.
For more on Eudoxus reals, including a cool construction of $\sqrt{2}$ as a Eudoxus real number and references to further connections made with the construction, I would recommend Matt Baker’s Math Blog. And of course see the nLab, where I originally learned about this construction and which I already referenced several times, and Arthanʼs exposition, which helped me fill in the tougher proofs.