1. Introduction
In this article, we’ll give an introduction to the mathematics of big-O notation, as well as show an example of a big-O proof.
2. Formal Definition
Definition: f(x) = O(g(x)) means that there exist two positive constants, x1 and c, such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x1.
3. The First Positive Constant: x1
When saying f(x) = O(g(x)), we say “f of x is big-O g of x“.
Don’t let the equals sign fool you: f(x) is a function and O(g(x)) is a set. You can’t have a function equalling a set. That’s like saying that the Earth equals the Solar System. It’s more like, the Earth belongs in (the set of planets which comprise) the Solar System. Similarly, f(x) belongs in a set of functions called O(g(x)) (big-O g of x).
We have two functions now – f(x) and g(x). Functions grow at different speeds. For example, a quadratic function will grow quicker than a linear function. But sometimes, it takes some time for the quadratic to catch up!
For example, if we have a(x) = 2x2 + 2x + 1 and b(x) = 10x, we’ll have a(1) = 5 and b(1) = 10. But say now we choose x = 15. Now we have a(15) = 450 + 30 + 1 = 481 and b(x) = 150. What’s more is that for any value greater than x = 15, say x = 20, a(x) will be bigger than b(x). (Now is a good time to refer back to the formal definition; this is the x ≥ x1 part).
Importantly for the case of understanding big-O formally, we don’t particularly care at which point a(x) began outgrowing b(x). Just that it did, and that from that point onwards, it continues to be larger than b(x).
4. The Second Positive Constant: c
We saw above how it took a while for a(x) to catch up to b(x). Eventually, it did, but it took a while.
The reason it eventually grows faster is because of the 2x2 term. Or, more precisely, the x2 part of it. This tells us that a(x) = O(x2). Using the symbols of the original definition: f(x) = a(x) and g(x) = x2.
For big-O, we don’t care about the other terms in a(x), or the constant 2 in the 2x2 term.
Looking again at our definition from section 2, this is where the constant c comes in. We can scale g(x) by any positive constant – as long as f(x) stays smaller than the scaled version (past a certain point, x ≥ x1).
If we take d(x) = 3x3 + 2x + 10, we’ll need to find values of c and x1, such that d(x) ≤ cx3 for all values greater than x1. This is exactly what we do in section 4.
5. Putting the Pieces Together
In order to prove f(x) = O(gx), we need to find two positive constants, c and x1, such that 0 ≤ f(x) ≤ cg(x) for all x ≥ x1. We need to find values for c and x1 such that the inequality holds.
What it means, is that past a certain point, a scaled version of g(x) will always be bigger than f(x).
6. Example
Let d(x) = 3x3 + 2x + 10.
Suppose we wish to prove that d(x) = O(x3). This means we need to find two positive integers, c and x1 such that 0 ≤ d(x) ≤ cx3 for all x ≥ x1.
Well, we know that
d(x) = 3x3 + 2x + 10 ≤ 3x3 + 2x3 + 10x3 = 15x3
so,
d(x) ≤ 15x3.
So if we set x1 = 1 and c = 15, then we have that for any x ≥ x1, 0 ≤ d(x) ≤ cx3. So d(x) = O(x3).
We could have found other values for x1 and c which satisfy the above condition. All that matters is that there exist values of x1 and c which satisfy the condition.
7. Conclusion
In this article, we focused on an introduction to the theory of big-O notation.
A more practical look at this topic can be found here.