Chapter 5. Integration

Integration evolved over centuries from geometric and physical considerations related to the determination of areas and volumes.

Definition. Let \([a, b]\), \(a < b\), be a compact interval. By a partition P of \([a, b]\) we mean a set of points \(x_0, x_1, \dots, x_m\) with \(a = x_0 < x_1 < \dots < x_m=b\). Suppose \(f: [a, b] \to \mathbb{R}\) is a bounded function. We define

\(M_i := \sup_{x \in [x_{i-1}, x_i]} f(x), \quad m_i := \inf_{x \in [x_{i-1}, x_i]} f(x), \quad i = 1, \dots, m,\)

\(U(P, f) := \sum_{i=1}^m M_i (x_i - x_{i-1}), \quad L(P, f) := \sum_{i=1}^m m_i (x_i - x_{i-1}),\)

"upper sum" \(\quad\quad\quad\quad\) "lower sum"

and

\(\int_a^{b*} f(x)dx := \inf \{ U(P, f) : P \text{ is a partition of } [a, b] \}\),

"upper Riemann integral"

\(\int_{a*}^b f(x)dx := \sup \{ L(P, f) : P \text{ is a partition of } [a, b] \}\),

"lower Riemann integral"

If \(\int_a^{b*} f(x)dx = \int_{a*}^b f(x)dx\), then we say that the function f is Riemann-integrable on \([a, b]\) and the Riemann integral of f on \([a, b]\) is defined as

\(\int_a^b f(x)dx := \int_a^{b*} f(x)dx.\)

Moreover, we define the set

\(\mathcal{R}([a, b]) := \{ f: [a, b] \to \mathbb{R} : f \text{ is Riemann-integrable} \}\)

and introduce the conventions

\(\int_a^a f(x)dx = 0\), \(\int_b^a f(x)dx = -\int_a^b f(x)dx \quad (a < b).\)

Remarks. i) For every bounded function \(f: [a, b] \to \mathbb{R}\) we have

\(\int_a^{b*} f(x)dx, \int_{a*}^b f(x)dx \in \mathbb{R}\)

and \(\int_{a*}^b f(x)dx \le \int_a^{b*} f(x)dx.\)

ii) Not every bounded function is Riemann-integrable, e.g., consider \(f: [0, 1] \to \mathbb{R}\),

\(f(x) := \begin{cases} 1, & x \in \mathbb{Q} \cap [0, 1] \\ 0, & \text{else} \end{cases}\), then we have

\(\int_0^{1*} f(x)dx = 1\) but \(\int_{0*}^1 f(x)dx = 0.\)

iii) It is not important which letter we choose to represent the variable of integration:

\(\int_a^b f(x)dx = \int_a^b f(t)dt = \int_a^b f(\alpha)d\alpha = \dots\)

Example. Let \(f: [a, b] \to \mathbb{R}\), \(f(x) = \begin{cases} y_1, & x = a \\ c, & a < x < b \\ y_2, & x=b \end{cases}\),

where \(y_1, y_2\) and \(c\) are fixed real constants.

For any partition P of \([a, b]\) we have

\(U(P, f) = \max\{y_1, c\} (x_1 - a) + \sum_{i=2}^{m-1} c(x_i - x_{i-1}) + \max\{y_2, c\} (b - x_{m-1})\)

\(= \max\{y_1, c\} (x_1 - a) + c(x_{m-1} - x_1) + \max\{y_2, c\} (b - x_{m-1})\)

As \(x_1\) can be chosen arbitrarily close to \(a\), and \(x_{m-1}\) can be chosen arbitrarily close to \(b\), we obtain

\(\int_a^{b*} f(x)dx \le c(b-a)\). Similarly, we obtain

\(\int_{a*}^b f(x)dx \ge c(b-a)\), which shows that \(f \in \mathcal{R}([a, b])\)

with \(\int_a^b f(x)dx = c(b-a)\).
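The limiting argument in this example can be checked numerically. The following is a minimal sketch, assuming the illustrative values \(y_1 = 5\), \(c = 2\), \(y_2 = -3\) on \([0, 1]\) (these numbers and the function names are not from the text). The upper sum uses the formula derived above; the lower sum uses the analogous formula with min in place of max.

```python
def upper_sum_step(y1, c, y2, a, b, x1, x_last):
    """U(P, f) for the step function of the example; x1 and x_last are the
    first and last interior partition points (x_1 and x_{m-1})."""
    return max(y1, c) * (x1 - a) + c * (x_last - x1) + max(y2, c) * (b - x_last)


def lower_sum_step(y1, c, y2, a, b, x1, x_last):
    """L(P, f): same structure as the upper sum, with min instead of max."""
    return min(y1, c) * (x1 - a) + c * (x_last - x1) + min(y2, c) * (b - x_last)


# Hypothetical sample values, chosen only for illustration.
y1, c, y2, a, b = 5.0, 2.0, -3.0, 0.0, 1.0

for d in (0.1, 0.01, 0.001):
    U = upper_sum_step(y1, c, y2, a, b, a + d, b - d)
    L = lower_sum_step(y1, c, y2, a, b, a + d, b - d)
    print(d, L, c * (b - a), U)
```

As d shrinks, both sums approach \(c(b-a) = 2\), independently of the values at the endpoints.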

Definition. Let P and P* be partitions of \([a, b]\). We say that P* is a refinement of P if every point of P is a point of P*.

Given two partitions \(P_1\) and \(P_2\) of \([a, b]\), we call \(P^* = P_1 \cup P_2\) their common refinement.

Theorem 44. Let \(f: [a, b] \to \mathbb{R}\) be bounded, and let P, P* be partitions of \([a, b]\) such that P* is a refinement of P. Then

\(L(P, f) \le L(P^*, f)\) and \(U(P^*, f) \le U(P, f)\).

Proof. We only show the first inequality. To this end, it is sufficient to consider the case that P* contains just one point more than P. Let this point be \(x^*\) and suppose that \(x^* \in (x_{j-1}, x_j)\), where \(a = x_0 < x_1 < \dots < x_m=b\) are the points of P. Then we have

\(m_j = \inf_{x \in [x_{j-1}, x_j]} f(x) \le \min \{ \inf_{x \in [x_{j-1}, x^*]} f(x), \inf_{x \in [x^*, x_j]} f(x) \}\),

and thus \(L(P, f) = \sum_{i=1}^m m_i (x_i - x_{i-1})\)

\(= \sum_{i=1, i \neq j}^m m_i (x_i - x_{i-1}) + \underbrace{m_j (x_j - x^*) + m_j (x^* - x_{j-1})}_{\le \inf_{x \in [x^*, x_j]} f(x) (x_j - x^*) + \inf_{x \in [x_{j-1}, x^*]} f(x) (x^* - x_{j-1})}\)

\(\le L(P^*, f)\).
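To illustrate Theorem 44, the sketch below compares the sums for a partition and one of its refinements. It assumes an increasing function (here \(e^x\)), so that the infimum and supremum on each subinterval are attained at the left and right endpoints; the helper names and the sample partitions are illustrative choices, not part of the text.

```python
import math


def lower_sum_increasing(f, points):
    """L(P, f) for an increasing f: the infimum on [x_{i-1}, x_i] is f(x_{i-1})."""
    return sum(f(lo) * (hi - lo) for lo, hi in zip(points, points[1:]))


def upper_sum_increasing(f, points):
    """U(P, f) for an increasing f: the supremum on [x_{i-1}, x_i] is f(x_i)."""
    return sum(f(hi) * (hi - lo) for lo, hi in zip(points, points[1:]))


P = [0.0, 0.5, 1.0]
P_star = sorted(set(P) | {0.25, 0.8})  # every point of P is a point of P_star

f = math.exp
print(lower_sum_increasing(f, P), lower_sum_increasing(f, P_star))
print(upper_sum_increasing(f, P_star), upper_sum_increasing(f, P))
```

Adding points raises the lower sum and lowers the upper sum, as the theorem states.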

Theorem 45. Let \(f: [a, b] \to \mathbb{R}\) be bounded. Then we have \(f \in \mathcal{R}([a, b])\) if and only if for every \(\epsilon > 0\) there exists a partition P of \([a, b]\) such that

\(U(P, f) - L(P, f) < \epsilon\).

Proof. "\(\implies\)" Let \(f \in \mathcal{R}([a, b])\) and \(\epsilon > 0\) be given. By definition of integrability, we can find partitions \(P_1\) and \(P_2\) of \([a, b]\) such that

\(\int_a^b f(x)dx - L(P_1, f) < \frac{\epsilon}{2}\), \(U(P_2, f) - \int_a^b f(x)dx < \frac{\epsilon}{2}\).

\(\implies U(P_2, f) - L(P_1, f) < \epsilon\)

\(\implies (\text{Thm 44}) \ U(P^*, f) - L(P^*, f) < \epsilon\), where P* is the common refinement of \(P_1\) and \(P_2\).

"\(\impliedby\)" Let \(\epsilon > 0\) be given and let P be a partition of \([a, b]\) such that \(U(P, f) - L(P, f) < \epsilon\).

\(\implies 0 \le \int_a^{b*} f(x)dx - \int_{a*}^b f(x)dx \le U(P, f) - L(P, f) < \epsilon\).

As \(\epsilon > 0\) is arbitrary, we have \(\int_a^{b*} f(x)dx = \int_{a*}^b f(x)dx\).

Theorem 46. i) If \(f: [a, b] \to \mathbb{R}\) is continuous, then \(f \in \mathcal{R}([a, b])\).

ii) If \(f: [a, b] \to \mathbb{R}\) is monotonic, then \(f \in \mathcal{R}([a, b])\).

iii) If \(f: [a, b] \to \mathbb{R}\) is bounded and has only finitely many points of discontinuity, then \(f \in \mathcal{R}([a, b])\).

iv) Suppose \(f \in \mathcal{R}([a, b])\), \(m \le f(x) \le M\) on \([a, b]\), and \(g: [m, M] \to \mathbb{R}\) is continuous. Then for \(h: [a, b] \to \mathbb{R}\), \(h(x) := g(f(x))\), we have \(h \in \mathcal{R}([a, b])\).

Proof. i) First, we observe that a continuous function, defined on a compact set K (here \([a, b]\)), is "uniformly" continuous in the following sense:

(*) \(\forall \epsilon > 0 \ \exists \delta > 0 \ \forall x, y \in K \text{ with } |x-y| < \delta : |f(x) - f(y)| < \epsilon\).

To see this, let us assume it is not true; then

\(\exists \epsilon > 0 \ \forall n \in \mathbb{N} \ \exists x_n, y_n \in K \text{ with } |x_n - y_n| \le \frac{1}{n} : |f(x_n) - f(y_n)| \ge \epsilon\).

As K is compact we find a subsequence \((x_{n_k})_{k=1}^\infty\) of \((x_n)_{n=1}^\infty\) such that

\(\rho := \lim_{k \to \infty} x_{n_k} \in K \implies \lim_{k \to \infty} y_{n_k} = \rho\)

\(\implies (f \text{ is cont.}) \lim_{k \to \infty} (f(x_{n_k}) - f(y_{n_k})) = f(\rho) - f(\rho) = 0\), which is a contradiction.

Now, in order to show that \(f \in \mathcal{R}([a, b])\), let \(\epsilon > 0\) be given, and choose \(\delta > 0\) according to (*). We choose a partition P of \([a, b]\) such that

\(x_i - x_{i-1} < \delta\), for \(i=1, \dots, m\).

\(\implies\) For \(x, y \in [x_{i-1}, x_i] : |f(x) - f(y)| < \epsilon \quad i=1, \dots, m\)

\(\implies M_i - m_i = \sup_{x \in [x_{i-1}, x_i]} f(x) - \inf_{x \in [x_{i-1}, x_i]} f(x)\)

\(= \sup_{x, y \in [x_{i-1}, x_i]} (f(x) - f(y)) \le \epsilon, \quad i = 1, \dots, m.\)

\(\implies U(P, f) - L(P, f) = \sum_{i=1}^m (M_i - m_i)(x_i - x_{i-1}) \le \epsilon(b-a)\).

As \(\epsilon > 0\) is arbitrary, \(f \in \mathcal{R}([a, b])\) by Thm 45.

ii), iii) and iv) can be proved by similar arguments based on Thm 45.
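For part ii), one way to see how Thm 45 applies is the following sketch, which assumes f is increasing and uses a uniform partition with m subintervals. In that case the gap telescopes: \(U(P, f) - L(P, f) = (f(b) - f(a)) \frac{b-a}{m}\), so any m larger than \((f(b) - f(a))(b-a)/\epsilon\) works. The function names and the sample function \(x^3\) are illustrative assumptions.

```python
import math


def gap_uniform_increasing(f, a, b, m):
    """U(P, f) - L(P, f) for an increasing f on the uniform partition
    of [a, b] with m subintervals (sup at right, inf at left endpoint)."""
    h = (b - a) / m
    xs = [a + i * h for i in range(m + 1)]
    return sum((f(hi) - f(lo)) * (hi - lo) for lo, hi in zip(xs, xs[1:]))


def subintervals_needed(f, a, b, eps):
    """Smallest m with (f(b) - f(a)) * (b - a) / m < eps."""
    return math.floor((f(b) - f(a)) * (b - a) / eps) + 1


def cube(x):
    return x ** 3


eps = 0.03
m = subintervals_needed(cube, 0.0, 2.0, eps)
print(m, gap_uniform_increasing(cube, 0.0, 2.0, m))
```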

Example. Consider \(f: [0, 1] \to \mathbb{R}, f(x) = e^x\), then by Thm 46 i) we know that \(f \in \mathcal{R}([0, 1])\). We want to compute \(\int_0^1 e^x dx\).

We define a sequence of partitions: For \(m \in \mathbb{N}\), let \(P_m\) be the partition of \([0, 1]\) with \(x_i = \frac{i}{m}\), \(i = 0, \dots, m\).

\(\implies U(P_m, f) = \sum_{i=1}^m e^{x_i}(x_i - x_{i-1}) = \frac{1}{m} \sum_{i=1}^m (e^{1/m})^i\)

\(= \frac{1}{m} e^{1/m} \frac{e-1}{e^{1/m}-1} = e^{1/m} \frac{1/m}{e^{1/m}-1} (e-1) \to e-1, \quad m \to \infty.\)

Similarly, we obtain \(L(P_m, f) = \sum_{i=1}^m e^{x_{i-1}}(x_i - x_{i-1}) \to e-1, \quad m \to \infty\);

\(\implies \int_0^1 e^x dx = \int_0^{1*} e^x dx \le e-1 \le \int_{0*}^1 e^x dx = \int_0^1 e^x dx\)

\(\implies \int_0^1 e^x dx = e-1\).
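The convergence of both sums can also be observed numerically. The sketch below evaluates \(U(P_m, f)\) and \(L(P_m, f)\) for the partitions \(P_m\) of this example in floating-point arithmetic; the function name is an illustrative choice.

```python
import math


def upper_lower_exp(m):
    """Return (U(P_m, f), L(P_m, f)) for f(x) = e^x and x_i = i/m on [0, 1].
    Since e^x is increasing, M_i = e^{x_i} and m_i = e^{x_{i-1}}."""
    h = 1.0 / m
    upper = sum(math.exp(i * h) * h for i in range(1, m + 1))
    lower = sum(math.exp((i - 1) * h) * h for i in range(1, m + 1))
    return upper, lower


for m in (10, 100, 1000):
    U, L = upper_lower_exp(m)
    print(m, L, math.e - 1, U)
```

The two sums bracket \(e - 1\), and their difference equals \((e-1)/m\), which tends to 0.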

Theorem 47. i) If \(f, g \in \mathcal{R}([a, b])\) and \(c \in \mathbb{R}\), then \(f+g \in \mathcal{R}([a, b])\), \(c \cdot f \in \mathcal{R}([a, b])\), and

\(\int_a^b (f(x)+g(x))dx = \int_a^b f(x)dx + \int_a^b g(x)dx\),

\(\int_a^b c \cdot f(x)dx = c \int_a^b f(x)dx\).

ii) If \(f, g \in \mathcal{R}([a, b])\) with \(f(x) \le g(x)\) on \([a, b]\), then

\(\int_a^b f(x)dx \le \int_a^b g(x)dx.\)

iii) If \(a < b < c\) and \(f \in \mathcal{R}([a, c])\), then \(f \in \mathcal{R}([a, b])\) and \(f \in \mathcal{R}([b, c])\), and we have

\(\int_a^c f(x)dx = \int_a^b f(x)dx + \int_b^c f(x)dx.\)

iv) If \(f \in \mathcal{R}([a, b])\), then \(|f| \in \mathcal{R}([a, b])\) and

\(| \int_a^b f(x)dx | \le \int_a^b |f(x)| dx.\)

v) If \(f, g \in \mathcal{R}([a, b])\), then \(fg \in \mathcal{R}([a, b])\).

Proof. i) We only focus on the sum, so let us assume that \(f, g \in \mathcal{R}([a, b])\), and let \(\epsilon > 0\) be given. By Thm 45, we find partitions \(P_1\) and \(P_2\) of \([a, b]\) such that

\(U(P_1, f) - L(P_1, f) < \frac{\epsilon}{2}\), \(U(P_2, g) - L(P_2, g) < \frac{\epsilon}{2}\)

\(\implies\) For the refinement \(P^* = P_1 \cup P_2\) we have

\(U(P^*, f+g) - L(P^*, f+g)\)

\(\le U(P^*, f) + U(P^*, g) - L(P^*, f) - L(P^*, g)\)

\(\le U(P_1, f) - L(P_1, f) + U(P_2, g) - L(P_2, g)\)

\(< \epsilon. \implies (\text{Thm 45}) \ f+g \in \mathcal{R}([a, b])\).

Moreover, we have \(\int_a^b (f(x)+g(x))dx \le U(P^*, f+g)\)

\(\le U(P^*, f) + U(P^*, g) \le U(P_1, f) + U(P_2, g)\)

\(\le \int_a^b f(x)dx + \int_a^b g(x)dx + \epsilon\).

As \(\epsilon > 0\) is arbitrary, this shows that

\(\int_a^b (f(x)+g(x))dx \le \int_a^b f(x)dx + \int_a^b g(x)dx\).

Similarly, we obtain \(\int_a^b (f(x)+g(x))dx \ge \int_a^b f(x)dx + \int_a^b g(x)dx\), hence, we must have equality.

ii), iii) follow using similar arguments.

iv) \(|f| \in \mathcal{R}([a, b])\) follows by choosing \(g(u) := |u|\) in Thm 46 iv), and the estimate follows from ii).

v) In general, we have \(fg = \frac{1}{4} \{ (f+g)^2 - (f-g)^2 \}\), so by i) it is sufficient to show \(f^2 \in \mathcal{R}([a, b])\) if \(f \in \mathcal{R}([a, b])\), which follows by choosing \(g(u) := u^2\) in Thm 46 iv).

The original definition of the Riemann integral was based on so-called Riemann sums.

Let \(f \in \mathcal{R}([a, b])\), let P be a partition of \([a, b]\) given by \(a = x_0 < x_1 < \dots < x_m=b\), and let

\(\xi_1, \xi_2, \dots, \xi_m \in [a, b]\) be given sample points, i.e., \(\xi_i \in [x_{i-1}, x_i]\), \(i = 1, \dots, m\). Then it is possible to show that for every \(\epsilon > 0\), there exists \(\delta > 0\) such that for every partition P of \([a, b]\) with \(\max_{i=1, \dots, m} (x_i - x_{i-1}) < \delta\) and for every choice of sample points \(\xi_1, \dots, \xi_m\), we have

\(| \int_a^b f(x)dx - \sum_{i=1}^m f(\xi_i) (x_i - x_{i-1}) | < \epsilon\).

"Riemann sum"

Example. We can use Riemann sums to compute integrals. Let \(f: [0, 1] \to \mathbb{R}\), \(f(x) = x\), then \(f \in \mathcal{R}([0, 1])\) by Thm 46. For each \(m \in \mathbb{N}\), we choose the partition \(P_m\) given by \(x_i = \frac{i}{m}\), \(i = 0, \dots, m\), and we choose the sample points \(\xi_i = x_i\), \(i = 1, \dots, m\).

Then \(\max_{i=1, \dots, m} (x_i - x_{i-1}) = \frac{1}{m} \to 0, m \to \infty\), so

\(\lim_{m \to \infty} \sum_{i=1}^m f(\xi_i) (x_i - x_{i-1}) = \int_0^1 f(x)dx.\)

On the other hand, we have

\(\lim_{m \to \infty} \sum_{i=1}^m f(\xi_i) (x_i - x_{i-1}) = \lim_{m \to \infty} \sum_{i=1}^m x_i (x_i - x_{i-1})\)

\(= \lim_{m \to \infty} \frac{1}{m^2} \sum_{i=1}^m i = \lim_{m \to \infty} \frac{m(m+1)}{2m^2} = \frac{1}{2}\).
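The closed form of this Riemann sum can be verified exactly with rational arithmetic. The following is a small sketch using Python's `fractions` module; the function name is an illustrative choice.

```python
from fractions import Fraction


def right_riemann_sum_identity(m):
    """Riemann sum of f(x) = x on [0, 1] for x_i = i/m with sample points xi_i = x_i."""
    return sum(Fraction(i, m) * Fraction(1, m) for i in range(1, m + 1))


for m in (1, 10, 100, 1000):
    s = right_riemann_sum_identity(m)
    # Compare with the closed form m(m+1) / (2 m^2) = (m+1) / (2m).
    print(m, s, s == Fraction(m + 1, 2 * m), float(s))
```

The values \(\frac{m+1}{2m}\) decrease towards \(\frac{1}{2}\) as m grows.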

Next, we explore the connection between the two fundamental concepts of integration and differentiation.

Definition. Let \(I \subset \mathbb{R}\) be an interval. A function \(F: I \to \mathbb{R}\) is an antiderivative (or primitive) of \(f: I \to \mathbb{R}\) if F is differentiable on I and \(F'(x) = f(x)\), \(x \in I\).

Remark. Two antiderivatives F, G: \(I \to \mathbb{R}\) of the same function f differ only by a constant (Remark iii) after Thm 39).

Theorem 48. (Fundamental Theorem of Calculus)

Let \(f \in \mathcal{R}([a, b])\), \(a < b\), and define \(F: [a, b] \to \mathbb{R}\) by \(F(x)=\int_a^x f(t)dt\).

i) If f is continuous at the point \(\xi \in [a, b]\), then F is differentiable at \(\xi\) with \(F'(\xi) = f(\xi)\). In particular, if f is continuous on \([a, b]\), then F is an antiderivative of f.

ii) If \(G: [a, b] \to \mathbb{R}\) is an antiderivative of f on \([a, b]\), then \(\int_a^b f(x)dx = G(b) - G(a)\).

Remarks. i) Thm 48 shows that differentiation and integration are "inverse operations". Part i) establishes that every continuous function has an antiderivative. This is not true in general for functions \(f \in \mathcal{R}([a, b])\). For instance, the function \(f: [-1, 1] \to \mathbb{R}\), \(f(x) = \begin{cases} -1, & -1 \le x \le 0 \\ 1, & \text{else} \end{cases}\) does not have an antiderivative.

Part ii) shows the importance of antiderivatives for the computation of integrals. This is why we also use the notation \(\int f(x)dx\) for any antiderivative of f and call it "indefinite integral" of f.

ii) For the difference \(G(b) - G(a)\) we use the notation \(G(x) \big|_a^b\).

Proof. i) Let f be continuous at \(\xi \in [a, b]\), and let \(\epsilon > 0\) be given. Then we choose \(\delta > 0\) such that \(\forall x \in U_\delta(\xi) \cap [a, b] : |f(x) - f(\xi)| < \epsilon\). Hence, for \(x \in U_\delta(\xi) \cap [a, b]\) with \(x \neq \xi\), we have

\(| \frac{F(x) - F(\xi)}{x - \xi} - f(\xi) | = | \frac{1}{x - \xi} \int_\xi^x f(t)dt - \frac{1}{x - \xi} \int_\xi^x f(\xi)dt |\)

\(= | \frac{1}{x - \xi} \int_\xi^x (f(t) - f(\xi))dt | \le \frac{1}{|x - \xi|} | \int_\xi^x \underbrace{|f(t) - f(\xi)|}_{< \epsilon} dt |\)

\(\le \epsilon \implies F'(\xi) = f(\xi)\).

ii) Let G be an antiderivative of f on \([a, b]\), and let \(\epsilon > 0\) be given. As \(f \in \mathcal{R}([a, b])\), we can choose a partition P of \([a, b]\) with \(a = x_0 < \dots < x_n=b\) such that

\(\int_a^b f(t)dt - \epsilon < L(P, f)\), \(\int_a^b f(t)dt + \epsilon> U(P, f)\).

We apply the mean value theorem to G on \([x_{k-1}, x_k]\) to obtain \(G(x_k) - G(x_{k-1}) = f(\xi_k)(x_k - x_{k-1})\) for some \(\xi_k \in (x_{k-1}, x_k)\), \(k = 1, \dots, n\).

\(\implies G(b) - G(a) = \sum_{k=1}^n (G(x_k) - G(x_{k-1})) = \sum_{k=1}^n f(\xi_k)(x_k - x_{k-1})\)

\(\begin{cases} \le U(P, f) < \int_a^b f(t)dt + \epsilon \\ \ge L(P, f)> \int_a^b f(t)dt - \epsilon \end{cases}\)

\(\implies | G(b) - G(a) - \int_a^b f(t)dt | < \epsilon\).

As \(\epsilon > 0\) is arbitrary, we obtain \(\int_a^b f(t)dt = G(x) \big|_a^b\).
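Both parts of Thm 48 can be illustrated numerically. The sketch below assumes \(f = \cos\) with antiderivative \(G = \sin\) on \([0, 2]\), and approximates integrals by Riemann sums with midpoints as sample points; the helper `midpoint_sum`, the point \(\xi = 1\) and the step size are illustrative choices, and the agreement holds only up to discretization error.

```python
import math


def midpoint_sum(f, a, b, m=10_000):
    """Riemann sum of f on the uniform partition of [a, b] with m subintervals,
    using the midpoint of each subinterval as sample point."""
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h


f, G = math.cos, math.sin
a, b = 0.0, 2.0

# Part ii): the integral agrees with G(b) - G(a).
integral = midpoint_sum(f, a, b)
print(integral, G(b) - G(a))

# Part i): (F(x) - F(xi)) / (x - xi) is close to f(xi) for x near xi,
# where F(x) - F(xi) is the integral of f from xi to x.
xi, step = 1.0, 1e-4
quotient = midpoint_sum(f, xi, xi + step) / step
print(quotient, f(xi))
```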

Theorem 49. (Integration by Parts)

Let \(f, g \in C^1([a, b])\), i.e., let f and g be continuously differentiable on \([a, b]\). Then we have

\(\int_a^b f'(x)g(x)dx = f(x)g(x) \big|_a^b - \int_a^b f(x)g'(x)dx.\)

Proof. We know \((f(x)g(x))' = f'(x)g(x) + f(x)g'(x)\), \(x \in [a, b]\).

\(\xrightarrow{\text{Thm 48}} \quad \int_a^b f'(x)g(x)dx + \int_a^b f(x)g'(x)dx\)

\(= \int_a^b (f'(x)g(x) + f(x)g'(x))dx = f(x)g(x) \big|_a^b\).

Example. For \(x \in \mathbb{R}\) we have

\(\int_0^x t e^t dt = t e^t \big|_0^x - \int_0^x e^t dt = x e^x - e^t \big|_0^x\)

\(= x e^x - e^x + 1.\)
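The result of this example can be compared with a direct numerical approximation of the integral. The sketch below uses a midpoint Riemann sum; the helper names and the sample values of x are illustrative assumptions. For \(x < 0\) the step width is negative, which matches the convention \(\int_b^a f(x)dx = -\int_a^b f(x)dx\).

```python
import math


def midpoint_sum(f, a, b, m=10_000):
    """Midpoint Riemann sum of f from a to b (b < a gives the negated integral)."""
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h


def integrand(t):
    return t * math.exp(t)


def closed_form(x):
    """Value obtained by integration by parts: x e^x - e^x + 1."""
    return x * math.exp(x) - math.exp(x) + 1.0


for x in (-1.0, 0.5, 2.0):
    print(x, midpoint_sum(integrand, 0.0, x), closed_form(x))
```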

Theorem 50. (Change of Variables)

Let \(f: [a, b] \to \mathbb{R}\) be continuous, and let \(g: [c, d] \to [a, b]\) be continuously differentiable such that \(a = g(c)\), \(b = g(d)\). Then

\(\int_a^b f(x)dx = \int_c^d f(g(t)) g'(t) dt.\)

Proof. From Thm 48 we know that f has an antiderivative F on \([a, b]\). Applying the chain rule, we obtain

\(\frac{d}{dt} F(g(t)) = F'(g(t))g'(t) = f(g(t))g'(t), \quad t \in [c, d].\)

\(\xrightarrow{\text{Thm 48}} \quad \int_c^d f(g(t))g'(t)dt = F(g(d)) - F(g(c))\)

\(= F(b) - F(a) = \int_a^b f(x)dx\).
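As a numerical illustration of Thm 50, the sketch below assumes \(f(x) = \cos x\) and \(g(t) = t^2\) on \([c, d] = [0, 1]\), so that \(a = g(0) = 0\) and \(b = g(1) = 1\). Both sides are approximated by midpoint Riemann sums; all names and sample choices are illustrative, not from the text.

```python
import math


def midpoint_sum(f, a, b, m=10_000):
    """Midpoint Riemann sum of f on the uniform partition of [a, b]."""
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h


def g(t):
    return t * t


def g_prime(t):
    return 2.0 * t


c, d = 0.0, 1.0
a, b = g(c), g(d)

lhs = midpoint_sum(math.cos, a, b)                                # integral of f(x) over [a, b]
rhs = midpoint_sum(lambda t: math.cos(g(t)) * g_prime(t), c, d)   # integral of f(g(t)) g'(t) over [c, d]
print(lhs, rhs, math.sin(1.0))
```

Both approximations agree with \(F(b) - F(a) = \sin 1\) up to discretization error.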