Gram-Schmidt Orthogonalization Process

Definition. Given linearly independent vectors $v_1, v_2, \dots, v_k$ in an inner product space, the Gram-Schmidt process constructs an orthonormal set $e_1, e_2, \dots, e_k$ spanning the same subspace by:

$u_1 = v_1$, $e_1 = \frac{u_1}{||u_1||}$

$u_j = v_j - \sum_{i=1}^{j-1} \text{proj}_{u_i}(v_j)$ for $j \ge 2$, where $\text{proj}_{u_i}(v_j) = \frac{\langle v_j, u_i \rangle}{\langle u_i, u_i \rangle} u_i$

$e_j = \frac{u_j}{||u_j||}$

The result $\{e_1, \dots, e_k\}$ is an orthonormal basis for $\text{span}\{v_1, \dots, v_k\}$.

Dual Space and Double Dual

1. Definitions

Dual Space ($V^*$): For a vector space $V$ over a field $F$, $V^* := \{f : V \to F \mid f \text{ is linear}\}$ (the linear functionals on $V$).

Double Dual ($V^{**}$): $V^{**} := (V^*)^*$.

Canonical Linear Map ($\Phi$): $\Phi: V \to V^{**}$, defined by $\Phi(v)(f) := f(v)$ for $f \in V^*$. If $V$ is finite-dimensional, $\Phi$ is an isomorphism, so $V \cong V^{**}$.

2. Dual Basis

Given a basis $\{v_1, \dots, v_n\}$ for a finite-dimensional $V$, the dual basis $\{v^1, \dots, v^n\} \subset V^*$ is defined by $v^i(v_j) = \delta_{ij}$ for $1 \le i, j \le n$. Any $f \in V^*$ can be uniquely written as $f = \sum_{i=1}^n f(v_i) v^i$.

Matrix Diagonalization and Minimal Polynomial

Diagonalization

A square matrix $A$ is diagonalizable if there exists an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}$, equivalently $P^{-1}AP = D$. Here $D$ contains the eigenvalues of $A$ on its diagonal, and the columns of $P$ are the corresponding eigenvectors of $A$.

A matrix is diagonalizable if and only if its characteristic polynomial splits and, for every eigenvalue, the algebraic multiplicity equals the geometric multiplicity. Equivalently, a matrix is diagonalizable if and only if its minimal polynomial factors into distinct linear factors.

Minimal Polynomial ($m_A(x)$)

The minimal polynomial of a matrix $A$ is the monic polynomial of least degree such that $m_A(A) = 0$. The minimal polynomial divides the characteristic polynomial $\chi_A(x)$, and every root of the characteristic polynomial is also a root of the minimal polynomial. For a diagonalizable matrix, the minimal polynomial is the product of distinct linear factors corresponding to its distinct eigenvalues.

Inner Product and Norms on $C[0,1]$

1. Inner Product

For continuous functions $f, g \in C[0,1]$ (over $\mathbb{R}$ or $\mathbb{C}$), a common inner product is defined as:

$\langle f, g \rangle = \int_0^1 f(x) \overline{g(x)}\, dx$ (for complex-valued functions)

$\langle f, g \rangle = \int_0^1 f(x) g(x)\, dx$ (for real-valued functions)

Axioms of an inner product:
i) Conjugate symmetry: $\langle f, g \rangle = \overline{\langle g, f \rangle}$
ii) Linearity in the first argument: $\langle \alpha f_1 + \beta f_2, g \rangle = \alpha \langle f_1, g \rangle + \beta \langle f_2, g \rangle$
iii) Positive-definiteness: $\langle f, f \rangle \ge 0$, and $\langle f, f \rangle = 0 \iff f = 0$

2. Norms

A norm $||f||$ on $C[0,1]$ satisfies:
(N1) Positivity: $||f|| \ge 0$, and $||f|| = 0 \iff f = 0$.
(N2) Homogeneity: $||\alpha f|| = |\alpha|\, ||f||$ for any scalar $\alpha$.
(N3) Triangle inequality: $||f + g|| \le ||f|| + ||g||$.

Common norms on $C[0,1]$:

Induced norm (from the inner product): $||f||_2 = \sqrt{\langle f, f \rangle} = \left(\int_0^1 |f(x)|^2 dx\right)^{1/2}$

Supremum norm (or $L_\infty$ norm): $||f||_\infty = \sup_{x \in [0,1]} |f(x)|$
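As a brief illustrative sketch (an addition to these notes, assuming Python with NumPy, which the notes do not specify), the snippet below applies the Gram-Schmidt process to the functions $1, x, x^2$ in $C[0,1]$ using the $L^2$ inner product defined above, with the integral approximated by the trapezoid rule on a uniform grid.

```python
import numpy as np

# Sketch: Gram-Schmidt on {1, x, x^2} in C[0,1] with the L^2 inner product
# <f, g> = \int_0^1 f(x) g(x) dx, approximated by the trapezoid rule.
xs = np.linspace(0.0, 1.0, 10001)
dx = xs[1] - xs[0]

def inner(f, g):
    # trapezoid-rule approximation of the integral of f(x) g(x) over [0, 1]
    y = f * g
    return float(np.sum((y[:-1] + y[1:]) * 0.5) * dx)

def norm2(f):
    return np.sqrt(inner(f, f))

# v_1 = 1, v_2 = x, v_3 = x^2, sampled on the grid
V = [np.ones_like(xs), xs, xs**2]

E = []  # orthonormal functions e_1, e_2, e_3 (sampled on the grid)
for v in V:
    # u_j = v_j - sum_i <v_j, e_i> e_i  (equivalent to projecting onto the u_i)
    u = v.astype(float)
    for e in E:
        u = u - inner(v, e) * e
    E.append(u / norm2(u))

# The Gram matrix <e_i, e_j> should be (numerically) the identity.
G = np.array([[inner(ei, ej) for ej in E] for ei in E])
print(np.round(G, 6))
# Up to sign, the results are the normalized shifted Legendre polynomials on [0,1].
```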
Jordan Canonical Form Explanation

1. Definition and Properties

A square matrix $A \in \mathbb{C}^{n \times n}$ is similar to a block-diagonal matrix $J$, unique up to the order of its blocks, called its Jordan canonical form, such that $A = PJP^{-1}$ for some invertible matrix $P$. The matrix $J$ is a direct sum of Jordan blocks of the form:
$$ J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \dots & 0 \\ 0 & \lambda & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda & 1 \\ 0 & 0 & \dots & 0 & \lambda \end{pmatrix} \in \mathbb{C}^{k \times k} $$
The eigenvalues of $A$ appear on the diagonal of $J$. For each eigenvalue $\lambda$, the number of Jordan blocks equals the dimension of its eigenspace, $\dim \ker(A - \lambda I)$ (the geometric multiplicity), and the sum of the sizes of all Jordan blocks for $\lambda$ equals its algebraic multiplicity. A matrix $A$ is diagonalizable if and only if all its Jordan blocks have size 1.

Adjoint and Transpose of Linear Operators on $\mathbb{R}^\infty$

1. Definitions

Adjoint: Let $T: V \to W$ be a linear operator between real inner product spaces. Its adjoint (when it exists) is the unique linear operator $T^*: W \to V$ such that $\langle Tx, y \rangle = \langle x, T^*y \rangle$ for all $x \in V, y \in W$.

Transpose: Over $\mathbb{R}$, if $T$ is represented by a matrix $[T]$ in the standard bases, the transpose $T^t$ is the operator represented by $[T]^t$. In real inner product spaces with the standard inner product, $T^* = T^t$.

We work on $\mathbb{R}^\infty$, the space of sequences with only finitely many nonzero entries, with the standard inner product $\langle x, y \rangle = \sum_{k=1}^{\infty} x_k y_k$ (a finite sum, since $x, y$ have finite support). Define the shift operators:

$R(x_1, x_2, \dots) = (x_2, x_3, \dots)$ (left shift, dropping the first coordinate)

$S(x_1, x_2, \dots) = (0, x_1, x_2, \dots)$ (right shift, inserting a 0 at the front)

2. Compute the Adjoints $R^*$ and $S^*$

Let $x = (x_k)_{k \ge 1}$ and $y = (y_k)_{k \ge 1}$ in $\mathbb{R}^\infty$.

For $R$: $\langle Rx, y \rangle = \sum_{k=1}^\infty (Rx)_k y_k = \sum_{k=1}^\infty x_{k+1} y_k$. By definition, $\langle x, R^*y \rangle = \sum_{j=1}^\infty x_j (R^*y)_j$. To match, re-index the first sum as $\sum_{j=2}^\infty x_j y_{j-1}$. So, for $j \ge 2$, $(R^*y)_j = y_{j-1}$; for $j = 1$, there is no $x_1$ term on the left, so $(R^*y)_1 = 0$. Thus $R^*y = (0, y_1, y_2, \dots) = S(y)$, and hence $R^* = S$.

For $S$: $\langle Sx, y \rangle = \sum_{k=1}^\infty (Sx)_k y_k = 0 \cdot y_1 + \sum_{k=2}^\infty x_{k-1} y_k = \sum_{j=1}^\infty x_j y_{j+1}$. By definition, $\langle x, S^*y \rangle = \sum_{j=1}^\infty x_j (S^*y)_j$. Comparing, $(S^*y)_j = y_{j+1}$. Thus $S^*y = (y_2, y_3, y_4, \dots) = R(y)$, and hence $S^* = R$.

3. Summary for Shift Operators

$R^* = S$ (the adjoint of the left shift is the right shift)

$S^* = R$ (the adjoint of the right shift is the left shift)

Since we are over $\mathbb{R}$ with the standard inner product, the transpose coincides with the adjoint ($T^* = T^t$): $R^t = S$ and $S^t = R$.

Bessel's Inequality Explanation

1. Statement

Let $H$ be a complex (or real) Hilbert space and $\{e_n\}_{n \in I}$ an orthonormal set in $H$. For any $x \in H$, the coefficients $c_n = \langle x, e_n \rangle$ satisfy Bessel's inequality:
$$ \sum_{n \in I} |\langle x, e_n \rangle|^2 \le ||x||^2 $$
That is, the sum of the squared absolute values of the Fourier coefficients of $x$ with respect to an orthonormal set is bounded by the squared norm of $x$.
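Before the proof sketch, here is a quick numerical illustration (a sketch assuming Python with NumPy, not part of the notes themselves): an orthonormal set that does not span the whole space gives a strict inequality, while a vector inside the span attains equality.

```python
import numpy as np

# Bessel's inequality in R^6: take an orthonormal set {e_1, e_2, e_3} that does
# NOT span the space, and compare sum_n <x, e_n>^2 with ||x||^2.
rng = np.random.default_rng(0)

# Columns of Q are orthonormal (QR factorization of 3 random vectors).
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))
E = [Q[:, j] for j in range(3)]

x = rng.standard_normal(6)
coeff_sum = sum(np.dot(x, e) ** 2 for e in E)   # sum of squared Fourier coefficients
print(coeff_sum, np.dot(x, x))                  # coeff_sum <= ||x||^2

# Equality (Parseval) occurs exactly when x lies in span{e_1, e_2, e_3}:
x_in_span = Q @ rng.standard_normal(3)
print(sum(np.dot(x_in_span, e) ** 2 for e in E), np.dot(x_in_span, x_in_span))
```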
2. Proof Sketch

For any finite subset $F \subset I$, define the orthogonal projection of $x$ onto the span of $\{e_n\}_{n \in F}$: $x_F := \sum_{n \in F} \langle x, e_n \rangle e_n$. By construction, $x - x_F$ is orthogonal to $x_F$, so by the Pythagorean theorem in Hilbert spaces:
$||x||^2 = ||x_F||^2 + ||x - x_F||^2$.
Since $||x - x_F||^2 \ge 0$, it follows that $||x||^2 \ge ||x_F||^2$. Using orthonormality of $\{e_n\}$:
$||x_F||^2 = \langle \sum_{n \in F} \langle x, e_n \rangle e_n, \sum_{m \in F} \langle x, e_m \rangle e_m \rangle = \sum_{n \in F} \sum_{m \in F} \langle x, e_n \rangle \overline{\langle x, e_m \rangle} \langle e_n, e_m \rangle = \sum_{n \in F} \sum_{m \in F} \langle x, e_n \rangle \overline{\langle x, e_m \rangle} \delta_{nm} = \sum_{n \in F} |\langle x, e_n \rangle|^2$.
Combining these, for every finite $F \subset I$:
$\sum_{n \in F} |\langle x, e_n \rangle|^2 \le ||x||^2$.
Taking the supremum over all finite $F$ (equivalently, the limit of the partial sums as $F$ expands to $I$), we obtain Bessel's inequality:
$\sum_{n \in I} |\langle x, e_n \rangle|^2 \le ||x||^2$.

3. Equality and Relation to Parseval's Identity

Equality holds if and only if $x$ lies in the closed linear span of $\{e_n\}_{n \in I}$, i.e., $x = \sum_{n \in I} \langle x, e_n \rangle e_n$; equivalently, the component of $x$ orthogonal to that closed span is zero, so $||x - x_F||^2 \to 0$ as $F$ expands to $I$. If $\{e_n\}_{n \in I}$ is an orthonormal basis for $H$, then every $x \in H$ lies in the closed linear span of $\{e_n\}$, and Bessel's inequality becomes Parseval's identity:
$$ ||x||^2 = \sum_{n \in I} |\langle x, e_n \rangle|^2 $$

4. Consequences and Examples

Bessel's inequality guarantees that the sequence of Fourier coefficients $\{\langle x, e_n \rangle\}_{n \in I}$ lies in the space $l^2(I)$, meaning it is square-summable. In $L^2([-\pi, \pi])$, with $e_n(t) = \frac{1}{\sqrt{2\pi}} e^{int}$ for $n \in \mathbb{Z}$, Bessel's inequality reads:
$$ \sum_{n=-\infty}^{\infty} \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} dt \right|^2 \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t)|^2 dt $$
This asserts the square-summability of the Fourier coefficients of $f \in L^2([-\pi, \pi])$.

Normal, Self-Adjoint, Unitary, and Orthogonal Operators

1. Preliminaries: Adjoint and Inner Product

Let $V$ be a finite-dimensional inner product space over $\mathbb{C}$ (for unitary/normal/self-adjoint) or over $\mathbb{R}$ (for orthogonal/self-adjoint). For a linear operator $T: V \to V$, its adjoint $T^*$ is defined by $\langle Tx, y \rangle = \langle x, T^*y \rangle$ for all $x, y \in V$. In matrix terms (with the standard inner product), $T^*$ is the conjugate transpose: $T^* = \overline{T}^T$.

2. Normal Operators

Definition: A linear operator $T$ is normal if $TT^* = T^*T$.

Example (normal but not self-adjoint): On $\mathbb{C}^2$, let $A = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$. Then $A^* = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}$, so $AA^* = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $A^*A = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Since $AA^* = A^*A$, $A$ is normal; however $A \ne A^*$, so it is not self-adjoint. Its eigenvalues are $1$ and $i$, and its eigenvectors form an orthonormal basis, exemplifying the spectral behavior typical of normal operators.
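A few lines of NumPy (an illustrative sketch added here, not part of the notes) confirm the computation above for $A = \mathrm{diag}(1, i)$:

```python
import numpy as np

# A = diag(1, i) is normal (A A* = A* A) but not self-adjoint (A != A*).
A = np.array([[1, 0],
              [0, 1j]])
A_star = A.conj().T  # conjugate transpose

print(np.allclose(A @ A_star, A_star @ A))  # True  -> normal
print(np.allclose(A, A_star))               # False -> not self-adjoint

# Its eigenvectors (the standard basis vectors) are orthonormal, as the
# spectral theorem for normal operators predicts.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                                             # [1.+0.j, 0.+1.j]
print(np.allclose(eigvecs.conj().T @ eigvecs, np.eye(2)))  # True
```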
3. Self-Adjoint Operators

Definition: A linear operator $T$ is self-adjoint if $T = T^*$.

Example (real symmetric matrix on $\mathbb{R}^2$): Let $B = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$. Then $B^* = \overline{B}^T = B^T = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$, so $B = B^*$ and $B$ is self-adjoint. All eigenvalues are real, and eigenvectors corresponding to distinct eigenvalues are orthogonal, illustrating the standard spectral properties.

4. Unitary Operators

Definition: A linear operator $U$ is unitary if $U^*U = UU^* = I$. Equivalently, $\langle Ux, Uy \rangle = \langle x, y \rangle$ for all $x, y$; norms and angles are preserved on complex inner product spaces.

Example (rotation on $\mathbb{C}^2$, also orthogonal over $\mathbb{R}^2$): Let $R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. Then $R(\theta)^* = \overline{R(\theta)}^T = R(\theta)^T = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$, and
$R(\theta)^*R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta\sin\theta + \sin\theta\cos\theta \\ -\sin\theta\cos\theta + \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I.$
So $R(\theta)$ is unitary (over $\mathbb{C}$) and orthogonal (over $\mathbb{R}$). Its eigenvalues over $\mathbb{C}$ are $e^{i\theta}$ and $e^{-i\theta}$, both on the unit circle.

5. Orthogonal Operators

Definition: A linear operator $Q$ on a real inner product space is orthogonal if $Q^TQ = QQ^T = I$. Equivalently, $\langle Qx, Qy \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{R}^n$.

Examples: The rotation $R(\theta)$ above satisfies $R(\theta)^TR(\theta) = I$. Reflection across the x-axis: let $S = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$; then $S^TS = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$. Orthogonal operators preserve lengths and angles; their determinants are either $+1$ (proper rotations) or $-1$ (reflections and improper rotations).

6. Relationships and Contrasts

Every unitary operator is normal, but not every normal operator is self-adjoint (see the diagonal example with eigenvalue $i$). Over $\mathbb{R}$, orthogonal operators are precisely the unitary operators with real entries. Self-adjoint operators are normal, with real spectra and orthogonal eigenspaces.

Singular Value Decomposition (SVD)

1. Definition

For any real matrix $A \in \mathbb{R}^{m \times n}$, a singular value decomposition (SVD) is:
$A = U \Sigma V^T$
where:

$U \in \mathbb{R}^{m \times m}$ is orthogonal ($U^T U = I_m$); its columns are called left singular vectors.

$V \in \mathbb{R}^{n \times n}$ is orthogonal ($V^T V = I_n$); its columns are called right singular vectors.

$\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal (rectangular) matrix with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$ on the diagonal (the singular values), where $r = \text{rank}(A)$; the remaining diagonal entries are zero.

For complex matrices, replace $V^T$ by $V^*$ (conjugate transpose) and "orthogonal" by "unitary": $A = U \Sigma V^*$.
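A short NumPy sketch of this definition (NumPy is an assumption here; `np.linalg.svd` returns the singular values as a vector and $V^T$ directly, so the rectangular $\Sigma$ has to be assembled by hand):

```python
import numpy as np

# Verify A = U Sigma V^T and the orthogonality of U and V for a random 4x3 matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)          # full SVD: U is 4x4, Vh = V^T is 3x3
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)           # singular values on the diagonal, descending

print(np.allclose(A, U @ Sigma @ Vh))           # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))          # U orthogonal
print(np.allclose(Vh @ Vh.T, np.eye(3)))        # V orthogonal
print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))  # sigma_1 >= ... >= sigma_r >= 0
```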
2. Existence, Structure, and Uniqueness

Existence: Every matrix has an SVD.
Shapes: $U$ is $m \times m$, $V$ is $n \times n$, $\Sigma$ is $m \times n$.
Nonzero diagonal entries of $\Sigma$: $\sigma_1, \dots, \sigma_r$, where $r = \text{rank}(A)$.
Uniqueness: The singular values are unique. The singular vectors (columns of $U$, $V$) are unique up to signs (real case) or phases (complex case) within the subspaces corresponding to repeated singular values.

3. Spectral Connections

The singular values satisfy $\sigma_i = \sqrt{\lambda_i(A^T A)}$ (real case) or $\sigma_i = \sqrt{\lambda_i(A^* A)}$ (complex case); that is, they are the square roots of the eigenvalues of $A^T A$ (or $A^* A$). The right singular vectors (columns of $V$) are eigenvectors of $A^T A$, and the left singular vectors (columns of $U$) are eigenvectors of $A A^T$.

4. Geometric Interpretation

The SVD provides a geometric decomposition of the linear transformation $x \mapsto Ax$:
$V^T$ (or $V^*$) rotates/reflects the input coordinates to the principal axes.
$\Sigma$ scales by $\sigma_1, \dots, \sigma_r$ along those axes (with zero scaling in the null directions).
$U$ rotates/reflects the result to the output coordinates.
Thus $A$ maps the unit sphere to an ellipsoid with principal semi-axes of length $\sigma_i$ along the columns of $U$.

5. Consequences and Standard Formulas

Rank: $\text{rank}(A) = \#\{i : \sigma_i > 0\}$.
Norms: $||A||_2 = \sigma_1$ (spectral norm) and $||A||_F = \sqrt{\sum \sigma_i^2}$ (Frobenius norm).
Condition number (for full column rank, in particular square nonsingular, $A$): $\kappa_2(A) = \sigma_1 / \sigma_n$.
Pseudoinverse (Moore-Penrose): $A^+ = V \Sigma^+ U^T$, where $\Sigma^+$ is obtained by taking the reciprocals of the nonzero singular values and transposing $\Sigma$. For example, if $\Sigma = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{pmatrix}$, then $\Sigma^+ = \begin{pmatrix} 1/\sigma_1 & 0 & 0 \\ 0 & 1/\sigma_2 & 0 \end{pmatrix}$.
Best low-rank approximation (Eckart-Young theorem): $A_k = U \Sigma_k V^T$, where $\Sigma_k = \text{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0)$. This $A_k$ minimizes $||A - X||_F$ (and $||A - X||_2$) over all matrices $X$ of rank at most $k$.
Square nonsingular $A$: $\det(A) = \det(U) \det(V) \prod_{i=1}^n \sigma_i$ (so $|\det(A)| = \prod_{i=1}^n \sigma_i$) and $||A^{-1}||_2 = 1/\sigma_n$.

6. Computation (High Level)

One approach is to compute $A^T A$ and its eigendecomposition to obtain $V$ and the $\sigma_i$, then recover the columns of $U$ corresponding to nonzero singular values via $u_i = A v_i / \sigma_i$. More numerically stable algorithms (e.g., Golub-Kahan bidiagonalization followed by QR iterations) are implemented in LAPACK (built on BLAS).

7. Summary

The SVD expresses any matrix as orthogonal/unitary rotations sandwiching a non-negative diagonal scaling. It reveals intrinsic scales and directions, enabling robust analysis and algorithms for least squares, compression, denoising, and numerical stability.
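As a closing numerical sketch (again assuming NumPy), the pseudoinverse and Eckart-Young formulas above can be checked directly from a thin SVD; the tolerance used to decide which singular values count as nonzero is a choice made here purely for illustration.

```python
import numpy as np

# Check A^+ = V Sigma^+ U^T and the rank-k truncation A_k = U Sigma_k V^T.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
U, s, Vh = np.linalg.svd(A, full_matrices=False)  # thin SVD: U is 5x3, Vh is 3x3

# Pseudoinverse: reciprocals of the nonzero singular values, transposed shape.
tol = max(A.shape) * np.finfo(float).eps * s[0]   # illustrative cutoff
s_inv = np.where(s > tol, 1.0 / s, 0.0)
A_pinv = Vh.T @ np.diag(s_inv) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))     # matches numpy's pinv

# Best rank-1 approximation: keep only sigma_1.
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
# Eckart-Young: the spectral-norm error equals the first discarded singular value.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))
```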