1. Vectors (벡터)
Definition: An ordered list of numbers, represented as a column or a row.
- Column Vector: $\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$
- Row Vector: $\mathbf{v}^T = \begin{pmatrix} v_1 & v_2 & \dots & v_n \end{pmatrix}$
Geometric Interpretation: A point in space, or a direction and magnitude from the origin.
Operations:
- Addition: $\mathbf{u} + \mathbf{v} = (u_1+v_1, \dots, u_n+v_n)$ (element-wise)
- Scalar Multiplication: $c\mathbf{v} = (cv_1, \dots, cv_n)$
- Dot Product (Inner Product): $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \sum_{i=1}^n u_i v_i = ||\mathbf{u}|| \cdot ||\mathbf{v}|| \cos(\theta)$. Measures similarity; if $\mathbf{u} \cdot \mathbf{v} = 0$, the vectors are orthogonal.
- Norm (Length): $||\mathbf{v}||_2 = \sqrt{\sum_{i=1}^n v_i^2}$ (Euclidean norm)
- Unit Vector: $\hat{\mathbf{v}} = \frac{\mathbf{v}}{||\mathbf{v}||}$ (a vector with length 1)
ML Relevance: Feature vectors, weights in neural networks, data points in space.

2. Matrices (행렬)
Definition: A rectangular array of numbers. An $m \times n$ matrix has $m$ rows and $n$ columns.
$\mathbf{A} = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}$
Special Matrices:
- Square Matrix: $m = n$.
- Identity Matrix ($\mathbf{I}$): Square matrix with 1s on the main diagonal and 0s elsewhere. $\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}$.
- Diagonal Matrix: Non-zero elements only on the main diagonal.
- Symmetric Matrix: $\mathbf{A} = \mathbf{A}^T$.
- Zero Matrix ($\mathbf{0}$): All elements are zero.
Operations:
- Addition: Element-wise, for matrices of the same dimensions. $\mathbf{A} + \mathbf{B} = (a_{ij} + b_{ij})$.
- Scalar Multiplication: $c\mathbf{A} = (ca_{ij})$.
- Matrix Multiplication: $\mathbf{C} = \mathbf{A}\mathbf{B}$ where $c_{ij} = \sum_{k=1}^p a_{ik}b_{kj}$. If $\mathbf{A}$ is $m \times p$ and $\mathbf{B}$ is $p \times n$, then $\mathbf{C}$ is $m \times n$. Not commutative in general: $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$.
- Transpose: $\mathbf{A}^T$ swaps rows and columns: $(\mathbf{A}^T)_{ij} = a_{ji}$. $(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$.
- Inverse: $\mathbf{A}^{-1}$ such that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. Exists only for square, non-singular matrices (determinant $\neq 0$). $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.
ML Relevance: Representing datasets (rows = samples, columns = features), transformations (e.g., rotations, scaling), covariance matrices, weight matrices in neural networks.

3. Systems of Linear Equations (선형 방정식 시스템)
Form: $\mathbf{A}\mathbf{x} = \mathbf{b}$
- $\mathbf{A}$: coefficient matrix
- $\mathbf{x}$: vector of unknowns
- $\mathbf{b}$: constant vector
Solution: If $\mathbf{A}$ is invertible, $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$.
Geometric Interpretation: Intersection of hyperplanes.
ML Relevance: Solving for parameters in linear regression, optimization problems.

4. Determinant (행렬식)
Definition: A scalar value calculated from the elements of a square matrix.
- For $2 \times 2$: $\det(\mathbf{A}) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$.
- For $3 \times 3$ with $\mathbf{A} = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$: $\det(\mathbf{A}) = a(ei-fh) - b(di-fg) + c(dh-eg)$.
Properties:
- If $\det(\mathbf{A}) = 0$, $\mathbf{A}$ is singular (not invertible).
- $|\det(\mathbf{A})|$ is the factor by which the transformation $\mathbf{A}$ scales area/volume.
- $\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B})$.
- $\det(\mathbf{A}^T) = \det(\mathbf{A})$.
ML Relevance: Checking matrix invertibility, understanding data variance, Principal Component Analysis (PCA).
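The operations in sections 1–4 map directly onto NumPy. A minimal sketch, assuming NumPy is available; the matrices $\mathbf{A}$, $\mathbf{B}$ and the vector $\mathbf{b}$ are arbitrary illustrative values, not taken from any particular dataset:

```python
import numpy as np

# Vectors: dot product, norm, cosine of the angle between them
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
dot = u @ v                                           # u . v = sum(u_i * v_i)
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))

# Matrices: multiplication, transpose identity (AB)^T = B^T A^T
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.allclose((A @ B).T, B.T @ A.T))              # True

# Determinant and inverse: det != 0, so A is non-singular
det_A = np.linalg.det(A)                              # 2*3 - 1*1 = 5
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # A A^{-1} = I

# Solving Ax = b
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))                          # True
```

Note that `np.linalg.solve` factorizes $\mathbf{A}$ rather than forming $\mathbf{A}^{-1}$ explicitly, which is generally faster and more numerically stable than `np.linalg.inv(A) @ b`.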
5. Eigenvalues & Eigenvectors (고유값 & 고유벡터)
Definition: For a square matrix $\mathbf{A}$, an eigenvector $\mathbf{v}$ is a non-zero vector that, when $\mathbf{A}$ is applied to it, only changes by a scalar factor $\lambda$ (the eigenvalue): $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$
Finding them: $(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$. For non-trivial solutions, $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$ (the characteristic equation).
Properties:
- Eigenvectors corresponding to distinct eigenvalues are linearly independent.
- Symmetric matrices have real eigenvalues and orthogonal eigenvectors.
ML Relevance: PCA (Principal Component Analysis) for dimensionality reduction, understanding variance directions in data, spectral clustering, Markov chains.

6. Vector Spaces & Subspaces (벡터 공간 & 부분 공간)
- Vector Space: A set of vectors closed under vector addition and scalar multiplication.
- Subspace: A subset of a vector space that is itself a vector space.
- Span: The set of all possible linear combinations of a set of vectors.
- Linear Independence: A set of vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ is linearly independent if $c_1\mathbf{v}_1 + \dots + c_k\mathbf{v}_k = \mathbf{0}$ implies $c_1 = \dots = c_k = 0$.
- Basis: A set of linearly independent vectors that spans the entire vector space. The number of vectors in a basis is the dimension of the space.
- Rank of a Matrix: The dimension of the column space (or row space), i.e., the maximum number of linearly independent columns (or rows). For an $m \times n$ matrix $\mathbf{A}$, $\text{rank}(\mathbf{A}) \le \min(m, n)$. If $\text{rank}(\mathbf{A}) = n$, the columns are linearly independent.
- Null Space (Kernel): The set of all vectors $\mathbf{x}$ such that $\mathbf{A}\mathbf{x} = \mathbf{0}$.
ML Relevance: Basis vectors can serve as components (PCA), understanding redundancy in data features, feature engineering.

7. Matrix Decompositions (행렬 분해)
Definition: Breaking down a matrix into a product of simpler matrices. See the numerical sketch after this section.
Eigen-decomposition: $\mathbf{A} = \mathbf{P}\mathbf{\Lambda}\mathbf{P}^{-1}$
- $\mathbf{P}$: matrix whose columns are eigenvectors of $\mathbf{A}$.
- $\mathbf{\Lambda}$: diagonal matrix with the eigenvalues on the diagonal.
- Requires $\mathbf{A}$ to be square and diagonalizable. If $\mathbf{A}$ is symmetric, $\mathbf{P}$ is orthogonal ($\mathbf{P}^{-1} = \mathbf{P}^T$).
Singular Value Decomposition (SVD): $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$
- Applies to any $m \times n$ matrix.
- $\mathbf{U}$: $m \times m$ orthogonal matrix (left singular vectors).
- $\mathbf{\Sigma}$: $m \times n$ diagonal matrix with the singular values (square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$) on the diagonal.
- $\mathbf{V}$: $n \times n$ orthogonal matrix (right singular vectors).
ML Relevance:
- Eigen-decomposition: PCA (for covariance matrices).
- SVD: Dimensionality reduction (PCA alternative), recommender systems, latent semantic analysis (LSA), noise reduction, pseudo-inverse calculation. Highly robust and widely used.
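A minimal NumPy sketch of sections 5 and 7: eigen-decomposition of a symmetric matrix and SVD of a rectangular matrix. The matrices are small illustrative examples chosen only to make the identities easy to verify; `np.linalg.eigh` is used because the example matrix is symmetric.

```python
import numpy as np

# Eigen-decomposition of a symmetric matrix: A = P Lambda P^T
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
eigvals, P = np.linalg.eigh(A)                # eigh: for symmetric/Hermitian input
Lam = np.diag(eigvals)
print(np.allclose(A, P @ Lam @ P.T))          # True: P is orthogonal here

# Check A v = lambda v for one eigenpair
v, lam = P[:, 0], eigvals[0]
print(np.allclose(A @ v, lam * v))            # True

# SVD of a general (non-square) matrix: M = U Sigma V^T
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(np.allclose(M, U @ np.diag(s) @ Vt))    # True

# Rank = number of non-zero singular values (up to tolerance)
print(np.linalg.matrix_rank(M))               # 2
```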
8. Norms (노름)
Definition: A function that assigns a "length" or "size" to a vector or matrix.
Vector Norms:
- $L_p$-norm: $||\mathbf{x}||_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$
- $L_1$-norm (Manhattan/Taxicab): $||\mathbf{x}||_1 = \sum_{i=1}^n |x_i|$ (feature selection in Lasso regression)
- $L_2$-norm (Euclidean): $||\mathbf{x}||_2 = \sqrt{\sum_{i=1}^n x_i^2}$ (most common; regularization in Ridge regression)
- $L_\infty$-norm (Max norm): $||\mathbf{x}||_\infty = \max_i |x_i|$
Matrix Norms:
- Frobenius Norm: $||\mathbf{A}||_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2}$
- Spectral Norm: $||\mathbf{A}||_2 = \sigma_{\max}$ (the largest singular value)
ML Relevance: Regularization ($L_1$, $L_2$ for preventing overfitting), distance metrics (Euclidean distance), error calculation (RMSE, MAE).

9. Gradient & Derivatives (그라디언트 & 도함수)
Scalar-on-Vector Derivative: $\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}$ (a column vector)
Gradient: $\nabla f(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$. Points in the direction of steepest ascent of $f$.
Jacobian Matrix: For a vector-valued function $\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}$, the Jacobian is an $m \times n$ matrix:
$\mathbf{J} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}$
Hessian Matrix: For a scalar function $f(\mathbf{x})$, the Hessian is a symmetric $n \times n$ matrix of second-order partial derivatives: $\mathbf{H}_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$
ML Relevance:
- Gradient Descent: Optimizing loss functions by iteratively moving in the direction opposite to the gradient (see the sketch below).
- Newton's Method: Uses the Hessian for faster convergence in optimization.
- Understanding convexity/concavity of functions.
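A minimal NumPy sketch of the norms above and of gradient descent on the convex quadratic $f(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T\mathbf{Q}\mathbf{w} - \mathbf{b}^T\mathbf{w}$, whose gradient is $\mathbf{Q}\mathbf{w} - \mathbf{b}$ and whose Hessian is $\mathbf{Q}$. The values of $\mathbf{Q}$, $\mathbf{b}$, the step size, and the iteration count are illustrative assumptions, not tuned settings:

```python
import numpy as np

# Vector norms
x = np.array([3.0, -4.0])
print(np.linalg.norm(x, 1))         # L1 norm: |3| + |-4| = 7
print(np.linalg.norm(x, 2))         # L2 norm: sqrt(9 + 16) = 5
print(np.linalg.norm(x, np.inf))    # L-infinity norm: 4

# Matrix norms
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.norm(A, 'fro'))     # Frobenius norm
print(np.linalg.norm(A, 2))         # spectral norm = largest singular value

# Gradient descent on f(w) = 0.5 * w^T Q w - b^T w  (Q symmetric positive definite)
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
w = np.zeros(2)
lr = 0.1                            # step size (assumed, not tuned)
for _ in range(200):
    grad = Q @ w - b                # gradient of f at w
    w -= lr * grad                  # move opposite to the gradient
# The minimizer satisfies Q w = b, so compare against the direct solve
print(np.allclose(w, np.linalg.solve(Q, b), atol=1e-4))
```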