1. Introduction to Convergence in Statistics

In statistics and probability, we often deal with sequences of random variables, estimators, or distributions. Understanding how these sequences behave as the sample size grows (i.e., as $n \to \infty$) is crucial for establishing properties like consistency and asymptotic normality. Throughout, let $X_1, X_2, \dots, X_n, \dots$ be a sequence of random variables and let $X$ be a random variable, all defined on a common probability space $(\Omega, \mathcal{F}, P)$.

2. Types of Convergence for Random Variables

2.1. Pointwise Convergence (for functions of random variables)

Definition: A sequence of functions $\{f_n(x)\}$ converges pointwise to $f(x)$ if $f_n(x) \to f(x)$ for each $x$. In the context of random variables, this is less about $X_n \to X$ directly and more about $f_n(X) \to f(X)$ in a non-stochastic sense for each realization of $X$.

Remark: While a fundamental concept in analysis, for sequences of random variables, pointwise convergence on the probability space $\Omega$ (i.e., $X_n(\omega) \to X(\omega)$ for every $\omega \in \Omega$) implies almost sure convergence; almost sure convergence is weaker only in that it allows the convergence to fail on a set of probability zero.

2.2. Almost Sure Convergence (a.s., or Strong Convergence)

Definition: $X_n \xrightarrow{a.s.} X$ if $P(\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\}) = 1$. This means the sequence $X_n(\omega)$ converges to $X(\omega)$ for almost all outcomes $\omega$ in the sample space.

Remark: This is one of the strongest modes of convergence for random variables. It implies convergence in probability (and hence convergence in distribution), although it neither implies nor is implied by $L^p$ convergence.

Strong Law of Large Numbers (SLLN): If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$, then $\bar{X}_n \xrightarrow{a.s.} \mu$. (Requires only $E[|X_i|] < \infty$.)

2.3. Convergence in Probability (or Stochastic Convergence)

Definition: $X_n \xrightarrow{P} X$ if for every $\epsilon > 0$, $\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0$. This means that the probability of $X_n$ being "far" from $X$ goes to zero as $n$ increases.

Remark: This is the most common mode of convergence used for estimators (consistency).
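Both the SLLN and convergence in probability above can be seen numerically. The following is a minimal simulation sketch (NumPy assumed; the distribution, seed, and variable names are illustrative choices): the running sample mean of i.i.d. Exponential(1) draws settles near $\mu = 1$, and the estimated probability $P(|\bar{X}_n - \mu| \ge 0.1)$ shrinks toward zero as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# One long i.i.d. Exponential(1) sequence (mu = 1); the running sample
# mean should settle near mu as n grows (SLLN / consistency).
x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])

# Convergence in probability of the sample mean: the estimated
# P(|X_bar_n - mu| >= 0.1) shrinks toward 0 as n grows.
for n in (10, 100, 10_000):
    means = rng.exponential(1.0, size=(1_000, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - 1.0) >= 0.1))
```

Note that a single simulated path can only suggest, not prove, almost sure convergence; the second loop estimates the defining probability of convergence in probability directly across replications.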
If $X$ is a constant $c$, then $X_n \xrightarrow{P} c$ means $X_n$ is a consistent estimator of $c$.

Weak Law of Large Numbers (WLLN): If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $Var(X_i) = \sigma^2 < \infty$, then $\bar{X}_n \xrightarrow{P} \mu$.

2.4. Convergence in Distribution (or Weak Convergence)

Definition: $X_n \xrightarrow{D} X$ if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at every $x$ where $F_X$ is continuous. ($F_Y(y)$ denotes the cumulative distribution function of $Y$.) This means that the CDFs of $X_n$ approach the CDF of $X$.

Remark: This mode of convergence is fundamental for asymptotic normality and for constructing hypothesis tests and confidence intervals. $X$ does not need to be a constant.

Central Limit Theorem (CLT): If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $Var(X_i) = \sigma^2 < \infty$, then $\sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{D} N(0, 1)$.

2.5. Convergence of Characteristic Functions

Definition: $X_n \xrightarrow{D} X$ if and only if $\lim_{n \to \infty} \phi_{X_n}(t) = \phi_X(t)$ for all $t \in \mathbb{R}$, where $\phi_Y(t) = E[e^{itY}]$ is the characteristic function of $Y$.

Remark: This is an equivalent condition for convergence in distribution (Lévy's continuity theorem). It is often easier to prove convergence of characteristic functions than to prove convergence of CDFs directly, especially for sums of independent random variables, which makes this a powerful tool for proving limit theorems such as the CLT.

2.6. Convergence in $L^p$ (Mean Convergence)

Definition: $X_n \xrightarrow{L^p} X$ if $\lim_{n \to \infty} E[|X_n - X|^p] = 0$ for some $p \ge 1$. For $p = 1$, this is convergence in mean: $E[|X_n - X|] \to 0$. For $p = 2$, it is convergence in mean square: $E[(X_n - X)^2] \to 0$.

Remark: Convergence in $L^2$ is important in linear regression and signal processing. It implies convergence in probability.

2.7. Convergence of Expectations

Definition: $\lim_{n \to \infty} E[X_n] = E[X]$.

Remark: This is not a mode of convergence for the random variables themselves, but for the sequence of their expected values. It needs to be carefully distinguished from $L^1$ convergence.
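The CLT's statement about CDFs can be checked directly by simulation. The sketch below (NumPy assumed; the Bernoulli choice, seed, and sample sizes are illustrative) standardizes the sample mean of Bernoulli(0.5) draws and compares the empirical CDF of $z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ with the standard normal CDF $\Phi$ at a few points.

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.5                                # Bernoulli(p): mu = p, sigma^2 = p(1-p)
mu, sigma = p, np.sqrt(p * (1 - p))

n, reps = 1_000, 10_000
xbar = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma   # approximately N(0, 1) by the CLT

# Empirical CDF of z vs. the standard normal CDF Phi:
# Phi(0) = 0.5, Phi(1) ~ 0.8413, Phi(2) ~ 0.9772.
for x, phi in [(0.0, 0.5), (1.0, 0.8413), (2.0, 0.9772)]:
    print(x, np.mean(z <= x), phi)
```

This illustrates why convergence in distribution only requires agreement of CDFs at continuity points of the limit: each individual $\bar{X}_n$ here is discrete, yet its standardized CDF approaches the continuous normal CDF.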
$L^1$ convergence ($E[|X_n - X|] \to 0$) implies convergence of expectations, but the converse is not true. Convergence of expectations also follows from some stronger modes of convergence under additional conditions (e.g., uniform integrability, or the dominated convergence theorem).

3. Inter-relations Between Convergences

The hierarchy of strength (strongest to weakest), where $X$ is a random variable, not necessarily a constant:

$X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X$
$X_n \xrightarrow{L^p} X \implies X_n \xrightarrow{P} X$ (for $p \ge 1$)
$X_n \xrightarrow{D} X \iff \phi_{X_n}(t) \to \phi_X(t)$ for all $t$

More detailed implications:

Almost Sure Convergence $\implies$ Convergence in Probability: Always true. The converse is not generally true.
Convergence in Probability $\implies$ Convergence in Distribution: Always true. The converse is not generally true. Exception: if $X$ is a constant $c$, then $X_n \xrightarrow{D} c \iff X_n \xrightarrow{P} c$.
$L^p$ Convergence $\implies$ Convergence in Probability: Always true for $p \ge 1$, by Markov's inequality: $P(|X_n - X| \ge \epsilon) \le E[|X_n - X|^p]/\epsilon^p$. The converse is not generally true.
$L^1$ Convergence $\implies$ Convergence of Expectations: If $X_n \xrightarrow{L^1} X$, then $\lim_{n \to \infty} E[X_n] = E[X]$. This is because $E[X_n] - E[X] = E[X_n - X]$ and $|E[X_n - X]| \le E[|X_n - X|]$.
Convergence in Probability $\not\implies$ $L^p$ Convergence: Generally false.
Convergence in Probability $\not\implies$ a.s. Convergence: Generally false.
Convergence in Distribution $\not\implies$ Convergence in Probability: Generally false (unless the limit is a constant).
Convergence of Expectations $\not\implies$ any other convergence: Knowing only that $E[X_n] \to E[X]$ does not imply any of the other modes of convergence for $X_n$.
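The gap between convergence in probability and $L^p$ convergence can be made concrete with the classic example $X_n = n \cdot \mathbf{1}(U \le 1/n)$ for $U \sim \mathrm{Uniform}(0,1)$: here $P(|X_n| \ge \epsilon) = 1/n \to 0$, so $X_n \xrightarrow{P} 0$, yet $E[|X_n - 0|] = n \cdot (1/n) = 1$ for every $n$, so $X_n$ does not converge to $0$ in $L^1$ (and $E[X_n] \not\to 0$). A small simulation sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
reps = 200_000
u = rng.uniform(size=reps)

for n in (10, 1_000, 100_000):
    xn = n * (u <= 1.0 / n)          # X_n = n * 1(U <= 1/n)
    # P(X_n != 0) -> 0 (convergence in probability to 0),
    # but E[X_n] stays at 1, so there is no L^1 convergence.
    print(n, np.mean(xn != 0), xn.mean())
```

The rare large spike of height $n$ is exactly what convergence in probability ignores and $L^1$ convergence penalizes.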
Summary of implications, by mode of convergence:

Almost Sure ($X_n \xrightarrow{a.s.} X$): implies convergence in probability; implied by pointwise convergence on $\Omega$.
$L^p$ ($X_n \xrightarrow{L^p} X$): implies convergence in probability and of expectations (for $p \ge 1$); implied by none of the others directly.
Probability ($X_n \xrightarrow{P} X$): implies convergence in distribution; implied by a.s. and $L^p$ convergence.
Distribution ($X_n \xrightarrow{D} X$): implies none of the others directly (weakest); implied by a.s., $L^p$, and in-probability convergence; equivalent to convergence of characteristic functions.
Characteristic Functions ($\phi_{X_n} \to \phi_X$): equivalent to convergence in distribution.
Expectations ($E[X_n] \to E[X]$): implies none of the others; implied by $L^1$ convergence (and by stronger modes under conditions such as uniform integrability).

Important Note: If $X_n \xrightarrow{P} c$ (where $c$ is a constant), then $X_n \xrightarrow{D} c$. The converse is also true: if $X_n \xrightarrow{D} c$, then $X_n \xrightarrow{P} c$.

4. Important Theorems and Rules

4.1. Slutsky's Theorem

If $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c$ (where $c$ is a constant), then:

$X_n + Y_n \xrightarrow{D} X + c$
$X_n Y_n \xrightarrow{D} cX$
$X_n / Y_n \xrightarrow{D} X / c$ (if $c \neq 0$)

Remark: Very useful for combining a convergent sequence (e.g., from the CLT) with a consistent sequence (e.g., the sample variance as an estimator of the population variance).

4.2. Continuous Mapping Theorem (CMT)

If $X_n \xrightarrow{D} X$ and $g$ is a continuous function, then $g(X_n) \xrightarrow{D} g(X)$.
If $X_n \xrightarrow{P} X$ and $g$ is a continuous function, then $g(X_n) \xrightarrow{P} g(X)$.
If $X_n \xrightarrow{a.s.} X$ and $g$ is a continuous function, then $g(X_n) \xrightarrow{a.s.} g(X)$.

Remark: Allows us to transform convergent sequences. For example, if $\hat{\theta}_n \xrightarrow{P} \theta$, then $g(\hat{\theta}_n) \xrightarrow{P} g(\theta)$ for continuous $g$.

4.3. Delta Method

If $\sqrt{n}(T_n - \theta) \xrightarrow{D} N(0, \sigma^2)$ and $g$ is differentiable at $\theta$ with $g'(\theta) \neq 0$, then $\sqrt{n}(g(T_n) - g(\theta)) \xrightarrow{D} N(0, [g'(\theta)]^2 \sigma^2)$.

Remark: Used to find the asymptotic distribution of a function of an asymptotically normal estimator.

4.4. Dominated Convergence Theorem (DCT)

If $X_n \xrightarrow{a.s.} X$ and there exists an integrable random variable $Y$ such that $|X_n| \le Y$ for all $n$, then $\lim_{n \to \infty} E[X_n] = E[X]$ and $\lim_{n \to \infty} E[|X_n - X|] = 0$ (i.e., $X_n \xrightarrow{L^1} X$).

Remark: Provides conditions under which almost sure convergence implies convergence of expectations (and $L^1$ convergence).
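The delta method of Section 4.3 can be checked by simulation. The sketch below (NumPy assumed; the Exponential model, $g(t) = \log t$, and seed are illustrative) takes $T_n = \bar{X}_n$ for i.i.d. Exponential draws with mean $\theta$, so the CLT gives $\sqrt{n}(\bar{X}_n - \theta) \xrightarrow{D} N(0, \theta^2)$; the delta method with $g'( \theta) = 1/\theta$ then predicts $\sqrt{n}(\log \bar{X}_n - \log \theta) \xrightarrow{D} N(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                    # Exponential mean: E[X] = theta, Var(X) = theta^2
n, reps = 1_000, 10_000

xbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)

# CLT: sqrt(n) * (xbar - theta) -> N(0, theta^2).
# Delta method with g(t) = log(t), g'(theta) = 1/theta:
# sqrt(n) * (log(xbar) - log(theta)) -> N(0, (1/theta)^2 * theta^2) = N(0, 1).
z = np.sqrt(n) * (np.log(xbar) - np.log(theta))
print(z.mean(), z.std())       # should be close to 0 and 1
```

The variance-stabilizing effect is visible here: after the log transform, the limiting variance no longer depends on the unknown $\theta$, which is one common reason to apply the delta method in practice.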