### Completeness & Sufficiency

#### Sufficiency

A statistic $T(\mathbf{X})$ is **sufficient** for a parameter $\theta$ if the conditional distribution of $\mathbf{X}$ given $T(\mathbf{X})=t$ does not depend on $\theta$.

- **Factorization Theorem:** $T(\mathbf{X})$ is sufficient iff the joint PDF/PMF $f(\mathbf{x}|\theta)$ can be factored as $f(\mathbf{x}|\theta) = g(T(\mathbf{x})|\theta)h(\mathbf{x})$ for some functions $g$ and $h$.

#### Completeness

A statistic $T(\mathbf{X})$ is **complete** for a family of distributions $\mathcal{P} = \{P_\theta : \theta \in \Omega\}$ if, for every function $g$, $E_\theta[g(T)] = 0$ for all $\theta \in \Omega$ implies $P_\theta(g(T)=0) = 1$ for all $\theta \in \Omega$.

- **Minimal Sufficient Statistic:** A sufficient statistic $T(\mathbf{X})$ is minimal sufficient if it is a function of every other sufficient statistic.

### Rao-Blackwell Theorem

Let $\delta(\mathbf{X})$ be an unbiased estimator of $\tau(\theta)$, and let $T(\mathbf{X})$ be a sufficient statistic for $\theta$. Define a new estimator $\delta^*(\mathbf{X}) = E[\delta(\mathbf{X})|T(\mathbf{X})]$. Then:

1. $\delta^*(\mathbf{X})$ is also an unbiased estimator of $\tau(\theta)$.
2. $Var(\delta^*(\mathbf{X})) \le Var(\delta(\mathbf{X}))$.
3. If $T(\mathbf{X})$ is also complete, then $\delta^*(\mathbf{X})$ is the unique UMVUE (Uniformly Minimum Variance Unbiased Estimator); this is the content of the Lehmann-Scheffé theorem below.

The Rao-Blackwell theorem thus provides a method for improving an unbiased estimator by conditioning on a sufficient statistic (a simulated example appears after the Cramér-Rao inequality below).

### Lehmann-Scheffé Theorem

If $T(\mathbf{X})$ is a **complete and sufficient statistic** for $\theta$, and $\delta(T)$ is any unbiased estimator of $\tau(\theta)$ (i.e., $E_\theta[\delta(T)] = \tau(\theta)$), then $\delta(T)$ is the unique UMVUE for $\tau(\theta)$.

This theorem is a powerful tool for finding UMVUEs: first find a complete and sufficient statistic, then find any unbiased estimator that is a function of it.

### One-Parameter Exponential Family

A family of distributions $\{f(x|\theta) : \theta \in \Omega\}$ belongs to the one-parameter exponential family if its PDF/PMF can be written as:
$$f(x|\theta) = h(x) c(\theta) \exp(w(\theta) t(x))$$
where $h(x) \ge 0$, $c(\theta) > 0$, and $w(\theta)$ and $t(x)$ are real-valued functions.

#### Completeness of Exponential Family

For a one-parameter exponential family, if the natural parameter space $\{w(\theta) : \theta \in \Omega\}$ contains an open interval in $\mathbb{R}$, then $T(\mathbf{X}) = \sum_{i=1}^n t(X_i)$ is a complete and sufficient statistic for $\theta$ (a Poisson example is worked below).

### Cramér-Rao Inequality

Let $\mathbf{X} = (X_1, ..., X_n)$ be a random sample from $f(x|\theta)$, and let $\delta(\mathbf{X})$ be any unbiased estimator of $\tau(\theta)$. Under certain regularity conditions, the variance of $\delta(\mathbf{X})$ is bounded below by:
$$Var(\delta(\mathbf{X})) \ge \frac{[\tau'(\theta)]^2}{I_n(\theta)}$$
where $I_n(\theta)$ is the Fisher Information for $n$ observations:
$$I_n(\theta) = E_\theta \left[ \left( \frac{\partial}{\partial \theta} \ln L(\theta|\mathbf{X}) \right)^2 \right] = -E_\theta \left[ \frac{\partial^2}{\partial \theta^2} \ln L(\theta|\mathbf{X}) \right]$$
For an i.i.d. sample, $I_n(\theta) = n I_1(\theta)$, where $I_1(\theta) = E_\theta \left[ \left( \frac{\partial}{\partial \theta} \ln f(X|\theta) \right)^2 \right]$ is the information in a single observation. If $\delta(\mathbf{X})$ is an unbiased estimator of $\theta$ itself, then $\tau'(\theta)=1$ and the bound becomes $1/I_n(\theta)$.

An estimator that achieves this lower bound is called an **efficient estimator**.
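As a quick sanity check of the bound (a standard textbook example, not tied to any particular model above), take an i.i.d. $\text{Poisson}(\theta)$ sample. From $\ln f(x|\theta) = -\theta + x \ln\theta - \ln x!$:
$$\frac{\partial}{\partial \theta} \ln f(x|\theta) = \frac{x}{\theta} - 1, \qquad I_1(\theta) = E_\theta\left[\left(\frac{X}{\theta} - 1\right)^2\right] = \frac{Var(X)}{\theta^2} = \frac{1}{\theta}.$$
The bound for unbiased estimators of $\theta$ is therefore $1/I_n(\theta) = \theta/n$, which is exactly $Var(\bar{X})$; hence $\bar{X}$ is efficient.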
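Staying with the Poisson example, the exponential-family criterion above can be verified directly. The PMF factors as
$$f(x|\theta) = \frac{e^{-\theta}\theta^x}{x!} = \underbrace{\frac{1}{x!}}_{h(x)} \; \underbrace{e^{-\theta}}_{c(\theta)} \; \exp\big(\underbrace{\ln\theta}_{w(\theta)} \cdot \underbrace{x}_{t(x)}\big),$$
and $\{w(\theta) : \theta > 0\} = \mathbb{R}$ contains an open interval, so $T(\mathbf{X}) = \sum_{i=1}^n X_i$ is complete and sufficient. (The factorization theorem gives the same sufficiency conclusion: $f(\mathbf{x}|\theta) = e^{-n\theta}\theta^{\sum x_i} \cdot \left(\prod_i x_i!\right)^{-1}$.)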
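The same Poisson setup gives a concrete Rao-Blackwell/Lehmann-Scheffé computation. For $\tau(\theta) = P_\theta(X=0) = e^{-\theta}$, the crude indicator $\delta = \mathbf{1}\{X_1 = 0\}$ is unbiased; conditioning on $T = \sum_i X_i$ (given $T=t$, $X_1 \sim \text{Binomial}(t, 1/n)$) yields $\delta^* = ((n-1)/n)^T$, the UMVUE. Below is a minimal Monte Carlo sketch of this improvement; the specific values of `theta`, `n`, and `reps` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 200_000          # arbitrary illustration values
x = rng.poisson(theta, size=(reps, n))     # reps independent samples of size n

# Crude unbiased estimator of tau(theta) = P(X = 0) = exp(-theta):
# the indicator that the first observation equals zero.
delta = (x[:, 0] == 0).astype(float)

# Rao-Blackwellized estimator: condition on the complete sufficient
# statistic T = sum(X_i).  Given T = t, X_1 ~ Binomial(t, 1/n), so
# E[delta | T] = ((n - 1) / n) ** T, the UMVUE by Lehmann-Scheffe.
t = x.sum(axis=1)
delta_star = ((n - 1) / n) ** t

print("target exp(-theta):", np.exp(-theta))
print("means (both near target):", delta.mean(), delta_star.mean())
print("variances (delta* smaller):", delta.var(), delta_star.var())
```

Both estimators should average near $e^{-2} \approx 0.135$, with $\delta^*$ showing a markedly smaller variance, as the theorem guarantees.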
### Best Linear Unbiased Estimator (BLUE)

For a linear model $Y = X\beta + \epsilon$, where $E[\epsilon] = 0$ and $Var(\epsilon) = \sigma^2 I$, the **Gauss-Markov Theorem** states that the Ordinary Least Squares (OLS) estimator $\hat{\beta}_{OLS} = (X^T X)^{-1} X^T Y$ is the BLUE of $\beta$. This means $\hat{\beta}_{OLS}$ has the smallest variance among all linear unbiased estimators of $\beta$.

- **Linear:** $\hat{\beta}$ is a linear function of $Y$.
- **Unbiased:** $E[\hat{\beta}] = \beta$.
- **Best:** $Var(\hat{\beta})$ is minimal.

Note: BLUE does not require normality of the errors, only $E[\epsilon]=0$ and $Var(\epsilon) = \sigma^2 I$. If the errors are additionally normal, OLS is also the Maximum Likelihood Estimator (MLE) and the UMVUE.
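The Gauss-Markov property is easy to see by simulation. The sketch below is illustrative only (the two-column design matrix, uniform errors, and the arbitrarily weighted competing estimator are all assumptions made for the demo): it compares OLS against another linear unbiased estimator under non-normal, homoscedastic errors, where the theorem predicts OLS has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 20_000
beta = np.array([1.0, 2.0])                       # true coefficients
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])

# OLS weights: (X^T X)^{-1} X^T.
A_ols = np.linalg.solve(X.T @ X, X.T)

# A competing linear unbiased estimator: weighted least squares with
# arbitrary (and here unwarranted) weights.  It is still unbiased,
# since A_alt @ X = I, but Gauss-Markov says it cannot beat OLS when
# Var(eps) = sigma^2 I.
W = np.diag(np.linspace(0.5, 2.0, n))
A_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)

# Non-normal, mean-zero, homoscedastic errors: Uniform(-1, 1).
eps = rng.uniform(-1, 1, size=(reps, n))
Y = X @ beta + eps                                # broadcasts over reps

ols_hat = Y @ A_ols.T                             # shape (reps, 2)
alt_hat = Y @ A_alt.T

print("bias OLS:", ols_hat.mean(0) - beta, " alt:", alt_hat.mean(0) - beta)
print("var  OLS:", ols_hat.var(0),        " alt:", alt_hat.var(0))
```

Both estimators come out essentially unbiased, but the per-coefficient variance of the weighted competitor exceeds that of OLS, despite the errors being non-normal.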