### Notation Reminder

- $H_0$: null hypothesis
- $H_1$: alternative hypothesis
- $t(x)$: test statistic
- $\alpha$: significance level
- $Z \sim N(0, 1)$: standard normal
- $p$: p-value
- $m(x)$: sample mean
- $sd(x)$: sample standard deviation
- $\hat{p}$: sample proportion
- $\Theta$: unknown parameter

### Bayesian Inference II (Week 6)

#### 1. Posterior Distribution

After observing data $x$, the posterior $f(\Theta|x)$ summarizes all information about the unknown parameter $\Theta$. You can use it to compute:

- Posterior mode (MAP)
- Posterior mean
- Posterior median
- Credible intervals
- Predictive probabilities

#### 2. MAP, Mean, and Median

- **MAP:** value where the posterior density is maximized
- **Mean:** $E(\Theta|x)$
- **Median:** point with 50% posterior probability below

They may differ if the posterior is skewed.

#### 3. Credible Interval

An interval $[a, b]$ such that $P(a \le \Theta \le b \mid x) = 0.95$. The probability refers directly to the parameter.

#### 4. Predictive Distribution (6A1)

For future data $Y$:
$$f(y|x) = \int f(y|\Theta)f(\Theta|x) \, d\Theta$$

**Interpretation:**
- Averages predictions over all possible values of $\Theta$
- Weighted by posterior uncertainty

#### 5. Binary Model: Beta Posterior

If the prior is $\text{Beta}(a, b)$ and the data contain $a'$ ones and $b'$ zeros:
$$\Theta|x \sim \text{Beta}(a + a', b + b')$$

**Posterior mean:**
$$E(\Theta|x) = \frac{a + a'}{a + b + a' + b'}$$

#### 6. Posterior Prediction (Coin Example)

For one future Bernoulli trial:
$$P(Y=1|x) = E(\Theta|x)$$

For multiple trials, use the predictive distribution (not plug-in estimates).
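A minimal sketch of the Beta-Bernoulli update above, using `scipy.stats`. The prior $\text{Beta}(1,1)$ and the coin-flip data are illustrative assumptions, not values from the course:

```python
import numpy as np
from scipy import stats

# Illustrative data (assumed): 10 coin flips with 7 heads.
x = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
a, b = 1, 1                      # Beta(1, 1) prior (uniform)
a_post = a + x.sum()             # a + a'  (number of ones)
b_post = b + len(x) - x.sum()    # b + b'  (number of zeros)

posterior = stats.beta(a_post, b_post)
print("posterior mean  :", posterior.mean())   # (a+a') / (a+b+a'+b')
print("posterior median:", posterior.median())
# MAP of Beta(alpha, beta) for alpha, beta > 1: (alpha-1)/(alpha+beta-2)
print("MAP             :", (a_post - 1) / (a_post + b_post - 2))
# 95% equal-tailed credible interval
print("95% credible    :", posterior.ppf([0.025, 0.975]))
# Posterior predictive for one future Bernoulli trial: P(Y=1|x) = E(Theta|x)
print("P(Y=1 | x)      :", posterior.mean())
```

Because the posterior here is skewed, the mean, median, and MAP come out slightly different, as noted in section 2.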
### Hypothesis Testing (Week 6)

#### 1. Hypotheses

- $H_0$: "nothing new / default model"
- $H_1$: "deviation from $H_0$"

Typically: $H_0: \Theta = \Theta_0$, $H_1: \Theta \ne \Theta_0$.

#### 2. Test Statistic

A real-valued function of the data: $t(x) = t(x_1, ..., x_n)$. Chosen so that:

- Large values are surprising under $H_0$
- The distribution of $t(X)$ under $H_0$ is known

#### 3. p-value

$$p = P(|t(X)| \ge |t(x)| \mid H_0)$$

**Interpretation:**
- Small $p$ means the data are unlikely if $H_0$ is true
- $p$ is **not** the probability that $H_0$ is true

#### 4. Decision Rule

Choose a significance level $\alpha$.

- If $p \le \alpha$: reject $H_0$
- If $p > \alpha$: do not reject $H_0$

### Confidence Intervals (Frequentist) (Week 5)

#### 1. Point vs Interval Estimate

- **Point estimate:** single number (e.g., $m(x)$)
- **Confidence interval (CI):** random interval computed from the data

#### 2. Meaning of a Confidence Interval

A 95% CI means: $P(\mu \in [L(X), U(X)]) = 0.95$. The probability refers to the procedure, not to the realized interval.

#### 3. CI for Mean $\mu$ (Normal Model, Known $\sigma$) (5A1)

Model: $X_1, ..., X_n$ i.i.d. $N(\mu, \sigma)$, $\sigma$ known.
$$\mu \in m(x) \pm z \frac{\sigma}{\sqrt{n}}$$

#### 4. CI for Mean $\mu$ (Normal Model, Unknown $\sigma$)

Replace $\sigma$ by $sd(x)$:
$$\mu \in m(x) \pm z \frac{sd(x)}{\sqrt{n}}$$

For small $n$, $z$ should come from the t-distribution.

#### 5. CI for Mean $\mu$ (General Model, CLT)

If $n$ is large:
$$\mu \in m(x) \pm z \frac{sd(x)}{\sqrt{n}}$$

**Justification:** CLT.

#### 6. CI for Proportion $p$ (Binary Model) (5A2)

Binary data $X_i \in \{0,1\}$: $\hat{p} = \frac{\#\{i : x_i=1\}}{n}$
$$p \in \hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

#### 7. Conservative CI for $p$

Since $\hat{p}(1-\hat{p}) \le 0.25$:
$$p \in \hat{p} \pm z \frac{0.5}{\sqrt{n}}$$

Used for margins of error in polls.
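A minimal sketch tying the last two topics together: the two proportion intervals from 5A2 plus a two-sided z-test of $H_0: p = 0.5$. The poll of $n = 1000$ with 520 "yes" answers is an illustrative assumption:

```python
import math
from statistics import NormalDist

# Illustrative poll data (assumed): 520 "yes" answers out of n = 1000.
n, k = 1000, 520
p_hat = k / n
z95 = 1.96  # 95% confidence

# Standard CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
se = math.sqrt(p_hat * (1 - p_hat) / n)
print("95% CI          :", (round(p_hat - z95 * se, 3), round(p_hat + z95 * se, 3)))

# Conservative CI: p_hat*(1-p_hat) <= 0.25, so the margin is at most z * 0.5 / sqrt(n)
moe = z95 * 0.5 / math.sqrt(n)
print("conservative CI :", (round(p_hat - moe, 3), round(p_hat + moe, 3)))
print("margin of error :", round(moe, 3))     # ~0.031, the familiar "+/- 3 points"

# Two-sided z-test of H0: p = 0.5 (normal approximation)
se0 = math.sqrt(0.5 * 0.5 / n)                # SE of p_hat under H0
t_obs = (p_hat - 0.5) / se0                   # test statistic t(x)
p_value = 2 * (1 - NormalDist().cdf(abs(t_obs)))
print("t(x) =", round(t_obs, 3), " p-value =", round(p_value, 4))
```

With these numbers the p-value is about 0.21, so at $\alpha = 0.05$ we would not reject $H_0$.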
### Bayesian Inference I (Week 5)

#### 1. Bayesian Setup

- **Prior:** $f_\Theta(\theta)$
- **Likelihood:** $f_{X|\Theta}(x|\theta)$
- **Posterior:** $f_{\Theta|X}(\theta|x)$

#### 2. Bayes' Formula (Core Rule)

$$f(\theta|x) = \frac{f_\Theta(\theta) f_{X|\Theta}(x|\theta)}{\int f_\Theta(t) f_{X|\Theta}(x|t) \, dt}$$

Posterior $\propto$ prior $\times$ likelihood.

#### 3. Discrete Parameter Update (5B1)

For $\Theta \in \{\theta_1, ..., \theta_K\}$:
$$f(\theta_i|x) = \frac{f(\theta_i) f(x|\theta_i)}{\sum_j f(\theta_j) f(x|\theta_j)}$$

#### 4. Sequential Updating

The posterior after the first data point becomes the prior for the next:
$$f(\theta|x_1, x_2) \propto f(\theta|x_1) f(x_2|\theta)$$

Order does not matter.

#### 5. MAP Estimate

$$\hat{\theta}_{MAP} = \arg \max_\theta f(\theta|x)$$

Maximizes the posterior density.

#### 6. Credible Interval

An interval $[a, b]$ such that $P(a \le \Theta \le b \mid x) = 0.95$. The probability refers directly to the parameter.

#### 7. Normal-Normal Model (5B2)

If $\Theta \sim N(\mu_0, \sigma_0^2)$ and $X_i|\Theta \sim N(\Theta, \sigma^2)$, then the posterior is $\Theta|x \sim N(\mu_1, \sigma_1^2)$, where:
$$\mu_1 = \frac{\mu_0/\sigma_0^2 + n\,m(x)/\sigma^2}{1/\sigma_0^2 + n/\sigma^2}, \qquad \sigma_1^2 = \frac{1}{1/\sigma_0^2 + n/\sigma^2}$$

#### 8. Credible Interval (Normal Posterior)

$$\mu_1 \pm z\sigma_1$$

**Key z-values:**

| Confidence | z-value |
|------------|---------|
| 90% | 1.64 |
| 95% | 1.96 |
| 99% | 2.58 |

**Important Distinction:**
- **Confidence interval:** randomness in the data
- **Credible interval:** randomness in the parameter

### Graphs and Statistics of Data Sets (Week 4)

#### 1. Grouped Data and Histograms (4A1)

Data are grouped into bins (intervals): $[a_1, b_1), [a_2, b_2), ...$

**Histogram rules:**
- Bar width = interval width
- Bar height = relative frequency / width
- Bar area = relative frequency

Vertical axis unit: "% per unit".

#### 2. Reading Information From Histograms

- Compare frequencies using areas, not heights
- Density $\approx$ constant within each bin (assumption)

Used when estimating:
- proportions
- medians
- averages

#### 3. Median From Grouped Data (4A1)

The median is the point where 50% of the total area lies below.

**Procedure:**
- Accumulate bar areas
- Locate the bin containing 50%
- Interpolate within that bin (uniformity assumption)

#### 4. Mean From Grouped Data (4A1)

Approximate using a weighted average:
$$\text{Mean} \approx \frac{\sum (\text{group midpoint}) \times (\text{frequency})}{\sum \text{frequency}}$$

Assumes approximate uniformity inside bins.

#### 5. Empirical Distribution

For a dataset $(x_1, ..., x_n)$:
$$f_x(x) = \frac{\#\{i: x_i = x\}}{n}$$

It is a probability distribution: $\sum_x f_x(x) = 1$.

#### 6. Empirical Statistics

If $X$ is drawn uniformly from $x$:
$E(X) = m(x)$, $Var(X) = var(x)$, $SD(X) = sd(x)$.

### Parameter Estimation (Week 4)

#### 1. Statistical Model

Assume the data $x_1, ..., x_n \sim f_\Theta(x)$, where $\Theta$ is an unknown parameter.

**Estimator vs estimate:**
- **Estimator:** rule ($\hat{\Theta}$)
- **Estimate:** numerical value after observing the data

#### 2. Likelihood Function

$$L(\Theta) = \prod_{i=1}^n f_\Theta(x_i)$$

For continuous distributions: $L(\Theta) \propto \prod f_\Theta(x_i)$.

**Log-likelihood:** $l(\Theta) = \log L(\Theta)$. (Maxima occur at the same $\Theta$.)

#### 3. Maximum Likelihood Estimator (MLE)

$$\hat{\Theta} = \arg \max_\Theta L(\Theta)$$

Usually found by solving $\frac{d}{d\Theta} l(\Theta) = 0$.

#### 4. Key ML Results (Used in Class)

- **Exponential:** $f_X(x) = \lambda e^{-\lambda x}$
  $$\hat{\lambda} = \frac{1}{m(x)}$$
- **Uniform on** $\{1, ..., b\}$:
  $$\hat{b} = \max\{x_1, ..., x_n\}$$
- **Binomial:** $\text{Bin}(n, p)$ with $k$ observed successes:
  $$\hat{p} = \frac{k}{n}$$

#### 5. Likelihood With Constraints (4B2)

If a parameter value is impossible given the data: $L(\Theta) = 0$.

**Typical shape:**
- Zero below a threshold
- Decreasing above the threshold

The maximum often occurs at the boundary.

#### 6. Bias of an Estimator

An estimator $\hat{\Theta}$ is unbiased if $E(\hat{\Theta}) = \Theta$. MLEs may be biased.

**Example:** $\hat{\sigma}_{ML}^2 = \frac{1}{n}\sum(x_i - \bar{x})^2$ (biased).

**Common Pitfalls:**
- Confusing an estimator with an estimate
- Ignoring parameter constraints
- Assuming the MLE is unbiased
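A minimal numeric check of the exponential ML result, under an assumed sample of waiting times: maximizing the log-likelihood $l(\lambda) = n\log\lambda - \lambda\sum x_i$ numerically should reproduce the closed form $\hat{\lambda} = 1/m(x)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative exponential sample (assumed): waiting times in minutes.
x = np.array([0.8, 2.1, 0.4, 1.7, 3.0, 0.9])

# Negative log-likelihood of Exp(lam): -(n*log(lam) - lam*sum(x))
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")
print("numeric MLE :", res.x)          # maximizer of the likelihood
print("closed form :", 1 / x.mean())   # hat(lambda) = 1 / m(x), as in class
```

Both values agree (about 0.674 here), which is the point of section 3: solving $\frac{d}{d\lambda} l(\lambda) = 0$ gives $\lambda = 1/m(x)$.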
### Standard Deviation and Correlation (Week 3)

#### 1. Variance and Standard Deviation

$$Var(X) = E(X^2) - [E(X)]^2, \qquad SD(X) = \sqrt{Var(X)}$$

**Discrete case:** $E(X) = \sum_x x P(X=x)$, $E(X^2) = \sum_x x^2 P(X=x)$.

#### 2. Covariance

$$Cov(X, Y) = E(XY) - E(X)E(Y)$$

**Properties:**
- $Cov(X, X) = Var(X)$
- If $X, Y$ are independent $\Rightarrow Cov(X, Y) = 0$

#### 3. Correlation

$$Cor(X, Y) = \frac{Cov(X, Y)}{SD(X)SD(Y)}$$

Always satisfies $-1 \le Cor(X, Y) \le 1$.

**Important message (3A1):** Uncorrelated $\not\Rightarrow$ independent.

#### 4. Joint Distributions (3A1)

For discrete $X, Y$:
$$E(XY) = \sum_x \sum_y xy \, P(X=x, Y=y)$$

Row sums $\rightarrow$ distribution of $X$. Column sums $\rightarrow$ distribution of $Y$.

#### 5. Linearity of Expectation

Always valid: $E(aX + bY + c) = aE(X) + bE(Y) + c$. No independence required.

#### 6. Variance of Sums (3A2)

$$Var(X+Y) = Var(X) + Var(Y) + 2Cov(X, Y)$$

If $X, Y$ are independent: $Var(X+Y) = Var(X) + Var(Y)$.

**Scaling:** $Var(aX) = a^2 Var(X)$, $SD(aX) = |a| SD(X)$.

#### 7. Averages of i.i.d. Variables (3A2)

If $A_n = \frac{1}{n}\sum_{i=1}^n X_i$ with the $X_i$ i.i.d.:
$E(A_n) = E(X_1)$, $SD(A_n) = \frac{SD(X_1)}{\sqrt{n}}$.

### Sums and Normal Approximation (Week 3)

#### 1. Sums of Random Variables

Let $S_n = X_1 + ... + X_n$.

**Mean:** $E(S_n) = \sum_{i=1}^n E(X_i)$.
**Variance:** $Var(S_n) = \sum_{i=1}^n Var(X_i)$ (if independent).

#### 2. Central Limit Theorem (CLT)

If $X_1, ..., X_n$ are i.i.d. with mean $\mu$ and SD $\sigma$, then:
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \approx N(0,1) \text{ for large } n.$$

Thus $S_n \approx N(n\mu, \sigma\sqrt{n})$.

#### 3. Normal Approximation Workflow (3B1, 3B2)

- Identify the $X_i$ and check the independence assumption
- Compute $E(S_n)$ and $SD(S_n)$
- Standardize: $Z = \frac{S_n - E(S_n)}{SD(S_n)}$
- Use the standard normal CDF

A worked sketch of this workflow appears after the z-value table below.

#### 4. Linear Transformations

If $X \sim N(\mu, \sigma)$: $aX + b \sim N(a\mu + b, |a|\sigma)$.

#### 5. Sums of Independent Normals (3B2)

If $X, Y$ are independent and normal, then $X+Y$ is normal, with:
$E(X+Y) = E(X) + E(Y)$, $Var(X+Y) = Var(X) + Var(Y)$.

#### 6. Correlation and Dependence (Portfolios)

Nonzero correlation $\Rightarrow$ dependence. Independence $\Rightarrow$ zero correlation, but not conversely.

**Common pitfalls:**
- Forgetting to square scaling constants in variances
- Using the CLT without the independence assumption
- Confusing correlation with independence

**Standard Normal Distribution: Key Z-values**

Let $Z \sim N(0,1)$.

| $z$ | Probability $P(Z \le z)$ |
|-------|--------------------------|
| 1.645 | $\approx 0.95$ |
| 1.96 | $\approx 0.975$ |
| 2.58 | $\approx 0.995$ |

For example, $P(Z > 1.645) \approx 0.05$.
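The sketch promised in 3B1/3B2, under an assumed example (not from the notes): the sum of $n = 100$ fair-die rolls, for which $E(X_1) = 3.5$ and $Var(X_1) = 35/12$:

```python
import math
from statistics import NormalDist

# Illustrative example (assumed): S_n = sum of n = 100 fair-die rolls.
n = 100
mu = 3.5                      # E(X_1) for one die
sigma = math.sqrt(35 / 12)    # SD(X_1) = sqrt(E(X_1^2) - mu^2) = sqrt(91/6 - 12.25)

# Step 2: mean and SD of the sum (independent rolls)
ES, SDS = n * mu, sigma * math.sqrt(n)
print("E(S_n) =", ES, " SD(S_n) =", round(SDS, 2))

# Steps 3-4: standardize and use the standard normal CDF, e.g. P(S_n <= 370)
z = (370 - ES) / SDS
print("z =", round(z, 3), " P(S_n <= 370) ~", round(NormalDist().cdf(z), 3))
```

Here $z \approx 1.17$ and $P(S_n \le 370) \approx 0.88$; the same four steps apply to any i.i.d. sum problem from class.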
### Continuous Random Variables (Week 2)

#### 1. Density Function

A continuous random variable $X$ has density $f(x)$ if $P(a \le X \le b) = \int_a^b f(x) \, dx$.

**Properties:**
- $f(x) \ge 0$
- $\int_{-\infty}^\infty f(x) \, dx = 1$

Outside the support: $f(x) = 0$.

#### 2. Cumulative Distribution Function (CDF)

$$F(x) = P(X \le x) = \int_{-\infty}^x f(t) \, dt$$

**Relations:** $F'(x) = f(x)$ (where differentiable).

**Tail probability:** $P(X > x) = 1 - F(x)$.

#### 3. Quantiles (2A1)

The $u$-quantile $x_u$ satisfies $F(x_u) = u$. Used to find thresholds like "90% earn at most $x$".

#### 4. Support of a Continuous RV

If $f(x) > 0$ exactly on an interval $(a, b)$, then $X \in (a, b)$.

#### 5. Uniform Distribution (2A2)

If $X \sim \text{Unif}(a, b)$:
$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$

### Expectation and Transformations (Week 2)

#### 1. Expected Value (Continuous)

$$E(X) = \int_{-\infty}^\infty x f(x) \, dx$$

**Interpretation:** long-run average over independent repetitions.

#### 2. Expectation of a Function (2B1)

For a function $g$:
$$E(g(X)) = \int_{-\infty}^\infty g(x) f(x) \, dx$$

**Important:** $E(g(X)) \ne g(E(X))$ in general.

#### 3. Power Transformation (2B1)

If $X \sim \text{Unif}(0, 1)$:
$$E(X^n) = \int_0^1 x^n \, dx = \frac{1}{n+1}$$

Valid for $n = 1, 2, 3, ...$

#### 4. Linearity of Expectation

Always valid: $E(a + bX) = a + bE(X)$ and $E(X+Y) = E(X) + E(Y)$. No independence required.

#### 5. Piecewise-Defined Transformations (2B2)

If $W = g(X)$ is defined by cases:
$$E(W) = \int g(x)f(x) \, dx$$

Split the integral according to the cases of $g$.

#### 6. Indicator Variables

For an event $A$:
$$1_A = \begin{cases} 1, & A \text{ occurs} \\ 0, & \text{otherwise} \end{cases}$$

$E(1_A) = P(A)$. Useful for counting arguments.

**Common Mistakes:**
- Forgetting the limits of integration
- Computing $g(E(X))$ instead of $E(g(X))$
- Assuming independence when it is not stated

### Probability: Concept and Basic Rules (Week 1)

#### 1. Events and Set Operations (1A1)

- $A \cup B$: union (A or B)
- $A \cap B$: intersection (A and B)
- $A^c$: complement of A (not A)
- $B \setminus A$: difference (B but not A)
- $\emptyset$: empty set
- $S$: sample space

**Complement reminder:** $(X > 3)^c \Leftrightarrow X \le 3$.

#### 2. Symmetric Sample Space (1A1)

If all outcomes are equally likely: $P(A) = \frac{|A|}{|S|}$.

**Die rolled twice:** $|S| = 36$, each cell has probability $1/36$.

#### 3. Core Probability Rules

- $P(S) = 1$
- $0 \le P(A) \le 1$

**Mutually exclusive $A_i$:** $P(\cup_i A_i) = \sum_i P(A_i)$.

**Addition rule (use for overlaps) (1A2):** $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

**Complement:** $P(A^c) = 1 - P(A)$.

#### 4. Venn Diagram Toolkit (1A2)

For three sets (inclusion-exclusion):
$$P(T \cup S \cup B) = P(T) + P(S) + P(B) - P(T \cap S) - P(T \cap B) - P(S \cap B) + P(T \cap S \cap B)$$

**Only-event pattern (1A2):**
$$P(T \text{ only}) = P(T) - P(T \cap S) - P(T \cap B) + P(T \cap S \cap B)$$

#### 5. Conditional Probability and Product Rule (1A3)

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
$$P(A \cap B) = P(A)P(B|A)$$

**Law of total probability:** if $B_1, ..., B_n$ is a partition:
$$P(A) = \sum_{i=1}^n P(B_i)P(A|B_i)$$

**Bayes' rule:**
$$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$$

#### 6. Counting Basics

- **With replacement (ordered sequences):** $n^k$
- **Without replacement (ordered):** $P(n, k) = n(n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}$
- **Unordered subsets:** $\binom{n}{k} = \frac{n!}{k!(n-k)!}$

### Random Variables and Distributions (Week 1)

#### 1. Random Variable (RV)

A random variable is a function $X: S \to \mathbb{R}$.

**Event notation:** $\{X=x\} = \{s \in S : X(s)=x\}$.

#### 2. Discrete Distribution (PMF)

$f_X(x) = P(X=x)$, $\sum_x f_X(x) = 1$.

**Distribution table:**

| $X$ | $x_1$ | ... | $x_n$ |
|----------|-------|-----|-------|
| $P(X=x)$ | $p_1$ | ... | $p_n$ |

#### 3. Joint Distribution Table (1B1, 1B2)

$f_{X,Y}(x,y) = P(X=x, Y=y)$.

**Marginals (row/column sums):**
$f_X(x) = \sum_y f_{X,Y}(x,y)$, $f_Y(y) = \sum_x f_{X,Y}(x,y)$.

**Conditional distribution:**
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}, \quad f_X(x) > 0$$

#### 4. Independence Test (1B1f)

$X$ and $Y$ are independent if $f_{X,Y}(x,y) = f_X(x)f_Y(y)$ for all $(x,y)$.

#### 5. Joint Table Workflow (Class Problems)

- Decide the possible values (mark impossible pairs as 0)
- Fill the joint table by counting outcomes / the product rule
- Compute marginals by row/column sums
- Answer queries by summing the relevant cells

A code sketch of this workflow appears at the end of these notes.

**Common Mistakes:**
- Same distribution $\ne$ same random variable
- Joint probabilities must sum to 1
- Do not forget impossible outcomes
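The sketch referenced in the workflow above: a small joint-table example in `numpy`. The 2×2 table values are assumed for illustration; the code computes marginals, a conditional distribution, covariance and correlation (Week 3), and runs the independence test of section 4:

```python
import numpy as np

# Illustrative joint table (assumed): rows = values of X, columns = values of Y.
x_vals = np.array([0, 1])
y_vals = np.array([0, 1])
joint = np.array([[0.30, 0.20],
                  [0.10, 0.40]])   # P(X=x, Y=y); must sum to 1
assert np.isclose(joint.sum(), 1.0)

fX = joint.sum(axis=1)   # marginal of X (row sums)
fY = joint.sum(axis=0)   # marginal of Y (column sums)
print("f_X:", fX, " f_Y:", fY)

# Conditional distribution f_{Y|X}(y | X=1) = joint row / marginal
print("f_{Y|X=1}:", joint[1] / fX[1])

# E(XY), covariance, correlation from the table
EX, EY = x_vals @ fX, y_vals @ fY
EXY = x_vals @ joint @ y_vals            # sum over x, y of x*y*P(X=x, Y=y)
cov = EXY - EX * EY
sdX = np.sqrt(x_vals**2 @ fX - EX**2)
sdY = np.sqrt(y_vals**2 @ fY - EY**2)
print("Cov:", round(cov, 3), " Cor:", round(cov / (sdX * sdY), 3))

# Independence holds iff the joint equals the outer product of the marginals
print("independent?", np.allclose(joint, np.outer(fX, fY)))
```

For this table the covariance is 0.1, so $X$ and $Y$ are correlated and hence dependent, and the independence check prints `False`.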