### Point Estimation - Formulas

- **Sample Mean:** $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
- **Sample Variance:** $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
- **Sample Standard Deviation:** $S = \sqrt{S^2}$
- **Sample Proportion:** $\hat{p} = \frac{X}{n}$ (where $X$ is the number of successes)

### Statistical Intervals - Formulas

- **Confidence Interval for Mean (Large Sample, $\sigma$ known):** $\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
- **Confidence Interval for Mean (Small Sample, $\sigma$ unknown):** $\bar{X} \pm t_{\alpha/2, n-1} \frac{S}{\sqrt{n}}$
- **Confidence Interval for Proportion (Large Sample):** $\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- **Confidence Interval for Variance (Normal Population):** $\left[ \frac{(n-1)S^2}{\chi^2_{\alpha/2, n-1}}, \frac{(n-1)S^2}{\chi^2_{1-\alpha/2, n-1}} \right]$

### Test of Hypothesis (Single Sample) - Formulas

- **Z-test for Mean ($\sigma$ known):** $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
- **t-test for Mean ($\sigma$ unknown):** $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, with $df = n-1$
- **Z-test for Proportion:** $Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
- **p-value:**
  - For $H_a: \mu > \mu_0$, p-value $= P(Z > z_{calc})$ or $P(t > t_{calc})$
  - For $H_a: \mu < \mu_0$, p-value $= P(Z < z_{calc})$ or $P(t < t_{calc})$
  - For $H_a: \mu \neq \mu_0$, p-value $= 2 \times P(Z > |z_{calc}|)$ or $2 \times P(t > |t_{calc}|)$

### Statistical Inference (Two Samples) - Formulas

#### Two Means

- **Z-test for Two Means ($\sigma_1, \sigma_2$ known):** $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
- **t-test for Two Means (Pooled Variance, $\sigma_1 = \sigma_2$ unknown):** $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{S_p \sqrt{1/n_1 + 1/n_2}}$
  - **Pooled Variance:** $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
  - **Degrees of Freedom:** $df = n_1+n_2-2$
- **t-test for Two Means (Unequal Variances, $\sigma_1 \neq \sigma_2$ unknown, Welch's t-test):** $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
  - **Degrees of Freedom:** $df = \frac{(S_1^2/n_1 + S_2^2/n_2)^2}{\frac{(S_1^2/n_1)^2}{n_1-1} + \frac{(S_2^2/n_2)^2}{n_2-1}}$
- **Confidence Interval for Two Means ($\sigma_1, \sigma_2$ known):** $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
- **Confidence Interval for Two Means (Pooled Variance):** $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, df} S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, with $df = n_1+n_2-2$
- **Confidence Interval for Two Means (Unequal Variances):** $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, df} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$, with $df$ from Welch's formula

#### Paired Samples

- **t-test for Paired Samples:** $t = \frac{\bar{d} - \mu_{d0}}{S_d/\sqrt{n}}$
  - **Mean Difference:** $\bar{d} = \frac{1}{n}\sum d_i$
  - **Standard Deviation of Differences:** $S_d = \sqrt{\frac{1}{n-1}\sum (d_i - \bar{d})^2}$
  - **Degrees of Freedom:** $df = n-1$

#### Two Proportions

- **Z-test for Two Proportions:** $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}$ (for $H_0: p_1 = p_2$, so the hypothesized difference is 0)
  - **Pooled Proportion:** $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$
- **Z-test for Two Proportions (General):** $Z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}$ (for $H_0: p_1 - p_2 = D_0$)
- **Confidence Interval for Two Proportions:** $(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
- **p-value:** Calculated as in the single-sample tests, using the appropriate Z or t statistic.

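
The two-sample t statistics above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `pooled_t` and `welch_t`, the parameter `delta0` (standing for $(\mu_1 - \mu_2)_0$), and the sample data are assumptions made for the example, not part of the formulas themselves.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, delta0=0.0):
    """Return (t, df) for the pooled-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance divides by n - 1, matching S^2 in the formulas.
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2) - delta0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(sample1, sample2, delta0=0.0):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(sample1), len(sample2)
    a = variance(sample1) / n1  # S_1^2 / n_1
    b = variance(sample2) / n2  # S_2^2 / n_2
    t = (mean(sample1) - mean(sample2) - delta0) / sqrt(a + b)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Made-up illustrative data, not taken from the text.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print("pooled:", t_pooled, df_pooled)
print("welch: ", t_welch, df_welch)
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ, which is why the choice between the tests matters most when $n_1 \neq n_2$ and the sample variances are far apart. Turning $t$ into a p-value needs a t-distribution CDF, which the standard library does not provide, so that step is left out of this sketch.
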
### Correlation and Regression - Formulas

- **Covariance:** $Cov(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
- **Pearson Product-Moment Correlation Coefficient:** $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} = \frac{Cov(X,Y)}{S_X S_Y}$
- **Simple Linear Regression Model:** $Y = \beta_0 + \beta_1 X + \epsilon$
- **Estimated Regression Line:** $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
- **Slope Coefficient:** $\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{Cov(X,Y)}{S_X^2}$
- **Intercept Coefficient:** $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
- **Coefficient of Determination:** $R^2 = r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
  - **Total Sum of Squares (SST):** $SST = \sum (Y_i - \bar{Y})^2$
  - **Regression Sum of Squares (SSR):** $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
  - **Error Sum of Squares (SSE):** $SSE = \sum (Y_i - \hat{Y}_i)^2$
- **Standard Error of the Estimate:** $S_e = \sqrt{\frac{SSE}{n-2}}$

### Joint Probability Distribution - Formulas

#### Discrete

- **Joint Probability Mass Function (PMF):** $P(X=x, Y=y)$
- **Marginal PMF for X:** $P_X(x) = \sum_y P(X=x, Y=y)$
- **Marginal PMF for Y:** $P_Y(y) = \sum_x P(X=x, Y=y)$
- **Conditional PMF:** $P(Y=y \mid X=x) = \frac{P(X=x, Y=y)}{P_X(x)}$ (if $P_X(x) > 0$)
- **Independence:** $P(X=x, Y=y) = P_X(x) P_Y(y)$ for all $x,y$

#### Continuous

- **Joint Probability Density Function (PDF):** $f(x,y)$
- **Probability over a region R:** $P((X,Y) \in R) = \iint_R f(x,y) \,dx\,dy$
- **Marginal PDF for X:** $f_X(x) = \int_{-\infty}^{\infty} f(x,y) \,dy$
- **Marginal PDF for Y:** $f_Y(y) = \int_{-\infty}^{\infty} f(x,y) \,dx$
- **Conditional PDF:** $f_{Y \mid X}(y \mid x) = \frac{f(x,y)}{f_X(x)}$ (if $f_X(x) > 0$)
- **Independence:** $f(x,y) = f_X(x) f_Y(y)$ for all $x,y$

#### Expected Values

- **Expected Value of a function $g(X,Y)$:**
  - **Discrete:** $E[g(X,Y)] = \sum_x \sum_y g(x,y) P(X=x, Y=y)$
  - **Continuous:** $E[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y) f(x,y) \,dx\,dy$
- **Covariance:** $Cov(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
- **Correlation Coefficient:** $\rho = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$
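
The discrete formulas above (marginals, $E[g(X,Y)]$, covariance, correlation, and the independence check) can be worked through on a small table. The sketch below is one possible way to do it in plain Python; the joint PMF values and the helper names `marginals` and `expect` are hypothetical, chosen only for illustration.

```python
from math import isclose, sqrt

# Hypothetical joint PMF on {0, 1} x {0, 1}; the probabilities are made up.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def marginals(pmf):
    """Return the marginal PMFs (P_X, P_Y) by summing out the other variable."""
    px, py = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def expect(pmf, g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * P(X=x, Y=y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


px, py = marginals(joint)
ex = expect(joint, lambda x, y: x)
ey = expect(joint, lambda x, y: y)

# Cov(X, Y) = E[XY] - E[X]E[Y]
cov = expect(joint, lambda x, y: x * y) - ex * ey

# Standard deviations from E[(X - E[X])^2] and E[(Y - E[Y])^2]
sd_x = sqrt(expect(joint, lambda x, y: (x - ex) ** 2))
sd_y = sqrt(expect(joint, lambda x, y: (y - ey) ** 2))
rho = cov / (sd_x * sd_y)

# Independence requires P(x, y) = P_X(x) * P_Y(y) for every pair,
# including pairs that are missing from the table (probability 0).
independent = all(
    isclose(joint.get((x, y), 0.0), px[x] * py[y], abs_tol=1e-12)
    for x in px
    for y in py
)

print("P_X:", px, "P_Y:", py)
print("Cov:", cov, "rho:", rho, "independent:", independent)
```

For this table the covariance is slightly negative and the independence check fails, since $P(X=0, Y=0) = 0.1$ while $P_X(0) P_Y(0) = 0.12$. The continuous case follows the same pattern with the sums replaced by integrals.
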