### Normal Distribution

- **Definition:** A continuous probability distribution characterized by its symmetric, bell-shaped curve.
- **Parameters:**
  - Mean ($\mu$): Center of the distribution.
  - Standard Deviation ($\sigma$): Spread of the distribution.
- **Probability Density Function (PDF):**
  $$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
- **Standard Normal Distribution:**
  - A special case where $\mu = 0$ and $\sigma = 1$.
  - Denoted by $Z \sim N(0,1)$.
  - Any normal random variable $X \sim N(\mu, \sigma^2)$ (second argument is the variance) can be standardized using the Z-score formula:
    $$Z = \frac{X - \mu}{\sigma}$$
- **Empirical Rule (68-95-99.7 Rule):** For normally distributed data:
  - Approximately 68% of data falls within 1 standard deviation of the mean.
  - Approximately 95% of data falls within 2 standard deviations of the mean.
  - Approximately 99.7% of data falls within 3 standard deviations of the mean.
- **Properties:**
  - Symmetric about the mean.
  - Mean, median, and mode are all equal.
  - Total area under the curve is 1.
  - Asymptotic to the x-axis: the tails approach zero but never touch it.

### Hypothesis Testing: Introduction

- **Purpose:** A statistical method used to make decisions about a population parameter based on sample data.
- **Key Concepts:**
  - **Null Hypothesis ($H_0$):** A statement of no effect or no difference. Assumed true unless the sample provides sufficient evidence against it.
  - **Alternative Hypothesis ($H_1$ or $H_a$):** A statement that contradicts the null hypothesis; the claim we are trying to find evidence for.
  - **Significance Level ($\alpha$):** The probability of rejecting the null hypothesis when it is actually true (Type I error). Common values: 0.05, 0.01, 0.10.
  - **P-value:** The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, assuming the null hypothesis is true.
  - **Test Statistic:** A value calculated from sample data that is used to evaluate the null hypothesis.
- **Steps in Hypothesis Testing:**
  1. **Formulate Hypotheses:** State $H_0$ and $H_1$.
  2. **Choose Significance Level ($\alpha$):** Determine the acceptable risk of a Type I error.
  3. **Select Appropriate Test:** Based on data type, number of samples, and the research question.
  4. **Calculate Test Statistic:** Using sample data.
  5. **Determine P-value OR Critical Value:**
     - **P-value Approach:** Reject $H_0$ if the P-value $\le \alpha$.
     - **Critical Value Approach:** Reject $H_0$ if the test statistic falls beyond the critical value(s), i.e., in the rejection region.
  6. **Make Decision:** Reject or Fail to Reject $H_0$.
  7. **State Conclusion:** In the context of the problem.

### Types of Errors

- **Type I Error ($\alpha$):** Rejecting a true null hypothesis (False Positive).
- **Type II Error ($\beta$):** Failing to reject a false null hypothesis (False Negative).
- **Power of a Test ($1 - \beta$):** The probability of correctly rejecting a false null hypothesis.

### Common Hypothesis Tests

#### 1. Z-test for Population Mean

- **Use:** When the population standard deviation ($\sigma$) is known, or the sample size ($n$) is large ($n \ge 30$), in which case the sample standard deviation $s$ can stand in for $\sigma$.
- **Test Statistic:**
  $$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
  where $\bar{x}$ is the sample mean and $\mu_0$ is the hypothesized population mean.

#### 2. t-test for Population Mean

- **Use:** When the population standard deviation ($\sigma$) is unknown and the sample size ($n$) is small ($n < 30$).

### Correlation: Introduction

- **Definition:** A statistical measure that expresses the extent to which two variables are linearly related (i.e., they change together at a constant rate).
- **Correlation Coefficient (r):** A numerical value that quantifies the strength and direction of a linear relationship.
  - Ranges from -1 to +1.
  - **+1:** Perfect positive linear relationship.
  - **-1:** Perfect negative linear relationship.
  - **0:** No linear relationship (a nonlinear relationship may still exist).
- **Important Note:** Correlation does NOT imply causation.

### Pearson Correlation Coefficient (r)

- **Use:** To measure the strength and direction of a linear relationship between two quantitative variables.
- **Formula:**
  $$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$
  Alternatively, using Z-scores $Z_{x_i} = (x_i - \bar{x})/s_x$ and $Z_{y_i} = (y_i - \bar{y})/s_y$, where $s_x$ and $s_y$ are the sample standard deviations:
  $$r = \frac{1}{n-1} \sum Z_x Z_y$$
- **Interpretation of r:**
  - **Magnitude** (common rules of thumb; conventions vary by field):
    - $|r| \approx 0.1$ to $0.3$: Weak linear relationship
    - $|r| \approx 0.3$ to $0.5$: Moderate linear relationship
    - $|r| > 0.5$: Strong linear relationship
  - **Sign:**
    - Positive $r$: As one variable increases, the other tends to increase.
    - Negative $r$: As one variable increases, the other tends to decrease.

### Coefficient of Determination ($R^2$)

- **Definition:** In simple linear regression (a single predictor), $R^2$ equals the square of the correlation coefficient, $r^2$.
- **Interpretation:** Represents the proportion of the variance in the dependent variable that can be predicted from the independent variable(s).
  - E.g., if $R^2 = 0.60$, then 60% of the variation in Y can be explained by its linear relationship with X.

### Correlation vs. Causation

- **Correlation:** Simply indicates that two variables tend to change together.
- **Causation:** Means that a change in one variable directly causes a change in another variable.
- **Spurious Correlation:** When two variables are correlated but the relationship is due to a third, unobserved variable (confounding variable) or pure chance.
  - **Example:** Ice cream sales and drowning incidents are correlated, but neither causes the other; both are influenced by warm weather.
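The Normal Distribution section's PDF, Z-score formula, and Empirical Rule can be checked numerically. Below is a minimal sketch using only Python's standard library, with the standard normal CDF written in terms of `math.erf`. The function names and the example values $\mu = 100$, $\sigma = 15$ are illustrative assumptions, not taken from the notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density f(x | mu, sigma) of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def share_within_k_sd(k):
    """P(-k <= Z <= k): fraction of a normal population within k SDs of the mean."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

# Hypothetical example values: mu = 100, sigma = 15
print(z_score(130, 100, 15))  # 2.0
for k in (1, 2, 3):
    print(k, round(share_within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The printed shares are the exact values behind the rounded 68%, 95%, and 99.7% figures. Because every normal variable standardizes to $Z$, the same three numbers hold for any $\mu$ and $\sigma$.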
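The Z-test for Population Mean and the P-value approach from the hypothesis-testing steps can be sketched as a single function. It computes $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, converts it to a P-value, and rejects $H_0$ when P-value $\le \alpha$. This assumes $\sigma$ is known. The function name, the `alternative` parameter, and the sample figures are hypothetical choices made for illustration.

```python
import math

def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Hypothetical helper: one-sample z-test with known sigma.

    Returns (z statistic, P-value, decision string).
    """
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    if alternative == "two-sided":    # H1: mu != mu_0
        p = 2 * (1 - std_normal_cdf(abs(z)))
    elif alternative == "greater":    # H1: mu > mu_0
        p = 1 - std_normal_cdf(z)
    elif alternative == "less":       # H1: mu < mu_0
        p = std_normal_cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    decision = "Reject H0" if p <= alpha else "Fail to reject H0"
    return z, p, decision

# Hypothetical sample: x_bar = 103, mu_0 = 100, sigma = 15
z, p, decision = one_sample_z_test(103, 100, 15, n=36)
print(round(z, 2), round(p, 4), decision)   # 1.2 0.2301 Fail to reject H0
z, p, decision = one_sample_z_test(103, 100, 15, n=100)
print(round(z, 2), round(p, 4), decision)   # 2.0 0.0455 Reject H0
```

The same 3-point difference is not significant at $n = 36$ but is at $n = 100$. The larger sample shrinks the standard error $\sigma/\sqrt{n}$, which raises the power of the test.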
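Both Pearson formulas, the deviation-sum form and the Z-score form with sample standard deviations, should give the same $r$. Squaring that $r$ gives $R^2$ in the single-predictor case. The sketch below checks this in plain Python. The function names and the five-point data set are illustrative assumptions, not from the notes.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson r via the deviation-sum formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def pearson_r_from_z(xs, ys):
    """Pearson r as sum(Z_x * Z_y) / (n - 1), using sample SDs (divide by n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data (hypothetical)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))                         # 0.7746
print(round(pearson_r_from_z(xs, ys), 4))  # 0.7746
print(round(r ** 2, 4))                    # 0.6 (R^2 with one predictor)
```

Here $r \approx 0.77$ is a strong positive linear relationship by the rule of thumb above. $R^2 = 0.60$ says a straight-line fit on `xs` accounts for 60% of the variation in `ys`, though this says nothing about causation. On Python 3.10+, `statistics.correlation` also computes Pearson's $r$.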