Exam Prep Plan & Cheatsheet

Shared 4/20/2026•118 views

### Exam Preparation Plan (May 4th Exam)

#### Overview

This plan focuses on revising concepts from Contact Sessions 1, 2, and 3, practicing problem-solving, and memorizing key formulas. The exam is on May 4th, so effective time management is crucial.

#### Daily Schedule Suggestion (Adapt as needed)

**Day 1: Concept Review - Contact Session 1**

- **Morning (2-3 hours):**
  - Review notes/materials for Contact Session 1.
  - Focus on understanding core definitions and principles.
  - Read through the "Cheatsheet: Contact Session 1" below.
- **Afternoon (2-3 hours):**
  - Work through assigned problems/examples related to Contact Session 1.
  - Identify areas of weakness.
- **Evening (1 hour):**
  - Review formulas for Contact Session 1.
  - Create flashcards for difficult concepts or formulas.

**Day 2: Concept Review - Contact Session 2**

- **Morning (2-3 hours):**
  - Review notes/materials for Contact Session 2.
  - Focus on understanding core definitions and principles.
  - Read through the "Cheatsheet: Contact Session 2" below.
- **Afternoon (2-3 hours):**
  - Work through assigned problems/examples related to Contact Session 2.
  - Identify areas of weakness.
- **Evening (1 hour):**
  - Review formulas for Contact Session 2.
  - Create flashcards for difficult concepts or formulas.

**Day 3: Concept Review - Contact Session 3**

- **Morning (2-3 hours):**
  - Review notes/materials for Contact Session 3.
  - Focus on understanding core definitions and principles.
  - Read through the "Cheatsheet: Contact Session 3" below.
- **Afternoon (2-3 hours):**
  - Work through assigned problems/examples related to Contact Session 3.
  - Identify areas of weakness.
- **Evening (1 hour):**
  - Review formulas for Contact Session 3.
  - Create flashcards for difficult concepts or formulas.

**Day 4: Integrated Practice & Weakness Targeting**

- **Morning (3-4 hours):**
  - Attempt a mock exam or a comprehensive set of mixed problems from all three sessions.
  - Time yourself to simulate exam conditions.
- **Afternoon (2-3 hours):**
  - Review your answers from the morning session.
  - Identify common mistakes and topics that still require attention.
  - Revisit specific sections of your notes or the cheatsheet.
- **Evening (1 hour):**
  - Focused review of formulas and concepts from your weakest areas.

**Day 5: Final Review & Rest (Day before exam)**

- **Morning (2-3 hours):**
  - Quick scan of all cheatsheets and formula lists.
  - Do a few light practice problems, but avoid intense new problem-solving.
  - Focus on confidence-building.
- **Afternoon/Evening:**
  - Relax, eat well, and ensure you get a good night's sleep. Avoid cramming late into the night.

#### Important Tips

- **Active Recall:** Don't just passively read. Test yourself regularly.
- **Spaced Repetition:** Revisit topics multiple times over several days.
- **Problem Solving:** Practice is key. Understand *why* an answer is correct/incorrect.
- **Formula Sheets:** Create and use your own formula sheet during practice.
- **Time Management:** Stick to your schedule but be flexible if a topic needs more time.
- **Google Drive:** Utilize the provided Google Drive link for all course materials, assignments, and capstone project proposal (as a reference for applied concepts).

### Cheatsheet: Contact Session 1 - Introduction to Data Science & Statistics

#### Key Concepts

- **Data Science Lifecycle:** Problem definition, data acquisition, data cleaning/preparation, exploratory data analysis (EDA), modeling, evaluation, deployment.
- **Types of Data:**
  - **Quantitative:** Numerical (discrete, continuous).
  - **Qualitative/Categorical:** Non-numerical (nominal, ordinal).
- **Measures of Central Tendency:**
  - **Mean ($\bar{x}$):** Average. Sensitive to outliers.
  - **Median:** Middle value when data is ordered. Robust to outliers.
  - **Mode:** Most frequent value.
- **Measures of Dispersion:**
  - **Range:** Max - Min.
  - **Variance ($\sigma^2$ or $s^2$):** Average of squared differences from the mean.
  - **Standard Deviation ($\sigma$ or $s$):** Square root of variance. Measures spread.
  - **Interquartile Range (IQR):** $Q_3 - Q_1$. Range of the middle 50% of data. Robust to outliers.
- **Data Visualization:** Histograms, Box plots, Scatter plots, Bar charts.
- **Probability Basics:**
  - **Experiment:** Process with uncertain outcomes.
  - **Outcome:** Result of an experiment.
  - **Sample Space ($S$):** Set of all possible outcomes.
  - **Event ($E$):** Subset of the sample space.
  - **Probability ($P(E)$):** Likelihood of an event occurring. $0 \le P(E) \le 1$.
  - **Complement ($E^c$):** Event does not occur. $P(E^c) = 1 - P(E)$.
- **Types of Probability:**
  - **Classical:** Equally likely outcomes. $P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$.
  - **Empirical/Relative Frequency:** Based on observations. $P(E) = \frac{\text{Number of times E occurred}}{\text{Total number of trials}}$.
  - **Subjective:** Based on personal judgment.
- **Rules of Probability:**
  - **Addition Rule:**
    - **Mutually Exclusive:** $P(A \cup B) = P(A) + P(B)$
    - **General:** $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  - **Multiplication Rule:**
    - **Independent Events:** $P(A \cap B) = P(A) \times P(B)$
    - **Dependent Events:** $P(A \cap B) = P(A) \times P(B|A)$
  - **Conditional Probability:** $P(B|A) = \frac{P(A \cap B)}{P(A)}$, where $P(A) > 0$.
  - **Bayes' Theorem:** $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

#### Key Formulas (Contact Session 1)

- **Mean:** $\bar{x} = \frac{\sum x_i}{n}$
- **Sample Variance:** $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
- **Population Variance:** $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
- **Standard Deviation:** $s = \sqrt{s^2}$ or $\sigma = \sqrt{\sigma^2}$
- **Z-score:** $z = \frac{x - \mu}{\sigma}$ (for population) or $z = \frac{x - \bar{x}}{s}$ (for sample)
- **IQR:** $Q_3 - Q_1$
- **General Addition Rule:** $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- **Conditional Probability:** $P(B|A) = \frac{P(A \cap B)}{P(A)}$
- **Bayes' Theorem:** $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

#### Application Examples

- **Descriptive Statistics:** Calculate mean, median, mode, standard deviation for a given dataset of student scores. Interpret the spread of scores.
- **Probability:** Given a deck of cards, calculate the probability of drawing a King or a Heart. Calculate the probability of drawing two Kings without replacement.
- **Conditional Probability:** If 60% of students pass Math and 70% pass English, and 40% pass both, what is the probability a student passed Math given they passed English?

### Cheatsheet: Contact Session 2 - Probability Distributions & Sampling

#### Key Concepts

- **Random Variable:** A variable whose value is a numerical outcome of a random phenomenon.
  - **Discrete Random Variable:** Takes on a finite or countably infinite number of values (e.g., number of heads in coin flips).
  - **Continuous Random Variable:** Takes on any value within a given range (e.g., height, temperature).
- **Probability Distribution:** Describes the probabilities of all possible outcomes for a random variable.
  - **Probability Mass Function (PMF):** For discrete variables, $P(X=x)$.
  - **Probability Density Function (PDF):** For continuous variables, $f(x)$ where $\int_{-\infty}^{\infty} f(x) dx = 1$.
  - **Cumulative Distribution Function (CDF):** $F(x) = P(X \le x)$.
- **Expected Value ($E[X]$):** The long-run average of a random variable.
  - **Discrete:** $E[X] = \sum x \cdot P(X=x)$
  - **Continuous:** $E[X] = \int x \cdot f(x) dx$
- **Variance of a Random Variable ($Var(X)$):** Measures the spread of the distribution.
  - $Var(X) = E[X^2] - (E[X])^2$
- **Common Discrete Distributions:**
  - **Bernoulli Distribution:** Single trial, two outcomes (success/failure). $P(X=1)=p, P(X=0)=1-p$.
  - **Binomial Distribution:** Number of successes in a fixed number ($n$) of independent Bernoulli trials. $X \sim B(n, p)$.
  - **Poisson Distribution:** Number of events in a fixed interval of time or space, given a constant average rate ($\lambda$). $X \sim P(\lambda)$.
- **Common Continuous Distributions:**
  - **Uniform Distribution:** All values within a given interval are equally likely.
  - **Normal Distribution:** Bell-shaped, symmetric, characterized by mean ($\mu$) and standard deviation ($\sigma$). $X \sim N(\mu, \sigma^2)$.
  - **Standard Normal Distribution:** $Z \sim N(0, 1)$, mean 0, std dev 1.
- **Central Limit Theorem (CLT):** For a sufficiently large sample size (a common rule of thumb is $n \ge 30$), the sampling distribution of the sample mean ($\bar{X}$) will be approximately normal, regardless of the shape of the population distribution (provided its variance is finite), with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ (Standard Error).
- **Sampling:**
  - **Population:** Entire group of interest.
  - **Sample:** Subset of the population.
  - **Sampling Methods:** Simple random sampling, stratified sampling, cluster sampling, systematic sampling.
  - **Sampling Error:** Difference between sample statistic and population parameter.
  - **Bias:** Systematic error in measurement or selection.
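The Central Limit Theorem statement above can be checked empirically. The sketch below is an illustration only and not part of the course materials: it assumes an Exponential(1) population (chosen because it is clearly non-normal, with $\mu = 1$ and $\sigma = 1$), draws 5,000 samples of size $n = 50$, and compares the spread of the sample means with $\sigma / \sqrt{n}$. The function name `simulate_sample_means`, the seed, and the sample counts are arbitrary choices.

```python
import math
import random
import statistics


def simulate_sample_means(n, repetitions, rate=1.0, seed=0):
    """Return `repetitions` sample means, each computed from n Exponential(rate) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(repetitions)
    ]


n = 50
means = simulate_sample_means(n=n, repetitions=5000)

mu = 1.0     # mean of the assumed Exponential(1) population
sigma = 1.0  # standard deviation of the assumed Exponential(1) population
standard_error = sigma / math.sqrt(n)

print(f"mean of sample means: {statistics.fmean(means):.3f} (theory: {mu:.3f})")
print(f"std of sample means:  {statistics.stdev(means):.3f} (theory: {standard_error:.3f})")
```

The two printed values should land close to $\mu$ and $\sigma / \sqrt{n}$ respectively; a histogram of `means` would also look roughly bell-shaped even though the population itself is strongly skewed.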
#### Key Formulas (Contact Session 2)

- **Expected Value (Discrete):** $E[X] = \sum_{x} x \cdot P(X=x)$
- **Variance (Discrete):** $Var(X) = \sum_{x} (x - E[X])^2 P(X=x)$ or $E[X^2] - (E[X])^2$
- **Binomial PMF:** $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$
  - $E[X] = np$, $Var(X) = np(1-p)$
- **Poisson PMF:** $P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$
  - $E[X] = \lambda$, $Var(X) = \lambda$
- **Z-score for Normal Distribution:** $Z = \frac{X - \mu}{\sigma}$
- **Standard Error of the Mean:** $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
- **Z-score for Sample Mean (CLT):** $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

#### Application Examples

- **Binomial Distribution:** Calculate the probability of getting exactly 3 heads in 5 coin flips.
- **Poisson Distribution:** If a call center receives an average of 5 calls per hour, what is the probability of receiving exactly 7 calls in the next hour?
- **Normal Distribution:** Given a population with mean $\mu$ and std dev $\sigma$, find the probability that a randomly selected value falls within a certain range. Use Z-tables.
- **Central Limit Theorem:** If the average height of students is 170cm with a std dev of 10cm, what is the probability that the mean height of a sample of 50 students is greater than 172cm?

### Cheatsheet: Contact Session 3 - Hypothesis Testing & Regression

#### Key Concepts

- **Inferential Statistics:** Using sample data to make inferences about a population.
- **Estimation:**
  - **Point Estimate:** A single value used to estimate a population parameter (e.g., sample mean $\bar{x}$ for population mean $\mu$).
  - **Confidence Interval (CI):** A range of values likely to contain the population parameter with a certain level of confidence (e.g., 95% CI).
    - **Interpretation:** If we were to take many samples and construct a CI for each, approximately X% of these intervals would contain the true population parameter.
- **Hypothesis Testing:** A statistical method used to make decisions about a population based on sample data.
  - **Null Hypothesis ($H_0$):** A statement of no effect or no difference. Assumed true until evidence suggests otherwise.
  - **Alternative Hypothesis ($H_1$ or $H_a$):** A statement that contradicts $H_0$. The claim we seek evidence for.
  - **Types of Errors:**
    - **Type I Error:** Rejecting $H_0$ when it is true (False Positive). Its probability, $\alpha$, is the significance level.
    - **Type II Error:** Failing to reject $H_0$ when it is false (False Negative). Its probability is denoted $\beta$.
  - **P-value:** The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, *assuming $H_0$ is true*.
  - **Decision Rule:** If P-value $\le \alpha$, reject $H_0$. If P-value $> \alpha$, fail to reject $H_0$.
  - **Test Statistic:** A value calculated from sample data used to test the hypothesis (e.g., Z-statistic, T-statistic).
- **Common Hypothesis Tests:**
  - **Z-test:** For population mean when population standard deviation ($\sigma$) is known, or for large sample sizes ($n \ge 30$).
  - **T-test:** For population mean when population standard deviation ($\sigma$) is unknown and sample size is small ($n < 30$).

### Practice Problems

#### Contact Session 1: Data & Probability

1. **Descriptive Statistics:** Given the dataset: `[15, 22, 18, 25, 30, 10, 20, 28, 12, 18]`.
   * Calculate the mean, median, and mode.
   * Calculate the range, variance, and standard deviation.
   * Identify any outliers using the 1.5*IQR rule.
2. **Probability:** A bag contains 5 red, 3 blue, and 2 green marbles.
   * What is the probability of drawing a red marble?
   * What is the probability of drawing a blue or a green marble?
   * If you draw two marbles without replacement, what is the probability that both are red?
   * What is the probability that the second marble is blue, given the first was red?
3. **Bayes' Theorem:** A medical test for a disease has 95% sensitivity (correctly identifies diseased individuals) and 98% specificity (correctly identifies healthy individuals). The prevalence of the disease in the population is 1%. If a person tests positive, what is the probability they actually have the disease?

#### Contact Session 2: Distributions & Sampling

1. **Binomial Distribution:** A biased coin lands on heads with a probability of 0.6. If you flip the coin 8 times:
   * What is the probability of getting exactly 5 heads?
   * What is the probability of getting at least 6 heads?
   * Calculate the expected number of heads and the variance.
2. **Poisson Distribution:** On average, a website receives 12 unique visitors per minute.
   * What is the probability that it receives exactly 10 visitors in the next minute?
   * What is the probability that it receives more than 15 visitors in the next minute?
3. **Normal Distribution:** The scores on a standardized test are normally distributed with a mean of 500 and a standard deviation of 100.
   * What proportion of test-takers score above 650?
   * What is the score below which 25% of test-takers fall?
4. **Central Limit Theorem:** The average weight of a certain type of apple is 150g with a standard deviation of 15g. If you take a random sample of 40 apples:
   * What is the probability that the sample mean weight is between 145g and 155g?
   * What is the probability that the sample mean weight is less than 140g?

#### Contact Session 3: Hypothesis Testing & Regression

1. **Confidence Interval:** A sample of 36 students has an average GPA of 3.2 with a sample standard deviation of 0.5. Construct a 90% confidence interval for the true average GPA of all students.
2. **One-Sample T-test:** A company claims that its new energy drink improves reaction time. The average reaction time for the general population is 0.25 seconds. A sample of 15 individuals who consumed the drink had an average reaction time of 0.22 seconds with a standard deviation of 0.04 seconds. At $\alpha = 0.05$, test if the drink significantly reduces reaction time.
3. **Chi-Square Test of Independence:** A survey asks 100 people about their gender and their preference for coffee or tea. The results are:
   * Coffee: 30 males, 20 females
   * Tea: 15 males, 35 females
   * Is there a significant association between gender and beverage preference at $\alpha = 0.05$?
4. **Linear Regression:** Given the following data points for advertising spend (X) and sales (Y): `X = [10, 15, 20, 25, 30]`, `Y = [50, 60, 70, 85, 90]`
   * Calculate the slope ($b_1$) and y-intercept ($b_0$) for the regression line.
   * Predict sales for an advertising spend of 22.
   * Calculate the Pearson correlation coefficient ($r$).
   * Interpret the $R^2$ value (you don't need to calculate it fully, just explain what it means in this context).
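One way to check hand calculations for the linear regression problem above is a short script. The sketch below applies the standard least-squares formulas $b_1 = S_{xy} / S_{xx}$, $b_0 = \bar{y} - b_1 \bar{x}$, and $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ to the data from that problem. The helper name `fit_line` is illustrative and does not come from the course materials.

```python
import math


def fit_line(xs, ys):
    """Return (intercept b0, slope b1, Pearson r) for paired data via least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products around the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return b0, b1, r


X = [10, 15, 20, 25, 30]  # advertising spend
Y = [50, 60, 70, 85, 90]  # sales

b0, b1, r = fit_line(X, Y)
prediction = b0 + b1 * 22

print(f"slope b1 = {b1:.2f}, intercept b0 = {b0:.2f}")
print(f"predicted sales at X = 22: {prediction:.1f}")
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")
```

For this data the script reports a slope of 2.10, an intercept of 29.00, a prediction of 75.2, and $r \approx 0.9922$. In simple linear regression $R^2 = r^2$, so roughly 98% of the variation in sales is accounted for by the linear relationship with advertising spend.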
