1. Data Representation

Frequency Distributions
- Relative Frequency: $ \frac{\text{Frequency}}{\text{Total Number of Observations}} $
- Expressed as a percentage (multiply by 100), rounded to one decimal place.

Histograms
- Graphical representation of a frequency distribution for numerical data.
- Each bar represents a range of values (bin).
- Height of each bar indicates frequency or relative frequency.
- Sample Size (n): Sum of all frequencies from the histogram.

Distribution Shapes
- Uniform: All values have roughly equal frequency.
- Bell-Shaped (Normal): Symmetric, with a peak in the middle and tapering tails.
- Right Skewed (Positively Skewed): Tail extends to the right; typically mean > median.
- Left Skewed (Negatively Skewed): Tail extends to the left; typically mean < median.

2. Measures of Central Tendency

Mean ($\bar{x}$ for a sample, $\mu$ for a population)
- Definition: Average of all values; the sum of the values divided by the count.
- Formula: $ \bar{x} = \frac{\sum x_i}{n} $
- Sensitive to outliers.

Median
- Definition: Middle value when the data is ordered.
- If $n$ is odd, the median is the middle value.
- If $n$ is even, the median is the average of the two middle values.
- Not sensitive to outliers.

Mode
- Definition: Value that appears most frequently in a data set.
- A data set can have one mode (unimodal), multiple modes (multimodal), or no mode.
- Useful for categorical data.

3. Measures of Spread (Variability)

Range
- Definition: Difference between the maximum and minimum values.
- Formula: $ \text{Range} = \text{Max} - \text{Min} $
- Highly sensitive to outliers.

Standard Deviation ($\sigma$ for a population, $s$ for a sample)
- Definition: Measures the typical distance of the data values from the mean.
- A small standard deviation indicates data points are close to the mean.
- A large standard deviation indicates data points are spread out.
- Sensitive to outliers.

4. Five-Number Summary and Boxplots

Five-Number Summary
- Consists of:
  - Minimum Value
  - First Quartile (Q1)
  - Median (Q2)
  - Third Quartile (Q3)
  - Maximum Value
- Q1: 25th percentile (25% of data is below Q1).
- Q2 (Median): 50th percentile (50% of data is below Q2).
- Q3: 75th percentile (75% of data is below Q3).

Boxplots
- Visual representation of the five-number summary.
- Box extends from Q1 to Q3; its length is the Interquartile Range, $IQR = Q3 - Q1$.
- Line inside the box marks the Median.
- Whiskers extend to the minimum and maximum values. When outliers are shown, the whiskers instead stop at the most extreme values within $1.5 \times IQR$ of the quartiles, and points beyond that are plotted individually as outliers.

5. Percentiles and Quartiles

Percentiles
- Definition: A value below which a given percentage of observations falls.
- The $P$-th percentile is the value such that $P\%$ of the data falls at or below it.
- Example: The 66th percentile means 66% of values are at or below that point.

Quartiles as Percentiles
- Q1 = 25th percentile
- Q2 = 50th percentile (Median)
- Q3 = 75th percentile

Calculating Number of Observations for Percentiles
- If a sample has $N$ observations, the number of observations at or below the $P$-th percentile is approximately $N \times \frac{P}{100}$.
- The number of observations at or above the $P$-th percentile is approximately $N \times \frac{(100-P)}{100}$.

6. Impact of Outliers

Sensitive to Outliers:
- Mean
- Range
- Standard Deviation

Resistant to Outliers:
- Median
- Mode

Outliers have very little or no effect on the Median and Mode.
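
The measures summarized in this sheet can be tried out with a short Python sketch using only the standard library. The data values, the bin frequencies, and the helper names (`relative_frequencies`, `summarize`, `count_at_or_below`, `count_at_or_above`) are made up for illustration. The quartiles use the "inclusive" interpolation method, which is only one of several common conventions, so a textbook or calculator may report slightly different Q1 and Q3 values.

```python
import statistics


def relative_frequencies(frequencies):
    """Relative frequency of each bin as a percentage, rounded to one decimal place."""
    n = sum(frequencies)  # sample size n = sum of all frequencies
    return [round(100 * f / n, 1) for f in frequencies]


def summarize(values):
    """Center, spread, and five-number summary of a list of numbers."""
    ordered = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions give slightly different Q1/Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "modes": statistics.multimode(ordered),
        "range": ordered[-1] - ordered[0],
        "stdev": statistics.stdev(ordered),  # sample standard deviation s
        "five_number": (ordered[0], q1, q2, q3, ordered[-1]),
    }


def count_at_or_below(n, p):
    """Approximate number of observations at or below the p-th percentile."""
    return n * p / 100


def count_at_or_above(n, p):
    """Approximate number of observations at or above the p-th percentile."""
    return n * (100 - p) / 100


data = [2, 4, 4, 5, 7, 9, 11]   # hypothetical sample
with_outlier = data + [100]      # same sample plus one extreme value

base = summarize(data)
skewed = summarize(with_outlier)

for name in ("mean", "median", "modes", "range", "stdev"):
    print(f"{name:>6}: {base[name]!r:>20} -> {skewed[name]!r}")
print("five-number summary:", base["five_number"])
print("relative frequencies (%):", relative_frequencies([5, 10, 5]))
print("at or below P66 of 200:", count_at_or_below(200, 66))
```

In this made-up sample, adding the single value 100 moves the mean from 6 to 17.75 and the range from 9 to 98, while the median only shifts from 5 to 6 and the mode stays at 4, which matches the sensitive and resistant groupings in section 6.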