Research Data & Statistics
Cheatsheet Content
### Primary Data

- **Definition:** Original data collected directly by a researcher for a specific study. It's fresh, original, and unpublished.

#### Characteristics

- **Originality:** Collected for the first time.
- **Specific Purpose:** Gathered for a particular research objective.
- **Reliability:** Generally accurate as it comes directly from the source.
- **Time-Consuming & Costly:** Requires significant effort, planning, time, and resources.

#### Methods of Collection

1. **Direct Personal Investigation:** Investigator personally visits respondents for face-to-face interaction.
2. **Indirect Oral Investigation:** Data obtained from individuals well-informed about the subject, rather than directly from the concerned persons.
3. **Schedules and Questionnaires:**
   - **Schedule:** Structured form filled by the investigator during personal visits.
   - **Questionnaire:** Set of questions sent to respondents to fill out themselves.
4. **Local Correspondents Method:** Using local agents or representatives who provide information.

### Essentials of a Good Questionnaire

- **Clear Objective:** Purpose clearly defined.
- **Simplicity of Language:** Questions easy to understand.
- **Brevity:** Concise and not unnecessarily lengthy.
- **Logical Order:** Questions arranged systematically.
- **Specific & Definite Questions:** Clear and unambiguous.
- **Avoid Leading Questions:** Should not influence answers.
- **Use Open & Closed Questions:** Balances detailed and structured responses.
- **Pre-testing (Pilot Survey):** Test before final use.
- **Attractive Layout:** Neat and appealing format encourages participation.
- **Consider Respondents' Ability:** Match knowledge and understanding.
- **Confidentiality:** Assure information privacy.
- **Avoid Sensitive Questions:** Unless essential.
- **Adequate Number of Questions:** Enough to meet objective without burdening.
- **Proper Instructions:** Clear guidance for answering.
- **Easy to Code & Tabulate:** Responses simple to classify and analyze.
- **Personal Identification:** Include if required.
- **Time Consideration:** Reasonable completion time.

### Secondary Data

- **Definition:** Information already collected, processed, and published by someone else, used for a new study.

#### Sources

- **Internal Sources:** Data from within an organization (e.g., company records, reports).
- **External Sources:** Data from outside the organization (e.g., government publications, journals, websites).

#### Merits

- **Time-Saving:** Readily available.
- **Economical:** Less expensive than primary data.
- **Convenient:** Easy to access and use.
- **Helps Primary Data Collection:** Provides background for research.
- **Large Scope:** Covers a wide range of topics and geographical areas.
- **Useful for Comparison:** Compares current findings with past data.
- **Basis for Forecasting:** Helps predict future trends.

### Measures of Central Tendency

- **Definition:** A single representative value that summarizes the characteristics of a whole group of data.
- **Types:** Mean, Median, Mode, Geometric Mean (GM), Harmonic Mean (HM).

#### Key Measures

- **Mean (Arithmetic Mean):** The sum of observations divided by the number of observations.
- **Median:** The middle value in an ordered dataset.
- **Mode:** The value that appears most frequently in a dataset.
- **Geometric Mean (GM):** For $n$ positive values, the $n$-th root of their product.
- **Harmonic Mean (HM):** Reciprocal of the mean of the reciprocals of the values.

### Measures of Dispersion

- **Definition:** Describes the spread or scatter of values in a series (variability).

#### Types

1. **Absolute Measures:** Expressed in the same unit as the data.
   - Range
   - Quartile Deviation
   - Mean Deviation
   - Standard Deviation
2. **Relative Measures (Coefficient of Dispersion):** Ratio of a measure of dispersion to an appropriate average. Unitless, useful for comparison.
   - Coefficient of Range
   - Coefficient of Quartile Deviation
   - Coefficient of Mean Deviation
   - Coefficient of Variation

#### Key Measures

- **Range:** Difference between the highest and lowest values ($\text{Max} - \text{Min}$).
- **Quartile Deviation:** Half the difference between the third and first quartiles: $(Q_3 - Q_1)/2$.
- **Mean Deviation:** Arithmetic mean of the absolute deviations of all values from their average.
- **Standard Deviation ($\sigma$):** Square root of the mean of the squared deviations from the arithmetic mean.
  - For a population: $$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
  - For a sample: $$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$
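
#### Worked Example (Python Sketch)

The measures of central tendency and dispersion defined above can be checked numerically. The following is a minimal sketch, not a definitive implementation, using Python's standard `statistics` module (Python 3.8+). The dataset and the helper name `describe` are hypothetical, chosen only for illustration. The coefficient formulas are the common textbook forms and are not stated in this cheatsheet. Quartiles depend on the interpolation convention; this sketch uses the default method of `statistics.quantiles`, so other textbooks or tools may give slightly different $Q_1$ and $Q_3$.

```python
import statistics


def describe(values):
    """Return common measures of central tendency and dispersion.

    `describe` is an illustrative helper, not part of any library.
    Assumes `values` is a non-empty list of positive numbers
    (positivity is required by the geometric and harmonic means).
    """
    n = len(values)
    lowest, highest = min(values), max(values)

    # Measures of central tendency
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    gm = statistics.geometric_mean(values)
    hm = statistics.harmonic_mean(values)

    # Absolute measures of dispersion (same unit as the data)
    value_range = highest - lowest
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    quartile_deviation = (q3 - q1) / 2
    # Mean deviation taken about the arithmetic mean
    mean_deviation = sum(abs(x - mean) for x in values) / n
    population_sd = statistics.pstdev(values)  # divides by N
    sample_sd = statistics.stdev(values)       # divides by n - 1

    # Relative measures (unitless); common textbook forms, assumed here
    coeff_range = (highest - lowest) / (highest + lowest)
    coeff_qd = (q3 - q1) / (q3 + q1)
    coeff_md = mean_deviation / mean
    coeff_variation = population_sd / mean * 100  # as a percentage

    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "geometric_mean": gm,
        "harmonic_mean": hm,
        "range": value_range,
        "quartile_deviation": quartile_deviation,
        "mean_deviation": mean_deviation,
        "population_sd": population_sd,
        "sample_sd": sample_sd,
        "coeff_range": coeff_range,
        "coeff_quartile_deviation": coeff_qd,
        "coeff_mean_deviation": coeff_md,
        "coeff_variation_pct": coeff_variation,
    }


# Hypothetical sample data, for illustration only
data = [2, 4, 4, 5, 8, 10, 16]
summary = describe(data)

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Note that the sample standard deviation is always at least as large as the population figure for the same data, because it divides by $n-1$ rather than $N$.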