Mastering the Variables of Data
Choosing the correct statistical test begins with understanding your data. Is it a label or a number? Can it be ranked? Is it counted or measured? Let's decode the classification of statistical variables.
The Primary Split
Qualitative vs. Quantitative
The first step in data analysis is determining if your variable represents a quality (category) or a quantity (number).
Qualitative (Categorical)
Labels, groups, or names. Mathematical operations (like averaging) don't make sense here.
Quantitative (Numerical)
Values that measure or count something. Differences between numbers are meaningful.
Example Dataset Composition
In a typical health survey, variables are often a mix of both types.
Deep Dive: Qualitative Data
Does the order matter? This is the key question separating Nominal from Ordinal data.
Type 1 Nominal
Categories that are just names. There is no logical order (e.g., Red is not "higher" than Blue).
Example: Blood Groups
A classic nominal variable. You cannot rank Blood Type A over O.
Type 2 Ordinal
Categories that possess a clear rank or order, but the distance between them is unknown.
Example: Patient Satisfaction
Likert-type rating scales (Poor, Fair, Good, Excellent) have a direction, but the gap between Fair and Good need not equal the gap between Good and Excellent.
Deep Dive: Quantitative Data
Are we counting whole items or measuring on a continuous scale?
Type 3 Continuous
Variables that can take any value within a range, including decimals, so the number of possible values is infinite.
Example: Body Mass Index (BMI)
BMI can be 22.5, 22.51, etc. It flows continuously.
Type 4 Discrete
Variables restricted to whole numbers (integers). You count them.
Example: Number of Children
You can have 2 or 3 children, but not 2.5.
Statistical Test Prerequisites
Normality: Parametric vs. Non-parametric
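Once the variable type is known, the next critical step is assessing the **Normality of Data**. This determines whether we apply the more powerful Parametric tests or the distribution-free Non-parametric alternatives, which make fewer assumptions about the shape of the data but still assume, for example, independent observations.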
Parametric Tests
- Assumption: Data are approximately normally distributed.
- Measures: Summarized by Mean $\pm$ SD (Standard Deviation).
- Test Power: More powerful (when assumptions met).
- Examples: t-test, ANOVA, Pearson correlation.
Non-parametric Tests
- Assumption: No normality required.
- Measures: Summarized by Median (IQR - Interquartile Range).
- Test Power: Less powerful than the parametric equivalent when the data really are normal.
- Examples: Mann-Whitney, Kruskal-Wallis, Spearman.
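The two "Measures" lines above can be illustrated with a minimal sketch using Python's standard `statistics` module. The function names and the haemoglobin values are hypothetical, chosen only to show that a single extreme value shifts the mean and SD far more than the median and IQR.

```python
import statistics

def summarize_parametric(values):
    """Return (mean, sample standard deviation), the usual parametric summary."""
    return statistics.mean(values), statistics.stdev(values)

def summarize_nonparametric(values):
    """Return (median, (Q1, Q3)), the usual non-parametric summary.

    statistics.quantiles uses its default 'exclusive' method here;
    other software may compute quartiles slightly differently.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return median, (q1, q3)

# Hypothetical haemoglobin values (g/dL); the last one is an outlier.
hb = [12.1, 12.8, 13.0, 13.4, 13.9, 14.2, 21.5]

mean, sd = summarize_parametric(hb)
median, (q1, q3) = summarize_nonparametric(hb)
print(f"Mean ± SD:    {mean:.2f} ± {sd:.2f}")
print(f"Median (IQR): {median:.2f} ({q1:.2f} to {q3:.2f})")
```

Removing the last value from `hb` and rerunning the sketch shows the mean and SD changing noticeably while the median barely moves.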
Comparative Test Profile
Visual comparison across key statistical criteria. Higher score is generally better/more robust.
Non-Parametric Analysis
The Non-Parametric Toolkit
Non-parametric tests are the fallback when the assumptions of parametric tests are not met, particularly for non-normally distributed or ordinal data.
When to Choose Non-Parametric?
- ★ Data is not normally distributed
- ★ Sample size is small ($n < 30$), so normality cannot be checked reliably
- ★ Ordinal or ranked data is used
- ★ Outliers are present and cannot be removed
- ★ Likert scores or other subjective rating scales are used
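The checklist above can be written as a small helper. This is only a sketch: the function name, its parameters, and the wording of the returned reasons are hypothetical, and the cutoff of 30 is the rule of thumb from the list, not a strict threshold.

```python
from typing import List

def nonparametric_reasons(is_normal: bool,
                          n: int,
                          is_ordinal: bool = False,
                          has_unremovable_outliers: bool = False,
                          small_sample_cutoff: int = 30) -> List[str]:
    """Return the checklist items that point towards a non-parametric test.

    An empty list means none of the listed criteria apply.
    """
    reasons = []
    if not is_normal:
        reasons.append("data are not normally distributed")
    if n < small_sample_cutoff:
        reasons.append(f"sample size is small (n < {small_sample_cutoff})")
    if is_ordinal:
        reasons.append("data are ordinal, ranked, or from a rating scale")
    if has_unremovable_outliers:
        reasons.append("outliers are present and cannot be removed")
    return reasons

# Hypothetical example: 12 patients rated on an ordinal pain scale.
for reason in nonparametric_reasons(is_normal=False, n=12, is_ordinal=True):
    print("-", reason)
```

Returning the reasons rather than a bare yes/no keeps the decision easy to justify in a methods section.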
Key Non-parametric Tests & Purpose
| Test | Comparison | Parametric Equivalent | Example Scenario |
|---|---|---|---|
| Mann-Whitney U test | 2 independent groups | Independent t-test | Hb in smokers vs non-smokers |
| Wilcoxon Signed-Rank | 2 paired groups | Paired t-test | Hb before vs after therapy |
| Kruskal-Wallis test | $\ge 3$ independent groups | One-way ANOVA | BMI across SES groups |
| Friedman Test | $\ge 3$ repeated measures (paired) | Repeated-measures ANOVA | Pain score at $0$, $30$, $60$ min |
| Spearman Rank correlation | Correlation (Ordinal/Non-normal) | Pearson correlation | SES vs academic score |
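The tests in this table work on ranks rather than raw values. As an illustration, here is a minimal pure-Python sketch of the Mann-Whitney U statistic for the first row. The helper names and the Hb values are hypothetical, and the sketch computes only the U statistic, not a p-value; in practice a statistics library (for example `scipy.stats.mannwhitneyu`) would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])          # ranks belonging to group_a
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a                  # U_a + U_b always equals n_a * n_b
    return u_a, u_b

# Hypothetical Hb values (g/dL) for smokers and non-smokers.
smokers = [15.1, 15.8, 16.2, 14.9, 16.0]
non_smokers = [13.2, 14.1, 13.8, 14.5, 13.5]
print(mann_whitney_u(smokers, non_smokers))
```

Because only the ordering of the values enters the calculation, replacing the largest Hb value with an even more extreme one leaves U unchanged.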
The Decision Pathway
Follow the path to classify your variable correctly.
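Q1: Categories/labels or measurable quantities? Labels: Qualitative, go to Q2. Quantities: Quantitative, go to Q3.
Q2: Can it be ranked? Yes: Ordinal. No: Nominal.
Q3: Any value or counts? Any value within a range: Continuous. Whole-number counts: Discrete.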
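The three questions can be sketched as a short function. The enum, function name, and parameter names are hypothetical; the branching simply mirrors Q1 to Q3 and the four variable types defined earlier.

```python
from enum import Enum

class VariableType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"

def classify_variable(is_quantity: bool,
                      can_be_ranked: bool = False,
                      is_counted: bool = False) -> VariableType:
    """Walk the decision pathway and return the variable type."""
    if not is_quantity:
        # Q1 answered "categories/labels", so Q2 decides.
        return VariableType.ORDINAL if can_be_ranked else VariableType.NOMINAL
    # Q1 answered "measurable quantities", so Q3 decides.
    return VariableType.DISCRETE if is_counted else VariableType.CONTINUOUS

# The examples used earlier in this guide:
print(classify_variable(is_quantity=False))                      # blood group
print(classify_variable(is_quantity=False, can_be_ranked=True))  # patient satisfaction
print(classify_variable(is_quantity=True))                       # BMI
print(classify_variable(is_quantity=True, is_counted=True))      # number of children
```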