In a normal distribution, data is symmetrically distributed with no skew. When plotted on a graph, the data follows a bell shape, with most values clustering around a central region and tapering off as they go further away from the center.
Many variables in the natural and social sciences are normally or approximately normally distributed. Height, birth weight, reading ability, job satisfaction, and SAT scores are just a few examples.
Because normally distributed variables are so common, many statistical tests are designed for normally distributed populations.
Understanding the properties of normal distributions means you can use inferential statistics to compare different groups and make estimates about populations using samples.
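Normal distributions have key characteristics that are easy to spot in graphs: the mean, median, and mode are exactly the same; the distribution is symmetric about the mean, with half the values falling below it and half above; and the distribution can be described by two values, the mean and the standard deviation.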
The mean is the location parameter, while the standard deviation is the scale parameter.
The mean determines where the peak of the curve is centered. Increasing the mean moves the curve right, while decreasing it moves the curve left.
The standard deviation stretches or squeezes the curve. A small standard deviation results in a narrow curve, while a large standard deviation leads to a wide curve.
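The effect of the two parameters can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the three distributions and their parameter values are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Three hypothetical normal curves: a baseline, one with a larger mean,
# and one with a larger standard deviation.
baseline = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=2, sigma=1)  # same spread, peak moved to the right
wider = NormalDist(mu=0, sigma=2)    # same center, wider and flatter

# The peak of each curve sits at its mean.
peak_baseline = baseline.pdf(baseline.mean)
peak_shifted = shifted.pdf(shifted.mean)
peak_wider = wider.pdf(wider.mean)

print(f"baseline: peak at x={baseline.mean}, height {peak_baseline:.4f}")
print(f"shifted:  peak at x={shifted.mean}, height {peak_shifted:.4f}")
print(f"wider:    peak at x={wider.mean}, height {peak_wider:.4f}")
```

Changing the mean moves the peak without changing its height, while doubling the standard deviation halves the height of the peak, because the total area under the curve has to stay the same.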
The empirical rule, or the 68-95-99.7 rule, tells you where most of your values lie in a normal distribution: around 68% of values are within 1 standard deviation of the mean, around 95% are within 2 standard deviations, and around 99.7% are within 3 standard deviations.
The empirical rule is a quick way to get an overview of your data and check for any outliers or extreme values that don’t follow this pattern.
If data from small samples do not closely follow this pattern, then other distributions like the t-distribution may be more appropriate. Once you identify the distribution of your variable, you can apply appropriate statistical tests.
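The percentages in the 68-95-99.7 rule can be reproduced from the cumulative distribution function. This is a small sketch using the standard-library `statistics.NormalDist`; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The result does not depend on the particular mean or standard deviation, which is why the rule applies to any normal distribution.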
The central limit theorem is the basis for how normal distributions work in statistics.
In research, to get a good idea of a population mean, ideally you’d collect data from multiple random samples within the population. A sampling distribution of the mean is the distribution of the means of these different samples.
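The central limit theorem shows that, when you take many sufficiently large samples, the sampling distribution of the mean is approximately normal, even if the original variable is not normally distributed.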
Parametric statistical tests typically assume that samples come from normally distributed populations, but the central limit theorem means that you don’t need to meet this assumption when your sample is large enough.
You can use parametric tests for large samples from populations with any kind of distribution as long as other important assumptions are met. A sample size of 30 or more is generally considered large.
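A short simulation can illustrate the idea. The sketch below assumes a deliberately skewed population (exponential with mean 1 and standard deviation 1), draws many samples of size 30, and looks at how the sample means are distributed. The population, the number of samples, and the seed are all hypothetical choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30    # "large" by the rule of thumb above
NUM_SAMPLES = 5000  # hypothetical number of repeated samples


def draw_sample_mean(n):
    """Mean of n draws from a skewed population (exponential, mean 1, SD 1)."""
    return mean([random.expovariate(1.0) for _ in range(n)])


sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

center = mean(sample_means)                 # should be near the population mean, 1
spread = stdev(sample_means)                # should be near SD / sqrt(n)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5

# Share of sample means within 1 SD of their center; roughly 68% if near normal.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (expected about {expected_spread:.3f})")
print(f"share within 1 SD:    {within_one_sd:.1%}")
```

Even though individual values from this population are strongly skewed, the sample means cluster symmetrically around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.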
Once you have the mean and standard deviation of a normal distribution, you can fit a normal curve to your data using a probability density function.
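$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$ where f(x) is the probability density, μ is the mean, and σ is the standard deviation.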
In a probability density function, the area under the curve tells you probability. The normal distribution is a probability distribution, so the total area under the curve is always 1 or 100%.
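As a rough check of the link between area and probability, the sketch below uses a hypothetical variable with mean 100 and standard deviation 15. It computes the probability of a value falling in an interval from the cumulative distribution function, and approximates the total area under the curve with a simple midpoint sum.

```python
from statistics import NormalDist

# Hypothetical variable: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Probability of a value between 85 and 115 = area under the curve on that interval.
p_between = dist.cdf(115) - dist.cdf(85)

# Approximate the total area with a midpoint Riemann sum over +/- 8 standard deviations.
lower, upper, steps = 100 - 8 * 15, 100 + 8 * 15, 10_000
width = (upper - lower) / steps
total_area = sum(dist.pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

print(f"P(85 <= X <= 115) is about {p_between:.4f}")
print(f"total area under the curve is about {total_area:.6f}")
```

The interval from 85 to 115 covers one standard deviation on each side of the mean, so the first result is close to 68%, and the total area comes out at essentially 1.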
For any value of x, you can plug in the mean and standard deviation into the formula to find the probability density of the variable taking on that value of x.
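A direct implementation of the normal probability density formula might look like the sketch below. The function name `normal_pdf` and the example values (x = 110, mean 100, standard deviation 15) are assumptions for illustration; the result is compared against the standard library's `NormalDist.pdf` as a sanity check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal probability density at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


density = normal_pdf(110, mu=100, sigma=15)
reference = NormalDist(100, 15).pdf(110)

print(f"density from the formula: {density:.6f}")
print(f"density from NormalDist:  {reference:.6f}")
```

Note that the value returned is a density (the height of the curve at x), not a probability in itself; probabilities come from the area under the curve over an interval.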