The standard deviation is used to show how the values are spread above and below the mean. A low standard deviation means that the values are closely grouped around the mean whereas a high standard deviation means that the values are widely spread. We can use the standard deviation to decide weather the differences between two means is significant.
A low standard deviation indicates that the data points tend to be very close to the mean; a high standard deviation indicates that the data points are spread out over a large range of values.
A useful property of standard deviation is that, unlike variance, it is expressed in the same units as the data. In statistics, the standard deviation is the most common measure of statistical dispersion. However, in addition to expressing the variability of a population, standard deviation is commonly used to measure confidence in statistical conclusions. For example, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were to be conducted multiple times.
To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result of each:. This quantity is the population standard deviation, and is equal to the square root of the variance. The formula is valid only if the eight values we began with form the complete population.
In cases where the standard deviation of an entire population cannot be found, it is estimated by examining a random sample taken from the population and computing a statistic of the sample. Unlike the estimation of the population mean, for which the sample mean is a simple estimator with many desirable properties unbiased, efficient, maximum likelihood , there is no single estimator for the standard deviation with all these properties.
Therefore, unbiased estimation of standard deviation is a very technically involved problem. However, other estimators are better in other respects:. The mean and the standard deviation of a set of data are usually reported together. This is because the standard deviation from the mean is smaller than from any other point. Variability can also be measured by the coefficient of variation, which is the ratio of the standard deviation to the mean. Often, we want some information about the precision of the mean we obtained.
We can obtain this by determining the standard deviation of the sampled mean, which is the standard deviation divided by the square root of the total amount of numbers in a data set:. Standard Deviation Diagram : Dark blue is one standard deviation on either side of the mean.
For the normal distribution, this accounts for The practical value of understanding the standard deviation of a set of values is in appreciating how much variation there is from the mean. A large standard deviation, which is the square root of the variance, indicates that the data points are far from the mean, and a small standard deviation indicates that they are clustered closely around the mean.
Their standard deviations are 7, 5, and 1, respectively. The third population has a much smaller standard deviation than the other two because its values are all close to 7. Standard deviation may serve as a measure of uncertainty. In physical science, for example, the reported standard deviation of a group of repeated measurements gives the precision of those measurements. When deciding whether measurements agree with a theoretical prediction, the standard deviation of those measurements is of crucial importance.
If the mean of the measurements is too far away from the prediction with the distance measured in standard deviations , then the theory being tested probably needs to be revised. This makes sense since they fall outside the range of values that could reasonably be expected to occur, if the prediction were correct and the standard deviation appropriately quantified.
The practical value of understanding the standard deviation of a set of values is in appreciating how much variation there is from the average mean. As a simple example, consider the average daily maximum temperatures for two cities, one inland and one on the coast.
It is helpful to understand that the range of daily maximum temperatures for cities near the coast is smaller than for cities inland. Thus, while these two cities may each have the same average maximum temperature, the standard deviation of the daily maximum temperature for the coastal city will be less than that of the inland city as, on any particular day, the actual maximum temperature is more likely to be farther from the average maximum temperature for the inland city than for the coastal one.
Another way of seeing it is to consider sports teams. In any set of categories, there will be teams that rate highly at some things and poorly at others.
Chances are, the teams that lead in the standings will not show such disparity but will perform well in most categories. The lower the standard deviation of their ratings in each category, the more balanced and consistent they will tend to be. Teams with a higher standard deviation, however, will be more unpredictable.
Comparison of Standard Deviations : Example of two samples with the same mean and different standard deviations. The red sample has a mean of and a SD of 10; the blue sample has a mean of and a SD of Each sample has 1, values drawn at random from a Gaussian distribution with the specified parameters.
For advanced calculating and graphing, it is often very helpful for students and statisticians to have access to statistical calculators. Two of the most common calculators in use are the TI series and the R statistical software environment.
The TI series of graphing calculators, shown in, is manufactured by Texas Instruments. Released in , it was one of the most popular graphing calculators for students. TI : The TI series of graphing calculators is one of the most popular calculators for statistics students.
R logo shown in is a free software programming language and a software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
R is an implementation of the S programming language, which was created by John Chambers while he was at Bell Labs. R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering. Another strength of R is static graphics, which can produce publication-quality graphs, including mathematical symbols.
Dynamic and interactive graphics are available through additional packages. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages.
Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. The number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. Consider this example: To compute the variance, first sum the square deviations from the mean. The mean is a parameter, a characteristic of the variable under examination as a whole, and a part of describing the overall distribution of values.
Knowing all the parameters, you can accurately describe the data. The more known fixed parameters you know, the fewer samples fit this model of the data.
If you know only the mean, there will be many possible sets of data that are consistent with this model. However, if you know the mean and the standard deviation, fewer possible sets of data fit this model.
In computing the variance, first calculate the mean, then you can vary any of the scores in the data except one. This one score left unexamined can always be calculated accurately from the rest of the data and the mean itself. This is one way of measuring the dispersion of a given set of data.
What do I mean by dispersion? Well, if we use a sample child weight dataset shown below , the data ranges from 13 to , with a mean of It is a pretty wide spread of data. Well, what if the data had the exact same mean, but instead ranged from 45 to 62? You would determine that this would be a much tighter range, right? There would not be as much spread or dispersion from the mean. It is important to have a good understanding of the dispersion of your data so it can be properly compared to other data.
This tells us that there is more variation in weight for the women's results than the men's. Standard deviation Standard deviation is an important measure of spread or dispersion. Data set 1 1 7 12 15 20 22 28 Data set 2 1 15 15 15 15 16 28 Most of the results in data set 2 are close to the mean, whereas the results in data set 1 are further from the mean in comparison.
National 5 Subjects National 5 Subjects up.
0コメント