Central tendency and variation

Description

Overview


Descriptive statistics help us identify the center and spread of our data sets. Knowing these values gives us a better understanding of our data and lets us compare what we see to other data sets.

Measures of Center: We use three main statistics when trying to represent the center of our data set. The first one is the mean or average, the second is the median or the 50th percentile, and the last is the mode or the number that occurs most frequently.

The mean is computed by taking all values in the data set, adding them together, and then dividing by the sample size (the number of data points we have). The formula we use is as follows: $\bar{x} = \frac{\sum x_i}{n}$.
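As a small illustration of the mean formula, the sketch below adds the values and divides by the number of data points. It borrows the seven-value set 1, 3, 5, 6, 9, 15, 18 that this overview uses in its median example; the function name `mean` and the variable `x_bar` are chosen here for illustration only.

```python
# Example data: the seven-value set used in the median example of this overview.
data = [1, 3, 5, 6, 9, 15, 18]


def mean(values):
    """Add all values together and divide by the number of data points."""
    return sum(values) / len(values)


x_bar = mean(data)  # 57 / 7
print(round(x_bar, 4))
```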

The median is computed by ordering all of the values in a data set from smallest to largest and then finding the value that has an equal number of data points above and below it. The value $\frac{1}{2}(n+1)$ tells us the position in the ordered data set where the median falls; we then find the value at that position. For example, take the data set 1, 3, 5, 6, 9, 15, 18. We have 7 data points, so n = 7. Substituting n = 7 into the formula gives $\frac{1}{2}(7+1) = 4$, so the median is the fourth data point in the ordered set. The value 6 is in this position, so the median is 6.

If our data set has an even number of data points, $\frac{1}{2}(n+1)$ yields a decimal value. For example, if n = 8, then $\frac{1}{2}(8+1) = 4.5$, which would put the median at the "4.5th" data point. Since there are no half positions in a data set, we take the two integers surrounding this value (in this case 4 and 5) and average the data points in the 4th and 5th positions of the ordered set.
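The position rule for the median can be sketched in a few lines. The function name `median_by_position` is made up for this illustration, the code assumes a non-empty list, and the eight-value set is a hypothetical extension of the example above (the value 20 was appended just to show the even case).

```python
import math


def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position of the median in the ordered data
    if position.is_integer():
        # Odd n: the median is the single value at that position.
        return ordered[int(position) - 1]
    # Even n: average the values at the two positions surrounding it.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


odd_set = [1, 3, 5, 6, 9, 15, 18]       # n = 7, position 4
even_set = [1, 3, 5, 6, 9, 15, 18, 20]  # hypothetical set, n = 8, position 4.5

print(median_by_position(odd_set))   # 6
print(median_by_position(even_set))  # 7.5
```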

The mode is computed by finding out which value in the data set occurs most often. The numeric value with the highest frequency will be the mode. For example, the set 1,2,3,3,4,5 has a mode of 3, as the number 3 shows up more often than any other number. A data set may have multiple modes. This occurs when two or more values share the same frequency and that frequency is higher than the frequency of every other value. For example, the set 1,1,1,2,3,4,5,5,5 has two modes, 1 and 5. Both of the numbers 1 and 5 show up three times, which is more often than any other value. A data set may also have no mode. This occurs when all data points in the set have a frequency of 1, meaning they all show up exactly once.
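The three cases described above (one mode, several modes, no mode) can be sketched as follows. The helper name `modes` is hypothetical. It follows the convention of this overview, where a set whose values all occur exactly once has no mode; Python's built-in `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value shows up exactly once: no mode under this convention.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 3, 3, 4, 5]))           # [3]
print(modes([1, 1, 1, 2, 3, 4, 5, 5, 5]))  # [1, 5]
print(modes([1, 2, 3]))                    # []
```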

Measures of Spread: When we compute measures of spread, the most common measures we use are range, variance/standard deviation, and inter-quartile range (IQR).

The range of a data set is computed by finding the difference between the largest and smallest data points in the data set. The formula we use to find the range is as follows: Range = Max − Min. The range gives us a measure of how spread out the extreme points in our data set are.

The variance of a data set is found by subtracting the mean from each individual data point, squaring each difference, summing the squared differences, and dividing the sum by either n (for a population) or n − 1 (for a sample). The following is the formula for the population variance: $\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}$. We use this equation when we have collected a census, that is, the entirety of the population of interest. If we have only collected a sample, or a subset of the population, we use the following formula to find the variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$. To compute the standard deviation of either a population or a sample, we take the square root of the corresponding variance. Here are the formulas used:

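Population Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}}$

Sample Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$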

Standard deviation and variance give us a measure of how spread out our dataset is with respect to the mean.
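To make the difference between the population and sample formulas concrete, here is a minimal sketch that computes both by hand. The function name `variance` and its `sample` flag are chosen for this illustration, and the data are the seven-value set from the median example. Python's `statistics` module offers `pvariance`, `variance`, `pstdev`, and `stdev`, which should agree with these hand computations.

```python
import math


def variance(values, sample=True):
    """Sum the squared deviations from the mean, then divide by n - 1 or n."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n  # n - 1 for a sample, n for a population
    return squared_deviations / divisor


data = [1, 3, 5, 6, 9, 15, 18]

population_variance = variance(data, sample=False)
sample_variance = variance(data, sample=True)

# The standard deviation is the square root of the corresponding variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(population_variance, 4), round(population_sd, 4))
print(round(sample_variance, 4), round(sample_sd, 4))
```

Because the sample formula divides by n − 1 rather than n, the sample variance is always slightly larger than the population variance computed from the same values.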

The IQR is the difference between the third quartile (75th percentile) and the first quartile (25th percentile). To compute the IQR, we use the following formula: $\text{IQR} = Q_3 - Q_1$. The IQR gives us a measure of how spread out the middle 50% of our data set is.
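The IQR computation can be sketched with Python's `statistics.quantiles`. Note that this overview does not say which quartile convention to use, and different textbooks and tools compute quartiles in slightly different ways; the `method="exclusive"` setting below is an assumption made for this sketch, and the helper name `iqr` is hypothetical.

```python
import statistics


def iqr(values):
    """Return Q3 - Q1 using the 'exclusive' quartile convention (an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [1, 3, 5, 6, 9, 15, 18]
print(iqr(data))  # 12.0 (Q1 = 3 and Q3 = 15 under this convention)
```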

Instructions

A new migraine medication is in development, and the developers are curious to see the impact it has on Men v Women. The medication was given to 13 Men and 13 Women, and the participants were asked to identify how long it took for them (in minutes) to feel initial relief from their migraine. The results are provided below:

Men – 3,8,15,32,21,45,78,73,72,85,23,35,45

Women – 3,9,2,15,27,32,35,54,45,37,24,27,31

For each group individually, find the five number summary, as well as the mean and sample standard deviation. Using the boxplot calculator (link below), produce side-by-side boxplots of Men v Women.

Use the following calculator to produce side-by-side boxplots: https://goodcalculators.com/box-plot-maker/ You can copy/paste the values above into the correct cells in the calculator. Do not add any spaces.
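If you would like to check your hand calculations, the following is one possible sketch of the summary statistics for each group. The helper name `summarize` is made up for this illustration, and the `method="exclusive"` quartile convention is an assumption: the boxplot calculator linked above may use a different quartile rule, so its Q1 and Q3 could differ slightly from what this code prints.

```python
import statistics

# Minutes until initial relief, copied from the data given above.
men = [3, 8, 15, 32, 21, 45, 78, 73, 72, 85, 23, 35, 45]
women = [3, 9, 2, 15, 27, 32, 35, 54, 45, 37, 24, 27, 31]


def summarize(values):
    """Five-number summary plus the mean and the sample standard deviation."""
    # Quartile convention is an assumption; other tools may differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "sample_sd": statistics.stdev(values),  # divides by n - 1
    }


for name, group in (("Men", men), ("Women", women)):
    print(name)
    for label, value in summarize(group).items():
        print(f"  {label}: {value:.2f}")
```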

Discussion Prompts

Answer the following questions in your initial post:

Report your five-number summary, mean, and sample standard deviation. Do not forget, we are looking at a sample set here, so we need to use the sample standard deviation.
Describe how the summary statistics for the group of Men compare to the summary statistics for the group of Women.
Post your created side-by-side boxplots. From the visual, what can we say on an objective level about the two groups?