Discussion

Description

Overview


A histogram is a graphical summary that helps us visualize the shape and distribution of a data set. Histograms are used for numeric data: the values are grouped into bins (classes), and the number of values in each bin is plotted like a bar graph whose consecutive bars touch.

We can control the number of bins, which strongly affects how the data look. Most histograms work best with somewhere between 5 and 20 bins; the right number depends on how many data points are in the set and how spread out they are. Sometimes the bins come naturally (for example, binning exam scores by letter grade), but other times there is no clear-cut method. In those situations, we can experiment with the number of bins to find the clearest visualization of our data.
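To see how the number of bins changes the picture, the sketch below (Python, standard library only) splits a data set into k equal-width classes and prints a rough text histogram. The function names and the sample values are hypothetical, chosen only for illustration; the applet used later may place its class boundaries differently.

```python
def equal_width_bins(data, k):
    """Count how many values fall into each of k equal-width classes.

    Classes are half-open [lower, upper) except the last, which also
    includes the maximum so that every value is counted exactly once.
    """
    lo, hi = min(data), max(data)
    if k < 1 or hi == lo:
        raise ValueError("need k >= 1 and at least two distinct values")
    counts = [0] * k
    for x in data:
        # Scale x into 0..k; clamp so the maximum lands in the last class.
        i = min(int((x - lo) * k // (hi - lo)), k - 1)
        counts[i] += 1
    width = (hi - lo) / k
    edges = [lo + j * width for j in range(k + 1)]
    return edges, counts


def text_histogram(data, k):
    """Print one row of '#' marks per class, a rough stand-in for touching bars."""
    edges, counts = equal_width_bins(data, k)
    for j, c in enumerate(counts):
        print(f"{edges[j]:6.1f} - {edges[j + 1]:6.1f} | {'#' * c}")


if __name__ == "__main__":
    sample = [2, 3, 3, 5, 6, 6, 6, 7, 8, 9, 9, 12]  # hypothetical values
    for k in (3, 6):
        print(f"k = {k}")
        text_histogram(sample, k)
```

With k = 3 the sample's counts are 4, 5, 3; with k = 6 they become 3, 1, 3, 2, 2, 1, so the same values can look smoother or bumpier depending on the number of bins.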

Once we create our histogram, we can describe the shape as follows:

Symmetric vs Skew

If a distribution is symmetric or approximately symmetric, the mean and median of the data set will be approximately equal. If we were to draw a vertical line at the median, the histogram would look roughly the same on both sides of that line.

If a distribution is skewed, it will appear to have a “tail” trailing off to one side. A data set that is skewed left (or negatively skewed) has its values tail off on the left side of the histogram, and the mean is usually smaller than the median. A data set that is skewed right (or positively skewed) has its values tail off on the right side of the histogram, and the mean is usually larger than the median.
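The mean-median comparison above can be checked numerically. This is a minimal sketch, assuming a hypothetical `tolerance` cutoff (5% of the data's range) for deciding when the gap is small enough to call the shape roughly symmetric. Treat the label as a rule of thumb to confirm against the histogram itself, since the mean-median comparison does not hold for every skewed data set.

```python
from statistics import mean, median


def skew_hint(data, tolerance=0.05):
    """Rule-of-thumb shape label from comparing the mean with the median.

    tolerance (hypothetical): if |mean - median| is at most this fraction of
    the range, the data are labeled approximately symmetric.
    """
    m, md = mean(data), median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(m - md) <= tolerance * spread:
        return "approximately symmetric"
    # A mean pulled below the median suggests a left tail, above it a right tail.
    return "skew left" if m < md else "skew right"


if __name__ == "__main__":
    print(skew_hint([1, 2, 3, 4, 5]))      # approximately symmetric
    print(skew_hint([1, 1, 2, 2, 3, 10]))  # skew right
    print(skew_hint([0, 7, 8, 8, 9, 9]))   # skew left
```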

Unimodal vs Bimodal

A distribution is considered unimodal if it has one distinct “peak.” This refers to the overall peak of the shape, not to every individual rise and fall of the bars.

A distribution is considered bimodal if it has two distinct peaks. Bimodal distributions often appear when two populations are combined, or when the population of interest naturally splits into two subgroups.
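Deciding whether a peak is “distinct” is a judgment call, but one way to make it concrete is to ignore dips smaller than some threshold. The sketch below counts peaks in a list of bin counts, treating a drop (or a climb out of a valley) as real only when it is at least `min_drop`. Both the function and the default threshold of 2 are hypothetical choices for illustration, not a standard definition of modality.

```python
def count_peaks(counts, min_drop=2):
    """Count peaks in a sequence of bin counts, ignoring wiggles smaller than min_drop.

    A peak is recorded only after the counts fall at least min_drop below it;
    a new climb starts only after they rise at least min_drop above a valley.
    min_drop=2 is an arbitrary illustrative threshold.
    """
    if not counts:
        return 0
    peaks = 0
    rising = True
    ref = counts[0]  # highest count so far while rising, lowest while falling
    for c in counts[1:]:
        if rising:
            if c > ref:
                ref = c
            elif ref - c >= min_drop:
                peaks += 1  # fell far enough: the previous top was a peak
                rising, ref = False, c
        else:
            if c < ref:
                ref = c
            elif c - ref >= min_drop:
                rising, ref = True, c  # climbed far enough out of the valley
    if rising:
        peaks += 1  # counts ended on a climb or plateau, so its top is a peak
    return peaks


if __name__ == "__main__":
    print(count_peaks([1, 4, 8, 4, 2, 5, 9, 3, 1]))  # two separated humps → 2
    print(count_peaks([1, 3, 6, 5, 7, 4, 2]))        # one hump with a small dip → 1
```

Lowering `min_drop` to 1 makes the second list count as two peaks, which is exactly the kind of small rise and fall that should not turn a unimodal shape into a bimodal one.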

If we have a nominal data set instead, say a collection of everyone’s favorite color, our choice of graphical summary changes. We can still create a frequency table, but instead of numeric classes with widths, each category serves as its own class. For example, the frequency table might look like this:

Color Frequency
Green 27
Red 42
Yellow 21
Blue 38
Orange 12
Pink 14

From this sort of data set, we can create either a pie chart, which represents the relative frequency of each category as a slice of a “pie,” or a bar graph, in which each category has its own slot and its frequency is plotted on the y-axis. This bar graph differs from a histogram in that consecutive bars do not touch, because the categories do not form a continuum of values.
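The relative frequencies behind a pie chart are each category's count divided by the total, and each slice's angle is that fraction of 360 degrees. A short sketch using the color table above (the function name is illustrative):

```python
def relative_frequencies(freq):
    """Divide each category's count by the total count."""
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}


# Frequency table of favorite colors from the text.
colors = {"Green": 27, "Red": 42, "Yellow": 21, "Blue": 38, "Orange": 12, "Pink": 14}

if __name__ == "__main__":
    for color, rel in relative_frequencies(colors).items():
        # rel * 360 is the angle of this category's slice of the pie.
        print(f"{color:<7}{colors[color]:>4}{rel:>8.1%}{rel * 360:>7.1f} deg")
```

The six angles add up to 360 degrees, and Red, with 42 of the 154 responses (about 27.3%), gets the largest slice.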

Instructions

For this discussion post, we are going to create a histogram for the following data set:

7, 4, 15, 17, 23, 27, 21, 28, 31, 35, 39, 32, 37, 33, 39, 49, 47, 43, 44, 44, 44, 45, 47, 51, 54, 53, 59, 57, 53, 62, 69, 69, 67, 64

Use this link to create your histogram: https://www.socscistatistics.com/descriptive/histograms/. Copy the data set above, paste it into the empty box, then click generate.
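As a cross-check on the applet's output, the sketch below computes the mean and median of the data set and counts the values in classes of width 10 starting at 0 (0-9, 10-19, ..., 60-69). That class width is one reasonable choice made for illustration; it is not necessarily what the applet picks by default.

```python
from statistics import mean, median

# The data set from the instructions above.
data = [7, 4, 15, 17, 23, 27, 21, 28, 31, 35, 39, 32, 37, 33, 39, 49, 47,
        43, 44, 44, 44, 45, 47, 51, 54, 53, 59, 57, 53, 62, 69, 69, 67, 64]

START, WIDTH, N_CLASSES = 0, 10, 7  # illustrative class layout: 0-9 ... 60-69


def fixed_width_counts(values, start, width, n_classes):
    """Count integer values in classes [start, start + width), [start + width, ...), ..."""
    counts = [0] * n_classes
    for x in values:
        i = (x - start) // width
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[i] += 1
    return counts


if __name__ == "__main__":
    counts = fixed_width_counts(data, START, WIDTH, N_CLASSES)
    print(f"n = {len(data)}, mean = {mean(data):.2f}, median = {median(data)}")
    for j, c in enumerate(counts):
        lower = START + j * WIDTH
        print(f"{lower:>2}-{lower + WIDTH - 1:<2} | {'#' * c}")
```

With these classes the counts are 2, 2, 4, 7, 8, 6, 5; the mean is about 41.44 and the median is 44.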

Discussion Prompts

Answer the following questions in your initial post:

When you enter the data, the applet will create a histogram using the number of classes it considers appropriate. Post this histogram and describe the distribution. Is it symmetric, skewed left, or skewed right? Is it unimodal or bimodal?
After you create this base histogram, you can use the “Edit tool” to experiment with the number of classes, and even define your own class ranges. After experimenting, which number of classes do you think best displays our data set? Why did you choose this value? Post this new histogram to support your reasoning.