Description
In this assignment, you will generate data visualizations from the datasets provided below and formulate hypotheses based on the patterns you identify.
Part 1: Exploratory Data Analysis (approx. 400 words including captions)
(a) Brief introduction of the data you are presenting. Begin with a concise introduction to the dataset you are working with. Provide context about its source, relevance, and any key attributes. (More than one dataset can be used. What matters most is that the datasets complement each other and together support the hypothesis you will develop.)
(b) Create a minimum of two data visualizations that adhere to best practices for clarity and effectiveness. For a high grade, you are strongly encouraged to also add a third, “ready-made” visual from Gapminder or Our World in Data that supports your hypothesis.
Ensure that your visualizations are well-labeled, appropriately scaled, and visually appealing.
Your visualizations should complement each other and help you develop your hypothesis in Part 2 of the assignment. Please ensure a strong connection between your visuals and the well-thought-out hypothesis you will develop.
(c) Include figure captions. For each visualization, include a descriptive figure caption. These captions should not only highlight essential elements within the figure but also summarize the primary patterns or insights the visualization reveals.
LO: #visualizations
Part 2: Hypothesis and Justification (approx. 200 words):
(a) Describe your hypothesis. Clearly articulate your hypothesis, which should be based on the patterns and insights derived from your data visualizations. Explain the research question or idea you aim to explore through your hypothesis.
(b) Explain how it builds on the data visualizations you present and any related readings.
Describe how your hypothesis is grounded in the data visualizations you presented in Part 1. Discuss any relevant readings or prior research that influenced the development of your hypothesis; doing so is highly encouraged and expected for high grades.
(c) Explain how your hypothesis can be tested. Provide specific predictions that can be measured using empirical evidence. Clarify the type of data or experiments required to support or refute your hypothesis.
(d) Explain why the hypothesis is plausible. Discuss why your hypothesis is plausible based on your understanding of the dataset, the context, and any prior knowledge. Highlight any logical reasoning or assumptions that support the validity of your hypothesis.
LO: #hypothesisdevelopment
Datasets:
Our World in Data: https://ourworldindata.org/
Gapminder: Gapminder contains a wealth of data and visualization tools about important global trends. Climate change-related variables you can find in Gapminder include greenhouse gas emissions, material footprint, types of energy used, and sustainability, for a large number of countries. In addition to examining how a variable changes over time for a given country, Gapminder allows you to compare countries and investigate relationships between different variables. You can find the source of the data you use by clicking the question mark next to the variable on the graph.
The World Bank provides free data for the public on a variety of topics. Search for climate change data; you can also navigate to specific countries.
Background reading (ONLY for students interested in climate change as a topic for Assignment 2):
*All topics of interest to students are welcome; students are not restricted to climate change as a topic.
NASA. (n.d.). What is climate change? https://climate.nasa.gov/ Why/Use: On this website you can find information on the evidence for, and the causes, effects, and solutions of, climate change. This should illuminate some of the data that scientists collect both to determine past climates and to predict future changes.
Packages to create visualizations
You may use the software package of your choice. Some resources below:
CODAP
Python: we suggest matplotlib (https://matplotlib.org/) or Seaborn (https://seaborn.pydata.org/)
Excel: Microsoft Office Tutorials. (2015). Create a chart from start to finish. Retrieved July 11, 2015 from
Google Sheets
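For students working in Python, here is a minimal sketch of one well-labeled figure built with pandas and matplotlib. It assumes your download is a long-format table with one row per country per year; the column names `country`, `year`, and `value`, the function name `plot_trends`, and all numbers are hypothetical placeholders to replace with your own data and units.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: one row per country per year.
# In practice, replace this with pd.read_csv("your_download.csv").
RAW_CSV = """country,year,value
Country A,2000,4.0
Country A,2010,4.5
Country A,2020,5.1
Country B,2000,1.2
Country B,2010,2.0
Country B,2020,2.9
"""


def plot_trends(df: pd.DataFrame, y_label: str, title: str):
    """Draw one labeled line per country and return the figure and axes."""
    # Pivot so each country becomes its own column, indexed by year.
    wide = df.pivot(index="year", columns="country", values="value")
    fig, ax = plt.subplots(figsize=(7, 4))
    for country in wide.columns:
        ax.plot(wide.index, wide[country], marker="o", label=country)
    ax.set_xlabel("Year")
    ax.set_ylabel(y_label)  # state the units of your variable here
    ax.set_title(title)
    ax.legend(title="Country")
    fig.tight_layout()
    return fig, ax


df = pd.read_csv(io.StringIO(RAW_CSV))
fig, ax = plot_trends(df, "Value (units of your variable)", "Hypothetical trend by country")
fig.savefig("figure1.png", dpi=200)  # fixed resolution for a sharp image in your document
```

If you prefer Seaborn, its `lineplot` with `hue="country"` can draw a similar chart directly from the long-format table.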
IMPORTANT NOTES for students:
Appendix: In the same document you submit, you will need to create an Appendix (it does not count toward your word count) containing a screenshot of the raw data used for each of your figures. You can label them, for example, Figure 1 Raw data, Figure 2 Raw data, etc. No raw data are required for your “ready-made” visual. Failure to submit clear screenshots representative of your data WILL impact your grade.
Be ready to discuss the details of your submission with your peers and instructor in class.
Assignment Information
Length:
600-800 words EXCLUDING References/Title page
Weight:
15%
Learning Outcomes
Visualizations: Interpret, analyze, and create data visualizations.
HypothesisDevelopment: Evaluate the link between hypothesis-driven research and the theories or observations that motivate it.
#Visualizations
Interpret, analyze, and create data visualizations.
Data are typically best understood by looking at them from multiple perspectives, which data visualization techniques make possible. This is because the human brain did not evolve to rapidly scan and make sense of columns of data. Histograms, cumulative histograms, difference histograms, bar graphs, line graphs, scatter plots, and many other types of graphics can provide insights into how to ask and answer questions. Different data visualization techniques have different strengths and weaknesses, and one must consider the properties of the data and the questions of interest when deciding how best to use these tools.
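To make three of the histogram types above concrete, the following sketch draws a histogram, a cumulative histogram, and a difference histogram from the same bin edges using NumPy and matplotlib. The two groups are randomly generated placeholder data, not real measurements. Treating a "difference histogram" as per-bin count subtraction between two groups is an assumption made here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
# Placeholder samples for two hypothetical groups (500 observations each).
group_a = rng.normal(loc=50, scale=10, size=500)
group_b = rng.normal(loc=55, scale=10, size=500)

# Shared bin edges spanning both groups, so every panel is directly comparable.
low = min(group_a.min(), group_b.min())
high = max(group_a.max(), group_b.max())
bins = np.linspace(low, high, 15)  # 15 edges give 14 bins

counts_a, _ = np.histogram(group_a, bins=bins)
counts_b, _ = np.histogram(group_b, bins=bins)
cumulative_a = np.cumsum(counts_a)  # running total up to and including each bin
difference = counts_a - counts_b    # positive where group A has more observations

centers = (bins[:-1] + bins[1:]) / 2
width = bins[1] - bins[0]

fig, (ax_hist, ax_cum, ax_diff) = plt.subplots(1, 3, figsize=(13, 3.5))

ax_hist.bar(centers, counts_a, width=width, edgecolor="black")
ax_hist.set_title("Histogram (group A)")
ax_hist.set_ylabel("Number of observations")

ax_cum.bar(centers, cumulative_a, width=width, edgecolor="black")
ax_cum.set_title("Cumulative histogram (group A)")
ax_cum.set_ylabel("Cumulative number of observations")

ax_diff.bar(centers, difference, width=width, edgecolor="black")
ax_diff.axhline(0, color="gray", linewidth=1)  # reference line at zero difference
ax_diff.set_title("Difference histogram (A minus B)")
ax_diff.set_ylabel("Difference in counts")

for ax in (ax_hist, ax_cum, ax_diff):
    ax.set_xlabel("Measured value (units)")

fig.tight_layout()
fig.savefig("histograms.png", dpi=200)
```

Holding the bin edges fixed matters: changing bin width or range changes a histogram's shape, so groups are best compared on shared edges.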
Example
Your marketing firm is working with a company that is trying to increase product sales from existing customers. You have access to a range of data and create a few graphs to highlight potential areas of opportunity. You wonder whether there are seasonal trends, so you construct a line graph of monthly sales, which shows that sales dip substantially during the winter months. A line graph is appropriate because the data form a time series with a natural temporal order. Next, you wonder whether particular demographic groups drive sales, so you create a bar graph to compare those groups. A bar graph is appropriate because demographic group membership is a categorical variable. Last, you wonder whether income levels relate to product sales, so you construct a scatter plot with income level on the x-axis and product sales per customer on the y-axis. A scatter plot is appropriate because both variables are continuous and you want to evaluate the relationship between them. For each graph, you include proper axis labels with units and a brief but informative caption. The insights these visualizations provide help guide the marketing strategy.
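The three chart choices in this example can be sketched in matplotlib roughly as follows. All numbers are invented for illustration, and the age-group breakdown is an assumed stand-in for "demographic groups"; what carries over to real data is matching each chart type to its variable type and labeling every axis with units.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical figures for illustration only; not real sales data.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly_sales = [80, 78, 95, 110, 120, 125, 130, 128, 115, 105, 90, 82]  # thousand USD
segments = ["18-29", "30-44", "45-64", "65+"]  # assumed age groups
segment_sales = [320, 450, 380, 150]  # thousand USD
rng = np.random.default_rng(seed=1)
income = rng.uniform(20, 150, size=60)  # thousand USD per year
sales_per_customer = 50 + 2.0 * income + rng.normal(0, 40, size=60)  # USD

fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(14, 4))

# Line graph: months are ordered in time, so connecting the points shows the trend.
ax_line.plot(months, monthly_sales, marker="o")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Sales (thousand USD)")
ax_line.set_title("Monthly sales")

# Bar graph: age group is categorical, so each group gets its own bar.
ax_bar.bar(segments, segment_sales)
ax_bar.set_xlabel("Customer age group (years)")
ax_bar.set_ylabel("Annual sales (thousand USD)")
ax_bar.set_title("Sales by demographic group")

# Scatter plot: both variables are continuous, so each customer is one point.
ax_scatter.scatter(income, sales_per_customer, alpha=0.7)
ax_scatter.set_xlabel("Annual income (thousand USD)")
ax_scatter.set_ylabel("Sales per customer (USD)")
ax_scatter.set_title("Income vs. sales per customer")

fig.tight_layout()
fig.savefig("marketing_figures.png", dpi=200)
```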
#HypothesisDevelopment
Evaluate the link between hypothesis-driven research and the theories or observations that motivate it.
Scientific research begins with observations, which then must be organized to suggest underlying patterns of regularity. Such patterns in turn suggest hypotheses about the nature of the factors that may give rise to these patterns in the data. As science progresses, scientists formulate and use theories to develop further hypotheses and design additional studies. A well-formed hypothesis requires one to appreciate the links between data, theories, and models. Hypothesis-driven research is essential to the iterative cycles of data collection and theorizing that form the core of the scientific method.
Example
You read a story about Blue Moon butterflies in the Samoan Islands. The butterflies were attacked by a parasite that destroyed only male embryos, reducing males to just 1% of the population. Yet after 10 generations (about 1 year), males make up 40% of the population. Based on these facts, you come up with two competing explanations: (a) an extinction event could have led to the sudden disappearance of the parasite, which would explain the sudden resurgence of male butterflies, or (b) male butterflies developed resistance to the parasite, which might also explain the resurgence. Hypothesis (b) also stems from your knowledge of genetics and the plausibility of developing resistance to particularly harmful species through genetic mutations coupled with natural selective pressures. You then discuss the predictions that stem from these two explanations: in case (a), the parasite would be largely gone, whereas in case (b), it might still be present in butterflies that had developed resistance. Finally, you make the connection between study design, testing these predictions, and deciding which of your hypotheses is corroborated. (In fact, researchers found that the parasite was still present and that the current males all carried the mutation that allowed them to survive the parasite’s attack.)