Description
`state.name` and `state.region` are built-in R vectors that give the full names of the 50 US states and their
corresponding geographic regions (Northeast, South, North Central, West), respectively. Here is the corresponding tibble, named `state.info`.
```{r}
library(tibble)
state.info <- tibble(state = state.name, state.region)
state.info
```

1. From the data `us_contagious_diseases` of package `dslabs`, ignoring the variable `weeks_reporting`, compute the yearly incidence rate of each disease for each of the 4 geographic regions (Northeast, South, North Central, West). Store the result in a new data frame named `region_incidence`, with columns `disease`, `region`, `year`, and `incidence_per_millon` (i.e., the yearly incidence rate times one million). Hint: `1e6*sum(count)/sum(population)`. Provide the output of `head(region_incidence)` and `dim(region_incidence)`. Note that you need to drop the missing values (NA) of `us_contagious_diseases` after deleting the variable `weeks_reporting`.

2. In the data frame `region_incidence`, choose an appropriate `str_` function to add the suffix "_US" to the values in the column `region`. Make sure that the resulting column `region` is a factor with levels in the order `North_Central_US, Northeast_US, South_US, West_US`. Provide the output of `head(region_incidence)` and `levels(region_incidence$region)`.

3. From `region_incidence`, use ggplot to produce a line graph that shows the trend of the yearly incidence of the disease Hepatitis A for all 4 geographic regions.

4. Pivot the data frame `region_incidence` into a new one that shows the `incidence_per_millon` values for all diseases of the same year in the same row. Name the new data frame `US_incidence`. Then provide the output of `head(US_incidence)` and `dim(US_incidence)`.

5. Carefully read the documentation of the function `cor()` of package `stats`. Using the data frame `US_incidence` from Question 4, compute the Pearson’s correlation between Hepatitis A and Measles in terms of `incidence_per_millon`. Because of missing values, you need to choose an appropriate value for the `use` argument of `cor()`.

6. Using the data frame `US_incidence` from Question 4, use one of the map functions to compute the Pearson’s correlation between Hepatitis A and each of the other 6 diseases in terms of `incidence_per_millon`, and return a double vector as the output.

7. Using the data frame `US_incidence` from Question 4, for each of the 7 diseases, find its most positively (Pearson’s) correlated disease (other than itself) and the corresponding correlation. Simplify your code with loops or the map functions.
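
The data-wrangling and correlation steps asked for above can be sketched on a tiny hand-made data set. This is an illustration under stated assumptions, not a reference solution. The names `toy`, `state_lookup`, `region_levels`, `hep_vs_measles`, `hep_cors`, `cor_mat`, and `best_partner` are hypothetical, and the counts and populations are invented. The sketch assumes that regions are attached by joining on `state`, and that `"pairwise.complete.obs"` is an acceptable choice for the `use` argument of `cor()`.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
library(tibble)

# Invented toy rows that reuse the column names of us_contagious_diseases
toy <- tribble(
  ~disease,      ~state,    ~year, ~weeks_reporting, ~count, ~population,
  "Hepatitis A", "Alabama", 1970,  50,               100,    1000000,
  "Hepatitis A", "Alaska",  1970,  48,               20,     500000,
  "Hepatitis A", "Alabama", 1971,  50,               80,     1000000,
  "Hepatitis A", "Alaska",  1971,  48,               NA,     500000,
  "Measles",     "Alabama", 1970,  50,               300,    1000000,
  "Measles",     "Alaska",  1970,  48,               50,     500000,
  "Measles",     "Alabama", 1971,  50,               200,    1000000,
  "Measles",     "Alaska",  1971,  48,               40,     500000
)

# Hypothetical lookup table: state name -> geographic region
state_lookup <- tibble(state = state.name, region = state.region)

# Drop weeks_reporting, drop NA rows, attach regions, then aggregate
region_incidence <- toy %>%
  select(-weeks_reporting) %>%
  drop_na() %>%
  inner_join(state_lookup, by = "state") %>%
  group_by(disease, region, year) %>%
  summarize(incidence_per_millon = 1e6 * sum(count) / sum(population),
            .groups = "drop")

# Add the "_US" suffix and fix the factor level order explicitly
region_levels <- c("North_Central_US", "Northeast_US", "South_US", "West_US")
region_incidence <- region_incidence %>%
  mutate(region = str_c(str_replace_all(as.character(region), " ", "_"), "_US"),
         region = factor(region, levels = region_levels))

# One row per region and year, one column per disease
US_incidence <- region_incidence %>%
  pivot_wider(names_from = disease, values_from = incidence_per_millon)

# Correlation of two diseases, using only rows where both values are present
hep_vs_measles <- cor(US_incidence[["Hepatitis A"]], US_incidence[["Measles"]],
                      use = "pairwise.complete.obs")

# Correlation of Hepatitis A with every other disease, as a named double vector
others <- setdiff(names(US_incidence), c("region", "year", "Hepatitis A"))
hep_cors <- map_dbl(set_names(others),
                    ~ cor(US_incidence[["Hepatitis A"]], US_incidence[[.x]],
                          use = "pairwise.complete.obs"))

# Most positively correlated partner of each disease (self-correlation masked)
cor_mat <- cor(select(US_incidence, -region, -year),
               use = "pairwise.complete.obs")
diag(cor_mat) <- NA
best_partner <- map_chr(set_names(rownames(cor_mat)),
                        ~ names(which.max(cor_mat[.x, ])))
```

Setting the levels explicitly avoids relying on the locale-dependent sort order that `factor()` would otherwise use for strings containing underscores. With the real `us_contagious_diseases` data, note that `state` is stored as a factor and may contain entries that are not in `state.name`, so the join step deserves a check before trusting the result.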