Description
Hello,
Descriptive Analytics
1. Answer the following questions in Google Colab or Databricks using Spark SQL.
a. Search the internet for a big dataset of at least 1 GB.
b. Create a DataFrame from the dataset.
c. Using the DataFrame, implement the following aggregation functions.
i. Aggregation with grouping
ii. Aggregation with pivoting
iii. Aggregation with rollups and cubes
d. Spark SQL supports the following window functions. Apply them to the DataFrame.
i. Ranking functions
1. rank
2. dense_rank
3. percent_rank
4. row_number
5. ntile
ii. Analytic functions
1. cume_dist
2. first_value
3. last_value
4. lag
5. lead
Deliverables:
• One pdf file which contains the following:
o A cover page which includes Student ID, name, HW number, and date
o A description of the big dataset and its source.
o Each SQL statement and a snapshot of its output
o Problems you faced, if any.