Descriptive Analytics Homework

Hello,

Descriptive Analytics

1. Solve the following questions in Google Colab or Databricks using Spark SQL.

a. Search the internet for a big dataset of at least 1 GB.

b. Create a DataFrame from the dataset.

c. Using the DataFrame, implement the following kinds of aggregation:

i. Aggregation with grouping

ii. Aggregation with pivoting

iii. Aggregation with rollups and cubes

d. Spark SQL supports the following window functions. Apply each of them to the DataFrame:

i. Ranking functions

1. rank

2. dense_rank

3. percent_rank

4. row_number

5. ntile

ii. Analytic functions

1. cume_dist

2. first_value

3. last_value

4. lag

5. lead

Deliverables:

• One PDF file that contains the following:

o A cover page that includes your student ID, name, homework number, and date

o A description of the big dataset and its source.

o Each SQL statement and a screenshot of its output

o Problems you faced, if any.