WEI JUN TAN
Skillsoft-issued completion badges are earned by viewing the required percentage of the content or, where an assessment is required, by receiving a passing score. Apache Spark, an open-source cluster-computing framework used for data science, has become the de facto big data framework. In this Skillsoft Aspire course, learners explore how to analyze real data sets by using DataFrame API methods, how to optimize operations with shared variables, and how to combine data from multiple DataFrames using joins. Key concepts covered in this course include the features that make Spark 2.x versions significantly faster than Spark 1.x; how to create a Spark DataFrame from the contents of a CSV file and apply some simple transformations to it; and how to apply grouping and aggregation operations to a DataFrame to analyze categories of data in a data set. Learners then use Matplotlib to visualize the contents of a Spark DataFrame; learn about broadcast variables and how to perform a join operation with a DataFrame; and save the contents of a DataFrame to a text file for archiving or sharing. Finally, they learn how to perform different join operations on Spark DataFrames to combine data from multiple sources, and how to analyze data with the DataFrame API.
Issued on
July 8, 2020
Expires on
Does not expire