Alaattin Isilak
Skillsoft-issued completion badges are earned by viewing the required percentage of a course or, when an assessment is required, by receiving a passing score.
Apache Spark, an open-source cluster-computing framework used for data science, has become the de facto big data framework. In this Skillsoft Aspire course, explore the basics of Apache Spark, an analytics engine for working with big data that can run on top of Hadoop. Discover how it allows operations on data both with its own library methods and with SQL, while delivering strong performance. Key concepts covered here include how Spark fits in with Hadoop; Spark RDDs, their characteristics, and how to distinguish between RDDs and DataFrames; and the components of Spark and the functions of the Spark Session, Master, and Worker nodes. Then observe how to install PySpark and initialize a Spark Context; how to initialize a Spark DataFrame from the contents of an RDD; and how to explore the contents of a DataFrame by using the SQLContext. Next, you will learn how to apply the map() function on an RDD to configure a DataFrame; how to retrieve required data from a DataFrame and apply transformations; and how to convert Spark DataFrames to Pandas DataFrames and vice versa.
Issued on: August 19, 2020
Expires on: Does not expire