
Processing Data: Introducing Apache Spark

1d3a7b12-Fe57-4d65-B3c3-E9009c8b0d68

Skillsoft-issued completion badges are earned by viewing the required percentage of a course or by receiving a passing score when an assessment is required.

Apache Spark is a powerful distributed data processing engine that can handle petabytes of data by chunking that data and dividing the chunks across a cluster of resources. In this course, explore Spark's structured streaming engine and components such as the PySpark shell. Begin by downloading and installing Apache Spark. Then create a Spark cluster, run a job from the PySpark shell, and monitor applications and job runs from the Spark web user interface. Next, set up a streaming environment, reading and manipulating the contents of files that are added to a folder in real time. Finally, run apps in both Spark standalone and local modes.

Issued on

November 15, 2024

Expires on

Does not expire