Streaming Data Architectures: Processing Streaming Data with Spark
Skillsoft-issued completion badges are earned by viewing the required percentage of the course or by receiving a passing score when an assessment is required.

Spark is an analytics engine for big data and data science that can run on Hadoop and process both batch and streaming data. In this 11-video course, discover how to develop Spark applications that work with streaming data and explore different ways to process streams and generate output.

Key concepts covered include: installing the latest version of PySpark; configuring a streaming data source using Netcat and writing applications to process the stream; and the effects of using Update mode for the output of a stream processing application. Learn how to write an application that listens for new files added to a directory; compare Append mode with Update mode and distinguish between the two; and develop applications that limit the number of files processed in each trigger and use Spark's Complete mode for output. Finally, learners perform aggregation operations on streaming data with the DataFrame API (application programming interface), work with Spark SQL to process streaming data using SQL queries, and review the ways Spark can be used to process streams and generate output.