Processing Data: Integrating Kafka with Apache Spark
Skillsoft issued completion badges are earned based on viewing the percentage required or receiving a passing score when assessment is required. Flexible and Intuitive, DataFrames are a popular data structure in data analytics. In this course, build Spark applications that process data streamed to Kafka topics using DataFrames.
Begin by setting up a simple Spark app that streams in messages from a Kafka topic, processes and transforms them, and publishes them to an output sink. Next, leverage the Spark DataFrame application programming interface by performing selections, projections, and aggregations on data streamed in from Kafka, while also exploring the use of SQL queries for those transformations. Finally, you will perform windowing operations - both tumbling windows, where the windows do not overlap, and sliding windows, where there is some overlapping of data.