Building solid data pipelines with PySpark
Duration (fully-guided training)
Flipped-classroom training duration: of videos and of interactive workshop.
About the Course
Apache Spark is an essential tool in a data engineer's toolbelt. With it, you can build powerful data transformation pipelines, especially for large, cloud-native datasets. It also supports streaming applications and machine learning on datasets too large for other well-known tools, such as Pandas, to handle comfortably. In this workshop, you won't just learn the most common operations; you'll apply them to two common business scenarios. You'll also learn how to structure your Spark pipelines, improve their performance, and reduce the chance of mistakes in them. By the end, you should have a firm grasp of Apache Spark and be able to use it effectively.