Better data engineering with PySpark

Duration

1-day workshop

This course is part of our winter school. 

About the Course

Apache Spark is an essential tool in a data engineer's toolbelt. With it, you can build data transformation pipelines that scale to large, cloud-native datasets. It can also be used for streaming applications and for machine learning on large datasets, workloads that other well-known tools, like Pandas, don't lend themselves well to. In this workshop, you won't just learn the most common operations: you'll also apply them to the two most common business scenarios. You'll learn how to structure your Spark pipelines, improve their performance, and reduce the chance of mistakes in them. By the end, you should have a firm grasp of Apache Spark and know how to use it effectively.

Your Instructor

Oliver Willekens
