Building solid data pipelines with PySpark

Category: build data pipelines

Duration (fully guided training): 24h

Flipped-classroom training duration: 5h16min of videos and 16h of interactive workshops.

About the Course

Apache Spark is an essential tool in a data engineer's toolbelt. With it, you can build powerful data transformation pipelines, especially for large, cloud-native datasets. It also supports streaming applications and machine learning on datasets too large for other well-known tools, such as Pandas, to handle well. In this workshop, you won't just learn the most common operations; you'll apply them to two of the most common business scenarios. You'll also learn how to structure your Spark pipelines, improve their performance, and reduce the chance of mistakes in them. By the end, you will have a firm grasp of Apache Spark and know how to use it effectively.