1. Google Data Engineering Learning Path (Google)
Data Engineers design solutions that ensure maximum flexibility and scalability, while meeting all required security controls. Learn data engineering on the Google Cloud Platform.
Google Cloud Platform Data Engineer
2. Data Wrangling with MongoDB (Udacity)
In this course, we will explore how to wrangle data from diverse sources and shape it to enable data-driven applications.
3. Intro to Hadoop and MapReduce (Udacity)
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.
4. Spark for Big Data and Machine Learning (Udacity)
In this course, you’ll learn how to use Spark to work with big data and build machine learning models at scale, including how to wrangle and model massive datasets with PySpark, the Python library for interacting with Spark.
Learn Spark for Big Data and ML
5. Introduction to Big Data Systems (Coursera)
Interested in increasing your knowledge of the Big Data landscape? This course is for those new to data science and interested in understanding why the Big Data Era has come to be. It is for those who want to become conversant with the terminology and the core concepts behind big data problems, applications, and systems.
6. Data Engineering Basics for Everyone (edX)
Learn about data engineering concepts, ecosystem, and lifecycle. Also learn about the systems, processes, and tools you need as a Data Engineer to gather, transform, load, process, query, and manage data so that it can be leveraged by data consumers for operations and decision-making.
7. AI Skills for Engineers: Data Engineering and Data Pipelines (edX)
Good data is central to effective AI applications. This course teaches the basics of data for AI, covering what data is needed, how to extract data from existing databases, and basic data skills, including setting up a Python notebook environment, basic data exploration, and simple data visualizations.
8. Apache Spark for Data Engineering and Machine Learning (edX)
This short course introduces you to the fundamentals of Data Engineering and Machine Learning with Apache Spark, including Spark Structured Streaming, ETL for Machine Learning (ML) Pipelines, and Spark ML. By the end of the course, you will have hands-on experience applying Spark skills to ETL and ML workflows.
Apache Spark for Data Engineering and ML
9. Modernizing Data Lakes and Data Warehouses (Google)
The two key components of any data pipeline are data lakes and warehouses. This course highlights use cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud in technical detail.
Modernizing Data Lakes and Data Warehouses
10. Azure for the Data Engineer (Microsoft)
Explore how the world of data has evolved and how the advent of cloud technologies is providing new opportunities for businesses to explore. You will learn about the various data platform technologies that are available, and how a Data Engineer can take advantage of this technology to an organization’s benefit.
11. Getting Started with Data Engineering on Azure (Microsoft)
If you have large volumes of data stored as files in a data lake, you’ll need a convenient way to explore and analyze the data they contain. Azure Synapse Analytics enables you to apply the SQL skills you use in a relational database to files in a data lake.
Get Started with Data Engineering on Azure
12. Build data analytics solutions using Azure Synapse serverless SQL pools (Microsoft)
If you have large volumes of data stored as files in a data lake, you’ll need a convenient way to explore and analyze the data they contain. Azure Synapse Analytics enables you to apply the SQL skills you use in a relational database to files in a data lake.
Build Data Solutions with Azure Synapse Analytics
13. Perform data engineering with Azure Synapse Apache Spark Pools (Microsoft)
Apache Spark is a highly scalable distributed processing solution for big data analytics and transformation. You can leverage its power in Azure Synapse Analytics by using Spark pools.
Azure Data Engineering using Spark Pools
14. Work with Data Warehouses using Azure Synapse Analytics (Microsoft)
Relational data warehouses are at the heart of many business intelligence and enterprise analytics solutions. You can use Azure Synapse Analytics to implement highly scalable data warehouses in the cloud.
Build Data Warehouses with Azure Synapse Analytics
15. Transfer and transform data with Azure Synapse Analytics pipelines (Microsoft)
Azure Synapse Analytics enables data integration through the use of pipelines, which you can use to automate and orchestrate data transfer and transformation activities.
Transform data with Azure Synapse Analytics Pipelines
16. Data engineering with Azure Databricks (Microsoft)
Learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.
Data Engineering with Azure Databricks
17. Hybrid Transactional and Analytical Processing Solutions using Azure Synapse Analytics (Microsoft)
Hybrid Transactional and Analytical Processing (HTAP) is a technique for near-real-time analytics without a complex ETL solution. In Azure Synapse Analytics, HTAP is supported through Azure Synapse Link.
Hybrid Processing Solutions using Azure Synapse Analytics
18. Implement a Data Streaming Solution with Azure Stream Analytics (Microsoft)
Stream processing enables you to capture and analyze data in real time. Azure Stream Analytics is a cloud-based stream processing engine that you can use to build highly scalable real-time analytics solutions.
Implement a Streaming Solution with Azure Stream Analytics
19. Introduction to end-to-end analytics using Microsoft Fabric (Microsoft)
Discover how Microsoft Fabric can meet your enterprise’s analytics needs in one platform. Learn about Microsoft Fabric and how it works, and identify how you can use it for your analytics needs.
Introduction to Microsoft Fabric
20. Get started with Lakehouses in Microsoft Fabric (Microsoft)
Lakehouses merge data lake storage flexibility with data warehouse analytics. Microsoft Fabric offers a lakehouse solution for comprehensive analytics on a single SaaS platform.
Getting Started with Microsoft Fabric
21. Get started with Microsoft Fabric (Microsoft)
Explore the capabilities of Microsoft Fabric. In-depth course.
Getting Started with Microsoft Fabric
22. Use Apache Spark in Microsoft Fabric (Microsoft)
Apache Spark is a core technology for large-scale data analytics. Microsoft Fabric provides support for Spark clusters, enabling you to analyze and process data in a Lakehouse at scale.
Use Apache Spark in Microsoft Fabric
23. Work with Delta Lake tables in Microsoft Fabric (Microsoft)
Tables in a Microsoft Fabric lakehouse are based on the Delta Lake storage format commonly used in Apache Spark. By using the enhanced capabilities of delta tables, you can create advanced analytics solutions.
Work with Delta Lake Tables in Microsoft Fabric
24. Use Data Factory pipelines in Microsoft Fabric (Microsoft)
Microsoft Fabric includes Data Factory capabilities, including the ability to create pipelines that orchestrate data ingestion and transformation tasks.
Use Data Factory in Microsoft Fabric
25. Ingest Data with Dataflows Gen2 in Microsoft Fabric (Microsoft)
Data ingestion is crucial in analytics. Microsoft Fabric’s Data Factory offers Dataflows (Gen2) for visually creating multi-step data ingestion and transformation using Power Query Online.
Ingest Data in Microsoft Fabric
26. Administer Microsoft Fabric (Microsoft)
Microsoft Fabric is a SaaS solution for end-to-end data analytics. As an administrator, you can configure features and manage access to suit your organization’s needs.
Learn to administer Microsoft Fabric
27. Get started with data warehouses in Microsoft Fabric (Microsoft)
Data warehouses are analytical stores built on a relational schema to support SQL queries. Microsoft Fabric enables you to create a relational data warehouse in your workspace and integrate it easily with other elements of your end-to-end analytics solution.
Build Data Warehouses in Microsoft Fabric
28. Get started with Real-Time Analytics in Microsoft Fabric (Microsoft)
Analysis of real-time data streams is a critical capability for any modern data analytics solution. You can use the Real-Time Analytics capabilities of Microsoft Fabric to ingest, query, and process streams of data.
Real-Time Analytics in Microsoft Fabric
29. Get started with data science in Microsoft Fabric (Microsoft)
In Microsoft Fabric, data scientists can manage data, notebooks, experiments, and models while easily accessing data from across the organization and collaborating with their fellow data professionals.
Data Science in Microsoft Fabric
30. Introduction to Azure Service Fabric (Microsoft)
Determine the types of business problems that can be solved using Azure Service Fabric. Describe Service Fabric’s features, such as Azure service integration, stateless and stateful service support, and automatic scaling.
31. Modern Distributed Systems (edX)
Distributed systems are the backbone of modern society but entail challenges in areas such as complexity and energy use. Discover distributed systems from first principles, understand the architectures and techniques derived from them, and explore examples of current practical use.