Ingesting and Transforming Data Using Azure Data Factory
Description
Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines for ingesting, preparing, transforming, and publishing data. Data cleansing and preparation are essential steps in the data processing workflow, ensuring that data is accurate, reliable, and ready for analysis.
Organizations often deal with large volumes of data from various sources, which can be messy, inconsistent, and error-ridden. Data cleansing involves identifying and correcting inaccuracies, inconsistencies, and missing values in datasets. By standardizing data fields, removing duplicates, handling missing data, and splitting datasets, you can improve data quality and ensure that your data is ready for analysis and reporting.
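In Azure Data Factory, these operations are typically configured visually in a mapping data flow (for example, with Aggregate, Derived Column, and Conditional Split transformations) rather than written as code. To make the concepts concrete, the same logic can be sketched in pandas; the column names and sample records below are illustrative assumptions, not the lab's actual dataset.

```python
import pandas as pd

# Hypothetical sample of messy customer records (illustrative only).
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "name": ["  Ana ", "Ben", "Ben", "Cy", None],
    "country": ["us", "US", "US", "uk", "us"],
    "spend": [120.0, None, None, 75.0, 30.0],
})

# Standardize fields: trim whitespace, normalize casing.
df["name"] = df["name"].str.strip()
df["country"] = df["country"].str.upper()

# Remove duplicate records.
df = df.drop_duplicates()

# Handle missing data: fill a default value, then drop records
# that are still incomplete.
df["spend"] = df["spend"].fillna(0.0)
df = df.dropna(subset=["name"])

# Split the data into multiple streams based on a condition,
# analogous to a Conditional Split transformation.
high_value = df[df["spend"] >= 100]
low_value = df[df["spend"] < 100]
```

The flow mirrors the order you will follow in the lab: standardize first so that duplicates and conditions compare consistently, then deduplicate, then resolve missing values before branching the data into separate streams.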
In this hands-on lab, you will learn how to standardize and cleanse data fields in Azure Data Factory.
Learning objectives
Upon completion of this intermediate-level lab, you will be able to:
- Standardize and cleanse data fields in Azure Data Factory.
- Identify and remove duplicate records in datasets.
- Handle missing data by filling in default values or removing incomplete records.
- Split data into multiple streams based on specified criteria.
Intended audience
- Candidates for Microsoft Certified: Azure Data Engineer Associate
- Cloud Architects
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- Basic understanding of data processing concepts
- Introduction to Azure services
- Basic knowledge of data storage solutions