Azure Databricks is an analytics platform powered by Apache Spark. Spark is a unified analytics engine that works with most major databases, data caching services, and data warehouse providers. Beyond this broad compatibility, companies use Spark because in-memory computing and other optimizations make its analytics very fast. Azure Databricks lets companies integrate their data analytics solutions into their existing Azure infrastructure. In this lab, you'll load data into Azure Data Lake Storage Gen2 and use Databricks to interact with that data through a Databricks workspace and cluster that you'll configure.
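As a rough illustration of how a Databricks notebook might address data in a Data Lake Gen2 account, the sketch below builds a URL in the `abfss://` scheme. The helper `abfss_url` and the container, account, and file names are hypothetical and not taken from the lab. The commented-out read assumes a notebook where `spark` is predefined and the cluster already has access to the storage account.

```python
def abfss_url(container: str, account: str, path: str = "") -> str:
    """Build a Data Lake Gen2 URL in the abfss:// scheme (hypothetical helper)."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Strip any leading slash so the result has exactly one separator.
    return f"{base}/{path.lstrip('/')}"


# Placeholder names, for illustration only.
url = abfss_url("labdata", "examplestorage", "raw/sales.csv")
print(url)

# In a Databricks notebook, the file could then be loaded into a DataFrame:
# df = spark.read.csv(url, header=True, inferSchema=True)
```

The URL puts the container name before the `@` and the storage account name after it, which is an easy pair to swap by mistake.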
Upon completion of this lab, you will be able to:
This lab is intended for:
You should be familiar with:
May 29, 2025 - Updated instructions and screenshots to reflect the latest UI
June 18, 2024 - Updated screenshots and instructions to reflect the latest UI
March 1, 2024 - Migrated to Azure Data Lake Storage Gen2
December 5, 2023 - Updated screenshots and instructions to reflect the latest UI
March 7, 2023 - Updated screenshots and instructions to reflect the latest UI
May 9, 2022 - Updated screenshots and instructions for clarity
November 3, 2021 - Updated instructions to resolve the login issue with Azure Databricks
October 23, 2021 - Provided a workaround for an Azure Active Directory issue that initially prevents logging in to Databricks
September 7, 2021 - Updated instructions and images to reflect the latest portal experience
June 15, 2021 - Updated the instructions to reflect the latest portal experience
June 22, 2020 - Clarified the format of the Azure Data Lake Storage URL and included a screenshot to avoid confusion