hands-on lab

Using Azure Databricks to Import and Analyze Data

Difficulty: Beginner
Duration: Up to 1 hour
Students: 4,217
Rating: 4.3/5
Get guided in a real environment: practice with a step-by-step scenario in a real, provisioned environment.
Learn and validate: use validations to check your solutions every step of the way.
See results: track your knowledge and monitor your progress.

Description

Azure Databricks is an analytics platform powered by Apache Spark. Spark is a unified analytics engine that can connect to most major databases, data caching services, and data warehouses. Beyond this broad connectivity, companies choose Spark because in-memory computing, among other optimizations, makes its analytics very fast. Azure Databricks lets companies integrate their data analytics solutions with their existing Azure infrastructure. In this lab, you'll load data into Azure Data Lake Storage Gen2 and use Databricks to interact with that data through a Databricks workspace and cluster that you'll configure.
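Spark and Databricks address files in Azure Data Lake Storage Gen2 through the ABFS driver, using URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path> (abfss is the TLS-secured variant). The helper below is a minimal sketch for building such a URI; the function name and the example container, account, and file names are hypothetical and not taken from the lab.

```python
def adls_gen2_uri(container: str, account: str, path: str = "") -> str:
    """Return an abfss:// URI for a path inside an ADLS Gen2 container.

    Hypothetical helper for illustration: it only formats a string and
    does not check that the container, account, or path actually exists.
    """
    if not container or not account:
        raise ValueError("container and account must be non-empty")
    # ADLS Gen2 is reached through the "dfs" endpoint,
    # not the "blob" endpoint used by plain Blob Storage.
    relative = path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative}"
```

For example, adls_gen2_uri("customers", "mylabstorage", "data/customers.csv") returns "abfss://customers@mylabstorage.dfs.core.windows.net/data/customers.csv". Using the blob host instead of the dfs host is a common source of the URL confusion mentioned in the updates.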

Learning Objectives

Upon completion of this lab you will be able to:

  • Load data into Azure Data Lake Storage Gen2
  • Create and manage a Databricks workspace
  • Create and manage a Databricks cluster
  • Mount data from Azure Data Lake Storage Gen2 into a Databricks workspace
  • Interact with data using Databricks
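Mounting can also be done from a Python notebook with dbutils.fs.mount. The sketch below assumes the container is accessed with an Azure Active Directory (now Microsoft Entra ID) service principal using OAuth client credentials; the lab itself may authenticate differently (for example with an account key), and the client ID, secret, tenant ID, container, account, and mount point are placeholders you would supply. The functions take dbutils as a parameter so they can be defined and tested outside a notebook; inside Databricks you would pass the notebook's built-in dbutils.

```python
from typing import Any, Dict


def oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> Dict[str, str]:
    """ABFS settings for OAuth client-credentials auth with a service principal.

    Assumption: the storage account grants this service principal access.
    """
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def mount_container(dbutils: Any, container: str, account: str,
                    mount_point: str, configs: Dict[str, str]) -> bool:
    """Mount an ADLS Gen2 container at mount_point.

    Returns True if a new mount was created, False if mount_point was
    already mounted. Assumes dbutils is the Databricks notebook utility.
    """
    # dbutils.fs.mount fails if the mount point is already in use,
    # so check the existing mounts first to make the cell safe to re-run.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    source = f"abfss://{container}@{account}.dfs.core.windows.net/"
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True
```

In a notebook, a call might look like mount_container(dbutils, "customers", "mylabstorage", "/mnt/customers", oauth_configs(...)), after which the files are readable under /mnt/customers. In practice the client secret would come from a secret scope via dbutils.secrets.get rather than being typed into the notebook.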

Intended Audience

This lab is intended for:

  • Azure administrators
  • Cloud engineers and solutions architects
  • Data engineers
  • Anyone with a need to visualize and analyze data in Azure

Prerequisites

You should be familiar with:

Updates

June 18, 2024 - Updated screenshots and instructions to reflect the latest UI

March 1, 2024 - Migrated to Azure Data Lake Storage Gen2

December 5, 2023 - Updated screenshots and instructions to reflect the latest UI

March 7, 2023 - Updated screenshots and instructions to reflect the latest UI

May 9, 2022 - Updated screenshots and instructions for clarity

November 3, 2021 - Updated instructions to resolve the login issue with Azure Databricks

October 23, 2021 - Provided a workaround for an Azure Active Directory issue that initially prevented logging in to Databricks

September 7, 2021 - Updated instructions and images to reflect the latest portal experience

June 15, 2021 - Updated the instructions to reflect the latest portal experience

June 22, 2020 - Clarified the format of the Azure Data Lake Storage URL and included a screenshot to avoid confusion

Lab steps

Logging in to the Microsoft Azure Portal
Adding Customer Data to Azure Data Lake Storage Gen2
Creating an Azure Databricks Workspace
Creating a Spark Cluster and Python Notebook in Azure Databricks
Importing Azure Data Lake Storage Data into Databricks
Interacting with Data in Azure Databricks
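After the container is mounted, the final step's interaction might look like the sketch below, which uses the Spark DataFrame reader and Spark SQL. It assumes the customer data is a CSV file with a header row under a hypothetical /mnt/customers mount and that it has a column worth grouping by (a hypothetical country column in the usage note); the lab's actual file name and columns may differ. Here spark is the SparkSession a Databricks notebook provides, passed in as a parameter.

```python
from typing import Any


def load_csv(spark: Any, path: str) -> Any:
    """Read a CSV file with a header row into a Spark DataFrame."""
    # inferSchema makes Spark scan the data to pick column types,
    # which costs an extra pass but avoids reading every column as a string.
    return (spark.read.format("csv")
            .option("header", "true")
            .option("inferSchema", "true")
            .load(path))


def count_by(spark: Any, path: str, column: str, view: str = "customers") -> Any:
    """Register the file as a temporary view and count rows per value of column."""
    df = load_csv(spark, path)
    df.createOrReplaceTempView(view)
    # Backticks quote the column name so names containing spaces still parse.
    query = (f"SELECT `{column}`, COUNT(*) AS n FROM {view} "
             f"GROUP BY `{column}` ORDER BY n DESC")
    return spark.sql(query)
```

In a notebook, display(count_by(spark, "/mnt/customers/customers.csv", "country")) would render the counts as a table that Databricks can also turn into a chart. A temporary view is used so the same data can be queried from SQL cells as well as from Python.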