Big Data Training Library
Learn to architect for scale, get hands-on with the leading big data tools, and reveal meaningful insights from data using services on Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Content added and updated weekly.
- HANDS-ON LAB: Amazon Simple Storage Service (Amazon S3) Playground; Author: Andrew Burchill; Difficulty: Beginner; Description: Explore the Amazon Simple Storage Service (Amazon S3) in this hands-on playground lab.; Duration: Up to 4 hours; Content Topics: Storage; This hands-on lab has: 2 Lab steps
- LAB CHALLENGE: AWS Data Analytics Processing Challenge; Author: Andrew Burchill; Difficulty: Beginner; Description: Put your skills to the test in this data analytics lab. Complete a data analytics processing solution before time runs out. You will need to be familiar with Amazon's Data Analytics services and associated tools in order to complete a partially built solution for processing log data from an EC2 instance using Amazon Kinesis, AWS Lambda, and Amazon S3.; Duration: Up to 1 hour and 15 minutes; Content Topics: Serverless, App Streaming; This lab challenge has: 2 Lab steps
- HANDS-ON LAB: Implementing a Searchable Amazon S3 Data Lake; Author: Andrew Burchill; Difficulty: Beginner; Description: Learn how to use AWS Glue with AWS Lambda and Amazon S3 to create an efficient searchable data lake in this hands-on lab.; Duration: Up to 1 hour and 30 minutes; Content Topics: Messaging, Serverless; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Deploy a MongoDB Solution With Amazon DocumentDB; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will deploy an Amazon DocumentDB cluster, and you will connect to it to perform some DB operations using the mongo shell.; Duration: Up to 1 hour; Content Topics: Databases; This hands-on lab has: 7 Lab steps
- HANDS-ON LAB: Deploy a Fully Managed and Scalable SQL Database With Google Cloud Spanner; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will create a Google Cloud Spanner database, define a schema for a table, and perform some SQL queries.; Duration: Up to 45 minutes; Content Topics: SQL; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Using AWS Glue for ETL Workloads; Author: Andrew Burchill; Difficulty: Beginner; Description: Learn how to use AWS Glue jobs to perform an Extract, Transform, and Load (ETL) task in this hands-on lab.; Duration: Up to 1 hour; Content Topics: Amazon Web Services; This hands-on lab has: 4 Lab steps
- HANDS-ON LAB: Transforming Data With Apache Spark and Amazon EMR; Author: Andrew Burchill; Difficulty: Beginner; Description: Learn how to use an Amazon Elastic MapReduce (EMR) cluster to transform and aggregate data in this hands-on lab.; Duration: Up to 1 hour and 30 minutes; Content Topics: Amazon Web Services; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Centralizing Data Management With AWS Lake Formation; Author: Andrew Burchill; Difficulty: Beginner; Description: Learn how to use AWS Lake Formation and AWS Glue to manage permissions and dataset metadata for a data lake in this hands-on lab.; Duration: Up to 1 hour; Content Topics: Analytics, Storage; This hands-on lab has: 4 Lab steps
- HANDS-ON LAB: Defining and Working With dbt Tests; Author: Stefano Cascavilla; Difficulty: Advanced; Description: In this lab, you will learn what dbt tests are. You will then create a custom test, and test your sources and a model with both native dbt tests and the custom dbt test.; Duration: Up to 1 hour and 15 minutes; Content Topics: Data Modeling; This hands-on lab has: 7 Lab steps
- HANDS-ON LAB: Working With Incremental dbt Models; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will learn what incremental dbt models are, and you will create two models by using the incremental materialization type.; Duration: Up to 1 hour; Content Topics: Data Modeling; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Create and Execute Your First dbt Models; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will create your first dbt models. You will create a model starting from the dbt sources, and you will create another one starting from the existing model. You will then execute and materialize them in the PostgreSQL database.; Duration: Up to 1 hour; Content Topics: Data Modeling; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Combining and Enriching Data with Amazon Managed Workflows for Apache Airflow; Author: Andrew Burchill; Difficulty: Intermediate; Description: Learn about Amazon Managed Workflows for Apache Airflow in this hands-on lab as you create a Directed Acyclic Graph in Apache Airflow.; Duration: Up to 2 hours; Content Topics: Amazon Web Services; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Configure a dbt Profile and Define Sources; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will configure a dbt profile to connect to a PostgreSQL database, and you will define sources to be used in a dbt project.; Duration: Up to 40 minutes; Content Topics: Data Modeling; This hands-on lab has: 4 Lab steps
- HANDS-ON LAB: Understand and Use dbt Jinja Macros; Author: Stefano Cascavilla; Difficulty: Intermediate; Description: In this lab, you will understand what Jinja macros are and why they are helpful. You will then leverage the most important native macros, source and ref, and you will create a custom macro.; Duration: Up to 1 hour; Content Topics: Data Modeling; This hands-on lab has: 6 Lab steps
- HANDS-ON LAB: Create Your First dbt (Data Build Tool) Project; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will learn what dbt is and why it is helpful for data transformations, and you will install it and create your first dbt project.; Duration: Up to 40 minutes; Content Topics: Data Modeling; This hands-on lab has: 4 Lab steps
- HANDS-ON LAB: Introduction to Graph Database With Neo4j; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will understand the core principles of a graph database, with a focus on the property graph. You will learn what Neo4j is and install it on a virtual machine.; Duration: Up to 30 minutes; Content Topics: Graph Databases; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Handling Variable Data in DynamoDB with Grace; Author: Calculated Systems; Difficulty: Beginner; Description: This lab is aimed at students with a basic understanding of Python who want to learn about schemaless data and Amazon DynamoDB.; Duration: Up to 1 hour; Content Topics: NoSQL; This hands-on lab has: 3 Lab steps
- HANDS-ON LAB: Working With Full-Refresh dbt Models; Author: Stefano Cascavilla; Difficulty: Beginner; Description: In this lab, you will learn what full-refresh dbt models are, and you will create two models by using the table materialization type.; Duration: Up to 1 hour; Content Topics: Data Modeling; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Query a Neo4j Graph Database With Cypher; Author: Stefano Cascavilla; Difficulty: Intermediate; Description: In this lab, you will learn what a graph pattern is and you will perform graph queries using Cypher in a Neo4j database.; Duration: Up to 40 minutes; Content Topics: Graph Databases; This hands-on lab has: 3 Lab steps
- HANDS-ON LAB: Create and Manage Graph Data With Neo4j; Author: Stefano Cascavilla; Difficulty: Intermediate; Description: In this lab, you will create nodes inside a graph database. Then you will define relationships between these nodes to create an example of a property graph.; Duration: Up to 30 minutes; Content Topics: Graph Databases; This hands-on lab has: 3 Lab steps
- LAB CHALLENGE: Google Cloud SQL Challenge; Author: Stefano Cascavilla; Difficulty: Intermediate; Description: Demonstrate your Google Cloud SQL skills by performing tasks required to set up a Cloud SQL infrastructure in this lab challenge.; Duration: Up to 40 minutes; Content Topics: SQL; This lab challenge has: 2 Lab steps
- HANDS-ON LAB: Working With Ephemeral dbt Models; Author: Stefano Cascavilla; Difficulty: Intermediate; Description: In this lab, you will learn what ephemeral dbt models are, and you will create an ephemeral dbt model that is then used by a full-refresh model. You will then review what dbt has materialized and what it has not.; Duration: Up to 1 hour; Content Topics: Data Modeling; This hands-on lab has: 5 Lab steps
- HANDS-ON LAB: Introduction to Financial Data Manipulation with Python; Author: Andrea Giussani; Difficulty: Beginner; Description: The goal of this lab is to consolidate your data management and manipulation skills using Python.; Duration: Up to 1 hour; Content Topics: Analytics, Development; This hands-on lab has: 2 Lab steps
- HANDS-ON LAB: Troubleshooting Amazon Athena Queries; Author: Andrew Burchill; Difficulty: Intermediate; Description: Learn how to use and troubleshoot Amazon Athena SQL queries in this hands-on lab.; Duration: Up to 1 hour; Content Topics: SQL; This hands-on lab has: 4 Lab steps