Hands-on Lab

Text Analysis and LLMs with Python - Module 2

Difficulty: Intermediate
Duration: Up to 1 hour and 30 minutes

Description

Transformer Model Architecture

In this lab, you will explore the transformer architecture, understand its core components, and investigate how it overcame the limitations of earlier sequence models. You’ll examine attention, self-attention, encoder/decoder structures, and positional encoding, as well as strengths and limitations of transformers in real-world use cases.

Learning objectives

Upon completion of this lab, you will be able to:

  • Describe the limitations of earlier sequence models and explain how transformers address them.
  • Explain the core components of the transformer architecture.
  • Differentiate between encoder, decoder, and encoder–decoder transformer blocks and match them to appropriate use cases.
  • Describe the concept of attention and self-attention.
  • Investigate the purpose of positional encoding and describe how it allows transformers to model token order (see the sketch after this list).
  • Identify the strengths and limitations of transformer models.
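
To make the positional-encoding objective concrete, here is a minimal sketch of the sinusoidal scheme from the original transformer paper (one common approach; learned position embeddings are another), written in plain NumPy. The sequence length and model dimension below are illustrative choices, not values used in the lab.

    import numpy as np

    def sinusoidal_positional_encoding(seq_len, d_model):
        # One row per position, one column per embedding dimension.
        positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
        dims = np.arange(d_model)[None, :]                # (1, d_model)
        angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
        angles = positions * angle_rates                  # (seq_len, d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles[:, 0::2])             # even dimensions: sine
        pe[:, 1::2] = np.cos(angles[:, 1::2])             # odd dimensions: cosine
        return pe

    pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
    print(pe.shape)   # (10, 16): one distinct vector per position

Because each position receives a distinct, deterministic vector that is added to the token embeddings, the otherwise order-blind attention layers can tell "dog bites man" from "man bites dog".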

Intended audience

This lab is designed for:

  • Data Scientists
  • Software Developers
  • Machine Learning Engineers
  • AI Engineers
  • DevOps Engineers

Prerequisites

Completion of previous modules is highly recommended before attempting this lab.

Lab structure

Demo: Attention Under the Hood — Encoder, Decoder & Encoder–Decoder
In this demo, you will see attention in action by visualising:
- Encoder self-attention with DistilBERT (heatmaps per layer/head and mean across heads; see the sketch after this list)
- Decoder masked self-attention with GPT-2 (causal/triangular masking)
- Encoder→decoder cross-attention with T5-small (which source tokens the decoder looks at)
- KV-caching speedups during generation to link mechanics to performance
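
As a preview of the first demo item, the following is a minimal sketch of pulling encoder self-attention weights out of DistilBERT with the Hugging Face transformers library and averaging them across heads. The checkpoint name distilbert-base-uncased and the example sentence are assumptions for illustration; the demo notebook additionally renders these weights as heatmaps.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Load a pretrained encoder with attention outputs enabled.
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModel.from_pretrained("distilbert-base-uncased", output_attentions=True)

    inputs = tokenizer("The animal didn't cross the street because it was tired",
                       return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len).
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    last_layer = outputs.attentions[-1][0]        # (heads, seq_len, seq_len)
    mean_over_heads = last_layer.mean(dim=0)      # (seq_len, seq_len)

    # For each token, report the token it attends to most strongly on average.
    for i, tok in enumerate(tokens):
        j = mean_over_heads[i].argmax().item()
        print(f"{tok:>12s} -> {tokens[j]} ({mean_over_heads[i, j].item():.2f})")

Plotting mean_over_heads (for example with matplotlib's imshow) gives the kind of mean-across-heads heatmap described above.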

Intended learning outcomes:
- Explain queries, keys, values, and how attention weights are interpreted on heatmaps (see the sketch after this list).
- Distinguish encoder, decoder (masked), and encoder–decoder (cross-attention) architectures.
- Describe causal masking and why it enforces left-to-right generation.
- Interpret how attention shifts across layers/heads (syntax vs. semantics).
- Explain what KV caching is and why it speeds up decoding.
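
To connect these outcomes, here is a minimal from-scratch sketch of scaled dot-product attention with an optional causal mask, in plain PyTorch. The tensor sizes and random inputs are illustrative, and real transformer layers add learned query/key/value projections and multiple heads on top of this core computation.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, causal=False):
        # q, k, v: (seq_len, d_k). Each query row is compared against every key row.
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5          # (seq_len, seq_len)
        if causal:
            # Hide future positions: position i may only attend to positions <= i.
            future = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))
        weights = F.softmax(scores, dim=-1)                     # each row sums to 1
        return weights @ v, weights

    x = torch.randn(5, 8)                                       # 5 toy tokens, width 8
    _, w = scaled_dot_product_attention(x, x, x, causal=True)   # self-attention: q = k = v
    print(w)                                                    # upper triangle is all zeros

The causal mask is what enforces left-to-right generation: future positions receive a score of negative infinity, so their softmax weight is exactly zero.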

Activity: Be an Attention Detective
In this activity, you will investigate real attention patterns to build intuition:
- Edit inputs with pronoun ambiguity and compare self-attention across layers/heads.
- Explore masked self-attention by changing the last tokens of a prefix.
- Inspect cross-attention over two decoding steps to see how focus shifts (see the sketch below).
- Benchmark generation with/without cache for different lengths.
You will document your observations with notes and timing tables.
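
As a starting point for the cross-attention investigation, here is a minimal sketch that prints which source token the T5-small decoder attends to most at each generation step. The translation prompt is an illustrative assumption, and the shape noted in the comments applies when generation uses the KV cache (the default).

    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    text = "translate English to German: The house is wonderful."
    inputs = tokenizer(text, return_tensors="pt")
    src_tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=6,
            output_attentions=True,
            return_dict_in_generate=True,
        )

    # out.cross_attentions: one entry per generated token; each entry holds one tensor
    # per decoder layer of shape (batch, heads, 1, src_len) when the KV cache is used.
    for step, per_layer in enumerate(out.cross_attentions):
        weights = per_layer[-1][0].mean(dim=0)[-1]   # last layer, mean over heads
        focus = weights.argmax().item()
        print(f"step {step}: decoder looks mostly at source token '{src_tokens[focus]}'")

Comparing the output for consecutive steps shows how the decoder's focus on the source sentence shifts as tokens are generated.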

Intended learning outcomes:
- Analyse and articulate which tokens a model attends to and why (for encoder, decoder, and cross-attention).
- Identify and explain causal masking patterns.
- Describe how cross-attention moves across steps as tokens are generated.
- Quantify the impact of KV caching and reason about throughput/latency (see the timing sketch after this list).
- Reflect on failure modes (ambiguity, long-range dependencies, head noise) and tie them to transformer limitations.
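
For the caching comparison, here is a minimal timing sketch with GPT-2, assuming the Hugging Face transformers library. The prompt and generation length are illustrative, and absolute times will vary by hardware, so the interesting result is the ratio between the two runs rather than the numbers themselves.

    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("The transformer architecture", return_tensors="pt")

    def timed_generate(use_cache):
        # Greedy generation; with use_cache=False the model recomputes all past
        # keys and values at every step instead of reusing them.
        start = time.perf_counter()
        with torch.no_grad():
            model.generate(
                **inputs,
                max_new_tokens=100,
                do_sample=False,
                use_cache=use_cache,
                pad_token_id=tokenizer.eos_token_id,
            )
        return time.perf_counter() - start

    print(f"with KV cache:    {timed_generate(True):.2f} s")
    print(f"without KV cache: {timed_generate(False):.2f} s")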

Lab steps

  1. Starting the Notebooks