In this lab, you will learn why training or fine-tuning may be necessary for specific use cases, how embedding models are trained, and how different fine-tuning methods align models with desired behaviours. You’ll explore semantic search, re-ranking, and practical tuning choices through a demo and a hands-on activity.
Upon completion of this lab, you will be able to:
This course is designed for:
Completion of previous modules is highly recommended before attempting this lab.
Demo: Train & Fine-Tune with OpenAI — Embeddings, Re-Ranking, and Tuning Paths
In this demo, you will:
- Explore bi-encoder-style semantic similarity with OpenAI embeddings.
- Use in-batch negatives and inspect similarity matrices to build intuition (see the first sketch after this list).
- Apply cross-encoder re-ranking with a chat model for precision (see the second sketch after this list).
- Prepare a small JSONL dataset for supervised fine-tuning (SFT) and launch a fine-tune job.
- Position continued pre-training (MLM), SFT, and preference tuning in context.
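A minimal sketch of the bi-encoder step, assuming the OpenAI Python SDK (v1.x), an `OPENAI_API_KEY` set in the environment, and `text-embedding-3-small` as one reasonable embedding model choice; the example queries, passages, and the `embed` helper are illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

queries = ["How do I reset my password?", "What payment methods do you accept?"]
passages = [
    "Go to Settings > Security and click 'Reset password'.",
    "We accept credit cards, PayPal, and bank transfer.",
]

def embed(texts):
    """Return an (n, d) array of L2-normalised embeddings for a list of texts."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([item.embedding for item in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Bi-encoder: queries and passages are embedded independently, then compared.
q_vecs = embed(queries)
p_vecs = embed(passages)

# Cosine similarity matrix: rows = queries, columns = passages.
sim = q_vecs @ p_vecs.T
print(np.round(sim, 3))
```

Because the i-th query is paired with the i-th passage, the diagonal of `sim` holds the positive pairs while every off-diagonal cell acts as an in-batch negative, which is the intuition the demo builds on.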
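The re-ranking step can be sketched by asking a chat model to score each (query, passage) pair directly, cross-encoder style. The model name `gpt-4o-mini`, the 0-10 scoring prompt, and the `rerank_score` helper are illustrative choices, not the demo's exact settings.

```python
from openai import OpenAI

client = OpenAI()

def rerank_score(query: str, passage: str) -> float:
    """Ask a chat model for a 0-10 relevance score for one (query, passage) pair."""
    prompt = (
        "Rate how well the passage answers the query on a scale of 0 to 10. "
        "Reply with a single number only.\n\n"
        f"Query: {query}\nPassage: {passage}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except (TypeError, ValueError):
        return 0.0  # fall back if the model returns non-numeric text

query = "How do I reset my password?"
candidates = [
    "We accept credit cards, PayPal, and bank transfer.",
    "Go to Settings > Security and click 'Reset password'.",
]

# Unlike the bi-encoder, the cross-encoder sees query and passage together,
# trading throughput for precision on a small candidate set.
ranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)
print(ranked[0])
```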
Intended learning outcomes:
- Explain Siamese/Twin (bi-encoder) embeddings and measure cosine similarity.
- Describe in-batch negatives and analyse a similarity matrix.
- Build a cross-encoder re-ranker by asking a chat model to score pair similarity.
- Prepare a small dataset and understand the pipeline to launch a fine-tune job (see the sketch after this list).
- Differentiate continued pre-training, SFT, and preference tuning.
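For the fine-tuning path, here is a sketch of the dataset preparation and job-launch pipeline, assuming the same SDK. The two chat-format examples, the `train.jsonl` filename, and the `gpt-4o-mini-2024-07-18` base snapshot are illustrative; check the OpenAI fine-tuning docs for currently supported models, and note that a real job needs more training examples than shown here.

```python
import json
from openai import OpenAI

client = OpenAI()

# Chat-format SFT examples: one JSON object per line in the JSONL file.
examples = [
    {"messages": [
        {"role": "system", "content": "Answer in exactly one sentence."},
        {"role": "user", "content": "What is a bi-encoder?"},
        {"role": "assistant", "content": "A bi-encoder embeds query and passage separately and compares the vectors."},
    ]},
    {"messages": [
        {"role": "system", "content": "Answer in exactly one sentence."},
        {"role": "user", "content": "What is a cross-encoder?"},
        {"role": "assistant", "content": "A cross-encoder scores query and passage together in a single forward pass."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the dataset, then launch the supervised fine-tuning job.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed fine-tunable base snapshot
)
print(job.id, job.status)
```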
Activity: Semantic Search + Re-Ranking + Tuning Choices
In this activity, you will:
- Build a mini FAQ semantic search system using bi-encoder embeddings for retrieval (see the sketch after this list).
- Apply cross-encoder re-ranking with a chat model to improve results.
- Prepare a small supervised fine-tuning dataset to enforce strict output formatting.
- Reflect on when to use continued pre-training, SFT, or preference tuning in real-world adaptation.
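A sketch of the retrieval core of this activity, under the same assumptions as the demo sketches (OpenAI Python SDK, `text-embedding-3-small`); the FAQ entries, the query, and the `embed` helper are invented placeholders. The top-k hits can then be passed to the cross-encoder re-ranker from the demo.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

faq = [
    "To reset your password, open Settings > Security and choose 'Reset password'.",
    "We accept credit cards, PayPal, and bank transfer.",
    "Refunds are processed within 5-7 business days.",
]

def embed(texts):
    """Return an (n, d) array of L2-normalised embeddings."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([item.embedding for item in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

faq_vecs = embed(faq)                            # index the FAQ once
query_vec = embed(["How long do refunds take?"])[0]

# Rank FAQ entries by cosine similarity to the query and keep the top-k.
k = 2
top_k = np.argsort(faq_vecs @ query_vec)[::-1][:k]
for i in top_k:
    print(faq[i])
```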
Intended learning outcomes:
- Implement bi-encoder retrieval with embeddings.
- Explain in-batch negatives with a similarity matrix.
- Apply cross-encoder re-ranking for precision.
- Prepare a small dataset for supervised fine-tuning.
- Reflect on the role of continued pre-training, SFT, and preference tuning.