Deploying Large Language Models Using Ray Serve
Description
Ray Serve is a framework for deploying and serving machine learning and large language model (LLM) inference workloads. It is designed to be scalable, to support complex multi-model workflows, and to efficiently utilize costly resources such as GPUs. The Phi-3 model released by Microsoft is a capable LLM that has been optimized to run on CPUs and in low-memory environments.
Learning how to use Ray Serve to deploy a large language model will benefit anyone working with machine learning models and looking to deploy them in a production environment.
In this hands-on lab, you will implement a Ray Serve deployment in a development environment and run it on a virtual machine.
Learning objectives
Upon completion of this beginner-level lab, you will be able to:
- Implement a Ray Serve deployment that allows you to interact with a large language model
- Test your deployment on a virtual machine
- Deploy your Ray Serve deployment to a Ray cluster
Intended audience
- Anyone looking to learn about deploying machine learning models
- Cloud Architects
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- Large Language Models
- The Python programming language
The following content can be used to fulfill the prerequisites: