hands-on lab

Using Amazon DocumentDB to Store and Search Vector Data

Difficulty: Beginner
Duration: Up to 1 hour
Students: 16
Get guided in a real environment: Practice with a step-by-step scenario in a real, provisioned environment.
Learn and validate: Use validations to check your solutions every step of the way.
See results: Track your knowledge and monitor your progress.

Description

Amazon DocumentDB is a fully managed, MongoDB-compatible database service designed to be fast, scalable, and highly available. It makes it easy and cost-effective to store, query, and index JSON data. Its API is compatible with the popular open-source document database MongoDB, so you can use your existing MongoDB drivers and tools to interact with Amazon DocumentDB.

Amazon DocumentDB supports indexing and searching for documents based on their vector embeddings. This Amazon DocumentDB-specific feature makes it easy to implement semantic search systems, recommendation systems, and other applications that require searching for similar documents.

In this hands-on lab, you will convert textual data to vectors, insert the vectorized data into an Amazon DocumentDB collection, and then search for similar documents using the vector embeddings.
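
The workflow above (vectorize text, store the vectors alongside the documents, then search by similarity) can be sketched in plain Python without a database. This is only an illustration under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model, the `embedding` field name is invented for the example, and the in-memory list stands in for an Amazon DocumentDB collection.

```python
import hashlib
import math

DIMENSIONS = 8  # toy size (assumption); real embedding models use far more dimensions


def toy_embed(text):
    """Hypothetical stand-in for an embedding model.

    Hashes each word into one of DIMENSIONS buckets and counts occurrences.
    It captures word overlap only, not meaning.
    """
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vector[digest[0] % DIMENSIONS] += 1.0
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vectorize(texts):
    """Pair each text with its embedding, in the shape of a document to insert."""
    return [{"text": t, "embedding": toy_embed(t)} for t in texts]


def search(documents, query_vector, k=1):
    """Return the k documents whose embeddings are most similar to query_vector."""
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(d["embedding"], query_vector),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    docs = vectorize(["managed document database", "vector similarity search"])
    for doc in search(docs, toy_embed("similarity search"), k=1):
        print(doc["text"])
```

In the lab itself, a vector index in Amazon DocumentDB replaces the brute-force `search` function, so the database performs the ranking instead of the script.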

Please note: Lab setup creates an Amazon DocumentDB cluster, which takes approximately ten minutes. Ensure you have enough time available to complete the lab.

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

  • Implement a Python script that converts textual data to vectors
  • Insert data into an Amazon DocumentDB cluster
  • Create an index on a field in an Amazon DocumentDB collection
  • Implement a Python script that searches for similar documents using vector data
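
As a rough preview of the indexing and searching objectives, the sketch below builds the index options and the aggregation pipeline as plain Python dictionaries. The option and stage names follow my understanding of the Amazon DocumentDB vector search documentation and should be checked against the current docs; the helper names, the `embedding` field, and all default values are assumptions made for illustration, not the lab's actual code.

```python
def vector_index_options(dimensions, similarity="cosine", m=16, ef_construction=64):
    """Build the options for an HNSW vector index.

    The parameter values are illustrative assumptions, not tuned settings.
    """
    return {
        "type": "hnsw",
        "dimensions": dimensions,
        "similarity": similarity,
        "m": m,
        "efConstruction": ef_construction,
    }


def vector_search_pipeline(query_vector, path="embedding", k=3,
                           similarity="cosine", ef_search=40):
    """Build a single-stage aggregation pipeline that requests the k nearest documents."""
    return [
        {
            "$search": {
                "vectorSearch": {
                    "vector": list(query_vector),
                    "path": path,
                    "similarity": similarity,
                    "k": k,
                    "efSearch": ef_search,
                }
            }
        }
    ]


# With a PyMongo collection connected to the cluster (not shown here), these
# dictionaries would typically be passed along the following lines:
#   collection.create_index([("embedding", "vector")],
#                           name="embedding_index",  # hypothetical index name
#                           vectorOptions=vector_index_options(8))
#   results = collection.aggregate(vector_search_pipeline(query_vector))
```

Keeping the definitions as plain dictionaries makes them easy to inspect before they are sent to the cluster. The `dimensions` value must match the length of the vectors your embedding model produces.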

Intended audience

  • Candidates for the AWS Certified Machine Learning - Specialty certification
  • Data Engineers
  • DevOps Engineers
  • Machine Learning Engineers
  • Software Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • Amazon DocumentDB or MongoDB
  • The Python scripting language
  • Machine learning concepts

Lab steps

  • Logging In to the Amazon Web Services Console
  • Connecting to the Virtual Machine using EC2 Instance Connect
  • Vectorizing Textual Data
  • Indexing and Semantically Searching Your Data