Using Amazon DocumentDB to Store and Search Vector Data
Description
Amazon DocumentDB is a fully managed, MongoDB-compatible database service designed to be fast, scalable, and highly available. It makes it easy and cost-effective to store, query, and index JSON data. Its API is compatible with the popular open-source document database MongoDB, so you can use your existing MongoDB drivers and tools with Amazon DocumentDB.
Amazon DocumentDB supports indexing and searching for documents based on their vector embeddings. This Amazon DocumentDB-specific feature makes it easy to implement semantic search systems, recommendation systems, and other applications that require searching for similar documents.
In this hands-on lab, you will convert textual data to vectors, insert the vectorized data into an Amazon DocumentDB collection, and then search for similar documents using the vector embeddings.
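The three steps above (vectorize, insert, search) can be illustrated without a database. The following is a minimal sketch, not the lab's actual script: `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, the "collection" is a plain Python list, and the search is a brute-force cosine-similarity scan rather than an indexed lookup.

```python
import math
import zlib

DIMENSIONS = 16  # toy size chosen for this sketch; real models use far more


def embed(text):
    """Hypothetical stand-in for an embedding model (hashed bag-of-words)."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        bucket = zlib.crc32(word.encode("utf-8")) % DIMENSIONS
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def insert_documents(collection, texts):
    """Store each text alongside its vector, as a document would be stored."""
    for text in texts:
        collection.append({"text": text, "embedding": embed(text)})


def search(collection, query, k=2):
    """Return the k documents most similar to the query, best match first."""
    query_vector = embed(query)
    scored = [
        {"text": doc["text"],
         "score": cosine_similarity(query_vector, doc["embedding"])}
        for doc in collection
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]


if __name__ == "__main__":
    docs = []
    insert_documents(docs, [
        "a quiet cabin in the mountains",
        "fast sports car review",
        "mountain hiking trail guide",
    ])
    for result in search(docs, "cabin in the mountains"):
        print(round(result["score"], 3), result["text"])
```

A database-backed version replaces the list with a collection and the linear scan with a vector index, but the shape of the data (text plus an embedding field) stays the same.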
Please note: This lab creates an Amazon DocumentDB cluster during lab setup, which takes approximately ten minutes. Ensure you have enough time available to complete the lab.
Learning objectives
Upon completion of this beginner-level lab, you will be able to:
- Implement a Python script that converts textual data to vectors
- Insert data into an Amazon DocumentDB cluster
- Create an index on a field in an Amazon DocumentDB collection
- Implement a Python script that searches for similar documents using vector data
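The index-creation and search objectives above might look roughly like the sketch below. It only builds the option and pipeline dictionaries, so it runs without a cluster. The key names (`vectorOptions`, `hnsw`, `efConstruction`, `$search`, `vectorSearch`, `efSearch`) reflect the Amazon DocumentDB vector search syntax as best understood here and should be checked against the current AWS documentation; the helper names, the `embedding` field, and the default values are assumptions made for illustration.

```python
ALLOWED_SIMILARITY = {"euclidean", "cosine", "dotProduct"}


def hnsw_index_options(dimensions, similarity="cosine", m=16, ef_construction=64):
    """Build the options for an HNSW vector index (assumed key names)."""
    if similarity not in ALLOWED_SIMILARITY:
        raise ValueError(f"unsupported similarity metric: {similarity}")
    return {
        "type": "hnsw",
        "dimensions": dimensions,   # must match the embedding length
        "similarity": similarity,
        "m": m,
        "efConstruction": ef_construction,
    }


def vector_search_stage(query_vector, path, k, similarity="cosine", ef_search=40):
    """Build an aggregation stage that asks for the k nearest documents."""
    return {
        "$search": {
            "vectorSearch": {
                "vector": list(query_vector),
                "path": path,           # field that holds the embeddings
                "similarity": similarity,
                "k": k,
                "efSearch": ef_search,
            }
        }
    }


# With a pymongo collection connected to a cluster (not shown here), these
# helpers would be used along the following lines (hypothetical names):
#
#   collection.create_index(
#       [("embedding", "vector")],
#       vectorOptions=hnsw_index_options(dimensions=3),
#       name="embedding_hnsw",
#   )
#   results = collection.aggregate(
#       [vector_search_stage([0.2, 0.5, 0.8], "embedding", k=2)]
#   )
```

The similarity metric passed to the search stage is generally expected to match the one the index was built with.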
Intended audience
- Candidates for the AWS Certified Machine Learning - Specialty certification
- Data Engineers
- DevOps Engineers
- Machine Learning Engineers
- Software Engineers
Prerequisites
Familiarity with the following will be beneficial but is not required:
- Amazon DocumentDB or MongoDB
- The Python scripting language
- Machine learning concepts
The following content can be used to fulfill the prerequisites: