This workbook demonstrates how to use LangChain with Amazon Bedrock. Amazon Bedrock is a managed service that makes foundation models from leading AI startups, as well as Amazon's own Titan models, available through a single API.
Install packages and import modules
Note: boto3 is part of the AWS SDK for Python and is required to use the Bedrock LLM.
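Before moving on, it helps to confirm the required packages are importable. This is a minimal sketch; the package list (`boto3`, `langchain`, `elasticsearch`) is assumed from the steps that follow, and the suggested `pip` command is only one way to install them.

```python
import importlib.util

# Packages this workbook assumes are installed; if any are missing,
# install them with: pip install -qU boto3 langchain elasticsearch
required = ["boto3", "langchain", "elasticsearch"]
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```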
Init Bedrock client
To authenticate with AWS, we can either configure credentials in the ~/.aws/config file or pass AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_REGION directly to the boto3 module. We use the second approach in this example.
Connect to Elasticsearch
ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up here for a free trial.
Because we are using an Elastic Cloud deployment, we'll use the Cloud ID to identify it. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.
We will use ElasticsearchStore to connect to our Elastic Cloud deployment, which makes it easy to create an index and load data into it. In the ElasticsearchStore instance, we will set embedding to BedrockEmbeddings to embed the texts, and set the Elasticsearch index name that will be used in this example.
Download the dataset
Let's download the sample dataset and deserialize the documents.
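A minimal sketch of the download-and-deserialize step. The dataset URL is not given here, so it stays a placeholder, and the inline sample record (with `content` and `name` fields) is only an assumed shape for the data.

```python
import json
from urllib.request import urlopen

# Placeholder: substitute the real dataset URL here.
dataset_url = None

if dataset_url:
    # Download and deserialize the JSON dataset.
    with urlopen(dataset_url) as response:
        workplace_docs = json.load(response)
else:
    # Inline sample with the shape we assume for the dataset:
    # a list of records, each with text content plus metadata.
    workplace_docs = json.loads(
        '[{"content": "Our vacation policy ...", "name": "Vacation Policy"}]'
    )

print(f"Loaded {len(workplace_docs)} documents")
```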
Split Documents into Passages
We'll chunk the documents into passages to improve retrieval specificity and to ensure that we can fit multiple passages within the context window of the final question-answering prompt.
Here we are chunking the documents into 500-token passages with an overlap of 0 tokens. We use a simple splitter, but LangChain also offers more advanced splitters that reduce the chance of context being lost.
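The chunking idea can be illustrated with a minimal splitter. This is a sketch only: "token" here simply means a whitespace-separated word, whereas a real splitter would count model tokens.

```python
def split_into_passages(text, chunk_size=500, overlap=0):
    """Split text into passages of at most chunk_size whitespace tokens,
    repeating `overlap` tokens between consecutive passages."""
    tokens = text.split()
    step = chunk_size - overlap  # advance per chunk; overlap must be < chunk_size
    return [
        " ".join(tokens[i : i + chunk_size])
        for i in range(0, len(tokens), step)
    ]

doc = "word " * 1200  # a 1200-token stand-in document
passages = split_into_passages(doc, chunk_size=500, overlap=0)
print(len(passages))  # 1200 tokens at 500 per chunk -> 3 passages
```

With `overlap=0`, consecutive passages share no tokens; a positive overlap repeats the tail of each passage at the head of the next.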
Index data into Elasticsearch
Next, we will index the data into Elasticsearch using ElasticsearchStore.from_documents. We will use the Cloud ID, password, and index name values set in the Create cloud deployment step.
Init Bedrock LLM
Next, we will initialize the Bedrock LLM. In the Bedrock instance, we will pass bedrock_client and a specific model_id, such as amazon.titan-text-express-v1, ai21.j2-ultra-v1, anthropic.claude-v2, or cohere.command-text-v14. You can see the list of available base models in the Amazon Bedrock User Guide.
Asking a question
Now that the passages are stored in Elasticsearch and the LLM is initialized, we can ask a question and retrieve the relevant passages.