databases/qdrant/manifests/04-notebook/vector-database.ipynb (446 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"id": "yuQxBYbWpPCd",
"metadata": {
"id": "yuQxBYbWpPCd"
},
"source": [
"Copyright 2024 Google LLC\n",
" \n",
"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
" you may not use this file except in compliance with the License.\n",
" You may obtain a copy of the License at\n",
"https://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
" Unless required by applicable law or agreed to in writing, software\n",
" distributed under the License is distributed on an \"AS IS\" BASIS,\n",
" WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
" See the License for the specific language governing permissions and\n",
"limitations under the License."
]
},
{
"cell_type": "markdown",
"id": "201cd5fa-25e0-4bd7-8a27-af1fc85a12e7",
"metadata": {
"id": "201cd5fa-25e0-4bd7-8a27-af1fc85a12e7",
"vscode": {
"languageId": "raw"
}
},
"source": [
"This section shows you how to upload Vectors into a new Qdrant Collection and run simple search queries using the official Qdrant client.\n",
"\n",
"In this example, you use a dataset from a CSV file that contains a list of books in different genres. Qdrant will serve as a search engine.\n",
"\n",
"Install kubectl and the Google Cloud SDK with the necessary authentication plugin for Google Kubernetes Engine (GKE)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "Yk3cHn1ToFLM",
"metadata": {
"id": "Yk3cHn1ToFLM"
},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"curl -LO \"https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl\"\n",
"sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl\n",
"apt-get update && apt-get install apt-transport-https ca-certificates gnupg\n",
"curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg\n",
"echo \"deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main\" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list\n",
"apt-get update && sudo apt-get install google-cloud-cli-gke-gcloud-auth-plugin"
]
},
{
"cell_type": "markdown",
"id": "8e6f6249",
"metadata": {
"id": "8e6f6249"
},
"source": [
"Install a Qdrant client:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c3b796a-3b3a-4322-a276-d72c1dc8540e",
"metadata": {
"id": "1c3b796a-3b3a-4322-a276-d72c1dc8540e"
},
"outputs": [],
"source": [
"! pip install qdrant-client[fastembed] python-dotenv -U"
]
},
{
"cell_type": "markdown",
"id": "0170d1de",
"metadata": {
"id": "0170d1de"
},
"source": [
"Replace \\<CLUSTER_NAME> with your cluster name, e.g. qdrant-cluster. Retrieve the GKE cluster's credentials using the gcloud command."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6RinCTUPp8uU",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 1312,
"status": "ok",
"timestamp": 1721925276351,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": -180
},
"id": "6RinCTUPp8uU",
"outputId": "2d8ad861-dc60-4fe0-f01d-910ec12a177e"
},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"export KUBERNETES_CLUSTER_NAME= <CLUSTER_NAME> \n",
"gcloud container clusters get-credentials $KUBERNETES_CLUSTER_NAME --region $GOOGLE_CLOUD_REGION"
]
},
{
"cell_type": "markdown",
"id": "4fb9bee4",
"metadata": {
"id": "4fb9bee4"
},
"source": [
"Download the dataset from Git."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3xC4suW60D0Z",
"metadata": {
"executionInfo": {
"elapsed": 462,
"status": "ok",
"timestamp": 1721925284451,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": -180
},
"id": "3xC4suW60D0Z"
},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"export DATASET_PATH=https://raw.githubusercontent.com/epam/kubernetes-engine-samples/qdrant-installation/databases/qdrant/manifests/04-notebook/dataset.csv\n",
"curl -s -LO $DATASET_PATH"
]
},
{
"cell_type": "markdown",
"id": "11a1f364",
"metadata": {},
"source": [
"Please run the next command and check if Qdrant internal load balancer achieved an IP address. If you see ip address in the output proceed to the next step if blanc please repeat the command after a few minutes or check the status of qdrant-ilb service from your console, proceed to the next step only when IP address appears."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd59571f",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"kubectl get svc qdrant-ilb -n qdrant --output jsonpath=\"{.status.loadBalancer.ingress[0].ip}\""
]
},
{
"cell_type": "markdown",
"id": "9ac49ff2",
"metadata": {
"id": "9ac49ff2"
},
"source": [
"Create an .env file with environment variables required for connecting to Qdrant in a Kubernetes cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "rMJoOVqhs_u1",
"metadata": {
"id": "rMJoOVqhs_u1"
},
"outputs": [],
"source": [
"%%bash\n",
"echo QDRANT_ENDPOINT=\"http://$(kubectl get svc qdrant-ilb -n qdrant --output jsonpath=\"{.status.loadBalancer.ingress[0].ip}\"):6333\" > .env\n",
"echo APIKEY=$(kubectl get secret qdrant-database-apikey -n qdrant --template='{{index .data \"api-key\"}}'| base64 -d) >> .env\n"
]
},
{
"cell_type": "markdown",
"id": "-fVame2rv6Aj",
"metadata": {
"id": "-fVame2rv6Aj"
},
"source": [
"Import the required Python and Qdrant libraries:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "bb5ca67b-607d-4b23-926a-6459ea584f45",
"metadata": {
"executionInfo": {
"elapsed": 327,
"status": "ok",
"timestamp": 1721929030323,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": -180
},
"id": "bb5ca67b-607d-4b23-926a-6459ea584f45"
},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"from qdrant_client import QdrantClient\n",
"from qdrant_client.http import models\n",
"import os\n",
"import csv"
]
},
{
"cell_type": "markdown",
"id": "m-ZrYHeTwBmb",
"metadata": {
"id": "m-ZrYHeTwBmb"
},
"source": [
"Load data from a CSV file for inserting data into a Qdrant collection:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "013284ff-e4b6-4ad7-b330-17860121c4c1",
"metadata": {
"executionInfo": {
"elapsed": 415,
"status": "ok",
"timestamp": 1721925424235,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": -180
},
"id": "013284ff-e4b6-4ad7-b330-17860121c4c1"
},
"outputs": [],
"source": [
"books = [*csv.DictReader(open('/content/dataset.csv'))]"
]
},
{
"cell_type": "markdown",
"id": "DF0lu7pYwQLS",
"metadata": {
"id": "DF0lu7pYwQLS"
},
"source": [
"Prepare data for uploading:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "75f71220-349b-41f0-89ea-1ba7a1c52771",
"metadata": {
"executionInfo": {
"elapsed": 407,
"status": "ok",
"timestamp": 1721925458633,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": -180
},
"id": "75f71220-349b-41f0-89ea-1ba7a1c52771"
},
"outputs": [],
"source": [
"documents: list[dict[str, any]] = []\n",
"metadata: list[dict[str, any]] = []\n",
"ids: list[int] = []\n",
"\n",
"for idx, doc in enumerate(books):\n",
" ids.append(idx)\n",
" documents.append(doc[\"description\"])\n",
" metadata.append(\n",
" {\n",
" \"title\": doc[\"title\"],\n",
" \"author\": doc[\"author\"],\n",
" \"publishDate\": doc[\"publishDate\"],\n",
" }\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "bm-CN-B8waPZ",
"metadata": {
"id": "bm-CN-B8waPZ"
},
"source": [
"Define a Qdrant connection, it requires an API Key for authentication:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b08ebd75-0b8c-4805-a40f-634d2d5df3de",
"metadata": {
"id": "b08ebd75-0b8c-4805-a40f-634d2d5df3de"
},
"outputs": [],
"source": [
"load_dotenv()\n",
"qdrant = QdrantClient(\n",
" url=os.getenv(\"QDRANT_ENDPOINT\"), api_key=os.getenv(\"APIKEY\"))"
]
},
{
"cell_type": "markdown",
"id": "b048a1bb-31e3-40f3-8ae1-6c0c9a49fa70",
"metadata": {
"id": "b048a1bb-31e3-40f3-8ae1-6c0c9a49fa70"
},
"source": [
"Create a Qdrant collection and insert data. This method establishes a connection to Qdrant, creates a new collection named `my_books`, and uploads the book data to `my_books`."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "637e4922-d58c-4eb3-91c2-03252422c662",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 1077,
"status": "ok",
"timestamp": 1721929052203,
"user": {
"displayName": "",
"userId": ""
},
"user_tz": -180
},
"id": "637e4922-d58c-4eb3-91c2-03252422c662",
"outputId": "8f44c044-903e-4637-c324-1bfd41cb819b"
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qdrant.add(collection_name=\"my_books\", documents=documents, metadata=metadata, ids=ids, parallel=2)"
]
},
{
"cell_type": "markdown",
"id": "9d0ca596-9688-4df3-a8cc-dc384c1e5234",
"metadata": {
"id": "9d0ca596-9688-4df3-a8cc-dc384c1e5234"
},
"source": [
"Query the Qdrant database. This method runs a search query about `drama about people and unhappy love` and displays results.\n",
"\n",
"It prints each result separated by a line of dashes, in the following format :\n",
"\n",
"- Title: Title of the book\n",
"- Author: Author of the book\n",
"- Description: As stored in your document's description metadata field\n",
"- Published: Book publication date\n",
"- Score: Qdrant's relevancy score"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1cae5f-ffa3-44ea-8b9e-fd376cdc185c",
"metadata": {
"id": "7d1cae5f-ffa3-44ea-8b9e-fd376cdc185c"
},
"outputs": [],
"source": [
"results = qdrant.query(\n",
" collection_name=\"my_books\",\n",
" query_text=\"drama about people and unhappy love\",\n",
" limit=2,\n",
")\n",
"for result in results:\n",
" print(\"Title:\", result.metadata[\"title\"], \"\\nAuthor:\", result.metadata[\"author\"])\n",
" print(\"Description:\", result.metadata[\"document\"], \"Published:\", result.metadata[\"publishDate\"], \"\\nScore:\", result.score)\n",
" print(\"-----\")"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0rc1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}