# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

This section shows you how to upload vectors into a new Weaviate collection and run simple search queries using the official Weaviate client. In this example, you use a dataset from a CSV file that contains a list of books in different genres. Weaviate serves as the search engine.

Install **kubectl** and the **Google Cloud SDK** with the necessary authentication plugin for Google Kubernetes Engine (GKE).

In [None]:
%%bash

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates gnupg
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get update && sudo apt-get install -y google-cloud-cli-gke-gcloud-auth-plugin

**Replace** \<CLUSTER_NAME> with your cluster name, for example `weaviate-cluster`, then retrieve the GKE cluster's credentials with `gcloud`. The command expects the `GOOGLE_CLOUD_REGION` environment variable to be set already.

In [None]:
%%bash

export KUBERNETES_CLUSTER_NAME=<CLUSTER_NAME>
gcloud container clusters get-credentials $KUBERNETES_CLUSTER_NAME --region $GOOGLE_CLOUD_REGION

Download the dataset from GitHub.

In [None]:
%%bash

export DATASET_PATH=https://raw.githubusercontent.com/epam/kubernetes-engine-samples/Weaviate/databases/weaviate/manifests/02-notebook/dataset.csv
curl -s -LO $DATASET_PATH

Create a `.env` file with the environment variables required to connect to Weaviate in the Kubernetes cluster.

In [None]:
%%bash

echo WEAVIATE_ENDPOINT=$(kubectl get svc weaviate-ilb -n weaviate --output jsonpath="{.status.loadBalancer.ingress[0].ip}") > .env
echo APIKEY=$(kubectl get secret apikeys -n weaviate --template={{.data.AUTHENTICATION_APIKEY_ALLOWED_KEYS}} | base64 -d) >> .env
echo PALM_APIKEY=$(gcloud auth print-access-token) >> .env
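
Before connecting, it can help to confirm that all three variables were actually written, because a failed `kubectl` or `gcloud` call inside `$(...)` leaves an empty value rather than stopping the cell. The following is a minimal sketch and not part of the original notebook: `missing_env_keys` is a hypothetical helper that parses simple `KEY=VALUE` lines by hand, and the sample values are invented.

```python
REQUIRED_KEYS = ("WEAVIATE_ENDPOINT", "APIKEY", "PALM_APIKEY")


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in KEY=VALUE text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]


# Invented file contents: the access token line is empty.
sample = "WEAVIATE_ENDPOINT=10.0.0.5\nAPIKEY=example-key\nPALM_APIKEY=\n"
print(missing_env_keys(sample))  # prints ['PALM_APIKEY']
```

In the notebook, you could pass the contents of the real `.env` file instead of the sample string and rerun the previous cell if the returned list is not empty.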

Install the Weaviate client and `python-dotenv`:

In [None]:
! pip install weaviate-client python-dotenv

Import Python libraries:

In [None]:
import os
import csv
import weaviate
import json
from weaviate.connect import ConnectionParams
from weaviate.classes.config import Configure
from typing import List
import numpy as np
from dotenv import load_dotenv

Load the books from the CSV file so they can be inserted into a Weaviate collection:

In [None]:
with open('/content/dataset.csv') as csv_file:
    books = [*csv.DictReader(csv_file)]
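
`csv.DictReader` yields one dictionary per row, keyed by the header line, and each of those dictionaries is what gets passed as an object's properties during the import. The sketch below uses an in-memory sample instead of the real file. The column names are the ones the query function in this notebook reads (`title`, `author`, `publishDate`, `description`); the row itself is made up for illustration.

```python
import csv
import io

# Made-up sample standing in for dataset.csv; the real file may contain
# additional columns.
sample_csv = io.StringIO(
    "title,author,publishDate,description\n"
    "Example Title,Jane Doe,2001,A short example description.\n"
)

sample_books = list(csv.DictReader(sample_csv))

# Every value is read as a string, including publishDate.
print(sample_books[0]["title"], type(sample_books[0]["publishDate"]).__name__)
```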

Define the Weaviate connection, which requires an API key for authentication:

In [None]:
load_dotenv()
auth_config = weaviate.auth.AuthApiKey(api_key=os.getenv("APIKEY"))
client = weaviate.WeaviateClient(
    connection_params=ConnectionParams.from_params(
        http_host=os.getenv("WEAVIATE_ENDPOINT"),
        http_port=8080,
        http_secure=False,
        grpc_host=os.getenv("WEAVIATE_ENDPOINT"),
        grpc_port=50051,
        grpc_secure=False,
    ),
    additional_headers={
        "X-Palm-Api-Key": os.getenv("PALM_APIKEY")
    },
    auth_client_secret=auth_config
)
client.connect()

Create (or recreate) the `Book` collection. Weaviate vectorizes every book description using a Vertex AI embedding model:

In [None]:
if client.collections.exists("Book"):
    client.collections.delete("Book")
collection = client.collections.create(
    name="Book",
    vectorizer_config=[
        Configure.NamedVectors.text2vec_palm(
            name="description_vector",
            source_properties=["description"],
            project_id=os.getenv("GOOGLE_CLOUD_PROJECT"),
            model_id="text-embedding-005"
        )
    ],
)

Insert data into the Weaviate collection:

In [None]:
with collection.batch.dynamic() as batch:
    for i, doc in enumerate(books):  # Batch import data
        print(f"importing book: {i+1}")
        batch.add_object(properties=doc)

Define the Weaviate query function. Weaviate converts the text query into an embedding, runs a vector search and displays results.

The function prints each result in the following format, with results separated by a line of dashes:

- Title: Title of the book
- Author: Author of the book
- Publish date: Book publication date
- Description: As stored in your document's description metadata field

In [None]:
def handle_query(query, limit):
    result = (
        collection.query.near_text(
            query=query,
            limit=limit
        )
    )
    for hit in result.objects:
        book = hit.properties
        print("Title: {}, Author: {}, Publish date: {}".format(book["title"], book["author"], book["publishDate"]))
        print(book["description"])
        print("---------")
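
To see the output format without a running cluster, the printing logic can be separated into a pure function. This is a sketch only: `format_book` is a hypothetical helper that is not part of the notebook, and the sample record is invented.

```python
def format_book(book):
    """Format one result's properties the same way handle_query prints them."""
    header = "Title: {}, Author: {}, Publish date: {}".format(
        book["title"], book["author"], book["publishDate"]
    )
    return "\n".join([header, book["description"], "---------"])


# Invented record with the same keys that handle_query reads.
example = {
    "title": "Example Title",
    "author": "Jane Doe",
    "publishDate": "2001",
    "description": "A short example description.",
}
print(format_book(example))
```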

Run the query `drama about people and unhappy love`:

In [None]:
handle_query("drama about people and unhappy love", 2)
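
Conceptually, a vector search ranks stored embeddings by their similarity to the query embedding. The toy sketch below illustrates that idea with cosine similarity over hand-written three-dimensional vectors. It is not how Weaviate is implemented: a real deployment typically relies on an approximate nearest-neighbor index and embeddings with hundreds of dimensions. All names and numbers here are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k):
    """Return the k titles whose vectors are most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda title: cosine_similarity(query_vec, stored[title]),
        reverse=True,
    )
    return ranked[:k]


# Invented toy embeddings.
toy_vectors = {
    "Book A": [1.0, 0.0, 0.0],
    "Book B": [0.0, 1.0, 0.0],
    "Book C": [0.9, 0.1, 0.0],
}
print(top_k([1.0, 0.0, 0.0], toy_vectors, 2))  # prints ['Book A', 'Book C']
```

The `limit` argument of `handle_query` plays the same role as `k` here: it caps how many of the closest matches are returned.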