Google Cloud Data Extraction Component Setup
Author: Abdelkader Alkadour
Date: 20.06.2023
This documentation explains how to set up and configure the Google Cloud environment to run the Data Extraction Component with the commands provided below. Follow the steps in order to ensure a successful setup.
Before you begin, make sure you have the following:
- Google Cloud account
- Project ID
- Billing account connected to the project
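The steps below also run `gcloud` in a local terminal, so the Google Cloud CLI must be installed on your machine. The following is a minimal sketch of a pre-flight check; the helper name `missing_tools` is hypothetical and not part of the project:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): print every command name
# passed as an argument that cannot be found on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}
```

For example, `missing_tools gcloud` prints `gcloud` if the Google Cloud CLI is not installed, and prints nothing otherwise.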
- Open the Google Cloud Console and sign in to your Google Cloud account.
- Initialize the Google Cloud CLI by running the following command in a terminal:
gcloud init
This command signs you in and sets your project as the default for subsequent gcloud commands.
- Connect the project to a billing account. In the Google Cloud Console, navigate to "Billing" under "IAM & Admin" and follow the instructions to link a billing account to your project.
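The billing link can also be made from the terminal. The sketch below wraps the `gcloud billing projects link` command (available in recent gcloud CLI versions) in a hypothetical helper, `link_billing`, that refuses to run without both IDs; the project ID and billing account ID are values you supply:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): link a project to a
# billing account without using the Cloud Console.
# To find your billing account ID, run: gcloud billing accounts list
link_billing() {
  local project_id="$1" billing_account_id="$2"
  if [ -z "$project_id" ] || [ -z "$billing_account_id" ]; then
    echo "usage: link_billing PROJECT_ID BILLING_ACCOUNT_ID" >&2
    return 1
  fi
  gcloud billing projects link "$project_id" --billing-account="$billing_account_id"
}
```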
- Enable the Compute Engine API by running the following command:
gcloud services enable compute.googleapis.com
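To confirm that the API is enabled before creating the VM, you can search the list of enabled services. A minimal sketch, assuming the default table output of `gcloud services list --enabled` (one service name per row); `api_enabled` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): succeed if the given
# service name appears as a whole word in the list of enabled services.
api_enabled() {
  gcloud services list --enabled | grep -qwF "$1"
}
```

For example, `api_enabled compute.googleapis.com && echo enabled` prints `enabled` once the command above has succeeded.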
- Create the VM instance. In the following command, replace <ProjectID> with the ID of your project and <Region> with the zone where you want to deploy the Data Extraction Component. Note that --zone expects a zone (for example europe-west3-a), not a region (for example europe-west3):
gcloud compute instances create data-retriver \
  --zone=<Region> \
  --machine-type=e2-highmem-2 \
  --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default \
  --maintenance-policy=MIGRATE \
  --provisioning-model=STANDARD \
  --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
  --create-disk=auto-delete=yes,boot=yes,device-name=instance-1,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20230616,mode=rw,size=20,type=projects/<ProjectID>/zones/<Region>/diskTypes/pd-balanced \
  --no-shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring \
  --labels=goog-ec-src=vm_add-gcloud \
  --reservation-affinity=any
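A zone name is its region name followed by a letter suffix, so it is easy to paste a region where a zone is expected. The sketch below shows one way to sanity-check the value before running the command; the helper names and the pattern are assumptions that cover the common naming scheme, not an official validation rule:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): succeed if the argument
# looks like a zone, i.e. a region name followed by "-<letter>"
# (for example europe-west3-a or us-central1-b).
is_zone() {
  [[ "$1" =~ ^[a-z]+-[a-z]+[0-9]+-[a-z]$ ]]
}

# Hypothetical helper: strip the trailing "-<letter>" to get the region.
region_of_zone() {
  echo "${1%-*}"
}
```

For example, `is_zone europe-west3` fails, while `region_of_zone europe-west3-a` prints `europe-west3`.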
- SSH into the instance by running the following command:
gcloud compute ssh --zone <Region> "root@data-retriver"
Replace <Region> with the zone of your instance. If this command does not work, open the VM instances page in the Google Cloud Console, click the dropdown menu next to SSH for your instance, and select "View gcloud command". You can run the command it shows in your local shell.
Once you are logged in to the machine, switch to the root user with the following command:
sudo su -
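- Update the OS and install Miniconda (a minimal installer for conda and Python). The final exit leaves the shell, because the changes made by conda init only take effect in a new shell session:
sudo apt update
sudo apt upgrade -y
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/ubuntu/miniconda3
/home/ubuntu/miniconda3/bin/conda init
exit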
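In batch mode (`-b`) the Miniconda installer typically refuses to install into a prefix that already exists unless it is told to update with `-u`, so repeating this step can fail. A minimal sketch that makes the install step safe to repeat, assuming the paths used above; `install_miniconda_if_missing` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): run the Miniconda
# installer in batch mode only if no conda binary exists under the prefix.
install_miniconda_if_missing() {
  local prefix="$1" installer="$2"
  if [ -x "$prefix/bin/conda" ]; then
    echo "Miniconda already installed in $prefix"
    return 0
  fi
  bash "$installer" -b -p "$prefix"
}
```

For example, as root: `install_miniconda_if_missing /home/ubuntu/miniconda3 Miniconda3-latest-Linux-x86_64.sh`.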
- Reconnect to the VM, clone the project, and set up the tokens file:
Reconnect:
gcloud compute ssh --zone <Region> "root@data-retriver"
Clone the project and install its dependencies:
git clone https://github.com/amosproj/amos2023ss03-qachat.git
cd amos2023ss03-qachat/
export PYTHONPATH="$PYTHONPATH:$(pwd)"
cd QAChat/Data_Processing
pip install -r requirements.txt
conda install -c conda-forge poppler
sudo apt-get install tesseract-ocr tesseract-ocr-deu
pip install pdf2image
python -m spacy download de_core_news_sm
python -m spacy download xx_ent_wiki_sm
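To confirm that the OCR dependencies were installed, you can check which languages Tesseract reports. A minimal sketch, assuming `tesseract --list-langs` prints one language code per line after a header line, as current Tesseract versions do; `tesseract_has_lang` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): succeed if Tesseract lists
# the given language code (e.g. "deu", provided by tesseract-ocr-deu).
tesseract_has_lang() {
  tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

For example, `tesseract_has_lang deu && echo ok` should print `ok` after `tesseract-ocr-deu` is installed. For the spaCy models, `python -m spacy validate` lists the installed pipelines.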
Add the API credentials by editing the tokens file and the credentials file:
nano /root/amos2023ss03-qachat/tokens.env
nano /root/amos2023ss03-qachat/QAChat/Data_Processing/credentials_file.json
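The startup script exports each line of `tokens.env` as an environment variable, so the file should contain one KEY=value pair per line; the actual variable names depend on the project and are not listed here. A minimal sketch of a format check; `check_env_file` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): print, with line numbers,
# every line of an env file that is neither blank nor of the form KEY=value.
# Returns 0 if the file looks valid, 1 if bad lines were found,
# and 2 if the file cannot be read.
check_env_file() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -n: line numbers, -v: print non-matching lines, -E: extended regex
  if grep -nvE '^[[:space:]]*$|^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' "$file"; then
    return 1
  fi
  return 0
}
```

Run `check_env_file /root/amos2023ss03-qachat/tokens.env` after saving the file. For the JSON file, `python -m json.tool /root/amos2023ss03-qachat/QAChat/Data_Processing/credentials_file.json > /dev/null` exits with an error if the file is not valid JSON.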
- Add a startup script. This script runs the data extraction when the VM boots and shuts the VM down once the task has finished.
Click the name of the VM, as shown in the picture:
Then click Edit, as shown in the picture:
Scroll down to Metadata and add the following startup script under the key startup-script, as shown in the picture:
#!/bin/bash
export PYTHONPATH=$PYTHONPATH:/root/amos2023ss03-qachat/

# Export each KEY=value line of tokens.env as an environment variable
env_file="/root/amos2023ss03-qachat/tokens.env"
while IFS= read -r line || [ -n "$line" ]; do
    line="${line// /}"   # Replace space with nothing
    line="${line//\"/}"  # Replace double quotes with nothing
    [ -z "$line" ] && continue  # Skip empty lines
    export "$line"
done < "$env_file"

export CREDENTIALS_JSON_FILE=/root/amos2023ss03-qachat/QAChat/Data_Processing/credentials_file.json

# Run the data extraction, then shut the VM down (also if main.py fails)
/home/ubuntu/miniconda3/bin/python /root/amos2023ss03-qachat/QAChat/Data_Processing/main.py
sudo shutdown -h now
Finally, click Save.
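The loop in the startup script turns each line of `tokens.env` into an environment variable after deleting all spaces and double quotes. The sketch below wraps that same cleanup in a hypothetical helper, `normalize_env_line`, so you can check locally what a given line becomes; note that spaces inside a value are removed too, so values must not depend on them:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): apply the startup script's
# cleanup to one line of tokens.env and print the resulting KEY=value string.
normalize_env_line() {
  local line="$1"
  line="${line// /}"   # Replace space with nothing
  line="${line//\"/}"  # Replace double quotes with nothing
  echo "$line"
}
```

For example, `normalize_env_line 'API_KEY = "abc def"'` prints `API_KEY=abcdef`. If you prefer the terminal over the Console, the script can also be attached with `gcloud compute instances add-metadata data-retriver --zone=<Region> --metadata-from-file=startup-script=startup.sh`, assuming you saved it locally as `startup.sh`.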