Intro to Databases in Industry: Data Cleaning, Querying, and Modeling at Scale
Speakers:
- Rodolfo Lourenzutti, University of British Columbia
- Arman Seyed-Ahmadi, University of British Columbia
- Diego Ardila, Shopify
You can install PostgreSQL on your own machine and load the database dump files provided in the databases/ folder to recreate the workshop databases locally for further practice. The instructions to do so are provided here.
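For orientation, here is a minimal sketch of what loading the dumps can look like from a terminal. It is not the workshop's official procedure, and it rests on two assumptions not stated in this repository: each dump is a plain-SQL file (*.sql), and each database should be named after its dump file (databases/foo.sql becomes foo). The helper names run and restore_dumps are hypothetical. createdb and psql are the standard PostgreSQL client tools, and they honour the usual libpq variables such as PGUSER and PGHOST. If the dumps turn out to be in PostgreSQL's custom format, use pg_restore instead of psql -f.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the workshop repo.
# Assumes every dump is a plain-SQL *.sql file and that each database
# should be named after its file (e.g. databases/foo.sql -> foo).
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (useful to preview first).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restore_dumps() {
  local dump_dir="${1:-databases}"
  local dump db
  for dump in "$dump_dir"/*.sql; do
    [ -e "$dump" ] || continue                        # no *.sql files: skip the literal glob
    db="$(basename "$dump" .sql)"                     # database name = file name without .sql
    run createdb "$db"                                # create an empty database
    run psql -v ON_ERROR_STOP=1 -d "$db" -f "$dump"   # load the dump, stopping at the first error
  done
}
```

To preview the commands, run `DRY_RUN=1 restore_dumps databases`. To actually create and load the databases, run `restore_dumps databases`. Stopping at the first error keeps a half-loaded database from going unnoticed.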
The Jupyter notebooks in this repository use a few packages to run SQL commands from within the notebooks' Python environment, and all of them are listed in environment.yml. To reproduce this environment and make it available in Jupyter Lab, first install the nb_conda_kernels package in your base environment (or whichever environment Jupyter Lab is installed in) by running the following command in your terminal:
conda install nb_conda_kernels
Then run the following command to recreate the environment from environment.yml:
conda env create -f environment.yml
A new environment called ssc2022 should appear in the list of kernels when you launch Jupyter Lab on your computer.
© 2022 Arman Seyed-Ahmadi, Rodolfo Lourenzutti, Diego Ardila
Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.