328 changes: 170 additions & 158 deletions DEVELOPER_GUIDE.md
@@ -2,7 +2,163 @@

This document is intended for developers who want to install, test or contribute to the code.

## Set up development environment

### Linux

Install [Rust](https://www.rust-lang.org/tools/install):

```bash
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
```

Install [pyenv](https://github.com/pyenv/pyenv):

```bash
$ curl https://pyenv.run | bash
```

Install Python 3.9.18:

```bash
$ pyenv install 3.9.18
```

Check that the expected local version of Python is used:

```bash
$ cd services/worker
$ python --version
Python 3.9.18
```
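
The check above can be scripted so that a mismatch fails early instead of surfacing later during `make install`. This is only a minimal sketch: the helper name `check_python_version` is hypothetical and not part of the repository, and it assumes `python --version` prints `Python X.Y.Z`.

```bash
# Hypothetical helper (not part of the repository).
# Compares the interpreter version with the expected one.
# $1: expected version, e.g. 3.9.18
# $2: optional "Python X.Y.Z" string; defaults to the output of `python --version`
check_python_version() {
  local expected="$1"
  local actual="${2:-$(python --version 2>&1)}"
  if [ "$actual" = "Python $expected" ]; then
    echo "ok: $actual"
  else
    echo "mismatch: expected Python $expected, got $actual" >&2
    return 1
  fi
}
```

For example, `cd services/worker && check_python_version 3.9.18` returns a non-zero status when pyenv did not select the expected version.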

Install Poetry with [pipx](https://pipx.pypa.io/stable/installation/):

- Either a single version:
```bash
pipx install poetry==1.8.2
poetry --version
```
- Or a parallel version (with a unique suffix):
```bash
pipx install poetry==1.8.2 --suffix=@1.8.2
poetry@1.8.2 --version
```

Set the Python version to use with Poetry:

```bash
poetry env use 3.9.18
```
or
```bash
poetry@1.8.2 env use 3.9.18
```
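
Because the executable is named either `poetry` or `poetry@1.8.2` depending on how it was installed, a script can pick whichever one exists. The sketch below is an illustration only; `first_available` is a hypothetical helper, not something the repository provides.

```bash
# Hypothetical helper (not part of the repository).
# Prints the first command name, among its arguments, that is found on PATH.
first_available() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  # None of the candidates is installed.
  return 1
}
```

A possible use, preferring the suffixed install: `POETRY="$(first_available poetry@1.8.2 poetry)" && "$POETRY" env use 3.9.18`.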

Install the dependencies:

```bash
make install
```

### Mac OS

To install the [worker](./services/worker) on Mac OS, follow these steps.

#### First: as an administrator

Install brew:

```bash
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

#### Then: as a normal user

Install pyenv:

```bash
$ curl https://pyenv.run | bash
```

Append the following lines to `~/.zshrc`:

```bash
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
```

Log out and log in again.
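
If the setup is run more than once, appending blindly duplicates the `~/.zshrc` lines. One way to avoid that, sketched here under the assumption that each line should appear at most once, is a small helper; the name `append_once` is hypothetical.

```bash
# Hypothetical helper (not part of the repository).
# Appends a line to a file only if the exact line is not already present.
# $1: line to add, $2: target file
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `append_once 'export PYENV_ROOT="$HOME/.pyenv"' ~/.zshrc` can be repeated safely; the single quotes keep `$HOME` unexpanded so the line is written literally.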

Install Python 3.9.18:

```bash
$ pyenv install 3.9.18
```

Check that the expected local version of Python is used:

```bash
$ cd services/worker
$ python --version
Python 3.9.18
```

Install Poetry with [pipx](https://pipx.pypa.io/stable/installation/):

- Either a single version:
```bash
pipx install poetry==1.8.2
poetry --version
```
- Or a parallel version (with a unique suffix):
```bash
pipx install poetry==1.8.2 --suffix=@1.8.2
poetry@1.8.2 --version
```

Append the following line to `~/.zshrc`, so that the executables installed by pipx are on the `PATH`:

```bash
export PATH="$HOME/.local/bin:$PATH"
```

Install Rust:

```bash
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
```

Set the Python version to use with Poetry:

```bash
poetry env use 3.9.18
```
or
```bash
poetry@1.8.2 env use 3.9.18
```

Avoid an issue with Apache Beam (https://github.com/python-poetry/poetry/issues/4888#issuecomment-1208408509):

```bash
poetry config experimental.new-installer false
```
or
```bash
poetry@1.8.2 config experimental.new-installer false
```

Install the dependencies:

```bash
make install
```

## Install dataset-viewer

To start working on the project:

```bash
git clone git@github.com:huggingface/dataset-viewer.git
cd dataset-viewer
```

Install all the packages:

```bash
make install
```

Install Docker (see https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository and https://docs.docker.com/engine/install/linux-postinstall/).

Run the project locally:

```bash
make start
```

When the Docker containers have started, open http://localhost:8100/healthcheck in a browser: it should show `ok`.
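
The containers can take a while to come up, so a script may prefer to poll the healthcheck rather than check it once. The following is a sketch, not the project's tooling: `wait_until_ok` is a hypothetical helper, and it assumes the response body is exactly `ok`.

```bash
# Hypothetical helper (not part of the repository).
# Runs a command repeatedly until it prints exactly "ok".
# Usage: wait_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_ok() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Command substitution strips the trailing newline of the response.
    if [ "$("$@" 2>/dev/null)" = "ok" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no \"ok\" after $attempts attempts" >&2
  return 1
}
```

For example, `wait_until_ok 30 2 curl -fsS http://localhost:8100/healthcheck` waits up to about a minute after `make start`.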

Run the project in development mode:

```bash
make dev-start
```

In development mode, you don't need to rebuild the docker images to apply a change in a worker.
You can just restart the worker's docker container and it will apply your changes.

To install a single job (in [jobs](./jobs)), library (in [libs](./libs)) or service (in [services](./services)), go to its directory and install Python 3.9 (consider [pyenv](https://github.com/pyenv/pyenv)) and [Poetry](https://python-poetry.org/docs/main/#installation) (don't forget to add `poetry` to the `PATH` environment variable).

If you use pyenv:

@@ -101,8 +265,8 @@ The following environments contain all the modules: reverse proxy, API server, a

| Environment | URL | Type | How to deploy |
| ----------- | ---------------------------------------------------- | ----------------- | --------------------------------------- |
| Production  | https://datasets-server.huggingface.co               | Helm / Kubernetes | Argo CD                                 |
| Development | https://datasets-server.us.dev.moon.huggingface.tech | Helm / Kubernetes | Argo CD                                 |
| Local build | http://localhost:8100 | Docker compose | `make start` (builds docker images) |

## Jobs queue
@@ -143,11 +307,9 @@

To launch the end to end tests:

```bash
make e2e
```

## Versions

We don't use the package versions (in the `pyproject.toml` files), so there is no need to update them.

## Pull requests

@@ -170,153 +332,3 @@

```
DOCKERHUB_USERNAME=xxx
DOCKERHUB_PASSWORD=xxx
GITHUB_TOKEN=xxx
```
