
Comments on the "dry run" #12

@ffund

Description

Creating the template:

  • Sanitize and/or give guidance on appropriate project names
  • Ask the user to create a repo named for the project, and supply the repo URL
  • Don't ask for lease name at template creation time.
    • Instead, direct people to use the project name as the prefix for the lease name.
  • Ask which site to put data buckets at; provide a sane default if the user doesn't know or care
  • Some people may be GPU-agnostic (don't care what type of GPU) and should be able to use both GPU types
  • In general, give some additional guidance to help people make good decisions

Workflow:

  • After creating the template, we should advise putting the newly generated project in a GitHub repo
  • The user runs the template generator once; everything they will need must go into it at that point
  • General comment: minimize friction by having the user run each step only at the time it is required

chi materials:

  • In notebook 0, remove angle brackets around project name
  • At the end of notebook 0: remove "next steps: launch GPU instance"
  • At the end of notebook 0: add a section that shows how to check on the newly created buckets using the Horizon GUI
  • Notebook 1: instead of a single fixed lease name, let the lease name start with the project name, and write code to get the active lease whose name starts with the project name (see the sketch after this list)
  • Notebook 1: the project name was not substituted when mounting buckets
  • Notebook 1: the instance name should include the project name instead of mltrain
  • There should be a Notebook 1 equivalent for no GPU, and also for a VM instance (with an NVIDIA GPU and with no GPU)
  • Need to be able to mount object store buckets even if they are at a different site than the compute instance
  • Might need to adjust mount settings for the data bucket (e.g., caching) so that loading data is not very slow (see the mount sketch after this list)
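
For the lease-name item above, a minimal sketch of how Notebook 1 might find the active lease whose name starts with the project name. It assumes the python-chi client factories (`chi.blazar()`) and a `PROJECT_NAME` variable set earlier in the notebook; treat it as a starting point, not the final implementation.

```python
import chi

chi.use_site("CHI@UC")        # whichever site the lease was created at
PROJECT_NAME = "my-project"   # placeholder; set earlier in the notebook

# List leases visible to this project and keep the active one(s)
# whose name starts with the project name.
blazar = chi.blazar()
matching = [
    lease for lease in blazar.lease.list()
    if lease["name"].startswith(PROJECT_NAME) and lease["status"] == "ACTIVE"
]
if not matching:
    raise RuntimeError(f"No active lease found with prefix {PROJECT_NAME!r}")
lease = matching[0]
print("Using lease:", lease["name"], lease["id"])
```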
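
For the cross-site mounting and mount-settings items, a sketch assuming rclone is used to mount the bucket, with a remote already configured in rclone.conf. The remote name `chi_tacc`, bucket name, mount point, and cache sizes are placeholders; the VFS cache options are one way to make data loading less slow, not a tuned recommendation.

```python
import subprocess

# Mount a data bucket that lives at a different site than the compute
# instance; the rclone remote ("chi_tacc" here) determines which site's
# object store is used, independent of where this instance runs.
subprocess.run(
    [
        "rclone", "mount",
        "chi_tacc:object-persist-my-project", "/mnt/data",
        "--read-only",                  # data bucket is read-only on the instance
        "--allow-other",                # let containers and other users read it
        "--vfs-cache-mode", "full",     # cache file contents locally on first read
        "--vfs-cache-max-size", "20G",  # bound the size of the local cache
        "--dir-cache-time", "1h",       # avoid re-listing the bucket constantly
        "--daemon",                     # keep the mount alive in the background
    ],
    check=True,
)
```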

Post-chi materials:

  • Need a new notebook after Notebook 1, for setting up environment variables, building container images, and starting containers
    • If I don't have or need a Hugging Face (HF) token, it should do something sane
    • Specify HF_HOME where the object store bucket for datasets is mounted, but also specify HF_TOKEN_PATH in an ephemeral location (see the environment sketch after this list)
  • The whole Docker workflow assumes NVIDIA hardware and PyTorch
  • Don't automatically assume Lightning
  • When installing the MLflow Python client in the Dockerfile, match the MLflow server version
  • Only log in to GitHub when there is something to push
  • mlflow.set_experiment: use the project name (see the MLflow sketch after this list)
  • Git context
  • In examples, don't put everything in one cell
  • There should be notebook and src examples for each
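
For the HF_HOME / HF_TOKEN_PATH item, a minimal sketch of how the new notebook might set these before building and starting containers. The paths are placeholders; the point is that the dataset/model cache lives on the mounted object store bucket while the token stays in an ephemeral location, and that a missing token is handled gracefully.

```python
import os
from pathlib import Path

# Placeholder paths: /mnt/data is the mounted object store bucket,
# /tmp is ephemeral local storage on the instance.
os.environ["HF_HOME"] = "/mnt/data/hf_home"    # HF cache lives on the bucket
os.environ["HF_TOKEN_PATH"] = "/tmp/hf_token"  # token never lands in the bucket

hf_token = os.environ.get("HF_TOKEN", "")
if hf_token:
    token_path = Path(os.environ["HF_TOKEN_PATH"])
    token_path.write_text(hf_token)
    token_path.chmod(0o600)
else:
    # No token available (or needed): public models and datasets still work,
    # so continue with anonymous access instead of failing.
    print("No HF token set; continuing with anonymous access.")
```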
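
For the MLflow items, a sketch of using the project name as the experiment name, plus one way to check that the client version matches the server before pinning it in the Dockerfile. The tracking URI and project name are placeholders, and the version check assumes a standard `mlflow server` deployment that reports its version at `/version`.

```python
import mlflow
import requests

PROJECT_NAME = "my-project"                      # placeholder
TRACKING_URI = "http://mlflow.example.org:8000"  # placeholder

mlflow.set_tracking_uri(TRACKING_URI)

# Assumption: the MLflow tracking server exposes its version at /version.
server_version = requests.get(f"{TRACKING_URI}/version", timeout=10).text.strip()
if server_version != mlflow.__version__:
    print(f"Client {mlflow.__version__} != server {server_version}: "
          f"pin 'mlflow=={server_version}' in the Dockerfile.")

# Use the project name as the experiment name, per the note above.
mlflow.set_experiment(PROJECT_NAME)
with mlflow.start_run():
    mlflow.log_param("example_param", 1)
```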
