
Commit 7ea7654

Add more actions to datasets
Signed-off-by: Ching Yi, Chan <qrtt1@infuseai.io>
1 parent 363b35c commit 7ea7654

File tree: 14 files changed (+1209, -41 lines)

docs/CLI/datasets.md

Lines changed: 341 additions & 11 deletions
@@ -8,8 +8,12 @@ Usage:
 Get a dataset or list datasets
 
 Available Commands:
+  create         Create a dataset
+  delete         Delete a dataset by id
   get            Get a dataset by name
   list           List datasets
+  update         Update the dataset
+  upload_secret  Regenerate the secret of the upload server
 
 Options:
   -h, --help     Show the help
@@ -24,6 +28,36 @@ Global Options:
 ```
 
 
+### create
+
+Create a dataset
+
+
+```
+primehub datasets create
+```
+
+
+* *(optional)* file
+
+
+
+
+### delete
+
+Delete a dataset by id
+
+
+```
+primehub datasets delete <id>
+```
+
+* id: The dataset id
+
+
+
+
+
 ### get
 
 Get a dataset by name
@@ -51,10 +85,42 @@ primehub datasets list
 
 
 
+
+### update
+
+Update the dataset
+
+
+```
+primehub datasets update <name>
+```
+
+* name
+
+
+
+
+
+### upload_secret
+
+Regenerate the secret of the upload server
+
+
+```
+primehub datasets upload_secret <id>
+```
+
+* id: The dataset id or name
+
+
+
+
 
 
 ## Examples
 
+### Query datasets
+
 The `datasets` command is a group-specific resource. It only works after the `group` is assigned.
 
 Using `list` to find all datasets in your group:
@@ -64,22 +130,286 @@ $ primehub datasets list
 ```
 
 ```
-id      name    displayName    description    type
-------  ------  -------------  -------------  ------
-kaggle  kaggle  kaggle                        pv
+id           name         displayName                 description                      type
+-----------  -----------  --------------------------  -------------------------------  ------
+pv-dataset   pv-dataset   the dataset created by SDK  It is a PV dataset               pv
+env-dataset  env-dataset  env-dataset                 make changes to the description  env
 ```
 
 If you already know the name of a dataset, use `get` to fetch a single entry:
 
 ```
-$ primehub datasets get kaggle
+$ primehub datasets get dataset
 ```
 
 ```
-primehub datasets get kaggle
-id: kaggle
-name: kaggle
-displayName: kaggle
-description:
-type: pv
-```
+id: pv-dataset
+name: pv-dataset
+displayName: the dataset created by SDK
+description: It is a PV dataset
+type: pv
+pvProvisioning: auto
+volumeSize: 1
+enableUploadServer: True
+uploadServerLink: http://primehub-python-sdk.primehub.io/dataset/hub/pv-dataset/browse
+global: False
+groups: [{'id': 'a962305b-c884-4413-9358-ef56373b287c', 'name': 'foobarbar', 'displayName': '', 'writable': False}, {'id': 'a7a283b5-c0e2-4b79-a78c-39c630324762', 'name': 'phusers', 'displayName': 'primehub users', 'writable': False}]
+```
+
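Since the example output shows `enableUploadServer: True`, the upload server's secret can be regenerated with the `upload_secret` action from the command list above; a minimal sketch, reusing the dataset id from this output:

```
$ primehub datasets upload_secret pv-dataset
```
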
+### Admin actions for datasets
+
+These actions can only be used by administrators:
+
+* create
+* update
+* delete
+
+`create` and `update` require a dataset configuration; please see the examples below.
+
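In contrast, `delete` needs no configuration file, only the dataset id; a minimal sketch, assuming the `env-dataset` shown in the listing above should be removed:

```
primehub datasets delete env-dataset
```
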
+### Fields for creating or updating
+
+| field | required | type | description |
+| --- | --- | --- | --- |
+| name | required | string | it should be a valid resource name for Kubernetes |
+| displayName | optional | string | display name for this dataset |
+| description | optional | string | |
+| global | optional | boolean | when a dataset is global, it can be seen by every group |
+| type | required | string | one of ['pv', 'nfs', 'hostPath', 'git', 'env'] |
+| url | conditional | string | **MUST** use with `git` type |
+| pvProvisioning | conditional | string | one of ['auto', 'manual'], **MUST** use with `pv` type. This field is only used in the `CREATE` action |
+| nfsServer | conditional | string | **MUST** use with `nfs` type |
+| nfsPath | conditional | string | **MUST** use with `nfs` type |
+| hostPath | conditional | string | **MUST** use with `hostPath` type |
+| variables | optional | dict | **MAY** use with `env` type. It is a set of key-value pairs. All values have to be string values. For example: `{"key1":"value1","key2":"value2"}`. |
+| groups | optional | list of connected groups (dict) | please see the `connect` examples |
+| secret | optional | dict | **MAY** use with `git` type; binds a `secret` to the `git` dataset |
+| volumeSize | conditional | integer | **MUST** use with `pv` type. The unit is `GB`. |
+| enableUploadServer | optional | boolean | it only works with the writable types ['pv', 'nfs', 'hostPath'] |
+
+> There is a simple rule for the `UPDATE` payload: none of the required fields should be included.
+
+For example, here is a configuration for creating an env dataset:
+
+```bash
+primehub datasets create <<EOF
+{
+  "name": "env-dataset",
+  "description": "",
+  "type": "env",
+  "variables": {
+    "ENV": "prod",
+    "LUCKY_NUMBER": "7"
+  }
+}
+EOF
+```
+
+After removing the required `name` and `type` fields, it can be used for updating:
+
+```bash
+primehub datasets update env-dataset <<EOF
+{
+  "description": "make changes to the description",
+  "variables": {
+    "ENV": "prod",
+    "LUCKY_NUMBER": "8"
+  }
+}
+EOF
+```
+
+For updating, provide only the fields that you want to change:
+
+```bash
+primehub datasets update env-dataset <<EOF
+{
+  "groups": {
+    "connect": [
+      {
+        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
+        "writable": false
+      }
+    ]
+  }
+}
+EOF
+```
+
+
+
+
+
+### PV type
+
+```json
+{
+  "name": "pv-dataset",
+  "displayName": "the dataset created by SDK",
+  "description": "It is a PV dataset",
+  "type": "pv",
+  "global": false,
+  "groups": {
+    "connect": [
+      {
+        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
+        "writable": true
+      },
+      {
+        "id": "a962305b-c884-4413-9358-ef56373b287c",
+        "writable": false
+      }
+    ]
+  },
+  "pvProvisioning": "auto",
+  "volumeSize": 1
+}
+```
+
+Save the configuration to `create-dataset.json` and run `create`:
+
+```
+primehub datasets create --file create-dataset.json
+```
+
+The example creates a PV dataset. According to the type `pv`, these fields become `required`:
+* pvProvisioning: how the PV is created. `auto` means the PV is created automatically; `manual` means the system administrator has to create it.
+* volumeSize: the capacity in GB when the PV is created with `auto`.
+
+The `groups.connect` will bind two groups to the dataset. One is a writable group and the other is a read-only group.
+
+
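Because `pv` is one of the writable types, `enableUploadServer` could also be turned on after creation with an `update`; a minimal sketch, assuming the `pv-dataset` above was created successfully and following the heredoc style of the earlier update examples:

```bash
primehub datasets update pv-dataset <<EOF
{
  "enableUploadServer": true
}
EOF
```
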
+### NFS type
+
+```json
+{
+  "name": "nfs-dataset",
+  "type": "nfs",
+  "groups": {
+    "connect": [
+      {
+        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
+        "writable": true
+      }
+    ]
+  },
+  "nfsServer": "1.2.3.4",
+  "nfsPath": "/data"
+}
+```
+
+Save the configuration to `create-dataset.json` and run `create`:
+
+```
+primehub datasets create --file create-dataset.json
+```
+
+The example creates an NFS dataset. According to the type `nfs`, these fields become `required`:
+* nfsServer: the address of the NFS server
+* nfsPath: the mount path of the NFS server
+
+### HostPath type
+
+```json
+{
+  "name": "host-path-dataset",
+  "description": "",
+  "type": "hostPath",
+  "groups": {
+    "connect": [
+      {
+        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
+        "writable": true
+      }
+    ]
+  },
+  "hostPath": "/opt/data"
+}
+```
+
+Save the configuration to `create-dataset.json` and run `create`:
+
+```
+primehub datasets create --file create-dataset.json
+```
+
+The example creates a hostPath dataset. According to the type `hostPath`, the `hostPath` field becomes `required`. You should put an absolute path that is available on the node.
+
+### Git type
+
+```json
+{
+  "name": "git-dataset",
+  "type": "git",
+  "url": "https://github.com/datasets/covid-19"
+}
+```
+
+or with a `secret`:
+
+```json
+{
+  "name": "git-dataset",
+  "type": "git",
+  "url": "https://github.com/datasets/covid-19",
+  "secret": {
+    "connect": {
+      "id": "gitsync-secret-public-key-for-git-repo"
+    }
+  }
+}
+```
+
+Save the configuration to `create-dataset.json` and run `create`:
+
+```
+primehub datasets create --file create-dataset.json
+```
+
+The example creates a git dataset. According to the type `git`, the `url` field becomes `required`. You should put a git repository URL.
+
+If the URL needs a credential, you could use `secret` to connect a pre-set secret (an SSH public key).
+
+### ENV type
+
+```json
+{
+  "name": "env-dataset",
+  "description": "",
+  "type": "env",
+  "variables": {
+    "ENV": "prod",
+    "LUCKY_NUMBER": "7"
+  }
+}
+```
+
+Save the configuration to `create-dataset.json` and run `create`:
+
+```
+primehub datasets create --file create-dataset.json
+```
+
+The example creates an ENV dataset. According to the type `env`, the `variables` field becomes `required`. You could put many key-value pairs. Be careful: both keys and values have to be string values.
+
+### Group connect/disconnect
+
+All dataset types can connect to or disconnect from groups, but there is a subtle difference between `CREATE` and `UPDATE`:
+
+```json
+{
+  "connect": [
+    {
+      "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
+      "writable": true
+    }
+  ],
+  "disconnect": [
+    {
+      "id": "a7a283b5-c0e2-4b79-a78c-39c630324762"
+    }
+  ]
+}
+```
+
+* `disconnect` is only available for `UPDATE`
+* `connect` is available for both `CREATE` and `UPDATE`
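
To round this out, a minimal sketch of applying such a payload through `update`, assuming the `pv-dataset` from the earlier examples and nesting the lists under `groups` as in the previous update example (group ids are reused from above):

```bash
primehub datasets update pv-dataset <<EOF
{
  "groups": {
    "connect": [
      {
        "id": "a962305b-c884-4413-9358-ef56373b287c",
        "writable": true
      }
    ],
    "disconnect": [
      {
        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762"
      }
    ]
  }
}
EOF
```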
