In order to setup local development, you must have docker installed and if you want to run it locally you must have python 3.11.4 or greater installed.
make a folder called /data as well as inside that /bulk and inside that a folder for any usernames you wish it to work with
- data
- bulk - username1 - username2
if you want to run locally you must install requirements.txt for python3
to run locally run /deployment/bin/entrypoint_staging_service.sh
to run inside docker run /run_in_docker.sh
- to test use
./run_tests.sh
- requires python 3.11.4 or higher
- requires installation on mac of libmagic:
brew install libmagic
orsudo port install libmagic
. - testing requires a KBase auth token (see AUTH_URL configuration in
deployment/conf/testing.cfg
for relevant KBase environment) and KBase user id. These must be in environment variablesKBASE_TEST_TOKEN
andKBASE_TEST_USER
, respectively.
Included configurations for the Visual Studio Code debugger for python that mirror what is in the entrypoint.sh and testing configuration to run locally in the debugger, set breakpoints and if you open the project in VSCode the debugger should be good to go. The provided configurations can run locally and run tests locally
When releasing a new version:
- Update the release notes
- Update the version in staging_service/app.py.VERSION
to run locally you will need all of these utils on your system: tar, unzip, zip, gzip, bzip2, md5sum, head, tail, wc
in the docker container all of these should be available
The types returned by the importer_mappings
endpoint (see Get Importer Mappings below)
are generated by the GenerateMappings.py
script here: staging_service/autodetect/GenerateMappings.py
.
Running this script will build the supported_apps_w_extensions.json
file that should be placed in the
deployment/conf
directory.
To add new mappings, update the GenerateMappings.py
script. New file types and object types should be added
to the staging_service.autodetect.Mappings
module and included from there. See GenerateMappings.py
docstrings for more details.
The KBase Data Transfer Service (DTS,
See here for further documentation) copies data files from external sources into a KBase user's staging area, accompanied by a manifest.json
file.
However, due to various permissions reasons, it cannot copy those files directly. First,
it drops them off in a separate directory to which it has access. The Staging Service DTS
File Watcher then moves those files to the user's directory.
The DTS File Watcher is a separate entrypoint that uses much of the same machinery as the
rest of the Staging Service. The script can be found in scripts/run_dts_watcher.py
, and
the entrypoint can be found in deployment/bin/entrypoint_dts_watcher.sh
. It
does not provide a web service, but just watches a directory given in the config as
DTS_STAGING_DIR
for changes. If it sees a manifest.json
file in a subdirectory, it
parses that to get a KBase username and moves the whole subdirectory to that user's
staging area.
all paths should be specified treating the user's home directory as root
URL : ci.kbase.us/services/staging_service/test-service
local URL : localhost:3000/test-service
Method : GET
Code : 200 OK
Content example
This is just a test. This is only a test.
URL : ci.kbase.us/services/staging_service/test-auth
local URL : localhost:3000/test-auth
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
I'm authenticated as <username>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
defaults to not show hidden dotfiles
URL : ci.kbase.us/services/staging_service/list/{path to directory}
**URL
** : ci.kbase.us/services/staging_service/list/{path to directory}?showHidden={True/False}
local URL : localhost:3000/list/{path to directory}
**local URL
** : localhost:3000/list/{path to directory}?showHidden={True/False}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
[
{
"name": "testFolder",
"path": "nixonpjoshua/testFolder",
"mtime": 1510949575000,
"size": 96,
"isFolder": true
},
{
"name": "testfile",
"path": "nixonpjoshua/testfile",
"mtime": 1510949629000,
"size": 335,
"isFolder": false
}
]
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 404 Not Found
Content :
path <username>/<incorrect path> does not exist
URL : ci.kbase.us/services/staging_service/download/{path to file}
URL : ci.kbase.us/services/staging_service/download/{path to file}
local URL : localhost:3000/download/{path to file}
local URL : localhost:3000/download/{path to file}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content : <file content>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content :
<username>/<incorrect path> is a directory not a file
Code : 404 Not Found
Content :
path <username>/<incorrect path> does not exist
defaults to not show hidden dotfiles
URL : ci.kbase.us/services/staging_service/search/{search query}
**URL
** : ci.kbase.us/services/staging_service/search/{search query}?showHidden={True/False}
local URL : localhost:3000/search/{search query}
local URL : localhost:3000/search/?showHidden={True/False}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
[
{
"name": "testfile",
"path": "nixonpjoshua/testfile",
"mtime": 1510949629000,
"size": 335,
"isFolder": false
},
{
"name": "testFolder",
"path": "nixonpjoshua/testFolder",
"mtime": 1510949575000,
"size": 96,
"isFolder": true
},
{
"name": "testinnerFile",
"path": "nixonpjoshua/testFolder/testinnerFile",
"mtime": 1510949575000,
"size": 0,
"isFolder": false
}
]
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
**URL
** : ci.kbase.us/services/staging_service/metadata/{path to file or folder}
local URL : localhost:3000/metadata/{path to file or folder}
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
{
"name": "testFolder",
"path": "nixonpjoshua/testFolder",
"mtime": 1510949575000,
"size": 96,
"isFolder": true
}
{
"md5": "73cf08ad9d78d3fc826f0f265139de33",
"lineCount": "13",
"head": "there is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom",
"tail": "there is stuff in this file\nthere is stuff in this file\nthere is stuff in this file\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom\nstuff at the bottom",
"name": "testFile",
"path": "nixonpjoshua/testFile",
"mtime": 1510949629000,
"size": 335,
"isFolder": false
}
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 404 Not Found
Content :
path <username>/<incorrect path> does not exist
URL : ci.kbase.us/services/staging_service/upload
local URL : localhost:3000/upload
Method : POST
Headers : Authorization: <Valid Auth token>
Body constraints
first element in request body should be
destPath: {path file should end up in}
second element in request body should be multipart file data
uploads: {multipart file}
Files starting with whitespace or a '.' are not allowed
Code : 200 OK
Content example
[
{
"name": "fasciculatum_supercontig.fasta",
"path": "nixonpjoshua/fasciculatum_supercontig.fasta",
"mtime": 1510950061000,
"size": 31536508,
"isFolder": false
}
]
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
**URL
** : ci.kbase.us/services/staging_service/define-upa/{path to imported file}
local URL : localhost:3000/define-upa/{path to imported file}
Method : POST
Headers : Authorization: <Valid Auth token>
Body constraints
first element in request body should be
UPA: {the actual UPA of imported file}
Code : 200 OK
Content example
successfully update UPA <UPA> for file <Path>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content
must provide UPA field in body
URL : ci.kbase.us/services/staging_service/delete/{path to file or folder}
local URL : localhost:3000/delete/{path to file or folder}
Method : DELETE
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
successfully deleted UPA <Path>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 404 Not Found
Content
could not delete <Path>
Code : 403 Forbidden
Content
cannot delete home directory
cannot delete protected file
URL : ci.kbase.us/services/staging_service/mv/{path to file or folder}
local URL : localhost:3000/mv/{path to file or folder}
Method : PATCH
Headers : Authorization: <Valid Auth token>
Body constraints
first element in request body should be
newPath : {the new location/name for file or folder}
Code : 200 OK
Content example
successfully moved <path> to <newPath>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content
must provide newPath field in body
Code : 403 Forbidden
Content
cannot rename home or move directory
cannot rename or move protected file
Code: 409 Conflict
Content
<newPath> already exists
supported archive formats are:
.zip, .ZIP, .tar.gz, .tgz, .tar.bz, .tar.bz2, .tar, .gz, .bz2, .bzip2
URL : ci.kbase.us/services/staging_service/decompress/{path to archive
local URL : localhost:3000/decompress/{path to archive}
Method : PATCH
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
successfully decompressed <path to archive>
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Code : 400 Bad Request
Content
Must supply token
Code : 400 Bad Request
Content
cannot decompress a <file extension> file
After authenticating at this endpoint, AUTH is queried to get your filepath and globus id file for linking to globus.
URL : ci.kbase.us/services/staging_service/add-acl
local URL : localhost:3000/add-acl
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
{
"success": true,
"principal": "KBase-Example-59436z4-z0b6-z49f-zc5c-zbd455f97c39",
"path": "/username/",
"permissions": "rw"
}
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Condition : If issue with Globus API or ACL Already Exists
Code : 500 Internal Server Error
Content
{
'success': False,
'error_type': 'TransferAPIError',
'error': "Can't create ACL rule; it already exists",
'error_code': 'Exists', 'shared_directory_basename': '/username/'
}
After authenticating at this endpoint, AUTH is queried to get your filepath and globus id file for linking to globus.
URL : ci.kbase.us/services/staging_service/remove-acl
local URL : localhost:3000/remove-acl
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content example
{
"message": "{\n \"DATA_TYPE\": \"result\",\n \"code\": \"Deleted\",
"message\": \"Access rule 'KBASE-examplex766ada0-x8aa-x1e8-xc7b-xa1d4c5c824a' deleted successfully\",
"request_id\": \"x2KFzfop05\",\n \"resource\": \"/endpoint/KBaseExample2a-5e5b-11e6-8309-22000b97daec/access/KBaseExample-ada0-d8aa-11e8-8c7b-0a1d4c5c824a\"}",
"Success": true
}
Condition : if authentication is incorrect
Code : 401 Unauthorized
Content :
Error Connecting to auth service ...
Condition : If issue with Globus API or ACL Already Exists
Code : 500 Internal Server Error
Content
{
'success': False,
'error_type': 'TransferAPIError',
'error': "Can't create ACL rule; it already exists",
'error_code': 'Exists', 'shared_directory_basename': '/username/'
}
This endpoint parses one or more bulk specification files in the staging area into a data structure (close to) ready for insertion into the Narrative bulk import or analysis cell.
By default, it can parse .tsv
, .csv
, and Excel (.xls
and .xlsx
) files.
Templates for the currently supported data types are available in
the templates
directory of this repo. See
the README.md file
for instructions on template usage.
When given the dts
flag in the URL, this endpoint can also parse manifest
files that come from the KBase Data Transfer Service (DTS)
See here for further documentation on the service.
This service copies data files from external sources into a KBase user's staging
area, accompanied by a manifest.json
file. These also contain information on how to
load files into KBase importer apps. However, since they are a very specific format,
this means adding a dts
flag to the URL. When this flag is present, all files
are expected to be .json
files and conform to the DTS schema.
See the import specification ADR document for design details.
URL : ci.kbase.us/services/staging_service/bulk_specification
local URL : localhost:3000/bulk_specification
Method : GET
Headers : Authorization: <Valid Auth token>
Code : 200 OK
Content examples
GET bulk_specification/?files=file1.<ext>[,file2.<ext>,...]
<ext>
is one of csv
, tsv
, xls
, or xlsx
.
GET bulk_specification/?files=file1.json[,file2.json,...]&dts
When using the dts
flag, all files must be JSON and have the .json
extension.
Both versions of the endpoint respond in the same format.
Response:
{
"types": {
<type 1>: [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
{<spec.json ID 1>: <value for ID, row 2>, <spec.json ID 2>: <value for ID, row 2>, ...},
...
],
<type 2>: [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
...
],
...
},
"files": {
<type 1>: {"file": "<username>/file1.<ext>", "tab": "tabname"},
<type 2>: {"file": "<username>/file2.<ext>", "tab": null},
...
}
}
<type N>
is a data type ID from the Mappings.py file and the Narrative staging area configuration file - it is a shared namespace between the staging service and Narrative to specify bulk applications, and has a 1:1 mapping to an app. It is determined by the first header line from the templates.<spec.json ID N>
is the ID of an input parameter from aKB-SDK
app'sspec.json
file. These are determined by the second header line from the templates and will differ by the data type.<value for ID, row N>
is the user-provided value for the input for a givenspec.json
ID and import or analysis instance, where an import/analysis instance is effectively a row in the data file. Each data file row is provided in order for each type. Each row is provided in a mapping ofspec.json
ID to the data for the row. Lines > 3 in the templates are user-provided data, and each line corresponds to a single import or analysis.
Error responses are of the general form:
{
"errors": [
{"type": <error code string>,
... other fields depending on the error code ...
},
...
]
}
Existing error codes are currently:
cannot_find_file
if an input file cannot be foundcannot_parse_file
if an input file cannot be parsedincorrect_column_count
if the column count is not as expected- For Excel files, this may mean there is a non-empty cell outside the bounds of the data area
multiple_specifications_for_data_type
if more than one tab or file per data type is submittedno_files_provided
if no files were providedunexpected_error
if some other error occurs
The HTTP code returned will be, in order of precedence:
- 400 if any error other than
cannot_find_file
orunexpected_error
occurs - 404 if at least one error is
cannot_find_file
but there are no 400-type errors - 500 if all errors are
unexpected_error
The per error type data structures are:
{
"type": "cannot_find_file",
"file": <filepath>
}
{
"type": "cannot_parse_file",
"file": <filepath>,
"tab": <spreadsheet tab if applicable, else null>,
"message": <message regarding the parse error>
}
{
"type": "incorrect_column_count",
"file": <filepath>,
"tab": <spreadsheet tab if applicable, else null>,
"message": <message regarding the error>
}
{
"type": "multiple_specifications_for_data_type",
"file_1": <filepath for first file>,
"tab_1": <spreadsheet tab from first file if applicable, else null>,
"file_2": <filepath for second file>,
"tab_2": <spreadsheet tab for second file if applicable, else null>,
"message": <message regarding the multiple specification error>
}
{
"type": "no_files_provided"
}
{
"type": "unexpected_error",
"file": <filepath if applicable to a single file>
"message": <message regarding the error>
}
This endpoint is the reverse of the parse bulk specifications endpoint - it takes a similar data structure to that which the parse endpoint returns and writes bulk specification templates.
URL : ci.kbase.us/services/staging_service/write_bulk_specification
local URL : localhost:3000/write_bulk_specification
Method : POST
Headers :
Authorization: <Valid Auth token>
Content-Type: Application/JSON
Code : 200 OK
Content example
POST write_bulk_specification/
{
"output_directory": <staging area directory in which to write output files>,
"output_file_type": <one of "CSV", "TSV", or "EXCEL">,
"types": {
<type 1>: {
"order_and_display: [
[<spec.json ID 1>, <display.yml name 1>],
[<spec.json ID 2>, <display.yml name 2>],
...
],
"data": [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
{<spec.json ID 1>: <value for ID, row 2>, <spec.json ID 2>: <value for ID, row 2>, ...}
...
]
},
<type 2>: {
"order_and_display: [
[<spec.json ID 1>, <display.yml name 1>],
...
],
"data": [
{<spec.json ID 1>: <value for ID, row 1>, <spec.json ID 2>: <value for ID, row 1>, ...},
...
]
},
...
}
}
output_directory
specifies where the output files should be written in the user's staging area.output_file_type
specifies the format of the output files.<type N>
is a data type ID from the Mappings.py file and the Narrative staging area configuration file - it is a shared namespace between the staging service and Narrative to specify bulk applications, and has a 1:1 mapping to an app. It is included in the first header line in the templates.order_and_display
determines the ordering of the columns in the written templates, as well as mapping the spec.json ID of the parameter to the human-readable name of the parameter in the display.yml file.<spec.json ID N>
is the ID of an input parameter from aKB-SDK
app'sspec.json
file. These are written to the second header line from the import templates and will differ by the data type.data
contains any data to be written to the file as example data, and is analogous to the data structure returned from the parse endpoint. To specify that no data should be written to the template provide an empty list.<value for ID, row N>
is the value for the input for a givenspec.json
ID and import or analysis instance, where an import/analysis instance is effectively a row in the data file. Each data file row is provided in order for each type. Each row is provided in a mapping ofspec.json
ID to the data for the row. Lines > 3 in the templates are user-provided data, and each line corresponds to a single import or analysis.
Response:
{
"output_file_type": <one of "CSV", "TSV", or "EXCEL">,
"files": {
<type 1>: <staging service path to file containg data for type 1>,
...
<type N>: <staging service path to file containg data for type N>,
}
}
output_file_type
has the same definition as above.files
contains a mapping of each provided data type to the output template file for that type. In the case of Excel, all the file paths will be the same since the data types are all written to different tabs in the same file.
Method specific errors have the form:
{"error": <error message>}
The error code in this case will be a 4XX error.
The AioHTTP server may also return built in errors that are not in JSON format - an example of this is overly large (> 1MB) request bodies.
This endpoint returns:
- a mapping between a list of files and predicted importer apps, and
- a file information list that includes the input file names split between the
file prefix and the file suffix, if any, that was used to determine the file -> importer
mapping, and a list of file types based on the file suffix. If a file has a suffix that does not
match any mapping (e.g.
.sys
), the suffix will benull
, the prefix the entire file name, and the file type list empty.
For example,
- if we pass in nothing we get a response with no mappings
- if we pass in a list of files, such as ["file1.fasta", "file2.fq", "None"], we would get back a response that maps to Fasta Importers and FastQ Importers, with a weight of 0 to 1 which represents the probability that this is the correct importer for you.
- for files for which there is no predicted app, the return is a null value
- this endpoint is used to power the dropdowns for the staging service window in the Narrative
Note: to update these mappings see instructions here
URL : ci.kbase.us/services/staging_service/importer_mappings
local URL : localhost:3000/importer_mappings
Method : POST
Headers : Not Required
Code : 200 OK
Content example
data = {"file_list": ["file1.txt", "file2.zip", "file3.gff3.gz"]}
async with AppClient(config, username) as cli:
resp = await cli.post(
"importer_mappings/", data=data
)
Response:
{
"mappings": [
null,
[{
"id": "decompress",
"title": "decompress/unpack",
"app_weight": 1,
}],
[{
"app_weight": 1,
"id": "gff_genome",
"title": "GFF/FASTA Genome",
},
{
"app_weight": 1,
"id": "gff_metagenome",
"title": "GFF/FASTA MetaGenome",
}]
],
"fileinfo": [
{"prefix": "file1.txt", "suffix": null, "file_ext_type": []},
{"prefix": "file2", "suffix": "zip", "file_ext_type": ["CompressedFileFormatArchive"]},
{"prefix": "file3", "suffix": "gff3.gz", "file_ext_type": ["GFF"]}
]
}
Code : 400 Bad Request
Content
must provide file_list field
This endpoint returns information about the file types associated with data types and the file extensions for those file types. It is primarily of use for creating UI elements describing which file extensions may be selected when performing bulk file selections.
URL : ci.kbase.us/services/staging_service/importer_filetypes
local URL : localhost:3000/importer_filetypes
Method : GET
Headers : Not Required
Code : 200 OK
Content example
GET importer_filetypes/
Response:
{
"datatype_to_filetype": {
<type 1>: [<file type 1>, ... <file type N>],
...
<type M>: [<file type 1>, ... <file type N>],
},
"filetype_to_extensions": {
<file type 1>: [<extension 1>, ..., <extension N>],
...
<file type M>: [<extension 1>, ..., <extension N>],
}
}
<type N>
is a data type ID from the Mappings.py file and the Narrative staging area configuration file - it is a shared namespace between the staging service and Narrative to specify bulk applications, and has a 1:1 mapping to an import app. It is included in the first header line in the templates.<file type N>
is a file type likeFASTA
orGENBANK
. The supported file types are listed below.<extension N>
is a file extension like*.fa
or*.gbk
.
These are the currently supported upload app type IDs:
fastq_reads_interleaved
fastq_reads_noninterleaved
sra_reads
genbank_genome
gff_genome
gff_metagenome
expression_matrix
media
fba_model
assembly
phenotype_set
sample_set
metabolic_annotation
metabolic_annotation_bulk
escher_map
decompress
Note that decompress is only returned when no other file type can be detected from the file extension.
These are the currently supported file type IDs. These are primarily useful for apps that take two different file types, like GFF/FASTA genomes.
FASTA
FASTQ
SRA
GFF
GENBANK
SBML
JSON
TSV
CSV
EXCEL
CompressedFileFormatArchive