MLPipeline does not preserve metadata from JSON pipeline annotation #137

@micahjsmith

  • MLBlocks version: 0.4
  • Python version: 3.8

Description

I'm trying to get metadata from an MLPipeline object that was present in the JSON pipeline annotation that was loaded. For example, the annotation has a metadata.name field that I'd like to access from the pipeline.

What I Did

In the following example, I would expect the MLPipeline to have a metadata dict with a name key, just like the JSON. But it doesn't.

$ mkdir -p mlprimitives mlpipelines
$ curl -s https://raw.githubusercontent.com/MLBazaar/MLPrimitives/master/mlprimitives/primitives/sklearn.ensemble.RandomForestRegressor.json -o mlprimitives/sklearn.ensemble.RandomForestRegressor.json
$ curl -s https://raw.githubusercontent.com/MLBazaar/MLPrimitives/master/mlprimitives/pipelines/sklearn.ensemble.RandomForestRegressor.json -o mlpipelines/sklearn.ensemble.RandomForestRegressor.json
$ jq .metadata.name mlpipelines/sklearn.ensemble.RandomForestRegressor.json 
"sklearn.ensemble.RandomForestRegressor"
$ python
Python 3.8.3 (default, Jul 20 2020, 16:43:14) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mlblocks import load_pipeline, MLPipeline
>>> pipeline = MLPipeline(load_pipeline('sklearn.ensemble.RandomForestRegressor'))
>>> pipeline.metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MLPipeline' object has no attribute 'metadata'
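
Until the pipeline object exposes this itself, one possible workaround is to read the metadata straight from the annotation file. This is a minimal sketch using only the standard library; `read_annotation_metadata` is a hypothetical helper, not part of MLBlocks, and it assumes the annotation is a JSON object with an optional top-level `metadata` key, as in the file downloaded above.

```python
import json
from pathlib import Path


def read_annotation_metadata(path):
    """Return the top-level ``metadata`` dict of a JSON annotation file.

    Hypothetical helper (not an MLBlocks API). Returns an empty dict
    when the annotation has no ``metadata`` key.
    """
    annotation = json.loads(Path(path).read_text(encoding="utf-8"))
    return annotation.get("metadata", {})
```

For example, `read_annotation_metadata("mlpipelines/sklearn.ensemble.RandomForestRegressor.json").get("name")` should return the same value that `jq .metadata.name` prints above.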

Suggestions

  1. Explicitly support persisting metadata on the MLPipeline object (and presumably on underlying MLBlock objects)
  2. Raise an error if the JSON input contains unused (unsupported) keys
  3. Guarantee that MLPipeline.load and MLPipeline.save are inverse operations, i.e. that no data is lost (currently metadata fields are lost)
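
To make the three suggestions concrete, here is a rough sketch of the behavior I have in mind. It is not MLBlocks code: `PipelineSpec` and `SUPPORTED_KEYS` are hypothetical names, and the set of supported keys is an illustrative assumption rather than the real annotation schema.

```python
import json

# Illustrative assumption: the top-level keys the loader understands.
SUPPORTED_KEYS = {"primitives", "init_params", "metadata"}


class PipelineSpec:
    """Hypothetical stand-in for a pipeline object that keeps its metadata."""

    def __init__(self, annotation):
        # Suggestion 2: reject keys that would otherwise be silently dropped.
        unknown = set(annotation) - SUPPORTED_KEYS
        if unknown:
            raise ValueError(f"Unsupported keys in annotation: {sorted(unknown)}")
        self.primitives = list(annotation.get("primitives", []))
        self.init_params = dict(annotation.get("init_params", {}))
        # Suggestion 1: persist metadata on the object.
        self.metadata = dict(annotation.get("metadata", {}))

    def to_dict(self):
        """Return every supported field, including metadata."""
        return {
            "primitives": list(self.primitives),
            "init_params": dict(self.init_params),
            "metadata": dict(self.metadata),
        }

    @classmethod
    def loads(cls, text):
        """Build a spec from a JSON string."""
        return cls(json.loads(text))

    def dumps(self):
        """Serialize to JSON. Suggestion 3: loads(dumps()) loses no data."""
        return json.dumps(self.to_dict(), sort_keys=True)
```

With something like this, loading an annotation, saving it, and loading it again yields an equal `to_dict()` result, and a misspelled or unsupported top-level key fails loudly instead of disappearing.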
