This repository was archived by the owner on Dec 5, 2024. It is now read-only.
Davide Berdin edited this page Oct 23, 2019 · 4 revisions

The model is the entity that holds the data we want to serve to clients. It is composed of different parts and it needs to be attached to a container before use. If you haven't read about containers, check this link.

Content

Each model has some metadata that is added at creation time. The metadata looks like the following:

  • Version. It starts at version 0.1.0 and increases following the semver system. Check Versioning for more information
  • Stage. Value that determines whether the data is published or not. Check Staging for more information
  • Name. Name of the model
  • SignalOrder. Each key in the data is called a signal. To enforce validation on the definition of each key, this list contains the parameters that compose the key. Check Signal Ordering for more information
  • Concatenator. If the key is composed of multiple variables, the concatenator character helps validate the key. Check Concatenator for more information
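Putting the fields above together, a freshly created model's metadata could be represented as follows. This is a hypothetical sketch: the field names come from this page, but the example values and defaults are assumptions.

```python
# Hypothetical sketch of a newly created model's metadata,
# using the fields described above (values are assumptions).
new_model = {
    "name": "articles",            # unique model name
    "version": "0.1.0",            # initial semver version
    "stage": "STAGED",             # STAGED until explicitly published
    "signalOrder": ["articleId"],  # parameters that compose each key
    "concatenator": "",            # empty for single-parameter keys
}
```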

In Aerospike the model entity is created as follows:

  • SetName = Name. The name of the model must be unique; it becomes the set name
  • Key = signalID. The key is the signalID of the dataset you are uploading to Phoenix
  • Bin = [ {} ]. The bin holds a list of objects, which is what is returned to the client

With this system we guarantee that there won't be clashes in the dataset. Note that the API keeps track of the model names that have been created to avoid duplication. As mentioned before, the name of the model must be unique; the API throws a validation error in case of duplication.
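The mapping above can be sketched as a small helper that builds an Aerospike-style record from a dataset line. This is a plain-Python illustration, not the actual implementation: the namespace name "phoenix" and the bin name "items" are assumptions.

```python
def to_aerospike_record(model_name, signal_id, recommended, namespace="phoenix"):
    """Sketch of how a dataset line maps onto an Aerospike record:
    the model name becomes the set name, the signalId becomes the
    record key, and the recommended items go into a list bin.
    The namespace and bin name here are assumptions."""
    key = (namespace, model_name, signal_id)  # (namespace, set, key)
    bins = {"items": recommended}             # list of objects returned to clients
    return key, bins

key, bins = to_aerospike_record(
    "articles", "1",
    [{"item": "a", "score": "0.5"}, {"item": "b", "score": "0.4"}],
)
```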

Versioning

The version of the model follows the semver approach. If you have never heard of it, check this link first. Every time the model metadata is updated, the version of the model increases. For example, when the data is published, meaning that the Stage value becomes PUBLISHED, the Major part of the version increases. If the model is emptied and all the data is gone, the model version goes back to the initial value.
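The bump rules above can be sketched as follows. Note one assumption: the page says the version increases on metadata updates without naming the part, so this sketch assumes ordinary updates bump the minor part, while publishing bumps the major part as described.

```python
INITIAL_VERSION = "0.1.0"  # emptying the model resets the version to this value

def bump_version(version, publish=False):
    """Sketch of the semver rules described above. Publishing bumps the
    major part; any other metadata update is assumed to bump the minor part."""
    major, minor, patch = (int(p) for p in version.split("."))
    if publish:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.{patch}"
```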

Staging

Each model has two possible stage values: STAGED and PUBLISHED. In the first case, the data is not accessible from the Public APIs. The reason behind this choice is to avoid serving inconsistent information and to have a stricter way of controlling what type of data is uploaded.

Every operation on the model must be done while the stage value is STAGED. After the model is ready and the data has been uploaded, you can PUBLISH the model and make it available to the Public APIs. Once the model is PUBLISHED, you cannot upload, delete, or update the dataset or the metadata of the model. You must switch back to STAGED if you want to do any maintenance.

Signal Ordering

As mentioned previously, we upload data in a Key/Value fashion. However, the key can be composed of multiple parameters: it is common to have keys that are a combination of userId and articleId, or other forms. To add a small layer of validation, the API asks the client to supply this information about the dataset. To make the explanation clearer, let's build two examples.

Simple Key

The dataset looks like the following:

{"signalId":"1","recommended":[{"item":"a","score":"0.5"},{"item":"b","score":"0.4"}]}
{"signalId":"2","recommended":[{"item":"c","score":"0.5"},{"item":"d","score":"0.4"}]}
{"signalId":"3","recommended":[{"item":"a","score":"0.5"},{"item":"c","score":"0.4"}]}

When you create the model in Phoenix the values for the signal ordering will look like the following:

{
    ...
    "signalOrder":["userId"],
    "concatenator":""
}

This way, when the system uploads the dataset, it can validate that the "signalId" is actually formed by one parameter only. Of course, it cannot check whether the actual "signalId" values exist, but at least we are enforcing the correct form.

Composed Key

Now, let's assume that the dataset looks like the following:

{"signalId":"1_aaa_34","recommended":[{"item":"a","score":"0.5"},{"item":"b","score":"0.4"}]}
{"signalId":"2_aaa_34","recommended":[{"item":"c","score":"0.5"},{"item":"d","score":"0.4"}]}
{"signalId":"3_aaa_34","recommended":[{"item":"a","score":"0.5"},{"item":"c","score":"0.4"}]}

When you create the model in Phoenix the values for the signal ordering will look like the following:

{
    ...
    "signalOrder":["userId", "articleId", "paramX"],
    "concatenator":"_"
}

Now, when you upload the data, the system performs the following check: it counts the number of signalOrder elements, splits the signalId by the concatenator value, and matches the two lengths. If they don't match, it stores the line number where the error was found and returns it to the client when the upload status is checked (or immediately, in the case of a direct data upload).
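The check just described can be sketched in a few lines. This is an illustration of the length-matching logic, not the actual implementation; the function name and the 1-based line numbering are assumptions.

```python
import json

def validate_signal_ids(lines, signal_order, concatenator):
    """Sketch of the upload check described above: split each signalId
    by the concatenator and compare the part count with signalOrder.
    Returns the 1-based line numbers that fail validation, which are
    what gets reported back to the client."""
    errors = []
    for number, line in enumerate(lines, start=1):
        signal_id = json.loads(line)["signalId"]
        # An empty concatenator means a simple, single-parameter key.
        parts = signal_id.split(concatenator) if concatenator else [signal_id]
        if len(parts) != len(signal_order):
            errors.append(number)
    return errors
```

For the composed-key example above, a signalId like "2_aaa" would be reported, since it splits into two parts while signalOrder declares three.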

Concatenator

The concatenator is a character used to connect multiple signals together. To avoid having the client specify a "random" character as concatenator, you can choose only from the following list: '|', '#', '_', '-'. We believe other characters would trigger false-positive validation errors, or fail to trigger them at all.
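The allowed set above translates into a trivial membership check, sketched here (the function name is hypothetical):

```python
# The four concatenator characters accepted by the API, per the list above.
ALLOWED_CONCATENATORS = {"|", "#", "_", "-"}

def is_valid_concatenator(char):
    """Accept only the characters the API allows as concatenators."""
    return char in ALLOWED_CONCATENATORS
```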
