Store the necessary informations
Let's take a look at the declaration of the ModelInfo class:
class ModelInfo(BaseModel):
name: str
target: Column
model_id: str
encoder_ids: Dict[str, str]
time_components: List[str]
feature_types: List[Column]
encrypted_fields: List[str] = Field(
default=["target", "encoder_ids", "time_components", "feature_types"],
exclude=True,
)
model: Union[LinearRegression, LogisticRegression] = Field(
default=None, exclude=True
)
encoders: Dict[str, LabelEncoder] = Field(default=None, exclude=True)
The only ones that are required are name, target, model_id, encoder_ids, time_components, and feature_types. As for the rest:
encrypted_fieldsdescribes what you need to encrypt ifencryptedis set totrueinconfig.ini. See Appendix C for more info.modelis the actual machine learning model that you just trained. During prediction, you should be using this.encodersis the actual encoders. During prediction, you should be using this.
Let's get back to the save method:
model_info = cls(
name=name,
target=target,
model_id=model_id,
encoder_ids=saved_encoders,
time_components=time_components,
feature_types=feature_types,
)
Notice how we only fill in the required fields, and left the rest as default.
What should I include in ModelInfo?
Rule of thumb is to include anything you will need for prediction. That could include the following:
- A portion of the training dataframe
- Some more feature engineering artifacts
- Columns that require additional processing
What should I implement?
The developer of the model algorithm should implement the following:
- A way to dump the things into an object that can be downloaded (see
dump_to_base64method) - The
deletemethod, when you delete aModelInfoyou should also remove the relevant machine learning model and feature engineering artifacts