Store the necessary informations
Let's take a look at the declaration of the ModelInfo
class:
class ModelInfo(BaseModel):
name: str
target: Column
model_id: str
encoder_ids: Dict[str, str]
time_components: List[str]
feature_types: List[Column]
encrypted_fields: List[str] = Field(
default=["target", "encoder_ids", "time_components", "feature_types"],
exclude=True,
)
model: Union[LinearRegression, LogisticRegression] = Field(
default=None, exclude=True
)
encoders: Dict[str, LabelEncoder] = Field(default=None, exclude=True)
The only ones that are required are name
, target
, model_id
, encoder_ids
, time_components
, and feature_types
. As for the rest:
encrypted_fields
describes what you need to encrypt ifencrypted
is set totrue
inconfig.ini
. See Appendix C for more info.model
is the actual machine learning model that you just trained. During prediction, you should be using this.encoders
is the actual encoders. During prediction, you should be using this.
Let's get back to the save
method:
model_info = cls(
name=name,
target=target,
model_id=model_id,
encoder_ids=saved_encoders,
time_components=time_components,
feature_types=feature_types,
)
Notice how we only fill in the required fields, and left the rest as default.
What should I include in ModelInfo
?
Rule of thumb is to include anything you will need for prediction. That could include the following:
- A portion of the training dataframe
- Some more feature engineering artifacts
- Columns that require additional processing
What should I implement?
The developer of the model algorithm should implement the following:
- A way to dump the things into an object that can be downloaded (see
dump_to_base64
method) - The
delete
method, when you delete aModelInfo
you should also remove the relevant machine learning model and feature engineering artifacts