Train a model
In order to train a machine learning model and obtain its performance metrics, we will be doing the following procedure:
- Sanitize the training input
- Perform feature engineering
- Extract labels for categorical problems
- Calculate cross validation scores
- Train the machine learning model
- Calculate holdout score
- Save the model
- Return the result
All of the code is in the files model_algorithm.py
and model_info.py
.
To start things off, let's modify the description of your model algorithm:
class ModelAlgorithm:
def __init__(self, host: Optional[str] = None, port: Optional[str] = None) -> None:
# ... snipped ...
self.description: str = "description" # TODO Change me!
After you've figured out the description of your model algorithm, you can start implementing the train_iid
method, the entrypoint for training an IID model.
def train_iid(self, request: TrainIIDReqBody) -> TrainRespBody:
If you enter a newline underneath the comments, and insert request.
, there should be a lot of autocomplete suggestions:
The class TrainIIDReqBody
is defined in the exodusutils
library, and contains a myriad of helper methods and variables. If we right click on the TrainIIDReqBody
symbol, and select Go to definition
, we will be able to see what's in it:
class TrainIIDReqBody(TrainReqBodyBase):
# ... snipped ...
validation_data: Optional[bytes] = Field(
default=None, description="The validation data as a sequence of bytes. Optional"
)
holdout_data: Optional[bytes] = Field(
default=None, description="The holdout data as a sequence of bytes. Optional"
)
fold_assignment_column_name: Optional[str] = Field(
default=None,
description="The name of the fold assignment column. If not provided, Exodus will cut cross validation folds in a modulo fashion. If this field is defined, it is required to be a valid column in the input dataframe. This column is not included in `feature_types`, and will be discarded during training.",
)
There seems to be only three fields, however the TrainIIDReqBody
class actually inherits from the TrainReqBodyBase
class:
class TrainReqBodyBase(BaseModel):
# ... snipped ...
training_data: bytes = Field(description="The training data as a sequence of bytes")
feature_types: List[Column] = Field(
default=[], description="The features present in training data"
)
target: Column = Field(
description="The target column. Must be present in feature_types"
)
folds: int = Field(
default=5, ge=2, le=10, description="Number of folds for this experiment"
)
We will be using class TrainIIDReqBody
's methods and fields in the upcoming sections.