Piecing it together
Here's the train_iid
method in its entirety (sans comments):
def train_iid(self, request: TrainIIDReqBody) -> TrainRespBody:
# SANITIZING #
sanitized_df = remove_invalid_values(request.get_training_df())
raw_holdout_df = request.get_holdout_data()
if raw_holdout_df is not None:
sanitized_holdout = remove_invalid_values(raw_holdout_df)
else:
sanitized_holdout = None
# FEATURE ENGINEERING #
training_df, holdout_df, encoders, components = feature_engineering(
sanitized_df, sanitized_holdout
)
labels = pd.unique(training_df[request.target.name])
# CROSS VALIDATION #
cv_frames = CVFrames.iid(
training_df, request.folds, request.get_validation_data()
)
folds = [
do_cv_fold(cv_frame, request.target, labels)
for cv_frame in cv_frames.frames
]
cv_scores = CVScores(folds=folds)
# TRAIN THE MODEL #
train_frames = TrainFrames(train=training_df, validation=request.get_validation_data())
model = train_model(train_frames, request.target)
# HOLDOUT #
if holdout_df is not None:
holdout_scores = calculate_score(model, holdout_df, request.target, labels)
else:
holdout_scores = None
# SAVING #
model_info_id = ModelInfo.save(
name=self.name,
target=request.target,
model=model,
encoders=encoders,
time_components=components,
feature_types=request.feature_types,
mongo=self.mongo,
gridfs=self.gridfs,
collection=self.collection,
)
# PREPARING RESPONSE #
attributes = Attributes(cv_scores=cv_scores, holdout_scores=holdout_scores)
response = TrainRespBody(
name=self.name,
attributes=attributes,
model_id=model_info_id,
hyperparameters=model.get_params(),
)
return response
Several things to note here:
- For the final machine learning model, we do not need to calculate its score. Therefore we use the helper class
TrainFrames
, and invoke its methodiid_without_test
to generate the training frame and the validation frame. - Remember in
do_cv_fold
we make use of the methodstrain_model
andcalculate_score
? After we have calculated the cross validation scores, we can reuse those methods to calculate the final machine learning model, and calculate the holdout scores.
Now, we have successfully trained our machine learning model, and a set of scores to evaluate this model. In the following section we will see how persisting this model is possible.