Calculate cross validation scores
Next, we calculate the cross validation scores for the model. The procedure is as follows:
- Cut the training dataframe into several sub-frames
- For each of those sub-frames, create a machine learning model
- For each of those trained machine learning models, calculate their performance by comparing its predictions to another sub-frame
The corresponding code is the following snippet:
# Cut the training dataframe into cross-validation frames
cv_frames = CVFrames.iid(
training_df, request.folds, request.get_validation_data()
)
# Use the cross-validation frames to calculate the cv scores
folds = [do_cv_fold(cv_frame, request.target, labels) for cv_frame in cv_frames.frames]
cv_scores = CVScores(folds=folds)
We have provided several helper functions to make this a lot easier for you: the sub-frames splitting and the score calculations are all done by helpers in the exodusutils
function! The only place you need modifying is in do_cv_fold
function:
def do_cv_fold(train_frames: TrainFrames, target: Column, labels: np.ndarray):
"""
Does a cross validation fold. Includes training a cross validation model and calculating the score for the model.
Parameters
----------
train_frames : TrainFrames
The frames for this cross validation fold.
target : Column
The target column.
labels : np.ndarray
The unique classes that are involved in the classification problem.
Returns
-------
scores
The scores for this cross validation fold.
"""
cv_model = train_model(train_frames, target)
if train_frames.test is None:
raise ExodusBadRequest(detail=f"Cannot create test frame for this fold")
return calculate_score(cv_model, train_frames.test, target, labels)
This is a rather simple function, but there are 2 methods called here: train_model
and calculate_score
, the former correspond to the training model part, while the latter the score calculation part. We will go over them in the following sections.