Format the results
After we have our raw prediction results, we need to piece it back into a JSON string.
if model_info.target.data_type != DataType.double:
predicted = model_info.encoders[model_info.target.name].inverse_transform(predicted)
sanitized_df[PREDICTION_COLNAME] = predicted
if model_info.target.data_type != DataType.double:
classes = list(model_info.model.classes_)
predict_proba = pd.DataFrame(model_info.model.predict_proba(features), columns=classes)
sanitized_df = pd.concat([sanitized_df, predict_proba], axis=1).reset_index(drop=True)
else:
classes = list()
columns_to_drop = [
c
for c in list(sanitized_df.columns)
if c not in request.keep_columns and c != PREDICTION_COLNAME and c not in classes
]
results: List[Dict[str, Any]] = json.loads(
str(sanitized_df.drop(columns=columns_to_drop).to_json(orient="records"))
)
The return value of the predict
method contains only a field of type List[Dict[str, Any]]
, where a single Dict
represents the prediction results for a single row in the prediction input dataframe. That being said, it is easier for us to manipulate on a dataframe than doing each Dict
by hand, so let's see how we can achieve that.
Turn our predicted value back to strings
Remember that we did label_encoding
during feature engineering? This will turn our target column into a column with numeric class labels, and we need to turn those back into actual strings.
To do that, we make use of the LabelEncoder.inverse_transform
method:
if model_info.target.data_type != DataType.double:
predicted = model_info.encoders[model_info.target.name].inverse_transform(predicted)
If your model didn't do label_encoding
during feature engineering, you can omit this step.
Attach the predicted values
Then we attach our predicted value to the input dataframe.
sanitized_df[PREDICTION_COLNAME] = predicted
If we are dealing with a classification problem, each prediction should come with the prediction probabilities of all the possible classes. To do that, we create a new dataframe containing the prediction probabilities, and then concatenate the new dataframe to our result dataframe.
Note that we have to use the strings instead of the model's classes, which are just a bunch of numbers.
if model_info.target.data_type != DataType.double:
classes = list(model_info.encoders[model_info.target.name].classes_)
predict_proba = pd.DataFrame(model_info.model.predict_proba(features), columns=classes)
sanitized_df = pd.concat([sanitized_df, predict_proba], axis=1).reset_index(drop=True)
else:
classes = list()
We also need to keep track of the classes we've added to the result dataframe.
Drop columns we don't need
Then we need to filter out the columns we don't need for the result.
columns_to_drop = [
c
for c in list(sanitized_df.columns)
if c not in request.keep_columns and c != PREDICTION_COLNAME and c not in classes
]
We only keep the columns that are explicitly specified by the user (via the keep_columns
field in the request), the prediction result column (the PREDICTION_COLNAME
variable, which is an alias for the string "prediction"
), and the classes for the target column if it's a classification problem.