Developer Interface

This part of the documentation covers all the interfaces of Decanter Core SDK.

Main Interface

Decanter AI Core SDK’s main functionality can be accessed through the interfaces below.

Connection and Functional Settings

Initialization for running the SDK.

class decanter.core.context.Context

Initialize the connection to the Decanter Core server and the functionality for running the SDK.

Example

from decanter import core
context = core.Context.create(
    username='usr', password='pwd', host='decantercoreserver')
context.run()
CORO_TASKS = []
HOST = None
JOBS = []
LOOP = None
PASSWORD = None
USERNAME = None
api = None
static close()

Close the event loop and reset JOBS and CORO_TASKS.

Close the event loop if it’s not running (will not close in Jupyter Notebook).

classmethod create(username, password, host)

Create a Context instance and initialize the necessary variables and objects.

Set the username, password, and host for future connections when calling APIs, and create an event loop if one does not already exist. Check that the connection is healthy after the arguments are set.

Parameters
  • username (str) – Username for logging in to the Decanter Core server.

  • password (str) – Password for logging in to the Decanter Core server.

  • host (str) – Decanter Core server URL.

Returns

Context

static get_all_jobs()

Get a list of Jobs that have been executed or are waiting to be executed.

Returns

list(Job)

static get_jobs_by_name(names)

Get Job instances by their names.

Parameters

names (list(str)) – Names of the Jobs to select.

Returns

Jobs whose names are in the names list.

Return type

list(Job)

static get_jobs_status(sort_by_status=False, status=None)

Get a DataFrame of jobs and their corresponding status. Returns all jobs and their status if no arguments are passed.

Parameters
  • sort_by_status (bool, optional) – Sort the DataFrame by status, grouping jobs with the same status. Defaults to False.

  • status (list(str), optional) – Only select jobs whose status is in the status list.

Returns

DataFrame with Job name and its status.

Return type

pandas.DataFrame

Raises

Exception – If any status in status list is invalid.
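
A minimal usage sketch (assumes a Context has already been created; the status strings follow the pending/running states mentioned in this section):

from decanter import core

# All jobs, grouped by status
status_df = core.Context.get_jobs_status(sort_by_status=True)

# Only jobs that are still pending or running
running_df = core.Context.get_jobs_status(status=['pending', 'running'])
print(running_df)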

static healthy()

Check the connection to the Decanter Core server.

Send a fake request to determine whether there are connection or authorization errors.

static run()

Start executing the tasks in CORO_TASKS.

Gather all tasks and execute them. It will block until all tasks have finished.

static stop_all_jobs()

Stop all Jobs whose status is still pending or running.

static stop_jobs(jobs_list)

Stop Jobs in jobs_list.

Parameters

jobs_list (list(Job)) – List of Job instances to be stopped.
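
A minimal sketch combining get_jobs_by_name() and stop_jobs(); the job names are illustrative assumptions:

from decanter import core

# Select jobs by the names given when they were created, then stop them
jobs = core.Context.get_jobs_by_name(names=['upload_train', 'titanic_exp'])
core.Context.stop_jobs(jobs)

# Or stop everything that is still pending or running
core.Context.stop_all_jobs()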

decanter.core.context.get_or_create_eventloop()

Client for Decanter Core API

Functions for users to interact with the Decanter Core API.

class decanter.core.client.CoreClient(username, password, host)

Handle client side actions.

Supports actions such as setting up data, uploading data, training, prediction, and time series training and prediction.

Example

from decanter import core
client = core.CoreClient()
client.upload(file=file)  # file: a csv file object or a pandas.DataFrame
static predict(predict_input, name=None)

Predict model with test data.

Create a PredictResult Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • predict_input (PredictInput) – Stores the settings for prediction.

  • name (str, optional) – Name for the predict action.

Returns

PredictResult object

Raises

AttributeError – If the function is called without a Context created.
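
A minimal sketch of a predict call (assumes a Context named context was created earlier, test_data is a DataUpload, and exp is an Experiment from a previous train call):

from decanter import core
from decanter.core.core_api.predict_input import PredictInput

client = core.CoreClient()
predict_input = PredictInput(data=test_data, experiment=exp)
pred_res = client.predict(predict_input, name='pred_action')
context.run()  # block until the prediction job finishes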

static predict_ts(predict_input, name=None)

Predict time series model with test data.

Create a Time Series PredictResult Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • predict_input (PredictTSInput) – Stores the settings for prediction.

  • name (str, optional) – Name for the predict time series action.

Returns

PredictTSResult object.

Raises

AttributeError – If the function is called without a Context created.

static setup(setup_input, name=None)

Setup data reference.

Create a DataSetup Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • setup_input (SetupInput) – Stores the settings for setting up the data.

  • name (str, optional) – Name for the setup action.

Returns

DataSetup object

Raises

AttributeError – If the function is called without Context created.

static train(train_input, select_model_by=Evaluator.auto, name=None)

Train model with data.

Create an Experiment Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • train_input (TrainInput) – Stores the settings for training.

  • select_model_by (Evaluator) – How to select the best model when predicting with the trained experiment.

  • name (str, optional) – Name for the train action.

Returns

Experiment object

Raises

AttributeError – If the function is called without Context created.
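
A minimal sketch of a train call (assumes a Context named context was created earlier and train_data is a DataUpload returned by upload(); the target, algorithm, and parameter values are illustrative):

from decanter import core
from decanter.core.core_api.train_input import TrainInput
from decanter.core.enums.algorithms import Algo
from decanter.core.enums.evaluators import Evaluator

client = core.CoreClient()
train_input = TrainInput(data=train_data, target='Survived',
                         algos=Algo.XGBoost, max_model=2, tolerance=0.9)
exp = client.train(train_input, select_model_by=Evaluator.auc, name='titanic_exp')
context.run()  # block until training finishes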

static train_cluster(train_input, name=None)

Train cluster model with data.

Create a Cluster Experiment Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • train_input (TrainClusterInput) – Settings for clustering training.

  • name (str, optional) – Name for the train cluster action.

Returns

ExperimentCluster object

Raises

AttributeError – If the function is called without Context created.

static train_ts(train_input, select_model_by=Evaluator.auto, name=None)

Train time series model with data.

Create a Time Series Experiment Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • train_input (TrainTSInput) – Settings for time series training.

  • select_model_by (Evaluator) – How to select the best model when predicting with the trained experiment.

  • name (str, optional) – Name for the train time series action.

Returns

ExperimentTS object

Raises

AttributeError – If the function is called without Context created.

static upload(file, name=None, eda=True)

Upload csv file or pandas dataframe.

Create a DataUpload Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • file (csv-file, pandas.DataFrame) – File to be uploaded to the Decanter Core server.

  • name (str, optional) – Name for the upload action.

  • eda (bool, optional) – Whether to perform EDA on the uploaded data.

Returns

DataUpload object

Raises

AttributeError – If the function is called without Context created.
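
A minimal sketch of an upload call (assumes a Context named context was created earlier; the file path is illustrative, and a pandas.DataFrame can be passed instead of a file object):

from decanter import core

client = core.CoreClient()
train_file = open('train.csv', 'r')
train_data = client.upload(file=train_file, name='upload_train', eda=True)
context.run()  # block until the upload job finishes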

Plot

decanter.core.plot.show_model_attr(metric, score_types, exp)

Show all model attributes in an Experiment.

Parameters
  • metric (list(str)) – Metrics to show in the chart, e.g. ['mse', 'mae'].

  • score_types (list(str)) – Score types to show in the chart, e.g. ['cv_averages', 'validation'].

  • exp (Experiment) – The experiment whose model attributes will be shown.

Returns

pandas.DataFrame
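
A minimal sketch (assumes exp is a completed Experiment; the metric and score type values follow the examples above):

from decanter.core.plot import show_model_attr

# Compare the experiment's models on cross-validation and validation scores
df = show_model_attr(metric=['mse', 'mae'],
                     score_types=['cv_averages', 'validation'],
                     exp=exp)
print(df)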

Prompt Info

decanter.core.enable_default_logger()

Set the default logger handler for the package.

Sets the root handlers to an empty list, preventing duplicate handlers added by other packages from causing duplicate logging messages.
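
A minimal sketch; the logger is typically enabled once before creating a Context:

from decanter import core

core.enable_default_logger()
context = core.Context.create(
    username='usr', password='pwd', host='decantercoreserver')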

Jobs

Introduces all the Jobs handling different kinds of actions, and the relation between Job and Task.

Task

This module defines tasks for different Jobs.

Handles the completion of tasks such as data upload, training, and prediction, and returns the result to the Job.

class decanter.core.jobs.task.Task(name=None)

Bases: object

Handle Action’s result.

Handles the execution of actions (e.g. uploading data), the updating of results, and the tracking of status to determine the end of execution.

status

Status of task.

Type

str

result

The result of executing the task.

Type

value of the task result

name

Name of task for tracking process.

Type

str

is_done()
Returns

bool. True for task in DONE_STATUS, False otherwise.

not_done()
Returns

bool. True for task not in DONE_STATUS, False otherwise.

is_success()
Returns

bool. True for success, False otherwise.

is_fail()
Returns

bool. True for failed, False otherwise.

abstract run()

Execute task.

Raises

NotImplementedError – If the child class does not implement this function.

abstract async update()

Update attribute by response or result.

Raises

NotImplementedError – If the child class does not implement this function.

class decanter.core.jobs.task.CoreTask(name=None)

Bases: decanter.core.jobs.task.Task

Handles a Decanter Core action’s result.

Handles tasks related to the Decanter Core server, such as data upload, training, and prediction.

core_service

Status of task.

Type

str

id

Task identifier assigned when the task is created.

Type

str

response

Response from the API request.

Type

dict

progress

Progress of the task process.

Type

float

name

Name of task for tracking process.

Type

str

BAR_CNT = 0

The position of progress bar to avoid overlapping.

Type

int

async update()

Update the response from Decanter server.

Get the task by sending an API request and update the result from the response.

abstract run()

Execute the Decanter Core task.

Raises

NotImplementedError – If the child class does not implement this function.

stop()

Stop an unfinished task on the Decanter Core server.

Send the stop task API request to stop the running or pending task.

class decanter.core.jobs.task.UploadTask(file, name=None, eda=True)

Bases: decanter.core.jobs.task.CoreTask

Upload data to Decanter Core.

file

The csv file to be uploaded.

Type

csv-file-object

run()

Execute upload data by sending the upload api.

class decanter.core.jobs.task.TrainTask(train_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Train model on Decanter Core.

train_input

Settings for training.

Type

TrainInput

run()

Execute model training by sending the train API request.

class decanter.core.jobs.task.TrainTSTask(train_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Train time series forecast models on Decanter Core.

train_input

Settings for training time series forecast models.

Type

TrainTSInput

run()

Execute time series forecast model training by sending the auto time series forecast train API request.

class decanter.core.jobs.task.TrainClusterTask(train_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Train clustering models on Decanter Core.

train_input

Settings for training clustering models.

Type

TrainClusterInput

run()

Execute clustering training by sending the cluster train API request.

class decanter.core.jobs.task.PredictTask(predict_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Predict model on Decanter Core.

predict_input

Settings for prediction.

Type

PredictInput

run()

Execute model prediction by sending the predict API request.

class decanter.core.jobs.task.PredictTSTask(predict_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Predict time series model on Decanter Core.

predict_input

Settings for time series prediction.

Type

PredictTSInput

run()

Execute time series model prediction by sending the time series predict API request.

class decanter.core.jobs.task.SetupTask(setup_input, name='Setup')

Bases: decanter.core.jobs.task.CoreTask

The V0 version is for normal use, changing column types. The V2 version is for V2 EDA use, setting the V2 EDA result for the data in preparation for custom EDA.

setup_input

Settings for set up data.

Type

SetupInput

run()

Execute data setup by sending the setup API request.

Jobs

This module defines jobs to handle all kinds of actions.

Handles the timing of a task’s execution, and stores the result of the task in its attributes.

class decanter.core.jobs.job.Job(task, jobs=None, name=None)

Handles the timing of a task’s execution.

Every Job has to wait for all the other Jobs in its jobs list to succeed before it starts to run its task.

id

ObjectId in 24 hex digits.

Type

str

status

Job status.

Type

str

result

Job result.

Type

Depends on type of Job.

task

Task to be run by Job.

Type

Task

jobs

List of Jobs that must complete before the task runs.

Type

list(Job)

name

Name to track Job progress.

Type

str

core_service

Handles the calling of the API.

Type

CoreAPI

is_done()
Returns

True for task in DONE_STATUS, False otherwise.

Return type

bool

not_done()
Returns

True for task not in DONE_STATUS, False otherwise.

Return type

bool

is_success()
Returns

True for success, False otherwise.

Return type

bool

is_fail()
Returns

True for failed, False otherwise.

Return type

bool

async wait()

Manage the execution of the task.

A Python coroutine wrapped as a task and put into the event loop once a Job is created. When the event loop starts to run, the wait function waits for the prerequisite jobs in the self.jobs list to be done, then continues to run the task if all prerequisite jobs succeeded.

The coroutine finishes when the Job has gotten the result from its task.

async update()

Update attributes from task’s result.

A Python coroutine awaited by wait(). It waits for the task to update its result via await task.update(), then uses the updated result from the task to update the Job’s attributes.

abstract update_result(task_result)

Update the task result to Job’s attributes.

Raises

NotImplementedError – If child class do not implement this function.

stop()

Stop Job.

The Job handles stopping itself according to its status. If pending, it marks its status as fail; if already done, it does nothing and keeps the same status. If running, its status turns to fail, and there are three cases to handle depending on the status of its task: the stop API is called only if the task is running; the task is simply marked as fail if it has not started running yet; otherwise the task keeps its done status.

get(attr)

Get Job’s attribute

If this function is called while the Job is still undone, a reminder message will appear.

Parameters

attr (str) – String that contains the attribute’s name.

Returns

Value of the named attribute of the given object.

class decanter.core.jobs.data_upload.DataUpload(file=None, name=None, eda=True)

Bases: decanter.core.jobs.job.Job

DataUpload manages getting the result from a data upload.

Handles the execution of the upload task in order to upload data to the Decanter Core server. Stores the upload results in the DataUpload’s attributes.

jobs

None; the list of jobs that DataUpload needs to wait for.

Type

list

task

Upload task run by DataUpload.

Type

UploadTask

accessor

Accessor for files in hdfs.

Type

dict

schema

The original data schema.

Type

dict

originSchema

The original data schema.

Type

dict

annotationsMeta

Extra information for the data.

Type

dict

options

Extra information for data.

Type

dict

created_at

The date the data was created.

Type

str

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress, will give default name if None.

Type

str

classmethod create(data_id, name=None)

Create data by data_id.

Parameters
  • data_id (str) – ObjectId in 24 hex digits

  • name (str, optional) – Name to track Job progress.

Returns

DataUpload object

download_csv(path)

Download the uploaded data in csv format.

Parameters

path (str) – The path to download csv file.

show()

Show data content.

Returns

Content of uploaded data.

Return type

str

show_df()

Show data in pandas dataframe.

Returns

Content of uploaded data.

Return type

pandas.DataFrame
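
A minimal sketch of inspecting and downloading an upload result (assumes train_data is a completed DataUpload; the output path is illustrative):

# Inspect the uploaded data as a pandas.DataFrame
df = train_data.show_df()
print(df.head())

# Save a csv copy locally
train_data.download_csv(path='train_copy.csv')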

update_result(task_result)

Update from ‘result’ in Task response.

Experiment and ExperimentTS handle the training of models on the Decanter Core server and store the experiment results in their attributes.

class decanter.core.jobs.experiment.Experiment(train_input, select_model_by=Evaluator.auto, name=None)

Bases: decanter.core.jobs.job.Job

Experiment manages getting the results from model training.

Handles the execution of the training task in order to train models on the Decanter Core server. Stores the training results in the Experiment’s attributes.

jobs

[DataUpload]; list of jobs that the Experiment needs to wait for until they complete.

Type

list(Job)

task

Train task run by the Experiment Job.

Type

TrainTask

train_input

Settings for training models.

Type

TrainInput

best_model

Model with the best score in select_model_by argument.

Type

Model

select_model_by

The score to select best model.

Type

str

features

The features used for training.

Type

list(str)

train_data_id

The ID of the train data.

Type

str

target

The target of the experiment.

Type

str

test_base_id

The ID for the test base data.

Type

str

models

The models’ id of the experiment.

Type

list(str)

hyperparameters

The hyperparameters of the experiment.

Type

dict

attributes

The experiment attributes.

Type

dict

recommendations

Recommended model for each evaluator.

Type

dict

created_at

The date the data was created.

Type

str

options

Extra information for experiment.

Type

dict

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress.

Type

str

classmethod create(exp_id, name=None)

Create Experiment by exp_id.

Parameters
  • exp_id (str) – ObjectId in 24 hex digits.

  • name (str, optional) – Name to track Job progress.

Returns

Experiment object with the specific id.

Return type

Experiment

get_best_model()

Get the best model in the experiment according to select_model_by and store it in the best_model attribute.
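
A minimal sketch of re-attaching to an existing experiment and selecting its best model (assumes a Context has been created; the experiment id is an illustrative placeholder):

from decanter.core.jobs.experiment import Experiment

exp = Experiment.create(exp_id='5f0c5564a64a5a0d1bd7f3e1')  # 24-hex ObjectId
exp.get_best_model()
best = exp.best_model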

update_result(task_result)

Update Job’s attribute from Task’s result.

class decanter.core.jobs.experiment.ExperimentCluster(train_input, select_model_by=Evaluator.auto, name=None)

Bases: decanter.core.jobs.experiment.Experiment, decanter.core.jobs.job.Job

ExperimentCluster manages getting the result from clustering model training.

Handles the execution of the clustering training task in order to train clustering models on the Decanter Core server. Stores the training results in the ExperimentCluster’s attributes.

jobs

[DataUpload]; list of jobs that the ExperimentCluster needs to wait for until they complete.

Type

list(Job)

task

Clustering training task run by the ExperimentCluster Job.

Type

TrainClusterTask

train_input

Settings for clustering training models.

Type

TrainClusterInput

best_model

MultiModel with the best score in select_model_by argument

Type

MultiModel

select_model_by

The score to select best model

Type

str

features

The features used for training

Type

list(str)

train_data_id

The ID of the train data

Type

str

target

The target of the experiment

Type

str

test_base_id

The ID for the test base data

Type

str

models

The models’ id of the experiment

Type

list(str)

hyperparameters

The hyperparameters of the experiment.

Type

dict

attributes

The experiment attributes.

Type

dict

recommendations

Recommended model for each evaluator.

Type

dict

created_at

The date the data was created.

Type

str

options

Extra information for experiment.

Type

dict

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress.

Type

str

classmethod create(exp_id, name=None)

Create a Clustering Experiment by exp_id. Inherits from create().

Parameters
  • exp_id (str) – ObjectId in 24 hex digits

  • name (str, optional) – Name to track Job progress.

Returns

Experiment object with the specific id.

Return type

ExperimentCluster

class decanter.core.jobs.experiment.ExperimentTS(train_input, select_model_by=Evaluator.auto, name=None)

Bases: decanter.core.jobs.experiment.Experiment, decanter.core.jobs.job.Job

ExperimentTS manages getting the result from time series model training.

Handles the execution of the time series training task in order to train time series models on the Decanter Core server. Stores the training results in the ExperimentTS’s attributes.

jobs

[DataUpload]; list of jobs that the ExperimentTS needs to wait for until they complete.

Type

list(Job)

task

Time series training task run by ExperimentTS Job.

Type

TrainTSTask

train_input

Settings for time series training models.

Type

TrainTSInput

best_model

MultiModel with the best score in select_model_by argument

Type

MultiModel

select_model_by

The score to select best model

Type

str

features

The features used for training

Type

list(str)

train_data_id

The ID of the train data

Type

str

target

The target of the experiment

Type

str

test_base_id

The ID for the test base data

Type

str

models

The models’ id of the experiment

Type

list(str)

hyperparameters

The hyperparameters of the experiment.

Type

dict

attributes

The experiment attributes.

Type

dict

recommendations

Recommended model for each evaluator.

Type

dict

created_at

The date the data was created.

Type

str

options

Extra information for experiment.

Type

dict

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress.

Type

str

classmethod create(exp_id, name=None)

Create a Time Series Experiment by exp_id. Inherits from create().

Parameters
  • exp_id (str) – ObjectId in 24 hex digits

  • name (str, optional) – Name to track Job progress.

Returns

Experiment object with the specific id.

Return type

ExperimentTS

PredictResult and PredictTSResult handle predictions with models trained on the Decanter Core server, and store the predict results in their attributes.

class decanter.core.jobs.predict_result.PredictResult(predict_input, name=None)

Bases: decanter.core.jobs.job.Job

PredictResult manages getting the results from predictions.

Handles the execution of the predict task in order to run predictions on the Decanter Core server. Stores the predict results in the PredictResult’s attributes.

jobs

List of jobs that PredictResult needs to wait for, [TestData, Experiment].

Type

list(Job)

task

Predict task run by the PredictResult Job.

Type

PredictTask

accessor

Accessor for files in hdfs.

Type

dict

schema

The original data schema.

Type

dict

originSchema

The original data schema.

Type

dict

annotationsMeta

Extra information for the data.

Type

dict

options

Extra information for data.

Type

dict

created_at

The date the data was created.

Type

str

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

download_csv(path)

Download the predict result in csv format.

Parameters

path (str) – The path to download csv file.

show()

Show content of predict result.

Returns

Content of PredictResult.

Return type

str

show_df()

Show predict result in pandas dataframe.

Returns

Content of predict result.

Return type

pandas.DataFrame

update_result(task_result)

Update Job’s attributes from Task’s result.

class decanter.core.jobs.predict_result.PredictTSResult(predict_input, name=None)

Bases: decanter.core.jobs.predict_result.PredictResult, decanter.core.jobs.job.Job

Predict result of a time series model.

Handles time series model prediction on the Decanter Core server. Stores the predict result in its attributes.

jobs

List of jobs that PredictTSResult needs to wait for, [TestData, Experiment].

Type

list(Job)

task

Predict task run by the PredictTSResult Job.

Type

PredictTSTask

accessor

Accessor for files in hdfs.

Type

dict

schema

The original data schema.

Type

dict

originSchema

The original data schema.

Type

dict

annotationsMeta

Extra information for the data.

Type

dict

options

Extra information for data.

Type

dict

created_at

The date the data was created.

Type

str

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

Core Api Interface

The API interfaces of Decanter, mainly handling the request and response bodies.

Train Input

Settings for the Model Training and Time Series MultiModel Training.

class decanter.core.core_api.train_input.TrainClusterInput(data, callback=None, features=None, feature_types=None, k=None, seed=None, version=None)

Bases: object

Train Input for Clustering Experiment Job.

Settings for model training.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • features (list of str) – Selected features for training.

  • seed (int) – Seed to be used for operations that have pseudo-random behavior. Fixing the seed across runs will ensure reproducible results.

Example

train_input = TrainClusterInput(data=train_data)
get_train_params()

Using train_body to create the JSON request body for training.

Returns

dict

class decanter.core.core_api.train_input.TrainInput(data, target, algos, callback=None, test_base_id=None, test_data_id=None, evaluator=None, features=None, feature_types=None, max_run_time=None, max_model=None, tolerance=None, nfold=None, ts_split_split_by=None, ts_split_cv=None, ts_split_train=None, ts_split_test=None, seed=None, balance_class=None, max_after_balance=None, sampling_factors=None, validation_percentage=None, holdout_percentage=None, apu=None, preprocessing=None, version=None)

Bases: object

Train Input for Experiment Job.

Settings for model training.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • target (str) – the name of the target column

  • algos (Algo) – Enabled algorithms (leaving the default None enables all algorithms).

  • evaluator (Evaluator) – default evaluator for early stopping

  • features (list of str) – selected feature for training

  • max_model (int) – Model family and hyperparameter search will stop after the specified number of models are trained. Stacked Ensemble models are not counted.

  • tolerance (float) – Tolerance for early stopping in both model training and in the model family and hyperparameter search. A higher value results in less accurate models, but faster training times and a larger model pool; a lower tolerance means better accuracy, but longer training time and a smaller model pool. It is recommended that users start with a higher tolerance and move to a lower tolerance when the model training process is finalized.

  • nfold (int) – The number of cross validation folds to be used during model training

  • seed (int) – Seed to be used for operations that have pseudo-random behavior. Fixing the seed across runs will ensure reproducible results.

  • balance_class (bool) – If true, balances the class distribution. In most cases, enabling the balance_class option will increase the data frame size.

  • validation_percentage (float) – Percentage of the train data to be used as the validation set.

  • holdout_percentage (float) – Percentage of the training data to be used as a holdout set.

Example

train_input = TrainInput(data=train_data, target='Survived',
algos=Algo.XGBoost, max_model=2, tolerance=0.9)
get_train_params()

Using train_body to create the JSON request body for training.

Returns

dict

class decanter.core.core_api.train_input.TrainTSInput(data, target, datetime_column, forecast_horizon, gap, algorithms=None, feature_types=None, callback=None, version='v2', max_iteration=None, generation_size=None, mutation_rate=None, crossover_rate=None, tolerance=None, validation_percentage=None, holdout_percentage=None, max_model=None, seed=None, evaluator=None, max_run_time=None, nfold=None, time_unit=None, numerical_groupby_method=None, categorical_groupby_method=None, endogenous_features=None, exogenous_features=None, time_groups=None, max_window_for_feature_derivation=None)

Bases: object

Train Input for ExperimentTS Job.

Settings for auto time series forecast training.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • target (str) – the name of the target column

  • datetime_column (str) – the name of the datetime column used for time series ordering

  • forecast_horizon (int) – The number of data points to predict for auto time series. For the current time series forecast model, the larger this value is, the longer training will take.

  • gap (int) – The number of time units between the train data and prediction data

  • max_window_for_feature_derivation (int) – This value limits the number of past time points whose features can be used to generate endogenous features. It ensures that when generating an endogenous feature for forecast time t, only features from [t - gap - max_window_for_feature_derivation, t - gap) are used. Note that the larger this value is, the less data remains after feature engineering.

  • algorithms (Algo) – Enabled algorithms (leaving the default None enables all algorithms).

  • evaluator (Evaluator) – default evaluator for early stopping

  • features (list of str) – selected feature for training

  • max_model (int) – Model family and hyperparameter search will stop after the specified number of models are trained. Stacked Ensemble models are not counted.

  • tolerance (float) – Tolerance for early stopping in both model training and in the model family and hyperparameter search. A higher value results in less accurate models, but faster training times and a larger model pool; a lower tolerance means better accuracy, but longer training time and a smaller model pool. It is recommended that users start with a higher tolerance and move to a lower tolerance when the model training process is finalized.

  • nfold (int) – The number of cross validation folds to be used during model training

  • seed (int) – Seed to be used for operations that have pseudo-random behavior. Fixing the seed across runs will ensure reproducible results.

  • balance_class (bool) – If true, balances the class distribution. In most cases, enabling the balance_class option will increase the data frame size.

  • validation_percentage (float) – Percentage of the train data to be used as the validation set.

  • holdout_percentage (float) – Percentage of the training data to be used as a holdout set.
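
Example

A minimal sketch (the column names, horizon, and gap values are illustrative assumptions):

train_ts_input = TrainTSInput(data=train_data, target='sales',
                              datetime_column='date',
                              forecast_horizon=7, gap=1)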

get_train_params()

Using train_auto_ts_body to create the JSON request body for time series forecast training.

Returns

dict

static get_ts_algorithms()

Setup Input

class decanter.core.core_api.setup_input.ColumnSpec(id, data_type)

Bases: tuple

property data_type

Alias for field number 1

property id

Alias for field number 0

class decanter.core.core_api.setup_input.SetupInput(data, data_columns, callback=None, eda=None, preprocessing=None, version=None)

Bases: object

Setup Input for the DataSetup Job.

Settings for setting up data.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • data_columns (list of ColumnSpec) – Column specifications used in the request body sent to the setup API.

  • eda (bool) – Whether to perform EDA on the data setup.

Example

setup_input = SetupInput(
    data=upload_data,
    data_columns=[{'id': 'Pclass', 'data_type': 'categorical'}]
)
get_setup_params()

Using setup_body to create the JSON request body for setting data.

Returns

dict

Predict Input

Settings for the PredictResult and PredictTSResult

class decanter.core.core_api.predict_input.PredictInput(data, experiment, select_model='best', select_opt=None, callback=None, keep_columns=None, threshold=None, version=None)

Bases: object

Predict Input for PredictResult Job.

Setting test data, best model, and the request body for prediction.

Parameters
  • data (DataUpload) – Test data.

  • experiment (Experiment) – Experiment from training.

  • select_model (str, optional) –

    Methods of screening models

    • best (default): predict with the best model scored on cv average

    • model_id: predict with the model designated by model_id

    • recommendation: predict with the recommended model

  • select_opt (str, optional) –

Options required by select_model; the value depends on the select_model case:

    • best: None

    • model_id: given the model ID (model ID will be in the format of ObjectID)

    • recommendation: metric, ex: auc …

  • keep_columns (list, optional) – The names of the columns that will be appended to the prediction data.

  • threshold (double, optional) – Prediction threshold for binary classification models. Max = 1, Min = 0

Examples

predict_input = PredictInput(
    data=test_data,
    experiment=exp,
    threshold=0.9,
    keep_columns=['col_1', 'col_3']
)
getPredictParams()

Using pred_body to create the JSON request body for prediction.

Returns

dict

class decanter.core.core_api.predict_input.PredictTSInput(data, experiment, callback=None, threshold_max_by=None, version=None)

Bases: decanter.core.core_api.predict_input.PredictInput

Time series predict input for PredictTSResult Job.

Setting test data, best model, and the request body for prediction.

Parameters
  • data (DataUpload) – Test data.

  • experiment (Experiment) – Experiment from training.

  • threshold_max_by (float, optional) – Prediction threshold for binary classification only; chooses a threshold that optimizes the given metric. Available options are: precision, f1, accuracy, evaluator, recall, specificity, f2, f0point5, mean_per_class_error.
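
Example

A minimal sketch (assumes test_data is a DataUpload and exp_ts is an ExperimentTS returned by client.train_ts):

predict_ts_input = PredictTSInput(data=test_data, experiment=exp_ts)
pred_ts_res = client.predict_ts(predict_ts_input)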

Model

Model and Multimodel.

This module handles actions related to models. It stores attributes of model metadata from the decanter.core server, and also supports downloading models from the server or uploading them from local zip files.

class decanter.core.core_api.model.Model

Bases: object

Model from training.

get_model

Function to get models metadata.

download_model

Function to get model mojo file.

task_status

Status of training task.

Type

str

id

ObjectId in 24 hex digits.

Type

str

key

The unique key for the model.

Type

str

name

The name of the model (may not be unique across projects).

Type

str

exp_id

The experiment ID of the model.

Type

str

importances

The feature importance.

Type

dict

attributes

Model related scores and feature importance.

Type

dict

hyperparameters

The model’s hyperparameters.

Type

dict

created_at

The time the model was created at.

Type

str

updated_at

The time the model was last updated.

Type

str

completed_at

The time the model was completed.

Type

str

classmethod create(exp_id, model_id)

Create a Model or MultiModel depending on which instance type called it.

Parameters
  • exp_id (str) – The experiment ID of the model.

  • model_id (str) – ObjectId in 24 hex digits.

Returns

Model or MultiModel object.

Raises

AttributeError – Raised when no result is returned from the decanter server while getting the model’s metadata.

download(model_path)

Download model file to local.

Download the trained mojo model from the Model instance to a local zip file.

Parameters

model_path (str) – Path to store zip mojo file.

classmethod download_by_id(model_id, model_path)

Download model file to local.

Get the mojo model zip file from the decanter.core server and download it locally.

Parameters
  • model_id (str) – ObjectId in 24 hex digits.

  • model_path (str) – Path to store zip mojo file.
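
A minimal sketch (the model id and output path are illustrative placeholders):

from decanter.core.core_api.model import Model

# Download the trained mojo model as a zip file without creating a Model instance
Model.download_by_id(model_id='5f0c5564a64a5a0d1bd7f3e2',
                     model_path='model.zip')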

get(attr)

Get attribute of model.

Parameters

attr (str) – model’s attribute.

Returns

Value of the attribute of the given object.

is_done()

Check if training task is done.

Returns

True if Model’s task is done, else False.

Return type

bool

update(exp_id, model_id)

Update model attributes.

Get and set attributes from the response returned by the decanter server.

class decanter.core.core_api.model.MultiModel

Bases: decanter.core.core_api.model.Model

MultiModel from time series training.

get_model

Get multi models metadata.

download_model

Get model mojo file.

task_status

Status of training task.

Type

str

id

ObjectId in 24 hex digits.

Type

str

name

The name of the model (may not be unique across projects).

Type

str

exp_id

The experiment ID of the model.

Type

str

attributes

Model related scores and feature importance.

Type

dict

predictionPipeline

TODO

Type

dict

hyperparameters

TODO

Type

dict

created_at

The time the model was created at.

Type

str

updated_at

The time the model was last updated.

Type

str

completed_at

The time the model was completed.

Type

str

classmethod create(exp_id, model_id)

Create a MultiModel. Inherits from Model.create().

download(model_path)

MultiModel has no download method.

classmethod download_by_id(model_id, model_path)

MultiModel has no download method.

Enum

Return the machine learning algorithms and evaluators supported by the current Decanter in the form of enumeration objects.

Algorithms

Enumeration for users to access the machine learning algorithms of the Decanter AI Core SDK.

class decanter.core.enums.algorithms.Algo(value)

The Algo enumeration lists the machine learning algorithms currently supported by the Decanter AI Core SDK.

TrainInput (used by client.train) supported algorithms

  • DRF: Distributed Random Forest.

  • GLM: Generalized Linear Model.

  • GBM: Gradient Boosting Machine.

  • DeepLearning: Deep Learning.

  • StackedEnsemble: Stacked Ensemble.

  • XGBoost: eXtreme Gradient Boosting.

  • tpot: Tree-based Pipeline Optimization Tool. (available only after 4.10 deployed with Exodus).

TrainTSInput (used by client.train_ts) supported algorithms

  • GLM: Generalized Linear Model.

  • DRF: Distributed Random Forest.

  • GBM: Gradient Boosting Machine.

  • XGBoost: eXtreme Gradient Boosting.

  • arima: auto arima (available only after 4.9 deployed with Exodus, only available for regression).

  • prophet: auto prophet (available only after 4.9 deployed with Exodus, only available for regression).

  • theta: auto theta (available only after 4.9 deployed with Exodus, only available for regression).

  • ets: auto ets (available only after 4.10 deployed with Exodus, only available for regression).

Evaluators

Enumeration for users to access the evaluator metrics of the Decanter AI Core SDK.

class decanter.core.enums.evaluators.Evaluator(value)

The Evaluator enumeration lists the metrics currently supported by the Decanter AI Core SDK.

  • Regression
    • auto (deviance)

    • deviance

    • mse

    • mae

    • rmsle

    • r2

    • mape (Supported from Decanter AI 4.9~)

    • wmape (Supported from Decanter AI 4.9~)

  • Binary Classification
    • auto (logloss)

    • logloss

    • lift_top_group

    • auc

    • misclassification

  • Multinomial Classification
    • auto (logloss)

    • logloss

    • misclassification

  • Clustering
    • auto (tot_withinss)

    • tot_withinss
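
A minimal sketch of passing an Evaluator when training (assumes client and train_input were built as in the client.train example above):

from decanter.core.enums.evaluators import Evaluator

exp = client.train(train_input, select_model_by=Evaluator.mse, name='exp_mse')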