Developer Interface

This part of the documentation covers all the interfaces of Decanter Core SDK.

Main Interface

Decanter AI Core SDK’s main functionality can be accessed through the interfaces below.

Connection and Functional Settings

Initialization for running the SDK.

class decanter.core.context.Context

Initialize the connection to the Decanter Core server and the functionality for running the SDK.

Example

from decanter import core
context = core.Context.create(
    username='usr', password='pwd', host='decantercoreserver')
context.run()
CORO_TASKS = []
HOST = None
JOBS = []
LOOP = None
PASSWORD = None
USERNAME = None
api = None
static close()

Close the event loop and reset JOBS and CORO_TASKS.

Close the event loop if it’s not running (will not close in Jupyter Notebook).

classmethod create(username, password, host)

Create a Context instance and initialize the necessary variables and objects.

Set the username, password, and host for future connections when calling APIs, and create an event loop if one does not already exist. Check that the connection is healthy after the arguments are set.

Parameters
  • username (str) – Username for logging in to the Decanter Core server.

  • password (str) – Password for logging in to the Decanter Core server.

  • host (str) – Decanter Core server URL.

Returns

Context

static get_all_jobs()

Get a list of Jobs that have been executed or are waiting to be executed.

Returns

list(Job)

static get_jobs_by_name(names)

Get Job instances by their names.

Parameters

names (list(str)) – Names of the Jobs to select.

Returns

Jobs whose names are in the names list.

Return type

list(Job)

static get_jobs_status(sort_by_status=False, status=None)

Get a DataFrame of jobs and their corresponding status. Returns all jobs and their status if no arguments are passed.

Parameters
  • sort_by_status (bool, optional) – Sort the DataFrame by status, grouping jobs with the same status. Defaults to False.

  • status (list(str), optional) – Only select jobs whose status is in the status list.

Returns

DataFrame with Job name and its status.

Return type

pandas.DataFrame

Raises

Exception – If any status in status list is invalid.
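
A minimal usage sketch (assumes a Context has already been created; the status strings follow the pending/running states mentioned in this section):

from decanter import core

# All jobs, grouped by status
status_df = core.Context.get_jobs_status(sort_by_status=True)

# Only jobs that are still pending or running
running_df = core.Context.get_jobs_status(status=['pending', 'running'])
print(running_df)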

static healthy()

Check the connection to the Decanter Core server.

Send a fake request to determine whether there are connection or authorization errors.

static run()

Start executing the tasks in CORO_TASKS.

Gather all tasks and execute them. It will block until all tasks have finished.

static stop_all_jobs()

Stop all Jobs whose status is still pending or running.

static stop_jobs(jobs_list)

Stop Jobs in jobs_list.

Parameters

jobs_list (list(Job)) – List of Job instances to be stopped.
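
A minimal sketch combining get_jobs_by_name() and stop_jobs(); the job names are illustrative assumptions:

from decanter import core

# Select jobs by the names given when they were created, then stop them
jobs = core.Context.get_jobs_by_name(names=['upload_train', 'titanic_exp'])
core.Context.stop_jobs(jobs)

# Or stop everything that is still pending or running
core.Context.stop_all_jobs()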

decanter.core.context.get_or_create_eventloop()

Client for Decanter Core API

Functions for users to interact with the Decanter Core API.

class decanter.core.client.CoreClient(username, password, host)

Handle client side actions.

Supports actions such as setting up data, uploading data, training, prediction, and time series training and prediction.

Example

from decanter import core
client = core.CoreClient()
client.upload(file=file)  # file: a csv file object or a pandas.DataFrame
static predict(predict_input, name=None)

Predict model with test data.

Create a PredictResult Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • predict_input (PredictInput) – Stores the settings for prediction.

  • name (str, optional) – Name for the predict action.

Returns

PredictResult object

Raises

AttributeError – If the function is called without a Context created.
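
A minimal sketch of a predict call (assumes a Context named context was created earlier, test_data is a DataUpload, and exp is an Experiment from a previous train call):

from decanter import core
from decanter.core.core_api.predict_input import PredictInput

client = core.CoreClient()
predict_input = PredictInput(data=test_data, experiment=exp)
pred_res = client.predict(predict_input, name='pred_action')
context.run()  # block until the prediction job finishes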

static predict_ts(predict_input, name=None)

Predict time series model with test data.

Create a Time Series PredictResult Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • predict_input (PredictTSInput) – Stores the settings for prediction.

  • name (str, optional) – Name for the predict time series action.

Returns

PredictTSResult object.

Raises

AttributeError – If the function is called without a Context created.

static setup(setup_input, name=None)

Setup data reference.

Create a DataSetup Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • setup_input (SetupInput) – Stores the settings for setting up the data.

  • name (str, optional) – Name for the setup action.

Returns

DataSetup object

Raises

AttributeError – If the function is called without Context created.

static train(train_input, select_model_by=Evaluator.auto, name=None)

Train model with data.

Create an Experiment Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • train_input (TrainInput) – Stores the settings for training.

  • select_model_by (Evaluator) – How to select the best model when predicting with the trained experiment.

  • name (str, optional) – Name for the train action.

Returns

Experiment object

Raises

AttributeError – If the function is called without Context created.
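
A minimal sketch of a train call (assumes a Context named context was created earlier and train_data is a DataUpload returned by upload(); the target, algorithm, and parameter values are illustrative):

from decanter import core
from decanter.core.core_api.train_input import TrainInput
from decanter.core.enums.algorithms import Algo
from decanter.core.enums.evaluators import Evaluator

client = core.CoreClient()
train_input = TrainInput(data=train_data, target='Survived',
                         algos=Algo.XGBoost, max_model=2, tolerance=0.9)
exp = client.train(train_input, select_model_by=Evaluator.auc, name='titanic_exp')
context.run()  # block until training finishes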

static train_cluster(train_input, name=None)

Train cluster model with data.

Create a Cluster Experiment Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • train_input (TrainClusterInput) – Settings for clustering training.

  • name (str, optional) – Name for the train cluster action.

Returns

ExperimentCluster object

Raises

AttributeError – If the function is called without Context created.

static train_ts(train_input, select_model_by=Evaluator.auto, name=None)

Train time series model with data.

Create a Time Series Experiment Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • train_input (TrainTSInput) – Settings for time series training.

  • select_model_by (Evaluator) – How to select the best model when predicting with the trained experiment.

  • name (str, optional) – Name for the train time series action.

Returns

ExperimentTS object

Raises

AttributeError – If the function is called without Context created.

static upload(file, name=None, eda=True)

Upload csv file or pandas dataframe.

Create a DataUpload Job and schedule its execution in the CORO_TASKS list. Record the Job in the JOBS list.

Parameters
  • file (csv-file, pandas.DataFrame) – File to be uploaded to the Decanter Core server.

  • name (str, optional) – Name for the upload action.

  • eda (bool, optional) – Whether to perform EDA on the uploaded data.

Returns

DataUpload object

Raises

AttributeError – If the function is called without Context created.
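
A minimal sketch of an upload call (assumes a Context named context was created earlier; the file path is illustrative, and a pandas.DataFrame can be passed instead of a file object):

from decanter import core

client = core.CoreClient()
train_file = open('train.csv', 'r')
train_data = client.upload(file=train_file, name='upload_train', eda=True)
context.run()  # block until the upload job finishes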

Plot

decanter.core.plot.show_model_attr(metric, score_types, exp)

Show all model attributes in an Experiment.

Parameters
  • metric (list(str)) – Metrics to show in the chart, e.g. ['mse', 'mae'].

  • score_types (list(str)) – Score types to show in the chart, e.g. ['cv_averages', 'validation'].

  • exp (Experiment) – The experiment whose model attributes will be shown.

Returns

pandas.DataFrame
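
A minimal sketch (assumes exp is a completed Experiment; the metric and score type values follow the examples above):

from decanter.core.plot import show_model_attr

# Compare the experiment's models on cross-validation and validation scores
df = show_model_attr(metric=['mse', 'mae'],
                     score_types=['cv_averages', 'validation'],
                     exp=exp)
print(df)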

Prompt Info

decanter.core.enable_default_logger()

Set the default logger handler for the package.

Sets the root handlers to an empty list, preventing duplicate handlers added by other packages from causing duplicate logging messages.
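
A minimal sketch; the logger is typically enabled once before creating a Context:

from decanter import core

core.enable_default_logger()
context = core.Context.create(
    username='usr', password='pwd', host='decantercoreserver')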

Jobs

Introduces all the Jobs handling different kinds of actions, and the relation between Job and Task.

Task

This module defines tasks for different Jobs.

Handles the completion of tasks such as data upload, training, and prediction, and returns the result to the Job.

class decanter.core.jobs.task.Task(name=None)

Bases: object

Handle Action’s result.

Handles the execution of actions (e.g. uploading data), the updating of results, and the tracking of status to determine the end of execution.

status

Status of task.

Type

str

result

The result of executing the task.

Type

value of the task result

name

Name of task for tracking process.

Type

str

is_done()
Returns

bool. True for task in DONE_STATUS, False otherwise.

not_done()
Returns

bool. True for task not in DONE_STATUS, False otherwise.

is_success()
Returns

bool. True for success, False otherwise.

is_fail()
Returns

bool. True for failed, False otherwise.

abstract run()

Execute task.

Raises

NotImplementedError – If the child class does not implement this function.

abstract async update()

Update attribute by response or result.

Raises

NotImplementedError – If the child class does not implement this function.

class decanter.core.jobs.task.CoreTask(name=None)

Bases: decanter.core.jobs.task.Task

Handles a Decanter Core action’s result.

Handles tasks related to the Decanter Core server, such as data upload, training, and prediction.

core_service

Status of task.

Type

str

id

Task identifier assigned when the task is created.

Type

str

response

Response from the API request.

Type

dict

progress

Progress of the task process.

Type

float

name

Name of task for tracking process.

Type

str

BAR_CNT = 0

The position of progress bar to avoid overlapping.

Type

int

async update()

Update the response from Decanter server.

Get the task by sending an API request and update the result from the response.

abstract run()

Execute the Decanter Core task.

Raises

NotImplementedError – If the child class does not implement this function.

stop()

Stop an unfinished task on the Decanter Core server.

Send the stop task API request to stop the running or pending task.

class decanter.core.jobs.task.UploadTask(file, name=None, eda=True)

Bases: decanter.core.jobs.task.CoreTask

Upload data to Decanter Core.

file

The csv file to be uploaded.

Type

csv-file-object

run()

Execute upload data by sending the upload api.

class decanter.core.jobs.task.TrainTask(train_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Train model on Decanter Core.

train_input

Settings for training.

Type

TrainInput

run()

Execute model training by sending the train API request.

class decanter.core.jobs.task.TrainTSTask(train_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Train time series forecast models on Decanter Core.

train_input

Settings for training time series forecast models.

Type

TrainTSInput

run()

Execute time series forecast model training by sending the auto time series forecast train API request.

class decanter.core.jobs.task.TrainClusterTask(train_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Train clustering models on Decanter Core.

train_input

Settings for training clustering models.

Type

TrainClusterInput

run()

Execute clustering training by sending the cluster train API request.

class decanter.core.jobs.task.PredictTask(predict_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Predict model on Decanter Core.

predict_input

Settings for prediction.

Type

PredictInput

run()

Execute model prediction by sending the predict API request.

class decanter.core.jobs.task.PredictTSTask(predict_input, name=None)

Bases: decanter.core.jobs.task.CoreTask

Predict time series model on Decanter Core.

predict_input

Settings for time series prediction.

Type

PredictTSInput

run()

Execute time series model prediction by sending the time series predict API request.

class decanter.core.jobs.task.SetupTask(setup_input, name='Setup')

Bases: decanter.core.jobs.task.CoreTask

The V0 version is for normal use, changing column types. The V2 version is for V2 EDA use, setting the V2 EDA result for the data in preparation for custom EDA.

setup_input

Settings for set up data.

Type

SetupInput

run()

Execute data setup by sending the setup API request.

Jobs

This module defines jobs to handle all kinds of actions.

Handles the timing of a task’s execution, and stores the result of the task in its attributes.

class decanter.core.jobs.job.Job(task, jobs=None, name=None)

Handles the timing of a task’s execution.

Every Job has to wait for all the other Jobs in its jobs list to succeed before it starts to run its task.

id

ObjectId in 24 hex digits.

Type

str

status

Job status.

Type

str

result

Job result.

Type

Depends on type of Job.

task

Task to be run by Job.

Type

Task

jobs

List of Jobs that must complete before the task runs.

Type

list(Job)

name

Name to track Job progress.

Type

str

core_service

Handles the calling of the API.

Type

CoreAPI

is_done()
Returns

True for task in DONE_STATUS, False otherwise.

Return type

bool

not_done()
Returns

True for task not in DONE_STATUS, False otherwise.

Return type

bool

is_success()
Returns

True for success, False otherwise.

Return type

bool

is_fail()
Returns

True for failed, False otherwise.

Return type

bool

async wait()

Manage the execution of the task.

A Python coroutine wrapped as a task and put into the event loop once a Job is created. When the event loop starts to run, the wait function waits for the prerequisite jobs in the self.jobs list to be done, then continues to run the task if all prerequisite jobs succeeded.

The coroutine finishes when the Job has gotten the result from its task.

async update()

Update attributes from task’s result.

A Python coroutine awaited by wait(). It waits for the task to update its result via await task.update(), then uses the updated result from the task to update the Job’s attributes.

abstract update_result(task_result)

Update the task result to Job’s attributes.

Raises

NotImplementedError – If child class do not implement this function.

stop()

Stop Job.

The Job handles stopping itself according to its status. If pending, it marks its status as fail; if already done, it does nothing and keeps the same status. If running, its status turns to fail, and there are three cases to handle depending on the status of its task: the stop API is called only if the task is running; the task is simply marked as fail if it has not started running yet; otherwise the task keeps its done status.

get(attr)

Get Job’s attribute

If this function is called while the Job is still undone, a reminder message will appear.

Parameters

attr (str) – String that contains the attribute’s name.

Returns

Value of the named attribute of the given object.

class decanter.core.jobs.data_upload.DataUpload(file=None, name=None, eda=True)

Bases: decanter.core.jobs.job.Job

DataUpload manages getting the result from a data upload.

Handles the execution of the upload task in order to upload data to the Decanter Core server. Stores the upload results in the DataUpload’s attributes.

jobs

None; the list of jobs that DataUpload needs to wait for.

Type

list

task

Upload task run by DataUpload.

Type

UploadTask

accessor

Accessor for files in hdfs.

Type

dict

schema

The original data schema.

Type

dict

originSchema

The original data schema.

Type

dict

annotationsMeta

Extra information for the data.

Type

dict

options

Extra information for data.

Type

dict

created_at

The date the data was created.

Type

str

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress, will give default name if None.

Type

str

classmethod create(data_id, name=None)

Create data by data_id.

Parameters
  • data_id (str) – ObjectId in 24 hex digits

  • name (str, optional) – Name to track Job progress.

Returns

DataUpload object

download_csv(path)

Download the uploaded data in csv format.

Parameters

path (str) – The path to download csv file.

show()

Show data content.

Returns

Content of uploaded data.

Return type

str

show_df()

Show data in pandas dataframe.

Returns

Content of uploaded data.

Return type

pandas.DataFrame
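
A minimal sketch of inspecting and downloading an upload result (assumes train_data is a completed DataUpload; the output path is illustrative):

# Inspect the uploaded data as a pandas.DataFrame
df = train_data.show_df()
print(df.head())

# Save a csv copy locally
train_data.download_csv(path='train_copy.csv')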

update_result(task_result)

Update from ‘result’ in Task response.

Experiment and ExperimentTS handle the training of models on the Decanter Core server and store the experiment results in their attributes.

class decanter.core.jobs.experiment.Experiment(train_input, select_model_by=Evaluator.auto, name=None)

Bases: decanter.core.jobs.job.Job

Experiment manages getting the results from model training.

Handles the execution of the training task in order to train models on the Decanter Core server. Stores the training results in the Experiment’s attributes.

jobs

[DataUpload]; list of jobs that the Experiment needs to wait for until they complete.

Type

list(Job)

task

Train task run by the Experiment Job.

Type

TrainTask

train_input

Settings for training models.

Type

TrainInput

best_model

Model with the best score in select_model_by argument.

Type

Model

select_model_by

The score to select best model.

Type

str

features

The features used for training.

Type

list(str)

train_data_id

The ID of the train data.

Type

str

target

The target of the experiment.

Type

str

test_base_id

The ID for the test base data.

Type

str

models

The models’ id of the experiment.

Type

list(str)

hyperparameters

The hyperparameters of the experiment.

Type

dict

attributes

The experiment attributes.

Type

dict

recommendations

Recommended model for each evaluator.

Type

dict

created_at

The date the data was created.

Type

str

options

Extra information for experiment.

Type

dict

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress.

Type

str

classmethod create(exp_id, name=None)

Create Experiment by exp_id.

Parameters
  • exp_id (str) – ObjectId in 24 hex digits.

  • name (str, optional) – Name to track Job progress.

Returns

Experiment object with the specific id.

Return type

Experiment

get_best_model()

Get the best model in the experiment according to select_model_by and store it in the best_model attribute.
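
A minimal sketch of re-attaching to an existing experiment and selecting its best model (assumes a Context has been created; the experiment id is an illustrative placeholder):

from decanter.core.jobs.experiment import Experiment

exp = Experiment.create(exp_id='5f0c5564a64a5a0d1bd7f3e1')  # 24-hex ObjectId
exp.get_best_model()
best = exp.best_model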

update_result(task_result)

Update Job’s attribute from Task’s result.

class decanter.core.jobs.experiment.ExperimentCluster(train_input, select_model_by=Evaluator.auto, name=None)

Bases: decanter.core.jobs.experiment.Experiment, decanter.core.jobs.job.Job

ExperimentCluster manages getting the result from clustering model training.

Handles the execution of the clustering training task in order to train clustering models on the Decanter Core server. Stores the training results in the ExperimentCluster’s attributes.

jobs

[DataUpload]; list of jobs that the ExperimentCluster needs to wait for until they complete.

Type

list(Job)

task

Clustering training task run by the ExperimentCluster Job.

Type

TrainClusterTask

train_input

Settings for clustering training models.

Type

TrainClusterInput

best_model

MultiModel with the best score in select_model_by argument

Type

MultiModel

select_model_by

The score to select best model

Type

str

features

The features used for training

Type

list(str)

train_data_id

The ID of the train data

Type

str

target

The target of the experiment

Type

str

test_base_id

The ID for the test base data

Type

str

models

The models’ id of the experiment

Type

list(str)

hyperparameters

The hyperparameters of the experiment.

Type

dict

attributes

The experiment attributes.

Type

dict

recommendations

Recommended model for each evaluator.

Type

dict

created_at

The date the data was created.

Type

str

options

Extra information for experiment.

Type

dict

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress.

Type

str

classmethod create(exp_id, name=None)

Create a Clustering Experiment by exp_id. Inherits from create().

Parameters
  • exp_id (str) – ObjectId in 24 hex digits

  • name (str, optional) – Name to track Job progress.

Returns

Experiment object with the specific id.

Return type

ExperimentCluster

class decanter.core.jobs.experiment.ExperimentTS(train_input, select_model_by=Evaluator.auto, name=None)

Bases: decanter.core.jobs.experiment.Experiment, decanter.core.jobs.job.Job

ExperimentTS manages getting the result from time series model training.

Handles the execution of the time series training task in order to train time series models on the Decanter Core server. Stores the training results in the ExperimentTS’s attributes.

jobs

[DataUpload]; list of jobs that the ExperimentTS needs to wait for until they complete.

Type

list(Job)

task

Time series training task run by ExperimentTS Job.

Type

TrainTSTask

train_input

Settings for time series training models.

Type

TrainTSInput

best_model

MultiModel with the best score in select_model_by argument

Type

MultiModel

select_model_by

The score to select best model

Type

str

features

The features used for training

Type

list(str)

train_data_id

The ID of the train data

Type

str

target

The target of the experiment

Type

str

test_base_id

The ID for the test base data

Type

str

models

The models’ id of the experiment

Type

list(str)

hyperparameters

The hyperparameters of the experiment.

Type

dict

attributes

The experiment attributes.

Type

dict

recommendations

Recommended model for each evaluator.

Type

dict

created_at

The date the data was created.

Type

str

options

Extra information for experiment.

Type

dict

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

name

Name to track Job progress.

Type

str

classmethod create(exp_id, name=None)

Create a Time Series Experiment by exp_id. Inherits from create().

Parameters
  • exp_id (str) – ObjectId in 24 hex digits

  • name (str, optional) – Name to track Job progress.

Returns

Experiment object with the specific id.

Return type

ExperimentTS

PredictResult and PredictTSResult handle predictions with models trained on the Decanter Core server, and store the predict results in their attributes.

class decanter.core.jobs.predict_result.PredictResult(predict_input, name=None)

Bases: decanter.core.jobs.job.Job

PredictResult manages getting the results from predictions.

Handles the execution of the predict task in order to run predictions on the Decanter Core server. Stores the predict results in the PredictResult’s attributes.

jobs

List of jobs that PredictResult needs to wait for, [TestData, Experiment].

Type

list(Job)

task

Predict task run by the PredictResult Job.

Type

PredictTask

accessor

Accessor for files in hdfs.

Type

dict

schema

The original data schema.

Type

dict

originSchema

The original data schema.

Type

dict

annotationsMeta

Extra information for the data.

Type

dict

options

Extra information for data.

Type

dict

created_at

The date the data was created.

Type

str

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

download_csv(path)

Download the predict result in csv format.

Parameters

path (str) – The path to download csv file.

show()

Show content of predict result.

Returns

Content of PredictResult.

Return type

str

show_df()

Show predict result in pandas dataframe.

Returns

Content of predict result.

Return type

pandas.DataFrame

update_result(task_result)

Update Job’s attributes from Task’s result.

class decanter.core.jobs.predict_result.PredictTSResult(predict_input, name=None)

Bases: decanter.core.jobs.predict_result.PredictResult, decanter.core.jobs.job.Job

Predict result of a time series model.

Handles time series model prediction on the Decanter Core server. Stores the predict result in its attributes.

jobs

List of jobs that PredictTSResult needs to wait for, [TestData, Experiment].

Type

list(Job)

task

Predict task run by the PredictTSResult Job.

Type

PredictTSTask

accessor

Accessor for files in hdfs.

Type

dict

schema

The original data schema.

Type

dict

originSchema

The original data schema.

Type

dict

annotationsMeta

Extra information for the data.

Type

dict

options

Extra information for data.

Type

dict

created_at

The date the data was created.

Type

str

updated_at

The time the data was last updated.

Type

str

completed_at

The time the data was completed at.

Type

str

Core Api Interface

The API interfaces of Decanter, mainly handling the request and response bodies.

Train Input

Settings for the Model Training and Time Series MultiModel Training.

class decanter.core.core_api.train_input.TrainClusterInput(data, callback=None, features=None, feature_types=None, k=None, seed=None, version=None)

Bases: object

Train Input for Clustering Experiment Job.

Settings for model training.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • features (list of str) – Selected features for training.

  • seed (int) – Seed to be used for operations that have pseudo-random behavior. Fixing the seed across runs will ensure reproducible results.

Example

train_input = TrainClusterInput(data=train_data)
get_train_params()

Using train_body to create the JSON request body for training.

Returns

dict

class decanter.core.core_api.train_input.TrainInput(data, target, algos, callback=None, test_base_id=None, test_data_id=None, evaluator=None, features=None, feature_types=None, max_run_time=None, max_model=None, tolerance=None, nfold=None, ts_split_split_by=None, ts_split_cv=None, ts_split_train=None, ts_split_test=None, seed=None, balance_class=None, max_after_balance=None, sampling_factors=None, validation_percentage=None, holdout_percentage=None, apu=None, preprocessing=None, version=None)

Bases: object

Train Input for Experiment Job.

Settings for model training.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • target (str) – the name of the target column

  • algos (Algo) – Enabled algorithms (leaving the default None enables all algorithms).

  • evaluator (Evaluator) – default evaluator for early stopping

  • features (list of str) – selected feature for training

  • max_model (int) – Model family and hyperparameter search will stop after the specified number of models are trained. Stacked Ensemble models are not counted.

  • tolerance (float) – Tolerance for early stopping in both model training and in the model family and hyperparameter search. A higher value results in less accurate models, but faster training times and a larger model pool; a lower tolerance means better accuracy, but longer training time and a smaller model pool. It is recommended that users start with a higher tolerance and move to a lower tolerance when the model training process is finalized.

  • nfold (int) – The number of cross validation folds to be used during model training

  • seed (int) – Seed to be used for operations that have pseudo-random behavior. Fixing the seed across runs will ensure reproducible results.

  • balance_class (bool) – If true, balances the class distribution. In most cases, enabling the balance_class option will increase the data frame size.

  • validation_percentage (float) – Percentage of the train data to be used as the validation set.

  • holdout_percentage (float) – Percentage of the training data to be used as a holdout set.

Example

train_input = TrainInput(data=train_data, target='Survived',
algos=Algo.XGBoost, max_model=2, tolerance=0.9)
get_train_params()

Using train_body to create the JSON request body for training.

Returns

dict

class decanter.core.core_api.train_input.TrainTSInput(data, target, datetime_column, forecast_horizon, gap, algorithms=None, feature_types=None, callback=None, version='v2', max_iteration=None, generation_size=None, mutation_rate=None, crossover_rate=None, tolerance=None, validation_percentage=None, holdout_percentage=None, max_model=None, seed=None, evaluator=None, max_run_time=None, nfold=None, time_unit=None, numerical_groupby_method=None, categorical_groupby_method=None, endogenous_features=None, exogenous_features=None, time_groups=None, max_window_for_feature_derivation=None)

Bases: object

Train Input for ExperimentTS Job.

Settings for auto time series forecast training.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • target (str) – the name of the target column

  • datetime_column (str) – the name of the datetime column used for time series ordering

  • forecast_horizon (int) – The number of data points to predict for auto time series. For the current time series forecast model, the larger this value is, the longer training will take.

  • gap (int) – The number of time units between the train data and prediction data

  • max_window_for_feature_derivation (int) – This value limits the number of past time points whose features can be used to generate endogenous features. It ensures that when generating an endogenous feature for forecast time t, only features from [t - gap - max_window_for_feature_derivation, t - gap) are used. Note that the larger this value is, the less data remains after feature engineering.

  • algorithms (Algo) – Enabled algorithms (leaving the default None enables all algorithms).

  • evaluator (Evaluator) – default evaluator for early stopping

  • features (list of str) – selected feature for training

  • max_model (int) – Model family and hyperparameter search will stop after the specified number of models are trained. Stacked Ensemble models are not counted.

  • tolerance (float) – Tolerance for early stopping in both model training and in the model family and hyperparameter search. A higher value results in less accurate models, but faster training times and a larger model pool; a lower tolerance means better accuracy, but longer training time and a smaller model pool. It is recommended that users start with a higher tolerance and move to a lower tolerance when the model training process is finalized.

  • nfold (int) – The number of cross validation folds to be used during model training

  • seed (int) – Seed to be used for operations that have pseudo-random behavior. Fixing the seed across runs will ensure reproducible results.

  • balance_class (bool) – If true, balances the class distribution. In most cases, enabling the balance_class option will increase the data frame size.

  • validation_percentage (float) – Percentage of the train data to be used as the validation set.

  • holdout_percentage (float) – Percentage of the training data to be used as a holdout set.
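
Example

A minimal sketch (the column names, horizon, and gap values are illustrative assumptions):

train_ts_input = TrainTSInput(data=train_data, target='sales',
                              datetime_column='date',
                              forecast_horizon=7, gap=1)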

get_train_params()

Using train_auto_ts_body to create the JSON request body for time series forecast training.

Returns

dict

static get_ts_algorithms()

Setup Input

class decanter.core.core_api.setup_input.ColumnSpec(id, data_type)

Bases: tuple

property data_type

Alias for field number 1

property id

Alias for field number 0

class decanter.core.core_api.setup_input.SetupInput(data, data_columns, callback=None, eda=None, preprocessing=None, version=None)

Bases: object

Setup Input for the DataSetup Job.

Settings for setting up data.

Parameters
  • data (DataUpload) – Train data uploaded on Decanter Core server

  • data_columns (list of ColumnSpec) – Column specifications used in the request body sent to the setup API.

  • eda (bool) – Whether to perform EDA on the data setup.

Example

setup_input = SetupInput(
    data=upload_data,
    data_columns=[{'id': 'Pclass', 'data_type': 'categorical'}]
)
get_setup_params()

Using setup_body to create the JSON request body for setting data.

Returns

dict

Predict Input

Settings for the PredictResult and PredictTSResult

class decanter.core.core_api.predict_input.PredictInput(data, experiment, select_model='best', select_opt=None, callback=None, keep_columns=None, threshold=None, version=None)

Bases: object

Predict Input for PredictResult Job.

Setting test data, best model, and the request body for prediction.

Parameters
  • data (DataUpload) – Test data.

  • experiment (Experiment) – Experiment from training.

  • select_model (str, optional) –

    Methods of screening models

    • best (default): predict with the best model scored on cv average

    • model_id: predict with the model designated by model_id

    • recommendation: predict with the recommended model

  • select_opt (str, optional) –

Options required by select_model; the value depends on the select_model case:

    • best: None

    • model_id: given the model ID (model ID will be in the format of ObjectID)

    • recommendation: metric, ex: auc …

  • keep_columns (list, optional) – The names of the columns that will be appended to the prediction data.

  • threshold (double, optional) – Prediction threshold for binary classification models. Max = 1, Min = 0

Examples

predict_input = PredictInput(
    data=test_data,
    experiment=exp,
    threshold=0.9,
    keep_columns=['col_1', 'col_3']
)
getPredictParams()

Using pred_body to create the JSON request body for prediction.

Returns

dict

class decanter.core.core_api.predict_input.PredictTSInput(data, experiment, callback=None, threshold_max_by=None, version=None)

Bases: decanter.core.core_api.predict_input.PredictInput

Time series predict input for PredictTSResult Job.

Setting test data, best model, and the request body for prediction.

Parameters
  • data (DataUpload) – Test data.

  • experiment (Experiment) – Experiment from training.

  • threshold_max_by (float, optional) – Prediction threshold for binary classification only; chooses a threshold that optimizes the given metric. Available options are: precision, f1, accuracy, evaluator, recall, specificity, f2, f0point5, mean_per_class_error.
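
Example

A minimal sketch (assumes test_data is a DataUpload and exp_ts is an ExperimentTS returned by client.train_ts):

predict_ts_input = PredictTSInput(data=test_data, experiment=exp_ts)
pred_ts_res = client.predict_ts(predict_ts_input)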

Model

Model and Multimodel.

This module handles actions related to models. It stores attributes of model metadata from the decanter.core server, and also supports downloading models from the server or uploading them from local zip files.

class decanter.core.core_api.model.Model

Bases: object

Model from training.

get_model

Function to get models metadata.

download_model

Function to get model mojo file.

task_status

Status of training task.

Type

str

id

ObjectId in 24 hex digits.

Type

str

key

The unique key for the model.

Type

str

name

The name of the model (may not be unique across projects).

Type

str

exp_id

The experiment ID of the model.

Type

str

importances

The feature importance.

Type

dict

attributes

Model related scores and feature importance.

Type

dict

hyperparameters

The model’s hyperparameters.

Type

dict

created_at

The time the model was created at.

Type

str

updated_at

The time the model was last updated.

Type

str

completed_at

The time the model was completed.

Type

str

classmethod create(exp_id, model_id)

Create a Model or MultiModel depending on which instance type called it.

Parameters
  • exp_id (str) – The experiment ID of the model.

  • model_id (str) – ObjectId in 24 hex digits.

Returns

Model or MultiModel object.

Raises

AttributeError – Raised when no result is returned from the decanter server while getting the model’s metadata.

download(model_path)

Download model file to local.

Download the trained mojo model from the Model instance to a local zip file.

Parameters

model_path (str) – Path to store zip mojo file.

classmethod download_by_id(model_id, model_path)

Download model file to local.

Get the mojo model zip file from the decanter.core server and download it locally.

Parameters
  • model_id (str) – ObjectId in 24 hex digits.

  • model_path (str) – Path to store zip mojo file.
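
A minimal sketch (the model id and output path are illustrative placeholders):

from decanter.core.core_api.model import Model

# Download the trained mojo model as a zip file without creating a Model instance
Model.download_by_id(model_id='5f0c5564a64a5a0d1bd7f3e2',
                     model_path='model.zip')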

get(attr)

Get attribute of model.

Parameters

attr (str) – model’s attribute.

Returns

Value of the attribute of the given object.

is_done()

Check if training task is done.

Returns

True if Model’s task is done, else False.

Return type

bool

update(exp_id, model_id)

Update model attributes.

Get and set attributes from the response returned by the decanter server.

class decanter.core.core_api.model.MultiModel

Bases: decanter.core.core_api.model.Model

MultiModel from time series training.

get_model

Get multi models metadata.

download_model

Get model mojo file.

task_status

Status of training task.

Type

str

id

ObjectId in 24 hex digits.

Type

str

name

The name of the model (may not be unique across projects).

Type

str

exp_id

The experiment ID of the model.

Type

str

attributes

Model related scores and feature importance.

Type

dict

predictionPipeline

TODO

Type

dict

hyperparameters

TODO

Type

dict

created_at

The time the model was created at.

Type

str

updated_at

The time the model was last updated.

Type

str

completed_at

The time the model was completed.

Type

str

classmethod create(exp_id, model_id)

Create a MultiModel. Inherits from Model.create().

download(model_path)

MultiModel has no download method.

classmethod download_by_id(model_id, model_path)

MultiModel has no download method.

Enum

Return the machine learning algorithms and evaluators supported by the current Decanter in the form of enumeration objects.

Algorithms

Enumeration for users to access the machine learning algorithms of the Decanter AI Core SDK.

class decanter.core.enums.algorithms.Algo(value)

The Algo enumeration lists the machine learning algorithms currently supported by the Decanter AI Core SDK.

TrainInput (used by client.train) supported algorithms

  • DRF: Distributed Random Forest.

  • GLM: Generalized Linear Model.

  • GBM: Gradient Boosting Machine.

  • DeepLearning: Deep Learning.

  • StackedEnsemble: Stacked Ensemble.

  • XGBoost: eXtreme Gradient Boosting.

  • tpot: Tree-based Pipeline Optimization Tool. (available only after 4.10 deployed with Exodus).

TrainTSInput (used by client.train_ts) supported algorithms

  • GLM: Generalized Linear Model.

  • DRF: Distributed Random Forest.

  • GBM: Gradient Boosting Machine.

  • XGBoost: eXtreme Gradient Boosting.

  • arima: auto arima (available only after 4.9 deployed with Exodus, only available for regression).

  • prophet: auto prophet (available only after 4.9 deployed with Exodus, only available for regression).

  • theta: auto theta (available only after 4.9 deployed with Exodus, only available for regression).

  • ets: auto ets (available only after 4.10 deployed with Exodus, only available for regression).

Evaluators

Enumeration for users to access the evaluator metrics of the Decanter AI Core SDK.

class decanter.core.enums.evaluators.Evaluator(value)

The Evaluator enumeration lists the metrics currently supported by the Decanter AI Core SDK.

  • Regression
    • auto (deviance)

    • deviance

    • mse

    • mae

    • rmsle

    • r2

    • mape (Supported from Decanter AI 4.9~)

    • wmape (Supported from Decanter AI 4.9~)

  • Binary Classification
    • auto (logloss)

    • logloss

    • lift_top_group

    • auc

    • misclassification

  • Multinomial Classification
    • auto (logloss)

    • logloss

    • misclassification

  • Clustering
    • auto (tot_withinss)

    • tot_withinss
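
A minimal sketch of passing an Evaluator when training (assumes client and train_input were built as in the client.train example above):

from decanter.core.enums.evaluators import Evaluator

exp = client.train(train_input, select_model_by=Evaluator.mse, name='exp_mse')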