multiml.task.basic.ml_base module

MLBaseTask module.

class multiml.task.basic.ml_base.MLBaseTask(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)

Bases: BaseTask

Base task class for (deep) machine learning tasks.

__init__(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)

Initialize ML base task.

This base class is inherited by the deep learning task classes, KerasBaseTask() and PytorchBaseTask(). input_var_names and output_var_names specify the data used as model inputs and outputs. If input_var_names is a list, e.g. ['var0', 'var1'], the model receives data in the format [(batch size, k), (batch size, k)], where k is the arbitrary shape of each variable. If input_var_names is a tuple, e.g. ('var0', 'var1'), the model receives data with shape (batch size, M, k), where M is the number of variables. If output_var_names is a list, the model must return a list of tensors, one for each variable. If output_var_names is a tuple, the model must return a single tensor. pred_var_names and true_var_names specify the data used for loss calculation. If pred_var_names is given, only the variables indicated by pred_var_names are selected from the model outputs before being passed to the loss. Please see KerasBaseTask() or PytorchBaseTask() for concrete examples, and the sketch after the parameter list below.

Parameters:
  • phases (list) – list indicating ML phases, e.g. ['train', 'test']. If None is given, ['train', 'valid', 'test'] is set.

  • input_var_names (str or list or tuple) – input variable names in StoreGate.

  • output_var_names (str or list or tuple) – output variable names of model.

  • save_var_names (str or list) – variable names saved to StoreGate.

  • pred_var_names (str or list) – prediction variable names passed to loss.

  • true_var_names (str or list or tuple) – true variable names.

  • var_names (str) – a single string giving the input, output and true variable names as a shortcut. This option is not valid when multiple variables need to be specified.

  • model (str or obj) – name of model, or class object of model.

  • model_args (dict) – args of model, e.g. dict(param0=0, param1=1).

  • optimizer (str or obj) – name of optimizer, or class object of optimizer.

  • optimizer_args (dict) – args of optimizer.

  • scheduler (str or obj) – name of scheduler, or class object of scheduler.

  • scheduler_args (dict) – args of scheduler.

  • loss (str or obj) – name of loss, or class object of loss.

  • loss_args (dict) – args of loss.

  • max_patience (int) – max number of patience for early stopping. Early stopping is enabled if max_patience is given.

  • loss_weights (list) – scalar coefficients to weight the loss.

  • load_weights (bool or str) – user defined algorithms should assume the following behavior: if False, model weights are not loaded; if True, model weights are loaded from the default location; if str, weights are loaded from the given path.

  • save_weights (bool or str) – user defined algorithms should assume the following behavior: if False, model weights are not saved; if True, model weights are saved to the default location; if str, weights are saved to the given path.

  • metrics (list) – metrics of evaluation.

  • num_epochs (int) – number of epochs.

  • batch_size (int or dict) – size of mini batch. Different batch sizes can be set for train, valid and test by passing a dict.

  • num_workers (int) – number of workers for dataloaders.

  • verbose (int) – verbose option for the fitting step. If None, it is set based on logger.MIN_LEVEL.
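
Example

A minimal sketch of the list vs. tuple convention described above. MyPytorchTask, the variable names and the hyperparameter values are placeholders for illustration; in practice a concrete subclass such as PytorchBaseTask() or KerasBaseTask() is used.

>>> # list of inputs: the model receives [(batch, k0), (batch, k1)]
>>> task = MyPytorchTask(input_var_names=['var0', 'var1'],
>>>                      output_var_names=('output0',),
>>>                      true_var_names='label',
>>>                      phases=['train', 'valid', 'test'],
>>>                      num_epochs=10,
>>>                      batch_size=64)
>>> # tuple of inputs: the model receives a single (batch, M, k) tensor
>>> task = MyPytorchTask(input_var_names=('var0', 'var1'),
>>>                      output_var_names=('output0',),
>>>                      true_var_names='label')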

set_hps(params)

Set hyperparameters to this task.

Class attributes (self._XXX) are automatically set based on the keys and values of the given dict. Hyperparameters starting with 'model__', 'optimizer__' and 'loss__' are considered args of model, optimizer and loss, respectively. If the value of a hyperparameter is a str starting with 'saver__', the value is retrieved from the Saver instance; please see the example below.

Parameters:

params (dict) – key and value of hyperparameters.

Example

>>> hps_dict = {
>>>    'num_epochs': 10,  # normal hyperparameter
>>>    'optimizer__lr': 0.01,  # hyperparameter of optimizer
>>>    'saver_hp': 'saver__key__value',  # hyperparameter from saver
>>> }
>>> task.set_hps(hps_dict)
execute()

Execute a task.

fit(train_data=None, valid_data=None)

Fit model.

Parameters:
  • train_data (ndarray) – training data.

  • valid_data (ndarray) – validation data.
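
Example

A usage sketch; train_data and valid_data are placeholders here, and their exact structure depends on the concrete subclass (see KerasBaseTask() or PytorchBaseTask()). If no data are given, the task typically retrieves data from StoreGate.

>>> task.fit(train_data=train_data, valid_data=valid_data)
>>> # or let the task read data registered in StoreGate
>>> task.fit()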

predict(data=None, phase=None)

Predict model.

Parameters:

data (ndarray) – prediction data.

update(data, phase='auto')

Update data in StoreGate.

Parameters:
  • data (ndarray) – new data.

  • phase (str) – train, valid, test, auto.
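
Example

A sketch combining predict() and update(); this is essentially what predict_update() below does in a single call. The 'test' phase is just an illustrative choice.

>>> preds = task.predict(phase='test')
>>> task.update(preds, phase='test')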

fit_predict(fit_args=None, predict_args=None)

Fit and predict model.

Parameters:
  • fit_args (dict) – arbitrary dict passed to fit().

  • predict_args (dict) – arbitrary dict passed to predict().

Returns:

results of prediction.

Return type:

ndarray or list
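
Example

A usage sketch; the keys of fit_args and predict_args simply mirror the signatures of fit() and predict() above, and valid_data is a placeholder.

>>> results = task.fit_predict(fit_args=dict(valid_data=valid_data),
>>>                            predict_args=dict(phase='test'))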

predict_update(data=None, phase=None)

Predict and update data in StoreGate.

Parameters:

data (ndarray) – data passed to predict() method.

property phases

Returns ML phases.

property input_var_names

Returns input_var_names.

property output_var_names

Returns output_var_names.

property save_var_names

Returns save_var_names.

property pred_var_names

Returns pred_var_names.

property true_var_names

Returns true_var_names.

property ml

Returns ML data class.

compile()

Compile model, optimizer and loss.

Compiled objects will be available via self.ml.model, self.ml.optimizer and self.ml.loss.

Examples

>>> # compile all together,
>>> self.compile()
>>> # which is equivalent to:
>>> self.build_model() # set self._model
>>> self.compile_model() # set self.ml.model
>>> self.compile_optimizer() # set self.ml.optimizer
>>> self.compile_loss() # set self.ml.loss
build_model()

Build model.

compile_var_names()

Compile var_names.

compile_model()

Compile model.

compile_optimizer()

Compile optimizer.

compile_loss()

Compile loss.

load_model()

Load pre-trained model path from Saver.

Returns:

model path.

Return type:

str

dump_model(extra_args=None)

Dump current model to the Saver.

Parameters:

extra_args (dict) – extra metadata to be stored together with model.
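
Example

A sketch of saving the current model together with extra metadata, typically called from within a task implementation; the dict contents here are arbitrary placeholders.

>>> self.dump_model(extra_args=dict(hps=hps_dict))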

load_metadata()

Load metadata.

get_input_true_data(phase)

Get input and true data.

Parameters:

phase (str) – data type (train, valid, test or None).

Returns:

(input, true) data for model.

Return type:

tuple
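
Example

A sketch as used inside a task implementation; the returned tuple is unpacked into model inputs and true labels.

>>> input_data, true_data = self.get_input_true_data('train')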

get_input_var_shapes(phase='train')

Get shape of input_var_names.

Parameters:

phase (str) – train, valid, test or None.

Returns:

shape of a variable, or list of shapes.

Return type:

ndarray shape, or list of shapes
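
Example

A sketch; whether a single shape or a list of shapes is returned follows the form of input_var_names, as described above.

>>> shapes = self.get_input_var_shapes(phase='train')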

get_metadata(metadata_key)

Returns metadata.

Parameters:

metadata_key (str) – key of Saver().

Returns:

arbitrary object stored in Saver.

Return type:

Obj

get_pred_index()

Returns the prediction indices passed to loss calculation.

Returns:

list of prediction indices.

Return type:

list

do_train()

Whether or not to perform the train phase.

do_valid()

Whether or not to perform the valid phase.

do_test()

Whether or not to perform the test phase.
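
Example

A sketch of how these helpers can be used when implementing execute() in a subclass; this is an illustration, not the actual implementation.

>>> if self.do_train():
>>>     self.fit()
>>> if self.do_test():
>>>     self.predict_update()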

show_info()

Print information.