multiml.task.basic package

Module contents

class multiml.task.basic.BaseTask(saver=None, input_saver_key='tmpkey', output_saver_key='tmpkey', storegate=None, data_id=None, name=None)

Bases: Task

Base task class for the default functions.

All subtasks defined by users need to inherit this BaseTask. A user-defined class must call super().__init__() in its __init__() method. A task class is assumed to call its methods in the following sequence: set_hps() -> execute() -> finalize(). If a task class instance is registered to TaskScheduler as a subtask, self._task_id and self._subtask_id are automatically set by TaskScheduler.

Examples

>>> task = BaseTask()
>>> task.set_hps({'hp_layer': 5, 'hp_epoch': 256})
>>> task.execute()
>>> task.finalize()
__init__(saver=None, input_saver_key='tmpkey', output_saver_key='tmpkey', storegate=None, data_id=None, name=None)

Initialize base task.

Parameters:
  • saver (Saver) – Saver class instance to record metadata data.

  • input_saver_key (str) – unique saver key to retrieve metadata.

  • output_saver_key (str) – unique saver key to save metadata.

  • storegate (Storegate) – Storegate class instance to manage data.

  • data_id (str) – data_id of Storegate, which is set by set_hps().

  • name (str) – task's name. If None, the class name is used instead.

execute()

Execute base task.

Users implement their algorithms.

finalize()

Finalize base task.

Users implement their algorithms.

set_hps(params)

Set hyperparameters to this task.

Class attributes (self._XXX) are automatically set based on the keys and values of the given dict. For example, if {'key0': 0, 'key1': 1} is given, self._key0 = 0 and self._key1 = 1 are created.
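
The attribute-setting convention can be sketched as below; TaskSketch is a hypothetical stand-in for illustration, not the actual implementation:

```python
class TaskSketch:
    """Hypothetical stand-in illustrating the set_hps() attribute convention."""

    def set_hps(self, params):
        # Each key 'xxx' in params becomes an attribute self._xxx.
        for key, value in params.items():
            setattr(self, f'_{key}', value)

task = TaskSketch()
task.set_hps({'key0': 0, 'key1': 1})
print(task._key0, task._key1)
```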

property name

Return name of task.

property job_id

Return job_id of task.

property trial_id

Return trial_id of task.

property task_id

Return task_id of task.

property subtask_id

Return subtask_id of task.

property pool_id

Return pool_id of task.

property storegate

Return storegate of task.

property saver

Return saver of task.

property input_saver_key

Return input_saver_key.

property output_saver_key

Return output_saver_key.

get_unique_id()

Returns unique identifier of task.

class multiml.task.basic.MLBaseTask(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)

Bases: BaseTask

Base task class for (deep) machine learning tasks.

__init__(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)

Initialize ML base task.

This base class is inherited by the deep learning task classes, KerasBaseTask() and PytorchBaseTask(). input_var_names and output_var_names specify the data for model inputs and outputs. If input_var_names is a list, e.g. ['var0', 'var1'], the model receives data in the format [(batch size, k), (batch size, k)], where k is the arbitrary shape of each variable. If input_var_names is a tuple, e.g. ('var0', 'var1'), the model receives data with shape (batch size, M, k), where M is the number of variables. If output_var_names is a list, the model must return a list of tensors, one per variable. If output_var_names is a tuple, the model must return a single tensor. pred_var_names and true_var_names specify the data for loss calculation. If pred_var_names is given, only the variables indicated by pred_var_names are selected from the model outputs before being passed to the loss calculation. Please see KerasBaseTask() or PytorchBaseTask() for actual examples.

Parameters:
  • phases (list) – list indicating the ML phases, e.g. ['train', 'test']. If None is given, ['train', 'valid', 'test'] is set.

  • input_var_names (str or list or tuple) – input variable names in StoreGate.

  • output_var_names (str or list or tuple) – output variable names of model.

  • save_var_names (str or list) – variable names saved to StoreGate.

  • pred_var_names (str or list) – prediction variable names passed to loss.

  • true_var_names (str or list or tuple) – true variable names.

  • var_names (str) – str of "input output true" variable names as a shortcut. This cannot be used to specify multiple variables per slot.

  • model (str or obj) – name of model, or class object of model.

  • model_args (dict) – args of model, e.g. dict(param0=0, param1=1).

  • optimizer (str or obj) – name of optimizer, or class object of optimizer.

  • optimizer_args (dict) – args of optimizer.

  • scheduler (str or obj) – name of scheduler, or class object of scheduler.

  • scheduler_args (dict) – args of scheduler.

  • loss (str or obj) – name of loss, or class object of loss.

  • loss_args (dict) – args of loss.

  • max_patience (int) – max number of patience for early stopping. Early stopping is enabled if max_patience is given.

  • loss_weights (list) – scalar coefficients to weight the loss.

  • load_weights (bool or str) – user-defined algorithms should assume the following behavior. If False, model weights are not loaded. If True, model weights are loaded from the default location. If str, weights are loaded from the given path.

  • save_weights (bool or str) – user-defined algorithms should assume the following behavior. If False, model weights are not saved. If True, model weights are saved to the default location. If str, weights are saved to the given path.

  • metrics (list) – metrics of evaluation.

  • num_epochs (int) – number of epochs.

  • batch_size (int or dict) – size of mini batch. A dict can set a different batch_size for each of train, valid and test.

  • num_workers (int) – number of workers for dataloaders.

  • verbose (int) – verbose option for the fitting step. If None, it is set based on logger.MIN_LEVEL.
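
The list-versus-tuple convention for input_var_names described above can be sketched with NumPy; gather_inputs is a hypothetical helper mimicking the documented behavior, not part of the package:

```python
import numpy as np

def gather_inputs(var_names, data):
    """Hypothetical helper mimicking the documented input convention."""
    if isinstance(var_names, list):
        # list -> list of arrays, each of shape (batch_size, k)
        return [data[name] for name in var_names]
    if isinstance(var_names, tuple):
        # tuple -> single array of shape (batch_size, M, k)
        return np.stack([data[name] for name in var_names], axis=1)
    return data[var_names]  # single variable name

batch_size, k = 4, 3
data = {'var0': np.zeros((batch_size, k)), 'var1': np.ones((batch_size, k))}

as_list = gather_inputs(['var0', 'var1'], data)   # two arrays of (4, 3)
as_tuple = gather_inputs(('var0', 'var1'), data)  # one array of (4, 2, 3)
```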

set_hps(params)

Set hyperparameters to this task.

Class attributes (self._XXX) are automatically set based on the keys and values of the given dict. Hyperparameters starting with 'model__', 'optimizer__' and 'loss__' are considered args of model, optimizer and loss, respectively. If the value of a hyperparameter is a str starting with 'saver__', the value is retrieved from the Saver instance; please see the examples below.

Parameters:

params (dict) – key and value of hyperparameters.

Example

>>> hps_dict = {
>>>    'num_epochs': 10, # normal hyperparameter
>>>    'optimizer__lr': 0.01, # hyperparameter of optimizer
>>>    'saver_hp': 'saver__key__value' # hyperparameter from saver
>>> }
>>> task.set_hps(hps_dict)
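
The prefix routing can be sketched as below; route_hps is a hypothetical illustration of the documented naming convention, not the actual implementation:

```python
def route_hps(params):
    """Hypothetical sketch: route 'model__'/'optimizer__'/'loss__' keys."""
    routed = {'model': {}, 'optimizer': {}, 'loss': {}, 'task': {}}
    for key, value in params.items():
        for prefix in ('model', 'optimizer', 'loss'):
            if key.startswith(prefix + '__'):
                # strip '<prefix>__' and store as an arg of that object
                routed[prefix][key[len(prefix) + 2:]] = value
                break
        else:
            routed['task'][key] = value  # normal hyperparameter
    return routed

routed = route_hps({'num_epochs': 10, 'optimizer__lr': 0.01})
```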
execute()

Execute a task.

fit(train_data=None, valid_data=None)

Fit model.

Parameters:
  • train_data (ndarray) – training data.

  • valid_data (ndarray) – validation data.

predict(data=None, phase=None)

Predict model.

Parameters:

data (ndarray) – prediction data.

update(data, phase='auto')

Update data in storegate.

Parameters:
  • data (ndarray) – new data.

  • phase (str) – train, valid, test, auto.

fit_predict(fit_args=None, predict_args=None)

Fit and predict model.

Parameters:
  • fit_args (dict) – arbitrary dict passed to fit().

  • predict_args (dict) – arbitrary dict passed to predict().

Returns:

results of prediction.

Return type:

ndarray or list

predict_update(data=None, phase=None)

Predict and update data in StoreGate.

Parameters:

data (ndarray) – data passed to predict() method.

property phases

Returns ML phases.

property input_var_names

Returns input_var_names.

property output_var_names

Returns output_var_names.

property save_var_names

Returns save_var_names.

property pred_var_names

Returns pred_var_names.

property true_var_names

Returns true_var_names.

property ml

Returns ML data class.

compile()

Compile model, optimizer and loss.

Compiled objects will be available via self.ml.model, self.ml.optimizer and self.ml.loss.

Examples

>>> # compile all together,
>>> self.compile()
>>> # which is equivalent to:
>>> self.build_model() # set self._model
>>> self.compile_model() # set self.ml.model
>>> self.compile_optimizer() # set self.ml.optimizer
>>> self.compile_loss() # set self.ml.loss
build_model()

Build model.

compile_var_names()

Compile var_names.

compile_model()

Compile model.

compile_optimizer()

Compile optimizer.

compile_loss()

Compile loss.

load_model()

Load pre-trained model path from Saver.

Returns:

model path.

Return type:

str

dump_model(extra_args=None)

Dump current model to saver.

Parameters:

extra_args (dict) – extra metadata to be stored together with model.

load_metadata()

Load metadata.

get_input_true_data(phase)

Get input and true data.

Parameters:

phase (str) – data type (train, valid, test or None).

Returns:

(input, true) data for model.

Return type:

tuple

get_input_var_shapes(phase='train')

Get shape of input_var_names.

Parameters:

phase (str) – train, valid, test or None.

Returns:

shape of a variable, or list of shapes

Return type:

ndarray shape, or list of shapes

get_metadata(metadata_key)

Returns metadata.

Parameters:

metadata_key (str) – key of Saver().

Returns:

arbitrary object stored in Saver.

Return type:

Obj

get_pred_index()

Returns prediction index passed to loss calculation.

Returns:

list of prediction index.

Return type:

list

do_train()

Perform train phase or not.

do_valid()

Perform valid phase or not.

do_test()

Perform test phase or not.

show_info()

Print information.

class multiml.task.basic.MLEnv(model=None, optimizer=None, scheduler=None, loss=None, loss_weights=None, multi_inputs=None, multi_outputs=None, multi_loss=None)

Bases: object

Data class to store compiled ML objects.

__init__(model=None, optimizer=None, scheduler=None, loss=None, loss_weights=None, multi_inputs=None, multi_outputs=None, multi_loss=None)

Initialize MLEnv.

clear()

property model

Returns model.

property optimizer

Returns optimizer.

property scheduler

Returns scheduler.

property loss

Returns loss.

property loss_weights

Returns loss_weights.

property multi_inputs

Returns multi_inputs.

property multi_outputs

Returns multi_outputs.

property multi_loss

Returns multi_loss.

show_info()

Print information.

validate(phase)

Validate environment for given phase.

class multiml.task.basic.SkleanPipelineTask(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)

Bases: MLBaseTask

Wrapper task to process a sklearn (scikit-learn) object.

execute()

Execute fit.

class multiml.task.basic.ModelConnectionTask(subtasks, loss_weights=None, variable_mapping=None, **kwargs)

Bases: MLBaseTask

Build a single task connecting with multiple tasks.

ModelConnectionTask connects multiple ML tasks, considering the input/output variables and dependencies of the tasks, and builds them into a single task. The ML models of the component tasks are trained differentiably as one model, thus each ML model must be implemented with the same deep learning library, i.e. Keras or PyTorch. Each subtask must contain

  • input_var_names, output_var_names and true_var_names,

  • loss function,

to compile subtask dependencies and data I/O formats. The following example shows a workflow and its attributes, which are automatically compiled:

Examples

>>> '''
>>> (input0, input1, input2)
>>>      |   |        |
>>>   [subtask0]      |
>>>       |           |
>>>   (output0)       |
>>>       |           |
>>>   [subtask1]------+
>>>       |
>>>   (output1)
>>> '''
>>>
>>> input_var_names = ['input0', 'input1', 'input2']
>>> output_var_names = ['output0', 'output1']
>>> input_var_index = [[0, 1], [-1, 2]]
>>> output_var_index = [[0], [1]]
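
Assuming the index convention suggested by the example above (a non-negative index refers to a task-level input, while a negative index -n refers to intermediate output n-1), each subtask's inputs can be resolved as follows; resolve is a hypothetical decoder, not part of the package:

```python
input_var_names = ['input0', 'input1', 'input2']
output_var_names = ['output0', 'output1']
input_var_index = [[0, 1], [-1, 2]]

def resolve(index_row):
    """Hypothetical decoder: i >= 0 -> task input i, i < 0 -> output -i - 1."""
    return [input_var_names[i] if i >= 0 else output_var_names[-i - 1]
            for i in index_row]

subtask0_inputs = resolve(input_var_index[0])  # fed by input0, input1
subtask1_inputs = resolve(input_var_index[1])  # fed by output0, input2
```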

Examples

>>> task = ModelConnectionTask(subtasks=[your_subtask0, your_subtask1],
>>>                            optimizer='SGD')
>>> task.execute()
__init__(subtasks, loss_weights=None, variable_mapping=None, **kwargs)

Constructor of ModelConnectionTask.

Parameters:
  • subtasks (list) – list containing ordered instances of classes inherited from MLBaseTask.

  • loss_weights (list or dict or str) – list of loss weights for each task. last_loss and flat_loss are also allowed.

  • variable_mapping (list(str, str)) – input variables are replaced following this list. Used when the input variables change from pre-training to main-training (with model connecting).

  • **kwargs – Arbitrary keyword arguments passed to MLBaseTask.
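
The variable_mapping substitution can be sketched as follows; apply_variable_mapping is a hypothetical illustration of the documented behavior:

```python
def apply_variable_mapping(var_names, mapping):
    """Hypothetical sketch: replace pre-training names with main-training names."""
    table = dict(mapping)  # [('old', 'new'), ...] -> {'old': 'new'}
    return [table.get(name, name) for name in var_names]

# 'pretrain_feature' and 'main_feature' are made-up variable names
mapped = apply_variable_mapping(
    ['pretrain_feature', 'shared_feature'],
    [('pretrain_feature', 'main_feature')],
)
```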

compile()

Compile subtasks and this task.

compile_loss()

Compile loss and loss_weights.

Loss functions are retrieved from subtasks, thus each subtask must contain loss.

compile_var_names()

Compile subtask dependencies and I/O variables.

set_output_var_index()

Set output_var_names and output_var_index.

set_input_var_index()

Set input_var_names and input_var_index.