multiml.task.basic package
Subpackages
Submodules
- multiml.task.basic.base module
- multiml.task.basic.ml_base module
MLBaseTask
MLBaseTask.__init__()
MLBaseTask.set_hps()
MLBaseTask.execute()
MLBaseTask.fit()
MLBaseTask.predict()
MLBaseTask.update()
MLBaseTask.fit_predict()
MLBaseTask.predict_update()
MLBaseTask.phases
MLBaseTask.input_var_names
MLBaseTask.output_var_names
MLBaseTask.save_var_names
MLBaseTask.pred_var_names
MLBaseTask.true_var_names
MLBaseTask.ml
MLBaseTask.compile()
MLBaseTask.build_model()
MLBaseTask.compile_var_names()
MLBaseTask.compile_model()
MLBaseTask.compile_optimizer()
MLBaseTask.compile_loss()
MLBaseTask.load_model()
MLBaseTask.dump_model()
MLBaseTask.load_metadata()
MLBaseTask.get_input_true_data()
MLBaseTask.get_input_var_shapes()
MLBaseTask.get_metadata()
MLBaseTask.get_pred_index()
MLBaseTask.do_train()
MLBaseTask.do_valid()
MLBaseTask.do_test()
MLBaseTask.show_info()
- multiml.task.basic.ml_env module
- multiml.task.basic.ml_model_connection module
- multiml.task.basic.sklean_pipeline module
Module contents
- class multiml.task.basic.BaseTask(saver=None, input_saver_key='tmpkey', output_saver_key='tmpkey', storegate=None, data_id=None, name=None)
Bases:
Task
Base task class for the default functions.
All subtasks defined by users need to inherit this BaseTask. In a user-defined class, super().__init__() must be called in the __init__() method. A task class is assumed to call its methods in the following sequence: set_hps() -> execute() -> finalize(). If a task class instance is registered to TaskScheduler as a subtask, self._task_id and self._subtask_id are automatically set by TaskScheduler.
Examples
>>> task = BaseTask()
>>> task.set_hps({'hp_layer': 5, 'hp_epoch': 256})
>>> task.execute()
>>> task.finalize()
- __init__(saver=None, input_saver_key='tmpkey', output_saver_key='tmpkey', storegate=None, data_id=None, name=None)
Initialize base task.
- Parameters:
saver (Saver) – Saver class instance to record metadata.
input_saver_key (str) – unique saver key to retrieve metadata.
output_saver_key (str) – unique saver key to save metadata.
storegate (StoreGate) – StoreGate class instance to manage data.
data_id (str) – data_id of StoreGate, which is set by set_hps().
name (str) – task's name. If None, the class name is used instead.
- execute()
Execute base task.
Users implement their algorithms.
- finalize()
Finalize base task.
Users implement their algorithms.
- set_hps(params)
Set hyperparameters to this task.
Class attributes (self._XXX) are automatically set based on keys and values of the given dict. E.g. if {'key0': 0, 'key1': 1} is given, self._key0 = 0 and self._key1 = 1 are created.
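As a rough illustration, the documented behavior can be pictured as mapping each key onto an underscore-prefixed attribute. This is a minimal sketch, not the actual multiml implementation:

```python
# Minimal sketch of the documented set_hps() behavior; the real
# BaseTask.set_hps() in multiml may differ in details.
class TaskSketch:
    def set_hps(self, params):
        # each key 'keyN' becomes an attribute self._keyN
        for key, value in params.items():
            setattr(self, f"_{key}", value)

task = TaskSketch()
task.set_hps({'key0': 0, 'key1': 1})
print(task._key0, task._key1)  # -> 0 1
```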
- property name
Return name of task.
- property job_id
Return job_id of task.
- property trial_id
Return trial_id of task.
- property task_id
Return task_id of task.
- property subtask_id
Return subtask_id of task.
- property pool_id
Return pool_id of task.
- property storegate
Return storegate of task.
- property saver
Return saver of task.
- property input_saver_key
Return input_saver_key.
- property output_saver_key
Return output_saver_key.
- get_unique_id()
Returns unique identifier of task.
- class multiml.task.basic.MLBaseTask(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)
Bases:
BaseTask
Base task class for (deep) machine learning tasks.
- __init__(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)
Initialize ML base task.
This base class is inherited by the deep learning task classes KerasBaseTask() and PytorchBaseTask(). input_var_names and output_var_names specify data for model inputs and outputs. If input_var_names is a list, e.g. ['var0', 'var1'], the model receives data in the format [(batch size, k), (batch size, k)], where k is the arbitrary shape of each variable. If input_var_names is a tuple, e.g. ('var0', 'var1'), the model receives data with shape (batch size, M, k), where M is the number of variables. If output_var_names is a list, the model must return a list of tensors, one per variable. If output_var_names is a tuple, the model must return a single tensor. pred_var_names and true_var_names specify data for loss calculations. If pred_var_names is given, only the variables indicated by pred_var_names are selected from the model outputs before being passed to the loss calculation. Please see KerasBaseTask() or PytorchBaseTask() for actual examples.
- Parameters:
phases (list) – list to indicates ML phases, e.g. [‘train’, ‘test’]. If None is given, [‘train’, ‘valid’, ‘test’] is set.
input_var_names (str or list or tuple) – input variable names in StoreGate.
output_var_names (str or list or tuple) – output variable names of model.
save_var_names (str or list) – variable names saved to StoreGate.
pred_var_names (str or list) – prediction variable names passed to loss.
true_var_names (str or list or tuple) – true variable names.
var_names (str) – str of "input output true" variable names as a shortcut. This shortcut is not valid for specifying multiple variables.
model (str or obj) – name of model, or class object of model.
model_args (dict) – args of model, e.g. dict(param0=0, param1=1).
optimizer (str or obj) – name of optimizer, or class object of optimizer.
optimizer_args (dict) – args of optimizer.
scheduler (str or obj) – name of scheduler, or class object of scheduler.
scheduler_args (dict) – args of scheduler.
loss (str or obj) – name of loss, or class object of loss.
loss_args (dict) – args of loss.
max_patience (int) – max number of patience for early stopping. early_stopping is enabled if max_patience is given.
loss_weights (list) – scalar coefficients to weight the loss.
load_weights (bool or str) – user-defined algorithms should assume the following behavior: if False, do not load model weights; if True, load model weights from the default location; if str, load weights from the given path.
save_weights (bool or str) – user-defined algorithms should assume the following behavior: if False, do not save model weights; if True, save model weights to the default location; if str, save weights to the given path.
metrics (list) – metrics of evaluation.
num_epochs (int) – number of epochs.
batch_size (int or dict) – size of mini batch. Different batch_size values can be set for train, valid and test.
num_workers (int) – number of workers for dataloaders.
verbose (int) – verbose option for the fitting step. If None, it is set based on logger.MIN_LEVEL.
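The list-versus-tuple convention for input_var_names described above can be sketched with NumPy arrays. The variable names and shapes below are hypothetical, and the tuple case assumes all variables share the same per-variable shape k:

```python
import numpy as np

# Hypothetical variables with batch size 4 and per-variable shape k = 3.
batch_size, k = 4, 3
var0 = np.zeros((batch_size, k))
var1 = np.zeros((batch_size, k))

# input_var_names as a list ['var0', 'var1']:
# the model receives separate arrays, [(batch, k), (batch, k)].
list_inputs = [var0, var1]

# input_var_names as a tuple ('var0', 'var1'):
# the M variables are stacked into a single (batch, M, k) array.
tuple_inputs = np.stack([var0, var1], axis=1)
print(tuple_inputs.shape)  # -> (4, 2, 3)
```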
- set_hps(params)
Set hyperparameters to this task.
Class attributes (self._XXX) are automatically set based on keys and values of the given dict. Hyperparameters starting with 'model__', 'optimizer__' and 'loss__' are considered args of model, optimizer and loss, respectively. If the value of a hyperparameter is a str starting with 'saver__', the value is retrieved from the Saver instance; please see the examples below.
- Parameters:
params (dict) – key and value of hyperparameters.
Example
>>> hps_dict = {
>>>     'num_epochs': 10,                 # normal hyperparameter
>>>     'optimizer__lr': 0.01,            # hyperparameter of optimizer
>>>     'saver_hp': 'saver__key__value',  # hyperparameter from saver
>>> }
>>> task.set_hps(hps_dict)
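The prefix convention can be pictured as splitting the dict by prefix before the values are applied. This is a hypothetical sketch of the routing, not the actual MLBaseTask.set_hps() implementation:

```python
# Hypothetical sketch of the 'model__' / 'optimizer__' / 'loss__' prefix
# routing described above; multiml's real implementation may differ.
def split_hps(params):
    routed = {'model': {}, 'optimizer': {}, 'loss': {}, 'normal': {}}
    for key, value in params.items():
        for prefix in ('model', 'optimizer', 'loss'):
            if key.startswith(prefix + '__'):
                # strip '<prefix>__' and file under that component's args
                routed[prefix][key[len(prefix) + 2:]] = value
                break
        else:
            routed['normal'][key] = value
    return routed

hps = split_hps({'num_epochs': 10, 'optimizer__lr': 0.01})
print(hps['optimizer'])  # -> {'lr': 0.01}
print(hps['normal'])     # -> {'num_epochs': 10}
```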
- execute()
Execute a task.
- fit(train_data=None, valid_data=None)
Fit model.
- Parameters:
train_data (ndarray) – training data.
valid_data (ndarray) – validation data.
- predict(data=None, phase=None)
Predict model.
- Parameters:
data (ndarray) – prediction data.
- update(data, phase='auto')
Update data in storegate.
- Parameters:
data (ndarray) – new data.
phase (str) – 'train', 'valid', 'test' or 'auto'.
- fit_predict(fit_args=None, predict_args=None)
Fit and predict model.
- Parameters:
fit_args (dict) – arbitrary dict passed to fit().
predict_args (dict) – arbitrary dict passed to predict().
- Returns:
results of prediction.
- Return type:
ndarray or list
- predict_update(data=None, phase=None)
Predict and update data in StoreGate.
- Parameters:
data (ndarray) – data passed to the predict() method.
- property phases
Returns ML phases.
- property input_var_names
Returns input_var_names.
- property output_var_names
Returns output_var_names.
- property save_var_names
Returns save_var_names.
- property pred_var_names
Returns pred_var_names.
- property true_var_names
Returns true_var_names.
- property ml
Returns ML data class.
- compile()
Compile model, optimizer and loss.
Compiled objects will be available via self.ml.model, self.ml.optimizer and self.ml.loss.
Examples
>>> # compile all together,
>>> self.compile()
>>> # which is equivalent to:
>>> self.build_model()        # set self._model
>>> self.compile_model()      # set self.ml.model
>>> self.compile_optimizer()  # set self.ml.optimizer
>>> self.compile_loss()       # set self.ml.loss
- build_model()
Build model.
- compile_var_names()
Compile var_names.
- compile_model()
Compile model.
- compile_optimizer()
Compile optimizer.
- compile_loss()
Compile loss.
- load_model()
Load the pre-trained model path from Saver.
- Returns:
model path.
- Return type:
str
- dump_model(extra_args=None)
Dump the current model to the saver.
- Parameters:
extra_args (dict) – extra metadata to be stored together with model.
- load_metadata()
Load metadata.
- get_input_true_data(phase)
Get input and true data.
- Parameters:
phase (str) – data type (train, valid, test or None).
- Returns:
(input, true) data for model.
- Return type:
tuple
- get_input_var_shapes(phase='train')
Get shape of input_var_names.
- Parameters:
phase (str) – train, valid, test or None.
- Returns:
shape of a variable, or list of shapes.
- Return type:
ndarray or list
- get_metadata(metadata_key)
Returns metadata.
- Parameters:
metadata_key (str) – key of Saver().
- Returns:
arbitrary object stored in Saver.
- Return type:
Obj
- get_pred_index()
Returns prediction index passed to loss calculation.
- Returns:
list of prediction index.
- Return type:
list
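The prediction-index selection by pred_var_names can be pictured as a lookup into the model's output_var_names. This is a sketch of the idea, not the actual multiml implementation, and the variable names are made up:

```python
# Hypothetical sketch: pred_var_names picks indices out of the model's
# output_var_names, and only those outputs reach the loss calculation.
output_var_names = ['output0', 'output1', 'output2']
pred_var_names = ['output2', 'output0']

pred_index = [output_var_names.index(name) for name in pred_var_names]
print(pred_index)  # -> [2, 0]
```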
- do_train()
Perform train phase or not.
- do_valid()
Perform valid phase or not.
- do_test()
Perform test phase or not.
- show_info()
Print information.
- class multiml.task.basic.MLEnv(model=None, optimizer=None, scheduler=None, loss=None, loss_weights=None, multi_inputs=None, multi_outputs=None, multi_loss=None)
Bases:
object
Data class to store compiled ML objects.
- __init__(model=None, optimizer=None, scheduler=None, loss=None, loss_weights=None, multi_inputs=None, multi_outputs=None, multi_loss=None)
Initialize MLEnv.
- clear()
- property model
Returns model.
- property optimizer
Returns optimizer.
- property scheduler
Returns scheduler.
- property loss
Returns loss.
- property loss_weights
Returns loss_weights.
- property multi_inputs
Returns multi_inputs.
- property multi_outputs
Returns multi_outputs.
- property multi_loss
Returns multi_loss.
- show_info()
Print information.
- validate(phase)
Validate environment for given phase.
- class multiml.task.basic.SkleanPipelineTask(phases=None, input_var_names=None, output_var_names=None, save_var_names=None, pred_var_names=None, true_var_names=None, var_names=None, model=None, model_args=None, optimizer=None, optimizer_args=None, scheduler=None, scheduler_args=None, loss=None, loss_args=None, max_patience=None, loss_weights=None, load_weights=False, save_weights=False, metrics=None, num_epochs=10, batch_size=64, num_workers=0, verbose=None, **kwargs)
Bases:
MLBaseTask
Wrapper task to process a scikit-learn (sklearn) object.
- execute()
Execute fit.
- class multiml.task.basic.ModelConnectionTask(subtasks, loss_weights=None, variable_mapping=None, **kwargs)
Bases:
MLBaseTask
Build a single task connecting with multiple tasks.
ModelConnectionTask connects multiple ML tasks, taking into account their input/output variables and dependencies, and builds a single task. The ML models of the component tasks are trained differentiably as one model, thus each ML model must be implemented with the same deep learning library, i.e. Keras or PyTorch. Each subtask must contain input_var_names, output_var_names, true_var_names and a loss function, so that subtask dependencies and data I/O formats can be compiled. The following example shows a workflow and its attributes, which are automatically compiled:
Examples
>>> '''
>>> (input0, input1, input2)
>>>     |       |       |
>>>  [subtask0]         |
>>>     |               |
>>>  (output0)          |
>>>     |               |
>>>  [subtask1]---------+
>>>     |
>>>  (output1)
>>> '''
>>>
>>> input_var_names = ['input0', 'input1', 'input2']
>>> output_var_names = ['output0', 'output1']
>>> input_var_index = [[0, 1], [-1, 2]]
>>> output_var_index = [[0], [1]]
Examples
>>> task = ModelConnectionTask(subtasks=[your_subtask0, your_subtask2],
>>>                            optimizer='SGD')
>>> task.execute()
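One possible reading of the compiled input_var_index in the workflow example (assumed from the diagram, not taken from the multiml source) is that non-negative entries index into input_var_names while -(i + 1) refers to output_var_names[i], i.e. an earlier subtask's output:

```python
# Sketch of an assumed interpretation of the compiled indices: positive
# entries are external inputs, negative entries are intermediate outputs.
input_var_names = ['input0', 'input1', 'input2']
output_var_names = ['output0', 'output1']

def resolve(index_row):
    return [input_var_names[i] if i >= 0 else output_var_names[-i - 1]
            for i in index_row]

print(resolve([0, 1]))   # subtask0 inputs -> ['input0', 'input1']
print(resolve([-1, 2]))  # subtask1 inputs -> ['output0', 'input2']
```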
- __init__(subtasks, loss_weights=None, variable_mapping=None, **kwargs)
Constructor of ModelConnectionTask.
- Parameters:
subtasks (list) – list of ordered instance objects inherited from MLBaseTask.
loss_weights (list or dict or str) – list of loss weights for each task. last_loss and flat_loss are also allowed.
variable_mapping (list(str, str)) – input variables are replaced following this list. Used when the input variables change from pre-training to main-training (with model connecting).
**kwargs – arbitrary keyword arguments passed to MLBaseTask.
- compile()
Compile subtasks and this task.
- compile_loss()
Compile loss and loss_weights.
Loss functions are retrieved from subtasks, thus each subtask must contain loss.
- compile_var_names()
Compile subtask dependencies and I/O variables.
- set_output_var_index()
Set output_var_names and output_var_index.
- set_input_var_index()
Set input_var_names and input_var_index.