multiml.task.pytorch package

Module contents

class multiml.task.pytorch.ModelConnectionTask(subtasks, loss_weights=None, variable_mapping=None, **kwargs)

Bases: ModelConnectionTask, PytorchBaseTask

Pytorch implementation of ModelConnectionTask.

build_model()

Build model.

class multiml.task.pytorch.PytorchChoiceBlockTask(subtasks, **kwargs)

Bases: PytorchBaseTask

__init__(subtasks, **kwargs)
Parameters:
  • subtasks (list) – list of task instances.

  • **kwargs – Arbitrary keyword arguments.

build_model()

Build model.

property choice
get_input_true_data(phase)

Get input and true data.

Parameters:

phase (str) – data type (train, valid, test or None).

Returns:

(input, true) data for model.

Return type:

tuple

get_storegate_dataset(phase)

Returns storegate dataset.

get_submodel_names()
get_inputs()
get_submodel(i_models)
class multiml.task.pytorch.PytorchBaseTask(device='cpu', gpu_ids=None, torchinfo=False, amp=False, torch_compile=False, dataset_args=None, dataloader_args=None, batch_sampler=False, metric_sample=1, **kwargs)

Bases: MLBaseTask

Base task for PyTorch model.

Examples

>>> # your pytorch model
>>> class MyPytorchModel(nn.Module):
...     def __init__(self, inputs=2, outputs=2):
...         super(MyPytorchModel, self).__init__()
...         self.fc1 = nn.Linear(inputs, outputs)
...         self.relu = nn.ReLU()
...
...     def forward(self, x):
...         return self.relu(self.fc1(x))
>>>
>>> # create task instance
>>> task = PytorchBaseTask(storegate=storegate,
...                        model=MyPytorchModel,
...                        input_var_names=('x0', 'x1'),
...                        output_var_names='outputs-pytorch',
...                        true_var_names='labels',
...                        optimizer='SGD',
...                        optimizer_args=dict(lr=0.1),
...                        loss='CrossEntropyLoss')
>>> task.set_hps({'num_epochs': 5})
>>> task.execute()
>>> task.finalize()
__init__(device='cpu', gpu_ids=None, torchinfo=False, amp=False, torch_compile=False, dataset_args=None, dataloader_args=None, batch_sampler=False, metric_sample=1, **kwargs)

Initialize the pytorch base task.

Parameters:
  • device (str or obj) – pytorch device, e.g. ‘cpu’, ‘cuda’.

  • gpu_ids (list) – GPU identifiers, e.g. [0, 1, 2]. data_parallel mode is enabled if gpu_ids is given.

  • torchinfo (bool) – show torchinfo summary after model compile.

  • amp (bool) – (expert option) enable amp mode.

  • torch_compile (bool) – (expert option) enable torch.compile.

  • dataset_args (dict) – args passed to default DataSet creation.

  • dataloader_args (dict) – args passed to default DataLoader creation.

  • batch_sampler (bool) – use batch_sampler or not.

  • metric_sample (float or int) – sampling ratio for running metrics.

compile()

Compile pytorch ml objects.

compile_model()

Compile pytorch model.

Compile model based on self._model type, which is usually set by the __init__() or build_model() method. The compiled model is set to self.ml.model and moved to self._device.

compile_optimizer()

Compile pytorch optimizer and scheduler.

Compile optimizer based on self._optimizer type, which is usually set by __init__() method. Compiled optimizer is set to self.ml.optimizer.

compile_loss()

Compile pytorch loss.

Compile loss based on self._loss type, which is usually set by __init__() method. Compiled loss is set to self.ml.loss.

compile_device()

Compile device.

This method is valid only for multiprocessing mode so far. Devices are set based on pool_id.

load_model()

Load pre-trained pytorch model weights.

If model_path is given with : delimiter, e.g. my_model_path:features, only partial weights are loaded.
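As an illustration of the delimiter convention above (the path and submodule name here are hypothetical, not from the library):

```python
# Hypothetical model_path showing the ':' convention: the part before
# ':' is the checkpoint path, the part after names the submodule whose
# weights are loaded.
model_path = "my_model_path:features"
path, submodule = model_path.split(":", 1)
print(path, submodule)
```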

dump_model(extra_args=None)

Dump current pytorch model.

prepare_dataloaders(phases=None, dataset_args=None, dataloader_args=None, **kwargs)

Prepare dataloaders for all phases.

Parameters:
  • dataset_args (dict) – args passed to get_dataset. Dict format must be {‘train’: {…}, ‘valid’: {…}, ‘test’: {…}}.

  • dataloader_args (dict) – args passed to prepare_dataloader. Dict format must be {‘train’: {…}, ‘valid’: {…}, ‘test’: {…}}.

  • kwargs (dict) – arbitrary args.
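A minimal sketch of the per-phase dict format described above; the option keys shown inside each phase (preload, pin_memory) come from the prepare_dataloader() documentation below and are illustrative, not exhaustive:

```python
# One entry per phase: 'train', 'valid' and 'test'.
dataset_args = {
    'train': {'preload': True},   # options forwarded to get_dataset
    'valid': {'preload': True},
    'test': {'preload': False},
}
dataloader_args = {
    'train': {'pin_memory': True},  # options forwarded to prepare_dataloader
    'valid': {'pin_memory': True},
    'test': {'pin_memory': False},
}
```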

prepare_dataloader(data=None, phase=None, dataset_args=None, dataloader_args=None)

Prepare dataloader.

If data is given, tensor_dataset() is called. If data is None, storegate_dataset with the given phase is called.

Parameters:
  • data (ndarray) – data passed to tensor_dataset().

  • phase (str) – phase passed to storegate_dataset().

  • dataset_args (dict) – preload (bool): If True, all data are preloaded in the initialization of Dataset class. callbacks (list): list of data augmentation functions.

  • dataloader_args (dict) – pin_memory (bool): pin_memory for DataLoader. batch (bool): use a batch sampler or not.

Returns:

Pytorch dataloader instance.

Return type:

DataLoader
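The data-vs-phase dispatch described above can be sketched as follows; this is a simplified stand-in, not the actual implementation (the real method builds Dataset and DataLoader objects, here strings only mark which branch is taken):

```python
# Simplified stand-in for prepare_dataloader's dispatch logic.
def choose_dataset(data=None, phase=None):
    if data is not None:
        return 'tensor_dataset'            # wrap the ndarray in a TensorDataset
    return f'storegate_dataset({phase})'   # read the given phase from the storegate

print(choose_dataset(data=[0.1, 0.2]))
print(choose_dataset(phase='train'))
```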

fit(train_data=None, valid_data=None, dataloaders=None, valid_step=1, dump=False)

Train model over epoch.

This method trains and validates the model over epochs by calling the step_epoch() method. Train and valid data need to be provided by the train_data and valid_data options, or by the dataloaders option.

Parameters:
  • train_data (ndarray) – If train_data is given, data are converted to TensorDataset and set to dataloaders['train'].

  • valid_data (ndarray) – If valid_data is given, data are converted to TensorDataset and set to dataloaders['valid'].

  • dataloaders (dict) – dict of dataloaders, dict(train=xxx, valid=yyy).

  • valid_step (int) – step to process validation.

  • dump (bool) – If True, results are dumped together with model.

Returns:

history data of train and valid.

Return type:

list
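A usage sketch of the dataloaders option, assuming `task` is a compiled PytorchBaseTask as in the class example; plain objects stand in for real DataLoader instances here:

```python
# Placeholders stand in for real torch DataLoader instances.
train_loader, valid_loader = object(), object()

# The dataloaders option expects this dict shape:
dataloaders = dict(train=train_loader, valid=valid_loader)

# history = task.fit(dataloaders=dataloaders, valid_step=1)
```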

predict(data=None, dataloader=None, phase=None, label=False)

Predict model.

This method predicts and returns results. Data need to be provided by the data option, or by setting the dataloaders property directly.

Parameters:
  • data (ndarray) – If data is given, data are converted to TensorDataset and set to dataloaders['test'].

  • dataloader (obj) – dataloader instance.

  • phase (str) – ‘all’ or ‘train’ or ‘valid’ or ‘test’ to specify dataloaders.

  • label (bool) – If True, returns metric results based on labels.

Returns:

results of prediction.

Return type:

ndarray or list

step_epoch(epoch, phase, dataloader, label=True)

Process model for given epoch and phase.

ml.model, ml.optimizer and ml.loss need to be set before calling this method; see the compile() method.

Parameters:
  • epoch (int) – epoch number.

  • phase (str) – train mode or valid mode.

  • dataloader (obj) – dataloader instance.

  • label (bool) – If True, returns metric results based on labels.

Returns:

dict of result.

Return type:

dict

step_batch(data, phase, label=True)

Process batch data and update weights.

Parameters:
  • data (obj) – inputs and labels data.

  • phase (str) – train mode or valid mode or test mode.

  • label (bool) – If True, returns metric results based on labels.

Returns:

dict of result.

Return type:

dict

step_model(inputs)

Process model.

Parameters:

inputs (Tensor or list) – inputs data passed to model.

Returns:

outputs of model.

Return type:

Tensor or list

step_loss(outputs, labels)

Process loss function.

Parameters:
  • outputs (Tensor or list) – predicted data by model.

  • labels (Tensor or list) – true data.

Returns:

result of loss and subloss.

Return type:

dict

step_optimizer(loss)

Process optimizer.

Parameters:

loss (obj) – loss value.

get_dataset(data=None, phase=None, preload=False, callbacks=None)

Returns dataset from given ndarray data.

static get_tensor_dataset(data, callbacks=None)

Returns tensor dataset from given ndarray data.

get_storegate_dataset(phase, preload=False, callbacks=None)

Returns storegate dataset.

get_batch_sampler(phase, dataset)

Returns batch sampler.

add_device(data, device)

Add data to device.

fix_submodule(target)

Fix given parameters of model.

class multiml.task.pytorch.PytorchDDPTask(ddp=True, addr='localhost', port='12355', backend='nccl', find_unused_parameters=False, **kwargs)

Bases: PytorchBaseTask

Distributed data parallel (DDP) task for PyTorch model.

__init__(ddp=True, addr='localhost', port='12355', backend='nccl', find_unused_parameters=False, **kwargs)

Initialize the pytorch DDP task.

compile_model(rank=None, world_size=None)

Build model.

compile_device()

Compile device.

dump_model(extra_args=None)

Dump current pytorch model.

prepare_dataloader(rank, world_size, data=None, phase=None, dataset_args=None, dataloader_args=None)

Prepare dataloader.

get_distributed_sampler(phase, dataset, rank, world_size, batch=False)

Get batch sampler.

fix_submodule(target)

Fix given parameters of model.

execute()

Execute the pytorch DDP task.

Multiple processes are launched.

abstract execute_mp(rank=None, world_size=None)

User defined algorithms.

Examples

>>> setup(rank, world_size)
>>> # your algorithms
>>> # ...
>>> cleanup()
setup(rank, world_size)

Setup multi processing.

cleanup()

Cleanup multi processing.

class multiml.task.pytorch.PytorchClassificationTask(device='cpu', gpu_ids=None, torchinfo=False, amp=False, torch_compile=False, dataset_args=None, dataloader_args=None, batch_sampler=False, metric_sample=1, **kwargs)

Bases: PytorchBaseTask

Pytorch task for classification.

predict(**kwargs)

Predict model.

This method predicts and returns results. Data need to be provided by the data option, or by setting the dataloaders property directly.

Parameters:
  • data (ndarray) – If data is given, data are converted to TensorDataset and set to dataloaders['test'].

  • dataloader (obj) – dataloader instance.

  • phase (str) – ‘all’ or ‘train’ or ‘valid’ or ‘test’ to specify dataloaders.

  • label (bool) – If True, returns metric results based on labels.

Returns:

results of prediction.

Return type:

ndarray or list

class multiml.task.pytorch.PytorchASNGNASTask(asng_args, **kwargs)

Bases: ModelConnectionTask, PytorchBaseTask

__init__(asng_args, **kwargs)
Parameters:
  • subtasks (list) – list of task instances.

  • **kwargs – Arbitrary keyword arguments.

build_model()

Build model.

set_most_likely()
best_model()
get_most_likely()
get_thetas()
fit(train_data=None, valid_data=None, dataloaders=None, valid_step=1, sampler=None, rank=None, **kwargs)

Train model over epoch.

This method trains and validates the model over epochs by calling the train_model() method. Train and valid data need to be provided by the train_data and valid_data options, or by the dataloaders option.

Parameters:
  • train_data (ndarray) – If train_data is given, data are converted to TensorDataset and set to dataloaders['train'].

  • valid_data (ndarray) – If valid_data is given, data are converted to TensorDataset and set to dataloaders['valid'].

  • dataloaders (dict) – dict of dataloaders, dict(train=xxx, valid=yyy).

  • valid_step (int) – step to process validation.

  • sampler (obj) – sampler on which set_epoch() is executed.

  • kwargs (dict) – arbitrary args passed to train_model().

Returns:

history data of train and valid.

Return type:

list

step_epoch(epoch, phase, dataloader, label)

Process model for given epoch and phase.

ml.model, ml.optimizer and ml.loss need to be set before calling this method; see the compile() method.

Parameters:
  • epoch (int) – epoch number.

  • phase (str) – train mode or valid mode.

  • dataloader (obj) – dataloader instance.

  • label (bool) – If True, returns metric results based on labels.

Returns:

dict of result.

Return type:

dict

finalize()

Finalize base task.

Users implement their algorithms.

get_submodel_names()
get_submodel(i_models)
asng()
class multiml.task.pytorch.PytorchASNGNASBlockTask(subtasks, job_id, **kwargs)

Bases: PytorchBaseTask

__init__(subtasks, job_id, **kwargs)
Parameters:
  • subtasks (list) – list of task instances.

  • **kwargs – Arbitrary keyword arguments.

build_model()

Build model.

get_input_true_data(phase)

Get input and true data.

Parameters:

phase (str) – data type (train, valid, test or None).

Returns:

(input, true) data for model.

Return type:

tuple

get_storegate_dataset(phase)

Returns storegate dataset.

get_submodel_names()
get_inputs()
get_submodel(i_models)