multiml.task.pytorch.pytorch_base module
PytorchBaseTask module.
- class multiml.task.pytorch.pytorch_base.PytorchBaseTask(device='cpu', gpu_ids=None, torchinfo=False, amp=False, torch_compile=False, dataset_args=None, dataloader_args=None, batch_sampler=False, metric_sample=1, **kwargs)
Bases: MLBaseTask
Base task for PyTorch model.
Examples
>>> import torch.nn as nn
>>> from multiml.task.pytorch.pytorch_base import PytorchBaseTask
>>>
>>> # your pytorch model
>>> class MyPytorchModel(nn.Module):
...     def __init__(self, inputs=2, outputs=2):
...         super(MyPytorchModel, self).__init__()
...
...         self.fc1 = nn.Linear(inputs, outputs)
...         self.relu = nn.ReLU()
...
...     def forward(self, x):
...         return self.relu(self.fc1(x))
>>>
>>> # create task instance (storegate is assumed to be an existing StoreGate instance)
>>> task = PytorchBaseTask(storegate=storegate,
...                        model=MyPytorchModel,
...                        input_var_names=('x0', 'x1'),
...                        output_var_names='outputs-pytorch',
...                        true_var_names='labels',
...                        optimizer='SGD',
...                        optimizer_args=dict(lr=0.1),
...                        loss='CrossEntropyLoss')
>>> task.set_hps({'num_epochs': 5})
>>> task.execute()
>>> task.finalize()
- __init__(device='cpu', gpu_ids=None, torchinfo=False, amp=False, torch_compile=False, dataset_args=None, dataloader_args=None, batch_sampler=False, metric_sample=1, **kwargs)
Initialize the pytorch base task.
- Parameters:
device (str or obj) – pytorch device, e.g. ‘cpu’, ‘cuda’.
gpu_ids (list) – GPU identifiers, e.g. [0, 1, 2]. data_parallel mode is enabled if gpu_ids is given.
torchinfo (bool) – show torchinfo summary after model compile.
amp (bool) – (expert option) enable automatic mixed precision (AMP) mode.
torch_compile (bool) – (expert option) enable torch.compile.
dataset_args (dict) – args passed to default DataSet creation.
dataloader_args (dict) – args passed to default DataLoader creation.
batch_sampler (bool) – whether to use a batch sampler.
metric_sample (float or int) – sampling ratio for running metrics.
- compile()
Compile pytorch ml objects.
- compile_model()
Compile pytorch model.
Compile model based on self._model type, which is usually set by the __init__() or build_model() method. The compiled model is set to self.ml.model and moved to self._device.
- compile_optimizer()
Compile pytorch optimizer and scheduler.
Compile optimizer based on self._optimizer type, which is usually set by the __init__() method. The compiled optimizer is set to self.ml.optimizer.
- compile_loss()
Compile pytorch loss.
Compile loss based on self._loss type, which is usually set by the __init__() method. The compiled loss is set to self.ml.loss.
- compile_device()
Compile device.
This method is valid only for multiprocessing mode so far. Devices are set based on pool_id.
- load_model()
Load pre-trained pytorch model weights.
If model_path is given with a : delimiter, e.g. my_model_path:features, only partial weights are loaded.
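The delimiter convention can be illustrated with a small sketch. It assumes that the part after the : names a submodule and that partial loading keeps only the weights whose keys start with that name; the documentation above does not spell this out. split_model_path and select_weights are hypothetical helpers, not part of multiml, and a plain dict stands in for a PyTorch state dict.

```python
def split_model_path(model_path):
    """Split 'path:submodule' into (path, submodule); submodule is None if absent."""
    path, sep, submodule = model_path.rpartition(':')
    if not sep:
        # No delimiter found: the whole string is the path.
        return model_path, None
    return path, submodule


def select_weights(state_dict, submodule):
    """Keep only entries under the given submodule prefix (assumed behaviour)."""
    if submodule is None:
        return dict(state_dict)
    prefix = submodule + '.'
    return {key: value for key, value in state_dict.items() if key.startswith(prefix)}


# A plain dict standing in for a saved state dict (values are placeholders).
state_dict = {
    'features.0.weight': 1,
    'features.0.bias': 2,
    'classifier.weight': 3,
}

path, submodule = split_model_path('my_model_path:features')
partial = select_weights(state_dict, submodule)
```

With these inputs, partial holds only the two features.* entries, while a path without a delimiter selects everything.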
- dump_model(extra_args=None)
Dump current pytorch model.
- prepare_dataloaders(phases=None, dataset_args=None, dataloader_args=None, **kwargs)
Prepare dataloaders for all phases.
- Parameters:
dataset_args (dict) – args passed to get_dataset. Dict format must be {‘train’: {…}, ‘valid’: {…}, ‘test’: {…}}.
dataloader_args (dict) – args passed to prepare_dataloader. Dict format must be {‘train’: {…}, ‘valid’: {…}, ‘test’: {…}}.
kwargs (dict) – arbitrary args.
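The per-phase dict format can be sketched as follows. The preload and pin_memory keys are taken from the prepare_dataloader() description; args_for_phase is a hypothetical helper that only illustrates how such a dict might be split by phase, and is not the library's implementation.

```python
PHASES = ('train', 'valid', 'test')


def args_for_phase(per_phase_args, phase):
    """Return a copy of the args for one phase, or {} if none were given."""
    if per_phase_args is None:
        return {}
    unknown = set(per_phase_args) - set(PHASES)
    if unknown:
        raise ValueError(f'unexpected phase keys: {sorted(unknown)}')
    return dict(per_phase_args.get(phase, {}))


# One entry per phase, as required by the documented format.
dataset_args = {
    'train': {'preload': True},
    'valid': {'preload': True},
    'test': {'preload': False},
}
dataloader_args = {
    'train': {'pin_memory': True},
    'valid': {'pin_memory': True},
    'test': {'pin_memory': False},
}

train_dataset_args = args_for_phase(dataset_args, 'train')
test_loader_args = args_for_phase(dataloader_args, 'test')
```

Under this reading, the two dicts would be passed as task.prepare_dataloaders(dataset_args=dataset_args, dataloader_args=dataloader_args).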
- prepare_dataloader(data=None, phase=None, dataset_args=None, dataloader_args=None)
Prepare dataloader.
If data is given, tensor_dataset() is called. If data is None, storegate_dataset() is called with the given phase.
- Parameters:
data (ndarray) – data passed to tensor_dataset().
phase (str) – phase passed to storegate_dataset().
dataset_args (dict) – preload (bool): If True, all data are preloaded in the initialization of Dataset class. callbacks (list): list of data augmentation functions.
dataloader_args (dict) – pin_memory (bool): pin_memory for DataLoader. batch (bool): whether to use a batch sampler.
- Returns:
Pytorch dataloader instance.
- Return type:
DataLoader
- fit(train_data=None, valid_data=None, dataloaders=None, valid_step=1, dump=False)
Train model over epochs.
This method trains and validates the model over epochs by calling the step_epoch() method. Training and validation data need to be provided by the train_data and valid_data options, or by the dataloaders option.
- Parameters:
train_data (ndarray) – If train_data is given, data are converted to TensorDataset and set to dataloaders['train'].
valid_data (ndarray) – If valid_data is given, data are converted to TensorDataset and set to dataloaders['valid'].
dataloaders (dict) – dict of dataloaders, dict(train=xxx, valid=yyy).
valid_step (int) – step to process validation.
dump (bool) – If True, results are dumped together with model.
- Returns:
history data of train and valid.
- Return type:
list
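The control flow described for fit() can be sketched roughly as below. This is a simplified illustration, not the library's code: it assumes that valid_step is an interval in epochs, that epochs are counted from 1, and that the history is a list with one dict per epoch. fit_sketch and fake_step_epoch are hypothetical names, and plain lists stand in for dataloaders.

```python
def fit_sketch(step_epoch, dataloaders, num_epochs, valid_step=1):
    """Run train every epoch and valid every `valid_step` epochs (assumed meaning)."""
    history = []
    for epoch in range(1, num_epochs + 1):
        record = {
            'epoch': epoch,
            'train': step_epoch(epoch, 'train', dataloaders['train']),
        }
        if epoch % valid_step == 0:
            record['valid'] = step_epoch(epoch, 'valid', dataloaders['valid'])
        history.append(record)
    return history


def fake_step_epoch(epoch, phase, dataloader):
    """Stand-in for step_epoch(): returns a dict of results for one pass."""
    return {'loss': 1.0 / epoch, 'num_batches': len(dataloader)}


# Lists of batches standing in for real DataLoader instances.
dataloaders = {'train': [[0, 1], [2, 3]], 'valid': [[4, 5]]}
history = fit_sketch(fake_step_epoch, dataloaders, num_epochs=4, valid_step=2)
```

Here validation results appear only in the records for epochs 2 and 4.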
- predict(data=None, dataloader=None, phase=None, label=False)
Predict model.
This method predicts and returns results. Data need to be provided by the data option, or by setting the dataloaders dictionary.
- Parameters:
data (ndarray) – If data is given, data are converted to TensorDataset and set to dataloaders['test'].
dataloader (obj) – dataloader instance.
phase (str) – ‘all’ or ‘train’ or ‘valid’ or ‘test’ to specify dataloaders.
label (bool) – If True, returns metric results based on labels.
- Returns:
results of prediction.
- Return type:
ndarray or list
- step_epoch(epoch, phase, dataloader, label=True)
Process model for given epoch and phase.
ml.model, ml.optimizer and ml.loss need to be set before calling this method; please see the compile() method.
- Parameters:
epoch (int) – epoch number.
phase (str) – train mode or valid mode.
dataloader (obj) – dataloader instance.
label (bool) – If True, returns metric results based on labels.
- Returns:
dict of result.
- Return type:
dict
- step_batch(data, phase, label=True)
Process batch data and update weights.
- Parameters:
data (obj) – inputs and labels data.
phase (str) – train mode or valid mode or test mode.
label (bool) – If True, returns metric results based on labels.
- Returns:
dict of result.
- Return type:
dict
- step_model(inputs)
Process model.
- Parameters:
inputs (Tensor or list) – inputs data passed to model.
- Returns:
outputs of model.
- Return type:
Tensor or list
- step_loss(outputs, labels)
Process loss function.
- Parameters:
outputs (Tensor or list) – predicted data by model.
labels (Tensor or list) – true data.
- Returns:
result of loss and subloss.
- Return type:
dict
- step_optimizer(loss)
Process optimizer.
- Parameters:
loss (obj) – loss value.
- get_dataset(data=None, phase=None, preload=False, callbacks=None)
Returns dataset from given ndarray data.
- static get_tensor_dataset(data, callbacks=None)
Returns tensor dataset from given ndarray data.
- get_storegate_dataset(phase, preload=False, callbacks=None)
Returns storegate dataset.
- get_batch_sampler(phase, dataset)
Returns batch sampler.
- add_device(data, device)
Add data to device.
- fix_submodule(target)
Fix given parameters of model.