multiml package
Subpackages
- multiml.agent package
- multiml.database package
- multiml.task package
- Subpackages
- Submodules
- Module contents
Task
BaseTask
MLBaseTask
MLBaseTask.__init__()
MLBaseTask.set_hps()
MLBaseTask.execute()
MLBaseTask.fit()
MLBaseTask.predict()
MLBaseTask.update()
MLBaseTask.fit_predict()
MLBaseTask.predict_update()
MLBaseTask.phases
MLBaseTask.input_var_names
MLBaseTask.output_var_names
MLBaseTask.save_var_names
MLBaseTask.pred_var_names
MLBaseTask.true_var_names
MLBaseTask.ml
MLBaseTask.compile()
MLBaseTask.build_model()
MLBaseTask.compile_var_names()
MLBaseTask.compile_model()
MLBaseTask.compile_optimizer()
MLBaseTask.compile_loss()
MLBaseTask.load_model()
MLBaseTask.dump_model()
MLBaseTask.load_metadata()
MLBaseTask.get_input_true_data()
MLBaseTask.get_input_var_shapes()
MLBaseTask.get_metadata()
MLBaseTask.get_pred_index()
MLBaseTask.do_train()
MLBaseTask.do_valid()
MLBaseTask.do_test()
MLBaseTask.show_info()
SkleanPipelineTask
Submodules
- multiml.const module
- multiml.hyperparameter module
Hyperparameter
Hyperparameters
Hyperparameters.__init__()
Hyperparameters.__len__()
Hyperparameters.__getitem__()
Hyperparameters.__contains__()
Hyperparameters.step()
Hyperparameters.add_hp_from_dict()
Hyperparameters.add_hp()
Hyperparameters.set_min_hps()
Hyperparameters.set_max_hps()
Hyperparameters.get_current_hps()
Hyperparameters.get_grid_hps()
Hyperparameters.get_hp_names()
- multiml.logger module
- multiml.saver module
- multiml.storegate module
StoreGate
StoreGate.__init__()
StoreGate.__getitem__()
StoreGate.__setitem__()
StoreGate.__delitem__()
StoreGate.__len__()
StoreGate.__contains__()
StoreGate.data_id
StoreGate.set_data_id()
StoreGate.backend
StoreGate.add_data()
StoreGate.update_data()
StoreGate.get_data()
StoreGate.delete_data()
StoreGate.clear_data()
StoreGate.create_empty()
StoreGate.get_data_ids()
StoreGate.get_var_names()
StoreGate.get_var_shapes()
StoreGate.get_metadata()
StoreGate.astype()
StoreGate.onehot()
StoreGate.argmax()
StoreGate.shuffle()
StoreGate.set_mode()
StoreGate.to_memory()
StoreGate.to_storage()
StoreGate.compile()
StoreGate.show_info()
- multiml.task_scheduler module
tasktuple
subtasktuple
TaskScheduler
TaskScheduler.__init__()
TaskScheduler.__len__()
TaskScheduler.__getitem__()
TaskScheduler.add_task()
TaskScheduler.add_ordered_tasks()
TaskScheduler.add_ordered_subtasks()
TaskScheduler.add_subtask()
TaskScheduler.get_subtasks()
TaskScheduler.get_subtask_ids()
TaskScheduler.get_subtask()
TaskScheduler.get_parents_task_ids()
TaskScheduler.get_children_task_ids()
TaskScheduler.get_sorted_task_ids()
TaskScheduler.get_subtasks_with_hps()
TaskScheduler.get_all_subtasks_with_hps()
TaskScheduler.get_subtasks_pipeline()
TaskScheduler.show_info()
Module contents
- class multiml.StoreGate(backend='numpy', backend_args=None, data_id=None)
Bases: object
Data management class for multiml execution.
StoreGate provides common interfaces to manage data between multiml agents and tasks, with the following features:
- Different backends are supported (numpy or zarr, and a hybrid of them),
- Data are split into train, valid and test phases for ML,
- Data are retrieved by var_names, phase and index options.
Each dataset in the storegate is keyed by a unique data_id. All data in the dataset are identified by var_names (column names). The number of samples in a phase is assumed to be the same for all variables in multiml agents and tasks. The compile() method ensures the validity of the dataset.
Examples
>>> from multiml.storegate import StoreGate
>>>
>>> # User defined parameters
>>> var_names = ['var0', 'var1', 'var2']
>>> data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
>>> phase = (0.5, 0.25, 0.25)  # fraction of train, valid, test
>>>
>>> # Add data to storegate
>>> storegate = StoreGate(backend='numpy', data_id='test_id')
>>> storegate.add_data(var_names=var_names, data=data, phase=phase)
>>>
>>> # Get data from storegate
>>> storegate.get_data(var_names=var_names, phase='train')
>>> storegate['train'][var_names][0]
- __init__(backend='numpy', backend_args=None, data_id=None)
Initialize the storegate and the backend architecture.
Initialize the storegate and its backend with the given options. The numpy backend manages data in memory, the zarr backend reads and writes data to storage at a given path, and the hybrid backend is a combination of the numpy and zarr backends, which allows data to be moved between memory and storage.
- Parameters:
backend (str) – numpy (on memory), zarr (on storage), or hybrid.
backend_args (dict) – backend options, e.g. path to the zarr database. Please see the ZarrDatabase and HybridDatabase classes for details.
data_id (str) – set the default data_id if given.
- __getitem__(item)
Retrieve data by python getitem syntax.
Retrieve data by python getitem syntax, i.e. storegate[phase][var_names][index]. data_id, phase, var_names and index need to be given to return selected data. If all parameters are set, the selected data are returned. Otherwise, a self instance with the given parameters is returned.
- Parameters:
item (str or list or int or slice) – If item is the str train, valid or test, phase is set. If item is any other str or a list of strs, var_names is set. If item is an int or slice, data with the given index (slice) are returned.
- Returns:
please see description above.
- Return type:
self or ndarray
Example
>>> # get all train data
>>> storegate['train']['var0'][:]
>>> # slice train data by index
>>> storegate['train']['var0'][0:2]
>>> # loop by index
>>> for data in storegate['train']['var0']:
>>>     print(data)
- __setitem__(item, data)
Update data by python setitem syntax.
Update data by python setitem syntax, i.e. storegate[phase][var_names][index] = data. data_id, phase, var_names and index need to be given to update data.
- Parameters:
item (int or slice) – Index of data to be updated.
data (list or ndarray) – new data.
Example
>>> # update all train data
>>> storegate['train']['var0'][:] = data
>>> # update train data by index
>>> storegate['train']['var0'][0:2] = data[0:2]
- __delitem__(item)
Delete data by python delitem syntax.
Delete data by python delitem syntax, i.e. del storegate[phase][var_names]. data_id, phase and var_names need to be given to delete data.
- Parameters:
item (str or list) – var_names to be deleted.
Example
>>> # delete var0 from train phase
>>> del storegate['train']['var0']
- __len__()
Returns the number of samples for the given phase and data_id.
- Returns:
the number of samples in given conditions.
- Return type:
int
Examples
>>> len(storegate['train'])
>>> len(storegate['test'])
- __contains__(item)
Check if the given var_name is available in storegate.
- Parameters:
item (str) – name of variables.
- Returns:
whether item exists in the given conditions.
- Return type:
bool
Examples
>>> 'var0' in storegate['train']
>>> 'var1' in storegate['test']
- property data_id
Returns the current data_id.
- Returns:
the current data_id.
- Return type:
str
- set_data_id(data_id)
Set the default data_id and initialize the backend.
If the default data_id is set, all methods defined in storegate, e.g. add_data(), use the default data_id to manage data.
- Parameters:
data_id (str) – the default data_id.
- property backend
Return the current backend of storegate.
- Returns:
numpy or zarr or hybrid.
- Return type:
str
- add_data(var_names, data, phase='train', shuffle=False, do_compile=False)
Add data to the storegate with given options.
If var_names already exists in the given data_id and phase, the data are appended; otherwise var_names are newly registered and the data are stored.
- Parameters:
var_names (str or list) – list of variable names, e.g. [‘var0’, ‘var1’, ‘var2’]. Single string, e.g. ‘var0’, is also allowed to add only one variable.
data (list or ndarray) – If var_names is a single string, data shape must be (N, k), where N is the number of samples and k is an arbitrary shape of each data. If var_names is a tuple, data shape must be (N, M, k), where M is the number of variables. If var_names is a list, data must be a list of [(N, k), (N, k), (N, k), ...], where different shapes of k are allowed.
phase (str or tuple or list) – all (auto), train, valid, test or tuple. all divides the data into train, valid and test automatically, but only after the compile. If a tuple (x, y, z) is given, the data are divided into train, valid and test. If the contents of the tuple are floats and their sum is 1.0, the data are split into phases with fractions of (x, y, z) respectively. If the contents of the tuple are ints, the data are split at the given indexes.
shuffle (bool or int) – data are shuffled if True or int. If an int is given, it is used as the random seed of np.random.
do_compile (bool) – do compile if True after adding data.
Examples
>>> # add data to train phase
>>> storegate.add_data(var_names='var0', data=np.array([0, 1, 2]), phase='train')
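The float-tuple phase splitting described above can be sketched in plain numpy. This is only an illustration of the documented semantics (split_by_phase is a hypothetical helper), not the actual StoreGate implementation:

```python
import numpy as np

def split_by_phase(data, phase):
    """Split samples into train/valid/test phases.

    Illustrates the documented semantics: a float tuple summing to 1.0
    is treated as phase fractions; an int tuple as explicit split indexes.
    """
    data = np.asarray(data)
    n = len(data)
    if all(isinstance(p, float) for p in phase):
        # fractions of (train, valid, test)
        n_train = int(n * phase[0])
        n_valid = int(n * phase[1])
        bounds = (n_train, n_train + n_valid)
    else:
        # explicit split indexes
        bounds = (phase[0], phase[1])
    return data[:bounds[0]], data[bounds[0]:bounds[1]], data[bounds[1]:]

train, valid, test = split_by_phase(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], (0.5, 0.25, 0.25))
```

With four samples and fractions (0.5, 0.25, 0.25), two samples land in train and one each in valid and test.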
- update_data(var_names, data, phase='train', index=-1, do_compile=True)
Update data in storegate with given options.
Update (replace) data in the storegate. If var_names does not exist in the given data_id and phase, the data are newly added. Otherwise, the selected data are replaced with the given data.
- Parameters:
var_names (str or list(str)) – see the add_data() method.
data (list or ndarray) – see the add_data() method.
phase (str or tuple) – see the add_data() method.
index (int or tuple) – If index is -1 (default), all data are updated for the given options. If index is an int, only the data with that index are updated. If index is (x, y), data in the range (x, y) are updated.
do_compile (bool) – do compile if True after updating data.
Examples
>>> # update data of train phase
>>> storegate.update_data(var_names='var0', data=[1], phase='train', index=1)
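The index semantics above (-1 for all, int for a single sample, tuple for a range) can be sketched as a mapping to python slices. resolve_index is a hypothetical helper illustrating the description, not StoreGate code:

```python
import numpy as np

def resolve_index(index):
    """Map the documented index argument to a python slice.

    -1 selects all samples, an int selects a single sample,
    and a tuple (x, y) selects the range x..y.
    """
    if index == -1:
        return slice(None)
    if isinstance(index, tuple):
        return slice(index[0], index[1])
    return slice(index, index + 1)

var0 = np.arange(5)
var0[resolve_index((1, 3))] = [10, 20]  # update a range
var0[resolve_index(0)] = [99]           # update a single sample
```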
- get_data(var_names, phase='train', index=-1)
Retrieve data from storegate with given options.
Get data from the storegate. Python getitem syntax is also supported; please see the __getitem__ method.
- Parameters:
var_names (tuple or list or str) – If a tuple of variable names is given, e.g. (‘var0’, ‘var1’, ‘var2’), data with ndarray format are returned. Single string, e.g. ‘var0’, is also allowed. Please see the matrix below for shape of data. If list of variable names is given, e.g. [‘var0’, ‘var1’, ‘var2’], list of ndarray data for each variable are returned.
phase (str or None) – all, train, valid, test or None. If phase is all or None, data in all phases are returned, but this is allowed only after the compile.
index (int or tuple) – see the update_data() method.
- Returns:
selected data by given options.
- Return type:
ndarray or list
- Shape of returns:
>>> # index var_names      | single var | tuple vars
>>> # ------------------------------------------------------------
>>> # single index (>=0)   | k          | (M, k)
>>> # otherwise            | (N, k)     | (N, M, k)
>>> # ------------------------------------------------------------
>>> # k = arbitrary shape of data
>>> # M = number of var_names
>>> # N = number of samples
Examples
>>> # get data by var_names, phase and index
>>> storegate.get_data(var_names='var0', phase='train', index=1)
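The shape matrix above can be reproduced with plain numpy stand-ins (N=4 samples, M=2 variables, scalar elements so k is an empty shape); these are illustrative arrays, not actual StoreGate calls:

```python
import numpy as np

# Two variables with N=4 scalar samples each
var0 = np.array([0, 1, 2, 3])
var1 = np.array([4, 5, 6, 7])
stacked = np.stack([var0, var1], axis=1)  # tuple of vars: shape (N, M)

single_var_single_index = var0[1]     # shape k (a scalar here)
single_var_all = var0                 # shape (N, k) -> (4,)
tuple_vars_single_index = stacked[1]  # shape (M, k) -> (2,)
tuple_vars_all = stacked              # shape (N, M, k) -> (4, 2)
```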
- delete_data(var_names, phase='train', do_compile=True)
Delete data associated with var_names.
All data associated with var_names are deleted. Partial deletion by index is not supported for now.
- Parameters:
var_names (str or list) – see the add_data() method.
phase (str) – see the update_data() method.
do_compile (bool) – do compile if True after deletion.
Examples
>>> # delete data associated with var_names
>>> storegate.delete_data(var_names='var0', phase='train')
- clear_data()
Delete all data in the current data_id and backend.
- create_empty(var_names, shape, phase='train', dtype='f4')
Create empty data in the current data_id and backend.
- Parameters:
var_names (str or list) – see the add_data() method.
shape (tuple) – shape of the empty data.
phase (str) – see the update_data() method.
dtype (str) – dtype of the empty data. Default float32.
- get_data_ids()
Returns registered data_ids in the backend.
- Returns:
list of registered data_ids.
- Return type:
list
- get_var_names(phase='train')
Returns registered var_names for given phase.
- Parameters:
phase (str) – train or valid or test.
- Returns:
list of variable names.
- Return type:
list
- get_var_shapes(var_names, phase='train')
Returns shapes of variables for given phase.
- Parameters:
var_names (str or list) – variable names.
phase (str) – train or valid or test.
- Returns:
shape of a variable, or list of shapes.
- Return type:
ndarray.shape or list
- get_metadata()
Returns a dict of metadata.
The metadata is available only after compile.
- Returns:
dict of metadata. Please see below for contents.
- Return type:
dict
- Metadata contents:
>>> {
>>>     'compiled': 'compiled or not',
>>>     'total_events': 'total events, sum of each phase',
>>>     'sizes': {
>>>         'train': 'total events of train phase',
>>>         'valid': 'total events of valid phase',
>>>         'test': 'total events of test phase',
>>>         'all': 'total events',
>>>     },
>>>     'valid_phases': 'phases containing events',
>>> }
- astype(var_names, dtype, phase='train')
Convert data type to given dtype (operation is limited by memory)
- Parameters:
var_names (str or list) – see the add_data() method.
dtype (numpy.dtype) – a numpy dtype. Please see the numpy documentation.
phase (str) – all, train, valid, test.
- onehot(var_names, num_classes, phase='train')
Convert data to onehot vectors (operation is limited by memory)
- Parameters:
var_names (str or list) – see the add_data() method.
num_classes (int) – the number of classes.
phase (str) – all, train, valid, test.
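The one-hot conversion described above can be sketched in a few lines of numpy. to_onehot is a hypothetical helper illustrating the operation, not the StoreGate internals:

```python
import numpy as np

def to_onehot(labels, num_classes):
    """Convert integer class labels to one-hot vectors (float32)."""
    labels = np.asarray(labels, dtype=int)
    onehot = np.zeros((len(labels), num_classes), dtype='f4')
    onehot[np.arange(len(labels)), labels] = 1.0
    return onehot

vectors = to_onehot([0, 2, 1], num_classes=3)
```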
- argmax(var_names, axis, phase='train')
Convert data to argmax (operation is limited by memory)
- Parameters:
var_names (str or list) – see the add_data() method.
axis (int) – specifies the axis.
phase (str) – all, train, valid, test.
- shuffle(phase='all', seed=0)
Shuffle data in given phase.
- Parameters:
phase (str) – all, train, valid, test.
seed (int) – seed of numpy.random
- set_mode(mode)
Set backend mode of hybrid architecture.
This method is valid only for the hybrid database. If mode is numpy, data will basically be written to memory; if mode is zarr, data will be written to storage.
- Parameters:
mode (str) – numpy or zarr.
- to_memory(var_names, phase='train', output_var_names=None, callback=None)
Move data from storage to memory.
This method is valid only for the hybrid backend. It should be effective in reducing data I/O impacts.
- Parameters:
var_names (str or list) – see the add_data() method.
phase (str) – all, train, valid, test.
output_var_names (str or list) – new var_names in numpy mode.
callback (obj) – callback function, which receives var_names and data and returns new var_names and data.
- to_storage(var_names, phase='train', output_var_names=None, callback=None)
Move data from memory to storage.
This method is valid only for the hybrid backend. It is useful when data are large and need to be offloaded to storage.
- Parameters:
var_names (str or list) – see the add_data() method.
phase (str) – all, train, valid, test.
output_var_names (str or list) – new var_names in zarr mode.
callback (obj) – callback function, which receives var_names and data and returns new var_names and data.
- compile(reset=False, show_info=False)
Check if registered samples are valid.
It is assumed that compile is always called after the add_data() or update_data() methods to validate the registered data.
- Parameters:
reset (bool) – the special variable active is (re)set if True. The active variable is used to indicate whether samples should be used, e.g. in the metric calculation.
show_info (bool) – show information after compile.
- show_info()
Show information currently registered in storegate.
- class multiml.Saver(save_dir=None, serial_id=None, mode='zarr', recreate=False)
Bases: object
Miscellaneous object management class.
Dictionary to save miscellaneous objects, providing utility methods to manage ML metadata. There are two types of backends, zarr and dict, to store objects. The zarr mode stores persistent objects, and the dict mode stores temporary objects.
Examples
>>> from multiml import Saver
>>> saver = Saver()
>>> print(saver.save_dir)
>>> saver['key0'] = obj0
>>> saver['key1'] = obj1
- __init__(save_dir=None, serial_id=None, mode='zarr', recreate=False)
Initialize Saver and create the base directory.
- Parameters:
save_dir (str) – directory path to save objects. If None is given, a temporary directory is created automatically by the tempfile.mkdtemp() method.
serial_id (int) – suffix of save_dir, i.e. save_dir.*serial_id*. If None is given, serial_id is incremented automatically based on the existence of the directory.
mode (str) – zarr or dict for default metadata management.
recreate (bool) – recreate the zarr database if True.
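The automatic serial_id increment can be sketched as probing save_dir.0, save_dir.1, ... until an unused suffix is found. next_serial_dir is a hypothetical helper illustrating the described behaviour, not Saver's actual implementation:

```python
import os
import tempfile

def next_serial_dir(base_dir):
    """Return base_dir.<serial_id> for the first serial_id not yet on disk."""
    serial_id = 0
    while os.path.exists(f'{base_dir}.{serial_id}'):
        serial_id += 1
    return f'{base_dir}.{serial_id}'

base = os.path.join(tempfile.mkdtemp(), 'saver')
first = next_serial_dir(base)   # no directory exists yet -> suffix .0
os.makedirs(first)
second = next_serial_dir(base)  # .0 now exists -> suffix .1
```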
- __len__()
Returns the number of stored objects in zarr and dict.
- Returns:
the total number of stored objects in zarr and dict.
- Return type:
int
- __setitem__(key, obj)
Set key and store object to the default backend.
key and obj are stored to the default backend, zarr or dict.
- Parameters:
key (str) – unique identifier of given object.
obj (obj) – arbitrary object to be stored.
- __getitem__(key)
Returns object for given key.
key is searched in both the zarr and dict backends regardless of the default backend mode.
- Parameters:
key (str) – unique identifier to retrieve object.
- Returns:
arbitrary object.
- Return type:
obj
- __delitem__(key)
Delete key and object from backends.
- Parameters:
key (str) – unique identifier to be deleted.
- init_zarr(recreate=False)
Initialize zarr database and confirm connection.
- Parameters:
recreate (bool) – If recreate is True, the existing database is overwritten by an empty database.
- set_mode(mode)
Set default database (backend) mode.
- Parameters:
mode (str) – zarr or dict.
- keys(mode=None)
Return registered keys in backends.
- Parameters:
mode (str) – If zarr is given, keys in zarr database are returned. If dict is given, keys in dict database are returned. If None (default) all keys stored in the both backends are returned.
- Returns:
list of registered keys.
- Return type:
list
- add(key, obj, mode=None, check=False)
Add object to given backend by key.
If the given key already exists in the given backend, the object is overwritten. If the key already exists in the other backend, an error is raised, which can be avoided by setting check = False. If mode is None, the default backend is used to store the object.
- Parameters:
key (str) – unique identifier of given object.
obj (obj) – arbitrary object to be stored.
mode (str) – zarr or dict to specify the backend database.
check (bool) – If True, consistency between the backends is checked.
- delete(key, mode=None)
Delete key and object from the backends.
- Parameters:
key (str) – unique identifier to be deleted.
- save()
Save the objects registered in dict to zarr.
- to_memory(key)
Move object from zarr to dict.
- Parameters:
key (str) – the unique identifier to be moved.
- to_storage(key)
Move object from dict to storage.
- Parameters:
key (str) – the unique identifier to be moved.
- property save_dir
Returns the name of base directory of Saver.
- Returns:
the name of base directory.
- Return type:
str
- dump_ml(key, suffix=None, ml_type=None, **kwargs)
Dump machine learning models and parameters.
- Parameters:
key (str) – the unique identifier to store metadata.
suffix (str) – arbitrary suffix to key (e.g. job_id, epoch) to avoid conflicts.
ml_type (str) – keras or pytorch or None. If it is None, just kwargs are dumped, which means the ML model is not dumped.
kwargs – arbitrary arguments. Only standard types (int, float, str, list, dict) are dumped due to a limitation of pickle.
- load_ml(key, suffix=None)
Load machine learning models and parameters.
- Parameters:
key (str) – the unique identifier to load metadata.
suffix (str) – arbitrary suffix to key (e.g. job_id, epoch).
- Returns:
arbitrary object.
- Return type:
obj
- class multiml.Hyperparameters(hps=None)
Bases: object
Utility class to manage Hyperparameter classes.
The Hyperparameters class provides interfaces to manage Hyperparameter class instances. A Hyperparameters class instance should be passed to TaskScheduler together with the corresponding subtask.
Examples
>>> hps_dict = {
>>>     'hp_layers': [5, 10, 15, 20],  # discrete
>>>     'hp_alpha': [1.0, 2.0, 3.0]    # discrete
>>> }
>>> hps = Hyperparameters(hps_dict)
>>> hps.set_min_hps()
>>> hps.get_current_hps()
>>> -> {'hp_layers': 5, 'hp_alpha': 1.0}
>>> hps.step('hp_layers', 2)
>>> hps.get_current_hps()
>>> -> {'hp_layers': 15, 'hp_alpha': 1.0}
>>> hps.step('hp_alpha', 1)
>>> hps.get_current_hps()
>>> -> {'hp_layers': 15, 'hp_alpha': 2.0}
- __init__(hps=None)
Initialize Hyperparameters class.
The hps option provides a shortcut to register hyperparameters. This option works only for discrete hyperparameters for now.
- Parameters:
hps (dict) – a dictionary of hyperparameters. Please see the add_hp_from_dict() method.
- __len__()
Returns the number of all possible combinations of hyperparameters.
- __getitem__(item)
Returns registered Hyperparameter class instance by index.
If item is a str, the Hyperparameter is searched by its name, and the class instance is returned if it exists. If item is an int, a dictionary of selected hyperparameters from the get_grid_hps() method is returned.
- Parameters:
item (str or int) – the name of a hyperparameter, or an index into all possible combinations from the get_grid_hps() method.
- Returns:
please see the above description.
- Return type:
Hyperparameter or dict
- __contains__(item)
Check if Hyperparameter is registered or not.
- Parameters:
item (str) – the name of Hyperparameter.
- Returns:
True if Hyperparameter exists.
- Return type:
bool
- step(hp_name=None, step=None)
Update the current hyperparameter values.
- Parameters:
hp_name (str) – name of the hyperparameter. If the given hp_name is None, any updatable hp_name is selected arbitrarily.
step (int or float) – see the Hyperparameter class.
- Returns:
whether the value was changed.
- Return type:
bool
- add_hp_from_dict(hps)
Add hyperparameters from dictionary.
Values of the dictionary should be a list of allowed hyperparameter values. Continuous mode is not supported yet.
- Parameters:
hps (dict) – dict of hyperparameter values.
Examples
>>> hp_dict = dict(hp0=[0, 1, 2], hp1=[3, 4, 5])
>>> hps.add_hp_from_dict(hp_dict)
- add_hp(name, values, is_continuous=False)
Add hyperparameter.
- Parameters:
name (str) – the name of hyperparameter.
values (tuple or list) – please see Hyperparameter class.
is_continuous (bool) – Hyperparameter is continuous or not.
- set_min_hps()
Set the minimum value for each Hyperparameter.
- set_max_hps()
Set the maximum value for each Hyperparameter.
- get_current_hps()
Returns the current values of Hyperparameters.
- Returns:
dict of the current hyperparameter values.
- Return type:
dict
- get_grid_hps()
Returns all possible combination of hyperparameters values.
If a registered Hyperparameter class instance is in continuous mode, values are sampled between min and max, divided by the default step.
- Returns:
all possible combination of hyperparameters values.
- Return type:
list
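For discrete hyperparameters, the grid expansion behind get_grid_hps() and __len__() amounts to a Cartesian product of the allowed values. A minimal sketch with itertools (grid_hps is a hypothetical helper, not the library's code):

```python
import itertools

def grid_hps(hps_dict):
    """Enumerate all combinations of discrete hyperparameter values
    as a list of {name: value} dicts."""
    names = list(hps_dict)
    return [dict(zip(names, values))
            for values in itertools.product(*hps_dict.values())]

grid = grid_hps({'hp_layers': [5, 10], 'hp_alpha': [1.0, 2.0, 3.0]})
```

Here len(grid) is 2 * 3 = 6, matching the product of the value-list lengths.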
- get_hp_names()
Returns the names of hyperparameters.
- Returns:
the names of hyperparameters.
- Return type:
list
- class multiml.TaskScheduler(ordered_tasks=None)
Bases: object
Task management class for multiml execution.
Manage tasks and subtasks. The ordering of tasks is controlled by a DAG, by providing parent and child dependencies.
Examples
>>> subtask = MyTask()
>>> task_scheduler = TaskScheduler()
>>> task_scheduler.add_task('task_id')
>>> task_scheduler.add_subtask('task_id', 'subtask_id', subtask)
>>> task_scheduler.get_sorted_task_ids()
- __init__(ordered_tasks=None)
Initialize the TaskScheduler and reset DAG.
The ordered_tasks option provides a shortcut for registering ordered tasks and subtasks. Please see the add_ordered_tasks() and add_ordered_subtasks() methods for details. If the task dependencies are complex, please add tasks and subtasks using the add_task() and add_subtask() methods.
- Parameters:
ordered_tasks (list) – list of ordered task_ids, or list of ordered subtasks. If the given value is a list of str, add_ordered_tasks() is called to register the task_ids. If the given value is a list of other types, add_ordered_subtasks() is called to register the subtasks.
Examples
>>> # ordered task_ids
>>> task_scheduler = TaskScheduler(['task0', 'task1'])
>>> task_scheduler.add_subtask('task0', 'subtask0', env)
- __len__()
Returns the number of all grid combinations.
- Returns:
the number of all grid combinations.
- Return type:
int
- __getitem__(item)
Returns subtasktuples by index.
- Parameters:
item (int) – index between 0 and len(task_scheduler).
Examples
>>> task_scheduler[0]
- add_task(task_id, parents=None, children=None, subtasks=None, add_to_dag=True)
Register task and add the relation between tasks.
If subtasks is provided as a list of dicts, the subtasks are also registered to the given task_id. To specify dependencies between tasks, parents and/or children need to be set, and add_to_dag must be True.
- Parameters:
task_id (str) – unique task identifier.
parents (list or str) – list of parent task_ids, or str of parent task_id.
children (list or str) – list of child task_ids. or str of child task_id.
subtasks (list) – list of dict of subtasks with format of {‘subtask_id’: subtask_id, ‘env’: env, ‘hps’: hps}
add_to_dag (bool) – add task to DAG or not. To obtain task dependencies, e.g. ordered tasks, task need to be added to DAG.
- add_ordered_tasks(ordered_tasks)
Register ordered tasks.
For example, if ordered_tasks is ['task0', 'task1'], 'task0' and 'task1' are registered with the dependency 'task0 (parent)' -> 'task1 (child)'.
- Parameters:
ordered_tasks (list) – list of task_ids
- add_ordered_subtasks(ordered_tasks)
Register ordered subtasks.
ordered_tasks needs to have the format [task0, task1, ...], where e.g. task0 is a list of tuples [('subtask0', env0, hps0), ('subtask1', env0, hps0), ...]. task_id is set automatically to 'step0', 'step1', ... For the examples below, the pipeline scheme is:
>>> step0 (subtask0, subtask1) -> step1 (subtask2, subtask3)
- Parameters:
ordered_tasks (list) – list of subtasks. Please see examples below.
Examples
>>> # ordered tasks with subtask_id and hyperparameters
>>> step0 = [('subtask0', env0, hps0), ('subtask1', env1, hps1)]
>>> step1 = [('subtask2', env2, hps2), ('subtask3', env3, hps3)]
>>> steps = [step0, step1]
>>> task_scheduler.add_ordered_subtasks(steps)
>>>
>>> # ordered tasks with hyperparameters (subtask_id will be class name)
>>> step0 = [(env0, hps0), (env1, hps1)]
>>> step1 = [(env2, hps2), (env3, hps3)]
>>> steps = [step0, step1]
>>> task_scheduler.add_ordered_subtasks(steps)
>>>
>>> # ordered tasks without hyperparameters
>>> steps = [env0, env1]
>>> task_scheduler.add_ordered_subtasks(steps)
- add_subtask(task_id, subtask_id=None, env=None, hps=None)
Register a subtask to given task.
The corresponding task needs to be registered before calling this method.
- Parameters:
task_id (str) – unique task identifier.
subtask_id (str) – unique subtask identifier.
env (BaseTask) – user defined subtask class instance. The subtask class needs to inherit from the BaseTask class.
hps (dict or Hyperparameters) – user defined Hyperparameters class instance or dict. If hps is a dict, it is converted to a Hyperparameters class instance automatically.
- get_subtasks(task_id)
Returns subtasks of tasktuple for given task_id.
- Parameters:
task_id (str) – unique task identifier.
- Returns:
list of subtasktuples for the given task_id.
- Return type:
list
- get_subtask_ids(task_id)
Returns subtask_ids by task_id.
- Parameters:
task_id (str) – unique task identifier.
- Returns:
list of subtask_ids for the given task_id.
- Return type:
list
- get_subtask(task_id, subtask_id)
Returns subtasktuple for given task_id and subtask_id.
- Parameters:
task_id (str) – unique task identifier.
subtask_id (str) – unique subtask identifier.
- Returns:
the subtasktuple for the given task_id and subtask_id.
- Return type:
subtasktuple
- get_parents_task_ids(task_id)
Returns parent task_ids for given task_id.
- Parameters:
task_id (str) – unique task identifier.
- Returns:
list of parent task_ids for the given task_id.
- Return type:
list
- get_children_task_ids(task_id)
Returns child task_ids for given task_id.
- Parameters:
task_id (str) – unique task identifier.
- Returns:
list of child task_ids for the given task_id.
- Return type:
list
- get_sorted_task_ids()
Returns topologically sorted task_ids.
- Returns:
a list of topologically sorted task_ids.
- Return type:
list
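The topological ordering over parent/child dependencies can be sketched with the standard library's graphlib (Python 3.9+). The task_ids and dependency graph below are hypothetical; this only illustrates the DAG sorting, not TaskScheduler's internals:

```python
from graphlib import TopologicalSorter

# 'task0' is a parent of 'task1' and 'task2'; both precede 'task3'.
# Each key maps a task to the set of its parents (predecessors).
dependencies = {
    'task1': {'task0'},
    'task2': {'task0'},
    'task3': {'task1', 'task2'},
}
sorted_task_ids = list(TopologicalSorter(dependencies).static_order())
```

Any valid ordering places 'task0' first and 'task3' last; the relative order of 'task1' and 'task2' is unconstrained.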
- get_subtasks_with_hps(task_id)
Returns all combination of subtask_ids and hps for given task_id.
- Parameters:
task_id (str) – unique task identifier.
- Returns:
list of modified subtasktuples. The modified subtasktuple format is: .task_id: task_id, .subtask_id: subtask_id, .env: subtask class instance, .hps: dictionary of hps.
- Return type:
list
- get_all_subtasks_with_hps()
Returns all combination of subtask_ids and hps for all task_ids.
- Returns:
list of get_subtasks_with_hps() results for each task_id.
- Return type:
list
- get_subtasks_pipeline(index)
Returns modified subtasktuples for given index.
- Returns:
list of modified subtasktuples.
- Return type:
list
- show_info()
Show information of registered tasks and subtasks.