multiml.StoreGate
- class multiml.StoreGate(backend='numpy', backend_args=None, data_id=None)
Data management class for multiml execution.
StoreGate provides common interfaces to manage data between multiml agents and tasks, with the following features:
Different backends are supported (numpy, zarr, or a hybrid of the two),
Data are split into train, valid and test phases for ML,
Data are retrieved by var_names, phase and index options.
Each dataset in the storegate is keyed by a unique data_id. All data in the dataset are identified by var_names (column names). The number of samples in a phase is assumed to be the same for all variables in multiml agents and tasks. The compile() method ensures the validity of the dataset.
Examples
>>> from multiml.storegate import StoreGate
>>>
>>> # User defined parameters
>>> var_names = ['var0', 'var1', 'var2']
>>> data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
>>> phase = (0.5, 0.25, 0.25)  # fraction of train, valid, test
>>>
>>> # Add data to storegate
>>> storegate = StoreGate(backend='numpy', data_id='test_id')
>>> storegate.add_data(var_names=var_names, data=data, phase=phase)
>>>
>>> # Get data from storegate
>>> storegate.get_data(var_names=var_names, phase='train')
>>> storegate['train'][var_names][0]
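The phase fractions and the equal-length assumption described above can be illustrated without multiml. The sketch below is not the library's implementation: `split_by_fraction` and `check_equal_lengths` are hypothetical helpers, each inner list is assumed to be one sample with one value per variable, and the rounding rule (floor for train and valid, remainder to test) is an assumption, since this page does not state how fractions are converted to sample counts.

```python
def split_by_fraction(data, fractions):
    """Split a list of samples into train/valid/test by fractions.

    Assumption: train and valid sizes are floored, test takes the remainder.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_total = len(data)
    n_train = int(n_total * fractions[0])
    n_valid = int(n_total * fractions[1])
    return {
        "train": data[:n_train],
        "valid": data[n_train:n_train + n_valid],
        "test": data[n_train + n_valid:],
    }


def check_equal_lengths(dataset):
    """Return per-phase sample counts; raise if variables disagree.

    dataset maps phase -> {var_name: list of samples}.
    """
    counts = {}
    for phase, variables in dataset.items():
        lengths = {len(values) for values in variables.values()}
        if len(lengths) > 1:
            raise ValueError(f"inconsistent sample counts in phase '{phase}'")
        counts[phase] = lengths.pop() if lengths else 0
    return counts


# Same values as the example above: 4 samples, 3 variables per sample.
data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
var_names = ["var0", "var1", "var2"]
phases = split_by_fraction(data, (0.5, 0.25, 0.25))

# Re-key each phase by variable name (one column per variable).
dataset = {
    phase: {name: [row[i] for row in rows] for i, name in enumerate(var_names)}
    for phase, rows in phases.items()
}
sample_counts = check_equal_lengths(dataset)
```

A check of this kind is what makes it safe to index all variables of a phase with the same index or slice.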
- __init__(backend='numpy', backend_args=None, data_id=None)
Initialize the storegate and the backend architecture with its options.
The numpy backend manages data in memory; the zarr backend reads and writes data to storage at a given path. The hybrid backend is a combination of the numpy and zarr backends, which allows data to be moved between memory and storage.
- Parameters:
backend (str) – numpy (in memory), zarr (on storage), hybrid.
backend_args (dict) – backend options, e.g. path to zarr database. Please see the ZarrDatabase and HybridDatabase classes for details.
data_id (str) – set default data_id if given.
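To make the hybrid idea concrete, here is a toy sketch of a store that moves a variable between an in-memory dict and files on disk. It is not multiml code: `ToyHybridStore`, its methods, and the JSON-file layout are all hypothetical, chosen only to show the memory/storage round trip with the standard library.

```python
import json
import os
import tempfile


class ToyHybridStore:
    """Toy stand-in for a hybrid backend (hypothetical, not multiml code).

    Memory is a dict; "storage" is one JSON file per variable in a directory.
    """

    def __init__(self, path):
        self._path = path
        self._memory = {}

    def _file(self, var_name):
        return os.path.join(self._path, f"{var_name}.json")

    def add(self, var_name, values):
        # New data start out in memory.
        self._memory[var_name] = list(values)

    def to_storage(self, var_name):
        # Move data from memory to storage.
        with open(self._file(var_name), "w") as handle:
            json.dump(self._memory.pop(var_name), handle)

    def to_memory(self, var_name):
        # Move data from storage to memory.
        with open(self._file(var_name)) as handle:
            self._memory[var_name] = json.load(handle)
        os.remove(self._file(var_name))

    def location(self, var_name):
        # Report where the variable currently lives.
        if var_name in self._memory:
            return "memory"
        if os.path.exists(self._file(var_name)):
            return "storage"
        return None


with tempfile.TemporaryDirectory() as tmp_dir:
    store = ToyHybridStore(tmp_dir)
    store.add("var0", [0, 3, 6, 9])
    before = store.location("var0")
    store.to_storage("var0")
    on_disk = store.location("var0")
    store.to_memory("var0")
    after = store.location("var0")
    restored = store._memory["var0"]
```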
Methods
__init__([backend, backend_args, data_id]): Initialize the storegate and the backend architecture.
add_data(var_names, data[, phase, shuffle, ...]): Add data to the storegate with given options.
argmax(var_names, axis[, phase]): Convert data to argmax (operation is limited by memory).
astype(var_names, dtype[, phase]): Convert data type to given dtype (operation is limited by memory).
Delete all data in the current data_id and backend
compile([reset, show_info]): Check if registered samples are valid.
create_empty(var_names, shape[, phase, dtype]): Create empty data in the current data_id and backend.
delete_data(var_names[, phase, do_compile]): Delete data associated with var_names.
get_data(var_names[, phase, index]): Retrieve data from storegate with given options.
Returns registered data_ids in the backend.
Returns a dict of metadata.
get_var_names([phase]): Returns registered var_names for given phase.
get_var_shapes(var_names[, phase]): Returns shapes of variables for given phase.
onehot(var_names, num_classes[, phase]): Convert data to onehot vectors (operation is limited by memory).
set_data_id(data_id): Set the default data_id and initialize the backend.
set_mode(mode): Set backend mode of hybrid architecture.
Show information currently registered in storegate.
shuffle([phase, seed]): Shuffle data in given phase.
to_memory(var_names[, phase, ...]): Move data from storage to memory.
to_storage(var_names[, phase, ...]): Move data from memory to storage.
update_data(var_names, data[, phase, index, ...]): Update data in storegate with given options.
Attributes
Return the current backend of storegate.
Returns the current data_id.
- __getitem__(item)
Retrieve data by python getitem syntax, i.e. storegate[phase][var_names][index].
data_id, phase, var_names and index need to be given to return selected data. If all parameters are set, the selected data are returned. Otherwise, the instance itself is returned with the given parameters set.
- Parameters:
item (str or list or int or slice) – If item is the str train, valid or test, phase is set. If item is any other str or a list of strs, var_names is set. If item is an int or slice, the data at that index (slice) are returned.
- Returns:
The selected data if all parameters are set, otherwise the instance itself.
- Return type:
self or ndarray
Example
>>> # get all train data
>>> storegate['train']['var0'][:]
>>> # slice train data by index
>>> storegate['train']['var0'][0:2]
>>> # loop by index
>>> for data in storegate['train']['var0']:
...     print(data)
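The dispatch rule described above (phase name, then variable names, then index) can be sketched with a small stand-alone class. `ToySelector` is hypothetical and only mimics the documented behaviour with plain lists; it is not how multiml stores or resets its selection state, which this page does not describe.

```python
class ToySelector:
    """Toy illustration of chained item access (hypothetical, not multiml code)."""

    PHASES = ("train", "valid", "test")

    def __init__(self, data):
        # data maps phase -> {var_name: list of samples}
        self._data = data
        self._phase = None
        self._var_names = None

    def __getitem__(self, item):
        if isinstance(item, str) and item in self.PHASES:
            # A phase name: remember it and return self for further chaining.
            self._phase = item
            return self
        if isinstance(item, (str, list)):
            # Any other str, or a list of strs: treat as variable names.
            self._var_names = item
            return self
        if isinstance(item, (int, slice)):
            # An index or slice: everything must be set, so return data.
            if self._phase is None or self._var_names is None:
                raise ValueError("set phase and var_names before indexing")
            if isinstance(self._var_names, str):
                return self._data[self._phase][self._var_names][item]
            return [self._data[self._phase][name][item]
                    for name in self._var_names]
        raise TypeError(f"unsupported item type: {type(item).__name__}")


selector = ToySelector({
    "train": {"var0": [0, 3], "var1": [1, 4]},
    "test": {"var0": [9], "var1": [10]},
})
partial = selector["train"]                        # the selector itself
all_train = selector["train"]["var0"][:]           # every train sample of var0
first_row = selector["train"][["var0", "var1"]][0]  # first sample, two variables
```

Because the selection is kept on the instance, a phase chosen in one expression stays in effect in this toy until it is overwritten.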
- __setitem__(item, data)
Update data by python setitem syntax, i.e. storegate[phase][var_names][index] = data.
data_id, phase, var_names and index need to be given to update data.
- Parameters:
item (int or slice) – index of data to be updated.
data (list or ndarray) – new data.
Example
>>> # update all train data
>>> storegate['train']['var0'][:] = data
>>> # update train data by index
>>> storegate['train']['var0'][0:2] = data[0:2]
- __delitem__(item)
Delete data by python delitem syntax, i.e. del storegate[phase][var_names].
data_id, phase and var_names need to be given to delete data.
- Parameters:
item (str or list) – var_names to be deleted.
Example
>>> # delete var0 from train phase
>>> del storegate['train']['var0']
- __len__()
Returns the number of samples for the given phase and data_id.
- Returns:
the number of samples in given conditions.
- Return type:
int
Examples
>>> len(storegate['train'])
>>> len(storegate['test'])
- __contains__(item)
Check if the given var_name is available in storegate.
- Parameters:
item (str) – name of variables.
- Returns:
Whether item exists under the given conditions.
- Return type:
bool
Examples
>>> 'var0' in storegate['train']
>>> 'var1' in storegate['test']
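The length, membership and deletion protocols documented above map onto Python's `__len__`, `__contains__` and `__delitem__` hooks. The sketch below shows those hooks on a hypothetical `ToyPhaseView` that wraps the variables of a single phase; it is an illustration under the equal-length assumption stated earlier, not the multiml implementation.

```python
class ToyPhaseView:
    """Toy view of one phase (hypothetical, not multiml code)."""

    def __init__(self, variables):
        # variables maps var_name -> list of samples for a single phase
        self._variables = variables

    def __len__(self):
        # All variables are assumed to hold the same number of samples,
        # so any one of them gives the phase length.
        if not self._variables:
            return 0
        return len(next(iter(self._variables.values())))

    def __contains__(self, item):
        # Membership test on variable names, e.g. 'var0' in view.
        return item in self._variables

    def __delitem__(self, item):
        # Accept a single name or a list of names.
        names = [item] if isinstance(item, str) else list(item)
        for name in names:
            del self._variables[name]


train = ToyPhaseView({"var0": [0, 3], "var1": [1, 4]})
n_train = len(train)
has_var0 = "var0" in train
del train["var0"]
has_var0_after = "var0" in train
```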