multiml.StoreGate
- class multiml.StoreGate(backend='numpy', backend_args=None, data_id=None)
Data management class for multiml execution.
StoreGate provides common interfaces to manage data between multiml agents and tasks, with the following features:
- Different backends are supported (numpy, zarr, or a hybrid of the two).
- Data are split into train, valid and test phases for ML.
- Data are retrieved by the var_names, phase and index options.
Each dataset in the storegate is keyed by a unique data_id. All data in the dataset are identified by var_names (column names). The number of samples in a phase is assumed to be the same for all variables in multiml agents and tasks. The compile() method ensures the validity of the dataset.
Examples
>>> from multiml.storegate import StoreGate
>>>
>>> # User defined parameters
>>> var_names = ['var0', 'var1', 'var2']
>>> data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
>>> phase = (0.5, 0.25, 0.25)  # fraction of train, valid, test
>>>
>>> # Add data to storegate
>>> storegate = StoreGate(backend='numpy', data_id='test_id')
>>> storegate.add_data(var_names=var_names, data=data, phase=phase)
>>>
>>> # Get data from storegate
>>> storegate.get_data(var_names=var_names, phase='train')
>>> storegate['train'][var_names][0]
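The phase tuple in the example above gives the fractions of samples assigned to train, valid and test. A minimal sketch of such a split in plain Python follows. It assumes contiguous slicing and that any rounding leftovers go to the last phase; split_by_fraction is a hypothetical helper, not part of multiml, and the library's actual rounding and shuffling behavior may differ.

```python
def split_by_fraction(samples, fractions, names=("train", "valid", "test")):
    """Split samples into contiguous phases by fraction (illustrative only)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    total = len(samples)
    phases = {}
    start = 0
    for i, (name, fraction) in enumerate(zip(names, fractions)):
        # The last phase takes whatever remains, so no sample is dropped.
        if i == len(fractions) - 1:
            stop = total
        else:
            stop = start + int(total * fraction)
        phases[name] = samples[start:stop]
        start = stop
    return phases


data = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
phases = split_by_fraction(data, (0.5, 0.25, 0.25))
for name, rows in phases.items():
    print(name, rows)
```

With four samples and fractions (0.5, 0.25, 0.25), this sketch puts two samples in train and one each in valid and test.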
- __init__(backend='numpy', backend_args=None, data_id=None)
Initialize the storegate and the backend architecture with its options.
The numpy backend manages data in memory, and the zarr backend reads and writes data to storage at a given path. The hybrid backend is a combination of the numpy and zarr backends, which allows data to be moved between memory and storage.
- Parameters:
backend (str) – numpy (in memory), zarr (on storage), or hybrid.
backend_args (dict) – backend options, e.g. the path to the zarr database. Please see the ZarrDatabase and HybridDatabase classes for details.
data_id (str) – sets the default data_id if given.
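The relationship between the three backends can be pictured with a toy model: a dict standing in for memory, a directory of files standing in for storage, and two methods that move a variable between them. ToyHybridStore and its JSON file layout are assumptions made for illustration only; they do not reflect how multiml's ZarrDatabase or HybridDatabase are implemented.

```python
import json
import tempfile
from pathlib import Path


class ToyHybridStore:
    """Toy stand-in for a hybrid backend: a dict (memory) plus JSON files (storage)."""

    def __init__(self, path):
        self._memory = {}
        self._path = Path(path)

    def _file(self, var_name):
        return self._path / f"{var_name}.json"

    def add(self, var_name, values):
        # New data starts in memory, as with an in-memory backend.
        self._memory[var_name] = list(values)

    def to_storage(self, var_name):
        # Write the variable to disk, then drop the in-memory copy.
        self._file(var_name).write_text(json.dumps(self._memory.pop(var_name)))

    def to_memory(self, var_name):
        # Read the variable back from disk and remove the file.
        source = self._file(var_name)
        self._memory[var_name] = json.loads(source.read_text())
        source.unlink()

    def location(self, var_name):
        if var_name in self._memory:
            return "memory"
        if self._file(var_name).exists():
            return "storage"
        raise KeyError(var_name)

    def get(self, var_name):
        if var_name in self._memory:
            return self._memory[var_name]
        return json.loads(self._file(var_name).read_text())


with tempfile.TemporaryDirectory() as tmp:
    store = ToyHybridStore(tmp)
    store.add("var0", [0, 3, 6, 9])
    store.to_storage("var0")
    where_after_store = store.location("var0")
    stored_values = store.get("var0")
    store.to_memory("var0")
    where_after_load = store.location("var0")
```

In this toy, reads work the same way wherever the variable currently lives; only the location changes.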
Methods
__init__([backend, backend_args, data_id]) – Initialize the storegate and the backend architecture.
add_data(var_names, data[, phase, shuffle, ...]) – Add data to the storegate with given options.
argmax(var_names, axis[, phase]) – Convert data to argmax (operation is limited by memory).
astype(var_names, dtype[, phase]) – Convert data type to given dtype (operation is limited by memory).
Delete all data in the current data_id and backend.
compile([reset, show_info]) – Check if registered samples are valid.
create_empty(var_names, shape[, phase, dtype]) – Create empty data in the current data_id and backend.
delete_data(var_names[, phase, do_compile]) – Delete data associated with var_names.
get_data(var_names[, phase, index]) – Retrieve data from storegate with given options.
Returns registered data_ids in the backend.
Returns a dict of metadata.
get_var_names([phase]) – Returns registered var_names for given phase.
get_var_shapes(var_names[, phase]) – Returns shapes of variables for given phase.
onehot(var_names, num_classes[, phase]) – Convert data to onehot vectors (operation is limited by memory).
set_data_id(data_id) – Set the default data_id and initialize the backend.
set_mode(mode) – Set backend mode of hybrid architecture.
Show information currently registered in storegate.
shuffle([phase, seed]) – Shuffle data in given phase.
to_memory(var_names[, phase, ...]) – Move data from storage to memory.
to_storage(var_names[, phase, ...]) – Move data from memory to storage.
update_data(var_names, data[, phase, index, ...]) – Update data in storegate with given options.
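The onehot and argmax entries above are inverse conversions for integer class labels. A plain-Python sketch of the idea, independent of multiml, is shown here; the helper names to_onehot and to_argmax are hypothetical and operate on lists rather than on storegate variables.

```python
def to_onehot(labels, num_classes):
    """Turn integer labels into one-hot rows."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


def to_argmax(rows):
    """Return the index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


labels = [0, 2, 1, 2]
onehot = to_onehot(labels, num_classes=3)
recovered = to_argmax(onehot)
```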
Attributes
Return the current backend of storegate.
Returns the current data_id.
- __getitem__(item)
Retrieve data by python getitem syntax, i.e. storegate[phase][var_names][index].
data_id, phase, var_names and index need to be given to return selected data. If all parameters are set, the selected data are returned. Otherwise, the storegate instance itself, with the given parameters set, is returned.
- Parameters:
item (str or list or int or slice) – If item is the str train, valid or test, phase is set. If item is any other str or a list of strs, var_names is set. If item is an int or slice, the data at that index (slice) are returned.
- Returns:
the selected data, or the instance itself if not all parameters are set (see the description above).
- Return type:
self or ndarray
Example
>>> # get all train data
>>> storegate['train']['var0'][:]
>>> # slice train data by index
>>> storegate['train']['var0'][0:2]
>>> # loop by index
>>> for data in storegate['train']['var0']:
...     print(data)
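The chained storegate[phase][var_names][index] lookup works because each step either records one more selection and returns an object that can be indexed again or, once an index arrives, returns data. A toy sketch of that dispatch on plain lists follows. ToyGate is hypothetical: it returns a new view at each step rather than self, returns one value per variable when several var_names are given, and is not the multiml implementation.

```python
class ToyGate:
    """Toy model of chained selection by phase, var_names and index."""

    PHASES = ("train", "valid", "test")

    def __init__(self, data, phase=None, var_names=None):
        self._data = data  # {phase: {var_name: list of samples}}
        self._phase = phase
        self._var_names = var_names

    def __getitem__(self, item):
        # A phase name selects the phase and returns a new view.
        if isinstance(item, str) and item in self.PHASES:
            return ToyGate(self._data, phase=item, var_names=self._var_names)
        # Any other str, or a list of strs, selects the variables.
        if isinstance(item, (str, list)):
            return ToyGate(self._data, phase=self._phase, var_names=item)
        # An int or slice ends the chain and returns the data.
        if isinstance(item, (int, slice)):
            if self._phase is None or self._var_names is None:
                raise ValueError("set phase and var_names before indexing")
            columns = self._data[self._phase]
            if isinstance(self._var_names, str):
                return columns[self._var_names][item]
            return [columns[name][item] for name in self._var_names]
        raise TypeError(f"unsupported item: {item!r}")


gate = ToyGate({
    "train": {"var0": [0, 3], "var1": [1, 4]},
    "test": {"var0": [9], "var1": [10]},
})
all_train = gate["train"]["var0"][:]
first_sample = gate["train"][["var0", "var1"]][0]
looped = [value for value in gate["train"]["var0"]]
```

Because the toy class accepts integer indices in __getitem__ and lets the underlying IndexError propagate, a for loop over gate["train"]["var0"] works through Python's sequence iteration fallback, without a separate __iter__ method.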
- __setitem__(item, data)
Update data by python setitem syntax, i.e. storegate[phase][var_names][index] = data.
data_id, phase, var_names and index need to be given to update data.
- Parameters:
item (int or slice) – Index of data to be updated.
data (list or ndarray) – new data.
Example
>>> # update all train data
>>> storegate['train']['var0'][:] = data
>>> # update train data by index
>>> storegate['train']['var0'][0:2] = data[0:2]
- __delitem__(item)
Delete data by python delitem syntax, i.e. del storegate[phase][var_names].
data_id, phase and var_names need to be given to delete data.
- Parameters:
item (str or list) – var_names to be deleted.
Example
>>> # delete var0 from train phase
>>> del storegate['train']['var0']
- __len__()
Returns the number of samples for the given phase and data_id.
- Returns:
the number of samples under the given conditions.
- Return type:
int
Examples
>>> len(storegate['train'])
>>> len(storegate['test'])
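A single length per phase is only meaningful because every variable in a phase is assumed to hold the same number of samples. A sketch of the kind of consistency check this implies, written against a plain dict, is given here; phase_length is a hypothetical helper and is not the code behind compile() or __len__.

```python
def phase_length(dataset, phase):
    """Return the sample count of a phase, or raise if variables disagree."""
    lengths = {name: len(values) for name, values in dataset[phase].items()}
    unique = set(lengths.values())
    if len(unique) > 1:
        raise ValueError(f"inconsistent sample counts in {phase!r}: {lengths}")
    # A phase with no variables is treated as holding zero samples.
    return unique.pop() if unique else 0


dataset = {
    "train": {"var0": [0, 3], "var1": [1, 4]},
    "valid": {},
    "test": {"var0": [9], "var1": [10]},
}
train_size = phase_length(dataset, "train")
valid_size = phase_length(dataset, "valid")
test_size = phase_length(dataset, "test")

# A phase whose variables disagree on length is rejected.
broken = {"train": {"var0": [0, 3], "var1": [1]}}
try:
    phase_length(broken, "train")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```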
- __contains__(item)
Check if the given var_name is available in storegate.
- Parameters:
item (str) – name of the variable.
- Returns:
whether item exists under the given conditions.
- Return type:
bool
Examples
>>> 'var0' in storegate['train']
>>> 'var1' in storegate['test']