StoreGate tutorial
Adding data
- Base arguments:
  - backend (str): numpy, zarr or hybrid. numpy is suitable for small datasets, and zarr is suitable for large datasets that exceed memory capacity
  - data_id (str): unique identifier of the dataset
  - backend_args (dict): arguments passed to the backend database, e.g. the directory path for zarr
>>> import numpy as np
>>> from multiml import StoreGate
>>> storegate = StoreGate(backend='numpy', data_id='toy-dataset')
>>> phase = (0.8, 0.1, 0.1)  # fractions of the train, valid and test phases
>>> storegate.add_data('x0', np.array(range(5)), phase)
>>> storegate.add_data('x1', np.array(range(5)), phase)
>>> storegate.compile()
>>> storegate.show_info()
[I] ================================================================================
[I] data_id : toy-dataset, compiled : True
[I] --------------------------------------------------------------------------------
[I] phase backend var_names var_types total_events var_shape
[I] ================================================================================
[I] train numpy x0 int64 3 ()
[I] train numpy x1 int64 3 ()
[I] --------------------------------------------------------------------------------
[I] phase backend var_names var_types total_events var_shape
[I] ================================================================================
[I] valid numpy x0 int64 1 ()
[I] valid numpy x1 int64 1 ()
[I] --------------------------------------------------------------------------------
[I] phase backend var_names var_types total_events var_shape
[I] ================================================================================
[I] test numpy x0 int64 1 ()
[I] test numpy x1 int64 1 ()
[I] ================================================================================
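As a rough guide to the backend choice described above, the in-memory size of the arrays can be checked before they are added. The sketch below is an illustration only: `choose_backend` and the 1 GiB budget are hypothetical and are not part of multiml. Only `numpy.ndarray.nbytes` is an existing NumPy attribute.

```python
import numpy as np

# Hypothetical budget: treat anything above 1 GiB as "large".
MEMORY_BUDGET_BYTES = 1024 ** 3


def choose_backend(arrays, budget=MEMORY_BUDGET_BYTES):
    """Return 'numpy' if the arrays fit within the budget, else 'zarr'.

    Hypothetical helper for illustration; not a multiml function.
    """
    # nbytes is the number of bytes consumed by the array elements.
    total = sum(np.asarray(a).nbytes for a in arrays)
    return 'numpy' if total <= budget else 'zarr'


small = [np.arange(5), np.arange(5)]
backend = choose_backend(small)
print(backend)
```

The returned string could then be passed as the `backend` argument of `StoreGate`. The budget should be adjusted to the memory actually available on the machine.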
Retrieving data
- Base arguments:
  - var_names (str or tuple or list): names of the variables; see the examples below
  - phase (str): train, valid or test
>>> # retrieve all x0 data from train phase
>>> storegate['train']['x0'][:]
[0 1 2]
>>> # indexing
>>> storegate['train']['x0'][0]
0
>>> storegate['train']['x0'][0:2]
[0 1]
>>> # retrieve all x0 and x1 data from train phase;
>>> # if a tuple is given, a single numpy.ndarray is returned
>>> data = storegate['train'][('x0', 'x1')][:]
>>> print(data)
[[0 0]
 [1 1]
 [2 2]]
>>> # if a list is given, a list of numpy.ndarray, one per variable, is returned
>>> data = storegate['train'][['x0', 'x1']][:]
>>> print(data)
[array([0, 1, 2]), array([0, 1, 2])]
>>> # loop over the StoreGate
>>> for index, data in enumerate(storegate['train'][('x0', 'x1')]):
...     print(index, data)
0 [0 0]
1 [1 1]
2 [2 2]
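The difference between the tuple and list forms can be reproduced with plain NumPy. This is a sketch of the returned shapes only, assuming the tuple form stacks the variables along the last axis as the output above suggests. It does not use multiml or reflect its internals.

```python
import numpy as np

x0 = np.array([0, 1, 2])
x1 = np.array([0, 1, 2])

# Tuple-style result: one array with a column per variable.
stacked = np.stack((x0, x1), axis=1)
print(stacked.shape)  # (3, 2)

# List-style result: one array per variable, kept separate.
separate = [x0, x1]
print(len(separate), separate[0].shape)  # 2 (3,)

# Iterating over the stacked array yields one row per event,
# which matches the loop example above.
for index, row in enumerate(stacked):
    print(index, row)
```

The stacked form is convenient when all variables share a dtype and shape. The list form keeps variables separate, which may be preferable when they differ.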