GameraDataReader

class PlasmaCalcs.hookups.mage.gamera.gamera_io_tools.GameraDataReader(ftag, *, dir='.')

Bases: MageicDataReader

Handles reading data from gamera files.

The most common case is probably that data is stored across multiple “rankfiles”,

with each file having all snapshots in time, but only one section of the grid.

It is also possible to have only a single rankfile (mpi=False mode).

[TODO] implement this case, probably just via Ri=Rj=Rk=1,

and using a different base file.

It is also possible to have data which is only 2D.

[TODO] implement this case, probably just affects result[…] = var[:] lines.

ftag: str

file tag, e.g. ‘runname’ from file like:

‘runname_0002_0002_0001_0000_0000_0000.gam.h5’.

dir: str

directory where files are located. Stored as self.dirname = os.path.abspath(dir).

Attributes in self:

rankfiles_all = xarray.DataArray of gamera rank file names with ftag in self.dirname.

filenames should look like ftag_Ri_Rj_Rk_ri_rj_rk.gam.h5,
where Ri, Rj, Rk are the total number of processors in each dimension,
which must be the same for all files (else will crash),
and ri, rj, rk are the 0-indexed processor indices for this file,
with zfill(4) in file names, e.g. 0000, 0001, 0002, …, 0010, …
(Alternatively, could be ftag.gam.h5 if didn’t use mpi,
in which case will use Ri=Rj=Rk=1 and ri=rj=rk=0 internally,
but this is [not yet implemented].)
filenames are stored in rankfiles_all as paths relative to self.dirname.
array.isel(ri=i, rj=j, rk=k) (or .sel) tells file for processor rank (i,j,k).
array[‘Ri’], array[‘Rj’], array[‘Rk’] are the total number of ranks in each dimension.
array[‘ftag’] = self.ftag, and array[‘dir’] = self.dirname.
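
The naming convention above can be illustrated with a small standalone sketch. The regular expression and the helper names `parse_rankfile_name` and `rankfile_name` are hypothetical, written only to mirror the description; they are not the reader's own implementation.

```python
import re

# Pattern for names like 'runname_0002_0002_0001_0000_0000_0000.gam.h5'.
# Illustrative assumption: every number is exactly 4 digits, as with zfill(4).
RANKFILE_PATTERN = re.compile(
    r"^(?P<ftag>.+)_(?P<Ri>\d{4})_(?P<Rj>\d{4})_(?P<Rk>\d{4})"
    r"_(?P<ri>\d{4})_(?P<rj>\d{4})_(?P<rk>\d{4})\.gam\.h5$"
)

def parse_rankfile_name(filename):
    """Return dict with ftag (str) and Ri, Rj, Rk, ri, rj, rk (ints)."""
    match = RANKFILE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a gamera rankfile name: {filename!r}")
    info = match.groupdict()
    return {key: (val if key == "ftag" else int(val)) for key, val in info.items()}

def rankfile_name(ftag, Rshape, rank):
    """Build the file name for this rank, using zfill(4) for every number."""
    numbers = [str(n).zfill(4) for n in (*Rshape, *rank)]
    return "_".join([ftag, *numbers]) + ".gam.h5"

info = parse_rankfile_name("runname_0002_0002_0001_0000_0000_0000.gam.h5")
name = rankfile_name("runname", (2, 2, 1), (1, 0, 0))
```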

rankfiles = xarray.DataArray of currently-being-considered rankfiles.

might be the same as rankfiles_all, or might be a slice from it.
array.sel(ri=i, rj=j, rk=k) tells file for processor rank (i,j,k).
(isel might give a different rank when rankfiles is not equal to rankfiles_all.)

file0 = the first file, used for checking data info like shape, dtype, etc.

abspath of rankfiles_all.isel(ri=0, rj=0, rk=0).item().

Methods

`assert_rankfiles_contiguous`()	assert that rankfiles are contiguous, crashing with DimensionSizeError if not.
`azimuth`(*[, center, cache, full, squeeze, ...])	return azimuth angle (arctan(z,y) not y,x) in radians based on grid coordinates.
`directly_loadable_vars`()	return tuple of all directly loadable vars from 'Step#0' of file0.
`file0_attr`(key, *[, decode])	return value of attr, from self.open_file0().attrs[key].
`file0_attrs`(*[, decode])	return dict of file0.attrs.
`get_var`(v, step, *[, slices])	return numpy array of variable v across self.rankfiles at this step.
`get_var0`(v, step)	return numpy array of variable v at this step, from file0.
`grid`([x, center, cache, full, squeeze_x_close])	return xarray.Dataset of grid coordinates across self.rankfiles.
`grid_coords`(*[, center, cache, full, ...])	return standard grid coords as Dataset with 'x', 'y', 'z' data_vars, maybe extras too;
`ijk_coords`(*[, center])	return dict of {'i': i_coords, 'j': j_coords, 'k': k_coords}.
`maintaining_attrs`(attrs, *attrs_as_flags)	returns context manager which restores attrs of self to their original values, upon exit.
`open_file0`()	context manager for opening file0, for checking data info like shape, dtype, steps, etc.
`rankfile_isel`(rii, rjj, rkk)	return abspath of file at self.rankfiles.isel(ri=rii, rj=rjj, rk=rkk).item().
`rcyl`(*[, center, cache, full, squeeze, ...])	return cylindrical radius (sqrt(y^2 + z^2)) based on grid coordinates.
`steps_info`()	return dict of arrays telling 's', 't', 'MJD', and possibly 'timestep' too.
`using_attrs`([attrs_as_dict, _unset_sentinel])	returns context manager which sets attrs of obj upon entry; restores original values upon exit.
`_assert_shape_and_dtype`(arr, shape, dtype, ...)	helper for gamera loading routines, ensure var has expected shape and dtype; crash if shape or dtype is wrong.
`_get_rankfiles_all`()	return xarray.DataArray of rank files based on ftag in self.dirname.
`_get_var_sliced`(v, step, slices)	return numpy array of variable v across self.rankfiles at this step, with these global slices.
`_grid_centered`(xyz)	return dict of centered grid coordinates (cell centers) across self.rankfiles.
`_grid_to_center`(grid)	returns grid values at cell centers, given grid values (numpy array) at corners.
`_grid_uncentered`(xyz)	return dict of uncentered grid coordinates (cell corners) across self.rankfiles.
`_rankfiles_is_all`()	tells whether self.rankfiles is the same as self.rankfiles_all.
`_resqueeze_slices`(array, unsqueezed)	squeeze array along any axes which were unsqueezed.
`_slices_to_full_shape`(slices)	return full (i,j,k) shape implied by slices.
`_slices_to_rankfile_slices`(slices)	return dict of {(ri, rj, rk): (slices for that rankfile, corresponding global slices)}
`_unsqueeze_slices`(slices)	un-squeezes slices, returning (unsqueezed_slices, dict telling which slices were changed).

Attributes

`Nshape`	(Ni, Nj, Nk) shape of full data var across self.rankfiles
`Nshape_full`	(Ni, Nj, Nk) shape of full data var across self.rankfiles_all
`Rshape`	(len(ri), len(rj), len(rk)), the shape of self.rankfiles.
`Rshape_full`	(Ri_full, Rj_full, Rk_full), the total number of ranks in each dimension.
`data_array_info`	return dict of info about data arrays from self.rankfiles.
`data_array_info_full`	self.data_array_info when self.rankfiles == self.rankfiles_all
`file0`	the first file, used for checking data info like shape, dtype, etc.
`maintaining`	alias to maintaining_attrs
`nshape`	(ni, nj, nk) shape of data var from each rankfile
`rankfiles`	xarray.DataArray of rank files currently being considered.
`rankfiles_all`	full xarray.DataArray of rank files based on ftag in self.dirname.
`steps`	return list of all steps available (as strs).
`using`	alias to using_attrs

property Nshape: (Ni, Nj, Nk) shape of full data var across self.rankfiles

property Nshape_full: (Ni, Nj, Nk) shape of full data var across self.rankfiles_all

property Rshape: (len(ri), len(rj), len(rk)), the shape of self.rankfiles.

Equivalent to self.Rshape_full if self.rankfiles == self.rankfiles_all.

(Internal code might call this Ri, Rj, Rk = Rshape, as a shorthand,

even though they are not necessarily equal to self.rankfiles_all[‘Ri’], ‘Rj’, ‘Rk’)

property Rshape_full: (Ri_full, Rj_full, Rk_full), the total number of ranks in each dimension.

static _assert_shape_and_dtype(arr, shape, dtype, *, var): helper for gamera loading routines, ensure var has expected shape and dtype,

crash if shape or dtype is wrong.

arr can be array or any object with shape and dtype

(e.g. h5py dataset even if didn’t load data into memory yet).

_get_rankfiles_all(): return xarray.DataArray of rank files based on ftag in self.dirname.

See help(type(self)) for details.

_get_var_sliced(v, step, slices)

return numpy array of variable v across self.rankfiles at this step, with these global slices.

slices: dict

if provided, only load the pieces needed by slices; keys can be i, j, k.

_grid_centered(xyz): return dict of centered grid coordinates (cell centers) across self.rankfiles.

xyz: list of strs from ‘x’, ‘y’, ‘z’

tells which coordinates to return (‘X’, ‘Y’, ‘Z’ in files).

static _grid_to_center(grid)

returns grid values at cell centers, given grid values (numpy array) at corners.

grid: 1D, 2D, or 3D numpy array

grid values at corners, with shape (Ni+1,), (Ni+1, Nj+1), or (Ni+1, Nj+1, Nk+1).

result has shape (Ni,), (Ni, Nj), or (Ni, Nj, Nk).
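
The shape change described above can be sketched with plain numpy. This assumes the center value is the average of adjacent corner values along every axis; the docstring does not state which formula the real method uses, and `grid_to_center` here is a hypothetical standalone function.

```python
import numpy as np

def grid_to_center(grid):
    """Average adjacent corner values along every axis (assumed definition of 'center')."""
    grid = np.asarray(grid, dtype=float)
    for axis in range(grid.ndim):
        # all corners except the last, and all corners except the first, along this axis
        lo = np.take(grid, np.arange(grid.shape[axis] - 1), axis=axis)
        hi = np.take(grid, np.arange(1, grid.shape[axis]), axis=axis)
        grid = 0.5 * (lo + hi)
    return grid

corners = np.arange(5.0)           # 1D corners: 0, 1, 2, 3, 4  (Ni+1 = 5)
centers = grid_to_center(corners)  # Ni = 4 values between the corners
```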

_grid_uncentered(xyz): return dict of uncentered grid coordinates (cell corners) across self.rankfiles.

xyz: list of strs from ‘x’, ‘y’, ‘z’

tells which coordinates to return (‘X’, ‘Y’, ‘Z’ in files).

_rankfiles_is_all(): tells whether self.rankfiles is the same as self.rankfiles_all.

static _resqueeze_slices(array, unsqueezed): squeeze array along any axes which were unsqueezed.

assumes array axes will be ‘i’, ‘j’, (and ‘k’ if 3D).

_slices_to_full_shape(slices): return full (i,j,k) shape implied by slices.

_slices_to_rankfile_slices(slices): return dict of {(ri, rj, rk): (slices for that rankfile, corresponding global slices)}

ri, rj, rk are the proper indices of the rankfile (as implied by filename).

(Not to be confused with rii, rjj, and rkk which are just

indices along those dims for the current self.rankfiles.)

slices: dict of indexers for i, j, k.

Each indexer can be a slice object or an iterable of indices. (Not a single int though.)

These are the “global” slices.
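
To illustrate the idea of mapping global slices onto rankfiles, here is a minimal one-axis sketch. It assumes all ranks along the axis are present, each holding n cells, and that indices are non-negative; `split_indexer_by_rank` is a hypothetical helper, not the method's actual code.

```python
import numpy as np

def split_indexer_by_rank(indexer, n, R):
    """Map one global indexer along a single axis onto R ranks of n cells each.

    Returns {r: (local_indices, result_positions)}, where local_indices index
    into rank r's block and result_positions index into the assembled result.
    """
    N = n * R
    if isinstance(indexer, slice):
        global_idx = np.arange(N)[indexer]
    else:
        global_idx = np.asarray(list(indexer), dtype=int)
    ranks = global_idx // n  # which rank owns each requested global index
    result = {}
    for r in np.unique(ranks):
        here = np.nonzero(ranks == r)[0]
        result[int(r)] = (global_idx[here] - r * n, here)
    return result

pieces = split_indexer_by_rank(slice(2, 7), n=4, R=2)
```

Only ranks that own at least one requested index appear in the result, so files outside the slice never need to be opened.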

static _unsqueeze_slices(slices): un-squeezes slices, returning (unsqueezed_slices, dict telling which slices were changed).

unsqueeze converts integer to iterable of length 1.

The relevant dim should be squeezed away at some point to respect original inputs,

but for internal processing it is nice to not worry about number of dims changing.
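
A minimal sketch of the unsqueeze / resqueeze round trip, assuming the "dict telling which slices were changed" maps each key to a bool. The function names and that dict layout are assumptions for illustration.

```python
import numpy as np

def unsqueeze_slices(slices):
    """Replace integer indexers with length-1 lists; report which keys changed."""
    unsqueezed = {}
    changed = {}
    for dim, indexer in slices.items():
        if isinstance(indexer, (int, np.integer)):
            unsqueezed[dim] = [int(indexer)]
            changed[dim] = True
        else:
            unsqueezed[dim] = indexer
            changed[dim] = False
    return unsqueezed, changed

def resqueeze(array, changed, dims=("i", "j", "k")):
    """Squeeze away the axes that were unsqueezed; axes are assumed ordered as dims."""
    axes = tuple(ax for ax, dim in enumerate(dims[:array.ndim]) if changed.get(dim, False))
    return np.squeeze(array, axis=axes)

data = np.arange(24).reshape(2, 3, 4)
unsq, changed = unsqueeze_slices({"i": 1, "j": slice(None), "k": [0, 2]})
# index one axis at a time so the number of dims stays 3 during processing
sub = data[unsq["i"]][:, unsq["j"]][:, :, unsq["k"]]
final = resqueeze(sub, changed)
```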

assert_rankfiles_contiguous(): assert that rankfiles are contiguous, crashing with DimensionSizeError if not.

i.e. ri’s are contiguous, rj’s are contiguous, and rk’s are contiguous.

(Will also crash if ri, rj, or rk aren’t sorted in increasing order.

This is fine for now because other implementations assume it,

but could be relaxed in the future if there’s a use-case for it.)
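
The contiguity condition can be sketched for a single dimension as follows. This hypothetical helper raises ValueError as a stand-in, since DimensionSizeError is specific to the library.

```python
def assert_contiguous(indices, name="ri"):
    """Raise ValueError unless indices increase by exactly 1 at each step."""
    indices = list(indices)
    for previous, current in zip(indices[:-1], indices[1:]):
        if current != previous + 1:
            raise ValueError(f"{name} not contiguous (or not increasing): {indices}")

assert_contiguous([1, 2, 3])  # passes silently
```

Checking that each step is exactly +1 covers both conditions at once: gaps such as [0, 2] and unsorted input such as [2, 1] both fail.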

azimuth(*, center=True, cache=True, full=False, squeeze=True, closeness_as='coords'): return azimuth angle (arctan(z,y) not y,x) in radians based on grid coordinates.

There is a good chance that result will be a 1D array (if squeeze=True),

because azimuth should only vary with k.

center, cache, and full all get passed to self.grid()

cache: bool

passed to self.grid(), also tells whether to cache self.azimuth() result,

given center, full, squeeze, and self.rankfiles.

squeeze: bool

whether to check if result is effectively less than 3D, and squeeze

unnecessary dimensions if any. (See xarray_squeeze_close for details)

closeness_as: ‘coords’, ‘attrs’, or None

where to put information about closeness == pre-squeeze variation along each dim,

for each dim dropped. (closeness = (array.isel(dim=0)/array).std())

(smaller is closer.) (always 0 for dims of size 1.)

‘coords’ –> put info in result.coords

‘attrs’ –> put info in result.attrs

None –> do not include closeness values in the result.
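
As a rough numpy illustration of the closeness measure, assuming it is the standard deviation (over the whole array) of the ratio between the first slice along a dim and the full array. The helper name is hypothetical, and the sketch assumes the array contains no zeros.

```python
import numpy as np

def closeness_along(array, axis):
    """Std of (first slice along axis) / array; 0 when array is constant along axis."""
    array = np.asarray(array, dtype=float)
    first = np.take(array, [0], axis=axis)  # keep the axis so the ratio broadcasts
    return float((first / array).std())

# values vary along axis 0 but are constant along axis 1
arr = np.tile(np.array([1.0, 2.0, 3.0])[:, None], (1, 4))
close_1 = closeness_along(arr, axis=1)
close_0 = closeness_along(arr, axis=0)
```

Here `close_1` is zero, so axis 1 could be squeezed away without losing information, while `close_0` is positive.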

property data_array_info: return dict of info about data arrays from self.rankfiles. Includes the following:

‘FshapeV’: shape of resulting data var across rankfiles; like (Ni,Nj,Nk).

‘FshapeG’: shape of resulting full grid across rankfiles; like (Ni+1,Nj+1,Nk+1).

‘RshapeV’: shape of resulting data var from each rankfile; like (ni,nj,nk).

‘RshapeG’: shape of resulting full grid from each rankfile; like (ni+1,nj+1,nk+1).

‘RshapeVt’: like ‘RshapeV’ but matches file axis order; like (nk,nj,ni),

‘RshapeGt’: like ‘RshapeG’ but matches file axis order; like (nk+1,nj+1,ni+1).

‘dtype’: dtype of data var array within each rankfile.

The ‘Vt’ and ‘Gt’ shapes are provided because the gamera files actually put k axis first,

but standard practice is to transpose to (i, j, k) order when reading.

Notes for convenience:

To get ni, nj, nk, use self.data_array_info[‘RshapeV’].

To get Ni, Nj, Nk, use self.data_array_info[‘FshapeV’].

The result depends on self.rankfiles, and will change if rankfiles was sliced.
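
The relationships between these shapes can be sketched directly. This assumes every rankfile holds a block of the same size and that the current rankfiles are contiguous, so that adjacent blocks share one layer of corners; `data_array_shapes` is a hypothetical helper, and 'dtype' is omitted because it comes from the files.

```python
def data_array_shapes(nshape, Rshape):
    """Return the shape entries implied by per-rank shape (ni, nj, nk) and rank counts."""
    ni, nj, nk = nshape
    Ri, Rj, Rk = Rshape
    RshapeV = (ni, nj, nk)
    RshapeG = (ni + 1, nj + 1, nk + 1)
    FshapeV = (ni * Ri, nj * Rj, nk * Rk)
    FshapeG = tuple(N + 1 for N in FshapeV)  # shared corners counted once
    return {
        "FshapeV": FshapeV, "FshapeG": FshapeG,
        "RshapeV": RshapeV, "RshapeG": RshapeG,
        "RshapeVt": RshapeV[::-1], "RshapeGt": RshapeG[::-1],  # file order: k first
    }

shapes = data_array_shapes(nshape=(4, 8, 16), Rshape=(2, 2, 1))
```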

property data_array_info_full: self.data_array_info when self.rankfiles == self.rankfiles_all

directly_loadable_vars(): return tuple of all directly loadable vars from ‘Step#0’ of file0.

property file0: the first file, used for checking data info like shape, dtype, etc.

abspath of rankfiles_all.isel(ri=0, rj=0, rk=0).item().

Always points to rankfiles_all’s file0, regardless of self.rankfiles.

file0_attr(key, *, decode=True): return value of attr, from self.open_file0().attrs[key].

decode: bool, whether to result.decode(‘utf-8’) if result is a bytes string.

file0_attrs(*, decode=True): return dict of file0.attrs.

E.g. result[‘UnitsID’] tells the UnitsID for the data.

decode: bool, whether to s.decode(‘utf-8’) for bytes strings s.

get_var(v, step, *, slices=None)

return numpy array of variable v across self.rankfiles at this step.

slices: None or dict

if provided, only load the pieces needed by slices; keys can be i, j, k.

get_var0(v, step): return numpy array of variable v at this step, from file0,

assuming file0 data shape matches full Nshape (else will crash).

(Transposed from fortran order to python order.)
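
A small numpy sketch of that transposition, using a made-up array in place of real file contents. It assumes the file stores axes as (k, j, i), as described under data_array_info.

```python
import numpy as np

nk, nj, ni = 2, 3, 4
file_order = np.arange(nk * nj * ni).reshape(nk, nj, ni)  # axes (k, j, i), as in the file
ijk_order = np.transpose(file_order)                      # reverses axes to (i, j, k)
```

With no axes argument, `np.transpose` reverses the axis order, so `ijk_order[i, j, k]` equals `file_order[k, j, i]`.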

grid(x=None, *, center=True, cache=True, full=False, squeeze_x_close='exact'): return xarray.Dataset of grid coordinates across self.rankfiles.

Will return DataArray instead if x is a single str.

x: None, ‘x’, ‘y’, ‘z’, or list of strs.

tells which coordinates to return.

(Corresponding to ‘X’, ‘Y’, ‘Z’ in the files themselves.)

‘x’, ‘y’, or ‘z’ –> return DataArray of that coordinate only.

list of strs –> return Dataset but with those coordinates only.

None –> equivalent to [‘x’, ‘y’, ‘z’]

center: bool

whether to return coordinates at cell centers.

When False, the grid has shape (Ni+1, Nj+1, Nk+1),

telling the coordinates at grid corners (instead of centers),

and will ensure that there is alignment between adjacent files.

E.g., last k of file000 must align with first k of file001,

i.e., file000[‘X’][:,:,-1] == file001[‘X’][:,:,0],

where file000 = rankfiles.isel(ri=0, rj=0, rk=0).item(),

file001 = rankfiles.isel(ri=0, rj=0, rk=1).item().

When True, the grid has shape (Ni, Nj, Nk),

telling the coordinates at grid centers,

and there is no need to check for alignment!
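
The alignment requirement for uncentered grids can be sketched with numpy arrays standing in for the per-file corner grids. `join_corner_blocks` is a hypothetical helper; it uses exact equality and ValueError, which may differ from the real check.

```python
import numpy as np

def join_corner_blocks(blocks, axis):
    """Concatenate corner-grid blocks along axis, sharing the duplicated boundary layer."""
    joined = blocks[0]
    for block in blocks[1:]:
        last = np.take(joined, -1, axis=axis)
        first = np.take(block, 0, axis=axis)
        if not np.array_equal(last, first):
            raise ValueError("adjacent blocks do not align at their shared corners")
        # drop the first layer of this block; it duplicates the previous block's last layer
        rest = np.take(block, np.arange(1, block.shape[axis]), axis=axis)
        joined = np.concatenate([joined, rest], axis=axis)
    return joined

X = np.arange(45.0).reshape(3, 3, 5)   # pretend full corner grid, 5 corners along k
file000_X = X[:, :, 0:3]               # nk + 1 = 3 corners
file001_X = X[:, :, 2:5]               # shares the k=2 corner layer with file000
rejoined = join_corner_blocks([file000_X, file001_X], axis=2)
```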

cache: bool

whether to cache result (for x, center, and self.rankfiles).

if True, cache result in case it is reloaded.

Keeps a separate cache for centered / uncentered values.

Also keeps a separate cache for when self.rankfiles equals self.rankfiles_all.

Cache is destroyed if self.rankfiles is changed.

if False, don’t check or alter cache.

full: bool

whether to instead return result for self.rankfiles_all, ignoring self.rankfiles.

squeeze_x_close: ‘exact’ or bool

whether to xarray_squeeze_close() for ‘x’, specifically.

(The default gamera grid x is constant with respect to k;

this will remove k, after ensuring it is indeed constant)

‘exact’ –> tol=0, i.e. only drop if it is exactly constant

True –> drop if “close” to constant (see xarray_squeeze_close for details).

grid_coords(*, center=True, cache=True, full=False, squeeze_x_close='exact', azimuth='if1D', rcyl='if2D', as_coords=False, u_l=None): return standard grid coords as Dataset with ‘x’, ‘y’, ‘z’ data_vars, maybe extras too;

extras can include ‘azimuth’ and ‘rcyl’.

Equivalent to self.grid(…).assign(azimuth=self.azimuth(…), rcyl=self.rcyl(…)),

if assigning azimuth and rcyl (by default, only assigned if 1D / 2D, respectively.)

center, cache, full, squeeze_x_close:

passed directly to self.grid(); see help(self.grid) for details.

azimuth: ‘if1D’ or bool

whether to include azimuth data var. (arctan(z,y) not y,x)

‘if1D’ –> only if 1D.

rcyl: ‘if2D’ or bool

whether to include rcyl data var. (cylindrical radius; sqrt(y^2 + z^2))

‘if2D’ –> only if 2D.

as_coords: bool

whether to set_coords for all data vars in result.

if False, data_vars will not be included in result.coords.

if True, will do result.set_coords(result.data_vars).

u_l: None or number

if provided, scale length coords (i.e. x, y, z, rcyl, but not azimuth) by u_l.

ijk_coords(*, center=True): return dict of {‘i’: i_coords, ‘j’: j_coords, ‘k’: k_coords}.

when self.rankfiles == self.rankfiles_all:

if center, i_coords is np.arange(Ni)

else, i_coords is -0.5 + np.arange(1+Ni)

and, similar for j_coords and k_coords.

if centered:

If self.rankfiles isn’t full, will still assign the appropriate coords,

accounting for the non-fullness. This may lead to jumps in coords.

else (not centered):

rankfiles must be contiguous.

E.g. there are ni+1 grid points along i axis in each file,

and the i=-1 values of the ri’th file are the same as the i=0 values of the (ri+1)’th file

(which is why full size is 1+ni*Ri, not (ni+1)*Ri.)

This means if we wanted just ri=1 and ri=3,

result would need to be shape (ni+1)+(ni+1), not 1+2*ni,

which is not allowed by our other codes,

and would be very tricky to manage.

Workaround: use centered grid instead, for non-contiguous rankfiles.
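
The rules above can be sketched for the i axis alone. The function below is hypothetical. The full-rankfiles results follow the formulas given above; the offset used for a contiguous subset in the uncentered case is an assumption.

```python
import numpy as np

def i_coords(ni, ranks, center=True):
    """Index coords along i for the given rank indices, each rank holding ni cells."""
    ranks = list(ranks)
    if center:
        # each rank contributes its own global cell indices; gaps produce jumps
        return np.concatenate([np.arange(r * ni, (r + 1) * ni) for r in ranks])
    if any(b != a + 1 for a, b in zip(ranks[:-1], ranks[1:])):
        raise ValueError("uncentered coords require contiguous ranks")
    start = ranks[0] * ni  # assumed offset when the subset does not start at rank 0
    return -0.5 + np.arange(start, start + 1 + ni * len(ranks))

full_centered = i_coords(4, [0, 1])                  # np.arange(Ni) with Ni = 8
full_corners = i_coords(4, [0, 1], center=False)     # -0.5 + np.arange(1 + Ni)
jumpy = i_coords(4, [1, 3])                          # non-contiguous, centered only
```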

property maintaining: alias to maintaining_attrs

maintaining_attrs(*attrs, **attrs_as_flags): returns context manager which restores attrs of self to their original values, upon exit.

E.g. maintaining_attrs(obj, ‘attr1’, ‘attr2’, attr3=True, attr4=False)

–> will restore upon exit, original values of obj.attr1, attr2, and attr3, but not attr4.
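
A minimal standalone sketch of this behavior, written with contextlib. It is not the library's implementation; in particular, deleting attrs that did not exist on entry is an assumption.

```python
from contextlib import contextmanager
from types import SimpleNamespace

_MISSING = object()  # marks attrs that did not exist on entry

@contextmanager
def maintaining_attrs(obj, *attrs, **attrs_as_flags):
    """Restore the listed attrs of obj on exit; flags set to False are skipped."""
    names = list(attrs) + [name for name, flag in attrs_as_flags.items() if flag]
    saved = {name: getattr(obj, name, _MISSING) for name in names}
    try:
        yield obj
    finally:
        for name, value in saved.items():
            if value is _MISSING:
                if hasattr(obj, name):
                    delattr(obj, name)
            else:
                setattr(obj, name, value)

obj = SimpleNamespace(attr1=1, attr3=3, attr4=4)
with maintaining_attrs(obj, "attr1", attr3=True, attr4=False):
    obj.attr1, obj.attr3, obj.attr4 = 10, 30, 40
```

After the block, attr1 and attr3 are back to their original values, while attr4 keeps the value set inside the block.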

property nshape: (ni, nj, nk) shape of data var from each rankfile

open_file0(): context manager for opening file0, for checking data info like shape, dtype, steps, etc.

rankfile_isel(rii, rjj, rkk): return abspath of file at self.rankfiles.isel(ri=rii, rj=rjj, rk=rkk).item().

property rankfiles: xarray.DataArray of rank files currently being considered.

Might be the same as rankfiles_all, or might be a slice from it.

if set to None, getting rankfiles will get rankfiles_all instead.

[TODO] add more convenient behavior too, like setting to a dict of slices.

See help(type(self)) for details.

property rankfiles_all: full xarray.DataArray of rank files based on ftag in self.dirname.

See help(type(self)) for details.

rcyl(*, center=True, cache=True, full=False, squeeze=True, closeness_as='coords'): return cylindrical radius (sqrt(y^2 + z^2)) based on grid coordinates.

center, cache, and full all get passed to self.grid()
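
For reference, the two formulas quoted for azimuth and rcyl look like this in numpy. Reading "arctan(z,y)" as `np.arctan2(z, y)` is an assumption about argument order, and the helper is hypothetical.

```python
import numpy as np

def azimuth_and_rcyl(y, z):
    """Azimuth about the x axis (radians) and cylindrical radius from y, z coords."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    azimuth = np.arctan2(z, y)      # angle in the y-z plane, not the x-y plane
    rcyl = np.sqrt(y**2 + z**2)     # distance from the x axis
    return azimuth, rcyl

az, rc = azimuth_and_rcyl(y=[1.0, 0.0], z=[0.0, 2.0])
```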

property steps: return list of all steps available (as strs).

Sorts steps in increasing order of step number, i.e. step 7 comes before step 10.
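
The difference from plain string sorting can be seen in a short sketch. It assumes each step is a digits-only str such as '7'; the actual step labels are not shown here.

```python
def sort_steps(steps):
    """Sort step labels by integer value rather than alphabetically."""
    return sorted(steps, key=int)

by_number = sort_steps(["10", "7", "0"])
by_string = sorted(["10", "7", "0"])  # alphabetical order puts '10' before '7'
```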

steps_info(): return dict of arrays telling ‘s’, ‘t’, ‘MJD’, and possibly ‘timestep’ too.

‘s’ is step number (as str); ‘t’ is time; ‘MJD’ is Modified Julian Date.

property using: alias to using_attrs

using_attrs(attrs_as_dict={}, _unset_sentinel=ATTR_UNSET, **attrs_and_values): returns context manager which sets attrs of obj upon entry; restores original values upon exit.

_unset_sentinel: any value, default ATTR_UNSET

upon entry, delete any attrs with value _unset_sentinel (compared via ‘is’).

E.g. using_attrs(obj, _unset_sentinel=None, x=None) –> del obj.x upon entry.
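
A standalone sketch of the set-on-entry, restore-on-exit pattern, including the sentinel behavior. The ATTR_UNSET object defined here is a stand-in for the library's own sentinel, and the handling of attrs that did not exist on entry is an assumption.

```python
from contextlib import contextmanager
from types import SimpleNamespace

ATTR_UNSET = object()  # stand-in for the library's sentinel
_MISSING = object()    # marks attrs that did not exist on entry

@contextmanager
def using_attrs(obj, attrs_as_dict=None, _unset_sentinel=ATTR_UNSET, **attrs_and_values):
    """Set attrs of obj on entry (deleting sentinel-valued ones); restore on exit."""
    new_values = {**(attrs_as_dict or {}), **attrs_and_values}
    original = {name: getattr(obj, name, _MISSING) for name in new_values}
    try:
        for name, value in new_values.items():
            if value is _unset_sentinel:
                if hasattr(obj, name):
                    delattr(obj, name)
            else:
                setattr(obj, name, value)
        yield obj
    finally:
        for name, value in original.items():
            if value is _MISSING:
                if hasattr(obj, name):
                    delattr(obj, name)
            else:
                setattr(obj, name, value)

obj = SimpleNamespace(x=1, y=2)
with using_attrs(obj, {"x": 5}, _unset_sentinel=None, y=None):
    inside = (obj.x, hasattr(obj, "y"))  # x is overridden, y is deleted
after = (obj.x, obj.y)
```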