PlasmaCalcs.dimensions.subsampling.Subsampler

class PlasmaCalcs.dimensions.subsampling.Subsampler(target, info)

Bases: object

interface that performs the actual subsampling of a Subsamplable object.
target: Subsamplable
the Subsamplable object which self will subsample when called.
info: SubsamplingInfo, str, or json-like dict
str –> path to load subsampling info from.
can be filename or directory. see SubsamplingInfo.load for details.
dict –> subsampling info in json-like format, see SubsamplingInfo for details.
Below, “snap src” refers to the hashable object that identifies a snapshot.
Its exact behavior is not defined here and may vary across different Subsamplable subclasses.
E.g. it may be a filepath, or a Snap object.
The only expectation here is that each src must be hashable.
Expects Subsamplable target object to have:
- rawvars_loadable(src): list of all directly loadable vars within snap src.
- rawvar_load(var, src): load a single raw var from snap src.
  (OR override rawvar_load_and_subsample instead.)
- rawvar_save(var, data, dst): save a single raw var to snap dst.
- target.snapdir: should tell abspath of folder with all snapshots.
- target.snaps.file_path(self): should tell array of snapshot filepaths.
- target.subsampling_info_cls: SubsamplingInfo subclass to use for this target.
Results will be saved to subsampling result paths; see SubsamplingResultPathManager for help.
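The expectations listed above can be illustrated with a toy target. This is a minimal sketch and not PlasmaCalcs code: ToySubsamplable, the JSON file layout, and the demo helper are all hypothetical, and the class covers only rawvars_loadable, rawvar_load, rawvar_save, and snapdir. Here each snap src is assumed to be a filepath.

```python
import json
import os
import tempfile


class ToySubsamplable:
    """Hypothetical stand-in for a Subsamplable target; each snap src is a JSON filepath."""

    def __init__(self, snapdir):
        self.snapdir = os.path.abspath(snapdir)

    def rawvars_loadable(self, src):
        """list of all directly loadable vars within snap src."""
        with open(src) as f:
            return sorted(json.load(f))

    def rawvar_load(self, var, src):
        """load a single raw var from snap src."""
        with open(src) as f:
            return json.load(f)[var]

    def rawvar_save(self, var, data, dst):
        """save a single raw var to snap dst (merging into dst if it already exists)."""
        contents = {}
        if os.path.exists(dst):
            with open(dst) as f:
                contents = json.load(f)
        contents[var] = data
        with open(dst, "w") as f:
            json.dump(contents, f)


def demo():
    """Load every var from one snap, keep every second point, save to a new file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "snap0.json")
        dst = os.path.join(tmp, "snap0_subsampled.json")
        with open(src, "w") as f:
            json.dump({"n": [1, 2, 3, 4], "T": [10, 20, 30, 40]}, f)
        target = ToySubsamplable(tmp)
        for var in target.rawvars_loadable(src):
            data = target.rawvar_load(var, src)
            target.rawvar_save(var, data[::2], dst)
        return target.rawvars_loadable(dst), target.rawvar_load("n", dst)
```

The loop in demo mirrors the load, subsample, save cycle that a Subsampler would drive. The real class also relies on target.snaps and target.subsampling_info_cls, which this sketch omits.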
__init__(target, info)

Methods

__init__(target, info)

dst_from_src(src)

path_manager(*[, make, exist_ok])

srcs()

srcs_and_src2vars()

srcs_byvars()

srcs_forall()

subsample(*[, subsampled_data_exist_ok])

subsample_snap(src, *[, dst, dst_exists_ok, ...])

dst_from_src(src)
return filepath for where to save subsampled data for this src.
Details determined by self.path_manager() and self.target.snap_src_to_filepath().
path_manager(*, make=True, exist_ok=True)
returns instance of path_manager_cls, based on self.target.snapdir.
Use this when creating subsampled data.

make: whether to create the ‘subsampling_result’ folder if it doesn’t exist yet.
exist_ok: if False and any folders to make already exist, crash instead.
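The make and exist_ok options behave much like the exist_ok argument of os.makedirs. The sketch below is an illustration under assumptions, not the SubsamplingResultPathManager implementation: ensure_result_dir is a hypothetical helper, and it assumes the result folder is named ‘subsampling_result’ and sits next to snapdir.

```python
import os
import tempfile


def ensure_result_dir(snapdir, *, make=True, exist_ok=True):
    """Return the assumed result folder path; optionally create it."""
    parent = os.path.dirname(os.path.abspath(snapdir))
    result = os.path.join(parent, "subsampling_result")  # assumed location
    if make:
        # os.makedirs raises FileExistsError if result exists and exist_ok is False.
        os.makedirs(result, exist_ok=exist_ok)
    return result


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        snapdir = os.path.join(tmp, "snaps")
        os.makedirs(snapdir)
        # make=False only computes the path.
        not_made = ensure_result_dir(snapdir, make=False)
        created_by_lookup = os.path.isdir(not_made)
        # Defaults create the folder, and a second call is harmless.
        first = ensure_result_dir(snapdir)
        again = ensure_result_dir(snapdir)
        # exist_ok=False crashes because the folder now exists.
        try:
            ensure_result_dir(snapdir, exist_ok=False)
            crashed = False
        except FileExistsError:
            crashed = True
        return created_by_lookup, os.path.isdir(first), first == again, crashed
```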

path_manager_cls

alias of SubsamplingResultPathManager

srcs()
returns list of all snap srcs in self.target (before applying any subsampling)
srcs_and_src2vars()
returns (srcs, src2vars), where:
srcs = [list of all snap srcs to keep]
src2vars = dict of {src: [vars to keep for src]} if relevant.
if keeping all vars from src, src2vars might exclude src key.
(if keeping all vars from all srcs, src2vars may be an empty dict.)
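Because src2vars may omit a src whose vars are all kept, code that consumes the return value needs a fallback. The following sketch shows one way to read it. vars_to_keep and the sample data are hypothetical, and the fallback to the full loadable list is an interpretation of the description above.

```python
def vars_to_keep(src, src2vars, loadable):
    """vars to keep for src; a missing key is read as 'keep every loadable var'."""
    if src in src2vars:
        return list(src2vars[src])
    return list(loadable)


# Hypothetical example: only snap1 restricts its vars.
srcs = ["snap0", "snap1", "snap2"]
src2vars = {"snap1": ["n"]}
loadable = ["n", "T"]
kept = {src: vars_to_keep(src, src2vars, loadable) for src in srcs}
```

With an empty src2vars, every src falls through to the full loadable list, which matches the "keeping all vars from all srcs" case.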
srcs_byvars()
dict of {var: [srcs in which to keep data for var]}
Subsampling will be relative to full list of snaps;
raises SubsamplingError if this would require writing var at a snap where it was previously missing.
E.g. crash if var5 appears only every 5 snaps but info says slice to every 7 snaps for var5.
Any var with no snap subsampling indicated will be kept in all srcs where it was already.
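The rule above can be sketched as follows. This is an illustration only: srcs_for_var is a hypothetical helper, the SubsamplingError class is a local stand-in for the real exception, and the assumption is that a var's slice is applied to the full list of snaps before checking where the var exists.

```python
class SubsamplingError(Exception):
    """Local stand-in; the real exception is defined by PlasmaCalcs."""


def srcs_for_var(var, all_srcs, srcs_with_var, snaps_slice=None):
    """srcs in which to keep var, slicing relative to the full list of snaps."""
    if snaps_slice is None:
        # No snap subsampling indicated: keep var wherever it already was.
        return [src for src in all_srcs if src in srcs_with_var]
    wanted = all_srcs[snaps_slice]
    missing = [src for src in wanted if src not in srcs_with_var]
    if missing:
        raise SubsamplingError(f"{var!r} is missing at {missing[0]!r}")
    return wanted


all_srcs = [f"snap{i}" for i in range(36)]
var5_srcs = set(all_srcs[::5])  # var5 appears only every 5 snaps

# Every 10 snaps is compatible with every 5 snaps.
ok = srcs_for_var("var5", all_srcs, var5_srcs, slice(None, None, 10))

# Every 7 snaps is not: snap7 never held var5.
try:
    srcs_for_var("var5", all_srcs, var5_srcs, slice(None, None, 7))
    crashed = False
except SubsamplingError:
    crashed = True
```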
srcs_forall()
list of all srcs, subsampled by self.info[‘forall’][‘snaps’].
subsample(*, subsampled_data_exist_ok=False)
apply subsampling to all data as appropriate (determined by self.info)
returns self.path_manager() instance with relevant paths.
(results will be saved to paths determined by SubsamplingResultPathManager;
probably ‘subsampling_result’ at the same directory-level as target.snapdir.)
Never overwrites any of the pre-subsampling data.
By default, refuses to overwrite any existing data at all.
subsampled_data_exist_ok: bool
whether it is okay for subsampled data file(s) to already exist before saving any data. Default False.
(Doesn’t interact with any subsampling_info files.)
subsample_snap(src, *, dst=None, dst_exists_ok=False, include_vars=None)
apply subsampling to all appropriate data vars in this snap.
returns abspath to dst where data was saved (or None if no data was saved).
src: any hashable object
snap src to subsample.
dst: None or str
filepath for where to save subsampled data.
None –> self.dst_from_src(src)
include_vars: None or list of strs
list of vars to include in result.
None –> use self.target.rawvars_loadable(src)
(and do not apply any ‘snaps’ subsampling from subsampling_info)
if empty list, do not save any vars, and return None instead of path.
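The include_vars handling can be summarized with a small sketch. It is not the library code: subsample_snap_sketch is hypothetical, a plain dict stands in for the snapshot, and another dict stands in for the filesystem.

```python
def subsample_snap_sketch(snap, dst, saved, *, include_vars=None):
    """snap: dict of {var: data}; saved: dict standing in for files on disk."""
    # None means "use every loadable var" (here, every key of snap).
    vars_to_save = list(snap) if include_vars is None else list(include_vars)
    if not vars_to_save:
        # Empty list: save nothing and report that no path was written.
        return None
    saved[dst] = {var: snap[var] for var in vars_to_save}
    return dst


snap = {"n": [1, 2, 3], "T": [10, 20, 30]}
saved = {}
all_result = subsample_snap_sketch(snap, "dst_all", saved)
some_result = subsample_snap_sketch(snap, "dst_some", saved, include_vars=["n"])
none_result = subsample_snap_sketch(snap, "dst_none", saved, include_vars=[])
```

Only the two calls with a non-empty var list write an entry; the empty-list call returns None and leaves saved untouched.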