xarray_merge_from

PlasmaCalcs.tools.xarray_tools.xarray_misc.xarray_merge_from(array, other_arrays, *, ngroup=UNSET, verbose=True, **kw_xarray_merge)

xr.merge() this xarray object (DataArray or Dataset) with other xarray objects.

Internally, uses xarray_merge which might be a bit more efficient than xr.merge().
xarray_merge docs copied below for convenience:
------------------------------------------------
xr.merge() these xarray objects (probably DataArrays or Datasets),
but maybe slightly more efficiently than just using xr.merge() directly.
Also, has slightly more convenient default behaviors:
- by default, allow name=None on inputs (treat all inputs with None name as same name)
- by default, if all inputs are DataArray with same name, output DataArray not Dataset.
ngroup: int>=2, None, or UNSET
[EFF] max number of arrays per xr.merge() call.
(Final results should be unaffected by ngroup, just affects internal strategy.)
UNSET --> use DEFAULTS.XARRAY_MERGE_NGROUP (default: 16).
None --> merge all arrays in a single xr.merge() call.
int>=2 --> merge arrays in groups of size ngroup, then merge the merged groups.
(This can be much faster than ngroup=None; see xarray_merge for details.)
verbose: bool or int
whether to print progress updates. (Ignored if the number of arrays, N, is <= ngroup.)
0: never print progress updates
1: print updates every DEFAULTS.PROGRESS_UPDATES_PRINT_FREQ seconds (default: 2)
2: print updates every file.
missing_name_ok: bool
whether to allow name=None as a valid name for input arrays.
True --> treat all inputs with name=None as having the same name (None).
(if result is a Dataset this can lead to a key in result being None.)
to_ds: None or bool
whether result should be a Dataset.
None --> DataArray if and only if all inputs are DataArrays with same name.
True --> always return Dataset.
False --> always return a DataArray (crash if any inputs are not DataArrays with same name).
_updater: None or ProgressUpdater
if provided, use this (instead of a new ProgressUpdater) to print updates, ignoring verbose.
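The grouping strategy described for ngroup can be sketched as below. This is only an illustration of the idea, not the PlasmaCalcs implementation: merge_in_groups and merge_dicts are hypothetical names, plain dicts stand in for xarray objects, and repeating the grouping until at most ngroup items remain is an assumption.

```python
def merge_in_groups(items, merge, ngroup=16):
    """Merge items in groups of at most ngroup, then merge the merged groups.

    merge: callable that takes a list of objects and returns one merged object.
    ngroup: int >= 2, or None to merge everything in a single call.
    (Hypothetical helper; assumes the grouping repeats until <= ngroup items remain.)
    """
    items = list(items)
    if not items:
        raise ValueError("need at least one item to merge")
    if ngroup is None:
        return merge(items)
    if ngroup < 2:
        raise ValueError("ngroup must be >= 2 or None")
    # Each pass replaces every group of ngroup items with one merged result.
    while len(items) > ngroup:
        items = [merge(items[i:i + ngroup]) for i in range(0, len(items), ngroup)]
    return merge(items)


def merge_dicts(dicts):
    """Stand-in for xr.merge(): combine a list of dicts into one dict."""
    out = {}
    for d in dicts:
        out.update(d)
    return out


parts = [{f"var{i}": i} for i in range(40)]
grouped = merge_in_groups(parts, merge_dicts, ngroup=16)
single = merge_in_groups(parts, merge_dicts, ngroup=None)
print(grouped == single)  # the final result should not depend on ngroup
```

With real xarray objects, the merge callable would presumably be xr.merge, which accepts an iterable of DataArrays or Datasets.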
--- Timing notes: ---
It seems like merging more than ~16 arrays at once can cause significant slowdowns,
compared to merging groups of 16 and then merging those results.
Tests with xarray 2024.7.0, Python 3.9.13, with subsets of 256 arrays totaling roughly 2 GB:
64 arrays with ngroup=None --> ~42 seconds | ngroup=2 --> ~16 seconds | ngroup=4, 8, 16 --> ~12 seconds
128 arrays with ngroup=None --> (not tested) | ngroup=16 --> ~30 seconds
256 arrays with ngroup=None --> crashed due to memory errors | ngroup=16 --> ~102 seconds
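To see why grouping keeps every individual merge small, the following sketch counts how many merge calls a given ngroup implies and how many inputs each call receives. merge_call_plan is a hypothetical helper, and it assumes the grouping is repeated until at most ngroup results remain; the actual internal strategy may differ.

```python
def merge_call_plan(n_arrays, ngroup=16):
    """Return a list of rounds; each round lists the input count of every merge call.

    (Hypothetical helper; assumes groups are re-merged in further rounds
    until no more than ngroup results remain.)
    """
    if ngroup is None:
        return [[n_arrays]]  # one call receives every array
    rounds = []
    n = n_arrays
    while n > ngroup:
        full, rest = divmod(n, ngroup)
        sizes = [ngroup] * full + ([rest] if rest else [])
        rounds.append(sizes)
        n = len(sizes)  # each call produces one merged result
    rounds.append([n])  # final call merges the remaining results
    return rounds


for n_arrays, ngroup in [(64, None), (64, 2), (256, 16)]:
    plan = merge_call_plan(n_arrays, ngroup)
    n_calls = sum(len(r) for r in plan)
    largest = max(max(r) for r in plan)
    print(f"{n_arrays} arrays, ngroup={ngroup}: {n_calls} calls, largest call has {largest} inputs")
```

Under these assumptions, 256 arrays with ngroup=16 take 17 calls of 16 inputs each, while ngroup=None hands all 256 arrays to a single call.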