xarray.apply_ufunc
Official documents
input_core_dims
Core dimensions are automatically moved to the last axes of input variables before applying func
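The reordering described above can be checked directly. The following is a minimal sketch on made-up toy data (the array, its dimension names, and the helper `sum_last_axis` are illustrative assumptions, not taken from the official docs): it records the shape that `func` actually receives when "z" is declared as a core dimension.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: dims are ordered (z, y, x), so "z" is NOT the last axis.
temp = xr.DataArray(np.arange(24.0).reshape(2, 3, 4), dims=("z", "y", "x"))

seen_shapes = []

def sum_last_axis(arr):
    # Record the shape func receives, then reduce over the last axis.
    seen_shapes.append(arr.shape)
    return arr.sum(axis=-1)

total = xr.apply_ufunc(sum_last_axis, temp, input_core_dims=[["z"]])

print(seen_shapes[0])  # (3, 4, 2): "z" was moved to the end before the call
print(total.dims)      # ('y', 'x')
```

This is why functions passed to apply_ufunc are usually written to reduce over `axis=-1`.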
Tips
Rapidly Prototyping High-Performance Meteorological Data Systems Using Xarray and Numba: 99th American Meteorological Society Annual Meeting
Pangeo use case - when dask.array and xarray.apply_ufunc are not the answer
Xarray with Dask Arrays — Custom workflows and automatic parallelization
python xarray - dask performance apply along axis - Stack Overflow
python - What's the difference between dask=parallelized and dask=allowed in xarray's apply_ufunc? - Stack Overflow
Chunking and performance
A good rule of thumb is to create arrays with a minimum chunksize of at least one million elements (e.g., a 1000x1000 matrix).
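To make the rule of thumb concrete, here is a small sketch that counts the elements and memory of a single chunk. The helper names `chunk_elements` and `chunk_megabytes` are hypothetical, introduced only for this illustration, and float64 is an assumed default dtype.

```python
import math
import numpy as np

def chunk_elements(chunk_shape):
    # Number of array elements held by one chunk of the given shape.
    return math.prod(chunk_shape)

def chunk_megabytes(chunk_shape, dtype="float64"):
    # Memory footprint of one chunk in MB (10**6 bytes).
    return chunk_elements(chunk_shape) * np.dtype(dtype).itemsize / 1e6

print(chunk_elements((1000, 1000)))   # 1000000, right at the guideline
print(chunk_megabytes((1000, 1000)))  # 8.0 MB for float64
print(chunk_elements((1, 100, 100)))  # 10000, far below the guideline
```

A chunking such as (1, 100, 100) produces many tiny tasks, so scheduling overhead can dominate the actual computation.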
apply_ufunc(dask='parallelized'): mix of chunked and non-chunked *args results in shape mismatch · Issue #2817 · pydata/xarray · GitHub
Examples
heat content
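code:python
 import numpy as np
 import xarray as xr
 def ohc(temp, dz):
     # Thickness-weighted vertical sum of temperature over the last axis.
     # Strictly speaking, you also need ρ and Cp.
     return np.nansum(temp * dz, axis=-1)
 # ds["temp"] and ds["dz"] are assumed variable names; substitute your dataset's own.
 OHC = xr.apply_ufunc(
     ohc,
     ds["temp"],
     ds["dz"],
     input_core_dims=[["z"], ["z"]],  # good for dask parallel
     output_core_dims=[[]],
     dask="parallelized",
 )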
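As a sketch of what including ρ and Cp might look like, the example below runs the same pattern on small synthetic data. The constants `RHO0` and `CP` are assumed, typical seawater values chosen for illustration, and the array shapes and dimension names are made up; `dask="parallelized"` is omitted because the inputs here are plain NumPy arrays.

```python
import numpy as np
import xarray as xr

RHO0 = 1025.0  # assumed reference density, kg m^-3
CP = 3990.0    # assumed specific heat capacity, J kg^-1 K^-1

def ohc(temp, dz):
    # Heat content per unit area: rho * cp * sum(T * dz) over the last axis.
    return RHO0 * CP * np.nansum(temp * dz, axis=-1)

# Synthetic column data: uniform 10 degrees, with one missing bottom cell.
values = np.full((2, 3, 4), 10.0)
values[0, 0, 3] = np.nan
temp = xr.DataArray(values, dims=("y", "x", "z"))
dz = xr.DataArray(np.array([10.0, 20.0, 30.0, 40.0]), dims="z")

OHC = xr.apply_ufunc(
    ohc,
    temp,
    dz,
    input_core_dims=[["z"], ["z"]],
    output_core_dims=[[]],
)
print(OHC.dims)  # ('y', 'x')
```

The missing cell is simply skipped by `np.nansum`, so the column at (0, 0) integrates over the top three layers only.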
Use apply_ufunc (with dask="parallelized") to make custom ocean “kernels” scale with Dask
This is a clean way to parallelize domain-specific operations (layer-thickness weighting, vertical integrals, transformations) without falling back to Python loops or .values.
code:python
 import numpy as np
 import xarray as xr
 def thickness_weighted_mean(var, dz, axis=-1):
     # Normalize the thicknesses so the weights sum to 1 along the reduced axis.
     w = dz / np.nansum(dz, axis=axis, keepdims=True)
     return np.nansum(var * w, axis=axis)
 # ds["dz"] is an assumed name for the layer-thickness variable.
 tw = xr.apply_ufunc(
     thickness_weighted_mean,
     ds["temp"],
     ds["dz"],
     input_core_dims=[["z"], ["z"]],
     output_core_dims=[[]],
     vectorize=True,
     dask="parallelized",
     output_dtypes=[ds["temp"].dtype],
 )
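Tip: being explicit about input_core_dims tells xarray which dimension your function reduces over. With dask="parallelized", that dimension must sit in a single chunk, while every other dimension can be split across chunks and processed in parallel.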
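code:python
 import xarray as xr
 import numpy as np
 def mld_from_threshold(temp, depth, threshold=0.2):
     # With vectorize=True, temp and depth arrive as 1-D profiles.
     t0 = temp[0]  # reference temperature at the first level
     mask = np.abs(temp - t0[None]) > threshold
     idx = mask.argmax(axis=-1)  # first level exceeding the threshold
     # Note: idx is 0 when no level exceeds the threshold.
     return depth[idx]
 mld = xr.apply_ufunc(
     mld_from_threshold,
     ds.theta,
     ds.depth,
     input_core_dims=[["depth"], ["depth"]],
     output_core_dims=[[]],
     vectorize=True,
     dask="parallelized",
 )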
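To see the threshold method in action, here is a self-contained sketch on three synthetic profiles. The data, the "station" dimension, and the choice to return NaN when no level exceeds the threshold are assumptions made for this illustration; `dask="parallelized"` is left out because the inputs are plain NumPy arrays.

```python
import numpy as np
import xarray as xr

def mld_from_threshold(temp, depth, threshold=0.2):
    # temp and depth are 1-D profiles because vectorize=True is used below.
    exceeds = np.abs(temp - temp[0]) > threshold
    if not exceeds.any():
        # Assumed convention: no mixed-layer base found in this profile.
        return np.nan
    # argmax on a boolean array returns the index of the first True.
    return depth[exceeds.argmax()]

depth = xr.DataArray(np.array([5.0, 15.0, 25.0, 35.0, 45.0]), dims="depth")
theta = xr.DataArray(
    np.array(
        [
            [20.0, 20.0, 19.9, 19.0, 18.0],  # mixed down to about 25 m
            [15.0, 14.0, 13.0, 12.0, 11.0],  # stratified from the surface
            [10.0, 10.0, 10.0, 10.0, 10.0],  # fully mixed column
        ]
    ),
    dims=("station", "depth"),
)

mld = xr.apply_ufunc(
    mld_from_threshold,
    theta,
    depth,
    input_core_dims=[["depth"], ["depth"]],
    output_core_dims=[[]],
    vectorize=True,
)
print(mld.values)  # [35. 15. nan]
```

The explicit NaN branch matters because `argmax` on an all-False mask returns 0, which would otherwise report the shallowest depth for a fully mixed column.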
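code:python
 import gsw
 import xarray as xr
 # gsw.sigma0 works elementwise, so no core dims are needed.
 # dask="parallelized" is an assumed setting for the remaining arguments.
 sigma0 = xr.apply_ufunc(gsw.sigma0, ds.salt, ds.thetaoga, dask="parallelized")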
See also