xarray
xarray: N-D labeled arrays and datasets in Python
Xarray was inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data. It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing. Official documents
Makeing dataarray
code:python
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": 10, 20}) Tutorials
`#164 - MetPy 1.0 XArray Basics with the GFS : Unidata Developer's Blog `#165 - Using XArray to Subset in Lat/Lon and Map the GFS : Unidata Developer's Blog
subsetting, plotting
`#166 - Getting Familiar with XArray : Unidata Developer's Blog
dataset, dataarray, dimension, coordinate
`#167 - Building with XArray from Scratch : Unidata Developer's Blog
`#168 - Complex XArrays and Selecting : Unidata Developer's Blog
Handling NetCDF Files using XArray for Absolute Beginners
Xarray Tutorial — NCAR-ESDS
Examples
PyVideo.org · Efficient Atmospheric Analogue Selection with Xarray and Dask | SciPy 2019 | Tyler Wixstrom Tips
Area weighted mean
code:python
weights = np.cos(np.deg2rad(air.lat))
air_weighted = air.weighted(weights)
weighted_mean = air_weighted.mean(("lon", "lat"))
select beyond date line
lazy
code:python
ds.sel(lon=(ds.lon < -80) | (ds.lon > 40))
not lazy
code:python
ds_rolled = ds.assign_coords(lon=(ds.lon % 360)).roll(lon=(ds.dims'lon' // 2)) Mask some region
code:python
da_masked = da.where(~((da.lon > 140) & (da.lat < 40)))
Convert –180…180 → 0…360
code:python
# ds is your xarray.Dataset (or DataArray)
lon_name = "lon" # change if your coordinate is called something else
# 1. Convert –180…180 → 0…360 with modulo arithmetic
ds = ds.assign_coords({lon_name: (dslon_name + 360) % 360}) # 2. Make sure the new longitudes are monotonically increasing
ds = ds.sortby(lon_name)
# 3. (Optional) move the 0 ° / 360 ° slice to the far end if you need continuity
ds = ds.roll({lon_name: -np.searchsorted(dslon_name, 0)}, roll_coords=True) Rolling-window operators on irregular grids (eddy tracking, fronts)
Rolling operations can approximate spatial filters without explicit regridding.
code:python
# Approximate mesoscale smoothing (~5 grid points)
smooth = ds"zos".rolling(x=5, y=5, center=True).mean() # Eddy kinetic energy anomaly proxy
eke_prime = (ds"u" - ds"u".rolling(time=30).mean())**2 Key insight: rolling operations are chunk-local → very efficient if chunks align with window size.
Region-aware computations using masks as first-class data
Instead of looping over regions, encode regions as a labeled mask and use broadcasting.
code:python
# region_mask: integer labels (e.g., 1=Atlantic, 2=Pacific, ...)
# Compute basin means in one pass
basin_mean = ds"thetao".groupby(mask).mean(("x", "y")) This scales well with Dask and avoids Python loops entirely.
Lagrangian-style analysis with xarray + vectorized indexing
You can emulate particle-following diagnostics (e.g., along trajectories) without leaving the Eulerian grid by combining xarray with vectorized .sel.
code:python
# Example: sample temperature along drifting floats
traj = xr.Dataset({
"lon": ("traj", lon_points),
"lat": ("traj", lat_points),
"time": ("traj", time_points),
})
method="nearest"
)
This avoids building a full Lagrangian model for many diagnostic tasks.
On-the-fly vertical remapping (z → density / isopycnal space)
Rather than storing multiple coordinate systems, compute diagnostics directly in density space using lazy evaluation.
code:python
import xarray as xr
# Assume sigma0 already computed or available
# Bin onto isopycnal layers (example using groupby_bins)
bins = xr.DataArray(np.arange(20, 30, 0.1), dims="sigma_bin")
theta_iso = (
.groupby_bins(sigma, bins)
.mean(dim="z") # collapse vertical dimension into density classes
)
Faster groupby reductions for “binned” ocean diagnostics (hint: use flox-backed paths)
A lot of ocean analysis is “reduce after grouping”: seasonal cycles, isotherm bins, latitude bands, density layers, etc. In recent xarray ecosystems, reductions increasingly route through faster groupby engines (often via flox when available), which can be dramatically faster than naïve groupby on dask arrays (less overhead, fewer tasks). (This is an ecosystem trend; exact behavior depends on versions.)
Pattern:
code:python
# Monthly climatology of mixed-layer temperature (example)
clim = ds"thetao".sel(z=0).groupby("time.month").mean(("time","x","y")) anom = ds"thetao".sel(z=0).groupby("time.month") - clim Practical note: if groupby is slow, it’s often a sign your chunking is fighting the grouping dimension
Cross-dataset alignment for coupled diagnostics
Combine model output with observational products (e.g., altimetry, Argo) without manual interpolation pipelines.
code:python
aligned = xr.align(model_ds"zos", obs_ds"adt", join="inner") bias = aligned0 - aligned1 For more complex cases, pair with regridding but keep alignment explicit to avoid subtle mismatches.
Index engineering for faster subsetting (set_index / swap_dims)
Transform coordinates into indices that match query patterns.
code:pytyhon
# Example: build a multi-index for fast section extraction
ds = ds.set_index(points=("y", "x"))
subset = ds.sel(points=list_of_points)
This avoids repeated coordinate matching over large grids.
Multi-scale statistics with coarsen (same code, multiple effective resolutions)
code:python
t2 = t.coarsen(y=2, x=2, boundary="trim").mean()
t4 = t.coarsen(y=4, x=4, boundary="trim").mean()
Why it’s useful: quickly assesses scale dependence (mesoscale vs submesoscale contributions) and accelerates figure prototyping by working on coarser views first.
Vectorize frontal / local-structure metrics with rolling().construct()
code:python
g = np.hypot(ds"sst".differentiate("x"), ds"sst".differentiate("y")) w = g.rolling(x=5, y=5, center=True).construct({"x": "xw", "y": "yw"})
Why it’s useful: turns moving windows into explicit dimensions, enabling loop-free local statistics (front strength, local variance/anisotropy) that scale well with Dask.
Budget closure pipelines with labeled alignment
Leverage xarray’s alignment rules to construct term-by-term budgets (heat, salt, KE) safely.
code:python
tendency = ds"thetao".differentiate("time") residual = tendency - adv # auto-aligned by coords
Key point: dimension/coordinate safety reduces silent errors common in manual NumPy workflows.
Multi-resolution analysis via coarsen + map_overlap
Instead of regridding, generate scale-dependent diagnostics using coarsening and halo-aware operations.
code:python
# Coarse-grained field (e.g., 4x4 averaging)
coarse = ds"u".coarsen(x=4, y=4, boundary="trim").mean() # Gradient with overlap (avoids edge artifacts)
grad = ds"u".map_overlap( lambda x: x.diff("x"),
depth={"x": 1},
boundary="reflect"
)
Useful for scale decomposition of KE / enstrophy without separate grids
Unify swath/track observations into a 1D “obs” axis (stack + interp) for easy comparisons
For Argo, ship tracks, or satellite swaths, converting to a 1D observation axis can simplify collocation and model–data comparisons.
code:python
import xarray as xr
# Example: collapse swath (y, x) to obs, then drop missing points
sw = obs_ds"sst".stack(obs=("y", "x")).dropna("obs")
# Collocate model (time, y, x) onto observation points
m_at_obs = model_ds"sst".interp( )
Tip: if you have many observation points, process in time windows/batches (fits Dask chunking). For very large collocation jobs, consider precomputing nearest-neighbor indices to shift work from interpolation to indexed reads.
Hierarchical experiment management with DataTree: bundle ensembles / experiments / obs under one structure
Instead of juggling many separate Dataset objects, store them as a hierarchy and apply shared diagnostics (masks, derived variables, QC steps) uniformly.
code:python
import xarray as xr
from datatree import DataTree
tree = DataTree.from_dict({
"expA": dsA,
"expB": dsB,
"obs": dsObs,
})
# Conceptual example: apply a diagnostic across the subtree
def add_mld(node):
ds = node.ds
if ds is None or "sigma0" not in ds:
return node
node.ds = ds.assign(mld=mld)
return node
tree = tree.map_over_subtree(add_mld)
subpages
Related packages
xarray with MetPy Tutorial
xagg: A package to aggregate gridded data in xarray to polygons in geopandas xclim: Library of derived climate variables, ie climate indicators, based on xarray. xarray-clim: wrapper functions for xarray to perform common tasks in analyzing gridded climate and weather data xr-scipy: thin wrapper of scipy for xarray eco-system. xtrude: an xarray extension for 3D terrain visualization Xscale: Xscale a library of multi-dimensional signal processing tools using parallel computing xmovie: A simple way of creating beautiful movies from xarray objects. Xoak: Xoak is an Xarray extension that allows point-wise selection of irregular, n-dimensional data encoded in coordinates with an arbitrary number of dimensions. xoa: intended to help reading and manipulating observed and simulated ocean data. Xarray-Beam: Xarray-Beam is a Python library for building Apache Beam pipelines with Xarray datasets. ★N