xarray
xarray: N-D labeled arrays and datasets in Python
Xarray was inspired by and borrows heavily from pandas, the popular data analysis package focused on labelled tabular data.
It is particularly tailored to working with netCDF files, which were the source of xarray’s data model, and integrates tightly with dask for parallel computing.
http://xarray.pydata.org/en/stable/
Official documents
Interpolating data
Examples
Xarray related projects
Xarray Tutorial
What's New
Makeing dataarray
code:python
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": 10, 20})
Tutorials
MetPy Mondays
#53 - XArray Basics : Unidata Developer's Blog
`#164 - MetPy 1.0 XArray Basics with the GFS : Unidata Developer's Blog
https://www.unidata.ucar.edu/blogs/developer/entry/metpy-mondays-164-metpy-1
#Siphon
`#165 - Using XArray to Subset in Lat/Lon and Map the GFS : Unidata Developer's Blog
https://www.unidata.ucar.edu/blogs/developer/entry/metpy-mondays-165-using-xarray
subsetting, plotting
`#166 - Getting Familiar with XArray : Unidata Developer's Blog
https://www.unidata.ucar.edu/blogs/developer/entry/metpy-mondays-166-getting-familiar
dataset, dataarray, dimension, coordinate
`#167 - Building with XArray from Scratch : Unidata Developer's Blog
https://www.unidata.ucar.edu/blogs/developer/entry/metpy-mondays-167-building-with
`#168 - Complex XArrays and Selecting : Unidata Developer's Blog
https://www.unidata.ucar.edu/blogs/developer/entry/metpy-mondays-168-complex-xarrays
Handling NetCDF Files using XArray for Absolute Beginners
https://towardsdatascience.com/handling-netcdf-files-using-xarray-for-absolute-beginners-111a8ab4463f
Siphon
Xarray Tutorial — NCAR-ESDS
https://ncar.github.io/esds/posts/xarray-tutorial/
Examples
PyVideo.org · Efficient Atmospheric Analogue Selection with Xarray and Dask | SciPy 2019 | Tyler Wixstrom
https://pyvideo.org/scipy-2019/efficient-atmospheric-analogue-selection-with-xarray-and-dask-scipy-2019-tyler-wixstrom.html
Tips
Area weighted mean
Compare weighted and unweighted mean temperature
code:python
weights = np.cos(np.deg2rad(air.lat))
air_weighted = air.weighted(weights)
weighted_mean = air_weighted.mean(("lon", "lat"))
Weighted array reductions
select beyond date line
python - xarray slicing across the antimeridian - Geographic Information Systems Stack Exchange
lazy
code:python
ds.sel(lon=(ds.lon < -80) | (ds.lon > 40))
not lazy
code:python
ds_rolled = ds.assign_coords(lon=(ds.lon % 360)).roll(lon=(ds.dims'lon' // 2))
Mask some region
code:python
da_masked = da.where(~((da.lon > 140) & (da.lat < 40)))
Convert –180…180 → 0…360
code:python
# ds is your xarray.Dataset (or DataArray)
lon_name = "lon" # change if your coordinate is called something else
# 1. Convert –180…180 → 0…360 with modulo arithmetic
ds = ds.assign_coords({lon_name: (dslon_name + 360) % 360})
# 2. Make sure the new longitudes are monotonically increasing
ds = ds.sortby(lon_name)
# 3. (Optional) move the 0 ° / 360 ° slice to the far end if you need continuity
ds = ds.roll({lon_name: -np.searchsorted(dslon_name, 0)}, roll_coords=True)
Rolling-window operators on irregular grids (eddy tracking, fronts)
Rolling operations can approximate spatial filters without explicit regridding.
code:python
# Approximate mesoscale smoothing (~5 grid points)
smooth = ds"zos".rolling(x=5, y=5, center=True).mean()
# Eddy kinetic energy anomaly proxy
eke_prime = (ds"u" - ds"u".rolling(time=30).mean())**2
Key insight: rolling operations are chunk-local → very efficient if chunks align with window size.
Region-aware computations using masks as first-class data
Instead of looping over regions, encode regions as a labeled mask and use broadcasting.
code:python
# region_mask: integer labels (e.g., 1=Atlantic, 2=Pacific, ...)
mask = ds"basin_mask"
# Compute basin means in one pass
basin_mean = ds"thetao".groupby(mask).mean(("x", "y"))
This scales well with Dask and avoids Python loops entirely.
Lagrangian-style analysis with xarray + vectorized indexing
You can emulate particle-following diagnostics (e.g., along trajectories) without leaving the Eulerian grid by combining xarray with vectorized .sel.
code:python
# Example: sample temperature along drifting floats
traj = xr.Dataset({
"lon": ("traj", lon_points),
"lat": ("traj", lat_points),
"time": ("traj", time_points),
})
sampled = ds"thetao".sel(
x=traj"lon",
y=traj"lat",
time=traj"time",
method="nearest"
)
This avoids building a full Lagrangian model for many diagnostic tasks.
On-the-fly vertical remapping (z → density / isopycnal space)
Rather than storing multiple coordinate systems, compute diagnostics directly in density space using lazy evaluation.
code:python
import xarray as xr
# Assume sigma0 already computed or available
sigma = ds"sigma0"
# Bin onto isopycnal layers (example using groupby_bins)
bins = xr.DataArray(np.arange(20, 30, 0.1), dims="sigma_bin")
theta_iso = (
ds"thetao"
.groupby_bins(sigma, bins)
.mean(dim="z") # collapse vertical dimension into density classes
)
Faster groupby reductions for “binned” ocean diagnostics (hint: use flox-backed paths)
A lot of ocean analysis is “reduce after grouping”: seasonal cycles, isotherm bins, latitude bands, density layers, etc. In recent xarray ecosystems, reductions increasingly route through faster groupby engines (often via flox when available), which can be dramatically faster than naïve groupby on dask arrays (less overhead, fewer tasks). (This is an ecosystem trend; exact behavior depends on versions.)
Pattern:
code:python
# Monthly climatology of mixed-layer temperature (example)
clim = ds"thetao".sel(z=0).groupby("time.month").mean(("time","x","y"))
anom = ds"thetao".sel(z=0).groupby("time.month") - clim
Practical note: if groupby is slow, it’s often a sign your chunking is fighting the grouping dimension
Cross-dataset alignment for coupled diagnostics
Combine model output with observational products (e.g., altimetry, Argo) without manual interpolation pipelines.
code:python
aligned = xr.align(model_ds"zos", obs_ds"adt", join="inner")
bias = aligned0 - aligned1
For more complex cases, pair with regridding but keep alignment explicit to avoid subtle mismatches.
Index engineering for faster subsetting (set_index / swap_dims)
Transform coordinates into indices that match query patterns.
code:pytyhon
# Example: build a multi-index for fast section extraction
ds = ds.set_index(points=("y", "x"))
subset = ds.sel(points=list_of_points)
This avoids repeated coordinate matching over large grids.
Multi-scale statistics with coarsen (same code, multiple effective resolutions)
code:python
t = ds"temp"
t2 = t.coarsen(y=2, x=2, boundary="trim").mean()
t4 = t.coarsen(y=4, x=4, boundary="trim").mean()
Why it’s useful: quickly assesses scale dependence (mesoscale vs submesoscale contributions) and accelerates figure prototyping by working on coarser views first.
Vectorize frontal / local-structure metrics with rolling().construct()
code:python
g = np.hypot(ds"sst".differentiate("x"), ds"sst".differentiate("y"))
w = g.rolling(x=5, y=5, center=True).construct({"x": "xw", "y": "yw"})
front = w.mean("xw", "yw")
Why it’s useful: turns moving windows into explicit dimensions, enabling loop-free local statistics (front strength, local variance/anisotropy) that scale well with Dask.
Budget closure pipelines with labeled alignment
Leverage xarray’s alignment rules to construct term-by-term budgets (heat, salt, KE) safely.
code:python
tendency = ds"thetao".differentiate("time")
adv = -(ds"u" * ds"thetao".differentiate("x") +
ds"v" * ds"thetao".differentiate("y"))
residual = tendency - adv # auto-aligned by coords
Key point: dimension/coordinate safety reduces silent errors common in manual NumPy workflows.
Multi-resolution analysis via coarsen + map_overlap
Instead of regridding, generate scale-dependent diagnostics using coarsening and halo-aware operations.
code:python
# Coarse-grained field (e.g., 4x4 averaging)
coarse = ds"u".coarsen(x=4, y=4, boundary="trim").mean()
# Gradient with overlap (avoids edge artifacts)
grad = ds"u".map_overlap(
lambda x: x.diff("x"),
depth={"x": 1},
boundary="reflect"
)
Useful for scale decomposition of KE / enstrophy without separate grids
Unify swath/track observations into a 1D “obs” axis (stack + interp) for easy comparisons
For Argo, ship tracks, or satellite swaths, converting to a 1D observation axis can simplify collocation and model–data comparisons.
code:python
import xarray as xr
# Example: collapse swath (y, x) to obs, then drop missing points
sw = obs_ds"sst".stack(obs=("y", "x")).dropna("obs")
# Collocate model (time, y, x) onto observation points
m_at_obs = model_ds"sst".interp(
time=sw"time",
y=sw"lat",
x=sw"lon",
)
Tip: if you have many observation points, process in time windows/batches (fits Dask chunking). For very large collocation jobs, consider precomputing nearest-neighbor indices to shift work from interpolation to indexed reads.
Hierarchical experiment management with DataTree: bundle ensembles / experiments / obs under one structure
Instead of juggling many separate Dataset objects, store them as a hierarchy and apply shared diagnostics (masks, derived variables, QC steps) uniformly.
code:python
import xarray as xr
from datatree import DataTree
tree = DataTree.from_dict({
"expA": dsA,
"expB": dsB,
"obs": dsObs,
})
# Conceptual example: apply a diagnostic across the subtree
def add_mld(node):
ds = node.ds
if ds is None or "sigma0" not in ds:
return node
mld = (ds"sigma0" - ds"sigma0".isel(z=0)).argmax("z") # toy example
node.ds = ds.assign(mld=mld)
return node
tree = tree.map_over_subtree(add_mld)
subpages
xarray time series
xarray.apply_ufunc
xarray netcdf
xarray zarr
xarray dask
Related packages
xESMF
xgcm
MetPy
xarray with MetPy Tutorial
https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html#sphx-glr-tutorials-xarray-tutorial-py
MetPy interpolation
regionmask: create masks of geographical regions
xagg: A package to aggregate gridded data in xarray to polygons in geopandas
xclim: Library of derived climate variables, ie climate indicators, based on xarray.
xreshaper: PyReshaper-like operation with Xarray
xarray-clim: wrapper functions for xarray to perform common tasks in analyzing gridded climate and weather data
xr-scipy: thin wrapper of scipy for xarray eco-system.
xskillscore: Metrics for verifying forecasts
xarray-leaflet: xarray extension for tiled map plotting
Xarray-spatial: Raster-based Spatial Analytics for Python
xtrude: an xarray extension for 3D terrain visualization
Xscale: Xscale a library of multi-dimensional signal processing tools using parallel computing
xmovie: A simple way of creating beautiful movies from xarray objects.
Xoak: Xoak is an Xarray extension that allows point-wise selection of irregular, n-dimensional data encoded in coordinates with an arbitrary number of dimensions.
xoa: intended to help reading and manipulating observed and simulated ocean data.
Xarray-Beam: Xarray-Beam is a Python library for building Apache Beam pipelines with Xarray datasets.
★N