Xarray-Beam
Xarray-Beam is a Python library for building Apache Beam pipelines with Xarray datasets. The project aims to facilitate data transformations and analysis on large-scale, multi-dimensional labeled arrays, for use cases such as:
* Ad-hoc computation on Xarray data, by dividing an xarray.Dataset into many smaller pieces ("chunks").
* Adjusting array chunks, using the Rechunker algorithm.
* Ingesting large, multi-dimensional array datasets into an analysis-ready, cloud-optimized format, namely Zarr (see also Pangeo Forge).
* Calculating statistics (e.g., "climatology") across distributed datasets with arbitrary groups.
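The chunk-then-combine pattern behind the first and last use cases can be illustrated without any distributed machinery. The sketch below is plain Python and does not use the Xarray-Beam or Apache Beam APIs. The function names `split_into_chunks`, `chunk_partials`, `merge_partials`, and `grouped_mean` are hypothetical. It splits a 1-D sequence into chunks keyed by their starting offset, computes a per-group `(sum, count)` accumulator independently for each chunk (work that could run on separate workers), and then merges the accumulators into grouped means, roughly how a climatology could be assembled from distributed pieces.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

# Hypothetical helpers for illustration only; not part of Xarray-Beam.
Partial = Dict[Hashable, Tuple[float, int]]


def split_into_chunks(values: Sequence, chunk_size: int) -> Dict[int, List]:
    """Split a 1-D sequence into chunks keyed by the offset of their first element."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        start: list(values[start:start + chunk_size])
        for start in range(0, len(values), chunk_size)
    }


def chunk_partials(values: Sequence[float], groups: Sequence[Hashable]) -> Partial:
    """Per-chunk accumulator: (sum, count) for each group label in this chunk."""
    acc: Partial = {}
    for value, group in zip(values, groups):
        total, count = acc.get(group, (0.0, 0))
        acc[group] = (total + value, count + 1)
    return acc


def merge_partials(partials: Iterable[Partial]) -> Partial:
    """Combine per-chunk accumulators; order of chunks does not matter."""
    merged: Dict[Hashable, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))
    for partial in partials:
        for group, (total, count) in partial.items():
            merged_total, merged_count = merged[group]
            merged[group] = (merged_total + total, merged_count + count)
    return dict(merged)


def grouped_mean(
    values: Sequence[float], groups: Sequence[Hashable], chunk_size: int
) -> Dict[Hashable, float]:
    """Mean of `values` per group label, computed chunk by chunk."""
    value_chunks = split_into_chunks(values, chunk_size)
    group_chunks = split_into_chunks(groups, chunk_size)
    partials = (chunk_partials(value_chunks[k], group_chunks[k]) for k in value_chunks)
    merged = merge_partials(partials)
    return {group: total / count for group, (total, count) in merged.items()}


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    labels = ["a", "b", "a", "b", "a", "b"]
    print(grouped_mean(values, labels, chunk_size=4))  # {'a': 3.0, 'b': 4.0}
```

Because the accumulators are only summed, the result does not depend on how the data was chunked. That independence is what allows this kind of reduction to be distributed across many workers.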
For more about our approach and how to get started, read the documentation!