dask
Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
Documentation
Embarrassingly parallel Workloads — Dask Examples documentation
Best Practices — Dask documentation
If you have a machine with 100 GB and 10 cores, then you might want to choose chunks in the 1GB range. You have space for ten chunks per core which gives Dask a healthy margin, without having tasks that are too small.
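The arithmetic behind that rule of thumb can be sketched in a few lines. This is only an illustration of the quoted figures (100 GB, 10 cores, 1 GB chunks); the helper name `chunks_per_core` is hypothetical and is not part of Dask.

```python
# Rule-of-thumb arithmetic for the quoted example.
# Assumed figures: 100 GB of RAM, 10 cores, 1 GB chunks (decimal gigabytes).
GB = 10**9


def chunks_per_core(total_memory_bytes, n_cores, chunk_bytes):
    """Return how many chunks of the given size fit in one core's share of memory.

    Hypothetical helper for illustration only; not a Dask function.
    """
    memory_per_core = total_memory_bytes // n_cores
    return memory_per_core // chunk_bytes


headroom = chunks_per_core(100 * GB, 10, 1 * GB)
print(headroom)  # 10 chunks of 1 GB fit in each core's 10 GB share
```

Plugging in your own machine's memory and core count gives a rough upper bound on chunk size: the more chunks fit per core, the more margin is left for intermediate results.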
Array Best Practices — Dask documentation
Select a good chunk size
While optimal sizes and shapes are highly problem specific, it is rare to see chunk sizes below 100 MB in size. If you are dealing with float64 data then this is around (4000, 4000) in size for a 2D array or (100, 400, 400) for a 3D array.
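As a quick check of the shapes quoted above, the size of a chunk is the number of elements times the item size of the dtype. The sketch below assumes NumPy is installed; `chunk_nbytes` is a hypothetical helper, not a Dask API.

```python
import math

import numpy as np


def chunk_nbytes(shape, dtype="float64"):
    """Return the in-memory size in bytes of one chunk with this shape and dtype.

    Hypothetical helper for illustration only; not a Dask function.
    """
    return math.prod(shape) * np.dtype(dtype).itemsize


for shape in [(4000, 4000), (100, 400, 400)]:
    # Both quoted shapes hold 16 million float64 values, i.e. 128 MB each.
    print(shape, chunk_nbytes(shape) / 1e6, "MB")
```

Both shapes come out at 128 MB, which matches the "around 100 MB" guidance. The same tuples can be passed as the `chunks=` argument when creating a `dask.array`.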
DataFrame Best Practices — Dask documentation
Tutorial
Dask: Introduction - YouTube
Dask Live by Coiled - YouTube
2021/10/07
Tips
Data Pre-Processing in Python: How I learned to love parallelized applies with Dask and Numba
python - How to map a column with dask - Stack Overflow
Speeding up time-consuming preprocessing with Dask - ぴよぴよ.py
Best practices to go from 1000s of netcdf files to analyses on a HPC cluster? - HPC - Pangeo
Examples
CDAT/dask-cdms: cdms using dask cluster weather data across a cluster using NumPy in parallel with dask.array
python - Dask Read Data from Binary File - Stack Overflow
Asynchronous Optimization Algorithms with Dask
dask with matplotlib · GitHub
Pandas with Dask, For an Ultra-Fast Notebook - Towards Data Science
How to Convert a pandas Dataframe into a Dask Dataframe - YouTube
Tools
Dask JupyterLab Extension
Subpages
paid tutorial
Parallel Computing with Dask | DataCamp