Best tool for the job
Rather than reinvent a tool, it’s better to use an existing one and contribute to its improvement. But it has to be a good one to begin with! Here are my favorites:
Data management:
- Large project (>3 GB): datajoint (or another DAG + *sql)
- Small project (<3 GB): NetCDF through Xarray
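For the small-project case, a NetCDF round trip through Xarray can be sketched as below. This is a minimal illustration, not a recommended layout: the dataset, the variable name `signal`, and the file name `toy.nc` are all made up, and it assumes a NetCDF backend (netCDF4, h5netcdf, or scipy) is installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical toy data: 4 trials of a 100-sample signal with named dimensions.
ds = xr.Dataset(
    {"signal": (("trial", "time"), np.random.default_rng(0).normal(size=(4, 100)))},
    coords={"trial": np.arange(4), "time": np.linspace(0.0, 1.0, 100)},
    attrs={"description": "toy dataset for illustration"},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.nc")  # hypothetical file name
    ds.to_netcdf(path)  # dimensions, coordinates and attributes are stored together
    with xr.open_dataset(path) as f:
        loaded = f.load()  # pull everything into memory before the file is closed

print(loaded)
```

The appeal is that labels and metadata travel with the numbers, so nothing has to be re-attached after loading.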
Array manipulation:
- First choice: Xarray (df.to_xarray())
- Second choice: Pandas longform (an_xarray.to_pandas())
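The two conversions can be sketched as a round trip. The table and its column names (`subject`, `session`, `score`) are hypothetical. One assumption to flag: the sketch uses `to_dataframe()` for the step back to long form, because `to_pandas()` on a 2-D DataArray returns a wide table rather than a long one.

```python
import numpy as np
import pandas as pd

# Hypothetical long-form table: one row per (subject, session) observation.
df = pd.DataFrame(
    {
        "subject": ["a", "a", "b", "b"],
        "session": [1, 2, 1, 2],
        "score": [0.1, 0.4, 0.3, 0.9],
    }
)

# The index levels become the dimensions of the resulting xarray.Dataset.
ds = df.set_index(["subject", "session"]).to_xarray()

# Reduce over a dimension by name instead of by axis number.
mean_per_subject = ds["score"].mean("session")

# Back to long form: one row per combination of coordinates.
long_again = ds["score"].to_dataframe().reset_index()

print(mean_per_subject.to_series())
print(long_again)
```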
Speedup:
- numba for single-threaded performance
- joblib and datajoint for parallelization
Visualization, must have faceting:
- Quick: an_xarray.plot(col='column_dimension')
- Powerful + interactive: Holoviews + datashader
- Quick and powerful: df.hvplot(col='column_dimension')
- Beautiful: altair (+ altair_save to reduce file size)
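The quick Xarray option can be sketched as follows: one panel per value of `column_dimension`. The array contents and the `x`/`y` dimension names are made up, and the `Agg` backend is selected only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs headless

import numpy as np
import xarray as xr

# Hypothetical 3-D array: three 20x30 images stacked along 'column_dimension'.
an_xarray = xr.DataArray(
    np.random.default_rng(0).normal(size=(3, 20, 30)),
    dims=("column_dimension", "y", "x"),
    coords={"column_dimension": ["a", "b", "c"]},
    name="toy",
)

# Faceting: xarray draws one 2-D panel per coordinate value and shares the colorbar.
grid = an_xarray.plot(x="x", y="y", col="column_dimension")
```

The call returns a FacetGrid, so the underlying matplotlib figure is still available (as `grid.fig`) for saving or further tweaking.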
Estimation, Bayes, Monte Carlo, bespoke ML:
- numpyro for speed and future potential
- stan for examples and debugging
- arviz to use either of the above interchangeably. Bonus: Xarray!
Sharing figures:
- Jupyter-book
- GitLab pages
- nbconvert to HTML
- Excited for papermill
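The nbconvert-to-HTML route can also be driven from Python rather than the command line. A minimal sketch, assuming `nbformat` and `nbconvert` are installed; the notebook is built in memory with made-up cells and is not executed.

```python
import nbformat
from nbconvert import HTMLExporter

# Hypothetical two-cell notebook, created in memory for illustration.
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# Toy figure page"))
nb.cells.append(nbformat.v4.new_code_cell("print('hello')"))

# Render to a standalone HTML string; 'resources' holds template metadata.
body, resources = HTMLExporter().from_notebook_node(nb)
print(len(body))
```

The resulting string can be written to a file and shared as a static page.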