Best tool for the job
Best tool for the job#
Rather than reinvent a tool, it’s better to use an existing one and contribute to its improvement. But it has to be a good one to begin with! Here are my favorites:
Data management:
large project (>3Gb):
datajoint
(or another DAG + *sql)small project (<3Gb):
NetCDF
through Xarray
Array manipulation:
First choice: Xarray (
df.to_xarray()
)Second choice: Pandas longform (
an_xarray.to_pandas()
)
Speedup:
numba
for single-threaded performancejoblib
anddatajoint
for parallelization
Visualization, must have faceting:
Quick:
an_xarray.plot(col='column_dimension')
Powerful+interactive:
Holoviews
+datashader
Quick and powerful:
df.hvplot(col='column_dimension')
Beautiful: altair (+altair_save to reduce file size)
Estimation, Bayes, Monte Carlo, bespoke ML:
numpyro
for speed and future potentialstan
for examples and debuggingarviz
to use either of the above interchangeably. Bonus: Xarray!
Sharing figures:
Jupyter-book
Gitlab pages
nbconvert
to htmlExcited for
papermill