Python is increasingly becoming the lingua franca of the machine learning community. Venerable packages like scikit-learn, TensorFlow, and PyTorch mainly target Python users. On the other hand, climate models are typically written in Fortran, for better or worse. How then should we move our Python-based machine learning components into the Fortran model? The typical data-sciency answer would be to wrap the machine learning method in an HTTP API, perhaps using Flask, but HTTP is probably too slow to use practically within a tightly coupled system like a climate model.
I use a lot of different software tools in the course of my research, so I need a way to keep track of them in case something destroys my computer. That is the purpose of this post.
Python and Python-related tools

- Anaconda Python distribution (version 3.5)
- Snakemake, for managing scientific workflows
- Jupyter tools
  - jupyter
  - nteract: for running notebooks on the desktop
  - nbstripout: for removing output from IPython notebooks before committing to a git repository
I am currently working with model output from the cloud-resolving model SAM. The full-resolution datasets are often too large to load into memory because I have over 16000 horizontal grid points, and it is more convenient to work with coarse-grained data. While I could just boot up Python and average the data manually, this is pretty unwieldy, and there are many packages which already do this.
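To make the manual approach concrete, here is a minimal sketch of block averaging with NumPy. It assumes a 2D field on a uniform grid whose dimensions are divisible by the coarsening factor. The function name `coarse_grain` and the toy 4x4 array are hypothetical and are not taken from SAM or from any regridding package.

```python
import numpy as np


def coarse_grain(field, factor):
    """Average a 2D field over non-overlapping factor-by-factor blocks."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    # Split each axis into (number of blocks, points per block),
    # then average over the two within-block axes.
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


# Toy example: a 4x4 field coarsened to 2x2 (hypothetical data).
fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = coarse_grain(fine, 2)
print(coarse)
```

This only handles plain block means on a regular grid. Area weighting, masked values, and non-divisible grid sizes are exactly the cases where a dedicated regridding package earns its keep.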
This article contains a nice overview of the different kinds of regridding methods.
At work, my data files are split across several machines with different filesystems. These are

- my desktop
- my laptop
- NYU HPC
- NYU Abu Dhabi HPC

In this post, I outline a strategy for archiving my data and tracking what is stored where across these servers. My previous strategy, which I used on a few projects, was to archive the data individually within each project. This has the advantage of making the data shareable, but does not scale well when multiple projects share a particular data source.
This tutorial describes the spline basis and smoothing techniques which are based on splines.
    using Plots
    pyplot()

Here is a simple function and noisy data. The object of this tutorial is to estimate the sinusoidal curve given the noisy observations.
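    # 100 evenly spaced points on [0, 2π], dropping the final endpoint
    x = range(0, stop=2*pi, length=100)[1:end-1]
    # true signal, evaluated elementwise
    y = sin.(4 .* x)
    # noisy observations: Gaussian noise with standard deviation 0.2
    yn = y .+ 0.2 .* randn(size(y))
    plot(x, y)
    scatter!(x, yn)

To do this we will use cubic splines. The spline basis with knots at $ \xi_i $ is given by $ (x-\xi_i)^3_+ $.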
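For reference, a cubic spline with knots $ \xi_1, \dots, \xi_K $ is commonly written in the truncated power basis, where $ (u)_+ = \max(u, 0) $. The polynomial terms and the coefficient names $ \beta_j $ and $ \theta_i $ below follow the standard textbook form and are assumptions, not notation taken from this tutorial:

$$ f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{i=1}^{K} \theta_i \, (x - \xi_i)^3_+ $$

Each term $ (x-\xi_i)^3_+ $ is zero to the left of its knot and cubic to the right, so $ f $ and its first two derivatives stay continuous at every knot.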