Interpolating climate data with irregular measurement intervals in Python with pandas and traces

February 15, 2014

Consider a series of data with a known coordinate (in this case, paleoclimate data with ages in thousands of years before present, or "ka"). For many reasons, the time coordinate for these data is never evenly spaced. For many analyses, it is critical to compare data on the same time coordinate.

What I'd love is simple code that takes unevenly spaced data and linearly interpolates them to an even spacing, with the spacing interval defined by the user. Mathematically there are at least two ways of doing this:

take the rate of change between two points and use that rate to map values at intermediate points;
do a distance-weighted average, with the closer time point more heavily weighted.

You should get the same answer either way.
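As a quick check that the two approaches agree, here is a minimal sketch in plain Python. The function names `interp_by_slope` and `interp_by_weights` are made up for illustration; the two sample points are the column b values at 403.2 ka and 407.2 ka from the data in this post, evaluated at 405 ka.

```python
def interp_by_slope(x0, y0, x1, y1, x):
    """Method 1: follow the rate of change from the left-hand point."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


def interp_by_weights(x0, y0, x1, y1, x):
    """Method 2: distance-weighted average, the closer point weighs more."""
    w0 = (x1 - x) / (x1 - x0)  # weight of the left-hand point
    w1 = (x - x0) / (x1 - x0)  # weight of the right-hand point
    return w0 * y0 + w1 * y1


# Column b at 403.2 ka and 407.2 ka, evaluated at 405 ka
by_slope = interp_by_slope(403.2, 3.95, 407.2, 3.74, 405.0)
by_weights = interp_by_weights(403.2, 3.95, 407.2, 3.74, 405.0)

print(round(by_slope, 4), round(by_weights, 4))  # both 3.8555
```

Both expressions are algebraic rearrangements of the same straight line, so any difference is floating-point noise.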

Columns a through c are paleoclimate data with uneven spacing. Columns e through g are the same data, evenly spaced every 5 ka. I want to take the data in columns a through c and get the correct interpolation in columns e through g, subject to the ka parameter I set.

Once the basic code is in place, it'd be nice to add a few bells and whistles. An extrapolation function for time points outside the domain would be helpful. For example, I have an interpolated value for 400 ka, even though I do not have data for times straddling 400 ka.

I have started with pandas for organizing the data, and a post pointed me towards traces. I'm still working on it and would appreciate any insight.

    a (ka)    b       c
    401.3     3.49    0.34
    403.2     3.95    0.25
    407.2     3.74    1.13
    409.2     3.71    1.03
    411.2     3.73    1.05
    413.1     3.58    -0.08
    415.1     4.4     0.46

    ka = 5

    e (ka)    f       g
    400       3.18    0.40
    405       3.86    0.65
    410       3.72    1.04
    415       4.36    0.43
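The code in the answers works on a DataFrame called `df` with a column named 'a (ka)'. The post does not show how `df` was built, so the following is only one plausible way to set it up from the table above, assuming the column labels a, b, c are used as given:

```python
import pandas as pd

# Unevenly spaced input data, copied from columns a through c above
df = pd.DataFrame({
    "a (ka)": [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1],
    "b": [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.40],
    "c": [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46],
})

ka = 5  # desired spacing of the output, in ka

print(df)
```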

Included functions and handling of extrapolation

    def get_line(s):
        # Line through the first two valid points of the series,
        # evaluated at every index position that is still NaN.
        x0 = s.first_valid_index()
        p0 = s.index.get_loc(x0)
        p1 = p0 + 1
        x1 = s.index[p1]
        y0, y1 = s.at[x0], s.at[x1]
        m = (y1 - y0) / (x1 - x0)
        f = lambda x: (x - x0) * m + y0
        return s.index[s.isnull()].to_series().map(f)

    def interpolate(df, nidx):
        # Interpolate on the union of old and new index, then fill the
        # NaN left at the leading edge by extrapolating with get_line.
        ridx = df.index.union(nidx)
        d = df.reindex(ridx).interpolate('index')
        return d.fillna(d.apply(get_line)).loc[nidx]

    print(interpolate(df.set_index('a (ka)'), [400, 405, 410, 420]).round(2))

            b     c
    400  3.18  0.40
    405  3.86  0.65
    410  3.72  1.04
    420  4.40  0.46

Answer: interpolation

Finding the calculation at ka 400 is not interpolation... that's extrapolation. At ka 405, interpolation takes the two points around it and... well... interpolates :-)
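To make the distinction concrete, here is a small sketch in plain Python. The helper name `classify` and the variable names are made up for illustration. It labels each target age as interpolation or extrapolation depending on whether it falls inside the range of the measured ages, and then extends the line through the first two column b samples back to 400 ka, which is one simple way to extrapolate linearly.

```python
ages = [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1]
b = [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.40]
targets = [400, 405, 410, 415]


def classify(x, xs):
    """A target is interpolated only if measured points lie on both sides."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


kinds = {x: classify(x, ages) for x in targets}
print(kinds)

# Linear extrapolation at 400 ka: extend the line through the first two samples
slope = (b[1] - b[0]) / (ages[1] - ages[0])
b_400 = b[0] + slope * (400 - ages[0])
print(round(b_400, 2))  # 3.18, matching column f at 400 ka
```

Only 400 ka is outside the measured range (401.3 to 415.1 ka), so it is the only target that needs an extrapolation rule.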

Plan

set the index to 'a (ka)'
create a sub-index of the points you care about
reindex with the union of the old index and the sub-index; NaN will be placed in the new spots
interpolate to fill in the NaN; make sure to use method='index' so values are calculated relative to the index
slice out the sub-index

    import pandas as pd

    df = df.set_index('a (ka)')
    nidx = pd.RangeIndex(400, 420, 5)   # 400, 405, 410, 415
    ridx = df.index.union(nidx)
    df.reindex(ridx).interpolate('index').reindex(nidx)

              b      c
    400     NaN    NaN
    405  3.8555  0.646
    410  3.7180  1.038
    415  4.3590  0.433

Note that at index 400, we still have NaN.
