Interpolating climate data with irregular measurement intervals in Python with pandas and traces


Consider a series of data with a known coordinate (in this case, paleoclimate data with ages in thousands of years before present, or "ka"). For many reasons, the time coordinate of these data is never evenly spaced. For many analyses, it is critical to compare data on the same time coordinate.

What I'd love is simple code that takes unevenly spaced data and linearly interpolates them to an even spacing, with the spacing interval defined by the user. Mathematically there are at least two ways of doing this:

  1. take the rate of change between two points and use that rate to map values at intermediate points;
  2. do a distance-weighted average, with the closer time point more heavily weighted.

Both should give the same answer either way.
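As a minimal sketch of why the two approaches agree, the snippet below applies both to column b at 405 ka, using the two samples from the data table that bracket it (403.2 ka and 407.2 ka). The function names `interp_by_rate` and `interp_by_weights` are made up for illustration; they are not part of pandas or traces.

```python
def interp_by_rate(x, x0, y0, x1, y1):
    """Method 1: start at (x0, y0) and move along the rate of change."""
    rate = (y1 - y0) / (x1 - x0)
    return y0 + rate * (x - x0)


def interp_by_weights(x, x0, y0, x1, y1):
    """Method 2: weight each endpoint by how close it is to x."""
    w0 = (x1 - x) / (x1 - x0)  # weight on the left point
    w1 = (x - x0) / (x1 - x0)  # weight on the right point; w0 + w1 == 1
    return w0 * y0 + w1 * y1


# Column b at 405 ka, bracketed by the samples at 403.2 ka and 407.2 ka.
by_rate = interp_by_rate(405, 403.2, 3.95, 407.2, 3.74)
by_weights = interp_by_weights(405, 403.2, 3.95, 407.2, 3.74)
print(round(by_rate, 4), round(by_weights, 4))
```

Since w0 = 1 - w1, the weighted average rearranges to y0 + w1 * (y1 - y0), which is the rate-of-change formula, so the two results differ only by floating-point rounding.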

Columns a through c are paleoclimate data with uneven spacing. Columns e through g are the same data, evenly spaced every 5 ka. I want to take the data in columns a through c and get the correct interpolation in columns e through g, subject to whatever ka parameter is set.

Once the basic code is in place, it'd be nice to add a few bells and whistles. An extrapolation function for time points outside the domain would be helpful. For example, I'd like to have an interpolated value for 400 ka, even though I do not have data for times straddling 400 ka.

I have started with pandas for organizing the data, and a post pointed me towards traces. I'm still working on it and would appreciate any insight.

    a (ka)     b       c
    401.3      3.49    0.34
    403.2      3.95    0.25
    407.2      3.74    1.13
    409.2      3.71    1.03
    411.2      3.73    1.05
    413.1      3.58    -0.08
    415.1      4.4     0.46

    ka = 5

    e (ka)     f       g
    400        3.18    0.40
    405        3.86    0.65
    410        3.72    1.04
    415        4.36    0.43

Included functions and handling of extrapolation

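    import pandas as pd

    # df is the table from the question, with columns 'a (ka)', 'b' and 'c'.

    def get_line(s):
        # Straight line through the first two valid points of s,
        # evaluated at every index label where s is still NaN.
        x0 = s.first_valid_index()
        p0 = s.index.get_loc(x0)
        p1 = p0 + 1
        x1 = s.index[p1]
        y0, y1 = s.at[x0], s.at[x1]
        m = (y1 - y0) / (x1 - x0)
        f = lambda x: (x - x0) * m + y0
        return s.index[s.isnull()].to_series().map(f)

    def interpolate(df, nidx):
        ridx = df.index.union(nidx)
        # Interior gaps are interpolated against the index values. Leading
        # NaNs are left alone here and filled by get_line; points past the
        # last sample keep the last valid value (see 420 below).
        d = df.reindex(ridx).interpolate('index')
        return d.fillna(d.apply(get_line)).loc[nidx]

    print(interpolate(df.set_index('a (ka)'), [400, 405, 410, 420]).round(2))

            b     c
    400  3.18  0.40
    405  3.86  0.65
    410  3.72  1.04
    420  4.40  0.46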

Answer for interpolation

Finding the value at ka 400 is not interpolation... that's extrapolation. At ka 405, interpolation takes the 2 points around it and... well... interpolates :-)

Plan
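To make the distinction concrete, here is a small sketch using NumPy's `np.interp` (an assumption on my part; the approach in this answer uses pandas instead). Inside the sampled range it interpolates between the two neighbouring points. Outside the range it does not extrapolate: by default it returns the nearest endpoint value. The array names `ages` and `b` are illustrative, with values copied from the question's table.

```python
import numpy as np

# Ages and column b, copied from the question's table.
ages = np.array([401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1])
b = np.array([3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4])

# 405 ka is bracketed by 403.2 and 407.2, so this is a true interpolation.
inside = np.interp(405, ages, b)

# 400 ka lies before the first sample; np.interp falls back to b[0].
outside = np.interp(400, ages, b)

# A simple domain check tells the two cases apart.
in_domain = bool(ages.min() <= 400 <= ages.max())
print(inside, outside, in_domain)
```

A check like `in_domain` is one way to decide whether a requested age needs separate extrapolation handling.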

  • Set the index to 'a (ka)'
  • Create a sub-index of the points you care about
  • Reindex with the union of the old index and the sub-index. NaN will be placed in the new spots
  • Interpolate to fill in the NaN. Make sure to use method='index' so the values are calculated relative to the index
  • Slice out the sub-index

    df = df.set_index('a (ka)')
    nidx = pd.RangeIndex(400, 420, 5)
    ridx = df.index.union(nidx)
    df.reindex(ridx).interpolate('index').reindex(nidx)

              b      c
    400     NaN    NaN
    405  3.8555  0.646
    410  3.7180  1.038
    415  4.3590  0.433

Note that at index 400, we still have NaN.
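The question asks for the spacing to be a user-set parameter. The sketch below wraps the plan above in a function that builds the even grid from `ka`, restricted to grid points inside the data domain so that nothing needs extrapolating. The helper name `regrid` and its `age_col` argument are hypothetical, and the data frame is typed in by hand from the question's table.

```python
import numpy as np
import pandas as pd

# The question's data, typed in by hand.
df = pd.DataFrame({
    "a (ka)": [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1],
    "b": [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4],
    "c": [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46],
})


def regrid(df, ka, age_col="a (ka)"):
    """Hypothetical helper: interpolate onto multiples of `ka` inside the data domain."""
    d = df.set_index(age_col)
    first = np.ceil(d.index.min() / ka) * ka   # first grid point >= earliest age
    last = np.floor(d.index.max() / ka) * ka   # last grid point <= latest age
    # The half-step added to `last` keeps the final grid point in the range.
    nidx = pd.Index(np.arange(first, last + ka / 2, ka), name=age_col)
    ridx = d.index.union(nidx)
    # Same steps as the plan: reindex on the union, interpolate against
    # the index values, then keep only the grid points.
    return d.reindex(ridx).interpolate(method="index").reindex(nidx)


even = regrid(df, ka=5)
print(even)
```

With `ka=5` the grid is 405, 410 and 415. The point at 400 is left out because it lies outside the data and would need the extrapolation handling discussed earlier.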

