Interpolating climate data with irregular measurement intervals in Python with pandas and traces
Consider a series of data with a known time coordinate (in this case, paleoclimate data with ages in thousands of years before present, or "ka"). For many reasons, the time coordinates of these data are never evenly spaced, yet for many analyses it is critical to compare data on the same time coordinate.
What I'd love is simple code that takes unevenly spaced data and linearly interpolates them onto an even spacing, with the spacing interval defined by the user. Mathematically, there are at least two ways of doing this:
- take the rate of change between the two points and use that rate to map values at intermediate points;
- do a distance-weighted average, in which the closer time point is weighted more heavily.

You should get the same answer either way.
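A quick numerical check, in plain Python, that the two formulations agree. The function names are hypothetical, and the target age of 402 ka is an arbitrary choice for illustration, using the first two rows of column b from the table below:

```python
def slope_form(x, x0, y0, x1, y1):
    """Rate of change between the two points, applied from the left point."""
    m = (y1 - y0) / (x1 - x0)
    return y0 + m * (x - x0)


def weighted_form(x, x0, y0, x1, y1):
    """Distance-weighted average: each point's weight is the distance to the
    *other* point, so the closer point receives the larger weight."""
    w0 = (x1 - x) / (x1 - x0)
    w1 = (x - x0) / (x1 - x0)
    return w0 * y0 + w1 * y1


# First two samples of column b (401.3 ka and 403.2 ka), evaluated at 402 ka
a = slope_form(402.0, 401.3, 3.49, 403.2, 3.95)
b = weighted_form(402.0, 401.3, 3.49, 403.2, 3.95)
print(round(a, 4), round(b, 4))  # 3.6595 3.6595
```

The two agree because the weights sum to one: w0*y0 + w1*y1 = y0 + w1*(y1 - y0), which is exactly the slope form.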
Columns a through c are paleoclimate data with uneven spacing. Columns e through g are the same data, evenly spaced every 5 ka. I want to take the data in columns a through c and get the correct interpolation in columns e through g, subject to the ka parameter I set.
Once the basic code is in place, it'd be nice to add a few bells and whistles. An extrapolation function for time points outside the domain would be helpful. For example, I have an interpolated value for 400 ka, even though I do not have data at times straddling 400 ka.
I have started with pandas for organizing the data, and a post pointed me towards traces. I am still working on it, but would appreciate any insight.
```
a (ka)   b      c
401.3    3.49   0.34
403.2    3.95   0.25
407.2    3.74   1.13
409.2    3.71   1.03
411.2    3.73   1.05
413.1    3.58  -0.08
415.1    4.4    0.46

ka = 5

e (ka)   f      g
400      3.18   0.40
405      3.86   0.65
410      3.72   1.04
415      4.36   0.43
```
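A minimal sketch of the requested behaviour with NumPy and pandas. It assumes the output grid runs over multiples of the user-set spacing ka from floor(min/ka)*ka to floor(max/ka)*ka, which reproduces the 400 to 415 grid in the table above. np.interp only interpolates (outside the data range it repeats the end value), so grid points outside the data are overwritten by a linear extension of the end segments. The helper name regrid is hypothetical, and at least two samples per column are assumed.

```python
import numpy as np
import pandas as pd

# Data from the question (columns a through c)
df = pd.DataFrame({
    'a (ka)': [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1],
    'b': [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4],
    'c': [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46],
})


def regrid(df, age_col, ka):
    """Hypothetical helper: linearly resample every non-age column onto a
    regular grid with spacing `ka`, extrapolating linearly past the ends."""
    x = df[age_col].to_numpy(dtype=float)
    order = np.argsort(x)          # np.interp needs increasing x
    x = x[order]
    start = np.floor(x[0] / ka) * ka
    stop = np.floor(x[-1] / ka) * ka
    grid = np.arange(start, stop + ka / 2, ka)  # ka/2 guards against round-off
    out = {}
    for col in df.columns.drop(age_col):
        y = df[col].to_numpy(dtype=float)[order]
        vals = np.interp(grid, x, y)  # interior points
        lo, hi = grid < x[0], grid > x[-1]
        # Extend the first and last segments for points outside the data
        vals[lo] = y[0] + (grid[lo] - x[0]) * (y[1] - y[0]) / (x[1] - x[0])
        vals[hi] = y[-1] + (grid[hi] - x[-1]) * (y[-1] - y[-2]) / (x[-1] - x[-2])
        out[col] = vals
    return pd.DataFrame(out, index=pd.Index(grid, name=age_col))


print(regrid(df, 'a (ka)', ka=5).round(2))
```

With ka=5 the rounded values match columns f and g of the table, including the extrapolated 400 ka row.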
Included functions and handling of extrapolation:
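```python
import pandas as pd

# Data from the question (columns a through c)
df = pd.DataFrame({
    'a (ka)': [401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1],
    'b': [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4],
    'c': [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46],
})


def get_line(s):
    # Line through the first valid point and the point after it; used to
    # extrapolate the leading NaNs that interpolate() leaves behind
    x0 = s.first_valid_index()
    p0 = s.index.get_loc(x0)
    p1 = p0 + 1
    x1 = s.index[p1]
    y0, y1 = s.at[x0], s.at[x1]
    m = (y1 - y0) / (x1 - x0)
    f = lambda x: (x - x0) * m + y0
    # Evaluate the line only at the index labels that are still NaN
    return s.index[s.isnull()].to_series().map(f)


def interpolate(df, nidx):
    # Add the requested points to the index, interpolate by index value,
    # fill the remaining (leading) NaNs from get_line, then keep only nidx
    ridx = df.index.union(nidx)
    d = df.reindex(ridx).interpolate(method='index')
    return d.fillna(d.apply(get_line)).loc[nidx]


print(interpolate(df.set_index('a (ka)'), [400, 405, 410, 420]).round(2))
```

```
        b     c
400  3.18  0.40
405  3.86  0.65
410  3.72  1.04
420  4.40  0.46
```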
Answer on interpolation
Finding the calculation at ka 400 is not interpolation... that's extrapolation. At ka 405, interpolation takes the two points around it and... well... interpolates :-)
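To make the distinction concrete for column b: 405 ka sits between the samples at 403.2 ka and 407.2 ka, so it can be interpolated, whereas 400 ka lies below the first sample (401.3 ka) and can only be estimated by extending the line through the first two samples. A small worked check in plain Python (line_through is a hypothetical helper name):

```python
def line_through(x0, y0, x1, y1):
    """Return f(x) for the straight line through (x0, y0) and (x1, y1)."""
    m = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + m * (x - x0)


# Interpolation: 405 ka lies between the bracketing samples 403.2 and 407.2 ka
interp_405 = line_through(403.2, 3.95, 407.2, 3.74)(405)
# Extrapolation: 400 ka lies before the first sample, so extend the line
# through the first two samples (401.3 and 403.2 ka)
extrap_400 = line_through(401.3, 3.49, 403.2, 3.95)(400)

print(round(interp_405, 4), round(extrap_400, 4))  # 3.8555 3.1753
```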
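Plan
- set the index to 'a (ka)'
- create a sub-index of the points I care about
- reindex with the union of the old index and the sub-index; NaN is placed in the new spots
- interpolate to fill in the NaN. Make sure to use method='index' so that values are weighted by the actual index distances rather than by row position
- slice out the sub-index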
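```python
import pandas as pd

# df holds the question's columns 'a (ka)', 'b' and 'c'
df = df.set_index('a (ka)')
nidx = pd.RangeIndex(400, 420, 5)  # 400, 405, 410, 415 (stop is exclusive)
ridx = df.index.union(nidx)        # old index plus the new points, sorted
df.reindex(ridx).interpolate(method='index').reindex(nidx)
```

```
          b      c
400     NaN    NaN
405  3.8555  0.646
410  3.7180  1.038
415  4.3590  0.433
```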
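Note that at index 400 we still have NaN: it lies before the first sample (401.3 ka), so filling it requires extrapolation rather than interpolation.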
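One way to fill that NaN, and likewise to extrapolate past the last sample instead of repeating it (by default, interpolate with method='index' carries the last value forward beyond the end of the data), is to extend the end segments linearly. A sketch, assuming a sorted numeric index and at least two valid values per column; extrapolate_ends is a hypothetical helper name, and limit_area='inside' is used so that only NaNs surrounded by valid values are interpolated, leaving both ends for the helper:

```python
import pandas as pd


def extrapolate_ends(d):
    """Hypothetical helper: replace values before the first and after the
    last valid observation of each column with a straight-line extension
    through the two nearest valid observations."""
    d = d.copy()
    x = d.index.to_numpy(dtype=float)
    for col in d.columns:
        valid = d[col].notna().to_numpy()
        xv = x[valid]
        yv = d[col].to_numpy(dtype=float)[valid]
        lo, hi = x < xv[0], x > xv[-1]
        if lo.any():  # before the data: extend the first segment backwards
            m = (yv[1] - yv[0]) / (xv[1] - xv[0])
            d.loc[lo, col] = yv[0] + (x[lo] - xv[0]) * m
        if hi.any():  # after the data: extend the last segment forwards
            m = (yv[-1] - yv[-2]) / (xv[-1] - xv[-2])
            d.loc[hi, col] = yv[-1] + (x[hi] - xv[-1]) * m
    return d


# Question data, indexed by age in ka
df = pd.DataFrame(
    {'b': [3.49, 3.95, 3.74, 3.71, 3.73, 3.58, 4.4],
     'c': [0.34, 0.25, 1.13, 1.03, 1.05, -0.08, 0.46]},
    index=pd.Index([401.3, 403.2, 407.2, 409.2, 411.2, 413.1, 415.1], name='a (ka)'),
)
nidx = pd.RangeIndex(400, 425, 5)  # 400, 405, ..., 420
ridx = df.index.union(nidx)
inner = df.reindex(ridx).interpolate(method='index', limit_area='inside')
print(extrapolate_ends(inner).reindex(nidx).round(2))
```

With these data the 400 ka row becomes 3.18 and 0.40, matching the question's table, and 420 ka is extended along the last segment rather than copying the final observed values.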