Shifting values in the DatetimeIndex of a pandas DataFrame
I have a df with a DatetimeIndex of 30-minute intervals over a long period (> 1 year), >17520 rows. For reasons related to daylight savings, 2 of the index values are repeated in the index and 2 values are missing. The duplicated values are:
    In [1]: df[df.index.duplicated('first')]
    Out[2]:
                           a   b   c
    timestamp
    2012-10-07 01:00:00  NaN NaN NaN
    2012-10-07 01:30:00  NaN NaN NaN
    2013-10-06 01:00:00  NaN NaN NaN
    2013-10-06 01:30:00  NaN NaN NaN
I want to change these to the missing values, 1 hour later:
    In [3]: df[df.index.duplicated('first')].shift(1, freq="h")
    Out[4]:
                           a   b   c
    timestamp
    2012-10-07 02:00:00  NaN NaN NaN
    2012-10-07 02:30:00  NaN NaN NaN
    2013-10-06 02:00:00  NaN NaN NaN
    2013-10-06 02:30:00  NaN NaN NaN
But assigning the result back doesn't change the index:

    df[df.index.duplicated('first')] = df[df.index.duplicated('first')].shift(1, freq="h")
What would?
I think you need to map the duplicated index values with rename and a dict:
    print(df)
                         a   b   c
    timestamp
    2013-10-06 01:00:00  1 NaN NaN
    2013-10-06 01:30:00  2 NaN NaN
    2013-10-06 01:00:00  3 NaN NaN
    2013-10-06 01:30:00  4 NaN NaN
    2012-10-08 01:30:00  5 NaN NaN
    2013-10-10 01:00:00  6 NaN NaN

    # rows whose index label has already appeared earlier
    df1 = df[df.index.duplicated('first')]
    # map each duplicated label to the label one hour later
    d = dict(zip(df1.index, df1.shift(1, freq="h").index))
    print(d)
    {Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'),
     Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}

    df = df.rename(index=d)
    print(df)
                         a   b   c
    timestamp
    2013-10-06 02:00:00  1 NaN NaN
    2013-10-06 02:30:00  2 NaN NaN
    2013-10-06 02:00:00  3 NaN NaN
    2013-10-06 02:30:00  4 NaN NaN
    2012-10-08 01:30:00  5 NaN NaN
    2013-10-10 01:00:00  6 NaN NaN
Similar solutions:
    idx = df.index[df.index.duplicated('first')]
    d = dict(zip(idx, idx.to_series().shift(freq="h").index))
    print(d)
    {Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'),
     Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}

    df = df.rename(index=d)
    print(df)
                          a   b   c
    timestamp
    2013-10-06 02:00:00   1 NaN NaN
    2013-10-06 02:30:00   2 NaN NaN
    2013-10-06 02:00:00   3 NaN NaN
    2013-10-06 02:30:00   4 NaN NaN
    2012-10-08 01:30:00   5 NaN NaN
    2013-10-10 01:00:00   6 NaN NaN
    2013-10-06 02:30:00   8 NaN NaN
    2012-10-08 01:30:00   9 NaN NaN
    2013-10-10 01:00:00  10 NaN NaN
    idx = df.index[df.index.duplicated('first')]
    s = idx.to_series().shift(freq="h")
    # swap index and values in the Series, so the original labels become the keys
    d = pd.Series(s.index.values, index=s.values).to_dict()
    print(d)
    {Timestamp('2013-10-06 01:00:00'): Timestamp('2013-10-06 02:00:00'),
     Timestamp('2013-10-06 01:30:00'): Timestamp('2013-10-06 02:30:00')}

    df = df.rename(index=d)
    print(df)
                         a   b   c
    timestamp
    2013-10-06 02:00:00  1 NaN NaN
    2013-10-06 02:30:00  2 NaN NaN
    2013-10-06 02:00:00  3 NaN NaN
    2013-10-06 02:30:00  4 NaN NaN
    2012-10-08 01:30:00  5 NaN NaN
    2013-10-10 01:00:00  6 NaN NaN
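Note that rename with a dict relabels every row whose label is a key, so the first occurrences are moved along with the duplicates, which is what the printed output shows. A minimal sketch that checks this on the sample data (the single column `a` and the names `mapping`, `renamed` and `moved` are illustrative assumptions, not part of the original post):

```python
import pandas as pd

# Sample index from the example: two labels appear twice.
idx = pd.to_datetime([
    "2013-10-06 01:00:00",
    "2013-10-06 01:30:00",
    "2013-10-06 01:00:00",
    "2013-10-06 01:30:00",
    "2012-10-08 01:30:00",
    "2013-10-10 01:00:00",
])
df = pd.DataFrame({"a": range(1, 7)}, index=idx)

# Labels of the second occurrences, mapped to one hour later.
dup = df.index[df.index.duplicated(keep="first")]
mapping = dict(zip(dup, dup + pd.Timedelta(hours=1)))

renamed = df.rename(index=mapping)

# Count the rows whose label changed: first occurrences are included.
moved = int((renamed.index != df.index).sum())
print(moved)
print(renamed.index.duplicated().sum())
```

On this data all four rows labelled 01:00 or 01:30 are relabelled, so the index still contains duplicates afterwards.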
Edit 1:
You need to add timedeltas, created with cumcount and to_timedelta, to the original index.
    # cumcount numbers repeated labels 0, 1, ...; convert that count to hours
    delta = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='h')
    print(delta)
    timestamp
    2013-10-06 01:00:00   00:00:00
    2013-10-06 01:30:00   00:00:00
    2013-10-06 01:00:00   01:00:00
    2013-10-06 01:30:00   01:00:00
    2012-10-08 01:30:00   00:00:00
    2013-10-10 01:00:00   00:00:00
    dtype: timedelta64[ns]

    df.index = df.index + delta
    print(df)
                         a   b   c
    2013-10-06 01:00:00  1 NaN NaN
    2013-10-06 01:30:00  2 NaN NaN
    2013-10-06 02:00:00  3 NaN NaN
    2013-10-06 02:30:00  4 NaN NaN
    2012-10-08 01:30:00  5 NaN NaN
    2013-10-10 01:00:00  6 NaN NaN
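For reference, the cumcount approach can be sketched as a self-contained script. The sample frame with a single column `a` and the name `occurrence` are assumptions for illustration, and the sketch converts the counts to a plain array before adding them, so that the addition is purely positional:

```python
import pandas as pd

# Sample index from the example: two labels appear twice.
idx = pd.to_datetime([
    "2013-10-06 01:00:00",
    "2013-10-06 01:30:00",
    "2013-10-06 01:00:00",
    "2013-10-06 01:30:00",
    "2012-10-08 01:30:00",
    "2013-10-10 01:00:00",
])
df = pd.DataFrame({"a": range(1, 7)}, index=idx)

# 0 for the first occurrence of each label, 1 for the second, and so on.
occurrence = df.groupby(level=0).cumcount()

# Turn the occurrence number into an offset in hours and add it row by row.
delta = pd.to_timedelta(occurrence.to_numpy(), unit="h")
df.index = df.index + delta

print(df)
print(df.index.is_unique)
```

Only the repeated rows move, each by one hour per earlier occurrence. This assumes the slot one hour later is really missing from the index; if it is already present, the shift would create a new duplicate.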