How to concat and transpose two tables in Python
I do not know why my code is not working; I want to transpose and concat two tables in Python.
My code:
import numpy as np
import pandas as pd
np.random.seed(100)
df = pd.DataFrame({'TR': np.arange(1, 6).repeat(5),
                   'A': np.random.randint(1, 100, 25),
                   'B': np.random.randint(50, 100, 25),
                   'C': np.random.randint(50, 1000, 25),
                   'D': np.random.randint(5, 100, 25)})
table = df.groupby('TR').mean().round(decimals=1)
table2 = df.drop(['TR'], axis=1).sem().round(decimals=1)
table2 = table2.T
pd.concat([table, table2])
The output should be:
TR     A     B      C     D
1   54.0  68.6  795.8  49.8
2   61.4  67.8  524.8  52.8
3   54.0  73.6  556.6  46.6
4   35.6  69.2  207.2  46.4
5   44.4  85.0  639.8  73.8
st   6.5   3.4   62.5   6.4
Assign a name to the Series, then append it. pd.concat([table, table2]) fails because table2 is an unnamed Series: concat treats it as a single new column rather than a row (and Series.T is a no-op, so the transpose changes nothing). The assigned name becomes the row label:
table2.name = 'st'
table = table.append(table2)
table
       A     B      C     D
TR
1   55.8  73.2  536.8  42.8
2   31.0  75.4  731.2  43.6
3   42.0  68.8  598.6  32.4
4   33.6  79.0  300.8  43.6
5   70.2  72.2  566.8  54.8
st   5.9   3.2   62.5   5.9
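A compatibility caveat, not part of the original answer: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the snippet above fails on current pandas. A minimal pd.concat equivalent, reusing table and table2 from the question:

table2.name = 'st'
# to_frame().T turns the named Series into a one-row DataFrame
# labelled 'st', which concat stacks under the group means
table = pd.concat([table, table2.to_frame().T])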
Related
Get max value in previous rows for matching rows [duplicate]
Say I have a dataframe that records temperature measurements for various sensors:

import pandas as pd

df = pd.DataFrame({'sensor': ['A', 'C', 'A', 'C', 'B', 'B', 'C', 'A', 'A', 'A'],
                   'temperature': [4.8, 12.5, 25.1, 16.9, 20.4, 15.7, 7.7, 5.5, 27.4, 17.7]})

I would like to add a column max_prev_temp that shows the previous maximum temperature for the corresponding sensor. This works:

df["max_prev_temp"] = df.apply(
    lambda row: df[df["sensor"] == row["sensor"]].loc[: row.name, "temperature"].max(),
    axis=1,
)

It returns:

  sensor  temperature  max_prev_temp
0      A          4.8            4.8
1      C         12.5           12.5
2      A         25.1           25.1
3      C         16.9           16.9
4      B         20.4           20.4
5      B         15.7           20.4
6      C          7.7           16.9
7      A          5.5           25.1
8      A         27.4           27.4
9      A         17.7           27.4

The problem is: my actual data set contains over 2 million rows, so this is excruciatingly slow (it will probably take about 2 hours). I understand that rolling is a better method, but I don't see how to use it for this specific case. Any hint would be appreciated.
Use Series.expanding per group, then remove the first index level with Series.droplevel:

df["max_prev_temp"] = df.groupby('sensor')["temperature"].expanding().max().droplevel(0)
print(df)

  sensor  temperature  max_prev_temp
0      A          4.8            4.8
1      C         12.5           12.5
2      A         25.1           25.1
3      C         16.9           16.9
4      B         20.4           20.4
5      B         15.7           20.4
6      C          7.7           16.9
7      A          5.5           25.1
8      A         27.4           27.4
9      A         17.7           27.4
Use groupby.cummax:

df['max_prev_temp'] = df.groupby('sensor')['temperature'].cummax()

Output:

  sensor  temperature  max_prev_temp
0      A          4.8            4.8
1      C         12.5           12.5
2      A         25.1           25.1
3      C         16.9           16.9
4      B         20.4           20.4
5      B         15.7           20.4
6      C          7.7           16.9
7      A          5.5           25.1
8      A         27.4           27.4
9      A         17.7           27.4
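Both answers produce the same column here: groupby.cummax is the direct cumulative operation, while expanding().max() builds a (sensor, row) MultiIndex that droplevel strips. A quick way to confirm they agree on the sample df above (a minimal sketch; sort_index realigns the expanding result to the original row order):

left = df.groupby('sensor')['temperature'].cummax()
right = df.groupby('sensor')['temperature'].expanding().max().droplevel(0).sort_index()
pd.testing.assert_series_equal(left, right)  # raises AssertionError if they differ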
Copy column data into rows below in python pandas
I am new to Python and need to do some data management. I have a large dataset in CSV, and the column names repeat after a certain number of columns. I need to copy each repeated column set below the end of the rows of the first set. As shown in the sample below, I want to cut and paste the dataset for each ID (03]01]17, 03]01]16, 03]01]15 and so on). Here are the sample data and the required format.

,Day,Time,Q,V,N,Unnamed: 5,Q.1,V.1,N.1,Unnamed: 9,Q.2,V.2,N.2,Unnamed: 13,Q.3,V.3,N.3
0,,,03]01]17,,,,03]01]16,,,,03]01]15,,,,03]01]14,,
1,,,,,,,,,,,,,,,,,
2,2019N11,00:00-00:05,48,80.2,2.3,,65,78.8,2.8,,67,78.6,2.9, ,71,84.3,2.6
3,,00:05-00:10,87,75.1,4.2,,102,77.2,4.8,,98,76.2,4.7, ,94,83.9,4.4
4,,00:10-00:15,56,78.0,2.2,,62,81.2,2.3,,66,77.2,2.7, ,70,81.3,2.6
5,,00:15-00:20,62,73.6,2.7,,79,76.9,3.3,,82,74.5,3.5, ,78,84.1,2.8
6,,00:20-00:25,69,75.6,3.0,,84,77.4,3.6,,81,75.4,3.5, ,81,83.0,3.2
7,,00:25-00:30,65,76.0,2.6,,69,77.2,2.7,,75,76.1,3.2, ,72,84.4,2.7
8,,00:30-00:35,62,77.9,2.6,,77,79.4,3.2,,82,77.9,3.4, ,83,86.1,3.1
9,,00:35-00:40,63,80.0,2.2,,82,76.6,3.2,,79,78.7,3.2, ,74,86.0,2.6
10,,00:40-00:45,52,79.5,2.0,,66,81.2,2.2,,69,78.9,2.5, ,74,85.0,2.6
11,,00:45-00:50,59,73.9,2.6,,73,78.9,2.9,,76,76.7,3.0, ,73,84.3,2.6
12,,00:50-00:55,67,77.4,2.8,,89,78.0,3.4,,87,74.9,3.4, ,90,82.6,3.1
13,,00:55-01:00,49,74.2,1.9,,75,76.6,2.8,,78,73.5,3.0, ,75,82.9,2.6

dfsample = pd.read_clipboard(sep=',')
dfsample

##Required_format

,ID,Day,Time,Q,V,N
0,03]01]17,2019N11,00:00-00:05,48,80.2,2.3
1,,,00:05-00:10,87,75.1,4.2
2,,,00:10-00:15,56,78.0,2.2
3,,,00:15-00:20,62,73.6,2.7
4,,,00:20-00:25,69,75.6,3.0
5,,,00:25-00:30,65,76.0,2.6
6,,,00:30-00:35,62,77.9,2.6
7,,,00:35-00:40,63,80.0,2.2
8,,,00:40-00:45,52,79.5,2.0
9,,,00:45-00:50,59,73.9,2.6
10,,,00:50-00:55,67,77.4,2.8
11,,,00:55-01:00,49,74.2,1.9
12,03]01]16,2019N11,00:00-00:05,65,78.8,2.8
13,,,00:05-00:10,102,77.2,4.8
14,,,00:10-00:15,62,81.2,2.3
15,,,00:15-00:20,79,76.9,3.3
16,,,00:20-00:25,84,77.4,3.6
17,,,00:25-00:30,69,77.2,2.7
18,,,00:30-00:35,77,79.4,3.2
19,,,00:35-00:40,82,76.6,3.2
20,,,00:40-00:45,66,81.2,2.2
21,,,00:45-00:50,73,78.9,2.9
22,,,00:50-00:55,89,78.0,3.4
23,,,00:55-01:00,75,76.6,2.8
24,03]01]15,2019N11,00:00-00:05,67,78.6,2.9
25,,,00:05-00:10,98,76.2,4.7
26,,,00:10-00:15,66,77.2,2.7
27,,,00:15-00:20,82,74.5,3.5
28,,,00:20-00:25,81,75.4,3.5
29,,,00:25-00:30,75,76.1,3.2
30,,,00:30-00:35,82,77.9,3.4
31,,,00:35-00:40,79,78.7,3.2
32,,,00:40-00:45,69,78.9,2.5
33,,,00:45-00:50,76,76.7,3.0
34,,,00:50-00:55,87,74.9,3.4
35,,,00:55-01:00,78,73.5,3.0
36,03]01]14,2019N11,00:00-00:05,71,84.3,2.6
37,,,00:05-00:10,94,83.9,4.4
38,,,00:10-00:15,70,81.3,2.6
39,,,00:15-00:20,78,84.1,2.8
40,,,00:20-00:25,81,83.0,3.2
41,,,00:25-00:30,72,84.4,2.7
42,,,00:30-00:35,83,86.1,3.1
43,,,00:35-00:40,74,86.0,2.6
44,,,00:40-00:45,74,85.0,2.6
45,,,00:45-00:50,73,84.3,2.6
46,,,00:50-00:55,90,82.6,3.1
47,,,00:55-01:00,75,82.9,2.6

dfrequired = pd.read_clipboard(sep=',')
dfrequired
Please try this:

import pandas as pd
import numpy as np

df = pd.read_csv('file.csv')
df = df.drop('Unnamed: 0', axis=1)
DAY = df.iloc[2, 0]
ID = df.iloc[0, 2]
TIME = df.iloc[2:, 1]
result_df = pd.DataFrame()
i = 0
for n in range(2, df.shape[1], 4):
    if i == 0:
        first_col = n
        last_col = n + 3
        temp_df = df.iloc[:, first_col:last_col]
        temp_df = temp_df.iloc[2:, :]
        temp_df.insert(0, 'ID', np.nan)
        temp_df.iloc[0, 0] = ID
        temp_df.insert(1, 'Day', np.nan)
        temp_df.iloc[0, 1] = DAY
        temp_df.insert(2, 'Time', np.nan)
        temp_df.iloc[:, 2] = TIME
        result_df = result_df.append(temp_df)
        i += 1
    else:
        first_col = n
        last_col = n + 3
        temp_df = df.iloc[:, first_col:last_col]
        ID = temp_df.iloc[0, 0]
        temp_df = temp_df.iloc[2:, :]
        temp_df.insert(0, 'ID', np.nan)
        temp_df.iloc[0, 0] = ID
        temp_df.insert(1, 'Day', np.nan)
        temp_df.iloc[0, 1] = DAY
        temp_df.insert(2, 'Time', np.nan)
        temp_df.iloc[:, 2] = TIME
        temp_df.columns = result_df.columns
        result_df = result_df.append(temp_df)
result_df = result_df.reset_index(drop=True)
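One caveat, not part of the original answer: result_df.append(temp_df) uses DataFrame.append, which was removed in pandas 2.0. A minimal adaptation for current pandas (a sketch, keeping the answer's variable names) collects the pieces in a list and concatenates once at the end:

frames = []
for n in range(2, df.shape[1], 4):
    # ... build temp_df for this block exactly as in the loop above ...
    frames.append(temp_df)  # plain list append, not DataFrame.append
result_df = pd.concat(frames, ignore_index=True)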
OK, there we go. First of all, I created a random sample file that looks just like yours (link here: https://drive.google.com/file/d/121li6T5OfSlZ12-HrxyP_Thd4NIuu1jK/view?usp=sharing). Then I loaded it as a dataframe:

import pandas as pd
import numpy as np

df = pd.read_csv('database.csv')

My approach was to create lists and group them as a new dataframe:

Q_index = [i for i in range(len(df.columns)) if 'Q' in df.columns[i]]
id_list = [i for sub_list in [[id] + list(np.ones(11) * np.nan) for id in df.iloc[0].dropna()] for i in sub_list]
day_list = df.loc[2, 'Day']
time_list = (df['Time'][2:]).tolist() * len(Q_index)
Q_list = [i for sub_list in [df.iloc[2:, i].tolist() for i in Q_index] for i in sub_list]
V_list = [i for sub_list in [df.iloc[2:, i + 1].tolist() for i in Q_index] for i in sub_list]
N_list = [i for sub_list in [df.iloc[2:, i + 2].tolist() for i in Q_index] for i in sub_list]

result_df = pd.DataFrame({'ID': id_list,
                          'Day': day_list,
                          'Time': time_list,
                          'Q': Q_list,
                          'V': V_list,
                          'N': N_list}).fillna(method='ffill')

Output:

result_df
         ID      Day         Time     Q     V    N
0   3]01]17  2019N11  00:00-00:05  91.9  70.0  3.0
1   3]01]17  2019N11  00:05-00:10  92.7  80.1  4.0
2   3]01]17  2019N11  00:10-00:15  68.3  86.8  3.2
3   3]01]17  2019N11  00:15-00:20  40.2  74.5  4.4
4   3]01]17  2019N11  00:20-00:25  81.4  74.3  3.3
5   3]01]17  2019N11  00:25-00:30  45.2  85.0  4.8
6   3]01]17  2019N11  00:30-00:35  92.3  82.3  3.6
7   3]01]17  2019N11  00:35-00:40  78.7  81.2  3.0
8   3]01]17  2019N11  00:40-00:45  88.8  86.2  2.0
9   3]01]17  2019N11  00:45-00:50  75.4  79.9  4.5
10  3]01]17  2019N11  00:50-00:55  53.0  73.6  3.2
11  3]01]17  2019N11  00:55-01:00  58.9  82.7  4.4
12  3]01]16  2019N11  00:00-00:05  62.9  77.1  3.1
13  3]01]16  2019N11  00:05-00:10  52.2  78.7  2.0
14  3]01]16  2019N11  00:10-00:15  52.0  79.0  4.7
15  3]01]16  2019N11  00:15-00:20  77.6  85.3  4.4
16  3]01]16  2019N11  00:20-00:25  57.8  84.0  5.0
17  3]01]16  2019N11  00:25-00:30  47.9  77.0  3.1
18  3]01]16  2019N11  00:30-00:35  62.4  84.5  3.2
19  3]01]16  2019N11  00:35-00:40  84.5  83.4  5.0
20  3]01]16  2019N11  00:40-00:45  56.6  88.6  2.5
21  3]01]16  2019N11  00:45-00:50  47.9  84.7  4.8
22  3]01]16  2019N11  00:50-00:55  92.5  77.8  3.7
23  3]01]16  2019N11  00:55-01:00  60.6  75.0  4.5
24  3]01]15  2019N11  00:00-00:05  51.8  86.3  4.4
25  3]01]15  2019N11  00:05-00:10  52.9  83.6  5.0
26  3]01]15  2019N11  00:10-00:15  52.5  85.4  3.4
27  3]01]15  2019N11  00:15-00:20  46.1  81.2  2.3
28  3]01]15  2019N11  00:20-00:25  65.1  70.9  4.7
29  3]01]15  2019N11  00:25-00:30  65.2  77.6  2.6
30  3]01]15  2019N11  00:30-00:35  67.1  84.2  4.1
31  3]01]15  2019N11  00:35-00:40  42.2  82.2  3.3
32  3]01]15  2019N11  00:40-00:45  71.5  79.8  2.4
33  3]01]15  2019N11  00:45-00:50  65.1  72.3  2.9
34  3]01]15  2019N11  00:50-00:55  86.0  80.3  3.9
35  3]01]15  2019N11  00:55-01:00  92.8  85.9  4.1
36  3]01]14  2019N11  00:00-00:05  53.2  82.4  3.1
37  3]01]14  2019N11  00:05-00:10  98.0  76.0  3.5
38  3]01]14  2019N11  00:10-00:15  58.9  88.3  4.4
39  3]01]14  2019N11  00:15-00:20  95.3  85.1  3.2
40  3]01]14  2019N11  00:20-00:25  45.7  74.0  3.5
41  3]01]14  2019N11  00:25-00:30  48.6  89.7  4.8
42  3]01]14  2019N11  00:30-00:35  94.6  79.5  2.1
43  3]01]14  2019N11  00:35-00:40  71.8  73.0  3.8
44  3]01]14  2019N11  00:40-00:45  92.5  83.1  2.0
45  3]01]14  2019N11  00:45-00:50  70.3  79.4  4.2
46  3]01]14  2019N11  00:50-00:55  83.6  82.6  2.8
47  3]01]14  2019N11  00:55-01:00  56.2  89.1  2.6
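A small compatibility note, not part of the original answer: fillna(method='ffill') is deprecated in recent pandas releases, so on current versions replace the trailing .fillna(method='ffill') with the equivalent .ffill() method:

result_df = result_df.ffill()  # same forward-fill, without the deprecated method= argument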
Transforming yearwise data using pandas
I have a dataframe that looks like this:

            Temp
Date
1981-01-01  20.7
1981-01-02  17.9
1981-01-03  18.8
1981-01-04  14.6
1981-01-05  15.8
...          ...
1981-12-27  15.5
1981-12-28  13.3
1981-12-29  15.6
1981-12-30  15.2
1981-12-31  17.4

365 rows × 1 columns

And I want to transform it so that it looks like:

     1981  1982  1983  1984  1985  1986  1987  1988  1989  1990
0    20.7  17.0  18.4  19.5  13.3  12.9  12.3  15.3  14.3  14.8
1    17.9  15.0  15.0  17.1  15.2  13.8  13.8  14.3  17.4  13.3
2    18.8  13.5  10.9  17.1  13.1  10.6  15.3  13.5  18.5  15.6
3    14.6  15.2  11.4  12.0  12.7  12.6  15.6  15.0  16.8  14.5
4    15.8  13.0  14.8  11.0  14.6  13.7  16.2  13.6  11.5  14.3
..    ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
360  15.5  15.3  13.9  12.2  11.5  14.6  16.2   9.5  13.3  14.0
361  13.3  16.3  11.1  12.0  10.8  14.2  14.2  12.9  11.7  13.6
362  15.6  15.8  16.1  12.6  12.0  13.2  14.3  12.9  10.4  13.5
363  15.2  17.7  20.4  16.0  16.3  11.7  13.3  14.8  14.4  15.7
364  17.4  16.3  18.0  16.4  14.4  17.2  16.7  14.1  12.7  13.0

My attempt:

groups = df.groupby(df.index.year)
keys = groups.groups.keys()
years = pd.DataFrame()
for key in keys:
    years[key] = groups.get_group(key)['Temp'].values

Question: the above code gives me my desired output, but is there a more efficient way of transforming this? I can't post the whole data because there are 3650 rows in the dataframe, so you can download the CSV file (60.6 kB) for testing from here
Try grabbing the year and dayofyear from the index, then pivoting:

import pandas as pd
import numpy as np

# Create random data
dr = pd.date_range(pd.to_datetime("1981-01-01"), pd.to_datetime("1982-12-31"))
df = pd.DataFrame(np.random.randint(1, 100, size=dr.shape), index=dr, columns=['Temp'])

# Get year and day of year
df['year'] = df.index.year
df['day'] = df.index.dayofyear

# Pivot
p = df.pivot(index='day', columns='year', values='Temp')
print(p)

p:

year  1981  1982
day
1       38    85
2       51    70
3       76    61
4       71    47
5       44    76
..     ...   ...
361     23    22
362     42    64
363     84    22
364     26    56
365     67    73

Run-time via timeit:

import timeit

setup = '''
import pandas as pd
import numpy as np

# Create random data
dr = pd.date_range(pd.to_datetime("1981-01-01"), pd.to_datetime("1983-12-31"))
df = pd.DataFrame(np.random.randint(1, 100, size=dr.shape), index=dr, columns=['Temp'])'''

pivot = '''
df['year'] = df.index.year
df['day'] = df.index.dayofyear
p = df.pivot(index='day', columns='year', values='Temp')'''

groupby_for = '''
groups = df.groupby(df.index.year)
keys = groups.groups.keys()
years = pd.DataFrame()
for key in keys:
    years[key] = groups.get_group(key)['Temp'].values'''

if __name__ == '__main__':
    print("Pivot")
    print(timeit.timeit(setup=setup, stmt=pivot, number=1000))
    print("Groupby For")
    print(timeit.timeit(setup=setup, stmt=groupby_for, number=1000))

Pivot
1.598973
Groupby For
2.3967995999999996

*Additional note: the groupby-for option will not work for leap years, as it cannot handle 1984 being 366 days instead of 365. Pivot will work regardless.
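The leap-year note matters if rows must also stay calendar-aligned: after February 29, dayofyear shifts every later date in a leap year down by one row. One workaround (an assumption on my part, not from the answer above) is to pivot on a month-day key instead of dayofyear:

df['md'] = df.index.strftime('%m-%d')  # calendar key, immune to leap-year shifts
p = df.pivot(index='md', columns='year', values='Temp')  # non-leap years get NaN at '02-29'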
Fill NaN values from previous column with data
I have a dataframe in pandas, and I am trying to take data from the same row in different columns and fill the NaN values in my data. How would I do this in pandas? For example:

       1     2   3     4     5   6   7     8     9    10  11    12    13  14    15    16
83  27.0  29.0 NaN  29.0  30.0 NaN NaN  15.0  16.0  17.0 NaN  28.0  30.0 NaN  28.0  18.0

The goal is for the data to look like this:

      1   2   3     4     5     6     7  ...    10    11    12    13    14    15    16
83  NaN NaN NaN  27.0  29.0  29.0  30.0  ...  15.0  16.0  17.0  28.0  30.0  28.0  18.0

The goal is to be able to take the mean of the last five columns that contain data. If there are fewer than five data-filled cells, take the average of however many cells there are.
For better performance, use the justify function, applied to all columns except the first via DataFrame.iloc:

print(df)
   name     1     2   3     4     5   6   7     8     9    10  11    12    13 \
80  bob  27.0  29.0 NaN  29.0  30.0 NaN NaN  15.0  16.0  17.0 NaN  28.0  30.0

     14    15    16
80  NaN  28.0  18.0

df.iloc[:, 1:] = justify(df.iloc[:, 1:].to_numpy(), invalid_val=np.nan, side='right')
print(df)

   name   1   2   3   4   5     6     7     8     9    10    11    12    13 \
80  bob NaN NaN NaN NaN NaN  27.0  29.0  29.0  30.0  15.0  16.0  17.0  28.0

      14    15    16
80  30.0  28.0  18.0

Function:

# https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

Performance:

# 100 rows
df = pd.concat([df] * 100, ignore_index=True)

# 41 times slower
In [39]: %timeit df.loc[:, df.columns[1:]] = df.loc[:, df.columns[1:]].apply(fun, axis=1)
145 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [41]: %timeit df.iloc[:, 1:] = justify(df.iloc[:, 1:].to_numpy(), invalid_val=np.nan, side='right')
3.54 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# 1000 rows
df = pd.concat([df] * 1000, ignore_index=True)

# 198 times slower
In [43]: %timeit df.loc[:, df.columns[1:]] = df.loc[:, df.columns[1:]].apply(fun, axis=1)
1.13 s ± 37.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [45]: %timeit df.iloc[:, 1:] = justify(df.iloc[:, 1:].to_numpy(), invalid_val=np.nan, side='right')
5.7 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Assuming you need to move all NaN values to the first columns, I would define a function that places the NaNs first and leaves the rest in their original order:

def fun(row):
    index_order = row.index[row.isnull()].append(row.index[~row.isnull()])
    row.iloc[:] = row[index_order].values
    return row

df_fix = df.loc[:, df.columns[1:]].apply(fun, axis=1)

If you need to overwrite the results in the same dataframe:

df.loc[:, df.columns[1:]] = df_fix.copy()
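Once the values are right-justified (either answer above), the question's stated goal, the mean of the last five data-filled cells per row, reduces to a NaN-aware row mean over the last five columns. A minimal sketch, assuming the justified df from the first answer (mean_last5 is a hypothetical column name):

# mean(axis=1) skips NaN by default, so a row with fewer than five
# values averages only the cells that are present
df['mean_last5'] = df.iloc[:, -5:].mean(axis=1)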
Pandas/Python: interpolation of multiple columns based on values specified for one reference column
df
Out[1]:
     PRES  HGHT  TEMP  DWPT  RELH   MIXR  DRCT  SKNT   THTA   THTE   THTV
0   978.0   345  17.0  16.5    97  12.22     0     0  292.0  326.8  294.1
1   977.0   354  17.8  16.7    93  12.39     1     0  292.9  328.3  295.1
2   970.0   416  23.4  15.4    61  11.47     4     2  299.1  332.9  301.2
3   963.0   479  24.0  14.0    54  10.54     8     3  300.4  331.6  302.3
4   948.7   610  23.0  13.4    55  10.28    15     6  300.7  331.2  302.5
5   925.0   830  21.4  12.4    56   9.87    20     5  301.2  330.6  303.0
6   916.0   914  20.7  11.7    56   9.51    20     4  301.3  329.7  303.0
7   884.0  1219  18.2   9.2    56   8.31    60     4  301.8  326.7  303.3
8   853.1  1524  15.7   6.7    55   7.24    35     3  302.2  324.1  303.5
9   850.0  1555  15.4   6.4    55   7.14    20     2  302.3  323.9  303.6
10  822.8  1829  13.3   5.6    60   6.98   300     4  302.9  324.0  304.1

How do I interpolate the values of all the columns at specified PRES (pressure) values, say PRES = [950, 900, 875]? Is there an elegant pandas way to do this?

The only way I can think of is to first create NaN rows for each specified PRES value in a loop, then set PRES as the index and use pandas' native interpolation:

df.interpolate(method='index', inplace=True)

Is there a more elegant solution?
Use your solution without the loop: reindex on the union of the original index values and the PRES list. This works only if all values are unique:

PRES = [950, 900, 875]
df = df.set_index('PRES')
df = df.reindex(df.index.union(PRES)).sort_index(ascending=False).interpolate(method='index')
print(df)

         HGHT  TEMP  DWPT  RELH   MIXR   DRCT  SKNT   THTA   THTE   THTV
978.0   345.0  17.0  16.5  97.0  12.22    0.0   0.0  292.0  326.8  294.1
977.0   354.0  17.8  16.7  93.0  12.39    1.0   0.0  292.9  328.3  295.1
970.0   416.0  23.4  15.4  61.0  11.47    4.0   2.0  299.1  332.9  301.2
963.0   479.0  24.0  14.0  54.0  10.54    8.0   3.0  300.4  331.6  302.3
950.0  1829.0  13.3   5.6  60.0   6.98  300.0   4.0  302.9  324.0  304.1
948.7   610.0  23.0  13.4  55.0  10.28   15.0   6.0  300.7  331.2  302.5
925.0   830.0  21.4  12.4  56.0   9.87   20.0   5.0  301.2  330.6  303.0
916.0   914.0  20.7  11.7  56.0   9.51   20.0   4.0  301.3  329.7  303.0
900.0  1829.0  13.3   5.6  60.0   6.98  300.0   4.0  302.9  324.0  304.1
884.0  1219.0  18.2   9.2  56.0   8.31   60.0   4.0  301.8  326.7  303.3
875.0  1829.0  13.3   5.6  60.0   6.98  300.0   4.0  302.9  324.0  304.1
853.1  1524.0  15.7   6.7  55.0   7.24   35.0   3.0  302.2  324.1  303.5
850.0  1555.0  15.4   6.4  55.0   7.14   20.0   2.0  302.3  323.9  303.6
822.8  1829.0  13.3   5.6  60.0   6.98  300.0   4.0  302.9  324.0  304.1

If the PRES column can contain non-unique values, use concat with sort_index instead:

PRES = [950, 900, 875]
df = df.set_index('PRES')
df = (pd.concat([df, pd.DataFrame(index=PRES)])
        .sort_index(ascending=False)
        .interpolate(method='index'))
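If only the newly interpolated levels are needed afterwards, they can be pulled straight out with .loc, since the requested pressures are now index labels (a minimal sketch, assuming the interpolated df from the first snippet):

interp_rows = df.loc[PRES]  # just the rows for 950, 900 and 875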