I'm a bit new to pandas and have some diabetic data that I would like to reorder.
I'd like to copy the data from column 'wakeup' through '23:00:00' and put these values vertically under each other, so I would get a new dataframe column:
5.6
8.1
9.9
6.3
4.1
13.3
NAN
3.9
3.3
6.8
.....etc
I'm assuming the data is in a dataframe already. You can index the columns you want and then use melt, as suggested. Without any parameters, melt will 'stack' all your data into one column of a new dataframe. Another column is created to identify the original column names, but you can drop that if needed (a short sketch of that is at the end of this answer).
df.loc[:, 'wakeup':'23:00:00'].melt()
variable value
0 wakeup 5.6
1 wakeup 8.1
2 wakeup 9.9
3 wakeup 6.3
4 wakeup 4.1
5 wakeup 13.3
6 wakeup NaN
7 09:30:00 3.9
8 09:30:00 3.3
9 09:30:00 6.8
...
You mention you want this as another column, but there's no sensible way to add it into your existing dataframe, and the shape likely won't match either.
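If you only want the stacked values themselves, a small sketch of the drop mentioned above (same column range assumed):
stacked = df.loc[:, 'wakeup':'23:00:00'].melt()
values_only = stacked.drop(columns='variable')  # keeps just the 'value' column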
Solved it myself; it took me quite some time.
Notice that the original data was in df1 and the result ends up in dfAllMeasurements:
dfAllMeasurements = df1.loc[:, 'weekday':'23:00:00']
temp = dfAllMeasurements.set_index('weekday','ID').stack(dropna=False)  # dropna=False keeps the NaN values
dfAllMeasurements = temp.reset_index(drop=False, level=0).reset_index()
I'm trying to exclude rows that appear in another dataframe, using pandas in a Jupyter notebook. Examples of the dataframes can be seen below.
Dataframe 1:
ID     Amount
AB-01  2.65
AB-02  3.6
AB-03  5.6
AB-04  7.6
AB-05  2
Dataframe 2:
ID     Amount
AB-01  2.65
AB-02  3.6
Desired outcome:
ID     Amount
AB-03  5.6
AB-04  7.6
AB-05  2
You can use isin and negate the mask with ~, so only the IDs not present in df2 are kept:
out = df1[~df1['ID'].isin(df2['ID'])]
print(out)
ID Amount
2 AB-03 5.6
3 AB-04 7.6
4 AB-05 2.0
I have a dataframe of portfolio returns:
date Portfolio %
30/11/2001 4.8
31/12/2001 -0.7
31/01/2002 1.3
28/02/2002 -1.4
29/03/2002 3.3
I need to create an index of returns, but to do this I need a starting figure of 1.0, and the formula references the previous row. The output should look like this:
date Portfolio % Index
           NaN    1.000
30/11/2001 4.8 1.048
31/12/2001 -0.7 1.040
31/01/2002 1.3 1.054
28/02/2002 -1.4 1.039
29/03/2002 3.3 1.073
As an example, the formula for the second result is:
1.048 * (1 + (-0.7 / 100))
I've tried the following code, but it doesn't get the required result.
portfolio['Index'] = portfolio['Portfolio %'] / portfolio['Portfolio %'].iloc[0]
The issues I have:
I can't get the starting variable
I can't get the formula to reference the previous row.
I believe it is the same issue as this post: Create and index from returns PANDAS. However, it was never answered fully.
Use Series.div and Series.add along with Series.cumprod:
df['Index'] = df['Portfolio %'].div(100).add(1).cumprod()
Result:
# print(df)
date Portfolio % Index
0 30/11/2001 4.8 1.048000
1 31/12/2001 -0.7 1.040664
2 31/01/2002 1.3 1.054193
3 28/02/2002 -1.4 1.039434
4 29/03/2002 3.3 1.073735
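If you also want the explicit starting row with an index value of 1.0, as in the desired output, one way (a sketch based on my reading of the question, not part of the answer above) is to prepend a base row after computing the cumulative product:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'date': ['30/11/2001', '31/12/2001', '31/01/2002', '28/02/2002', '29/03/2002'],
    'Portfolio %': [4.8, -0.7, 1.3, -1.4, 3.3],
})

# cumulative product of (1 + return/100) gives the index relative to a base of 1.0
df['Index'] = df['Portfolio %'].div(100).add(1).cumprod()

# prepend the base row (blank date, no return, index value 1.0)
base = pd.DataFrame({'date': [''], 'Portfolio %': [np.nan], 'Index': [1.0]})
out = pd.concat([base, df], ignore_index=True)
print(out)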
I have two pd.DataFrames:
df1:
Year Replaced Not_replaced
2015 1.5 0.1
2016 1.6 0.3
2017 2.1 0.1
2018 2.6 0.5
df2:
Year HI LO RF
2015 3.2 2.9 3.0
2016 3.0 2.8 2.9
2017 2.7 2.5 2.6
2018 2.6 2.2 2.3
I need to create a third dataframe, df3, using the following equations:
df3['column1'] = df1['Replaced'] - df1['Not_replaced'] + df2['HI']
df3['column2'] = df1['Replaced'] - df1['Not_replaced'] + df2['LO']
df3['column3'] = df1['Replaced'] - df1['Not_replaced'] + df2['RF']
I can merge the two dataframes and manually create the three new columns one by one, but I can't figure out how to use a loop to create the results.
You can create an empty dataframe and fill it with values while looping.
(Note: col_names and df3.columns must be of the same length.)
df3 = pd.DataFrame(columns=['column1', 'column2', 'column3'])
col_names = ["HI", "LO", "RF"]
for incol, df3column in zip(col_names, df3.columns):
    df3[df3column] = df1['Replaced'] - df1['Not_replaced'] + df2[incol]
print(df3)
output
column1 column2 column3
0 4.6 4.3 4.4
1 4.3 4.1 4.2
2 4.7 4.5 4.6
3 4.7 4.3 4.4
For the for loop, I would first merge df1 and df2 to create a new df, called df3. Then I would create a list of the names of the columns you want to iterate through:
col_names = ["HI", "LO", "RF"]
for col in col_names:
    df3[f"column_{col}"] = df3['Replaced'] - df3['Not_replaced'] + df3[col]
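A minimal sketch of that merge-then-loop idea, assuming both frames share the Year column and using the example values from the question:
import pandas as pd

df1 = pd.DataFrame({'Year': [2015, 2016, 2017, 2018],
                    'Replaced': [1.5, 1.6, 2.1, 2.6],
                    'Not_replaced': [0.1, 0.3, 0.1, 0.5]})
df2 = pd.DataFrame({'Year': [2015, 2016, 2017, 2018],
                    'HI': [3.2, 3.0, 2.7, 2.6],
                    'LO': [2.9, 2.8, 2.5, 2.2],
                    'RF': [3.0, 2.9, 2.6, 2.3]})

# merge on Year so all the needed columns live in one frame
df3 = df1.merge(df2, on='Year')

for col in ["HI", "LO", "RF"]:
    df3[f"column_{col}"] = df3['Replaced'] - df3['Not_replaced'] + df3[col]

print(df3[['column_HI', 'column_LO', 'column_RF']])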
I have a dataframe of tick data, which I have resampled into minute data. Doing a vanilla
df.resample('1Min').ohlc().fillna(method='ffill')
is super easy.
I now need to iterate over that resampled dataframe one day at a time, but I can't figure out the best way to do it.
I've tried taking my 1-minute resampled dataframe, resampling that to "1D", and converting the result to a list to iterate over and filter, but that gives me a list of:
Timestamp('2011-09-13 00:00:00', freq='D')
objects, and it won't let me slice the dataframe based on that.
This seems like it would be something easy, but I just can't find the answer. Thanks.
# sample data_1m dataframe
data_1m.head()
open high low close
timestamp
2011-09-13 13:53:00 5.8 6.0 5.8 6.0
2011-09-13 13:54:00 5.8 6.0 5.8 6.0
2011-09-13 13:55:00 5.8 6.0 5.8 6.0
2011-09-13 13:56:00 5.8 6.0 5.8 6.0
2011-09-13 13:57:00 5.8 6.0 5.8 6.0
...
# I want to get everything for date 2011-09-13; I'm trying:
days_in_df = data_1m.resample('1D').ohlc().fillna(method='ffill').index.to_list()
data_1m.loc[days_in_df[0]]
KeyError: Timestamp('2011-09-13 00:00:00', freq='D')
Here's my two cents. I don't resample the data so much as add another index level to the frame:
data_1m = data_1m.reset_index()
data_1m['date'] = data_1m['timestamp'].dt.normalize()  # midnight timestamps that act as a date level
data_1m = data_1m.set_index(['date', 'timestamp'])
And to select an entire day:
data_1m.loc['2011-09-13']
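Since the original goal was to iterate one day at a time, a small follow-up sketch (my addition, not part of the answer) using the 'date' level built above:
# iterate over the frame one calendar day at a time
for day, day_df in data_1m.groupby(level='date'):
    print(day, len(day_df))  # day_df holds all the 1-minute rows for that date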
I am manipulating a data frame using Pandas in Python to match a specific format.
I currently have a data frame with a row for each measurement location (A or B). Each row has a nominal target and multiple measured data points.
This is the format I currently have:
df=
Location Nominal Meas1 Meas2 Meas3
A 4.0 3.8 4.1 4.3
B 9.0 8.7 8.9 9.1
I need to manipulate this data so there is only one measured data point per row, and copy the Location and Nominal values from the source rows to the new rows. The measured data also needs to be put in the first column.
This is the format I need:
df =
Meas Location Nominal
3.8 A 4.0
4.1 A 4.0
4.3 A 4.0
8.7 B 9.0
8.9 B 9.0
9.1 B 9.0
I have tried concat and append functions with and without transpose() with no success.
This is the most similar example I was able to find, but it did not get me there:
for index, row in df.iterrows():
    pd.concat([row] * 3, ignore_index=True)
Thank you!
It's a wide-to-long problem:
pd.wide_to_long(df, 'Meas', i=['Location', 'Nominal'], j='drop').reset_index().drop('drop', axis=1)
Out[637]:
Location Nominal Meas
0 A 4.0 3.8
1 A 4.0 4.1
2 A 4.0 4.3
3 B 9.0 8.7
4 B 9.0 8.9
5 B 9.0 9.1
Another solution, using melt:
new_df = (df.melt(['Location', 'Nominal'],
                  ['Meas1', 'Meas2', 'Meas3'],
                  value_name='Meas')
            .drop('variable', axis=1)
            .sort_values('Location'))
>>> new_df
Location Nominal Meas
0 A 4.0 3.8
2 A 4.0 4.1
4 A 4.0 4.3
1 B 9.0 8.7
3 B 9.0 8.9
5 B 9.0 9.1
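The question also asks for the measured data in the first column; if that ordering matters, a quick reorder (my addition, not part of either answer) takes care of it:
new_df = new_df[['Meas', 'Location', 'Nominal']].reset_index(drop=True)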