I have a dataframe with OHLC data. I need to get the close price into a pandas Series, using the timestamp column as the index.
I am reading from a SQLite db into my df:
conn = sql.connect('allStockData.db')
price = pd.read_sql_query("SELECT * from ohlc_minutes", conn)
price['timestamp'] = pd.to_datetime(price['timestamp'])
print(price)
Which returns:
timestamp open high low close volume trade_count vwap symbol volume_10_day
0 2022-09-16 08:00:00+00:00 3.19 3.570 3.19 3.350 66475 458 3.404240 AAOI NaN
1 2022-09-16 08:05:00+00:00 3.35 3.440 3.33 3.430 28925 298 3.381131 AAOI NaN
2 2022-09-16 08:10:00+00:00 3.44 3.520 3.35 3.400 62901 643 3.445096 AAOI NaN
3 2022-09-16 08:15:00+00:00 3.37 3.390 3.31 3.360 17943 184 3.339721 AAOI NaN
4 2022-09-16 08:20:00+00:00 3.36 3.410 3.34 3.400 29123 204 3.383370 AAOI NaN
... ... ... ... ... ... ... ... ... ... ...
8759 2022-09-08 23:35:00+00:00 1.35 1.360 1.35 1.355 3835 10 1.350613 RUBY 515994.5
8760 2022-09-08 23:40:00+00:00 1.36 1.360 1.35 1.350 2780 7 1.353687 RUBY 515994.5
8761 2022-09-08 23:45:00+00:00 1.35 1.355 1.35 1.355 7080 11 1.350424 RUBY 515994.5
8762 2022-09-08 23:50:00+00:00 1.35 1.360 1.33 1.360 11664 30 1.351104 RUBY 515994.5
8763 2022-09-08 23:55:00+00:00 1.36 1.360 1.33 1.340 21394 32 1.348223 RUBY 515994.5
[8764 rows x 10 columns]
When I try to get the close into a series with the timestamp:
price = pd.Series(price['close'], index=price['timestamp'])
It returns a bunch of NaNs:
2022-09-16 08:00:00+00:00 NaN
2022-09-16 08:05:00+00:00 NaN
2022-09-16 08:10:00+00:00 NaN
2022-09-16 08:15:00+00:00 NaN
2022-09-16 08:20:00+00:00 NaN
..
2022-09-08 23:35:00+00:00 NaN
2022-09-08 23:40:00+00:00 NaN
2022-09-08 23:45:00+00:00 NaN
2022-09-08 23:50:00+00:00 NaN
2022-09-08 23:55:00+00:00 NaN
Name: close, Length: 8764, dtype: float64
If I remove the index:
price = pd.Series(price['close'])
The close is returned normally:
0 3.350
1 3.430
2 3.400
3 3.360
4 3.400
...
8759 1.355
8760 1.350
8761 1.355
8762 1.360
8763 1.340
Name: close, Length: 8764, dtype: float64
How can I return the close column as a pandas series, using my timestamp column as the index?
It's because price['close'] has its own index, which is incompatible with the timestamps. Try using .values instead:
price = pd.Series(price['close'].values, index=price['timestamp'])
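The NaNs come from index alignment: when an existing Series is passed as the data argument along with a new index, pandas reindexes it by label rather than repositioning the values, and the RangeIndex labels 0..n-1 never match any timestamp. A minimal sketch with made-up two-row data:

import pandas as pd

close = pd.Series([3.35, 3.43])  # carries the default RangeIndex 0, 1
ts = pd.to_datetime(['2022-09-16 08:00', '2022-09-16 08:05'])

print(pd.Series(close, index=ts))         # all NaN: no overlapping labels
print(pd.Series(close.values, index=ts))  # 3.35, 3.43: a bare array skips alignment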
I needed to set the timestamp as the index before getting the close as a series:
conn = sql.connect('allStockData.db')
price = pd.read_sql_query("SELECT * from ohlc_minutes", conn)
price['timestamp'] = pd.to_datetime(price['timestamp'])
price = price.set_index('timestamp')
print(price)
price = pd.Series(price['close'])
print(price)
Gives:
2022-09-16 08:00:00+00:00 3.350
2022-09-16 08:05:00+00:00 3.430
2022-09-16 08:10:00+00:00 3.400
2022-09-16 08:15:00+00:00 3.360
2022-09-16 08:20:00+00:00 3.400
...
2022-09-08 23:35:00+00:00 1.355
2022-09-08 23:40:00+00:00 1.350
2022-09-08 23:45:00+00:00 1.355
2022-09-08 23:50:00+00:00 1.360
2022-09-08 23:55:00+00:00 1.340
Name: close, Length: 8764, dtype: float64
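For what it's worth, read_sql_query can parse the timestamps and set the index in one step; a compact sketch of the same result (same table and connection assumed):

conn = sql.connect('allStockData.db')
close = pd.read_sql_query(
    "SELECT timestamp, close FROM ohlc_minutes", conn,
    parse_dates=['timestamp'], index_col='timestamp',
)['close']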
I have a dataframe like so:
index date symbol stock_id open high low close volume vwap
0 0 2021-10-11 BVN 13 7.69 7.98 7.5600 7.61 879710 7.782174
1 1 2021-10-12 BVN 13 7.67 8.08 7.5803 8.02 794436 7.967061
2 2 2021-10-13 BVN 13 8.12 8.36 8.0900 8.16 716012 8.231286
3 3 2021-10-14 BVN 13 8.26 8.29 8.0500 8.28 586091 8.185899
4 4 2021-10-15 BVN 13 8.18 8.44 8.0600 8.44 1278409 8.284539
... ... ... ... ... ... ... ... ... ... ...
227774 227774 2022-10-04 ERIC 11000 6.27 6.32 6.2400 6.29 14655189 6.280157
227775 227775 2022-10-05 ERIC 11000 6.17 6.31 6.1500 6.29 10569193 6.219965
227776 227776 2022-10-06 ERIC 11000 6.20 6.25 6.1800 6.22 7918812 6.217198
227777 227777 2022-10-07 ERIC 11000 6.17 6.19 6.0800 6.10 9671252 6.135976
227778 227778 2022-10-10 ERIC 11000 6.13 6.15 6.0200 6.04 6310661 6.066256
[227779 rows x 10 columns]
And then a function that returns a boolean mask indicating whether or not the df is consolidating inside a range:
def is_consolidating(df, window=2, minp=2, percentage=0.95):
    rolling_min = pd.Series(df['close']).rolling(window=window, min_periods=minp).min()
    rolling_max = pd.Series(df['close']).rolling(window=window, min_periods=minp).max()
    consolidation = np.where((rolling_min / rolling_max) >= percentage, True, False)
    return consolidation
Which I then call like:
df['t'] = df.groupby("stock_id").apply(is_consolidating)
The problem is that when I print the df I get NaN for the values of my new column:
dan@danalgo:~/Documents/code/wolfhound$ python3 add_indicators_daily.py
index date symbol stock_id open high low close volume vwap t
0 0 2021-10-11 BVN 13 7.69 7.98 7.5600 7.61 879710 7.782174 NaN
1 1 2021-10-12 BVN 13 7.67 8.08 7.5803 8.02 794436 7.967061 NaN
2 2 2021-10-13 BVN 13 8.12 8.36 8.0900 8.16 716012 8.231286 NaN
3 3 2021-10-14 BVN 13 8.26 8.29 8.0500 8.28 586091 8.185899 NaN
4 4 2021-10-15 BVN 13 8.18 8.44 8.0600 8.44 1278409 8.284539 NaN
... ... ... ... ... ... ... ... ... ... ... ...
227774 227774 2022-10-04 ERIC 11000 6.27 6.32 6.2400 6.29 14655189 6.280157 NaN
227775 227775 2022-10-05 ERIC 11000 6.17 6.31 6.1500 6.29 10569193 6.219965 NaN
227776 227776 2022-10-06 ERIC 11000 6.20 6.25 6.1800 6.22 7918812 6.217198 NaN
227777 227777 2022-10-07 ERIC 11000 6.17 6.19 6.0800 6.10 9671252 6.135976 NaN
227778 227778 2022-10-10 ERIC 11000 6.13 6.15 6.0200 6.04 6310661 6.066256 NaN
[227779 rows x 11 columns]
Full code:
import pandas as pd
from IPython.display import display
import sqlite3 as sql
import numpy as np
conn = sql.connect('allStockData.db')
# get everything inside daily_ohlc and add to a dataframe
df = pd.read_sql_query("SELECT * from daily_ohlc_init", conn)
def is_consolidating(df, window=2, minp=2, percentage=0.95):
    rolling_min = pd.Series(df['close']).rolling(window=window, min_periods=minp).min()
    rolling_max = pd.Series(df['close']).rolling(window=window, min_periods=minp).max()
    consolidation = np.where((rolling_min / rolling_max) >= percentage, True, False)
    return consolidation
df['t'] = df.groupby("stock_id").apply(is_consolidating)
print(df)
df.to_sql('daily_ohlc_init_with_indicators', if_exists='replace', con=conn, index=True)
You could do it like this:
def is_consolidating(grp, window=2, minp=2, percentage=0.95):
    rolling_min = pd.Series(grp).rolling(window=window, min_periods=minp).min()
    rolling_max = pd.Series(grp).rolling(window=window, min_periods=minp).max()
    consolidation = np.where((rolling_min / rolling_max) >= percentage, True, False)
    return pd.Series(consolidation, index=grp.index)
df['t'] = df.groupby("stock_id")['close'].apply(is_consolidating)
print(df)
Output (part of it):
volume vwap t
0 879710 7.782174 False
1 794436 7.967061 False
2 716012 8.231286 True
3 586091 8.185899 True
4 1278409 8.284539 True
227774 14655189 6.280157 False
227775 10569193 6.219965 True
227776 7918812 6.217198 True
227777 9671252 6.135976 True
227778 6310661 6.066256 True
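The original version failed because apply on the whole group frame returned one bare array per stock_id; the result was indexed by stock_id, which cannot align with the frame's row index, hence the NaN column. Returning a Series aligned to grp.index fixes that. As an alternative sketch, transform guarantees output aligned to the original rows (NaN >= 0.95 evaluates to False, matching the np.where version):

df['t'] = (
    df.groupby('stock_id')['close']
      .transform(lambda s: s.rolling(window=2, min_periods=2).min()
                 / s.rolling(window=2, min_periods=2).max())
    >= 0.95
)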
How do I compute the returns for the following dataframe? Let the name of the dataframe be refined_df:
0 1
Date
2020-02-03 TSLA MSFT
2020-02-19 AMZN ADBE
2020-03-05 OYST GPRO
2020-03-20 AMZN OYST
2020-04-06 SGEN AEYE
2020-04-22 AEYE TSLA
2020-05-07 AAPL SGEN
and we also have another dataframe, storage_openprices
AAL AAPL ADBE AEYE AMZN GOOG GPRO MSFT OYST PACB RADI SGEN TSLA
Date
2020-01-14 27.910000 79.175003 347.010010 5.300000 1885.880005 1439.010010 4.230 163.389999 28.010000 4.850000 NaN 104.849998 108.851997
2020-01-15 27.450001 77.962502 346.420013 5.020000 1872.250000 1430.209961 4.160 162.619995 26.510000 4.800000 NaN 108.550003 105.952003
2020-01-16 27.790001 78.397499 345.980011 5.060000 1882.989990 1447.439941 4.280 164.350006 25.530001 4.930000 NaN 107.330002 98.750000
2020-01-17 28.299999 79.067497 349.000000 4.840000 1885.890015 1462.910034 4.360 167.419998 24.740000 5.030000 NaN 108.410004 101.522003
2020-01-21 27.969999 79.297501 346.369995 4.880000 1865.000000 1479.119995 4.280 166.679993 26.190001 4.950000 NaN 108.379997 106.050003
What I want is to return a new dataframe with the log return of each held stock over each holding period.
For example, the (0,0) entry of the returned dataframe is the log return for holding TSLA from 2020-02-03 till 2020-02-19, using the open prices for TSLA from storage_openprices.
Similarly, for the (1,0) entry we return the log return for holding AMZN from 2020-02-19 till 2020-03-05.
I am unsure whether I should be using apply with a lambda function. My issue is referencing the next row when computing the log returns.
EDIT:
The output, a dataframe should look like
0 1
Date
2020-02-03 0.14 0.21
2020-02-19 0.18 0.19
2020-03-05 XXXX XXXX
2020-03-20 XXXX XXXX
2020-04-06 XXXX XXXX
2020-04-22 XXXX XXXX
2020-05-07 XXXX XXXX
where 0.14 (a made-up number) is the log return of TSLA from 2020-02-03 to 2020-02-19, i.e. log(TSLA open price on 2020-02-19) - log(TSLA open price on 2020-02-03)
Thanks!
You can use merge_asof with the direction='forward' parameter, after reshaping both DataFrames with DataFrame.stack:
print (refined_df)
0 1
Date
2020-02-03 TSLA MSFT
2020-02-19 AMZN ADBE
2020-03-05 OYST GPRO
2020-03-20 AMZN OYST
2020-04-06 SGEN AEYE
2020-04-22 AEYE TSLA
2020-05-07 AAPL SGEN
#changed datetimes so the samples match
print (storage_openprices)
AAL AAPL ADBE AEYE AMZN GOOG \
Date
2020-02-14 27.910000 79.175003 347.010010 5.30 1885.880005 1439.010010
2020-02-15 27.450001 77.962502 346.420013 5.02 1872.250000 1430.209961
2020-02-16 27.790001 78.397499 345.980011 5.06 1882.989990 1447.439941
2020-02-17 28.299999 79.067497 349.000000 4.84 1885.890015 1462.910034
2020-02-21 27.969999 79.297501 346.369995 4.88 1865.000000 1479.119995
GPRO MSFT OYST PACB RADI SGEN TSLA
Date
2020-02-14 4.23 163.389999 28.010000 4.85 NaN 104.849998 108.851997
2020-02-15 4.16 162.619995 26.510000 4.80 NaN 108.550003 105.952003
2020-02-16 4.28 164.350006 25.530001 4.93 NaN 107.330002 98.750000
2020-02-17 4.36 167.419998 24.740000 5.03 NaN 108.410004 101.522003
2020-02-21 4.28 166.679993 26.190001 4.95 NaN 108.379997 106.050003
df1 = storage_openprices.stack().rename_axis(['Date','type']).reset_index(name='new')
df2 = refined_df.stack().rename_axis(['Date','cols']).reset_index(name='type')
new = (pd.merge_asof(df2, df1, on='Date', by='type', direction='forward')
         .pivot(index='Date', columns='cols', values='new'))
print (new)
cols 0 1
Date
2020-02-03 108.851997 163.389999
2020-02-19 1865.000000 346.369995
2020-03-05 NaN NaN
2020-03-20 NaN NaN
2020-04-06 NaN NaN
2020-04-22 NaN NaN
2020-05-07 NaN NaN
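To go from these entry prices to the requested log returns, each held ticker also needs a price at the next rebalance date. A hedged sketch (untested against the real data; it assumes storage_openprices has a sorted DatetimeIndex with a trading day on or after every rebalance date):

import numpy as np

def open_on_or_after(date, ticker):
    # first available open price on or after `date`
    pos = storage_openprices.index.searchsorted(date)
    return storage_openprices[ticker].iloc[pos]

dates = refined_df.index
ret = pd.DataFrame(index=dates, columns=refined_df.columns, dtype=float)
for i in range(len(dates) - 1):          # the last row has no exit date
    for c in refined_df.columns:
        t = refined_df.loc[dates[i], c]  # ticker held over this period
        entry = open_on_or_after(dates[i], t)
        exit_ = open_on_or_after(dates[i + 1], t)
        ret.loc[dates[i], c] = np.log(exit_ / entry)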
I have the following dataframe:
Timestamp userid Prices_USD
0 2016-12-01 6.270941895 1.08
1 2016-12-01 6.609813209 1.12
2 2016-12-01 6.632094115 9.70
3 2016-12-01 6.655789772 1.08
4 2016-12-01 6.764640751 9.33
... ... ... ...
1183 2017-03-27 6.529604089 1.08
1184 2017-03-27 6.682639674 6.72
1185 2017-03-27 6.773815105 10.0
I want to calculate, for each unique userid, their monthly spending.
I've tried the following:
sales_per_user.set_index('Timestamp',inplace=True)
sales_per_user.index = pd.to_datetime(sales_per_user.index)
m = sales_per_user.index.month
monthly_avg = sales_per_user.groupby(['userid', m]).Prices_USD.mean().to_frame()
But the resulting dataframe is this:
userid Timestamp Prices_USD
3.43964843 12 10.91
3.885813375 1 10.91
2 10.91
12 21.82
However, the Timestamp column doesn't come out in the desired format. Ideally I would like:
userid Timestamp Prices_USD
3.43964843 2016-12 10.91
3.885813375 2017-01 10.91
2017-02 10.91
2017-12 21.82
How do I fix that?
Try:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
res = df.groupby([df['userid'], df['Timestamp'].dt.to_period('M')])['Prices_USD'].sum()
print(res)
Output
userid Timestamp
6.270942 2016-12 1.08
6.529604 2017-03 1.08
6.609813 2016-12 1.12
6.632094 2016-12 9.70
6.655790 2016-12 1.08
6.682640 2017-03 6.72
6.764641 2016-12 9.33
6.773815 2017-03 10.00
Name: Prices_USD, dtype: float64
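If a flat table (or one row per user) is preferred over the MultiIndexed Series, a short follow-up using the names above:

monthly = res.reset_index()      # columns: userid, Timestamp, Prices_USD
wide = res.unstack('Timestamp')  # one row per userid, one column per month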
I would like to create a new DataFrame and add a bunch of stock data to it for each date.
Declaring a DataFrame with a multi-index - date and stock ticker.
Adding data for 2020-06-07
date stock open high low close
2020-06-07 AAPL 33.50 34.20 32.1 33.30
2020-06-07 MSFT 53.50 54.20 32.1 53.30
Adding data for 2020-06-08
date stock open high low close
2020-06-07 AAPL 33.50 34.20 32.1 33.30
2020-06-07 MSFT 53.50 54.20 32.1 53.30
2020-06-08 AAPL 32.50 34.20 31.1 32.30
2020-06-08 MSFT 58.50 59.20 52.1 53.30
What would be the best and most efficient solution?
Here's my current version that doesn't work as I expect.
df = pd.DataFrame()
for date in dates:
    universe500 = get_universe(date)  # returns stocks on a specific date
    for security in universe500:
        prices = data.get_prices(security, ['open','high','low','close'], 1, '1d')  # returns pd.DataFrame
        df.iloc[(date, security),:] = prices
If prices is a DataFrame formatted in the same manner as the original df, you can use concat:
In[0]:
# constructing a fake entry
arrays = [['2020-06-09'], ['ABCD']]
multi = pd.MultiIndex.from_arrays(arrays, names=('date', 'stock'))
to_add = pd.DataFrame({'open':1, 'high':2, 'low':3, 'close':4},index=multi)
print(to_add)
Out[0]:
open high low close
date stock
2020-06-09 ABCD 1 2 3 4
In[1]:
#now adding to your data
df = pd.concat([df, to_add])
print(df)
Out[1]:
open high low close
date stock
2020-06-07 AAPL 33.5 34.2 32.1 33.3
MSFT 53.5 54.2 32.1 53.3
2020-06-08 AAPL 32.5 34.2 31.1 32.3
MSFT 58.5 59.2 52.1 53.3
2020-06-09 ABCD 1.0 2.0 3.0 4.0
If the data (prices) were just an array of 4 numbers (the open, high, low, and close values), then loc would work in place of the iloc you used:
In[2]:
df.loc[('2020-06-10','WXYZ'),:] = [10,20,30,40]
Out[2]:
open high low close
date stock
2020-06-07 AAPL 33.5 34.2 32.1 33.3
MSFT 53.5 54.2 32.1 53.3
2020-06-08 AAPL 32.5 34.2 31.1 32.3
MSFT 58.5 59.2 52.1 53.3
2020-06-09 ABCD 1.0 2.0 3.0 4.0
2020-06-10 WXYZ 10.0 20.0 30.0 40.0
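One efficiency note: pd.concat (like the old append) copies the whole frame on every call, so growing a frame inside a loop is quadratic. A common pattern is to collect the pieces in a list and concatenate once; a sketch reusing the question's own placeholders (get_universe, data.get_prices), assuming each get_prices call returns a single-row DataFrame:

frames = []
for date in dates:
    for security in get_universe(date):
        prices = data.get_prices(security, ['open', 'high', 'low', 'close'], 1, '1d')
        prices.index = pd.MultiIndex.from_tuples([(date, security)],
                                                 names=('date', 'stock'))
        frames.append(prices)
df = pd.concat(frames)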
I have this data in a pandas dataframe,
name date close quantity daily_cumm_returns
0 AARTIIND 2000-01-03 3.84 21885.82 0.000000
1 AARTIIND 2000-01-04 3.60 56645.64 -0.062500
2 AARTIIND 2000-01-05 3.52 24460.62 -0.083333
3 AARTIIND 2000-01-06 3.58 42484.24 -0.067708
4 AARTIIND 2000-01-07 3.42 16736.21 -0.109375
5 AARTIIND 2000-01-10 3.42 20598.42 -0.109375
6 AARTIIND 2000-01-11 3.41 20598.42 -0.111979
7 AARTIIND 2000-01-12 3.27 100417.29 -0.148438
8 AARTIIND 2000-01-13 3.43 20598.42 -0.106771
9 AARTIIND 2000-01-14 3.60 5149.61 -0.062500
10 AARTIIND 2000-01-17 3.46 14161.42 -0.098958
11 AARTIIND 2000-01-18 3.50 136464.53 -0.088542
12 AARTIIND 2000-01-19 3.52 21885.82 -0.083333
13 AARTIIND 2000-01-20 3.73 75956.66 -0.028646
14 AARTIIND 2000-01-21 3.84 77244.07 0.000000
15 AARTIIND 2000-02-01 4.21 90118.08 0.000000
16 AARTIIND 2000-02-02 4.52 238169.21 0.073634
17 AARTIIND 2000-02-03 4.38 163499.94 0.040380
18 AARTIIND 2000-02-04 4.44 108141.71 0.054632
19 AARTIIND 2000-02-07 4.26 68232.27 0.011876
20 AARTIIND 2000-02-08 4.00 108141.71 -0.049881
21 AARTIIND 2000-02-09 3.96 32185.04 -0.059382
22 AARTIIND 2000-02-10 4.13 43771.63 -0.019002
23 AARTIIND 2000-02-11 3.96 3862.20 -0.059382
24 AARTIIND 2000-02-14 3.94 12874.01 -0.064133
25 AARTIIND 2000-02-15 3.90 33472.42 -0.073634
26 AARTIIND 2000-02-16 3.90 25748.02 -0.073634
27 AARTIIND 2000-02-17 3.90 60507.86 -0.073634
28 AARTIIND 2000-02-18 4.22 45059.04 0.002375
29 AARTIIND 2000-02-21 4.42 81106.27 0.049881
I wish to select each month's data and transpose it into a new row.
For example, the first 15 rows should become one row with name AARTIIND, date 2000-01-03, and then 15 columns holding the daily cumulative returns.
name date first second third fourth fifth .... fifteenth
0 AARTIIND 2000-01-03 0.00 -0.062 -0.083 -0.067 -0.109 .... 0.00
To group the data month-wise I am using:
group = df.groupby([pd.Grouper(freq='1M', key='date'), 'name'])
Setting the rows individually with the code below is very slow, and my dataset has 1 million rows:
data = pd.DataFrame(columns = ('name', 'date', 'daily_zscore_1', 'daily_zscore_2', 'daily_zscore_3', 'daily_zscore_4', 'daily_zscore_5', 'daily_zscore_6', 'daily_zscore_7', 'daily_zscore_8', 'daily_zscore_9', 'daily_zscore_10', 'daily_zscore_11', 'daily_zscore_12', 'daily_zscore_13', 'daily_zscore_14', 'daily_zscore_15'))
data.loc[0] = [x['name'].iloc[0], x['date'].iloc[0]] + list(x['daily_cumm_returns'])
Is there any faster way to accomplish this? As I see it, this is just transposing one column and hence should be very fast. I tried pivot and melt but don't understand how to use them in this situation.
This is a bit sloppy but it gets the job done.
# grab AAPL data
from pandas_datareader import data
df = data.DataReader('AAPL', 'google', start='2014-01-01')[['Close', 'Volume']]
# add name column
df['name'] = 'AAPL'
# get daily return relative to first of month
df['daily_cumm_return'] = df.resample('M')['Close'].transform(lambda x: (x - x.iloc[0]) / x.iloc[0])
# get the first of the month for each date
df['first_month_date'] = df.assign(index_col=df.index).resample('M')['index_col'].transform('first')
# get a ranking of the days 1 to n
df['day_rank']= df.resample('M')['first_month_date'].rank(method='first')
# pivot to get final
df_final = df.pivot_table(index=['name', 'first_month_date'], columns='day_rank', values='daily_cumm_return')
Sample Output
day_rank 1.0 2.0 3.0 4.0 5.0 6.0 \
name first_month_date
AAPL 2014-01-02 0.0 -0.022020 -0.016705 -0.023665 -0.017464 -0.029992
2014-02-03 0.0 0.014375 0.022052 0.021912 0.036148 0.054710
2014-03-03 0.0 0.006632 0.008754 0.005704 0.005173 0.006102
2014-04-01 0.0 0.001680 -0.005299 -0.018222 -0.033600 -0.033600
2014-05-01 0.0 0.001775 0.015976 0.004970 0.001420 -0.005917
2014-06-02 0.0 0.014141 0.025721 0.029729 0.026834 0.043314
2014-07-01 0.0 -0.000428 0.005453 0.026198 0.019568 0.019996
day_rank 7.0 8.0 9.0 10.0 11.0 \
name first_month_date
AAPL 2014-01-02 -0.036573 -0.031511 -0.012149 0.007593 0.002025
2014-02-03 0.068667 0.068528 0.085555 0.084578 0.088625
2014-03-03 0.015785 0.016846 0.005571 -0.005704 -0.001857
2014-04-01 -0.020936 -0.033600 -0.040708 -0.036831 -0.043810
2014-05-01 -0.010059 0.002249 0.003787 0.004024 -0.004497
2014-06-02 0.049438 0.045095 0.027614 0.016368 0.026612
2014-07-01 0.016253 0.018178 0.031330 0.019247 0.013473
day_rank 12.0 13.0 14.0 15.0 16.0 \
name first_month_date
AAPL 2014-01-02 -0.022526 -0.007340 -0.002911 0.005442 -0.012782
2014-02-03 0.071458 0.059037 0.047313 0.051779 0.040893
2014-03-03 0.006897 0.006632 0.001857 0.009683 0.021754
2014-04-01 -0.041871 -0.030887 -0.019385 -0.018351 -0.031274
2014-05-01 0.010178 0.022130 0.022367 0.025089 0.026627
2014-06-02 0.025276 0.026389 0.022826 0.012248 0.011357
2014-07-01 -0.004598 0.009731 0.004491 0.012831 0.039243
day_rank 17.0 18.0 19.0 20.0 21.0 \
name first_month_date
AAPL 2014-01-02 -0.004809 -0.084282 -0.094660 -0.096431 -0.095039
2014-02-03 0.031542 0.052059 0.049267 NaN NaN
2014-03-03 0.032763 0.022815 0.018437 0.017244 0.017111
2014-04-01 0.048204 0.055958 0.096795 0.093564 0.089429
2014-05-01 0.038225 0.057751 0.054911 0.074201 0.070178
2014-06-02 0.005233 0.006124 0.012137 0.024162 0.034740
2014-07-01 0.037532 0.044376 0.058811 0.051967 0.049508
day_rank 22.0 23.0
name first_month_date
AAPL 2014-01-02 NaN NaN
2014-02-03 NaN NaN
2014-03-03 NaN NaN
2014-04-01 NaN NaN
2014-05-01 NaN NaN
2014-06-02 NaN NaN
2014-07-01 0.022241 NaN
Admittedly this does not get exactly what you want...
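As a small aside, the day_rank step can be sketched more directly with a per-month cumulative count, avoiding the first_month_date/rank detour:

# same idea as rank(method='first'), grouped by calendar month on the index
df['day_rank'] = df.groupby(pd.Grouper(freq='M')).cumcount() + 1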
I think one way to handle this problem would be to create new columns of month and day based on the datetime (date) column, then set a multiindex on the month and name, then pivot the table.
df['month'] = df.date.dt.month
df['day'] = df.date.dt.day
df.set_index(['month', 'name'], inplace=True)
df[['day', 'daily_cumm_returns']].pivot(index=df.index, columns='day')
Result is:
daily_cumm_returns \
day 1 2 3 4 5
month name
1 AARTIIND NaN NaN 0.00000 -0.062500 -0.083333
2 AARTIIND 0.0 0.073634 0.04038 0.054632 NaN
I can't figure out a way to keep the first date of each month group as a column, otherwise I think this is more or less what you're after.
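For reference, a sketch closer to the asker's frame that also keeps the first date of each month (assuming date is already datetime and rows are sorted by name and date):

df['month'] = df['date'].dt.to_period('M')
df['day_rank'] = df.groupby(['name', 'month']).cumcount() + 1
wide = df.pivot_table(index=['name', 'month'], columns='day_rank',
                      values='daily_cumm_returns')
# re-attach the first trading date of each (name, month) group
wide.insert(0, 'date', df.groupby(['name', 'month'])['date'].first())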