How can I use the Apply() function for this problem? - python

I have a df with random portfolios; they are shown as follows:
>>> random_portafolios
AAPL weight MSFT weight XOM weight JNJ weight JPM weight AMZN weight GE weight FB weight T weight
0 0.188478 0.068795 0.141632 0.147974 0.178185 0.040370 0.020516 0.047275 0.166774
1 0.236818 0.008540 0.082680 0.088380 0.453573 0.021001 0.014043 0.089811 0.005155
2 0.179750 0.071711 0.050107 0.089424 0.080108 0.106136 0.155139 0.073487 0.194138
3 0.214392 0.015681 0.034284 0.276342 0.118263 0.002101 0.057484 0.000317 0.281137
4 0.301469 0.099750 0.046454 0.093279 0.020095 0.073545 0.178752 0.146486 0.040168
5 0.132916 0.006199 0.305137 0.032262 0.090356 0.169671 0.205602 0.003686 0.054172
Each random portfolio has different weights.
I also have one year of daily returns for these stocks:
>>> StockReturns.head()
AAPL MSFT XOM TWTR JPM AMZN GE FB T
Date
2017-01-04 -0.001164 -0.004356 -0.011069 0.025547 0.001838 0.004657 0.000355 0.015660 -0.005874
2017-01-05 0.005108 0.000000 -0.014883 0.013642 -0.009174 0.030732 -0.005674 0.016682 -0.002686
2017-01-06 0.011146 0.008582 -0.000499 0.004681 0.000123 0.019912 0.002853 0.022707 -0.019930
2017-01-09 0.009171 -0.003170 -0.016490 0.019220 0.000741 0.001168 -0.004979 0.012074 -0.012641
2017-01-10 0.001049 -0.000335 -0.012829 -0.007429 0.002837 -0.001280 -0.002859 -0.004404 0.000278
Now I want to add two columns, "Returns" and "Volatility", to that df. The return and volatility must be computed from the weights in each row, and this has to be done for every row.
I tried to use the apply() function:
def complex_computation():
    WeightedReturns = StockReturns.mul(arr, axis=1)
    ReturnsDaily = WeightedReturns.sum(axis=1)
    mean_retorns_daily = np.mean(ReturnsDaily)
    Returns = ((1 + mean_retorns_daily) ** 252)
    cov_mat = StockReturns.cov()
    cov_mat_annual = cov_mat * 252
    Volatility = np.sqrt(np.dot(arr.T, np.dot(cov_mat_annual, arr)))
    return Returns, Volatility
What I'm looking for is for this result to appear in the row it corresponds to (row 0, in this case), and to do the same for each of the following rows. When I ran complex_computation(), Python said that name 'Returns' is not defined.
After that, I tried:
def func(row):
    random_portafolios['Volatility'].append(Volatility)
    Returns, Volatility = complex_computation(row.values)
    return pd.Series({'NewColumn1': Retturns,
                      'NewColumn2': Volatility})
and
def run_apply(random_portafolios):
    df_result = random_portafolios.apply(func, axis=1)
    return df_result
How could I run my code correctly?
Help me!
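For what it's worth, here is a minimal, untested sketch of one way this is usually wired up with apply(), assuming the weight columns of random_portafolios are in the same order as the columns of StockReturns (note that the samples above show JNJ weights but TWTR returns, so the columns would need to be reconciled first); the annualization formula simply mirrors the one in the question:
import numpy as np
import pandas as pd

# annualized covariance of the daily returns, computed once and reused for every row
cov_mat_annual = StockReturns.cov() * 252

def portfolio_stats(weights):
    # weights: one row of portfolio weights as a NumPy array
    daily = StockReturns.mul(weights, axis=1).sum(axis=1)  # daily portfolio returns
    annual_return = (1 + daily.mean()) ** 252              # same formula as in the question
    volatility = np.sqrt(weights @ cov_mat_annual.values @ weights)
    return pd.Series({'Returns': annual_return, 'Volatility': volatility})

stats = random_portafolios.apply(lambda row: portfolio_stats(row.values), axis=1)
random_portafolios['Returns'] = stats['Returns']
random_portafolios['Volatility'] = stats['Volatility']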

Related

Python Yfinance - Can't get SPY history

I'm following a tutorial on using Yfinance in Jupyter Notebook to get prices for SPY (S&P 500) in a dataframe. The code looks simple, but I can't seem to get the desired results.
df_tickers = pd.DataFrame()
spyticker = yf.Ticker("SPY")
print(spyticker)
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
The error states: "SPY: No data found for this date range, symbol may be delisted." But when I print spyticker, I get the correct yfinance object:
yfinance.Ticker object <SPY>
I am not sure what your problem is but if I use the following:
spyticker = yf.Ticker("SPY")
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
I get the following:
Open High Low Close Volume Dividends Stock Splits
Date
1998-12-01 76.02 77.27 75.43 77.00 8950600 0.0 0
1998-12-02 76.74 77.19 75.94 76.78 7495500 0.0 0
1998-12-03 76.76 77.45 75.35 75.51 12145300 0.0 0
1998-12-04 76.35 77.58 76.27 77.49 10339500 0.0 0
1998-12-07 77.29 78.21 77.25 77.86 4290000 0.0 0
My only explanation is that the call to spyticker.history already returns a dataframe, so it isn't necessary to define the df_ticker beforehand.
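If the date-bounded call keeps returning the "No data found" message, a quick sanity check (just a debugging sketch, not a confirmed fix) is to request the full history without start/end; if that also comes back empty, the issue is more likely the installed yfinance version or Yahoo's availability than the date arguments:
import yfinance as yf

spyticker = yf.Ticker("SPY")
# request everything available, without an explicit date range
df_all = spyticker.history(period="max", auto_adjust=True, rounding=True)
print(df_all.index.min(), df_all.index.max())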

Iterate and save each stock's historical data in a dataframe without downloading to CSV

I would like to pull historical data from yfinance for a specific list of stocks. I want to store each stock in a separate dataframe (each stock with its own df).
I can download the data to multiple CSVs with the code below, but I couldn't find a way to store each stock in its own dataframe (without having to download them to CSV).
import yfinance
stocks = ['TSLA','MSFT','NIO','AAPL','AMD','ADBE','ALGN','AMZN','AMGN','AEP','ADI','ANSS','AMAT','ASML','TEAM','ADSK']
for i in stocks:
    df = yfinance.download(i, start='2015-01-01', end='2021-09-12')
    df.to_csv(i + '.csv')
I want my end result to be a dataframe called "TSLA" holding the TSLA historical data, another one called "MSFT" for the MSFT data, and so on.
I tried:
stock = ['TSLA','MSFT','NIO','AAPL','AMD']
df_ = {}
for i in stock:
    df = yfinance.download(i, start='2015-01-01', end='2021-09-12')
    df_["{}".format(i)] = df
But then I have to call each dataframe by key, like df_["TSLA"], and this is not what I want. I need a dataframe called simply TSLA that holds the TSLA data, and so on. Is there a way to do it?
You don't need to download data multiple times. You just have to split the whole dataset with groupby and create variables dynamically with locals():
stocks = ['TSLA', 'MSFT', 'NIO', 'AAPL', 'AMD', 'ADBE', 'ALGN', 'AMZN',
          'AMGN', 'AEP', 'ADI', 'ANSS', 'AMAT', 'ASML', 'TEAM', 'ADSK']
data = yfinance.download(stocks, start='2015-01-01', end='2021-09-12')

for stock, df in data.groupby(level=1, axis=1):
    locals()[stock] = df.droplevel(level=1, axis=1)
    df.to_csv(f'{stock}.csv')
Output:
>>> TSLA
Adj Close Close High Low Open Volume
Date
2014-12-31 44.481998 44.481998 45.136002 44.450001 44.618000 11487500
2015-01-02 43.862000 43.862000 44.650002 42.652000 44.574001 23822000
2015-01-05 42.018002 42.018002 43.299999 41.431999 42.910000 26842500
2015-01-06 42.256001 42.256001 42.840000 40.841999 42.012001 31309500
2015-01-07 42.189999 42.189999 42.956001 41.956001 42.669998 14842000
... ... ... ... ... ... ...
2021-09-03 733.570007 733.570007 734.000000 724.200012 732.250000 15246100
2021-09-07 752.919983 752.919983 760.200012 739.260010 740.000000 20039800
2021-09-08 753.869995 753.869995 764.450012 740.770020 761.580017 18793000
2021-09-09 754.859985 754.859985 762.099976 751.630005 753.409973 14077700
2021-09-10 736.270020 736.270020 762.609985 734.520020 759.599976 15114300
[1686 rows x 6 columns]
>>> ANSS
Adj Close Close High Low Open Volume
Date
2014-12-31 82.000000 82.000000 83.480003 81.910004 83.080002 304600
2015-01-02 81.639999 81.639999 82.629997 81.019997 82.089996 282600
2015-01-05 80.860001 80.860001 82.070000 80.779999 81.290001 321500
2015-01-06 79.260002 79.260002 81.139999 78.760002 81.000000 344300
2015-01-07 79.709999 79.709999 80.900002 78.959999 79.919998 233300
... ... ... ... ... ... ...
2021-09-03 368.380005 368.380005 371.570007 366.079987 366.079987 293000
2021-09-07 372.070007 372.070007 372.410004 364.950012 369.609985 249500
2021-09-08 372.529999 372.529999 375.820007 369.880005 371.079987 325800
2021-09-09 371.970001 371.970001 375.799988 371.320007 372.519989 194900
2021-09-10 373.609985 373.609985 377.260010 372.470001 374.540009 278800
[1686 rows x 6 columns]
You can create a global or local variable like
globals()["TSLA"] = "some value"
print(TSLA)
locals()["TSLA"] = "some value"
print(TSLA)
but frankly it is a waste of time. It is much more useful to keep it as a dictionary.
With a dictionary you can use a for-loop to run some code on all dataframes.
You can also select dataframes by name, etc.
Examples:
df_max = {}
for name, df in df_.items():
    df_max[name] = df.max()

name = input("What to display: ")
df_[name].plot()
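Combining the two answers above, here is a sketch of the dictionary variant built from a single download; it assumes the default column layout, where the ticker is the second level of the MultiIndex columns, as in the groupby answer:
import yfinance

stocks = ['TSLA', 'MSFT', 'NIO', 'AAPL', 'AMD']
data = yfinance.download(stocks, start='2015-01-01', end='2021-09-12')

# one download, then split the MultiIndex columns into a dict of per-ticker frames
df_ = {ticker: frame.droplevel(level=1, axis=1)
       for ticker, frame in data.groupby(level=1, axis=1)}

print(df_['TSLA'].head())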

Is it possible to apply seasonal_decompose() after a groupby() where date is not the index of the data frame?

So, for a forecasting project, I have a really long Dataframe of multiple time series of the following type (it has a numerical index):
date        time_series_id  value
2015-08-01  0               0
2015-08-02  0               1
2015-08-03  0               2
2015-08-04  0               3
2015-08-01  1               2
2015-08-02  1               3
2015-08-03  1               4
2015-08-04  1               5
My objective is to add 3 new columns (trend, seasonal and resid) to this dataset, computed separately for each individual time series (each id).
Because of the characteristics of the dataset, the series tend to have NaNs at the start and end of their date ranges.
What I was trying to do was the following:
from statsmodels.tsa.seasonal import seasonal_decompose
df.assign(trend=lambda x: x.groupby("time_series_id")["value"].transform(
    lambda s: s.mask(~s.isna(), other=seasonal_decompose(
        s[~s.isna()], model='additive', extrapolate_trend='freq').trend)))
The expected output (the trend values are not actual values) should be:
date        time_series_id  value  trend
2015-08-01  0               0      1
2015-08-02  0               1      1
2015-08-03  0               2      1
2015-08-04  0               3      1
2015-08-01  1               2      1
2015-08-02  1               3      1
2015-08-03  1               4      1
2015-08-04  1               5      1
But I get the following error message:
AttributeError: 'Int64Index' object has no attribute 'inferred_freq'
In a previous iteration of my code, this worked for individual time series data frames, because I had set the date column as the index of the data frame instead of keeping it as an additional column, so the "x" that the lambda function received already had a datetime index appropriate for seasonal_decompose.
df.assign(
    trend=lambda x: x["value"].mask(~x["value"].isna(), other=
        seasonal_decompose(x["value"][~x["value"].isna()],
                           model='additive', extrapolate_trend='freq').trend))
My questions are: first, is it possible to achieve this using groupby, or are other approaches possible? Second, is it possible to handle this in a way that doesn't eat much memory? The original dataset I'm working on has roughly 1 million rows, so any help is really welcome :).
Did one of the already posted solutions work? If so, or if you found a different solution, please share. I tried each without success, but I'm new to Python, so I'm probably missing something.
Here is what I came up with, using a for loop. For my dataset it took 8 minutes to decompose 20 million rows consisting of 6,000 different subsets. This works but I wish it were faster.
Date Time            Segment ID  Travel Time(Minutes)
2021-11-09 07:15:00  1           30
2021-11-09 07:30:00  1           18
2021-11-09 07:15:00  2           30
2021-11-09 07:30:00  2           17
segments = set(frame['Segment ID'])
data = pd.DataFrame([])
for s in segments:
    df = frame[frame['Segment ID'] == s].set_index('Date Time').resample('H').mean()
    comp = sm.tsa.seasonal_decompose(x=df['Travel Time(Minutes)'], period=24*7, two_sided=False)
    df = df.join(comp.trend).join(comp.seasonal).join(comp.resid)
    # optional columns with some statistics to find outliers and trend changes
    df['resid zscore'] = (df['resid'] - df['resid'].mean()).div(df['resid'].std())
    df['trend pct_change'] = df.trend.pct_change()
    df['trend pct_change zscore'] = (df['trend pct_change'] - df['trend pct_change'].mean()).div(df['trend pct_change'].std())
    data = pd.concat([data, df.dropna()])  # DataFrame.append was removed in pandas 2.0, so concat is used here
Where you have lambda x: x.groupby(..., you don't have anything to group; you are telling it to group a row (I believe). You can try a setup like this, perhaps.
Here you define a function to act on the group you are sending via the apply() method. Then you should be able to use your original code.
I have not tested this, but I use this setup quite often to work on groups.
def trend_function(x):
    # do your lambda function here, as you receive each grouping;
    # assign returns a new frame, so capture the result
    x = x.assign(
        trend=lambda x: x["value"].mask(~x["value"].isna(), other=
            seasonal_decompose(x["value"][~x["value"].isna()],
                               model='additive', extrapolate_trend='freq').trend))
    return x
dfnew = df.groupby('time_series_id').apply(trend_function)
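A slightly more explicit variant of the same idea, in case it helps; this is an untested sketch, and period=7 is only a placeholder that has to be replaced by the real seasonal period of the data (seasonal_decompose needs either a DatetimeIndex with an inferable frequency or an explicit period, and at least two full cycles per group):
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose_group(g):
    # g is the sub-frame for one time_series_id, indexed by date
    g = g.copy()
    s = g['value'].dropna()
    res = seasonal_decompose(s, model='additive', extrapolate_trend='freq', period=7)
    g.loc[s.index, 'trend'] = res.trend
    g.loc[s.index, 'seasonal'] = res.seasonal
    g.loc[s.index, 'resid'] = res.resid
    return g

out = (df.set_index('date')
         .groupby('time_series_id', group_keys=False)
         .apply(decompose_group)
         .reset_index())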
Use extrapolate_trend='freq' as a parameter. You add the trend, seasonal, and residual to a dictionary and plot the dictionary.
from statsmodels.graphics import tsaplots
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt
date=['2015-08-01','2015-08-02','2015-08-03','2015-08-04','2015-08-01','2015-08-02','2015-08-03','2015-08-04']
time_series_id=[0,0,0,0,1,1,1,1]
value=[0,1,2,3,2,3,4,5]
df=pd.DataFrame({'date':date,'time_series_id':time_series_id,'value':value})
df['date']=pd.to_datetime(df['date'])
df=df.set_index('date')
print(df)
index_day = df.index.day
value_by_day = df.groupby(index_day)['value'].mean()
fig,ax = plt.subplots(figsize=(12,4))
value_by_day.plot(ax=ax)
plt.title('value by month')
plt.show()
df[['value']].boxplot()
plt.show()
fig,ax = plt.subplots(figsize=(12,4))
df[['value']].hist(ax=ax, bins=5)
plt.show()
fig,ax = plt.subplots(figsize=(12,4))
df[['value']].plot(kind='density', ax=ax)
plt.show()
plt.clf()
fig,ax = plt.subplots(figsize=(12,4))
plt.style.use('seaborn-pastel')
fig = tsaplots.plot_acf(df['value'], lags=4,ax=ax)
plt.show()
decomposition=sm.tsa.seasonal_decompose(x=df['value'],model='additive', extrapolate_trend='freq', period=1)
decomposition.plot()
plt.show()
decomposition_trend=decomposition.trend
ax= decomposition_trend.plot(figsize=(14,2))
ax.set_xlabel('Date')
ax.set_ylabel('Trend of time series')
ax.set_title('Trend values of the time series')
plt.show()
I changed the first piece of code according to my scenario. Here's my code:
data = pd.DataFrame([])
segments = set(subset['Planning_Material'])
for s in segments:
    df = subset[subset['Planning_Material'] == s].set_index('Cal_year_month').resample('M').sum()
    comp = sm.tsa.seasonal_decompose(df)
    df = df.join(comp.trend).join(comp.seasonal).join(comp.resid)
    df['Planning_Material'] = s
    data = pd.concat([data, df])

data = data.reset_index()
data = data[['Planning_Material', 'Cal_year_month', 'Historical_demand', 'trend', 'seasonal', 'resid']]
data

Generating monthly means for all columns without initializing a list for each column?

I have time series data and I want to generate the mean for each month, for each column. I have successfully done so, but by creating a list for each column, which wouldn't be feasible for thousands of columns.
How can I adapt my code to auto-populate the column names and values into a dataframe with thousands of columns?
For context, this data has 20 observations per hour for 12 months.
Original data:
timestamp 56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
2016-12-31 23:55:00 117.9673 17876.27 39.10074 9302.815 49.23963
2017-01-01 00:00:00 118.1080 17497.48 39.10759 9322.773 48.97919
2017-01-01 00:05:00 117.7809 17967.33 39.11348 9348.223 48.94284
Output:
56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
0 106.734147 16518.428734 16518.428734 7630.187992 45.992215
1 115.099825 18222.911023 18222.911023 9954.252911 47.334477
2 111.555504 19090.607211 19090.607211 9283.845649 48.939581
3 102.408996 18399.719852 18399.719852 7778.897037 48.130057
4 118.371951 20245.378742 20245.378742 9024.424210 64.796939
5 127.580516 21859.212675 21859.212675 9595.477455 70.952311
6 134.159082 22349.853561 22349.853561 10305.252112 75.195480
7 137.990638 21122.233427 21122.233427 10024.709142 74.755469
8 144.958318 18633.290818 18633.290818 11193.381098 66.776627
9 122.406489 20258.135923 20258.135923 10504.604420 61.793355
10 104.817850 18762.070668 18762.070668 9361.052983 51.802615
11 106.589672 20049.809554 20049.809554 9158.685383 51.611633
Successful code:
#separate data into months
v = list(range(1,13))
data_month = []
for i in v:
    data_month.append(data[(data.index.month==i)])
# average per month for each sensor
mean_56TI1164 = []
mean_56FI1281 = []
mean_56TI1281 = []
mean_52FC1043 = []
mean_57TI1501 = []
for i in range(0,12):
    mean_56TI1164.append(data_month[i]['56TI1164'].mean())
    mean_56FI1281.append(data_month[i]['56FI1281'].mean())
    mean_56TI1281.append(data_month[i]['56FI1281'].mean())
    mean_52FC1043.append(data_month[i]['52FC1043'].mean())
    mean_57TI1501.append(data_month[i]['57TI1501'].mean())
mean_df = {'56TI1164': mean_56TI1164, '56FI1281': mean_56FI1281, '56TI1281': mean_56TI1281, '52FC1043': mean_52FC1043, '57TI1501': mean_57TI1501}
mean_df = pd.DataFrame(mean_df, columns= ['56TI1164', '56FI1281', '56TI1281', '52FC1043', '57TI1501'])
mean_df
Unsuccessful attempt to condense:
col = list(data.columns)
mean_df = pd.DataFrame()
for i in range(0,12):
    for j in col:
        mean_df[j].append(data_month[i][j].mean())
mean_df
As suggested by G. Anderson, you can use groupby as in this example:
import pandas as pd
import io
csv="""timestamp 56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
2016-12-31 23:55:00 117.9673 17876.27 39.10074 9302.815 49.23963
2017-01-01 00:00:00 118.1080 17497.48 39.10759 9322.773 48.97919
2017-01-01 00:05:00 117.7809 17967.33 39.11348 9348.223 48.94284
2018-01-01 00:05:00 120.0000 17967.33 39.11348 9348.223 48.94284
2018-01-01 00:05:00 124.0000 17967.33 39.11348 9348.223 48.94284"""
# The following lines read your data into a pandas dataframe;
# it may help if your data comes in the form you wrote in the question
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
data = pd.read_csv(io.StringIO(csv), sep='\s+(?!\d\d:\d\d:\d\d)', \
date_parser=dateparse, index_col=0, engine='python')
# Here is where your data is resampled by month and mean is calculated
data.groupby(pd.Grouper(freq='M')).mean()
# If you have missing months, use this instead:
#data.groupby(pd.Grouper(freq='M')).mean().dropna()
Result of data.groupby(pd.Grouper(freq='M')).mean().dropna() will be:
56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
timestamp
2016-12-31 117.96730 17876.270 39.100740 9302.815 49.239630
2017-01-31 117.94445 17732.405 39.110535 9335.498 48.961015
2018-01-31 122.00000 17967.330 39.113480 9348.223 48.942840
Please note that I used data.groupby(pd.Grouper(freq='M')).mean().dropna() to get rid of NaN for the missing months (I added some data for January 2018 skipping what's in between).
Also note that the convoluted read_csv uses a regular expression as a separator: \s+ means one or more whitespace characters, while (?!\d\d:\d\d:\d\d) means "skip this whitespace if followed by something like 23:55:00".
Lastly, engine='python' avoids warnings when read_csv() is used with a regular expression separator.
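Note that pd.Grouper(freq='M') produces one row per calendar month present in the data (year and month), while the output in the question has exactly 12 rows, one per month of the year pooled across years. If that is the layout you want, the same groupby idea can be keyed on the month number instead (this sketch assumes data has a DatetimeIndex, as in the original code):
# one mean per month of the year (1..12), for every column at once
monthly_means = data.groupby(data.index.month).mean()
monthly_means.index.name = 'month'
print(monthly_means)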

How can I process a DataFrame without a for loop?

My DataFrame is:
Date Open High Low Close Adj Close Volume
5932 2016-08-18 218.339996 218.899994 218.210007 218.860001 207.483215 52989300
5933 2016-08-19 218.309998 218.750000 217.740005 218.539993 207.179825 75443000
5934 2016-08-22 218.259995 218.800003 217.830002 218.529999 207.170364 61368800
5935 2016-08-23 219.250000 219.600006 218.899994 218.970001 207.587479 53399200
5936 2016-08-24 218.800003 218.910004 217.360001 217.850006 206.525711 71728900
5937 2016-08-25 217.399994 218.190002 217.220001 217.699997 206.383514 69224800
5938 2016-08-26 217.919998 219.119995 216.250000 217.289993 205.994827 122506300
5939 2016-08-29 217.440002 218.669998 217.399994 218.360001 207.009201 68606100
5940 2016-08-30 218.259995 218.589996 217.350006 218.000000 206.667908 58114500
5941 2016-08-31 217.610001 217.750000 216.470001 217.380005 206.080124 85269500
5942 2016-09-01 217.369995 217.729996 216.029999 217.389999 206.089645 97844200
5943 2016-09-02 218.389999 218.869995 217.699997 218.369995 207.018692 79293900
5944 2016-09-06 218.699997 219.119995 217.860001 219.029999 207.644394 56702100
5945 2016-09-07 218.839996 219.220001 218.300003 219.009995 207.625412 76554900
5946 2016-09-08 218.619995 218.940002 218.149994 218.509995 207.151398 73011600
5947 2016-09-09 216.970001 217.029999 213.250000 213.279999 202.193268 221589100
5948 2016-09-12 212.389999 216.809998 212.309998 216.339996 205.094223 168110900
5949 2016-09-13 214.839996 215.149994 212.500000 213.229996 202.145859 182828800
5950 2016-09-14 213.289993 214.699997 212.500000 213.149994 202.070023 134185500
5951 2016-09-15 212.960007 215.729996 212.750000 215.279999 204.089294 134427900
5952 2016-09-16 213.479996 213.690002 212.570007 213.369995 203.300430 155236400
Currently, I'm doing this:
state['open_price'] = lookback.Open.iloc[-1:].get_values()[0]
for ind, row in lookback.reset_index().iterrows():
    if ind < self.LOOKBACK_DAYS:
        state['close_' + str(self.LOOKBACK_DAYS - ind)] = row.Close
        state['open_' + str(self.LOOKBACK_DAYS - ind)] = row.Open
        state['volume_' + str(self.LOOKBACK_DAYS - ind)] = row.Volume
But this is exceedingly slow. Is there some more vectorized way to do this?
I am trying to convert this to:
cash 1.000000e+05
num_shares 0.000000e+00
cost_basis 0.000000e+00
open_price 1.316900e+02
close_20 1.301100e+02
open_20 1.302600e+02
volume_20 4.670420e+07
close_19 1.302100e+02
open_19 1.299900e+02
volume_19 4.320920e+07
close_18 1.300200e+02
open_18 1.300300e+02
volume_18 3.252300e+07
close_17 1.292200e+02
open_17 1.299300e+02
volume_17 8.207990e+07
close_16 1.300300e+02
open_16 1.294100e+02
volume_16 6.150570e+07
close_15 1.298000e+02
open_15 1.301100e+02
volume_15 7.057170e+07
close_14 1.298300e+02
open_14 1.300200e+02
volume_14 6.292560e+07
close_13 1.297300e+02
open_13 1.300700e+02
volume_13 6.162470e+07
close_12 1.305600e+02
open_12 1.297300e+02
...
close_10 1.308700e+02
open_10 1.308500e+02
volume_10 5.790620e+07
close_9 1.295400e+02
open_9 1.310600e+02
volume_9 8.018090e+07
close_8 1.297400e+02
open_8 1.297400e+02
volume_8 4.149650e+07
close_7 1.286400e+02
open_7 1.298500e+02
volume_7 7.279940e+07
close_6 1.288800e+02
open_6 1.287700e+02
volume_6 4.303370e+07
close_5 1.287100e+02
open_5 1.285900e+02
volume_5 5.105180e+07
close_4 1.286600e+02
open_4 1.288300e+02
volume_4 6.416770e+07
close_3 1.307000e+02
open_3 1.289300e+02
volume_3 9.253180e+07
close_2 1.309500e+02
open_2 1.307500e+02
volume_2 8.726900e+07
close_1 1.311300e+02
open_1 1.310000e+02
volume_1 8.600550e+07
Length: 64, dtype: float64
One way is to cheat and use the underlying arrays using .values
I'll add some steps that I took to create an equivalent example as well:
import pandas as pd
from itertools import product
initial = ['cash', 'num_shares', 'somethingsomething']
initial_series = pd.Series([1, 2, 3], index = initial)
print(initial_series)
#Output:
cash 1
num_shares 2
somethingsomething 3
dtype: int64
Okay, those are just some mocked values standing in for the start of your output series.
df = pd.read_clipboard(sep='\s\s+') #pure magic
print(df.head())
#Output:
Date Open ... Adj Close Volume
5932 2016-08-18 218.339996 ... 207.483215 52989300
5933 2016-08-19 218.309998 ... 207.179825 75443000
5934 2016-08-22 218.259995 ... 207.170364 61368800
5935 2016-08-23 219.250000 ... 207.587479 53399200
5936 2016-08-24 218.800003 ... 206.525711 71728900
[5 rows x 7 columns]
df is now essentially the dataframe you provided in the example. The clipboard trick comes from here and is a good read for pandas MCVEs.
to_select = ['Close', 'Open', 'Volume']
SOMELOOKBACK = 6000 #mocked
final_index = [f"{name}_{index}" for index, name in product((SOMELOOKBACK - df.index), to_select)]
This prepares the index labels, which look something like this:
['Close_68',
'Open_68',
'Volume_68',
'Close_67',
'Open_67',
'Volume_67',
...
]
Now, just select the relevant columns from the dataframe and use .values to get a 2D array, then flatten it to get the final series.
final_series = pd.Series(df[to_select].values.flatten(), index = final_index)
result = initial_series.append(final_series)
#Output:
cash 1.000000e+00
num_shares 2.000000e+00
somethingsomething 3.000000e+00
Close_68 2.188600e+02
Open_68 2.183400e+02
Volume_68 5.298930e+07
Close_67 2.185400e+02
Open_67 2.183100e+02
Volume_67 7.544300e+07
Close_66 2.185300e+02
Open_66 2.182600e+02
Volume_66 6.136880e+07
...
Close_48 2.133700e+02
Open_48 2.134800e+02
Volume_48 1.552364e+08
Length: 66, dtype: float64
