How to join a dataframe to a Series with different indices

I have a pandas data frame that looks like:
High Low ... Volume OpenInterest
2018-01-02 983.25 975.50 ... 8387 67556
2018-01-03 986.75 981.00 ... 7447 67525
2018-01-04 985.25 977.00 ... 8725 67687
2018-01-05 990.75 984.00 ... 7948 67975
I calculate the Average True Range and save it into a series:
i = 0
TR_l = [0]
while i < (df.shape[0]-1):
    #TR = max(df.loc[i + 1, 'High'], df.loc[i, 'Close']) - min(df.loc[i + 1, 'Low'], df.loc[i, 'Close'])
    TR = max(df['High'][i+1], df['Close'][i]) - min(df['Low'][i+1], df['Close'][i])
    TR_l.append(TR)
    i = i + 1
TR_s = pd.Series(TR_l)
ATR = pd.Series(TR_s.ewm(span=n, min_periods=n).mean(), name='ATR_' + str(n))
With a 14-period rolling window ATR looks like:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 8.096064
14 7.968324
15 8.455205
16 9.046418
17 8.895405
18 9.088769
19 9.641879
20 9.516764
But when I do:
df = df.join(ATR)
The ATR column in df is all NaN. It's because the indexes are different between the data frame and ATR. Is there any way to add the ATR column into the data frame?
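The mismatch described here can be reproduced on a tiny frame. The sketch below is only an illustration: the three-row frame and the ATR values are made up, and it assumes the Series has exactly as many rows as the data frame (which holds for the loop above, since TR_l starts with one element and gains one per remaining row). Rebuilding the Series on the frame's index before joining is one way to line the values up by position.

```python
import pandas as pd

# Made-up stand-in for the question's frame: dates as the index.
df = pd.DataFrame(
    {"High": [983.25, 986.75, 985.25], "Low": [975.50, 981.00, 977.00]},
    index=pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04"]),
)

# A Series built from a plain list gets a default 0..n-1 integer index.
atr = pd.Series([float("nan"), 8.1, 7.9], name="ATR_14")

# join() matches index labels; dates never equal 0, 1, 2, so every value is NaN.
mismatched = df.join(atr)

# Reuse the values positionally on the frame's own index, then join.
atr_aligned = pd.Series(atr.to_numpy(), index=df.index, name=atr.name)
joined = df.join(atr_aligned)

print(mismatched["ATR_14"].isna().all())
print(joined)
```

If the lengths differ, the Series constructor raises an error instead of silently misaligning, which is a useful safety check.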

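Consider using shift to avoid the row-by-row while loop and the list building. Because the result is computed as a column of the same data frame, it keeps the date index and no join is needed. The example below uses Union Pacific (UNP) railroad stock data to demonstrate: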
import pandas as pd
import pandas_datareader as pdr
stock_df = pdr.get_data_yahoo('UNP').loc['2019-01-01':'2019-03-29']
# SHIFT DATA ONE DAY BACK AND JOIN TO ORIGINAL DATA
stock_df = stock_df.join(stock_df.shift(-1), rsuffix='_future')
# CALCULATE TR DIFFERENCE BY ROW
stock_df['TR'] = stock_df.apply(lambda x: max(x['High_future'], x['Close']) - min(x['Low_future'], x['Close']), axis=1)
# CALCULATE EWM MEAN
n = 14
stock_df['ATR'] = stock_df['TR'].ewm(span=n, min_periods=n).mean()
Output
print(stock_df.head(20))
# High Low Open Close Volume Adj Close High_future Low_future Open_future Close_future Volume_future Adj Close_future TR ATR
# Date
# 2019-01-02 138.320007 134.770004 135.649994 137.779999 3606300.0 137.067413 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 5.610001 NaN
# 2019-01-03 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 5.900009 NaN
# 2019-01-04 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 2.970001 NaN
# 2019-01-07 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 14.240005 NaN
# 2019-01-08 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 2.449997 NaN
# 2019-01-09 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 6.279999 NaN
# 2019-01-10 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 1.940002 NaN
# 2019-01-11 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 2.590012 NaN
# 2019-01-14 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 2.619995 NaN
# 2019-01-15 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 2.819992 NaN
# 2019-01-16 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 3.990005 NaN
# 2019-01-17 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 4.160004 NaN
# 2019-01-18 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 3.929993 NaN
# 2019-01-22 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 3.590012 4.011254
# 2019-01-23 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 6.429993 4.376440
# 2019-01-24 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 1.779999 3.991223
# 2019-01-25 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 1.610001 3.643168
# 2019-01-28 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 2.179993 3.432011
# 2019-01-29 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 2.449997 3.291831
# 2019-01-30 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 160.990005 157.020004 160.750000 159.070007 7438600.0 158.247314 3.970001 3.387735
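The row-wise apply above can also be replaced by element-wise NumPy functions. The following is a minimal sketch under a few assumptions: the prices are invented, n is shortened to 3 so the small sample produces a value, and the true range is placed on the current row using the previous close (shift(1)), which matches the loop in the question rather than the shift(-1) layout above. The first row is NaN here, whereas the question seeded it with 0.

```python
import numpy as np
import pandas as pd

# Invented sample prices; any frame with High, Low and Close columns would do.
prices = pd.DataFrame(
    {
        "High": [10.0, 11.0, 12.5, 12.0],
        "Low": [9.0, 10.0, 11.0, 11.2],
        "Close": [9.5, 10.8, 11.5, 11.9],
    },
    index=pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-07"]),
)

# Previous day's close, aligned on the same date index.
prev_close = prices["Close"].shift(1)

# Element-wise max/min over whole columns instead of a per-row lambda.
prices["TR"] = np.maximum(prices["High"], prev_close) - np.minimum(prices["Low"], prev_close)

n = 3  # shortened window for the small sample
prices["ATR"] = prices["TR"].ewm(span=n, min_periods=n).mean()

print(prices)
```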

Related

Nested loops turn in rounds

I am trying to loop over a DataFrame.
My code sample is the following:
for index1, long in tqdm(sim_df_ha_rsi.iterrows(), total=sim_df_ha_rsi.shape[0]):
    if index1 in validLongLimits.index:
        order_index += 1
        orders[index] = {
            'Type': 'LONG',
            'Open': open_price,
            'DateOpen': start_date,
            'Commission_open': open_price * 0.0001 * amount,
            'Commission_close': None,
            'Close': None,
            'DateClose': None,
            'Profit': None,
            'Balance': 165000,
            'IsOpen': True
        }
        for index2, short in sim_df_ha_rsi.iloc[index1:].iterrows():
            if index2 in validShortLimits.index:
                if long['Date'] < short['Date']:
                    orders[index]['IsOpen'] = False
                    orders[index]['DateClose'] = short['Date']
                    orders[index]['Close'] = short['Close']
                    orders[index]['Commission_close'] = short['Close'] * 0.0001 * amount
                    for index3, stop in sim_df_ha_rsi.iloc[index2:].iterrows():
                        if index3 in validLongLimits.index:
                            open_price = validLongLimits.loc[ validLongLimits.index == index3 ]['Open'].values[0]
                            start_date = validLongLimits.loc[ validLongLimits.index == index3 ]['Date'].values[0]
                            index = index3
                            break
                    break
I am struggling to implement the following pattern:
Let sim_df_ha_rsi be the data; validLongLimits and validShortLimits are subsets of its rows. For a given row in validLongLimits, find the next row that belongs to validShortLimits, then repeat from the next row in validLongLimits that comes after that short.
sim_df_ha_rsi is like:
Date Open High Low Close RSI_14_1m RSI_14_5m
0 2023-01-03 10:45:00 20.090000 20.180000 20.059999 20.115000 NaN NaN
1 2023-01-03 11:00:00 20.110001 20.100000 19.870001 19.990000 NaN NaN
2 2023-01-03 11:15:00 19.995000 19.950001 19.860001 19.897500 NaN NaN
3 2023-01-03 11:30:00 19.889999 19.889999 19.799999 19.852499 NaN NaN
4 2023-01-03 11:45:00 19.860000 19.850000 19.790001 19.817500 NaN NaN
... ... ... ... ... ... ... ...
313 2023-01-17 17:00:00 17.675000 17.750000 17.650000 17.700000 72.283634 62.390653
314 2023-01-17 17:15:00 17.700000 17.740000 17.629999 17.682499 70.135453 62.390653
315 2023-01-17 17:30:00 17.679999 17.660000 17.370001 17.547500 56.248421 55.891129
316 2023-01-17 17:45:00 17.580000 17.540001 17.410000 17.487500 51.379162 55.891129
317 2023-01-17 18:00:00 17.500000 17.520000 17.520000 17.520000 53.716387 55.891129
my output would be like:
Type Open DateOpen Commission_open Commission_close Close DateClose Profit Balance IsOpen
63 LONG 19.430000 2023-01-05 11:30:00 0.19430 None None None None 165000 True
74 LONG 19.600000 2023-01-05 14:15:00 0.19600 None None None None 165000 True
75 LONG 19.420000 2023-01-05 14:30:00 0.19420 None None None None 165000 True
76 LONG 19.330000 2023-01-05 14:45:00 0.19330 None None None None 165000 True
I want to open and close positions consecutively through the data under specific conditions, then recalculate using the price at which I closed the previous one.
My code seemingly works but skips across many gaps. It does a lot of computation to return, say, 2 elements, and the break statements make it fragile. I would like to avoid break statements but don't know how.
Would a while loop with an anchor be more useful here?

Paraphrasing your inner for-loop line by line (not tested with data):
# Candidate shorts: rows from index1 onward that are in validShortLimits and dated after the long
short_df = sim_df_ha_rsi.iloc[index1:].filter(items=validShortLimits.index, axis=0)
short_df = short_df[short_df['Date'] > long['Date']]
if len(short_df) > 0:
    short = short_df.iloc[0]
    index2 = short.name
    orders[index]['IsOpen'] = False
    orders[index]['DateClose'] = short['Date']
    orders[index]['Close'] = short['Close']
    orders[index]['Commission_close'] = short['Close'] * 0.0001 * amount
    # Next long: first row from index2 onward that is in validLongLimits
    long_df = sim_df_ha_rsi.iloc[index2:].filter(items=validLongLimits.index, axis=0)
    if len(long_df) > 0:
        # .iloc[0] is a single row, so the column lookups are already scalars
        open_price = long_df.iloc[0]['Open']
        start_date = long_df.iloc[0]['Date']
        index = long_df.iloc[0].name
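The "while loop with an anchor" idea from the question could look roughly like the sketch below. It is not the code from this thread: pair_long_short is a hypothetical helper, it works on plain integer row positions (assuming the default 0..n-1 index shown in the sample data), and it assumes the next long must come strictly after the short that closed the previous one. It uses no break statements.

```python
import numpy as np


def pair_long_short(long_positions, short_positions):
    """Pair each long entry with the first later short exit, then resume
    from the first long entry strictly after that exit.
    (Hypothetical helper for illustration only.)"""
    longs = np.sort(np.asarray(long_positions))
    shorts = np.sort(np.asarray(short_positions))
    pairs = []
    i = 0  # anchor: position in `longs` of the next candidate entry
    while i < len(longs):
        entry = longs[i]
        # First short strictly after the entry row.
        j = np.searchsorted(shorts, entry, side="right")
        if j == len(shorts):
            # No later short: the position stays open and the scan ends.
            pairs.append((int(entry), None))
            i = len(longs)
        else:
            exit_row = shorts[j]
            pairs.append((int(entry), int(exit_row)))
            # Move the anchor to the first long strictly after the exit row.
            i = int(np.searchsorted(longs, exit_row, side="right"))
    return pairs


print(pair_long_short([1, 2, 5, 9], [3, 4, 7]))
```

Each returned pair holds the row position where a long opens and the row position where it closes, so the order fields (dates, prices, commissions) can be looked up afterwards with sim_df_ha_rsi.iloc.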

How to convert data from DataFrame to form

I'm trying to make a report and then convert it to the prescribed form but I don't know how. Below is my code:
data = pd.read_csv('https://raw.githubusercontent.com/hoatranobita/reports/main/Loan_list_test.csv')
data_pivot = pd.pivot_table(data,('CLOC_CUR_XC_BL'),index=['BIZ_TYPE_SBV_CODE'],columns=['TERM_CODE','CURRENCY_CD'],aggfunc=np.sum).reset_index
print(data_pivot)
Pivot table shows as below:
<bound method DataFrame.reset_index of TERM_CODE Ngắn hạn Trung hạn
CURRENCY_CD 1. VND 2. USD 1. VND 2. USD
BIZ_TYPE_SBV_CODE
201 170000.00 NaN 43533.42 NaN
202 2485441.64 5188792.76 2682463.04 1497309.06
204 35999.99 NaN NaN NaN
301 1120940.65 NaN 190915.62 453608.72
401 347929.88 182908.01 239123.29 NaN
402 545532.99 NaN 506964.23 NaN
403 21735.74 NaN 1855.92 NaN
501 10346.45 NaN NaN NaN
601 881974.40 NaN 50000.00 NaN
602 377216.09 NaN 828868.61 NaN
702 9798.74 NaN 23616.39 NaN
802 155099.66 NaN 762294.95 NaN
803 23456.79 NaN 97266.84 NaN
804 151590.00 NaN 378000.00 NaN
805 182925.30 54206.52 4290216.37 NaN>
Here is the prescribed form:
form = pd.read_excel('https://github.com/hoatranobita/reports/blob/main/Form%20A00034.xlsx?raw=true')
form.head()
Mã ngành kinh tế Dư nợ tín dụng (không bao gồm mua, đầu tư trái phiếu doanh nghiệp) Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5
0 NaN Ngắn hạn NaN Trung và dài hạn NaN Tổng cộng
1 NaN Bằng VND Bằng ngoại tệ Bằng VND Bằng ngoại tệ NaN
2 101.0 NaN NaN NaN NaN NaN
3 201.0 NaN NaN NaN NaN NaN
4 202.0 NaN NaN NaN NaN NaN
As you can see, the pivot table has no row for 101 but the form does. What do I have to do to fill the form from the DataFrame while skipping 101?
Thank you.

First, create a worksheet using xlsxwriter:
import xlsxwriter
#start workbook
workbook = xlsxwriter.Workbook('merge1.xlsx')
#Introduce formatting
format = workbook.add_format({'border': 1,'bold': True})
#Adding a worksheet
worksheet = workbook.add_worksheet()
merge_format = workbook.add_format({
'bold':1,
'border': 1,
'align': 'center',
'valign': 'vcenter'})
#Starting the Headers
worksheet.merge_range('A1:A3', 'Mã ngành kinh tế', merge_format)
worksheet.merge_range('B1:F1', 'Dư nợ tín dụng (không bao gồm mua, đầu tư trái phiếu doanh nghiệp)', merge_format)
worksheet.merge_range('B2:C2', 'Ngắn hạn', merge_format)
worksheet.merge_range('D2:E2', 'Trung và dài hạn', merge_format)
worksheet.merge_range('F2:F3', 'Tổng cộng', merge_format)
worksheet.write(2, 1, 'Bằng VND',format)
worksheet.write(2, 2, 'Bằng ngoại tệ',format)
worksheet.write(2, 3, 'Bằng VND',format)
worksheet.write(2, 4, 'Bằng ngoại tệ',format)
After setting up the headers you can start writing the data rows by looping with worksheet.write(). A sample is included below:
expenses = (
    ['Rent', 1000],
    ['Gas', 100],
    ['Food', 300],
    ['Gym', 50],
)
# Data starts on the fourth row (index 3), below the three header rows
row = 3
col = 0
for item, cost in expenses:
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, cost)
    row += 1
row and col are the zero-based row and column numbers of the cell, given as integers like matrix coordinates.
And finally close the workbook
workbook.close()
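For the rows that exist in the form but not in the pivot table (such as 101), one option is to reindex the pivot table on the form's codes before writing anything. The sketch below is an assumption-laden illustration: it rebuilds only two rows of the pivot table by hand, and form_codes is a hypothetical list standing in for the code column read from the form.

```python
import pandas as pd

# Two rows copied from the pivot table above, rebuilt by hand for the example.
pivot = pd.DataFrame(
    {
        ("Ngắn hạn", "1. VND"): [170000.00, 2485441.64],
        ("Ngắn hạn", "2. USD"): [float("nan"), 5188792.76],
    },
    index=pd.Index([201, 202], name="BIZ_TYPE_SBV_CODE"),
)

# Hypothetical: the codes in the order the form lists them.
form_codes = [101, 201, 202]

# reindex keeps the form's order and fills codes missing from the pivot with NaN.
aligned = pivot.reindex(form_codes)

# Row totals; min_count=1 leaves the total empty when a row has no data at all.
total = aligned.sum(axis=1, min_count=1)

print(aligned)
print(total)
```

The aligned frame then has exactly one row per form line, so it can be written row by row in the same loop style as above, with empty cells left blank.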

How can I manipulate my DataFrame/Table in order to display in the following format?

How can I change the current output into the arrangement described at the bottom? I've tried stacking and unstacking but I can't seem to hit the nail on the head. Help would be highly appreciated.
My code:
portfolio_count = 0
Equity_perportfolio = []
Portfolio_sequence = []
while portfolio_count < 1:
    # declaring list
    list = Tickers
    portfolio_count = portfolio_count + 1
    # initializing the value of n (Number of assets in portfolio)
    n = 5
    # printing n elements from list (add number while printing the potential portfolio)
    potential_portfolio = random.sample(list, n)
    print("Portfolio number", portfolio_count)
    print(potential_portfolio)
    #Pull 'relevant data' about the selected stocks. (Yahoo API?) # 1. df with Index Date and Closing
    price_data_close = web.get_data_yahoo(potential_portfolio,
                                          start = '2012-01-01',
                                          end = '2021-03-31')['Close']
    price_data = web.get_data_yahoo(potential_portfolio,
                                    start = '2012-01-01',
                                    end = '2021-03-31')
    print(price_data)
Which gives me the following structure:(IGNORE NaNs)
Attributes Adj Close ... Volume
Symbols D HOLX PSX ... PSX MGM PG
Date ...
2012-01-03 36.209511 17.840000 NaN ... NaN 25873300.0 11565900.0
2012-01-04 35.912926 17.910000 NaN ... NaN 14717400.0 10595400.0
2012-01-05 35.837063 18.360001 NaN ... NaN 12437500.0 10085300.0
2012-01-06 35.471519 18.570000 NaN ... NaN 9079700.0 8421200.0
2012-01-09 35.423241 18.520000 NaN ... NaN 15750100.0 7836100.0
... ... ... ... ... ... ... ...
2021-03-25 75.220001 71.050003 82.440002 ... 2613300.0 9601500.0 7517300.0
2021-03-26 75.779999 73.419998 84.309998 ... 2368900.0 7809100.0 10820100.0
2021-03-29 76.699997 74.199997 82.529999 ... 1880600.0 7809700.0 11176000.0
2021-03-30 75.529999 73.870003 82.309998 ... 1960600.0 5668500.0 8090600.0
2021-03-31 75.959999 74.379997 81.540001 ... 2665200.0 7029900.0 9202600.0
However, I wanted it to output in this format:
Date Symbols Open High Low Close Volume Adjusted
04/12/2020 MMM 172.130005 173.160004 171.539993 172.460007 2663600 171.050461
07/12/2020 MMM 171.720001 172.5 169.179993 170.149994 2526800 168.759323
08/12/2020 MMM 169.740005 172.830002 169.699997 172.460007 1730800 171.050461
08/12/2020 MMM 169.740005 172.830002 169.699997 172.460007 1730800 171.050461
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
11/12/2020 D 172.300003 174.649994 172.169998 174.020004 1875700 172.597702
14/12/2020 D 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 D 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 PSX 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
14/12/2020 PSX 175.669998 176.199997 172.990005 173.080002 3700100 171.66539
15/12/2020 PSX 174.389999 175.059998 172.550003 174.679993 2270600 173.252304
18/12/2020 PSX 176.759995 177.460007 175.110001 176.419998 4682000 174.978088
18/12/2020 PSX 176.759995 177.460007 175.110001 176.419998 4682000 174.978088
23/12/2020 PG 175.300003 175.809998 173.960007 173.990005 1762600 172.567963
28/12/2020 PG 175.309998 176.399994 174.389999 174.710007 1403000 173.282074
29/12/2020 PG 175.550003 175.639999 173.149994 173.850006 1218900 172.429108
31/12/2020 PG 174.119995 174.869995 173.179993 174.789993 1841300 173.361404
05/01/2021 PG 172.009995 173.25 170.649994 171.580002 2295300 170.177643
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
07/01/2021 MMM 171.559998 173.460007 166.160004 169.720001 5863400 168.332855
08/01/2021 MMM 169.169998 169.539993 164.610001 166.619995 4808100 165.258179
13/01/2021 MMM 167.270004 167.740005 166.050003 166.279999 2098000 164.920959
15/01/2021 MMM 165.630005 166.259995 163.380005 165.550003 3550700 164.19693
19/01/2021 MMM 167.259995 169.550003 166.800003 169.119995 3903200 167.737747
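One possible route to the long layout is to move the Symbols column level into the rows with stack. The sketch below is only an illustration with invented numbers: it builds a small frame with the same two column levels (Attributes, Symbols) as the printed output, rather than downloading anything.

```python
import pandas as pd

# Small invented frame with the same two-level column layout as price_data.
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["D", "PG"]], names=["Attributes", "Symbols"]
)
wide = pd.DataFrame(
    [[75.53, 135.0, 1960600, 8090600], [75.96, 135.4, 2665200, 9202600]],
    index=pd.Index(pd.to_datetime(["2021-03-30", "2021-03-31"]), name="Date"),
    columns=columns,
)

# Move the Symbols level from the columns into the row index,
# then flatten the index and group each symbol's rows together.
long = (
    wide.stack("Symbols")
    .reset_index()
    .sort_values(["Symbols", "Date"], ignore_index=True)
)

print(long)
```

Each row of the result is one (Date, Symbols) pair with one column per attribute, which is the shape of the desired table.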

Write data from for loop into a dataframe pandas

I load data from yahoo finance using the motor_daily function. It takes in a list of tickers and gets me the data.
Here are the used libs:
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
Here is the function definition:
def motor_daily(ticker_file):
    tickers_list = ticker_file #SP100
    stocks = yf.download(tickers_list, start = start, end = tomorrow) #YYYY-MM-DD
    company_name = []
    ticker_code = []
    for ticker in tickers_list:
        loaded_ticker = yf.Ticker(ticker)
        tickers = ticker
        ticker_code.append(tickers)
    finance = pd.DataFrame(ticker_code)
    finance["Ticker"] = pd.DataFrame(ticker_code)
    finance["Ticker_start"] = finance["Ticker"].str.split('-').str[0]
    finance= finance.drop(columns=[0])
    stocks_close = stocks.Close
    stocks_close = stocks_close.reset_index()
    return stocks_close
def ticker_data(list):
    data = []
    for ticks in list:
        data.append(motor_daily(ticks))
    return data
The above function loads closing prices for each ticker / stock name in the list (therefore the loop) and stores this in data.
list_of_lists includes:
[['VOW3.DE', 'BMW.DE', 'BEI.DE', 'DPW.DE', 'FME.DE'],
['ISS.CO', 'LUN.CO', 'CARL-B.CO', 'TRYG.CO', 'SIM.CO']]
Output of print(ticker_data(list_of_list))
[ Date BEI.DE BMW.DE DPW.DE FME.DE VOW3.DE
0 2021-03-10 86.860001 81.339996 43.650002 60.840000 196.020004
1 2021-03-11 86.139999 78.519997 44.549999 61.340000 192.039993
2 2021-03-12 87.080002 77.480003 45.060001 60.939999 190.220001
3 2021-03-15 86.959999 77.800003 44.919998 60.759998 194.779999
4 2021-03-16 87.680000 80.500000 45.580002 61.259998 207.850006
5 2021-03-17 88.260002 85.459999 45.419998 60.779999 230.800003,
Date CARL-B.CO ISS.CO LUN.CO SIM.CO TRYG.CO
0 2021-03-10 1012.0 122.599998 243.600006 768.0 135.399994
1 2021-03-11 1009.0 120.300003 235.300003 780.0 143.500000
2 2021-03-12 1006.0 121.150002 237.000000 772.5 143.699997
3 2021-03-15 1006.5 124.250000 236.300003 783.0 145.100006
4 2021-03-16 983.0 125.550003 236.100006 795.5 147.399994
5 2021-03-17 982.0 121.949997 230.300003 778.0 143.899994]
When I try to convert the output to a dataframe using:
df = pd.DataFrame(ticker_data(list_of_list))
the output is:
ValueError: Must pass 2-d input. shape=(2, 6, 6)
I cannot convert this to a pandas dataframe, how should I go about doing this?

Your motor_daily has a bunch of unused elements. Also, I had to define the start and end times.
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
def motor_daily(ticker_list):
    start = pd.Timestamp('now').normalize() - pd.offsets.Day(7)
    end = pd.Timestamp('now').normalize() + pd.offsets.BusinessDay(0)
    return yf.download(ticker_list, start=start, end=end).Close
list_of_lists = [
    ['VOW3.DE', 'BMW.DE', 'BEI.DE', 'DPW.DE', 'FME.DE'],
    ['ISS.CO', 'LUN.CO', 'CARL-B.CO', 'TRYG.CO', 'SIM.CO']
]
df = pd.concat(map(motor_daily, list_of_lists), axis=1)
# I transposed for prettier printing
df.T
Date 2021-03-10 2021-03-11 2021-03-12 2021-03-15 2021-03-16
BEI.DE 86.860001 86.139999 87.080002 86.959999 87.680000
BMW.DE 81.339996 78.519997 77.480003 77.800003 80.500000
DPW.DE 43.650002 44.549999 45.060001 44.919998 45.580002
FME.DE 60.840000 61.340000 60.939999 60.759998 61.259998
VOW3.DE 196.020004 192.039993 190.220001 194.779999 207.850006
CARL-B.CO 1012.000000 1009.000000 1006.000000 1006.500000 983.000000
ISS.CO 122.599998 120.300003 121.150002 124.250000 125.550003
LUN.CO 243.600006 235.300003 237.000000 236.300003 236.100006
SIM.CO 768.000000 780.000000 772.500000 783.000000 795.500000
TRYG.CO 135.399994 143.500000 143.699997 145.100006 147.399994
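If you want to keep the original ticker_data, which returns a list of frames that each carry a Date column, the same pd.concat idea applies once Date is made the index. A minimal offline sketch, with two small frames typed in by hand (values rounded from the output in the question) instead of a download:

```python
import pandas as pd

# Hand-typed stand-ins for two frames returned by the question's motor_daily.
dates = pd.to_datetime(["2021-03-10", "2021-03-11"])
frame_de = pd.DataFrame({"Date": dates, "BMW.DE": [81.34, 78.52]})
frame_co = pd.DataFrame({"Date": dates, "ISS.CO": [122.60, 120.30]})
frames = [frame_de, frame_co]

# Align every frame on Date, then place them side by side.
combined = pd.concat([f.set_index("Date") for f in frames], axis=1)

print(combined)
```

pd.DataFrame(frames) fails because it tries to treat the list as one 3-d block of values, while pd.concat joins the frames along an axis and aligns them on the shared index.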

You can iterate through ticker_data(list_of_list) and make multiple dataframes:
lol = [['VOW3.DE', 'BMW.DE', 'BEI.DE', 'DPW.DE', 'FME.DE'],
['ISS.CO', 'LUN.CO', 'CARL-B.CO', 'TRYG.CO', 'SIM.CO']]
res = ticker_data(lol)
dataframes = [pd.DataFrame(lst) for lst in res]
print(dataframes[0])
#
Date BEI.DE BMW.DE DPW.DE FME.DE VOW3.DE
0 1996-11-08 NaN 18.171000 NaN NaN NaN
1 1996-11-11 NaN 18.122000 NaN NaN NaN
2 1996-11-12 NaN 18.259001 NaN NaN NaN
3 1996-11-13 NaN 18.230000 NaN NaN NaN
4 1996-11-14 NaN 18.289000 NaN NaN NaN
... ... ... ... ... ... ...
6241 2021-03-11 86.139999 78.519997 44.549999 61.340000 192.039993
6242 2021-03-12 87.080002 77.480003 45.060001 60.939999 190.220001
6243 2021-03-15 86.959999 77.800003 44.919998 60.759998 194.779999
6244 2021-03-16 87.680000 80.500000 45.580002 61.259998 207.850006
6245 2021-03-17 88.260002 85.459999 45.419998 60.779999 230.800003

Receive NaN for variables in a list after iterating through it

I have a list of shares that make up an ETF. I have formatted the tickers into a list and have named this variable assets
print(assets)
['AUD', 'CRWD', 'SPLK', 'OKTA', 'AVGO', 'CSCO', 'NET', 'ZS', 'AKAM', 'FTNT', 'BAH', 'CYBR', 'CHKP', 'BA/', 'VMW', 'PFPT', 'PANW', 'VRSN', 'FFIV', 'JNPR', 'LDOS', '4704', 'FEYE', 'QLYS', 'SAIC', 'RPD', 'HO', 'MIME', 'SAIL', 'VRNS', 'ITRI', 'AVST', 'MANT', 'TENB', '053800', 'ZIXI', 'OSPN', 'RDWR', 'ULE', 'MOBL', 'ATEN', 'TUFN', 'RBBN', 'NCC', 'KRW', 'EUR', 'JPY', 'GBP', 'USD']
I use the following for loop to iterate through the list and pull historical data from yahoo
for i in assets:
    try:
        df[i] = web.DataReader(i, data_source='yahoo', start=start, end=end)['Adj Close']
    except RemoteDataError:
        print(f'{i}')
        continue
I am returned with:
BA/
4704
HO
053800
KRW
JPY
Suggesting these assets cannot be found on yahoo finance. I understand this is the case and accept that.
When I look for the stocks that have theoretically been found (e.g. df['FEYE']) on yahoo finance I get the following.
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
10   NaN
11   NaN
12   NaN
13   NaN
14   NaN
15   NaN
16   NaN
17   NaN
18   NaN
19   NaN
20   NaN
21   NaN
22   NaN
23   NaN
24   NaN
25   NaN
26   NaN
27   NaN
28   NaN
29   NaN
30   NaN
31   NaN
32   NaN
33   NaN
34   NaN
35   NaN
36   NaN
37   NaN
38   NaN
39   NaN
40   NaN
41   NaN
42   NaN
43   NaN
44   NaN
45   NaN
46   NaN
47   NaN
48   NaN
Name: FEYE, dtype: float64
When I proceed normally with just one share
(e.g. CSCO = web.DataReader(assets[5], data_source='yahoo', start=start, end=end)['Adj Close'])
It is all ok.
Any help is greatly appreciated,
Thank you!

Here is a reproducible example with code and output.
If you already have a dataframe named df, the new data is incompatible with it in terms of index and possibly column names.
A new dataframe is needed, but it must be created outside the loop. Each iteration then adds a new column with that ticker's data.
import pandas as pd
import pandas_datareader.data as web
from pandas_datareader._utils import RemoteDataError
assets=['AUD', 'CRWD', 'SPLK', 'OKTA', 'AVGO', 'CSCO', 'NET', 'ZS', 'AKAM', 'FTNT', 'BAH', 'CYBR', 'CHKP', 'BA/', 'VMW', 'PFPT', 'PANW', 'VRSN', 'FFIV', 'JNPR', 'LDOS', '4704', 'FEYE', 'QLYS', 'SAIC', 'RPD', 'HO', 'MIME', 'SAIL', 'VRNS', 'ITRI', 'AVST', 'MANT', 'TENB', '053800', 'ZIXI', 'OSPN', 'RDWR', 'ULE', 'MOBL', 'ATEN', 'TUFN', 'RBBN', 'NCC', 'KRW', 'EUR', 'JPY', 'GBP', 'USD']
df = pd.DataFrame()
for i in assets:
    try:
        print(f'Try: {i}')
        df[i] = web.DataReader(i, data_source='yahoo')['Adj Close']
    except RemoteDataError as r:
        print(f'Try: {i}: {r}')
        continue
result:
Try: AUD
Try: CRWD
Try: SPLK
Try: OKTA
Try: AVGO
Try: CSCO
Try: NET
Try: ZS
Try: AKAM
Try: FTNT
Try: BAH
Try: CYBR
Try: CHKP
Try: BA/
Try: BA/: Unable to read URL: https://finance.yahoo.com/quote/BA//history?period1=1435975200&period2=1593741599&interval=1d&frequency=1d&filter=history
Response Text:
b'<html>\n<meta charset=\'utf-8\'>\n<script>\nvar u=\'https://www.yahoo.com/?err=404&err_url=https%3a%2f%2ffinance.yahoo.com%2fquote%2fBA%2f%2fhistory%3fperiod1%3d1435975200%26period2%3d1593741599%26interval%3d1d%26frequency%3d1d%26filter%3dhistory\';\nif(window!=window.top){\n document.write(\'<p>Content is currently unavailable.</p><img src="//geo.yahoo.com/p?s=1197757039&t=\'+new Date().getTime()+\'&_R=\'+encodeURIComponent(document.referrer)+\'&err=404&err_url=\'+u+\'" width="0px" height="0px"/>\');\n}else{\n window.location.replace(u);\n}\n</script>\n<noscript><META http-equiv="refresh" content="0;URL=\'https://www.yahoo.com/?err=404&err_url=https%3a%2f%2ffinance.yahoo.com%2fquote%2fBA%2f%2fhistory%3fperiod1%3d1435975200%26period2%3d1593741599%26interval%3d1d%26frequency%3d1d%26filter%3dhistory\'"></noscript>\n</html>\n'
Try: VMW
Try: PFPT
Try: PANW
Try: VRSN
Try: FFIV
Try: JNPR
Try: LDOS
Try: 4704
Try: 4704: No data fetched for symbol 4704 using YahooDailyReader
Try: FEYE
Try: QLYS
Try: SAIC
Try: RPD
Try: HO
Try: HO: No data fetched for symbol HO using YahooDailyReader
Try: MIME
Try: SAIL
Try: VRNS
Try: ITRI
Try: AVST
Try: MANT
Try: TENB
Try: 053800
Try: 053800: No data fetched for symbol 053800 using YahooDailyReader
Try: ZIXI
Try: OSPN
Try: RDWR
Try: ULE
Try: MOBL
Try: ATEN
Try: TUFN
Try: RBBN
Try: NCC
Try: KRW
Try: KRW: No data fetched for symbol KRW using YahooDailyReader
Try: EUR
Try: JPY
Try: JPY: No data fetched for symbol JPY using YahooDailyReader
Try: GBP
Please note there are 2 types of error:
the ticker does not exist, for example "HO"
the resulting URL is wrong due to the "/" in "BA/"
Head of result set dataframe df.head():
AUD CRWD SPLK OKTA ... NCC EUR GBP USD
Date ...
2015-11-03 51.500000 NaN 57.139999 NaN ... 3.45 NaN 154.220001 13.608685
2015-12-22 55.189999 NaN 54.369999 NaN ... 3.48 NaN 148.279999 13.924644
2015-12-23 55.560001 NaN 56.509998 NaN ... 3.48 NaN 148.699997 14.146811
2015-12-24 55.560001 NaN 56.779999 NaN ... 3.48 NaN 149.119995 14.324224
2015-12-28 56.270000 NaN 57.660000 NaN ... 3.48 NaN 148.800003 14.057305
[5 rows x 43 columns]
Hope this helps.
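The NaN effect can be reproduced without any network access. In the sketch below, fetch_adj_close is a hypothetical stand-in for web.DataReader(...)['Adj Close'] that returns invented prices on a date index and raises KeyError for unknown tickers. It shows the all-NaN column that appears when the target frame already has an integer index, and an alternative that collects the Series in a dict and builds the frame once.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03"])


def fetch_adj_close(ticker):
    """Hypothetical offline stand-in for the DataReader call (invented prices)."""
    data = {"CSCO": [47.0, 46.5], "FEYE": [16.8, 16.6]}
    if ticker not in data:
        raise KeyError(ticker)
    return pd.Series(data[ticker], index=dates, name=ticker)


# A frame that already has an integer index: column assignment aligns on
# index labels, the dates match nothing, and the column is all NaN.
preexisting = pd.DataFrame(index=range(2))
preexisting["FEYE"] = fetch_adj_close("FEYE")

# Collect the Series first, then build the frame in one step.
columns, missing = {}, []
for ticker in ["CSCO", "BA/", "FEYE"]:
    try:
        columns[ticker] = fetch_adj_close(ticker)
    except KeyError:
        missing.append(ticker)
prices = pd.concat(columns, axis=1)

print(preexisting)
print(prices)
print("not found:", missing)
```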
