I am trying to download stock data from yfinance. I have a ticker list called ticker_list_1 with a bunch of symbols. The structure is:
array(['A', 'AAP', 'AAPL', 'ABB', 'ABBV', 'ABC', 'ABEV', 'ABMD', 'ABNB',....
Now I am using the following code to download the data:
i = 0
xrz_data = {}
for i, len in enumerate (ticker_list_1):
data = yf.download(ticker_list_1[i], start ="2019-01-01", end="2022-10-20")
xrz_data[i] = data
i=i+1
The problem I am having is that once downloaded the data is saved as xrz_data. I can access a specific dataset through the code xrz_data[I] with the corresponding number. However I want to be able to access the data through xrz_data[tickername].
How can I achieve this ?
I have tried using xrz_data[i].values = data which gives me the error KeyError: 0
EDIT:
Here is the current output of the for loop:
{0: Open High Low Close Adj Close \
Date
2018-12-31 66.339996 67.480003 66.339996 67.459999 65.660202
2019-01-02 66.500000 66.570000 65.300003 65.690002 63.937435
2019-01-03 65.529999 65.779999 62.000000 63.270000 61.582008
2019-01-04 64.089996 65.949997 64.089996 65.459999 63.713589
2019-01-07 65.639999 67.430000 65.610001 66.849998 65.066483
... ... ... ... ... ...
2022-10-13 123.000000 128.830002 122.349998 127.900002 127.900002
2022-10-14 129.000000 130.220001 125.470001 125.699997 125.699997
2022-10-17 127.379997 131.089996 127.379997 130.559998 130.559998
2022-10-18 133.919998 134.679993 131.199997 132.300003 132.300003
2022-10-19 130.110001 130.270004 127.239998 128.960007 128.960007
[959 rows x 6 columns],
1: Open High Low Close Adj Close \
Date
2018-12-31 156.050003 157.679993 154.990005 157.460007 149.844055
2019-01-02 156.160004 159.919998 153.820007 157.919998 150.281815
........
My desired output would be:
AAP: Open High Low Close Adj Close \
Date
2018-12-31 156.050003 157.679993 154.990005 157.460007 149.844055
2019-01-02 156.160004 159.919998 153.820007 157.919998 150.281815
........
I was able to figure it out. With the following code I am able to download the data and access specific tickers through f.e. xrz_data['AAPL‘]
i = 0
xrz_data = {}
for ticker in ticker_list_1:
data = yf.download(ticker, start ="2019-01-01", end="2022-10-20")
xrz_data[ticker] = data
Related
I'm following a tutorial on using Yfinance in Jupyter Notebook to get prices for SPY (S&P 500) in a dataframe. The code looks simple, but I can't seem to get the desired results.
df_tickers = pd.DataFrame()
spyticker = yf.Ticker("SPY")
print(spyticker)
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
The error states: "SPY: No data found for this date range, symbol may be delisted." But when I print spyticker, I get the correct yfinance object:
yfinance.Ticker object <SPY>
I am not sure what your problem is but if I use the following:
spyticker = yf.Ticker("SPY")
df_ticker = spyticker.history(period="max", interval="1d", start="1998-12-01", end="2022-01-01" , auto_adjust=True, rounding=True)
df_ticker.head()
I get the following:
Open High Low Close Volume Dividends Stock Splits
Date
1998-12-01 76.02 77.27 75.43 77.00 8950600 0.0 0
1998-12-02 76.74 77.19 75.94 76.78 7495500 0.0 0
1998-12-03 76.76 77.45 75.35 75.51 12145300 0.0 0
1998-12-04 76.35 77.58 76.27 77.49 10339500 0.0 0
1998-12-07 77.29 78.21 77.25 77.86 4290000 0.0 0
My only explanation is that the call to spyticker.history already returns a dataframe, so it isn't necessary to define the df_ticker beforehand.
I would like to pull historical data from yfinance for a specific list of stocks. I want to store earch stock in a separate dataframes (each stock with its own df).
I can download it to multiple csv's through below code, but I couldn't find a way to store them in different dataframes (wihtout having to download them to csv)
import yfinance
stocks = ['TSLA','MSFT','NIO','AAPL','AMD','ADBE','ALGN','AMZN','AMGN','AEP','ADI','ANSS','AMAT','ASML','TEAM','ADSK']
for i in stocks:
df = yfinance.download(i, start='2015-01-01', end='2021-09-12')
df.to_csv( i + '.csv')
I want my end results to be a dataframe called "TSLA" for tsla historical data and another one called "MSFT" for msft data...and so on
I tried:
stock = ['TSLA','MSFT','NIO','AAPL','AMD']
df_ = {}
for i in stock:
df = yfinance.download(i, start='2015-01-01', end='2021-09-12')
df_["{}".format(i)] = df
And I have to call each dataframe by key to get it like df_["TSLA"] but this is not what I want. I need a datafram called only TSLA that have tsla data and so on. Is there a way to do it?
You don't need to download data multiple times. You just have to split whole data with groupby and create variables dynamically with locals():
stocks = ['TSLA', 'MSFT', 'NIO', 'AAPL', 'AMD', 'ADBE', 'ALGN', 'AMZN',
'AMGN', 'AEP', 'ADI', 'ANSS', 'AMAT', 'ASML', 'TEAM', 'ADSK']
data = yfinance.download(stocks, start='2015-01-01', end='2021-09-12')
for stock, df in data.groupby(level=1, axis=1):
locals()[stock] = df.droplevel(level=1, axis=1)
df.to_csv(f'{stock}.csv')
Output:
>>> TSLA
Adj Close Close High Low Open Volume
Date
2014-12-31 44.481998 44.481998 45.136002 44.450001 44.618000 11487500
2015-01-02 43.862000 43.862000 44.650002 42.652000 44.574001 23822000
2015-01-05 42.018002 42.018002 43.299999 41.431999 42.910000 26842500
2015-01-06 42.256001 42.256001 42.840000 40.841999 42.012001 31309500
2015-01-07 42.189999 42.189999 42.956001 41.956001 42.669998 14842000
... ... ... ... ... ... ...
2021-09-03 733.570007 733.570007 734.000000 724.200012 732.250000 15246100
2021-09-07 752.919983 752.919983 760.200012 739.260010 740.000000 20039800
2021-09-08 753.869995 753.869995 764.450012 740.770020 761.580017 18793000
2021-09-09 754.859985 754.859985 762.099976 751.630005 753.409973 14077700
2021-09-10 736.270020 736.270020 762.609985 734.520020 759.599976 15114300
[1686 rows x 6 columns]
>>> ANSS
Adj Close Close High Low Open Volume
Date
2014-12-31 82.000000 82.000000 83.480003 81.910004 83.080002 304600
2015-01-02 81.639999 81.639999 82.629997 81.019997 82.089996 282600
2015-01-05 80.860001 80.860001 82.070000 80.779999 81.290001 321500
2015-01-06 79.260002 79.260002 81.139999 78.760002 81.000000 344300
2015-01-07 79.709999 79.709999 80.900002 78.959999 79.919998 233300
... ... ... ... ... ... ...
2021-09-03 368.380005 368.380005 371.570007 366.079987 366.079987 293000
2021-09-07 372.070007 372.070007 372.410004 364.950012 369.609985 249500
2021-09-08 372.529999 372.529999 375.820007 369.880005 371.079987 325800
2021-09-09 371.970001 371.970001 375.799988 371.320007 372.519989 194900
2021-09-10 373.609985 373.609985 377.260010 372.470001 374.540009 278800
[1686 rows x 6 columns]
You can create global or local variable like
globals()["TSLA"] = "some value"
print(TSLA)
locals()["TSLA"] = "some value"
print(TSLA)
but frankly it is waste of time. It is much more useful to keep it as dictionary.
With dictionary you can use for-loop to run some code on all dataframes.
You can also seletect dataframes by name. etc.
Examples:
df_max = {}
for name, df in df_.items():
df_max[name] = df.max()
name = input("What to display: ")
df_[name].plot()
I am scraping data from YAHOO and trying to pull a certain value from its data frame, using a value that is on another file.
I have managed to scrape the data and show it as a data frame. the thing is I am trying to extract a certain value from the data using another df.
this is the csv i got
df_earnings=pd.read_excel(r"C:Earnings to Update.xlsx",index_col=2)
stock_symbols = df_earnings.index
output:
Date E Time Company Name
Stock Symbol
CALM 2019-04-01 Before The Open Cal-Maine Foods
CTRA 2019-04-01 Before The Open Contura Energy
NVGS 2019-04-01 Before The Open Navigator Holdings
ANGO 2019-04-02 Before The Open AngioDynamics
LW 2019-04-02 Before The Open Lamb Weston`
then I download the csv for each stock with the data from yahoo finance:
driver.get(f'https://finance.yahoo.com/quote/{stock_symbol}/history?period1=0&period2=2597263000&interval=1d&filter=history&frequency=1d')
output:
Open High Low ... Adj Close Volume Stock Name
Date ...
1996-12-12 1.81250 1.8125 1.68750 ... 0.743409 1984400 CALM
1996-12-13 1.71875 1.8125 1.65625 ... 0.777510 996800 CALM
1996-12-16 1.81250 1.8125 1.71875 ... 0.750229 122000 CALM
1996-12-17 1.75000 1.8125 1.75000 ... 0.774094 239200 CALM
1996-12-18 1.81250 1.8125 1.75000 ... 0.791151 216400 CALM
my problem is here I don't know how to find the date form my data frame and extract it from the downloaded file.
now I don't want to insert a manual date like this :
df = pd.DataFrame.from_csv(file_path)
df['Stock Name'] = stock_symbol
print(df.head())
df = df.reset_index()
print(df.loc[df['Date'] == '2019-04-01'])
output:
Date Open High ... Adj Close Volume Stock Name
5610 2019-04-01 46.700001 47.0 ... 42.987827 846900 CALM
I want a condition that will run my data frame for each stock and pull the date needed
print(df.loc[df['Date'] == the date that is next to the symbol that i just downloaded the file for])
I suppose you could make use of a variable to hold the date.
for sy in stock_symbols:
# The value from the 'Date' column in df_earnings
dt = df_earnings.loc[df_earnings.index == sy, 'Date'][sy]
# From the second block of your code relating to 'manual' date
df = pd.DataFrame.from_csv(file_path)
df['Stock Name'] = sy
df = df.reset_index()
print(df.loc[df['Date'] == dt])
I have time series data I want to generate the mean for each month, for each column. I have successfully done so, but by creating a list for each column - which wouldn't be feasible for thousands of columns.
How can I adapt my code to auto-populate the column names and values into a dataframe with thousands of columns?
For context, this data has 20 observations per hour for 12 months.
Original data:
timestamp 56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
2016-12-31 23:55:00 117.9673 17876.27 39.10074 9302.815 49.23963
2017-01-01 00:00:00 118.1080 17497.48 39.10759 9322.773 48.97919
2017-01-01 00:05:00 117.7809 17967.33 39.11348 9348.223 48.94284
Output:
56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
0 106.734147 16518.428734 16518.428734 7630.187992 45.992215
1 115.099825 18222.911023 18222.911023 9954.252911 47.334477
2 111.555504 19090.607211 19090.607211 9283.845649 48.939581
3 102.408996 18399.719852 18399.719852 7778.897037 48.130057
4 118.371951 20245.378742 20245.378742 9024.424210 64.796939
5 127.580516 21859.212675 21859.212675 9595.477455 70.952311
6 134.159082 22349.853561 22349.853561 10305.252112 75.195480
7 137.990638 21122.233427 21122.233427 10024.709142 74.755469
8 144.958318 18633.290818 18633.290818 11193.381098 66.776627
9 122.406489 20258.135923 20258.135923 10504.604420 61.793355
10 104.817850 18762.070668 18762.070668 9361.052983 51.802615
11 106.589672 20049.809554 20049.809554 9158.685383 51.611633
Successful code:
#separate data into months
v = list(range(1,13))
data_month = []
for i in v:
data_month.append(data[(data.index.month==i)])
# average per month for each sensor
mean_56TI1164 = []
mean_56FI1281 = []
mean_56TI1281 = []
mean_52FC1043 = []
mean_57TI1501 = []
for i in range(0,12):
mean_56TI1164.append(data_month[i]['56TI1164'].mean())
mean_56FI1281.append(data_month[i]['56FI1281'].mean())
mean_56TI1281.append(data_month[i]['56FI1281'].mean())
mean_52FC1043.append(data_month[i]['52FC1043'].mean())
mean_57TI1501.append(data_month[i]['57TI1501'].mean())
mean_df = {'56TI1164': mean_56TI1164, '56FI1281': mean_56FI1281, '56TI1281': mean_56TI1281, '52FC1043': mean_52FC1043, '57TI1501': mean_57TI1501}
mean_df = pd.DataFrame(mean_df, columns= ['56TI1164', '56FI1281', '56TI1281', '52FC1043', '57TI1501'])
mean_df
Unsuccessful attempt to condense:
col = list(data.columns)
mean_df = pd.DataFrame()
for i in range(0,12):
for j in col:
mean_df[j].append(data_month[i][j].mean())
mean_df
As suggested by G. Anderson, you can use groupby as in this example:
import pandas as pd
import io
csv="""timestamp 56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
2016-12-31 23:55:00 117.9673 17876.27 39.10074 9302.815 49.23963
2017-01-01 00:00:00 118.1080 17497.48 39.10759 9322.773 48.97919
2017-01-01 00:05:00 117.7809 17967.33 39.11348 9348.223 48.94284
2018-01-01 00:05:00 120.0000 17967.33 39.11348 9348.223 48.94284
2018-01-01 00:05:00 124.0000 17967.33 39.11348 9348.223 48.94284"""
# The following lines read your data into a pandas dataframe;
# it may help if your data comes in the form you wrote in the question
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
data = pd.read_csv(io.StringIO(csv), sep='\s+(?!\d\d:\d\d:\d\d)', \
date_parser=dateparse, index_col=0, engine='python')
# Here is where your data is resampled by month and mean is calculated
data.groupby(pd.Grouper(freq='M')).mean()
# If you have missing months, use this instead:
#data.groupby(pd.Grouper(freq='M')).mean().dropna()
Result of data.groupby(pd.Grouper(freq='M')).mean().dropna() will be:
56TI1164 56FI1281 56TI1281 52FC1043 57TI1501
timestamp
2016-12-31 117.96730 17876.270 39.100740 9302.815 49.239630
2017-01-31 117.94445 17732.405 39.110535 9335.498 48.961015
2018-01-31 122.00000 17967.330 39.113480 9348.223 48.942840
Please note that I used data.groupby(pd.Grouper(freq='M')).mean().dropna() to get rid of NaN for the missing months (I added some data for January 2018 skipping what's in between).
Also note that the convoluted read_csv uses a regular expression as a separator: \s+ means one or more whitespace characters, while (?!\d\d:\d\d:\d\d) means "skip this whitespace if followed by something like 23:55:00".
Last engine='python' avoids warnings when read_csv() is used with regular expression
Looking for some help on a small project. I am trying to learn Python and I'm totally lost on a problem. Please let me explain.
I have a csv file that contains 'Apple share prices', so far i can import into Python using the csv module, however, I need to analyse the data and generate monthly averages and determine the best and worst 6 months. My csv columns are Date, Price.
Help is much appreciated.
"Date","Open","High","Low","Close","Volume","Adj Close"
"2012-11-14",660.66,662.18,650.5,652.55,1668400,652.55
"2012-11-13",663,667.6,658.23,659.05,1594200,659.05
"2012-11-12",663.75,669.8,660.87,665.9,1405900,665.9
"2012-11-09",654.65,668.34,650.3,663.03,3114100,663.03
"2012-11-08",670.2,671.49,651.23,652.29,2597000,652.29
"2012-11-07",675,678.23,666.49,667.12,2232300,667.12
"2012-11-06",685.48,686.5,677.55,681.72,1582800,681.72
"2012-11-05",684.5,686.86,675.56,682.96,1635900,682.96
"2012-11-02",694.79,695.55,687.37,687.92,2324400,687.92
"2012-11-01",679.5,690.9,678.72,687.59,2050100,687.59
"2012-10-31",679.86,681,675,680.3,1537000,680.3
"2012-10-26",676.5,683.03,671.2,675.15,1950800,675.15
"2012-10-25",680,682,673.51,677.76,2401100,677.76
"2012-10-24",686.8,687,675.27,677.3,2496500,677.3
etc...
With pandas this would be
In [28]: df = pd.read_csv('my_data.csv', parse_dates=True, index_col=0, sep=',')
In [29]: df
Out[29]:
Open High Low Close Volume Adj Close
Date
2012-11-14 660.66 662.18 650.50 652.55 1668400 652.55
2012-11-13 663.00 667.60 658.23 659.05 1594200 659.05
2012-11-12 663.75 669.80 660.87 665.90 1405900 665.90
2012-11-09 654.65 668.34 650.30 663.03 3114100 663.03
2012-11-08 670.20 671.49 651.23 652.29 2597000 652.29
2012-11-07 675.00 678.23 666.49 667.12 2232300 667.12
2012-11-06 685.48 686.50 677.55 681.72 1582800 681.72
2012-11-05 684.50 686.86 675.56 682.96 1635900 682.96
2012-11-02 694.79 695.55 687.37 687.92 2324400 687.92
2012-11-01 679.50 690.90 678.72 687.59 2050100 687.59
2012-10-31 679.86 681.00 675.00 680.30 1537000 680.30
2012-10-26 676.50 683.03 671.20 675.15 1950800 675.15
2012-10-25 680.00 682.00 673.51 677.76 2401100 677.76
2012-10-24 686.80 687.00 675.27 677.30 2496500 677.30
In [30]: monthly = df.resample('1M')
In [31]: monthly
Out[30]:
Open High Low Close Volume Adj Close
Date
2012-10-31 680.790 683.2575 673.745 677.6275 2096350 677.6275
2012-11-30 673.153 677.7450 665.682 670.0130 2020510 670.0130
You can than sort for the column you want
In [33]: monthly.sort('Close')
Out[33]:
Open High Low Close Volume Adj Close
Date
2012-11-30 673.153 677.7450 665.682 670.0130 2020510 670.0130
2012-10-31 680.790 683.2575 673.745 677.6275 2096350 677.6275
You can even fetch the data from Yahoo finance:
In [37]: from pandas.io import data as pddata
In [40]: df = pddata.DataReader('AAPL', data_source='yahoo', start='2012-01-01')
In [41]: df.resample('1M').sort('Close')
Out[44]:
Open High Low Close Volume Adj Close
Date
2012-01-31 428.760000 431.008500 425.810500 428.578000 12249740.000000 424.804500
2012-02-29 494.803000 500.849000 491.437500 497.571000 20300990.000000 493.191000
2012-11-30 560.365385 566.118462 548.523846 555.789231 24861884.615385 554.970769
2012-05-31 565.785000 572.141364 558.397273 564.673182 18029781.818182 559.702273
2012-06-30 574.660952 578.889048 569.213333 574.562381 13360247.619048 569.504762
2012-03-31 576.858182 582.064545 570.245909 577.507727 25299250.000000 572.424545
2012-07-31 599.610000 604.920952 594.680476 601.068095 15152466.666667 595.776667
2012-04-30 609.607500 615.487500 598.650000 606.003000 27855340.000000 600.668500
2012-10-31 638.667143 643.650476 628.213810 634.714286 20651071.428571 631.828571
2012-08-31 641.527826 646.655217 637.138261 642.696087 12851252.173913 639.090870
2012-09-30 682.118421 687.007895 676.095263 681.568421 17291363.157895 678.470526
After you have read the items and saved the [month, mean_price] pairs in a list, you can sort the list:
import operator
values_list.sort(key=operator.itemgetter(1))
This will sort the values by price. To get the top n values:
print values_list[-n:]
Or the bottom n:
print values_list[:n]