I know that with pandas this is how you normally get daily stock price quotes. But I'm wondering if it's possible to get monthly or weekly quotes. Is there a parameter I can pass to get monthly quotes?
from pandas.io.data import DataReader
from datetime import datetime
ibm = DataReader('IBM', 'yahoo', datetime(2000,1,1), datetime(2012,1,1))
print(ibm['Adj Close'])
Monthly closing prices from Yahoo! Finance...
import pandas_datareader.data as web
data = web.get_data_yahoo('IBM','01/01/2015',interval='m')
where you can replace the interval input as required ('d', 'w', 'm', etc.).
Using the yfinance library (which pulls data from Yahoo Finance), you can get stock prices at other frequencies with the "interval" option, using "1mo" instead of "m", as shown:
# Library
import yfinance as yf
from datetime import datetime
# Load monthly stock prices
df = yf.download("IBM", start=datetime(2000,1,1), end=datetime(2012,1,1), interval='1mo')
print(df)
The other possible interval options are: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo.
Try this:
In [175]: from pandas_datareader.data import DataReader
In [176]: ibm = DataReader('IBM', 'yahoo', '2001-01-01', '2012-01-01')
UPDATE: show average for Adj Close only (month start)
In [12]: ibm.groupby(pd.Grouper(freq='MS'))['Adj Close'].mean()
Out[12]:
Date
2001-01-01 79.430605
2001-02-01 86.625519
2001-03-01 75.938913
2001-04-01 81.134375
2001-05-01 90.460754
2001-06-01 89.705042
2001-07-01 83.350254
2001-08-01 82.100543
2001-09-01 74.335789
2001-10-01 79.937451
...
2011-03-01 141.628553
2011-04-01 146.530774
2011-05-01 150.298053
2011-06-01 146.844772
2011-07-01 158.716834
2011-08-01 150.690990
2011-09-01 151.627555
2011-10-01 162.365699
2011-11-01 164.596963
2011-12-01 167.924676
Freq: MS, Name: Adj Close, dtype: float64
show average for Adj Close only (month end)
In [13]: ibm.groupby(pd.Grouper(freq='M'))['Adj Close'].mean()
Out[13]:
Date
2001-01-31 79.430605
2001-02-28 86.625519
2001-03-31 75.938913
2001-04-30 81.134375
2001-05-31 90.460754
2001-06-30 89.705042
2001-07-31 83.350254
2001-08-31 82.100543
2001-09-30 74.335789
2001-10-31 79.937451
...
2011-03-31 141.628553
2011-04-30 146.530774
2011-05-31 150.298053
2011-06-30 146.844772
2011-07-31 158.716834
2011-08-31 150.690990
2011-09-30 151.627555
2011-10-31 162.365699
2011-11-30 164.596963
2011-12-31 167.924676
Freq: M, Name: Adj Close, dtype: float64
monthly averages (all columns):
In [179]: ibm.groupby(pd.Grouper(freq='M')).mean()
Out[179]:
Open High Low Close Volume Adj Close
Date
2001-01-31 100.767857 103.553571 99.428333 101.870357 9474409 79.430605
2001-02-28 111.193160 113.304210 108.967368 110.998422 8233626 86.625519
2001-03-31 97.366364 99.423637 95.252272 97.281364 11570454 75.938913
2001-04-30 103.990500 106.112500 102.229501 103.936999 11310545 81.134375
2001-05-31 115.781363 117.104091 114.349091 115.776364 7243463 90.460754
2001-06-30 114.689524 116.199048 113.739523 114.777618 6806176 89.705042
2001-07-31 106.717143 108.028095 105.332857 106.646666 7667447 83.350254
2001-08-31 105.093912 106.196521 103.856522 104.939999 6234847 82.100543
2001-09-30 95.138667 96.740000 93.471334 94.987333 12620833 74.335789
2001-10-31 101.400870 103.140000 100.327827 102.145217 9754413 79.937451
2001-11-30 113.449047 114.875715 112.510952 113.938095 6435061 89.256046
2001-12-31 120.651001 122.076000 119.790500 121.087999 6669690 94.878736
2002-01-31 116.483334 117.509524 114.613334 115.994762 9217280 90.887920
2002-02-28 103.194210 104.389474 101.646316 102.961579 9069526 80.764672
2002-03-31 105.246500 106.764499 104.312999 105.478499 7563425 82.756873
... ... ... ... ... ... ...
2010-10-31 138.956188 140.259048 138.427142 139.631905 6537366 122.241844
2010-11-30 144.281429 145.164762 143.385241 144.439524 4956985 126.878319
2010-12-31 145.155909 145.959545 144.567273 145.251819 4245127 127.726929
2011-01-31 152.595000 153.950499 151.861000 153.181501 5941580 134.699880
2011-02-28 163.217895 164.089474 162.510002 163.339473 4687763 144.050847
2011-03-31 160.433912 161.745652 159.154349 160.425651 5639752 141.628553
2011-04-30 165.437501 166.587500 164.760500 165.978500 5038475 146.530774
2011-05-31 169.657144 170.679046 168.442858 169.632857 5276390 150.298053
2011-06-30 165.450455 166.559093 164.691819 165.593635 4792836 146.844772
2011-07-31 178.124998 179.866502 177.574998 178.981500 5679660 158.716834
2011-08-31 169.734350 171.690435 166.749567 169.360434 8480613 150.690990
2011-09-30 169.752858 172.034761 168.109999 170.245714 6566428 151.627555
2011-10-31 181.529525 183.597145 180.172379 182.302381 6883985 162.365699
2011-11-30 184.536668 185.950952 182.780477 184.244287 4619719 164.596963
2011-12-31 188.151428 189.373809 186.421905 187.789047 4925547 167.924676
[132 rows x 6 columns]
weekly averages (all columns):
In [180]: ibm.groupby(pd.Grouper(freq='W')).mean()
Out[180]:
Open High Low Close Volume Adj Close
Date
2001-01-07 89.234375 94.234375 87.890625 91.656250 11060200 71.466436
2001-01-14 93.412500 95.062500 91.662500 93.412500 7470200 72.835824
2001-01-21 100.250000 103.921875 99.218750 102.250000 13851500 79.726621
2001-01-28 109.575000 111.537500 108.675000 110.600000 8056720 86.237303
2001-02-04 113.680000 115.465999 111.734000 113.582001 6538080 88.562436
2001-02-11 113.194002 115.815999 111.639999 113.884001 7269320 88.858876
2001-02-18 113.960002 116.731999 113.238000 115.106000 7225420 89.853021
2001-02-25 109.525002 111.375000 105.424999 107.977501 10722700 84.288436
2001-03-04 103.390001 106.052002 100.386000 103.228001 11982540 80.580924
2001-03-11 105.735999 106.920000 103.364002 104.844002 9226900 81.842391
2001-03-18 95.660001 97.502002 93.185997 94.899998 13863740 74.079992
2001-03-25 90.734000 92.484000 88.598000 90.518001 11382280 70.659356
2001-04-01 95.622000 97.748000 94.274000 96.106001 10467580 75.021411
2001-04-08 95.259999 97.360001 93.132001 94.642000 12312580 73.878595
2001-04-15 98.350000 99.520000 95.327502 97.170000 10218625 75.851980
... ... ... ... ... ... ...
2011-09-25 170.678003 173.695996 169.401996 171.766000 6358100 152.981582
2011-10-02 176.290002 178.850000 174.729999 176.762000 7373680 157.431216
2011-10-09 175.920001 179.200003 174.379999 177.792001 7623560 158.348576
2011-10-16 185.366000 187.732001 184.977997 187.017999 5244180 166.565614
2011-10-23 180.926001 182.052002 178.815997 180.351999 9359200 160.628611
2011-10-30 183.094003 184.742001 181.623996 183.582001 5743800 163.505379
2011-11-06 184.508002 186.067999 183.432004 184.716003 4583780 164.515366
2011-11-13 185.350000 186.690002 183.685999 185.508005 4180620 165.750791
2011-11-20 187.600003 189.101999 185.368002 186.738000 5104420 166.984809
2011-11-27 181.067497 181.997501 178.717499 179.449997 4089350 160.467733
2011-12-04 185.246002 187.182001 184.388000 186.052002 5168720 166.371376
2011-12-11 191.841998 194.141998 191.090002 192.794000 4828580 172.400204
2011-12-18 191.085999 191.537998 187.732001 188.619998 6037220 168.667729
2011-12-25 183.810001 184.634003 181.787997 183.678000 5433360 164.248496
2012-01-01 185.140003 185.989998 183.897499 184.750000 3029925 165.207100
[574 rows x 6 columns]
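`DataFrame.resample` performs the same calendar grouping as the groupby calls above, and it also lets each column use its own aggregation. The sketch below builds monthly and weekly OHLC bars (first Open, highest High, lowest Low, last Close, total Volume) instead of averages. The daily frame is made-up sample data, and the column names are assumed to match Yahoo's.

```python
import numpy as np
import pandas as pd

# Made-up daily prices standing in for the Yahoo download (hypothetical data)
dates = pd.bdate_range('2011-01-03', '2011-03-31')
close = np.linspace(150.0, 160.0, len(dates))
daily = pd.DataFrame({
    'Open': close - 0.2,
    'High': close + 0.5,
    'Low': close - 0.7,
    'Close': close,
    'Volume': np.arange(len(dates)) * 1000 + 1_000_000,
}, index=dates)

# One aggregation rule per column turns daily rows into OHLC bars
ohlc_rules = {'Open': 'first', 'High': 'max', 'Low': 'min',
              'Close': 'last', 'Volume': 'sum'}
monthly = daily.resample('MS').agg(ohlc_rules)  # bars labelled by month start
weekly = daily.resample('W').agg(ohlc_rules)    # weekly bars ending on Sunday
print(monthly)
```

For plain monthly averages, `ibm.resample('MS').mean()` should give the same result as the groupby version.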
Get it from Quandl:
import pandas as pd
import quandl
quandl.ApiConfig.api_key = 'xxxxxxxxxxxx' # Optional
quandl.ApiConfig.api_version = '2015-04-09' # Optional
ibm = quandl.get("WIKI/IBM", start_date="2000-01-01", end_date="2012-01-01", collapse="monthly", returns="pandas")
I load data from Yahoo Finance using the motor_daily function. It takes a list of tickers and returns the data.
Here are the used libs:
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
Here is the function definition:
def motor_daily(ticker_file):
    tickers_list = ticker_file #SP100
    stocks = yf.download(tickers_list, start = start, end = tomorrow) #YYYY-MM-DD
    company_name = []
    ticker_code = []
    for ticker in tickers_list:
        loaded_ticker = yf.Ticker(ticker)
        tickers = ticker
        ticker_code.append(tickers)
    finance = pd.DataFrame(ticker_code)
    finance["Ticker"] = pd.DataFrame(ticker_code)
    finance["Ticker_start"] = finance["Ticker"].str.split('-').str[0]
    finance= finance.drop(columns=[0])
    stocks_close = stocks.Close
    stocks_close = stocks_close.reset_index()
    return stocks_close
def ticker_data(list):
    data = []
    for ticks in list:
        data.append(motor_daily(ticks))
    return data
The above function loads closing prices for each ticker / stock name in the list (therefore the loop) and stores this in data.
list_of_lists includes:
[['VOW3.DE', 'BMW.DE', 'BEI.DE', 'DPW.DE', 'FME.DE'],
['ISS.CO', 'LUN.CO', 'CARL-B.CO', 'TRYG.CO', 'SIM.CO']]
Output of print(ticker_data(list_of_list))
[ Date BEI.DE BMW.DE DPW.DE FME.DE VOW3.DE
0 2021-03-10 86.860001 81.339996 43.650002 60.840000 196.020004
1 2021-03-11 86.139999 78.519997 44.549999 61.340000 192.039993
2 2021-03-12 87.080002 77.480003 45.060001 60.939999 190.220001
3 2021-03-15 86.959999 77.800003 44.919998 60.759998 194.779999
4 2021-03-16 87.680000 80.500000 45.580002 61.259998 207.850006
5 2021-03-17 88.260002 85.459999 45.419998 60.779999 230.800003,
Date CARL-B.CO ISS.CO LUN.CO SIM.CO TRYG.CO
0 2021-03-10 1012.0 122.599998 243.600006 768.0 135.399994
1 2021-03-11 1009.0 120.300003 235.300003 780.0 143.500000
2 2021-03-12 1006.0 121.150002 237.000000 772.5 143.699997
3 2021-03-15 1006.5 124.250000 236.300003 783.0 145.100006
4 2021-03-16 983.0 125.550003 236.100006 795.5 147.399994
5 2021-03-17 982.0 121.949997 230.300003 778.0 143.899994]
When I try to convert the output to a dataframe using:
df = pd.DataFrame(ticker_data(list_of_list))
I get:
ValueError: Must pass 2-d input. shape=(2, 6, 6)
I cannot convert this to a pandas dataframe, how should I go about doing this?
Your motor_daily has a bunch of unused elements. Also, I had to define the start and end times.
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
def motor_daily(ticker_list):
    start = pd.Timestamp('now').normalize() - pd.offsets.Day(7)
    end = pd.Timestamp('now').normalize() + pd.offsets.BusinessDay(0)
    return yf.download(ticker_list, start=start, end=end).Close
list_of_lists = [
['VOW3.DE', 'BMW.DE', 'BEI.DE', 'DPW.DE', 'FME.DE'],
['ISS.CO', 'LUN.CO', 'CARL-B.CO', 'TRYG.CO', 'SIM.CO']
]
df = pd.concat(map(motor_daily, list_of_lists), axis=1)
# I transposed for prettier printing
df.T
Date 2021-03-10 2021-03-11 2021-03-12 2021-03-15 2021-03-16
BEI.DE 86.860001 86.139999 87.080002 86.959999 87.680000
BMW.DE 81.339996 78.519997 77.480003 77.800003 80.500000
DPW.DE 43.650002 44.549999 45.060001 44.919998 45.580002
FME.DE 60.840000 61.340000 60.939999 60.759998 61.259998
VOW3.DE 196.020004 192.039993 190.220001 194.779999 207.850006
CARL-B.CO 1012.000000 1009.000000 1006.000000 1006.500000 983.000000
ISS.CO 122.599998 120.300003 121.150002 124.250000 125.550003
LUN.CO 243.600006 235.300003 237.000000 236.300003 236.100006
SIM.CO 768.000000 780.000000 772.500000 783.000000 795.500000
TRYG.CO 135.399994 143.500000 143.699997 145.100006 147.399994
You can iterate through ticker_data(list_of_list) and make multiple dataframes:
lol = [['VOW3.DE', 'BMW.DE', 'BEI.DE', 'DPW.DE', 'FME.DE'],
['ISS.CO', 'LUN.CO', 'CARL-B.CO', 'TRYG.CO', 'SIM.CO']]
res = ticker_data(lol)
dataframes = [pd.DataFrame(lst) for lst in res]
print(dataframes[0])
#
Date BEI.DE BMW.DE DPW.DE FME.DE VOW3.DE
0 1996-11-08 NaN 18.171000 NaN NaN NaN
1 1996-11-11 NaN 18.122000 NaN NaN NaN
2 1996-11-12 NaN 18.259001 NaN NaN NaN
3 1996-11-13 NaN 18.230000 NaN NaN NaN
4 1996-11-14 NaN 18.289000 NaN NaN NaN
... ... ... ... ... ... ...
6241 2021-03-11 86.139999 78.519997 44.549999 61.340000 192.039993
6242 2021-03-12 87.080002 77.480003 45.060001 60.939999 190.220001
6243 2021-03-15 86.959999 77.800003 44.919998 60.759998 194.779999
6244 2021-03-16 87.680000 80.500000 45.580002 61.259998 207.850006
6245 2021-03-17 88.260002 85.459999 45.419998 60.779999 230.800003
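If you keep `motor_daily` as in the question (returning a `Date` column after `reset_index`), the list it produces can still become one DataFrame: set `Date` as the index of each frame and concatenate side by side, so rows line up by date rather than by position. The two small frames below are hand-typed excerpts of the printed output, standing in for `ticker_data(list_of_list)`.

```python
import pandas as pd

# Hand-typed excerpt of the printed output, standing in for ticker_data(list_of_list)
frames = [
    pd.DataFrame({'Date': pd.to_datetime(['2021-03-10', '2021-03-11']),
                  'BEI.DE': [86.860001, 86.139999],
                  'BMW.DE': [81.339996, 78.519997]}),
    pd.DataFrame({'Date': pd.to_datetime(['2021-03-10', '2021-03-11']),
                  'ISS.CO': [122.599998, 120.300003],
                  'LUN.CO': [243.600006, 235.300003]}),
]

# Index each frame by Date so concat aligns rows on dates, then place the columns side by side
combined = pd.concat([f.set_index('Date') for f in frames], axis=1)
print(combined)
```

If the two exchanges trade on different days, a date present for only one market gets NaN in the other market's columns.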
I want to scrape some data from a website, so I wrote code that creates a list containing all records. Then I want to extract some elements from those records to create a dataframe.
However, some information is missing from the dataframe. The all_records list has information from 2012 to 2019, but the dataframe only has 2018 and 2019. I tried different ways to resolve the problem and finally found that if I do not use the zip function, the problem does not occur. Why does this happen, and if I do not use zip, what solution can I use?
import requests
import pandas as pd
records = []
tickers = ['AAL']
url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'
# scrape all data and append to a list - all_records
for s in tickers:
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    for d in requests.get(url_metrics.format(s,s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)
    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' in d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' in d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' in d['id'])
    diluted_eps_growth = next(d for d in all_records if 'EPS Growth (diluted)' in d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' in d['id'])
    # extract values from all_records and create the dataframe
    for (k1, v1), (_, v2), (_, v3), (_, v4), (_, v5) in zip(gross_profit_growth.items(), operating_income_growth.items(), net_income_growth.items(), diluted_eps_growth.items(), operating_cash_flow_growth.items()):
        if k1 in ('id'):
            continue
        records.append({
            'symbol' : s,
            'date' : k1,
            'gross_profit_growth%': v1,
            'operating_income_growth%': v2,
            'net_income_growth%': v3,
            'diluted_eps_growth%' : v4,
            'operating_cash_flow_growth%' : v5
        })
df = pd.DataFrame(records)
df.head(50)
The result is incorrect. It only has 2018 and 2019 data. It should have data from 2012 to 2019.
symbol date gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth%
0 AAL 2019-12-31 0.0405 -0.1539 -0.0112 0.2508 0.0798
1 AAL 2018-12-31 -0.0876 -0.2463 0.0 -0.2231 -0.2553
My expected result:
symbol date gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth%
0 AAL 31/12/2019 0.0405 0.154 0.1941 0.2508 0.0798
1 AAL 31/12/2018 -0.0876 -0.3723 0.1014 -0.2231 -0.2553
2 AAL 31/12/2017 -0.0165 -0.1638 -0.5039 -0.1892 -0.2728
3 AAL 31/12/2016 -0.079 -0.1844 -0.6604 -0.5655 0.044
4 AAL 31/12/2015 0.1983 0.4601 1.6405 1.8168 1.0289
5 AAL 31/12/2014 0.7305 2.0372 2.5714 1.2308 3.563
6 AAL 31/12/2013 0.3575 8.4527 0.0224 nan -0.4747
7 AAL 31/12/2012 0.1688 1.1427 0.052 nan 0.7295
8 AAL 31/12/2011 0.0588 -4.3669 -3.2017 nan -0.4013
9 AAL 31/12/2010 0.3413 1.3068 0.6792 nan 0.3344
import requests
import pandas as pd
records = []
tickers = ['A', 'AAL', 'AAPL']
url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'
for s in tickers:
    print('Getting data for ticker: {}'.format(s))
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    for d in requests.get(url_metrics.format(s,s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)
    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' == d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' == d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' == d['id'])
    eps_growth_diluted = next(d for d in all_records if 'EPS Growth (diluted)' == d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' == d['id'])
    del gross_profit_growth['id']
    del operating_income_growth['id']
    del net_income_growth['id']
    del eps_growth_diluted['id']
    del operating_cash_flow_growth['id']
    d1 = pd.DataFrame({'date': gross_profit_growth.keys(), 'gross_profit_growth%': gross_profit_growth.values()}).set_index('date')
    d2 = pd.DataFrame({'date': operating_income_growth.keys(), 'operating_income_growth%': operating_income_growth.values()}).set_index('date')
    d3 = pd.DataFrame({'date': net_income_growth.keys(), 'net_income_growth%': net_income_growth.values()}).set_index('date')
    d4 = pd.DataFrame({'date': eps_growth_diluted.keys(), 'diluted_eps_growth%': eps_growth_diluted.values()}).set_index('date')
    d5 = pd.DataFrame({'date': operating_cash_flow_growth.keys(), 'operating_cash_flow_growth%': operating_cash_flow_growth.values()}).set_index('date')
    d = pd.concat([d1, d2, d3, d4, d5], axis=1)
    d['symbol'] = s
    records.append(d)
df = pd.concat(records)
print(df)
Prints:
gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth% symbol
2019-10-31 0.0466 0.0409 2.3892 2.4742 -0.0607 A
2018-10-31 0.1171 0.1202 -0.538 -0.5381 0.2227 A
2017-10-31 0.0919 0.3122 0.4805 0.5 0.1211 A
2016-10-31 0.0764 0.1782 0.1521 0.1765 0.5488 A
2015-10-31 0.0329 0.2458 -0.2696 -0.1905 -0.2996 A
2014-10-31 0.0362 0.0855 -0.252 -0.3 -0.3655 A
2013-10-31 -0.4709 -0.655 -0.3634 -0.3578 -0.0619 A
2012-10-31 0.0213 0.0448 0.1393 0.1474 -0.0254 A
2011-10-31 0.2044 0.8922 0.4795 0.6102 0.7549 A
2019-12-31 0.0405 0.154 0.1941 0.2508 0.0798 AAL
2018-12-31 -0.0876 -0.3723 0.1014 -0.2231 -0.2553 AAL
2017-12-31 -0.0165 -0.1638 -0.5039 -0.1892 -0.2728 AAL
2016-12-31 -0.079 -0.1844 -0.6604 -0.5655 0.044 AAL
2015-12-31 0.1983 0.4601 1.6405 1.8168 1.0289 AAL
2014-12-31 0.7305 2.0372 2.5714 1.2308 3.563 AAL
2013-12-31 0.3575 8.4527 0.0224 NaN -0.4747 AAL
2012-12-31 0.1688 1.1427 0.052 NaN 0.7295 AAL
2011-12-31 0.0588 -4.3669 -3.2017 NaN -0.4013 AAL
2010-12-31 0.3413 1.3068 0.6792 NaN 0.3344 AAL
2020-09-30 0.0667 0.0369 0.039 NaN 0.1626 AAPL
2019-09-30 -0.0338 -0.0983 -0.0718 -0.0017 -0.1039 AAPL
2018-09-30 0.1548 0.1557 0.2312 0.2932 0.2057 AAPL
2017-09-30 0.0466 0.022 0.0583 0.1083 -0.0303 AAPL
2016-09-30 -0.1 -0.1573 -0.1443 -0.0987 -0.185 AAPL
2015-09-30 0.3273 0.3567 0.3514 0.4295 0.3609 AAPL
2014-09-30 0.0969 0.0715 0.0668 0.1358 0.1127 AAPL
2013-09-30 -0.0635 -0.113 -0.1125 -0.0996 0.0553 AAPL
2012-09-30 0.567 0.6348 0.6099 0.595 0.3551 AAPL
2011-09-30 0.706 0.8379 0.8499 0.827 1.0182 AAPL
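A likely reason the `zip` version loses rows: `zip` stops as soon as its shortest input is exhausted, and it pairs items by position rather than by date, so if the five dicts have different numbers of keys (or keys in a different order), rows are dropped or mismatched. Separately, `'Gross Profit Growth' in d['id']` is a substring test, so `next` can pick up a differently named indicator that merely contains that text; the answer above avoids this by using `==`. Passing the dicts straight to `pd.DataFrame` aligns values by key instead of position. The two dicts below are made-up stand-ins for the scraped records.

```python
import pandas as pd

# Made-up stand-ins for two scraped indicator records (keys are dates)
gross = {'id': 'Gross Profit Growth', '2019-12-31': 0.0405,
         '2018-12-31': -0.0876, '2017-12-31': -0.0165}
eps = {'id': 'EPS Growth (diluted)', '2019-12-31': 0.2508, '2018-12-31': -0.2231}

# zip pairs items by position and stops at the shorter input, so 2017 is silently dropped
pairs = list(zip(gross.items(), eps.items()))
print(len(pairs))  # 3

# Building the frame from dicts aligns values on the date keys; missing dates become NaN
df = pd.DataFrame({
    'gross_profit_growth%': {k: v for k, v in gross.items() if k != 'id'},
    'diluted_eps_growth%': {k: v for k, v in eps.items() if k != 'id'},
})
df['symbol'] = 'AAL'
print(df)
```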
I'm using matplotlib to draw trend lines for stock data.
import pandas as pd
import matplotlib.pyplot as plt
A = pd.read_csv('daily/A.csv', index_col=[0])
print(A)
AAL = pd.read_csv('daily/AAL.csv', index_col=[0])
print(AAL)
A['Close'].plot()
AAL['Close'].plot()
plt.show()
The result is:
High Low Open Close Volume Adj Close
Date
1999-11-18 35.77 28.61 32.55 31.47 62546300.0 27.01
1999-11-19 30.76 28.48 30.71 28.88 15234100.0 24.79
1999-11-22 31.47 28.66 29.55 31.47 6577800.0 27.01
1999-11-23 31.21 28.61 30.40 28.61 5975600.0 24.56
1999-11-24 30.00 28.61 28.70 29.37 4843200.0 25.21
... ... ... ... ... ... ...
2020-06-24 89.08 86.32 89.08 86.56 1806600.0 86.38
2020-06-25 87.35 84.80 86.43 87.26 1350100.0 87.08
2020-06-26 87.56 85.52 87.23 85.90 2225800.0 85.72
2020-06-29 87.36 86.11 86.56 87.29 1302500.0 87.29
2020-06-30 88.88 87.24 87.33 88.37 1428931.0 88.37
[5186 rows x 6 columns]
High Low Open Close Volume Adj Close
Date
2005-09-27 21.40 19.10 21.05 19.30 961200.0 18.19
2005-09-28 20.53 19.20 19.30 20.50 5747900.0 19.33
2005-09-29 20.58 20.10 20.40 20.21 1078200.0 19.05
2005-09-30 21.05 20.18 20.26 21.01 3123300.0 19.81
2005-10-03 21.75 20.90 20.90 21.50 1057900.0 20.27
... ... ... ... ... ... ...
2020-06-24 13.90 12.83 13.59 13.04 140975500.0 13.04
2020-06-25 13.24 12.18 12.53 13.17 117383400.0 13.17
2020-06-26 13.29 12.13 13.20 12.38 108813000.0 12.38
2020-06-29 13.51 12.02 12.57 13.32 114650300.0 13.32
2020-06-30 13.48 12.88 13.10 13.07 68669742.0 13.07
[3715 rows x 6 columns]
Yes, the two stocks start on different dates, but the end date is the same.
So the plot I get looks like this:
[Image: stockplot]
This does not look like a normal plot.
Could anyone advise how to draw proper trend lines for the two stocks?
You could try making two separate plots with the same axis limits and then overlaying one on the other for comparison.
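Another option, under an assumption since the plot image is not available: without `parse_dates`, `read_csv` leaves the `Date` index as strings, and pandas then plots each series against its own row positions, so the two lines do not share a date axis. The sketch below parses the dates and aligns both closes on one index. The inline CSV text is a few rows copied from the printed output, standing in for `daily/A.csv` and `daily/AAL.csv`.

```python
import io
import pandas as pd

# A few rows copied from the printed output, standing in for daily/A.csv and daily/AAL.csv
csv_a = """Date,Close
1999-11-18,31.47
2020-06-29,87.29
2020-06-30,88.37
"""
csv_aal = """Date,Close
2005-09-27,19.30
2020-06-29,13.32
2020-06-30,13.07
"""

# parse_dates=True converts the index to a DatetimeIndex instead of plain strings
a = pd.read_csv(io.StringIO(csv_a), index_col=0, parse_dates=True)
aal = pd.read_csv(io.StringIO(csv_aal), index_col=0, parse_dates=True)

# Put both closes on one shared date index; dates before a stock's data starts become NaN
closes = pd.concat({'A': a['Close'], 'AAL': aal['Close']}, axis=1).sort_index()
print(closes)
```

With matplotlib installed, `closes.plot(secondary_y=['AAL'])` followed by `plt.show()` should then draw both lines on the same time axis, with AAL on a right-hand axis because its prices are much lower.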
I have a pandas data frame that looks like:
High Low ... Volume OpenInterest
2018-01-02 983.25 975.50 ... 8387 67556
2018-01-03 986.75 981.00 ... 7447 67525
2018-01-04 985.25 977.00 ... 8725 67687
2018-01-05 990.75 984.00 ... 7948 67975
I calculate the Average True Range and save it into a series:
i = 0
TR_l = [0]
while i < (df.shape[0]-1):
    #TR = max(df.loc[i + 1, 'High'], df.loc[i, 'Close']) - min(df.loc[i + 1, 'Low'], df.loc[i, 'Close'])
    TR = max(df['High'][i+1], df['Close'][i]) - min(df['Low'][i+1], df['Close'][i])
    TR_l.append(TR)
    i = i + 1
TR_s = pd.Series(TR_l)
ATR = pd.Series(TR_s.ewm(span=n, min_periods=n).mean(), name='ATR_' + str(n))
With n = 14 (an exponentially weighted span of 14 periods), ATR looks like:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 8.096064
14 7.968324
15 8.455205
16 9.046418
17 8.895405
18 9.088769
19 9.641879
20 9.516764
But when I do:
df = df.join(ATR)
The ATR column in df is all NaN. It's because the indexes are different between the data frame and ATR. Is there any way to add the ATR column into the data frame?
Consider using shift to avoid the while loop over rows and the list building. The example below uses Union Pacific (UNP) railroad stock data to demonstrate:
import pandas as pd
import pandas_datareader as pdr
stock_df = pdr.get_data_yahoo('UNP').loc['2019-01-01':'2019-03-29']
# SHIFT NEXT DAY'S DATA ONTO EACH ROW AND JOIN TO ORIGINAL DATA
stock_df = stock_df.join(stock_df.shift(-1), rsuffix='_future')
# CALCULATE TR DIFFERENCE BY ROW
stock_df['TR'] = stock_df.apply(lambda x: max(x['High_future'], x['Close']) - min(x['Low_future'], x['Close']), axis=1)
# CALCULATE EWM MEAN
n = 14
stock_df['ATR'] = stock_df['TR'].ewm(span=n, min_periods=n).mean()
Output
print(stock_df.head(20))
# High Low Open Close Volume Adj Close High_future Low_future Open_future Close_future Volume_future Adj Close_future TR ATR
# Date
# 2019-01-02 138.320007 134.770004 135.649994 137.779999 3606300.0 137.067413 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 5.610001 NaN
# 2019-01-03 136.750000 132.169998 136.039993 132.679993 5684500.0 131.993790 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 5.900009 NaN
# 2019-01-04 138.580002 134.520004 134.820007 137.789993 5649900.0 137.077362 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 2.970001 NaN
# 2019-01-07 139.229996 136.259995 137.330002 138.649994 4034200.0 137.932907 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 14.240005 NaN
# 2019-01-08 152.889999 149.039993 151.059998 150.750000 10558800.0 149.970337 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 2.449997 NaN
# 2019-01-09 151.059998 148.610001 150.289993 150.360001 4284600.0 149.582352 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 6.279999 NaN
# 2019-01-10 155.289993 149.009995 149.899994 154.660004 6444600.0 153.860123 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 1.940002 NaN
# 2019-01-11 155.029999 153.089996 153.639999 153.210007 3845200.0 152.417618 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 2.590012 NaN
# 2019-01-14 154.240005 151.649994 152.229996 153.889999 3507100.0 153.094101 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 2.619995 NaN
# 2019-01-15 154.360001 151.740005 153.789993 152.479996 4685100.0 151.691391 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 2.819992 NaN
# 2019-01-16 153.729996 150.910004 152.910004 151.970001 4053200.0 151.184021 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 3.990005 NaN
# 2019-01-17 154.919998 150.929993 151.110001 154.639999 4075400.0 153.840210 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 4.160004 NaN
# 2019-01-18 158.800003 155.009995 155.539993 158.339996 5003900.0 157.521072 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 3.929993 NaN
# 2019-01-22 157.199997 154.410004 156.929993 155.020004 6052900.0 154.218262 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 3.590012 4.011254
# 2019-01-23 156.020004 152.429993 155.449997 154.330002 4858000.0 153.531830 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 6.429993 4.376440
# 2019-01-24 160.759995 156.009995 160.039993 160.339996 9222400.0 159.510742 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 1.779999 3.991223
# 2019-01-25 162.000000 160.220001 161.460007 160.949997 7770700.0 160.117584 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 1.610001 3.643168
# 2019-01-28 160.789993 159.339996 160.000000 159.899994 3733800.0 159.073013 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 2.179993 3.432011
# 2019-01-29 160.929993 158.750000 160.039993 160.169998 3436900.0 159.341614 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 2.449997 3.291831
# 2019-01-30 161.889999 159.440002 161.089996 160.820007 4112200.0 159.988266 160.990005 157.020004 160.750000 159.070007 7438600.0 158.247314 3.970001 3.387735
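A fully vectorized variant, swapping the row-wise `apply` for `np.maximum`/`np.minimum` against the previous close. It follows the question's formula (today's High and Low against yesterday's Close) and keeps `df`'s own index, so assigning the result as a column aligns without NaNs. The High/Low values come from the question; the Close values are made up because the question elides them, and `n = 2` is used only so the four-row sample produces a value. Unlike the loop, the first TR here is NaN rather than 0, since there is no previous close. Alternatively, keeping the original loop, setting `ATR.index = df.index` before the join should also work, since the loop produces one TR per row.

```python
import numpy as np
import pandas as pd

# High/Low from the question; Close values are hypothetical (elided in the question)
df = pd.DataFrame(
    {'High': [983.25, 986.75, 985.25, 990.75],
     'Low': [975.50, 981.00, 977.00, 984.00],
     'Close': [982.00, 984.50, 984.25, 989.00]},
    index=pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05']),
)

n = 2  # tiny span for the 4-row sample; the question uses 14

# True range: today's High/Low against yesterday's Close (NaN on the first row)
prev_close = df['Close'].shift(1)
df['TR'] = np.maximum(df['High'], prev_close) - np.minimum(df['Low'], prev_close)

# Both columns share df's DatetimeIndex, so the assignment lines up row by row
df['ATR'] = df['TR'].ewm(span=n, min_periods=n).mean()
print(df)
```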
I want to do interpolation for a Pandas series of the following structure
X
22.88 3.047
45.75 3.215
68.63 3.328
91.50 3.423
114.38 3.516
137.25 3.578
163.40 3.676
196.08 3.756
228.76 3.861
261.44 3.942
294.12 4.012
326.80 4.084
359.48 4.147
392.16 4.197
Name: Y, dtype: float64
I want to interpolate the data so that I have a new series covering X=[23:392:1]. I looked through the documentation but didn't find where I could input the new x-axis. Did I miss something? How can I do interpolation with the new x-axis?
This can be done with pandas's reindex and interpolate:
In [27]: s
Out[27]:
1
0
22.88 3.047
45.75 3.215
68.63 3.328
91.50 3.423
114.38 3.516
137.25 3.578
163.40 3.676
196.08 3.756
228.76 3.861
261.44 3.942
294.12 4.012
326.80 4.084
359.48 4.147
392.16 4.197
[14 rows x 1 columns]
In [28]: idx = pd.Index(np.arange(23, 392))
In [29]: s.reindex(s.index.union(idx)).interpolate(method='values')
Out[29]:
1
22.88 3.047000
23.00 3.047882
24.00 3.055227
25.00 3.062573
26.00 3.069919
27.00 3.077265
28.00 3.084611
29.00 3.091957
30.00 3.099303
31.00 3.106648
32.00 3.113994
33.00 3.121340
34.00 3.128686
35.00 3.136032
36.00 3.143378
37.00 3.150724
38.00 3.158070
39.00 3.165415
40.00 3.172761
41.00 3.180107
42.00 3.187453
43.00 3.194799
44.00 3.202145
45.00 3.209491
45.75 3.215000
46.00 3.216235
47.00 3.221174
48.00 3.226112
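The idea is to create the index you want, the sorted union of the old and new points (s.index.union(idx); older pandas versions also accepted s.index + idx for this, but current versions treat + on numeric indexes as element-wise addition), reindex on that (which inserts NaNs at the new points), and then interpolate to fill the NaNs using the values method, which interpolates based on the index values.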
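A self-contained version of the same reindex-and-interpolate idea for current pandas, using `Index.union` to merge the old and new x values. The series is rebuilt from the question's data, and the final `.loc` keeps only the new integer points 23 through 392.

```python
import numpy as np
import pandas as pd

# The question's data: Y values indexed by X
x = [22.88, 45.75, 68.63, 91.50, 114.38, 137.25, 163.40,
     196.08, 228.76, 261.44, 294.12, 326.80, 359.48, 392.16]
y = [3.047, 3.215, 3.328, 3.423, 3.516, 3.578, 3.676,
     3.756, 3.861, 3.942, 4.012, 4.084, 4.147, 4.197]
s = pd.Series(y, index=pd.Index(x, name='X'), name='Y')

new_x = np.arange(23, 393, dtype=float)  # 23.0, 24.0, ..., 392.0

# Reindex on the sorted union (new points start as NaN), then interpolate using the index values
filled = s.reindex(s.index.union(new_x)).interpolate(method='values')

# Keep only the requested grid
s2 = filled.loc[new_x]
print(s2.head())
```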
You can call numpy.interp() directly:
import numpy as np
import pandas as pd
import io
data = """x y
22.88 3.047
45.75 3.215
68.63 3.328
91.50 3.423
114.38 3.516
137.25 3.578
163.40 3.676
196.08 3.756
228.76 3.861
261.44 3.942
294.12 4.012
326.80 4.084
359.48 4.147
392.16 4.197"""
s = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0).squeeze('columns')
new_idx = np.arange(23,393)
new_val = np.interp(new_idx, s.index.values.astype(float), s.values)
s2 = pd.Series(new_val, new_idx)