I am trying to scrape a webpage with hourly energy prices, which I want to use for home automation: if the hourly price <= the baseload price, certain devices should be switched on via MQTT.
I managed to get the baseload price and the hourly prices from their column, but the output from the column seems to be 24 separate lists rather than one. Is that correct, and how do I fix it so that the hourly prices can be compared with the baseload price?
import datetime
import pytz
import requests
from bs4 import BeautifulSoup as bs
today_utc = pytz.utc.localize(datetime.datetime.utcnow())
today = today_utc.astimezone(pytz.timezone("Europe/Amsterdam"))
text_today = today.strftime("%y-%m-%d")
print(today)
print(text_today)
yesterday = datetime.datetime.now(tz=pytz.timezone("Europe/Amsterdam")) - datetime.timedelta(1)
text_yesterday = yesterday.strftime("%y-%m-%d")
print(yesterday)
print(text_yesterday)
url_part1 = 'https://www.epexspot.com/en/market-data?market_area=NL&trading_date='
url_part2 = '&delivery_date='
url_part3 = '&underlying_year=&modality=Auction&sub_modality=DayAhead&technology=&product=60&data_mode=table&period=&production_period='
url_text = url_part1+text_yesterday+url_part2+text_today+url_part3
print(url_text)
html_text = requests.get(url_text).text
#print(html_text)
soup = bs(html_text,'lxml')
#print(soup.prettify())
baseload = soup.find_all('div', class_='flex day-1')
for baseload_price in baseload:
    baseload_price = baseload_price.find('span').text.replace(' ', '')
    print(baseload_price)
table = soup.find_all('tr',{'class':"child"})
#print(table)
for columns in table:
    column3 = columns.find_all('td')[3:]
    #print(columns)
    column3_text = [td.text.strip() for td in column3]
    print(column3_text)
In the for loop for columns in table, you are creating a new list column3_text on every iteration, which is why you see 24 separate lists instead of one. If you intend column3_text to be a single list covering the next 24 hours, you can replace the loop with this:
column3_text = [column.find_all("td")[3].text.strip() for column in table]
Additionally, if you are going to be comparing the baseload price to the hourly prices, you'll want to convert the strings to floats or Decimals. :)
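For instance (a minimal sketch, assuming column3_text is the list built above, baseload_price is the string scraped earlier, and the prices are plain decimal strings):
# Convert the scraped strings to floats before comparing them.
baseload_value = float(baseload_price)
hourly_values = [float(price) for price in column3_text]

# Hours whose price is at or below the baseload, i.e. the MQTT trigger condition.
cheap_hours = [hour for hour, price in enumerate(hourly_values)
               if price <= baseload_value]
print(cheap_hours)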
You simply need to use join:
column3_text = "".join([td.text.strip() for td in column3])
If you want to compare the values, use pandas. Here's how:
import datetime
import urllib.parse

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0",
}

today = datetime.datetime.today().strftime("%Y-%m-%d")
yesterday = (
    datetime.datetime.today() - datetime.timedelta(days=1)
).strftime("%Y-%m-%d")

url = "https://www.epexspot.com/en/market-data?"
data = {
    "market_area": "NL",
    "trading_date": yesterday,
    "delivery_date": today,
    "underlying_year": "",
    "modality": "Auction",
    "sub_modality": "DayAhead",
    "technology": "",
    "product": "60",
    "data_mode": "table",
    "period": "",
    "production_period": "",
}
query_url = f"{url}{urllib.parse.urlencode(data)}"

with requests.Session() as s:
    s.headers.update(headers)
    response = s.get(query_url).text
    baseload = (
        BeautifulSoup(response, "html.parser")
        .select_one(".day-1 > span:nth-child(1)")
        .text
    )
    print(f"Baseload: {baseload}")

    df = pd.concat(pd.read_html(response, flavor="lxml"), ignore_index=True)
    df.columns = range(df.shape[1])
    df = df.drop(df.columns[[4, 5, 6, 7]], axis=1)
    df['is_higher'] = df[[3]].apply(lambda x: (x >= float(baseload)), axis=1)
    df['price_diff'] = df[[3]].apply(lambda x: (x - float(baseload)), axis=1)
    df = df.set_axis(
        [
            "buy_volume",
            "sell_volume",
            "volume",
            "price",
            "is_higher",
            "price_diff",
        ],
        axis=1,
        copy=False,
    )
    df.insert(
        0,
        "hours",
        [
            f"0{value}:00 - {value + 1}:00" if value < 10
            else f"{value}:00 - {value + 1}:00"
            for value in range(0, 24)
        ],
    )
    print(df)
Output:
Baseload: 144.32
hours buy_volume sell_volume ... price is_higher price_diff
0 00:00 - 1:00 2052.2 3608.7 ... 124.47 False -19.85
1 01:00 - 2:00 2467.8 3408.9 ... 119.09 False -25.23
2 02:00 - 3:00 2536.8 3220.5 ... 116.32 False -28.00
3 03:00 - 4:00 2552.0 3206.5 ... 114.60 False -29.72
4 04:00 - 5:00 2524.4 3010.0 ... 115.07 False -29.25
5 05:00 - 6:00 2542.4 3342.7 ... 123.54 False -20.78
6 06:00 - 7:00 2891.2 3872.2 ... 145.42 True 1.10
7 07:00 - 8:00 3413.2 3811.0 ... 166.40 True 22.08
8 08:00 - 9:00 3399.4 3566.0 ... 168.00 True 23.68
9 09:00 - 10:00 2919.3 3159.4 ... 153.30 True 8.98
10 10:00 - 11:00 2680.2 3611.5 ... 143.35 False -0.97
11 11:00 - 12:00 2646.8 3722.3 ... 141.95 False -2.37
12 12:00 - 13:00 2606.4 3723.3 ... 141.96 False -2.36
13 13:00 - 14:00 2559.7 3232.3 ... 145.96 True 1.64
14 14:00 - 15:00 2544.9 3261.2 ... 155.00 True 10.68
15 15:00 - 16:00 2661.7 3428.0 ... 169.15 True 24.83
16 16:00 - 17:00 3072.2 3529.4 ... 173.36 True 29.04
17 17:00 - 18:00 3593.7 3091.4 ... 192.00 True 47.68
18 18:00 - 19:00 3169.0 3255.4 ... 182.86 True 38.54
19 19:00 - 20:00 2710.1 3630.3 ... 167.96 True 23.64
20 20:00 - 21:00 2896.3 3728.8 ... 147.17 True 2.85
21 21:00 - 22:00 3160.3 3639.2 ... 136.78 False -7.54
22 22:00 - 23:00 3506.2 3196.3 ... 119.90 False -24.42
23 23:00 - 24:00 3343.8 3414.1 ... 100.00 False -44.32
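From here, wiring the result into the home-automation side could look like the following. This is a hedged sketch, not part of the answer above: it assumes the df just built, the paho-mqtt package, a broker on localhost, and a made-up topic name.
import paho.mqtt.publish as publish

# Publish each hour whose price is at or below the baseload price.
cheap = df[df["price"] <= float(baseload)]
for hour in cheap["hours"]:
    publish.single("home/energy/cheap_hour", payload=hour, hostname="localhost")  # hypothetical topic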
I would like to resample and interpolate my data (ca. n=2400; years 1990-2000; season: December-July; price values) based on an index (development stage) that increments daily from 0 to 2. The final value (2) is mostly reached in July, but sometimes as early as May. I would like to monitor price values at specific (interpolated) development stages, i.e. DEV_LIST = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2], every season.
Here is my code snippet:
In: df
Out:
day DEV PRICE MONTH YEAR MONTH_YEAR
0 1990-11-15 0.1000 15.0000 11 1990 Nov 1990
1 1990-11-16 0.1864 16.0000 11 1990 Nov 1990
2 1990-11-17 0.1937 20.0000 11 1990 Nov 1990
3 1990-11-18 0.2621 24.0000 11 1990 Nov 1990
4 1990-11-19 0.2634 30.0000 11 1990 Nov 1990
... ... ... ... ... ... ...
2396 2000-06-06 1.9405 9010.0000 06 2000 Jun 2000
2397 2000-06-07 1.9568 9015.0000 06 2000 Jun 2000
2398 2000-06-08 1.9736 9020.0000 06 2000 Jun 2000
2399 2000-06-09 1.9907 9023.0000 06 2000 Jun 2000
2400 2000-06-10 NaN NaN 06 2000 Jun 2000
I was thinking of doing something like this:
DEV_LIST = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
year_range_for_dev = range(1990, 2001)

DEV_year = pd.DataFrame({
    'DEV': [DEV for DEV in DEV_LIST for year in year_range_for_dev],
    'YEAR': [year for DEV in DEV_LIST for year in year_range_for_dev],
})

df_2 = df[['DEV', 'PRICE']]
df_2 = df_2[~df_2['DEV'].isin(DEV_LIST)].set_index('DEV')
df_2_resampled = df_2.reindex(df_2.index.union(DEV_year['DEV'])).interpolate('values')
I would probably continue with pandas.merge_asof to put the dates back in later. However, I cannot get past this error:
ValueError: cannot reindex from a duplicate axis
Desired output should look like this:
DEV PRICE
1 0.0 12.0000
2 0.1000 15.0000
3 0.1864 16.0000
4 0.1937 20.0000
5 0.2500 24.0000
6 0.2621 30.0000
7 0.2634 32.0000
... ... ...
2405 1.9405 9010.0000
2406 1.9568 9015.0000
2407 1.9736 9020.0000
2408 1.9907 9023.0000
2409 2.0000 9025.0000
Many thanks for any suggestions.
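One way past the duplicate-axis error is to reindex each season separately, since DEV values repeat from year to year and reindex requires a unique index. A minimal sketch, assuming df has the columns shown above and that YEAR identifies a season (adjust the grouping key if your seasons span the year boundary):
import pandas as pd

DEV_LIST = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

def interpolate_stages(season):
    # One price series per season, indexed by development stage.
    s = season.dropna(subset=['DEV']).set_index('DEV')['PRICE']
    s = s[~s.index.duplicated()]  # reindex needs a unique index
    full = s.reindex(s.index.union(DEV_LIST)).interpolate(method='index')
    return full.loc[DEV_LIST]  # stages outside the observed range stay NaN

stage_prices = df.groupby('YEAR').apply(interpolate_stages)
print(stage_prices)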
I have written Python code that scrapes information from a website, and I tried to apply multithreading to it. Here is my code before applying multithreading; it runs perfectly on my PC:
import time

import requests
from bs4 import BeautifulSoup
import pandas as pd
import investpy

def getCurrencyHistorical():
    t1 = time.perf_counter()
    headers = {'Accept-Language': 'en-US,en;q=0.9',
               'Upgrade-Insecure-Requests': '1',
               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.63',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,/;q=0.8,application/signed-exchange;v=b3;q=0.9',
               'Cache-Control': 'max-age=0',
               'Connection': 'keep-alive'}
    links = {"USD-IDR": "https://www.investing.com/currencies/usd-idr-historical-data",
             "USD-JPY": "https://www.investing.com/currencies/usd-jpy-historical-data",
             "USD-CNY": "https://www.investing.com/currencies/usd-cny-historical-data"}
    column = []
    output = []
    for key, value in links.items():
        page = requests.get(value, headers=headers)
        soup = BeautifulSoup(page.content, 'html.parser')
        table = soup.select('table')[0]
        # column names come from the <th> cells, data rows from the <td> cells
        rows = table.find_all('tr')
        for row in rows:
            cols = row.find_all('th')
            cols = [item.text.strip() for item in cols]
            column.append(cols)
            outs = row.find_all('td')
            outs = [item.text.strip() for item in outs]
            outs.append(key)
            output.append(outs)
        del output[0]
        #print(value)
        #print(output)
    column[0].append('Currency')
    df = pd.DataFrame(output, columns=column[0])
    t2 = time.perf_counter()
    print(f'Finished in {t2-t1} seconds')
    return df
But when I converted it to the version below, I got an error. Here's the code after applying multithreading:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import concurrent.futures
from functools import partial
import psutil

def process_data(key, page):
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.select('table')[0]
    # column names come from the <th> cells, data rows from the <td> cells
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('th')
        cols = [item.text.strip() for item in cols]
        outs = row.find_all('td')
        outs = [item.text.strip() for item in outs]
        outs.append(key)
    return cols, outs

def getCurrencyHistorical(session, pool_executor, item):
    key, value = item
    page = session.get(value)
    f = pool_executor.submit(process_data, key, page.content)
    return f.result()

def main():
    t1 = time.perf_counter()
    links = {"USD-IDR": "https://www.investing.com/currencies/usd-idr-historical-data",
             "USD-JPY": "https://www.investing.com/currencies/usd-jpy-historical-data",
             "USD-CNY": "https://www.investing.com/currencies/usd-cny-historical-data"}
    with requests.Session() as session:
        user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.37"
        session.headers = {'User-Agent': user_agent}
        column = []
        output = []
        with concurrent.futures.ProcessPoolExecutor(psutil.cpu_count(logical=False)) as pool_executor, \
                concurrent.futures.ThreadPoolExecutor(max_workers=len(links)) as executor:
            for return_value in executor.map(partial(getCurrencyHistorical, session, pool_executor), links.items()):
                cols, outs = return_value
                column.append(cols)
                output.append(outs)
        del output[0]
        column[0].append('Currency')
        df = pd.DataFrame(output, columns=column[0])
        t2 = time.perf_counter()
        print(f'Finished in {t2-t1} seconds')
        print(df)

# Required for Windows:
if __name__ == '__main__':
    main()
I got the error raise ValueError(err) from err. ValueError: 1 columns passed, passed data had 7 columns, and it comes from the line df = pd.DataFrame(output, columns=column[0]). What is wrong? Thank you.
process_data should be just like the non-multiprocessing case except that it processes only one key-value pair, but that's not what you have done. The main process must now do extend operations on the lists returned by process_data.
Update
You were not retrieving the data items for key "USD-JPY" because you were not looking at the correct table: you should be looking at the table with id 'curr_table'. I have also updated the multiprocessing pool size per my comment on your question.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import concurrent.futures
from functools import partial
from os import cpu_count

def process_data(key, page):
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table', {'id': 'curr_table'})
    # column names come from the <th> cells, data rows from the <td> cells
    rows = table.find_all('tr')
    column = []
    output = []
    for row in rows:
        cols = row.find_all('th')
        cols = [item.text.strip() for item in cols]
        column.append(cols)
        outs = row.find_all('td')
        outs = [item.text.strip() for item in outs]
        outs.append(key)
        output.append(outs)
    del output[0]
    return column, output

def getCurrencyHistorical(session, pool_executor, item):
    key, value = item
    page = session.get(value)
    f = pool_executor.submit(process_data, key, page.content)
    return f.result()

def main():
    t1 = time.perf_counter()
    links = {"USD-IDR": "https://www.investing.com/currencies/usd-idr-historical-data",
             "USD-JPY": "https://www.investing.com/currencies/usd-jpy-historical-data",
             "USD-CNY": "https://www.investing.com/currencies/usd-cny-historical-data"}
    with requests.Session() as session:
        user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.37"
        session.headers = {'User-Agent': user_agent}
        column = []
        output = []
        with concurrent.futures.ProcessPoolExecutor(min(len(links), cpu_count())) as pool_executor, \
                concurrent.futures.ThreadPoolExecutor(max_workers=len(links)) as executor:
            for return_value in executor.map(partial(getCurrencyHistorical, session, pool_executor), links.items()):
                cols, outs = return_value
                column.extend(cols)
                output.extend(outs)
        column[0].append('Currency')
        df = pd.DataFrame(output, columns=column[0])
        t2 = time.perf_counter()
        print(f'Finished in {t2-t1} seconds')
        pd.set_option("display.max_rows", None, "display.max_columns", None)
        print(df)

# Required for Windows:
if __name__ == '__main__':
    main()
Prints:
Finished in 2.1944901 seconds
Date Price Open High Low Change % Currency
0 Aug 26, 2021 14,417.5 14,425.0 14,430.0 14,411.0 0.16% USD-IDR
1 Aug 25, 2021 14,395.0 14,405.0 14,421.0 14,387.5 0.03% USD-IDR
2 Aug 24, 2021 14,390.0 14,395.0 14,407.5 14,377.5 -0.14% USD-IDR
3 Aug 23, 2021 14,410.0 14,435.0 14,438.5 14,404.0 -0.28% USD-IDR
4 Aug 20, 2021 14,450.0 14,475.0 14,485.0 14,422.5 0.35% USD-IDR
5 Aug 19, 2021 14,400.0 14,405.0 14,425.0 14,392.5 0.21% USD-IDR
6 Aug 18, 2021 14,370.0 14,387.5 14,400.0 14,372.5 0.00% USD-IDR
7 Aug 16, 2021 14,370.0 14,390.0 14,395.0 14,371.5 -0.10% USD-IDR
8 Aug 13, 2021 14,385.0 14,382.5 14,395.0 14,366.0 0.03% USD-IDR
9 Aug 12, 2021 14,380.0 14,395.0 14,407.5 14,366.0 0.00% USD-IDR
10 Aug 10, 2021 14,380.0 14,375.0 14,402.0 14,375.0 0.14% USD-IDR
11 Aug 09, 2021 14,360.0 14,370.0 14,387.5 14,357.5 0.07% USD-IDR
12 Aug 06, 2021 14,350.0 14,360.0 14,377.5 14,347.5 0.07% USD-IDR
13 Aug 05, 2021 14,340.0 14,330.0 14,360.0 14,321.0 0.21% USD-IDR
14 Aug 04, 2021 14,310.0 14,325.0 14,347.5 14,304.5 -0.21% USD-IDR
15 Aug 03, 2021 14,340.0 14,375.0 14,388.0 14,338.5 -0.55% USD-IDR
16 Aug 02, 2021 14,420.0 14,465.0 14,472.5 14,422.5 -0.28% USD-IDR
17 Jul 30, 2021 14,460.0 14,435.0 14,477.5 14,434.5 -0.14% USD-IDR
18 Jul 29, 2021 14,480.0 14,490.0 14,502.5 14,482.5 -0.03% USD-IDR
19 Jul 28, 2021 14,485.0 14,500.0 14,512.5 14,485.0 -0.03% USD-IDR
20 Jul 27, 2021 14,490.0 14,473.5 14,497.5 14,465.0 0.07% USD-IDR
21 Jul 26, 2021 14,480.0 14,510.0 14,522.5 14,470.0 -0.07% USD-IDR
22 Aug 26, 2021 110.10 109.98 110.23 109.93 0.10% USD-JPY
23 Aug 25, 2021 109.99 109.64 110.13 109.61 0.34% USD-JPY
24 Aug 24, 2021 109.62 109.69 109.89 109.41 -0.05% USD-JPY
25 Aug 23, 2021 109.68 109.81 110.15 109.65 -0.11% USD-JPY
26 Aug 20, 2021 109.80 109.75 109.89 109.57 0.07% USD-JPY
27 Aug 19, 2021 109.72 109.76 110.23 109.49 -0.02% USD-JPY
28 Aug 18, 2021 109.74 109.57 110.07 109.47 0.16% USD-JPY
29 Aug 17, 2021 109.57 109.22 109.66 109.12 0.31% USD-JPY
30 Aug 16, 2021 109.23 109.71 109.76 109.11 -0.31% USD-JPY
31 Aug 13, 2021 109.57 110.39 110.46 109.54 -0.73% USD-JPY
32 Aug 12, 2021 110.38 110.42 110.55 110.31 -0.02% USD-JPY
33 Aug 11, 2021 110.40 110.58 110.81 110.31 -0.14% USD-JPY
34 Aug 10, 2021 110.56 110.29 110.60 110.28 0.25% USD-JPY
35 Aug 09, 2021 110.28 110.26 110.36 110.02 0.03% USD-JPY
36 Aug 06, 2021 110.25 109.77 110.36 109.69 0.46% USD-JPY
37 Aug 05, 2021 109.74 109.49 109.79 109.40 0.25% USD-JPY
38 Aug 04, 2021 109.47 109.07 109.68 108.72 0.39% USD-JPY
39 Aug 03, 2021 109.04 109.32 109.36 108.88 -0.22% USD-JPY
40 Aug 02, 2021 109.28 109.69 109.79 109.18 -0.38% USD-JPY
41 Jul 30, 2021 109.70 109.49 109.83 109.36 0.22% USD-JPY
42 Jul 29, 2021 109.46 109.91 109.96 109.42 -0.40% USD-JPY
43 Jul 28, 2021 109.90 109.75 110.29 109.74 0.13% USD-JPY
44 Jul 27, 2021 109.76 110.36 110.41 109.58 -0.53% USD-JPY
45 Jul 26, 2021 110.34 110.57 110.59 110.11 -0.18% USD-JPY
46 Aug 26, 2021 6.4815 6.4725 6.4866 6.4725 0.09% USD-CNY
47 Aug 25, 2021 6.4756 6.4714 6.4811 6.4707 0.07% USD-CNY
48 Aug 24, 2021 6.4710 6.4790 6.4851 6.4676 -0.15% USD-CNY
49 Aug 23, 2021 6.4805 6.4915 6.4973 6.4788 -0.32% USD-CNY
50 Aug 20, 2021 6.5012 6.4960 6.5057 6.4935 0.11% USD-CNY
51 Aug 19, 2021 6.4942 6.4847 6.4997 6.4840 0.16% USD-CNY
52 Aug 18, 2021 6.4841 6.4861 6.4872 6.4776 -0.02% USD-CNY
53 Aug 17, 2021 6.4854 6.4787 6.4889 6.4759 0.17% USD-CNY
54 Aug 16, 2021 6.4742 6.4774 6.4810 6.4719 -0.04% USD-CNY
55 Aug 13, 2021 6.4768 6.4778 6.4854 6.4749 -0.02% USD-CNY
56 Aug 12, 2021 6.4782 6.4767 6.4811 6.4719 -0.00% USD-CNY
57 Aug 11, 2021 6.4783 6.4846 6.4894 6.4752 -0.11% USD-CNY
58 Aug 10, 2021 6.4852 6.4826 6.4875 6.4774 -0.01% USD-CNY
59 Aug 09, 2021 6.4857 6.4835 6.4895 6.4731 0.05% USD-CNY
60 Aug 06, 2021 6.4825 6.4660 6.4848 6.4622 0.34% USD-CNY
61 Aug 05, 2021 6.4608 6.4671 6.4677 6.4595 -0.07% USD-CNY
62 Aug 04, 2021 6.4655 6.4662 6.4673 6.4555 -0.07% USD-CNY
63 Aug 03, 2021 6.4700 6.4656 6.4710 6.4604 0.12% USD-CNY
64 Aug 02, 2021 6.4620 6.4615 6.4693 6.4580 0.02% USD-CNY
65 Jul 30, 2021 6.4609 6.4645 6.4693 6.4506 0.07% USD-CNY
66 Jul 29, 2021 6.4562 6.4908 6.4908 6.4544 -0.53% USD-CNY
67 Jul 28, 2021 6.4905 6.5095 6.5101 6.4891 -0.31% USD-CNY
68 Jul 27, 2021 6.5104 6.4760 6.5132 6.4735 0.43% USD-CNY
69 Jul 26, 2021 6.4825 6.4790 6.4875 6.4785 0.03% USD-CNY
I want to scrape some data from a website, so I wrote code that creates a list containing all the records. Then I want to extract some elements from all the records to create a dataframe.
However, some information is missing from the dataframe. The all-records list has information from 2012 to 2019, but the dataframe only has 2018 and 2019. I tried different ways to resolve the problem. Finally, I found that if I do not use the zip function, the problem does not occur. Why is that, and if I do not use the zip function, what solution can I use?
import requests
import pandas as pd

records = []
tickers = ['AAL']
url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'

# scrape all data and append to a list - all_records
for s in tickers:
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    for d in requests.get(url_metrics.format(s, s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)
    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' in d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' in d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' in d['id'])
    diluted_eps_growth = next(d for d in all_records if 'EPS Growth (diluted)' in d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' in d['id'])

    # extract values from all_records and create the dataframe
    for (k1, v1), (_, v2), (_, v3), (_, v4), (_, v5) in zip(gross_profit_growth.items(), operating_income_growth.items(), net_income_growth.items(), diluted_eps_growth.items(), operating_cash_flow_growth.items()):
        if k1 in ('id'):
            continue
        records.append({
            'symbol': s,
            'date': k1,
            'gross_profit_growth%': v1,
            'operating_income_growth%': v2,
            'net_income_growth%': v3,
            'diluted_eps_growth%': v4,
            'operating_cash_flow_growth%': v5,
        })

df = pd.DataFrame(records)
df.head(50)
The result is incorrect: it only has 2018 and 2019 data, but it should have data from 2012 to 2019.
symbol date gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth%
0 AAL 2019-12-31 0.0405 -0.1539 -0.0112 0.2508 0.0798
1 AAL 2018-12-31 -0.0876 -0.2463 0.0 -0.2231 -0.2553
My expected result:
symbol date gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth%
0 AAL 31/12/2019 0.0405 0.154 0.1941 0.2508 0.0798
1 AAL 31/12/2018 -0.0876 -0.3723 0.1014 -0.2231 -0.2553
2 AAL 31/12/2017 -0.0165 -0.1638 -0.5039 -0.1892 -0.2728
3 AAL 31/12/2016 -0.079 -0.1844 -0.6604 -0.5655 0.044
4 AAL 31/12/2015 0.1983 0.4601 1.6405 1.8168 1.0289
5 AAL 31/12/2014 0.7305 2.0372 2.5714 1.2308 3.563
6 AAL 31/12/2013 0.3575 8.4527 0.0224 nan -0.4747
7 AAL 31/12/2012 0.1688 1.1427 0.052 nan 0.7295
8 AAL 31/12/2011 0.0588 -4.3669 -3.2017 nan -0.4013
9 AAL 31/12/2010 0.3413 1.3068 0.6792 nan 0.3344
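The cause is that zip() pairs items purely by position and stops at its shortest input: if the five growth dicts do not all share the same keys in the same order, values get silently paired with the wrong dates, and everything past the end of the shortest dict is dropped. A tiny demonstration:
# zip() truncates to the shortest iterable:
a = {'2019': 1, '2018': 2, '2017': 3}
b = {'2019': 10, '2018': 20}  # one year short
print(list(zip(a.items(), b.items())))  # only two pairs survive

Aligning each series on its date index instead, as below, avoids both problems: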
import requests
import pandas as pd

records = []
tickers = ['A', 'AAL', 'AAPL']
url_metrics = 'https://stockrow.com/api/companies/{}/financials.json?ticker={}&dimension=A&section=Growth'
indicators_url = 'https://stockrow.com/api/indicators.json'

for s in tickers:
    print('Getting data for ticker: {}'.format(s))
    indicators = {i['id']: i for i in requests.get(indicators_url).json()}
    all_records = []
    for d in requests.get(url_metrics.format(s, s)).json():
        d['id'] = indicators[d['id']]['name']
        all_records.append(d)

    gross_profit_growth = next(d for d in all_records if 'Gross Profit Growth' == d['id'])
    operating_income_growth = next(d for d in all_records if 'Operating Income Growth' == d['id'])
    net_income_growth = next(d for d in all_records if 'Net Income Growth' == d['id'])
    eps_growth_diluted = next(d for d in all_records if 'EPS Growth (diluted)' == d['id'])
    operating_cash_flow_growth = next(d for d in all_records if 'Operating Cash Flow Growth' == d['id'])

    del gross_profit_growth['id']
    del operating_income_growth['id']
    del net_income_growth['id']
    del eps_growth_diluted['id']
    del operating_cash_flow_growth['id']

    # build one single-column frame per metric, indexed by date, and align them
    d1 = pd.DataFrame({'date': gross_profit_growth.keys(), 'gross_profit_growth%': gross_profit_growth.values()}).set_index('date')
    d2 = pd.DataFrame({'date': operating_income_growth.keys(), 'operating_income_growth%': operating_income_growth.values()}).set_index('date')
    d3 = pd.DataFrame({'date': net_income_growth.keys(), 'net_income_growth%': net_income_growth.values()}).set_index('date')
    d4 = pd.DataFrame({'date': eps_growth_diluted.keys(), 'diluted_eps_growth%': eps_growth_diluted.values()}).set_index('date')
    d5 = pd.DataFrame({'date': operating_cash_flow_growth.keys(), 'operating_cash_flow_growth%': operating_cash_flow_growth.values()}).set_index('date')

    d = pd.concat([d1, d2, d3, d4, d5], axis=1)
    d['symbol'] = s
    records.append(d)

df = pd.concat(records)
print(df)
Prints:
gross_profit_growth% operating_income_growth% net_income_growth% diluted_eps_growth% operating_cash_flow_growth% symbol
2019-10-31 0.0466 0.0409 2.3892 2.4742 -0.0607 A
2018-10-31 0.1171 0.1202 -0.538 -0.5381 0.2227 A
2017-10-31 0.0919 0.3122 0.4805 0.5 0.1211 A
2016-10-31 0.0764 0.1782 0.1521 0.1765 0.5488 A
2015-10-31 0.0329 0.2458 -0.2696 -0.1905 -0.2996 A
2014-10-31 0.0362 0.0855 -0.252 -0.3 -0.3655 A
2013-10-31 -0.4709 -0.655 -0.3634 -0.3578 -0.0619 A
2012-10-31 0.0213 0.0448 0.1393 0.1474 -0.0254 A
2011-10-31 0.2044 0.8922 0.4795 0.6102 0.7549 A
2019-12-31 0.0405 0.154 0.1941 0.2508 0.0798 AAL
2018-12-31 -0.0876 -0.3723 0.1014 -0.2231 -0.2553 AAL
2017-12-31 -0.0165 -0.1638 -0.5039 -0.1892 -0.2728 AAL
2016-12-31 -0.079 -0.1844 -0.6604 -0.5655 0.044 AAL
2015-12-31 0.1983 0.4601 1.6405 1.8168 1.0289 AAL
2014-12-31 0.7305 2.0372 2.5714 1.2308 3.563 AAL
2013-12-31 0.3575 8.4527 0.0224 NaN -0.4747 AAL
2012-12-31 0.1688 1.1427 0.052 NaN 0.7295 AAL
2011-12-31 0.0588 -4.3669 -3.2017 NaN -0.4013 AAL
2010-12-31 0.3413 1.3068 0.6792 NaN 0.3344 AAL
2020-09-30 0.0667 0.0369 0.039 NaN 0.1626 AAPL
2019-09-30 -0.0338 -0.0983 -0.0718 -0.0017 -0.1039 AAPL
2018-09-30 0.1548 0.1557 0.2312 0.2932 0.2057 AAPL
2017-09-30 0.0466 0.022 0.0583 0.1083 -0.0303 AAPL
2016-09-30 -0.1 -0.1573 -0.1443 -0.0987 -0.185 AAPL
2015-09-30 0.3273 0.3567 0.3514 0.4295 0.3609 AAPL
2014-09-30 0.0969 0.0715 0.0668 0.1358 0.1127 AAPL
2013-09-30 -0.0635 -0.113 -0.1125 -0.0996 0.0553 AAPL
2012-09-30 0.567 0.6348 0.6099 0.595 0.3551 AAPL
2011-09-30 0.706 0.8379 0.8499 0.827 1.0182 AAPL
I know that using pandas this is how you normally get daily stock price quotes. But I'm wondering if it's possible to get monthly or weekly quotes; is there maybe a parameter I can pass to get monthly quotes?
from pandas.io.data import DataReader
from datetime import datetime
ibm = DataReader('IBM', 'yahoo', datetime(2000,1,1), datetime(2012,1,1))
print(ibm['Adj Close'])
Monthly closing prices from Yahoo! Finance...
import pandas_datareader.data as web
data = web.get_data_yahoo('IBM','01/01/2015',interval='m')
where you can replace the interval input as required ('d', 'w', 'm', etc).
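For instance, weekly quotes (a sketch, assuming the same get_data_yahoo signature as above):
import pandas_datareader.data as web

# same call as above, but weekly bars
data_weekly = web.get_data_yahoo('IBM', '01/01/2015', interval='w')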
Using Yahoo Finance, it is possible to get stock prices using the "interval" option with "1mo" instead of "m", as shown:
#Library
import yfinance as yf
from datetime import datetime
#Load Stock price
df = yf.download("IBM", start= datetime(2000,1,1), end = datetime(2012,1,1),interval='1mo')
df
The result is a DataFrame with one row of quotes per month in the range.
The other possible interval options are: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo.
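For example, weekly bars over the same window (a sketch reusing the call above with interval="1wk"):
import yfinance as yf
from datetime import datetime

# same window as above, but weekly bars
df_weekly = yf.download("IBM", start=datetime(2000, 1, 1), end=datetime(2012, 1, 1), interval="1wk")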
try this:
In [175]: from pandas_datareader.data import DataReader
In [176]: ibm = DataReader('IBM', 'yahoo', '2001-01-01', '2012-01-01')
UPDATE: show average for Adj Close only (month start)
In [12]: ibm.groupby(pd.TimeGrouper(freq='MS'))['Adj Close'].mean()
Out[12]:
Date
2001-01-01 79.430605
2001-02-01 86.625519
2001-03-01 75.938913
2001-04-01 81.134375
2001-05-01 90.460754
2001-06-01 89.705042
2001-07-01 83.350254
2001-08-01 82.100543
2001-09-01 74.335789
2001-10-01 79.937451
...
2011-03-01 141.628553
2011-04-01 146.530774
2011-05-01 150.298053
2011-06-01 146.844772
2011-07-01 158.716834
2011-08-01 150.690990
2011-09-01 151.627555
2011-10-01 162.365699
2011-11-01 164.596963
2011-12-01 167.924676
Freq: MS, Name: Adj Close, dtype: float64
show average for Adj Close only (month end)
In [13]: ibm.groupby(pd.TimeGrouper(freq='M'))['Adj Close'].mean()
Out[13]:
Date
2001-01-31 79.430605
2001-02-28 86.625519
2001-03-31 75.938913
2001-04-30 81.134375
2001-05-31 90.460754
2001-06-30 89.705042
2001-07-31 83.350254
2001-08-31 82.100543
2001-09-30 74.335789
2001-10-31 79.937451
...
2011-03-31 141.628553
2011-04-30 146.530774
2011-05-31 150.298053
2011-06-30 146.844772
2011-07-31 158.716834
2011-08-31 150.690990
2011-09-30 151.627555
2011-10-31 162.365699
2011-11-30 164.596963
2011-12-31 167.924676
Freq: M, Name: Adj Close, dtype: float64
monthly averages (all columns):
In [179]: ibm.groupby(pd.TimeGrouper(freq='M')).mean()
Out[179]:
Open High Low Close Volume Adj Close
Date
2001-01-31 100.767857 103.553571 99.428333 101.870357 9474409 79.430605
2001-02-28 111.193160 113.304210 108.967368 110.998422 8233626 86.625519
2001-03-31 97.366364 99.423637 95.252272 97.281364 11570454 75.938913
2001-04-30 103.990500 106.112500 102.229501 103.936999 11310545 81.134375
2001-05-31 115.781363 117.104091 114.349091 115.776364 7243463 90.460754
2001-06-30 114.689524 116.199048 113.739523 114.777618 6806176 89.705042
2001-07-31 106.717143 108.028095 105.332857 106.646666 7667447 83.350254
2001-08-31 105.093912 106.196521 103.856522 104.939999 6234847 82.100543
2001-09-30 95.138667 96.740000 93.471334 94.987333 12620833 74.335789
2001-10-31 101.400870 103.140000 100.327827 102.145217 9754413 79.937451
2001-11-30 113.449047 114.875715 112.510952 113.938095 6435061 89.256046
2001-12-31 120.651001 122.076000 119.790500 121.087999 6669690 94.878736
2002-01-31 116.483334 117.509524 114.613334 115.994762 9217280 90.887920
2002-02-28 103.194210 104.389474 101.646316 102.961579 9069526 80.764672
2002-03-31 105.246500 106.764499 104.312999 105.478499 7563425 82.756873
... ... ... ... ... ... ...
2010-10-31 138.956188 140.259048 138.427142 139.631905 6537366 122.241844
2010-11-30 144.281429 145.164762 143.385241 144.439524 4956985 126.878319
2010-12-31 145.155909 145.959545 144.567273 145.251819 4245127 127.726929
2011-01-31 152.595000 153.950499 151.861000 153.181501 5941580 134.699880
2011-02-28 163.217895 164.089474 162.510002 163.339473 4687763 144.050847
2011-03-31 160.433912 161.745652 159.154349 160.425651 5639752 141.628553
2011-04-30 165.437501 166.587500 164.760500 165.978500 5038475 146.530774
2011-05-31 169.657144 170.679046 168.442858 169.632857 5276390 150.298053
2011-06-30 165.450455 166.559093 164.691819 165.593635 4792836 146.844772
2011-07-31 178.124998 179.866502 177.574998 178.981500 5679660 158.716834
2011-08-31 169.734350 171.690435 166.749567 169.360434 8480613 150.690990
2011-09-30 169.752858 172.034761 168.109999 170.245714 6566428 151.627555
2011-10-31 181.529525 183.597145 180.172379 182.302381 6883985 162.365699
2011-11-30 184.536668 185.950952 182.780477 184.244287 4619719 164.596963
2011-12-31 188.151428 189.373809 186.421905 187.789047 4925547 167.924676
[132 rows x 6 columns]
weekly averages (all columns):
In [180]: ibm.groupby(pd.TimeGrouper(freq='W')).mean()
Out[180]:
Open High Low Close Volume Adj Close
Date
2001-01-07 89.234375 94.234375 87.890625 91.656250 11060200 71.466436
2001-01-14 93.412500 95.062500 91.662500 93.412500 7470200 72.835824
2001-01-21 100.250000 103.921875 99.218750 102.250000 13851500 79.726621
2001-01-28 109.575000 111.537500 108.675000 110.600000 8056720 86.237303
2001-02-04 113.680000 115.465999 111.734000 113.582001 6538080 88.562436
2001-02-11 113.194002 115.815999 111.639999 113.884001 7269320 88.858876
2001-02-18 113.960002 116.731999 113.238000 115.106000 7225420 89.853021
2001-02-25 109.525002 111.375000 105.424999 107.977501 10722700 84.288436
2001-03-04 103.390001 106.052002 100.386000 103.228001 11982540 80.580924
2001-03-11 105.735999 106.920000 103.364002 104.844002 9226900 81.842391
2001-03-18 95.660001 97.502002 93.185997 94.899998 13863740 74.079992
2001-03-25 90.734000 92.484000 88.598000 90.518001 11382280 70.659356
2001-04-01 95.622000 97.748000 94.274000 96.106001 10467580 75.021411
2001-04-08 95.259999 97.360001 93.132001 94.642000 12312580 73.878595
2001-04-15 98.350000 99.520000 95.327502 97.170000 10218625 75.851980
... ... ... ... ... ... ...
2011-09-25 170.678003 173.695996 169.401996 171.766000 6358100 152.981582
2011-10-02 176.290002 178.850000 174.729999 176.762000 7373680 157.431216
2011-10-09 175.920001 179.200003 174.379999 177.792001 7623560 158.348576
2011-10-16 185.366000 187.732001 184.977997 187.017999 5244180 166.565614
2011-10-23 180.926001 182.052002 178.815997 180.351999 9359200 160.628611
2011-10-30 183.094003 184.742001 181.623996 183.582001 5743800 163.505379
2011-11-06 184.508002 186.067999 183.432004 184.716003 4583780 164.515366
2011-11-13 185.350000 186.690002 183.685999 185.508005 4180620 165.750791
2011-11-20 187.600003 189.101999 185.368002 186.738000 5104420 166.984809
2011-11-27 181.067497 181.997501 178.717499 179.449997 4089350 160.467733
2011-12-04 185.246002 187.182001 184.388000 186.052002 5168720 166.371376
2011-12-11 191.841998 194.141998 191.090002 192.794000 4828580 172.400204
2011-12-18 191.085999 191.537998 187.732001 188.619998 6037220 168.667729
2011-12-25 183.810001 184.634003 181.787997 183.678000 5433360 164.248496
2012-01-01 185.140003 185.989998 183.897499 184.750000 3029925 165.207100
[574 rows x 6 columns]
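Note that pd.TimeGrouper has since been deprecated and removed (pandas 1.0). Assuming the ibm frame loaded above, the modern equivalents of the calls in this answer are:
import pandas as pd

# pd.Grouper is the drop-in replacement for pd.TimeGrouper
ibm.groupby(pd.Grouper(freq='M'))['Adj Close'].mean()
# or, equivalently, resample on the DatetimeIndex
ibm['Adj Close'].resample('M').mean()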
Get it from Quandl:
import pandas as pd
import quandl
quandl.ApiConfig.api_key = 'xxxxxxxxxxxx' # Optional
quandl.ApiConfig.api_version = '2015-04-09' # Optional
ibm = quandl.get("WIKI/IBM", start_date="2000-01-01", end_date="2012-01-01", collapse="monthly", returns="pandas")
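The collapse parameter also accepts daily, weekly, quarterly, and annual, so other frequencies are one argument away (a sketch reusing the quandl setup above):
# same query, collapsed to weekly instead of monthly
ibm_weekly = quandl.get("WIKI/IBM", start_date="2000-01-01", end_date="2012-01-01", collapse="weekly", returns="pandas")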