How to use pd.read_csv() on this web page? - python

I am having difficulty using pd.read_csv() to get the data behind the "Download Data" button on this web page, since the page URL does not end in the usual .zip or .csv. What is the correct URL to use to download the data directly with pd.read_csv()?
Link:
https://climate.weather.gc.ca/climate_data/daily_data_e.html?hlyRange=2008-12-22%7C2020-05-24&dlyRange=1999-05-01%7C2020-05-24&mlyRange=2000-06-01%7C2007-11-01&StationID=27211&Prov=AB&urlExtension=_e.html&searchType=stnProx&optLimit=yearRange&StartYear=2000&EndYear=2020&selRowPerPage=25&Line=5&txtRadius=25&optProxType=city&selCity=51%7C2%7C114%7C4%7CCalgary&selPark=&txtCentralLatDeg=&txtCentralLatMin=0&txtCentralLatSec=0&txtCentralLongDeg=&txtCentralLongMin=0&txtCentralLongSec=0&txtLatDecDeg=&txtLongDecDeg=&timeframe=2&Day=24&Year=2019&Month=5#

Open the Firefox developer tools, switch to the Network tab, and click the download button: the request URL shows up there. (Chrome has an equivalent tab.)
import pandas as pd
url = 'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27211&Year=2019&Month=5&Day=1&timeframe=2&submit=Download+Data'
df = pd.read_csv(url)
print(df)
Prints:
Longitude (x) Latitude (y) Station Name Climate ID Date/Time ... Snow on Grnd Flag Dir of Max Gust (10s deg) Dir of Max Gust Flag Spd of Max Gust (km/h) Spd of Max Gust Flag
0 -114.0 51.11 CALGARY INT'L CS 3031094 2019-01-01 ... NaN 29.0 NaN 44.0 NaN
1 -114.0 51.11 CALGARY INT'L CS 3031094 2019-01-02 ... NaN 27.0 NaN 70.0 NaN
2 -114.0 51.11 CALGARY INT'L CS 3031094 2019-01-03 ... NaN 27.0 NaN 62.0 NaN
3 -114.0 51.11 CALGARY INT'L CS 3031094 2019-01-04 ... NaN 23.0 NaN 66.0 NaN
4 -114.0 51.11 CALGARY INT'L CS 3031094 2019-01-05 ... NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ... ... ...
360 -114.0 51.11 CALGARY INT'L CS 3031094 2019-12-27 ... NaN 30.0 NaN 46.0 NaN
361 -114.0 51.11 CALGARY INT'L CS 3031094 2019-12-28 ... NaN NaN NaN NaN NaN
362 -114.0 51.11 CALGARY INT'L CS 3031094 2019-12-29 ... NaN NaN NaN NaN NaN
363 -114.0 51.11 CALGARY INT'L CS 3031094 2019-12-30 ... NaN 27.0 NaN 50.0 NaN
364 -114.0 51.11 CALGARY INT'L CS 3031094 2019-12-31 ... NaN 28.0 NaN 55.0 NaN
[365 rows x 31 columns]
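The download URL in the answer is the bulk_data_e.html endpoint plus a query string, so it can be assembled from a station ID and a year. A minimal sketch, using only the parameter names visible in that URL; the helper name build_bulk_url is made up for illustration, and whether the site accepts other parameter values is an assumption that has not been checked here:

```python
from urllib.parse import urlencode

BASE_URL = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"


def build_bulk_url(station_id, year, month=5, day=1, timeframe=2):
    """Assemble a download URL with the same parameters as the answer's URL.

    Hypothetical helper: the defaults simply mirror the URL shown above.
    """
    params = {
        "format": "csv",
        "stationID": station_id,
        "Year": year,
        "Month": month,
        "Day": day,
        "timeframe": timeframe,
        "submit": "Download Data",  # urlencode turns the space into '+'
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_bulk_url(27211, 2019)
print(url)
```

For these arguments the generated string is identical to the hard-coded URL, so passing it to pd.read_csv(url) should behave the same way.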

Related

fill one dataframe with values from another

I have this DataFrame, which contains null values that were not populated correctly.
Unidad Precio Combustible Año_del_vehiculo Caballos \
49 1 1000 Gasolina 1998.0 50.0
63 1 800 Gasolina 1998.0 50.0
88 1 600 Gasolina 1999.0 54.0
107 1 3100 Diésel 2008.0 54.0
244 1 2000 Diésel 1995.0 60.0
... ... ... ... ... ...
46609 1 47795 Gasolina 2016.0 420.0
46770 1 26900 Gasolina 2011.0 450.0
46936 1 19900 Gasolina 2007.0 510.0
46941 1 24500 Gasolina 2006.0 514.0
47128 1 79600 Gasolina 2017.0 612.0
Comunidad_autonoma Marca_y_Modelo Año_Venta Año_Comunidad \
49 Islas Baleares CITROEN AX 2020 2020Islas Baleares
63 Islas Baleares SEAT Arosa 2021 2021Islas Baleares
88 Islas Baleares FIAT Seicento 2020 2020Islas Baleares
107 La Rioja TOYOTA Aygo 2020 2020La Rioja
244 Aragón PEUGEOT 205 2019 2019Aragón
... ... ... ... ...
46609 La Rioja PORSCHE Cayenne 2020 2020La Rioja
46770 Cataluña AUDI RS5 2020 2020Cataluña
46936 Islas Baleares MERCEDES-BENZ Clase M 2020 2020Islas Baleares
46941 La Rioja MERCEDES-BENZ Clase E 2020 2020La Rioja
47128 Islas Baleares MERCEDES-BENZ Clase E 2021 2021Islas Baleares
Fecha Año Super_95 Diesel Comunidad Salario en euros anuales
49 2020-12-01 NaN NaN NaN NaN NaN
63 2021-01-01 NaN NaN NaN NaN NaN
88 2020-12-01 NaN NaN NaN NaN NaN
107 2020-12-01 NaN NaN NaN NaN NaN
244 2019-03-01 NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
46609 2020-12-01 NaN NaN NaN NaN NaN
46770 2020-07-01 NaN NaN NaN NaN NaN
46936 2020-10-01 NaN NaN NaN NaN NaN
46941 2020-11-01 NaN NaN NaN NaN NaN
47128 2021-01-01 NaN NaN NaN NaN NaN
I need to fill the gasoline, diesel and salary columns with the values from the following DataFrame:
Año Super_95 Diesel Comunidad Año_Comunidad Fecha \
0 2020 1.321750 1.246000 Navarra 2020Navarra 2020-01-01
1 2020 1.301000 1.207250 Navarra 2020Navarra 2020-02-01
2 2020 1.224800 1.126200 Navarra 2020Navarra 2020-03-01
3 2020 1.106667 1.020000 Navarra 2020Navarra 2020-04-01
4 2020 1.078750 0.986250 Navarra 2020Navarra 2020-05-01
.. ... ... ... ... ... ...
386 2021 1.416600 1.265000 La rioja 2021La rioja 2021-08-01
387 2021 1.431000 1.277000 La rioja 2021La rioja 2021-09-01
388 2021 1.474000 1.344000 La rioja 2021La rioja 2021-10-01
389 2021 1.510200 1.382000 La rioja 2021La rioja 2021-11-01
390 2021 1.481333 1.348667 La rioja 2021La rioja 2021-12-01
Salario en euros anuales
0 27.995,96
1 27.995,96
2 27.995,96
3 27.995,96
4 27.995,96
.. ...
386 21.535,29
387 21.535,29
388 21.535,29
389 21.535,29
390 21.535,29
The columns of the first DataFrame should be filled from the second wherever the Año_Comunidad column matches. For example, a NaN in a row where 2020Islas Baleares appears should be filled with the gasoline price from the row of the other table where 2020Islas Baleares appears. A row with 2020Aragón should be matched with 2020Aragón, and so on. I had thought of something like this:
analisis['Super_95'].fillna(analisis2['Super_95'].apply(lambda x: x if x=='2020Islas Baleares' else np.nan), inplace=True)
The second DataFrame is the result of a merge, and that merge did not fill those null values.
First merge the two DataFrames, assigning the result so that the later steps can use it:
df1 = df1.merge(df2, on='Año_Comunidad')
The result is one DataFrame in which columns that share a name get the suffix _x (from the first DataFrame) or _y (from the second).
To fill in the blanks, do this for each column:
df1.loc[df1["Año_x"].isnull(), 'Año_x'] = df1["Año_y"]
Wherever Año_x is empty, it is filled with the value from the second table.
You can do this in a loop for all the columns:
cols = ['Año', 'Super_95', 'Diesel', 'Comunidad', 'Salario en euros anuales']
for col in cols:
    df1.loc[df1[col + "_x"].isnull(), col + '_x'] = df1[col + '_y']
Finally, drop the columns that came from the second table:
df1 = df1.drop(columns=[col + '_y' for col in cols])
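One thing to watch with the merge approach: the second table in the question has one row per month for each Año_Comunidad, and its sample shows "La rioja" where the first table has "La Rioja", so a plain merge on that key can duplicate rows or miss matches. Below is a small sketch of an alternative, on made-up toy data, that normalizes the key and collapses the price table to one value per key before filling. Averaging the monthly prices per year is an assumption made for illustration; matching on Fecha as well may be closer to what is wanted.

```python
import pandas as pd

# Toy stand-ins for the two tables in the question (all values are made up).
cars = pd.DataFrame({
    "Año_Comunidad": ["2020Islas Baleares", "2020La Rioja", "2021Islas Baleares"],
    "Super_95": [float("nan"), float("nan"), 1.40],
})
prices = pd.DataFrame({
    "Año_Comunidad": ["2020Islas Baleares", "2020Islas Baleares", "2020La rioja"],
    "Super_95": [1.30, 1.20, 1.10],
})


def norm(keys):
    """Normalize key strings so 'La Rioja' and 'La rioja' compare equal."""
    return keys.str.strip().str.lower()


# One price per normalized key (assumption: the mean of the monthly rows).
lookup = (
    prices.assign(key=norm(prices["Año_Comunidad"]))
    .groupby("key")["Super_95"]
    .mean()
)

# Fill only the missing values; existing values are left untouched.
cars["Super_95"] = cars["Super_95"].fillna(norm(cars["Año_Comunidad"]).map(lookup))
print(cars)
```

Because map() returns a Series aligned with the rows of cars, no extra _x/_y columns are created and the row count cannot change.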

Python plotting time-series data?

I would like to plot the below data using matplotlib. I am confused as to how to group the data. I would like to plot Amps against time for each rectifier. Preferably the x axis should be months and the y axis should be Amps. Below is a sample of data.
Rectifier Date Amps Volts
9E220ECP5001 2015-01-01 31.95 11.1
9E220ECP5001 2015-02-01 NaN NaN
9E220ECP5001 2015-03-01 31.05 11.3
9E220ECP5001 2015-04-01 NaN NaN
9E220ECP5001 2015-05-01 30.45 12.2
... ... ... ...
9E220ECP5018 2021-08-01 NaN NaN
9E220ECP5018 2021-09-01 17.4 11.6
9E220ECP5018 2021-10-01 NaN NaN
9E220ECP5018 2021-11-01 NaN NaN
9E220ECP5018 2021-12-01 NaN NaN
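One possible approach, sketched on the sample rows above: pivot the data so that each rectifier becomes its own column, then draw one line per column. The function name plot_amps is made up, and dropping the NaN months before plotting is an assumption about what is wanted; without it, matplotlib leaves gaps wherever a month is missing.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Rectifier": ["9E220ECP5001"] * 5 + ["9E220ECP5018"] * 5,
    "Date": ["2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01", "2015-05-01",
             "2021-08-01", "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01"],
    "Amps": [31.95, None, 31.05, None, 30.45, None, 17.4, None, None, None],
})
df["Date"] = pd.to_datetime(df["Date"])

# One column per rectifier, indexed by date.
wide = df.pivot(index="Date", columns="Rectifier", values="Amps")


def plot_amps(wide):
    """Draw one line per rectifier (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the reshaping above runs without it

    fig, ax = plt.subplots()
    for name in wide.columns:
        series = wide[name].dropna()  # skip missing months so the points connect
        ax.plot(series.index, series.values, marker="o", label=name)
    ax.set_xlabel("Month")
    ax.set_ylabel("Amps")
    ax.legend(title="Rectifier")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return ax
```

Calling plot_amps(wide) returns the axes, which can then be shown or saved, for example with ax.figure.savefig("amps.png").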

Simple web scrape issues

Apologies if this question is elementary, but I'm a newbie to scraping and am trying to perform a simple scrape of NFL Future prices off of a website, but am not having any luck. My code is below. At this point, I'm just trying to get something/anything to return (ultimately will pull the text of the team names and futures prices), but this code returns "None" and "[]" (an empty list) for the find and find_all functions, respectively. I get the find/find_all parameters by inspecting the first line of the page (Baltimore Ravens) when I see that the team names are held in a span with the class of "style_label__2KJur".
I suspect this has something to do with how the html is loaded. When I print(nfl_futures), I don't see any of the html that I inspected for the first line which is presumably why I get no results. If this is true, how do I expose all of the html I need in order to scrape this data?
Appreciate the help.
import requests
from bs4 import BeautifulSoup
url = "https://www.pinnacle.com/en/football/nfl/matchups#futures"
r = requests.get(url).content
nfl_futures = BeautifulSoup(r, "lxml")
first_line = nfl_futures.find('span', class_="style_label__2KJur")
lines = nfl_futures.find_all('span', class_="style_label__2KJur")
print(first_line)
print(lines)
Output:
None
[]
Process finished with exit code 0
This site is hardly a simple scrape: the page is dynamic. You could use Selenium to render the page first and then parse the resulting HTML with bs4. Or, as stated, grab the data from the API, although you then need a little data manipulation to join the pieces. I prefer the API route, as it is more robust and more efficient.
import requests
import pandas as pd
url = 'https://www.pinnacle.com/config/app.json'
jsonData = requests.get(url).json()
x_api_key = jsonData['api']['haywire']['apiKey']
headers = {'X-API-Key': x_api_key}
matchups_url = "https://guest.api.arcadia.pinnacle.com/0.1/leagues/889/matchups"
jsonData_matchups = requests.get(matchups_url, headers=headers).json()
df = pd.json_normalize(jsonData_matchups,
                       record_path=['participants'],
                       meta=['id', 'type', ['special', 'category'], ['special', 'description']],
                       meta_prefix='participants.',
                       errors='ignore')
df['id'] = df['id'].fillna(0).astype(int).astype(str)
df['participants.id'] = df['participants.id'].fillna(0).astype(int).astype(str)
df = df.rename(columns={'id':'participantId','participants.id':'matchupId'})
df_matchups = df[df['participants.type'] == 'matchup']
df_special = df[df['participants.type'] == 'special']
straight_url = 'https://guest.api.arcadia.pinnacle.com/0.1/leagues/889/markets/straight'
jsonData_straight = requests.get(straight_url, headers=headers).json()
df_straight = pd.json_normalize(jsonData_straight,
                                record_path=['prices'],
                                meta=['type', 'matchupId'],
                                errors='ignore')
df_straight['matchupId'] = df_straight['matchupId'].fillna(0).astype(int).astype(str)
df_straight['participantId'] = df_straight['participantId'].fillna(0).astype(int).astype(str)
df_filter = df_straight[df_straight['designation'].isin(['home','away','over','under'])]
df_filter = df_filter.pivot_table(index=['matchupId', 'participantId'],
                                  columns='designation',
                                  values=['points', 'price']).reset_index(drop=False)
df_filter.columns = ['.'.join(x) if x[-1] != '' else x[0] for x in df_filter.columns]
nfl_futures = pd.merge(df_special, df_straight, how='left', left_on=['matchupId', 'participantId'], right_on=['matchupId', 'participantId'])
nfl_matchups = pd.merge(df_matchups, df_filter, how='left', left_on=['matchupId', 'participantId'], right_on=['matchupId', 'participantId'])
Output:
Here's what the first 10 rows of 324 look like for futures:
print(nfl_futures.head(10).to_string())
alignment participantId name order rotation matchupId participants.type participants.special.category participants.special.description points price designation type
0 neutral 1326753860 Over 0 3017.0 1326753859 special Regular Season Wins Dallas Cowboys Regular Season Wins? 9.5 108 NaN total
1 neutral 1326753861 Under 0 3018.0 1326753859 special Regular Season Wins Dallas Cowboys Regular Season Wins? 9.5 -129 NaN total
2 neutral 1336218775 Trevor Lawrence 0 5801.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 312 NaN moneyline
3 neutral 1336218776 Justin Fields 0 5802.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 461 NaN moneyline
4 neutral 1336218777 Zach Wilson 0 5803.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 790 NaN moneyline
5 neutral 1336218778 Trey Lance 0 5804.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 655 NaN moneyline
6 neutral 1336218779 Mac Jones 0 5805.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 807 NaN moneyline
7 neutral 1336218780 Kyle Pitts 0 5806.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 1095 NaN moneyline
8 neutral 1336218781 Najee Harris 0 5807.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 1015 NaN moneyline
9 neutral 1336218782 DeVonta Smith 0 5808.0 1336218774 special NFL Offensive Rookie of the Year NFL Offensive Rookie of the Year 2021-22? NaN 1903 NaN moneyline
And here are the week 1 matchup lines:
print(nfl_matchups.to_string())
alignment participantId name order rotation matchupId participants.type participants.special.category participants.special.description points.away points.home points.over points.under price.away price.home price.over price.under
0 home 0 Tampa Bay Buccaneers 1 NaN 1327265167 matchup NaN NaN 6.5 -6.5 51.5 51.5 107.0 -118.0 101.0 -112.0
1 away 0 Dallas Cowboys 0 NaN 1327265167 matchup NaN NaN 6.5 -6.5 51.5 51.5 107.0 -118.0 101.0 -112.0
2 home 0 Washington Football Team 1 NaN 1327265554 matchup NaN NaN 0.0 0.0 44.5 44.5 -115.0 104.0 -106.0 -106.0
3 away 0 Los Angeles Chargers 0 NaN 1327265554 matchup NaN NaN 0.0 0.0 44.5 44.5 -115.0 104.0 -106.0 -106.0
4 home 0 Detroit Lions 1 NaN 1327265774 matchup NaN NaN -7.5 7.5 46.0 46.0 101.0 -111.0 -106.0 -106.0
5 away 0 San Francisco 49ers 0 NaN 1327265774 matchup NaN NaN -7.5 7.5 46.0 46.0 101.0 -111.0 -106.0 -106.0
6 home 0 Las Vegas Raiders 1 NaN 1327266134 matchup NaN NaN -4.5 4.5 51.0 51.0 -110.0 -100.0 -106.0 -106.0
7 away 0 Baltimore Ravens 0 NaN 1327266134 matchup NaN NaN -4.5 4.5 51.0 51.0 -110.0 -100.0 -106.0 -106.0
8 home 0 Los Angeles Rams 1 NaN 1327266054 matchup NaN NaN 7.5 -7.5 45.0 45.0 -114.0 103.0 -106.0 -106.0
9 away 0 Chicago Bears 0 NaN 1327266054 matchup NaN NaN 7.5 -7.5 45.0 45.0 -114.0 103.0 -106.0 -106.0
10 home 0 Kansas City Chiefs 1 NaN 1327265828 matchup NaN NaN 6.0 -6.0 52.5 52.5 102.0 -112.0 -106.0 -106.0
11 away 0 Cleveland Browns 0 NaN 1327265828 matchup NaN NaN 6.0 -6.0 52.5 52.5 102.0 -112.0 -106.0 -106.0
12 home 0 Carolina Panthers 1 NaN 1327265337 matchup NaN NaN 4.0 -4.0 43.0 43.0 -105.0 -105.0 -106.0 -106.0
13 away 0 New York Jets 0 NaN 1327265337 matchup NaN NaN 4.0 -4.0 43.0 43.0 -105.0 -105.0 -106.0 -106.0
14 home 0 Cincinnati Bengals 1 NaN 1327265711 matchup NaN NaN -3.5 3.5 48.0 48.0 -105.0 -105.0 -106.0 -106.0
15 away 0 Minnesota Vikings 0 NaN 1327265711 matchup NaN NaN -3.5 3.5 48.0 48.0 -105.0 -105.0 -106.0 -106.0
16 home 0 New Orleans Saints 1 NaN 1327266000 matchup NaN NaN -2.5 2.5 50.0 50.0 -118.0 107.0 -106.0 -106.0
17 away 0 Green Bay Packers 0 NaN 1327266000 matchup NaN NaN -2.5 2.5 50.0 50.0 -118.0 107.0 -106.0 -106.0
18 home 0 Buffalo Bills 1 NaN 1327265283 matchup NaN NaN 7.0 -7.0 50.0 50.0 -116.0 105.0 -106.0 -106.0
19 away 0 Pittsburgh Steelers 0 NaN 1327265283 matchup NaN NaN 7.0 -7.0 50.0 50.0 -116.0 105.0 -106.0 -106.0
20 home 0 Tennessee Titans 1 NaN 1327265444 matchup NaN NaN 3.0 -3.0 51.0 51.0 -102.0 -108.0 -116.0 104.0
21 away 0 Arizona Cardinals 0 NaN 1327265444 matchup NaN NaN 3.0 -3.0 51.0 51.0 -102.0 -108.0 -116.0 104.0
22 home 0 New York Giants 1 NaN 1327265931 matchup NaN NaN -1.0 1.0 42.5 42.5 -110.0 100.0 -106.0 -106.0
23 away 0 Denver Broncos 0 NaN 1327265931 matchup NaN NaN -1.0 1.0 42.5 42.5 -110.0 100.0 -106.0 -106.0
24 home 0 Atlanta Falcons 1 NaN 1327265598 matchup NaN NaN 3.5 -3.5 48.0 48.0 -108.0 -102.0 -106.0 -105.0
25 away 0 Philadelphia Eagles 0 NaN 1327265598 matchup NaN NaN 3.5 -3.5 48.0 48.0 -108.0 -102.0 -106.0 -105.0
26 home 0 Indianapolis Colts 1 NaN 1327265657 matchup NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
27 away 0 Seattle Seahawks 0 NaN 1327265657 matchup NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
28 home 0 New England Patriots 1 NaN 1327265876 matchup NaN NaN 2.5 -2.5 45.5 45.5 104.0 -115.0 103.0 -115.0
29 away 0 Miami Dolphins 0 NaN 1327265876 matchup NaN NaN 2.5 -2.5 45.5 45.5 104.0 -115.0 103.0 -115.0
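The json_normalize call in this answer relies on record_path, meta and meta_prefix. Here is a toy sketch of how those three arguments interact, using a made-up payload shaped only loosely like the matchups response (the real response has more fields). Each participant becomes a row, the listed parent fields are repeated on every row, and meta_prefix keeps the parent's id from clashing with the participant's id.

```python
import pandas as pd

# Made-up payload: two parent records, each with a nested list of participants.
matchups = [
    {"id": 1, "type": "matchup",
     "participants": [{"id": 10, "name": "Home Team", "alignment": "home"},
                      {"id": 11, "name": "Away Team", "alignment": "away"}]},
    {"id": 2, "type": "special",
     "participants": [{"id": 20, "name": "Over", "alignment": "neutral"},
                      {"id": 21, "name": "Under", "alignment": "neutral"}]},
]

df = pd.json_normalize(
    matchups,
    record_path=["participants"],  # one output row per participant
    meta=["id", "type"],           # parent fields copied onto each row
    meta_prefix="participants.",   # prefix for the parent fields, avoiding the 'id' clash
)
print(df)
```

Note the slightly confusing result: the unprefixed id column is the participant's id, while participants.id is the parent (matchup) id, which is why the answer renames both columns right after normalizing.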
Try using html.parser instead of lxml. Also, print your nfl_futures variable to check whether you are getting an HTML page at all.
If you are, check whether the element(s) you are looking for exist in that HTML.
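The check suggested here can be automated: collect every class name present in the HTML that requests actually received and see whether the one you inspected in the browser is among them. Below is a sketch using only the standard library. The two HTML strings are invented stand-ins (an empty JavaScript shell versus a rendered page), not real responses from the site.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears on any tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(html):
    parser = ClassCollector()
    parser.feed(html)
    return parser.classes


# Hypothetical documents: what a script-driven page might send before and after rendering.
shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
rendered = ('<html><body><div id="root">'
            '<span class="style_label__2KJur">Baltimore Ravens</span>'
            '</div></body></html>')

print("style_label__2KJur" in classes_in(shell))     # False
print("style_label__2KJur" in classes_in(rendered))  # True
```

If the class is missing from the downloaded HTML, switching parsers will not help, because the content is added later by JavaScript.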

Getting “no table found” error when web scraping with pandas for OTC Markets screener website

I want to extract various statistics from this website (https://www.otcmarkets.com/research/stock-screener). Unfortunately, pandas does not recognize the tables on the page. Here is my code:
import requests
import pandas as pd
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'}
def Get_table(screen):
    tables = pd.read_html(screen)
    tables.columns = tables.iloc[0]
    return tables
screen = requests.get('https://www.otcmarkets.com/research/stock-screener', headers = header).text
table = Get_table(screen)
ValueError: No tables found
The page loads its data from an external URL. This example shows how to load the data from that API and build a DataFrame:
import json
import requests
import pandas as pd
url = "https://www.otcmarkets.com/research/stock-screener/api"
# The result of .json() is decoded again with json.loads(), which suggests the endpoint returns a JSON-encoded string
data = json.loads(requests.get(url).json())
df = pd.json_normalize(data["stocks"])
Prints:
securityId reportDate symbol securityName market marketId securityType country state forexCountry caveatEmptor industryId industry volume volumeChange dividendYield dividendPayer morningStarRating penny price shortInterest shortInterestPercent shortInterestRatio pct1Day pct5Day pct4Weeks pct13Weeks pct52Weeks isBank perfQxComp4Weeks perfQxComp13Weeks perfQxComp52Weeks perfQxBillion4Weeks perfQxBillion13Weeks perfQxBillion52Weeks perfQxBanks4Weeks perfQxBanks13Weeks perfQxBanks52Weeks perfQxIntl4Weeks perfQxIntl13Weeks perfQxIntl52Weeks perfQxUs4Weeks perfQxUs13Weeks perfQxUs52Weeks perfQb4Weeks perfQb13Weeks perfQb52Weeks perfSp4Weeks perfSp13Weeks perfSp52Weeks perfQxDiv4Weeks perfQxDiv13Weeks perfQxDiv52Weeks perfQxCan4Weeks perfQxCan13Weeks perfQxCan52Weeks
0 117230 Aug 3, 2021 12:00:00 AM MHGU MERITAGE HOSPTLTY GRP INC OTCQX U.S. Premier 1 Common Stock USA Michigan USA False 5812 Eating places 216 0.625623 1.122500 True 3.0 True 21.3800 171.0 100.00 0.000025 -0.003263 -0.003263 -0.049778 -0.021510 0.388312 N -1.934711 -0.346011 1.087740 -1.748822 -0.327350 1.101413 -11.044760 -0.532955 0.755842 -1.887983 -0.284616 1.105790 2.474310 0.094201 0.557874 1.046712 0.172299 1.418802 -6.221993 -0.463678 1.170963 -1.515941 -0.269834 1.011993 0.034056 0.013836 0.026113
1 130262 Aug 3, 2021 12:00:00 AM MHGUP MERITAGE HOSPTLTY PFD B OTCQX U.S. Premier 1 Preferred Stock USA Michigan USA False 5812 Eating places 0 2.984908 2.100000 True NaN True 38.0000 NaN NaN 0.000000 NaN NaN NaN NaN NaN N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 32227 Aug 3, 2021 12:00:00 AM TYCB TAYLOR(CLVN B)BKG BRLN MD OTCQX U.S. Premier 1 Common Stock USA Maryland USA False 6712 Bank holding companies 1 0.867442 3.300000 True 3.0 True 35.1000 NaN NaN 0.000000 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 31499 Aug 3, 2021 12:00:00 AM STBI STURGIS BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Michigan USA False 6035 Federal savings institutions 1000 1.142256 3.355200 True 3.0 True 19.0750 NaN NaN 0.000000 -0.008576 -0.002614 0.003947 -0.046250 -0.046250 Y 0.153422 -0.743970 -0.129556 0.138681 -0.703847 -0.131184 0.875847 -1.145925 -0.090025 0.149717 -0.611962 -0.131705 -0.196212 0.202544 -0.066446 -0.083004 0.370465 -0.168987 0.493403 -0.996969 -0.139468 0.120214 -0.580178 -0.120534 -0.002701 0.029750 -0.003110
4 27295 Aug 3, 2021 12:00:00 AM PSBP PSB HOLDING CORP OTCQX U.S. Premier 1 Common Stock USA Maryland USA False 6022 State commercial banks 0 5.595744 0.645856 False 3.0 True 27.8700 19.0 -96.31 0.000012 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 24830 Aug 3, 2021 12:00:00 AM OCBI ORANGE CTY BNCRP INC OTCQX U.S. Premier 1 Common Stock USA New York USA False 6712 Bank holding companies 5 0.109266 2.352900 True 3.0 True 34.0000 200.0 100.00 0.000045 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 20776 Aug 3, 2021 12:00:00 AM MNBP MARS BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6021 National commercial banks 139 1.306208 3.084700 True 3.0 True 20.7475 NaN NaN 0.000000 0.004722 0.037375 -0.090022 -0.949396 -0.943158 Y -3.498879 -15.271837 -2.641976 -3.162702 -14.448208 -2.675186 -19.974187 -23.522956 -1.835841 -3.414373 -12.562040 -2.685816 4.474732 4.157719 -1.355001 1.892953 7.604722 -3.446082 -11.252328 -20.465275 -2.844113 -2.741543 -11.909604 -2.457995 0.061590 0.610687 -0.063426
7 83455 Aug 3, 2021 12:00:00 AM MNAT MARQUETTE NATL CORP OTCQX U.S. Premier 1 Common Stock USA Illinois USA False 6022 State commercial banks 0 0.978290 2.934800 True 3.0 True 36.8000 49.0 68.97 0.000011 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 18948 Aug 3, 2021 12:00:00 AM KISB KISH BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6712 Bank holding companies 173 0.440998 3.411800 True 3.0 True 34.0000 2.0 0.00 0.000001 0.008005 0.011905 0.035954 0.054264 0.387755 Y 1.397410 0.872875 1.086181 1.263146 0.825800 1.099834 7.977452 1.344475 0.754759 1.363660 0.717994 1.104205 -1.787155 -0.237638 0.557074 -0.756023 -0.434654 1.416768 4.494046 1.169710 1.169285 1.094940 0.680704 1.010542 -0.024598 -0.034904 0.026076
9 266615 Aug 3, 2021 12:00:00 AM KCLI KANSAS CITY LIFE INS NEW OTCQX U.S. Premier 1 Common Stock USA Missouri USA False 6311 Life insurance 106 0.210918 2.494200 True 3.0 True 43.3000 719.0 0.00 0.000074 0.000000 0.000000 -0.029148 -0.048352 0.503472 N -1.132893 -0.777777 1.410328 -1.024044 -0.735830 1.428056 -6.467393 -1.197997 0.980000 -1.105531 -0.639770 1.433731 1.448862 0.211748 0.723321 0.612915 0.387300 1.839572 -3.643364 -1.042273 1.518232 -0.887678 -0.606542 1.312116 0.019942 0.031102 0.033858
10 12485 Aug 3, 2021 12:00:00 AM FBAK FIRST NB ALASKA OTCQX U.S. Premier 1 Common Stock USA Alaska USA False 6021 National commercial banks 360 1.697465 5.493600 True 3.0 True 233.0000 62.0 37.78 0.000020 -0.004274 0.040179 0.000000 -0.046879 0.308989 Y NaN -0.754085 0.865540 NaN -0.713417 0.876420 NaN -1.161505 0.601442 NaN -0.620282 0.879903 NaN 0.205298 0.443913 NaN 0.375502 1.128974 NaN -1.010525 0.931763 NaN -0.588067 0.805266 NaN 0.030154 0.020779
11 11749 Aug 3, 2021 12:00:00 AM FETM FENTURA FINANCIAL INC OTCQX U.S. Premier 1 Common Stock USA Michigan USA False 6022 State commercial banks 19144 0.671837 1.230000 True 3.0 True 26.0000 185.0 -86.89 0.000040 -0.009524 0.006581 0.000000 0.037924 0.477273 Y NaN 0.610042 1.336938 NaN 0.577141 1.353744 NaN 0.939637 0.929004 NaN 0.501797 1.359123 NaN -0.166082 0.685681 NaN -0.303775 1.743845 NaN 0.817497 1.439227 NaN 0.475736 1.243837 NaN -0.024394 0.032096
12 10994 Aug 3, 2021 12:00:00 AM ENBP ENB FINANCIAL CORP PA OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6021 National commercial banks 20 0.517902 2.912200 True 3.0 True 23.3500 NaN NaN 0.000000 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 8482 Aug 3, 2021 12:00:00 AM CNIG CORNING NATURAL GAS HLDG OTCQX U.S. Premier 1 Common Stock USA New York USA False 4923 Gas transmission and distribution 260 0.173180 2.510000 True 3.0 True 24.3000 1.0 0.00 0.000000 0.019723 0.022727 0.023158 0.031847 0.494465 N 0.900077 0.512288 1.385097 0.813596 0.484660 1.402508 5.138305 0.789068 0.962468 0.878338 0.421389 1.408081 -1.151112 -0.139469 0.710380 -0.486957 -0.255097 1.806662 2.894630 0.686500 1.491071 0.705254 0.399503 1.288642 -0.015844 -0.020485 0.033252
14 6722 Aug 3, 2021 12:00:00 AM CBAF CITBA FINANCIAL CORP OTCQX U.S. Premier 1 Common Stock USA Indiana USA False 6712 Bank holding companies 0 0.317975 2.142900 True 3.0 True 28.0000 3.0 0.00 0.000002 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
15 5489 Aug 3, 2021 12:00:00 AM CPTP CAPITAL PROPERTIES INC A OTCQX U.S. Premier 1 Common Stock USA Rhode Island USA False 6519 Lessors of Real Property, NEC 0 1.535617 2.002900 True 3.0 True 13.9800 NaN NaN 0.000000 NaN NaN NaN NaN NaN N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
16 107523 Aug 3, 2021 12:00:00 AM BHWB BLACKHAWK BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Wisconsin USA False 6022 State commercial banks 200 3.166295 1.239400 True 3.0 True 35.5000 116.0 24.73 0.000041 0.000000 0.014286 0.021583 0.109375 0.783920 Y 0.838855 1.759389 2.195918 0.758257 1.664503 2.223521 4.788806 2.709957 1.525887 0.818595 1.447207 2.232357 -1.072816 -0.478989 1.126229 -0.453835 -0.876100 2.864263 2.697743 2.357698 2.363928 0.657284 1.372043 2.043000 -0.014766 -0.070354 0.052718
17 1888 Aug 3, 2021 12:00:00 AM CFNB CALIFORNIA FIRST LEASING OTCQX U.S. Premier 1 Common Stock USA California USA False 6172 Finance Lessors 0 0.095971 2.918900 True 3.0 True 18.5000 1002.0 0.00 0.000097 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
18 53651 Aug 3, 2021 12:00:00 AM BNCC BNCCORP INC OTCQX U.S. Premier 1 Common Stock USA North Dakota USA False 6021 National commercial banks 623 0.444949 0.000000 True 3.0 True 39.2500 103.0 0.00 0.000029 0.006410 0.037265 0.012903 0.014212 0.365217 Y 0.501509 0.228610 1.023048 0.453324 0.216281 1.035908 2.862985 0.352124 0.710890 0.489397 0.188046 1.040024 -0.641382 -0.062239 0.524695 -0.271325 -0.113838 1.334421 1.612844 0.306353 1.101322 0.392957 0.178280 0.951806 -0.008828 -0.009142 0.024560
19 7590 Aug 3, 2021 12:00:00 AM CNAF COMML NATL FINCL CORP PA OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6022 State commercial banks 3100 0.671460 9.951200 True 3.0 True 20.5000 378.0 -70.14 0.000132 0.000000 0.014851 0.006382 0.072737 0.138889 Y 0.248046 1.170032 0.389056 0.224214 1.106931 0.393947 1.416032 1.802181 0.270345 0.242055 0.962425 0.395512 -0.317228 -0.318538 0.199537 -0.134197 -0.582626 0.507468 0.797712 1.567921 0.418823 0.194356 0.912439 0.361963 -0.004366 -0.046787 0.009340
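The answer decodes the response twice (requests' .json() followed by json.loads()), which only works if the endpoint wraps its JSON inside a JSON string. That behavior is inferred from the code, not verified here. Below is a small standard-library sketch of a decoder that tolerates both the wrapped and the plain case. The helper name and the sample payload are made up.

```python
import json


def decode_nested_json(text):
    """Decode JSON text, decoding again while the result is still a string."""
    value = json.loads(text)
    while isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # an ordinary string, not another layer of encoded JSON
    return value


payload = {"stocks": [{"symbol": "MHGU", "price": 21.38}]}
plain = json.dumps(payload)    # normal JSON
wrapped = json.dumps(plain)    # JSON encoded a second time, as a string

print(decode_nested_json(plain) == payload)    # True
print(decode_nested_json(wrapped) == payload)  # True
```

With requests, this would be applied to response.text in place of the nested json.loads(response.json()) call. A string that happens to contain a bare number or literal would also be decoded, which is an accepted limitation of this sketch.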

How to recalculate values row by row in pandas data frame when multiple groupby() and shift() is necessary?

I have a multiindex dataframe. There is a column in it - Shares - that should be calculated row by row, based on Equity column values from previous index.
I tried defining a function that I could apply() to the data frame row by row, but I realized that I can use neither groupby() nor shift() with this method.
I created the dataframe:
import pandas as pd
import numpy as np
date_index = pd.date_range(start='1/1/2019', end='1/3/2019')
symbol_index = ['AAPL','BOA','GE','MSFT']
idx = pd.MultiIndex.from_product([date_index, symbol_index], names=['Date', 'Symbol'])
col = ['Price', 'Shares', 'Profit','Total_Profit', 'Equity']
data = pd.DataFrame(index=idx,columns=col)
price_list = [46, 17, 56, 66, 54, 79, 33, 63, 60, 63, 39, 26]
data['Price'] = price_list
My initial dataframe looks like this:
Price Shares Profit Total_Profit Equity
Date Symbol
2019-01-01 AAPL 46 NaN NaN NaN NaN
BOA 17 NaN NaN NaN NaN
GE 56 NaN NaN NaN NaN
MSFT 66 NaN NaN NaN NaN
2019-01-02 AAPL 54 NaN NaN NaN NaN
BOA 79 NaN NaN NaN NaN
GE 33 NaN NaN NaN NaN
MSFT 63 NaN NaN NaN NaN
2019-01-03 AAPL 60 NaN NaN NaN NaN
BOA 63 NaN NaN NaN NaN
GE 39 NaN NaN NaN NaN
MSFT 26 NaN NaN NaN NaN
I need these variables:
starting_capital = 5000
risk_per_position = 0.1
And I defined the columns:
data['Shares'] = data.groupby('Symbol')['Equity'].shift(1).fillna(starting_capital) * risk_per_position / data['Price']
data['Shares'] = round(data['Shares'],0)
data['Profit'] = data['Shares'] * data['Price']
data['Total_Profit'] = data.groupby(by=['Date','Symbol'])['Profit'].sum().groupby('Date').cumsum().groupby('Date').tail(1).cumsum()
data['Total_Profit'] = data['Total_Profit'].bfill()
data['Equity'] = starting_capital + data['Total_Profit']
data['previous equity'] = data.groupby('Symbol')['Equity'].shift(1).fillna(starting_capital)
Shares at date_index - and consequently Profit, Total_Profit and Equity as well - should be calculated based on Equity value at previous_date_index. However, it is now always calculated based on starting_capital and the output is:
Price Shares Profit Total_Profit Equity
Date Symbol
2019-01-01 AAPL 46 11.0 506.0 2031.0 7031.0
BOA 17 29.0 493.0 2031.0 7031.0
GE 56 9.0 504.0 2031.0 7031.0
MSFT 66 8.0 528.0 2031.0 7031.0
2019-01-02 AAPL 54 9.0 486.0 3990.0 8990.0
BOA 79 6.0 474.0 3990.0 8990.0
GE 33 15.0 495.0 3990.0 8990.0
MSFT 63 8.0 504.0 3990.0 8990.0
2019-01-03 AAPL 60 8.0 480.0 5975.0 10975.0
BOA 63 8.0 504.0 5975.0 10975.0
GE 39 13.0 507.0 5975.0 10975.0
MSFT 26 19.0 494.0 5975.0 10975.0
And the output should be:
Price Shares Profit Total_Profit Equity
Date Symbol
2019-01-01 AAPL 46 11.0 506.0 2031.0 7031.0
BOA 17 29.0 493.0 2031.0 7031.0
GE 56 9.0 504.0 2031.0 7031.0
MSFT 66 8.0 528.0 2031.0 7031.0
2019-01-02 AAPL 54 13.0 702.0 4830.0 9830.0
BOA 79 9.0 711.0 4830.0 9830.0
GE 33 21.0 693.0 4830.0 9830.0
MSFT 63 11.0 693.0 4830.0 9830.0
2019-01-03 AAPL 60 16.0 960.0 8761.0 13761.0
BOA 63 16.0 1008.0 8761.0 13761.0
GE 39 25.0 975.0 8761.0 13761.0
MSFT 26 38.0 988.0 8761.0 13761.0
I would appreciate your help. What is the correct formula for column Shares in this case?
data['Shares'] = (data['Equity'].shift(-1).groupby('Symbol').fillna(starting_capital)
                  * risk_per_position / data['Price'])
Try shifting the 'Equity' column by -1 and then performing the groupby.
I found a solution to my question. This will do the trick:
def portfolio_calc(row):
    global starting_capital
    row['Shares'] = starting_capital * risk_per_position / row['Price']
    row['Shares'] = round(row['Shares'].astype(float), 0)
    row['Profit'] = row['Shares'] * row['Price']
    row['Total_Profit'] = row['Profit'].sum()
    row['Equity'] = starting_capital + row['Total_Profit']
    starting_capital += row['Profit'].sum()
    return row
data = data.groupby('Date').apply(portfolio_calc)
The only difference we have here is that the output at Total_Profit will include the sum of Profit for the given date but not the cumulative sum of Profits for all dates.
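The groupby/apply solution keeps its running state in a global variable and writes the per-date profit into Total_Profit. Below is a sketch of an explicit loop over dates that carries the cumulative profit forward instead. On the three sample dates it reproduces the expected Equity values from the question (7031, 9830, 13761). Variable names that do not appear in the question are made up.

```python
import pandas as pd

dates = pd.date_range("2019-01-01", "2019-01-03")
symbols = ["AAPL", "BOA", "GE", "MSFT"]
idx = pd.MultiIndex.from_product([dates, symbols], names=["Date", "Symbol"])
data = pd.DataFrame(
    {"Price": [46, 17, 56, 66, 54, 79, 33, 63, 60, 63, 39, 26]}, index=idx
)

starting_capital = 5000
risk_per_position = 0.1

equity = starting_capital  # equity available when sizing the current date's positions
total_profit = 0.0         # cumulative profit over all dates processed so far
pieces = []
for _, day in data.groupby(level="Date"):  # dates are visited in sorted order
    day = day.copy()
    day["Shares"] = (equity * risk_per_position / day["Price"]).round(0)
    day["Profit"] = day["Shares"] * day["Price"]
    total_profit += day["Profit"].sum()
    equity = starting_capital + total_profit  # feeds the next date's position sizes
    day["Total_Profit"] = total_profit
    day["Equity"] = equity
    pieces.append(day)

result = pd.concat(pieces)
print(result)
```

Each date depends on the equity produced by the previous one, so the calculation is inherently sequential. A loop over dates states that directly, while the work inside each date stays vectorized across symbols.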
