How to duplicate rows on DataFrame based on most recent row date - python

My data looks something like this:
Report Date  Location    Data
8/6/2021     St. Louis   100
8/1/2021     St. Louis    89
7/29/2021    St. Louis    85
7/24/2021    St. Louis    80
7/30/2021    Louisville   92
7/25/2021    Louisville   79
But when I plot the data in plotly using the built-in animation_frame and animation_group arguments, the slider jumps from row to row, which doesn't make for an intuitive animation when the gap between reports is not the same number of days.
What I'm trying to do as a workaround is create a new table that duplicates rows: it keeps the true report data but adds an 'Animation Date' column so the slider transitions evenly. I'd like the new table to look something like the below. Assume the code was run on 8/6/2021.
Report Date  Animation Date  Location    Data  Days Since Most Recent Report
8/6/2021     8/6/2021        St. Louis   100   0
8/1/2021     8/5/2021        St. Louis    89   4
8/1/2021     8/4/2021        St. Louis    89   3
8/1/2021     8/3/2021        St. Louis    89   2
8/1/2021     8/2/2021        St. Louis    89   1
8/1/2021     8/1/2021        St. Louis    89   0
7/29/2021    7/30/2021       St. Louis    85   1
7/29/2021    7/29/2021       St. Louis    85   0
7/24/2021    7/28/2021       St. Louis    80   4
7/24/2021    7/27/2021       St. Louis    80   3
7/24/2021    7/26/2021       St. Louis    80   2
7/24/2021    7/25/2021       St. Louis    80   1
7/24/2021    7/24/2021       St. Louis    80   0
7/30/2021    8/6/2021        Louisville   92   7
7/30/2021    8/5/2021        Louisville   92   6
7/30/2021    8/4/2021        Louisville   92   5
7/30/2021    8/3/2021        Louisville   92   4
7/30/2021    8/2/2021        Louisville   92   3
7/30/2021    8/1/2021        Louisville   92   2
7/30/2021    7/31/2021       Louisville   92   1
7/30/2021    7/30/2021       Louisville   92   0
7/25/2021    7/29/2021       Louisville   79   4
7/25/2021    7/28/2021       Louisville   79   3
7/25/2021    7/27/2021       Louisville   79   2
7/25/2021    7/26/2021       Louisville   79   1
7/25/2021    7/25/2021       Louisville   79   0
By doing this, the animation could display 'Days Since Most Recent Report' or 'Report Date' to show that some of the data on screen has aged, while the animation still traverses time evenly and always has data to display. Each time the 'Animation Date' matches a 'Report Date', new data appears and carries forward for each subsequent 'Animation Date' until the next 'Report Date' is hit, and the cycle repeats until the animation reaches the present day.
If there is an easier way to work around this in plotly, please let me know! Otherwise, I'm having trouble getting off the ground with the logic for creating the new DataFrame while iterating through the old one.

IIUC you can reindex through pd.MultiIndex.from_tuples:
df["Animation Date"] = pd.to_datetime(df["Report Date"])
max_date = df["Animation Date"].max()  # take the max of the parsed dates, not the strings
idx = pd.MultiIndex.from_tuples(
    [(x, d)
     for x, y in df.groupby("Location")["Animation Date"]
     for d in pd.date_range(y.min(), max_date)],
    names=["Location", "Animation Date"],
)
s = df.set_index(["Location", "Animation Date"]).reindex(idx).reset_index()
s["Days Since"] = s.groupby(["Location", s["Data"].notnull().cumsum()]).cumcount()
print(s.ffill())
Location Animation Date Report Date Data Days Since
0 Louisville 2021-07-25 7/25/2021 79.0 0
1 Louisville 2021-07-26 7/25/2021 79.0 1
2 Louisville 2021-07-27 7/25/2021 79.0 2
3 Louisville 2021-07-28 7/25/2021 79.0 3
4 Louisville 2021-07-29 7/25/2021 79.0 4
5 Louisville 2021-07-30 7/30/2021 92.0 0
6 Louisville 2021-07-31 7/30/2021 92.0 1
7 Louisville 2021-08-01 7/30/2021 92.0 2
8 Louisville 2021-08-02 7/30/2021 92.0 3
9 Louisville 2021-08-03 7/30/2021 92.0 4
10 Louisville 2021-08-04 7/30/2021 92.0 5
11 Louisville 2021-08-05 7/30/2021 92.0 6
12 Louisville 2021-08-06 7/30/2021 92.0 7
13 St. Louis 2021-07-24 7/24/2021 80.0 0
14 St. Louis 2021-07-25 7/24/2021 80.0 1
15 St. Louis 2021-07-26 7/24/2021 80.0 2
16 St. Louis 2021-07-27 7/24/2021 80.0 3
17 St. Louis 2021-07-28 7/24/2021 80.0 4
18 St. Louis 2021-07-29 7/29/2021 85.0 0
19 St. Louis 2021-07-30 7/29/2021 85.0 1
20 St. Louis 2021-07-31 7/29/2021 85.0 2
21 St. Louis 2021-08-01 8/1/2021 89.0 0
22 St. Louis 2021-08-02 8/1/2021 89.0 1
23 St. Louis 2021-08-03 8/1/2021 89.0 2
24 St. Louis 2021-08-04 8/1/2021 89.0 3
25 St. Louis 2021-08-05 8/1/2021 89.0 4
26 St. Louis 2021-08-06 8/6/2021 100.0 0
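For reference, the answer's approach can be reproduced end to end; the sketch below rebuilds the question's sample table inline so it runs self-contained:

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Report Date": ["8/6/2021", "8/1/2021", "7/29/2021", "7/24/2021",
                    "7/30/2021", "7/25/2021"],
    "Location": ["St. Louis"] * 4 + ["Louisville"] * 2,
    "Data": [100, 89, 85, 80, 92, 79],
})

df["Animation Date"] = pd.to_datetime(df["Report Date"])
max_date = df["Animation Date"].max()

# One (Location, date) pair for every day from each location's first
# report up to the most recent report date overall.
idx = pd.MultiIndex.from_tuples(
    [(loc, d)
     for loc, dates in df.groupby("Location")["Animation Date"]
     for d in pd.date_range(dates.min(), max_date)],
    names=["Location", "Animation Date"],
)

s = df.set_index(["Location", "Animation Date"]).reindex(idx).reset_index()
# Each non-null Data value starts a new "report block"; cumcount numbers
# the days inside each block, giving "Days Since Most Recent Report".
s["Days Since"] = s.groupby(["Location", s["Data"].notnull().cumsum()]).cumcount()
out = s.ffill()
```

The result has 27 rows (13 for Louisville, 14 for St. Louis), matching the printed output above, and `out` can be fed straight to plotly with animation_frame="Animation Date".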

Related

How Do I Merge DFs with a for loop

Below is my code. What I want to do is merge the spread and total values for each week, which I have saved in separate files. It works perfectly for individual weeks, but not when I introduce the for loop. I assume it's overwriting each time it merges, but when I place the .merge code outside the for loop, only the last iteration is written to the Excel file.
year = 2015
weeks = np.arange(1, 18)
for week in weeks:
    odds = pd.read_excel(fr'C:\Users\logan\Desktop\Gambling_Scraper\Odds_{year}\Odds{year}Wk{week}.xlsx')
    odds['Favorite'] = odds['Favorite'].map(lambda x: x.lstrip('at '))
    odds['Underdog'] = odds['Underdog'].map(lambda x: x.lstrip('at '))
    odds['UD_Spread'] = odds['Spread'] * -1
    # new df to add spread
    new_df = pd.DataFrame(odds['Favorite'].append(odds['Underdog']))
    new_df['Tm'] = new_df
    new_df['Wk'] = new_df['Tm'] + str(week)
    new_df['Spread'] = odds['Spread'].append(odds['UD_Spread'])
    # new df to add total
    total_df = pd.DataFrame(odds['Favorite'].append(odds['Underdog']))
    total_df['Tm'] = total_df
    total_df['Wk'] = total_df['Tm'] + str(week)
    total_df['Total'] = pd.DataFrame(odds['Total'].append(odds['Total']))
    df['Week'] = df['Week'].astype(int)
    df['Merge'] = df['Tm'].astype(str) + df['Week'].astype(str)
    df = df.merge(new_df['Spread'], left_on='Merge', right_on=new_df['Wk'], how='left')
    df = df.merge(total_df['Total'], left_on='Merge', right_on=total_df['Wk'], how='left')
df['Implied Tm Pts'] = df['Total'].astype(float) / 2 - df['Spread'].astype(float) / 2
df.to_excel('DFS2015.xlsx')
What I get:
Name Position Week Tm Merge Spread Total Implied Tm Pts
Devonta Freeman RB 1 Falcons Falcons1 3 55 26
Devonta Freeman RB 2 Falcons Falcons2
Devonta Freeman RB 3 Falcons Falcons3
Devonta Freeman RB 4 Falcons Falcons4
Devonta Freeman RB 5 Falcons Falcons5
Devonta Freeman RB 6 Falcons Falcons6
Devonta Freeman RB 7 Falcons Falcons7
Devonta Freeman RB 8 Falcons Falcons8
Devonta Freeman RB 9 Falcons Falcons9
Devonta Freeman RB 11 Falcons Falcons11
Devonta Freeman RB 13 Falcons Falcons13
Devonta Freeman RB 14 Falcons Falcons14
Devonta Freeman RB 15 Falcons Falcons15
Devonta Freeman RB 16 Falcons Falcons16
Devonta Freeman RB 17 Falcons Falcons17
Antonio Brown WR 1 Steelers Steelers1 7 51 22
But I need a value in each row.
Trying to merge 'Spread' and 'Total' from this data:
Date  Favorite  Spread  Underdog  Spread2  Total  Away Money Line  Home Money Line  Week  Favs  Spread  Uds  Spread2
September 10, 2015 8:30 PM Patriots -7.0 Steelers 7 51.0 +270 -340 1 Patriots1 -7.0 Steelers1 7
September 13, 2015 1:00 PM Packers -6.0 Bears 6 48.0 -286 +230 1 Packers1 -6.0 Bears1 6
September 13, 2015 1:00 PM Chiefs -1.0 Texans 1 40.0 -115 -105 1 Chiefs1 -1.0 Texans1 1
September 13, 2015 1:00 PM Jets -4.0 Browns 4 40.0 +170 -190 1 Jets1 -4.0 Browns1 4
September 13, 2015 1:00 PM Colts -1.0 Bills 1 44.0 -115 -105 1 Colts1 -1.0 Bills1 1
September 13, 2015 1:00 PM Dolphins -4.0 Football Team 4 46.0 -210 +175 1 Dolphins1 -4.0 Football Team1 4
September 13, 2015 1:00 PM Panthers -3.0 Jaguars 3 41.0 -150 +130 1 Panthers1 -3.0 Jaguars1 3
September 13, 2015 1:00 PM Seahawks -4.0 Rams 4 42.0 -185 +160 1 Seahawks1 -4.0 Rams1 4
September 13, 2015 4:05 PM Cardinals -2.0 Saints 2 49.0 +120 -140 1 Cardinals1 -2.0 Saints1 2
September 13, 2015 4:05 PM Chargers -4.0 Lions 4 46.0 +160 -180 1 Chargers1 -4.0 Lions1 4
September 13, 2015 4:25 PM Buccaneers -3.0 Titans 3 40.0 +130 -150 1 Buccaneers1 -3.0 Titans1 3
September 13, 2015 4:25 PM Bengals -3.0 Raiders 3 43.0 -154 +130 1 Bengals1 -3.0 Raiders1 3
September 13, 2015 4:25 PM Broncos -4.0 Ravens 4 46.0 +180 -220 1 Broncos1 -4.0 Ravens1 4
September 13, 2015 8:30 PM Cowboys -7.0 Giants 7 52.0 +240 -300 1 Cowboys1 -7.0 Giants1 7
September 14, 2015 7:10 PM Eagles -3.0 Falcons 3 55.0 -188 +150 1 Eagles1 -3.0 Falcons1 3
September 14, 2015 10:20 PM Vikings -2.0 49ers 2 42.0 -142 +120 1 Vikings1 -2.0 49ers1 2
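One way out of the overwrite problem is to build a single long (team, week) lookup table across all weeks inside the loop, then merge it into the player frame once, outside the loop. A minimal sketch of that pattern; the two small frames below are hypothetical stand-ins for the per-week odds files:

```python
import pandas as pd

# Hypothetical stand-ins for two weekly odds files.
week1 = pd.DataFrame({"Favorite": ["Patriots"], "Underdog": ["Steelers"],
                      "Spread": [-7.0], "Total": [51.0]})
week2 = pd.DataFrame({"Favorite": ["Packers"], "Underdog": ["Bears"],
                      "Spread": [-6.0], "Total": [48.0]})

# Accumulate one (Tm, Week, Spread, Total) row per team per week.
rows = []
for week, odds in enumerate([week1, week2], start=1):
    for side, sign in (("Favorite", 1), ("Underdog", -1)):
        rows.append(pd.DataFrame({
            "Tm": odds[side],
            "Week": week,
            "Spread": odds["Spread"] * sign,  # underdog gets the flipped spread
            "Total": odds["Total"],
        }))
lines = pd.concat(rows, ignore_index=True)

# Merge once, on the (Tm, Week) pair instead of a concatenated string key.
players = pd.DataFrame({"Name": ["Antonio Brown", "Antonio Brown"],
                        "Tm": ["Steelers", "Steelers"],
                        "Week": [1, 2]})
merged = players.merge(lines, on=["Tm", "Week"], how="left")
merged["Implied Tm Pts"] = merged["Total"] / 2 - merged["Spread"] / 2
```

Because the lookup table covers every week before the merge happens, each player row picks up its own week's values instead of only the last iteration's.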

Python scraping an expandable table(BeautifulSoup)?

I have an issue that I can't seem to find addressed online.
I want to scrape a table from this website: https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html
This is the table I wanted to scrape:
I was able to scrape it, but it stops at the "Show all" button.
Is there a way for me to expand this table and then scrape it?
Here is my code (it's a mess as I just wrote it, but enough to get the idea):
import requests
import pandas as pd
from bs4 import BeautifulSoup

def connect_add():
    # giving URL a var
    url = 'https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html'
    # sending request to URL
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    tble = soup.find("table", class_="svelte-2wimac")
    table_rows = tble.find_all('tr')
    data = []
    for row in table_rows:
        prepare = []
        for td in row.find_all('td'):
            prepare.append(td.text)
        data.append(prepare)
    df_side = pd.DataFrame(data)
    display(df_side.head(50))

connect_add()
The data is loaded from an external source, so expanding the rendered table won't help. You can use this example to load it directly:
import pandas as pd

df = pd.read_json(
    "https://static01.nyt.com/newsgraphics/2021/01/19/world-vaccinations-tracker/3bf66651fd690992142ef2a7e233e8fdedcdd6c5/latest.json"
)
print(df)
Prints:
geoid location last_updated total_vaccinations population people_vaccinated people_fully_vaccinated
0 DZA Algeria 2021-02-19 75000 43053054 NaN NaN
1 MOZ Mozambique 2021-03-23 57305 30366036 57305.0 NaN
2 CPV Cape Verde 2021-03-24 2184 549935 2184.0 NaN
3 MUS Mauritius 2021-03-24 117323 1265711 117323.0 NaN
4 STP Sao Tome and Principe 2021-03-29 9724 215056 9724.0 NaN
5 ARM Armenia 2021-03-31 565 2957731 565.0 NaN
6 MMR Myanmar 2021-03-31 1040000 54045420 1000000.0 40000.0
7 SYR Syria 2021-04-08 2500 17070135 2500.0 NaN
8 HND Honduras 2021-04-09 57639 9746117 55000.0 2639.0
9 TCA Turks and Caicos Islands 2021-04-11 25039 38191 15039.0 10000.0
10 VEN Venezuela 2021-04-12 250000 28515829 250000.0 NaN
11 JAM Jamaica 2021-04-13 135473 2948279 135473.0 NaN
12 COG Congo 2021-04-14 14297 5380508 14297.0 NaN
13 FLK Falkland Islands 2021-04-14 4407 3398 2632.0 1775.0
14 TLS Timor 2021-04-14 2629 1293119 2629.0 NaN
15 NRU Nauru 2021-04-15 700 12581 700.0 NaN
16 SSD South Sudan 2021-04-15 947 11062113 947.0 NaN
17 FJI Fiji 2021-04-16 56000 889953 56000.0 NaN
18 DJI Djibouti 2021-04-17 10246 973560 10246.0 NaN
19 LSO Lesotho 2021-04-17 16000 2125268 16000.0 NaN
20 LBY Libya 2021-04-17 750 6777452 750.0 NaN
21 NER Niger 2021-04-17 1366 23310715 1366.0 NaN
22 SOM Somalia 2021-04-17 117567 15442905 117567.0 NaN
23 TGO Togo 2021-04-17 160000 8082366 160000.0 NaN
24 EGY Egypt 2021-04-18 660000 100388073 660000.0 NaN
25 MRT Mauritania 2021-04-18 7038 4525696 7038.0 NaN
26 SGP Singapore 2021-04-18 2213888 5703569 1364124.0 849764.0
27 COM Comoros 2021-04-21 13440 850886 13440.0 NaN
28 MSR Montserrat 2021-04-21 1909 5900 1293.0 616.0
29 AFG Afghanistan 2021-04-22 240000 38041754 240000.0 NaN
30 AIA Anguilla 2021-04-22 6898 14731 6115.0 783.0
31 ATG Antigua and Barbuda 2021-04-22 29754 97118 29754.0 NaN
32 MCO Monaco 2021-04-22 24390 38964 12758.0 11632.0
33 AGO Angola 2021-04-23 456349 31825295 456349.0 NaN
34 BLR Belarus 2021-04-23 328500 9466856 244000.0 84500.0
35 BRN Brunei 2021-04-23 10715 433285 10715.0 NaN
36 GAB Gabon 2021-04-23 8897 2172579 6895.0 2002.0
37 IRQ Iraq 2021-04-23 298377 39309783 298377.0 NaN
38 SDN Sudan 2021-04-23 140227 42813238 140227.0 NaN
39 GMB Gambia 2021-04-24 20922 2347706 20922.0 NaN
40 NIC Nicaragua 2021-04-24 135130 6545502 135130.0 NaN
41 COD Democratic Republic of Congo 2021-04-25 1700 86790567 1700.0 NaN
42 SWZ Eswatini 2021-04-25 34897 1148130 34897.0 NaN
43 MLI Mali 2021-04-25 49903 19658031 49903.0 NaN
44 PSE Palestine 2021-04-25 213989 4685306 170109.0 43880.0
45 PNG Papua New Guinea 2021-04-25 2900 8776109 2900.0 NaN
46 GUY Guyana 2021-04-26 126800 782766 124000.0 2800.0
47 LAO Laos 2021-04-26 184387 7169455 126072.0 58315.0
48 TON Tonga 2021-04-26 5367 104494 5367.0 NaN
49 BHS Bahamas 2021-04-27 25692 389482 25692.0 NaN
50 BIH Bosnia and Herzegovina 2021-04-27 106464 3301000 83260.0 23204.0
51 SLB Solomon Islands 2021-04-27 4890 669823 4890.0 NaN
52 UZB Uzbekistan 2021-04-27 600369 33580650 600369.0 NaN
53 GNQ Equatorial Guinea 2021-04-28 75518 1355986 64646.0 10872.0
54 KEN Kenya 2021-04-28 853081 52573973 853081.0 NaN
55 KGZ Kyrgyzstan 2021-04-28 27858 6456900 27000.0 858.0
56 CMR Cameroon 2021-04-29 11000 25876380 11000.0 NaN
57 BWA Botswana 2021-04-30 49882 2303697 49882.0 NaN
58 GHA Ghana 2021-04-30 849527 30417856 849527.0 NaN
59 VNM Vietnam 2021-04-30 509855 96462106 509855.0 NaN
60 VCT Saint Vincent and the Grenadines 2021-05-01 14526 110589 NaN NaN
61 BMU Bermuda 2021-05-02 58193 63918 32877.0 25216.0
62 NLD Netherlands 2021-05-02 5651843 17332850 4448730.0 NaN
63 PRY Paraguay 2021-05-02 143441 7044636 131013.0 12428.0
64 AND Andorra 2021-05-03 28881 77142 24182.0 4699.0
65 BOL Bolivia 2021-05-03 878563 11513100 637694.0 240869.0
66 CRI Costa Rica 2021-05-03 950252 5047561 605099.0 345153.0
67 WSM Samoa 2021-05-03 7435 197097 NaN NaN
68 SYC Seychelles 2021-05-03 127721 97625 68045.0 59676.0
69 JOR Jordan 2021-05-04 1091048 10101694 805020.0 286028.0
70 NZL New Zealand 2021-05-04 304900 4917000 217603.0 87297.0
71 KNA Saint Kitts and Nevis 2021-05-04 13070 52834 12943.0 127.0
72 ETH Ethiopia 2021-05-05 1215934 112078730 NaN NaN
73 LIE Liechtenstein 2021-05-05 13829 38019 9645.0 4184.0
74 MLT Malta 2021-05-05 359429 502653 246698.0 112731.0
75 OMN Oman 2021-05-05 326269 4974986 253000.0 73269.0
76 CHE Switzerland 2021-05-05 3001029 8574832 1997717.0 1003312.0
77 CYP Cyprus 2021-05-06 332423 1198575 252792.0 79631.0
78 SLV El Salvador 2021-05-06 1114544 6453553 958828.0 155716.0
79 GRD Grenada 2021-05-06 17000 112003 13000.0 4000.0
80 KWT Kuwait 2021-05-06 1440000 4207083 NaN NaN
81 LBN Lebanon 2021-05-06 509705 6855713 325383.0 184322.0
82 LUX Luxembourg 2021-05-06 227314 619896 165376.0 61938.0
83 NOR Norway 2021-05-06 1919369 5347896 1465851.0 453518.0
84 PAK Pakistan 2021-05-06 3320304 216565318 NaN NaN
85 PER Peru 2021-05-06 1939155 32510453 1284692.0 654463.0
86 ESP Spain 2021-05-06 19048132 47076781 13271511.0 5956451.0
87 BLZ Belize 2021-05-07 47675 390353 47675.0 NaN
88 BRA Brazil 2021-05-07 46875460 211049527 31722544.0 15152916.0
89 CYM Cayman Islands 2021-05-07 69772 64948 37470.0 32302.0
90 COL Colombia 2021-05-07 6096661 50339443 3861416.0 2235245.0
91 DMA Dominica 2021-05-07 32008 71808 18864.0 13144.0
92 ECU Ecuador 2021-05-07 1245822 17373662 981620.0 264202.0
93 DEU Germany 2021-05-07 34408840 83132799 26872478.0 7572228.0
94 GRL Greenland 2021-05-07 14278 56225 8994.0 5284.0
95 GIN Guinea 2021-05-07 173623 12771246 116436.0 57187.0
96 ISL Iceland 2021-05-07 184304 361313 138577.0 53658.0
97 IRN Iran 2021-05-07 1485287 82913906 1231652.0 253635.0
98 IRL Ireland 2021-05-07 1799190 4941444 1305178.0 494012.0
99 KAZ Kazakhstan 2021-05-07 2158924 18513930 1634939.0 523985.0
100 NAM Namibia 2021-05-07 36417 2494530 34346.0 2071.0
101 NPL Nepal 2021-05-07 2453512 28608710 2091511.0 362001.0
102 RWA Rwanda 2021-05-07 350400 12626950 350400.0 NaN
103 SMR San Marino 2021-05-07 34011 33860 21389.0 12622.0
104 SWE Sweden 2021-05-07 3679451 10285453 2852689.0 826762.0
105 UGA Uganda 2021-05-07 395805 44269594 395805.0 NaN
106 ALB Albania 2021-05-08 596766 2854191 NaN NaN
107 ABW Aruba 2021-05-08 80699 106314 55744.0 24955.0
108 BRB Barbados 2021-05-08 75476 287025 75476.0 NaN
109 BEL Belgium 2021-05-08 4591359 11484055 3527895.0 1084263.0
110 BTN Bhutan 2021-05-08 481491 763092 481491.0 NaN
111 CHL Chile 2021-05-08 15703842 18952038 8559854.0 7143988.0
112 DNK Denmark 2021-05-08 2339464 5818553 1489198.0 850266.0
113 DOM Dominican Republic 2021-05-08 2345528 10738958 1535083.0 810445.0
114 FIN Finland 2021-05-08 2154469 5520314 1943842.0 210627.0
115 FRA France 2021-05-08 25414386 67059887 17692900.0 7832913.0
116 GEO Georgia 2021-05-08 58533 3720382 58533.0 NaN
117 GIB Gibraltar 2021-05-08 74256 33701 38727.0 35529.0
118 GRC Greece 2021-05-08 3647689 10716322 2450349.0 1197340.0
119 GTM Guatemala 2021-05-08 206951 16604026 204459.0 2492.0
120 MDV Maldives 2021-05-08 431792 530953 300906.0 130886.0
121 MEX Mexico 2021-05-08 21228359 127575529 14148207.0 9440251.0
122 MDA Moldova 2021-05-08 184660 2657637 161266.0 23394.0
123 MAR Morocco 2021-05-08 9864561 36471769 5473809.0 4390752.0
124 POL Poland 2021-05-08 13670541 37970874 10185393.0 3650119.0
125 ROU Romania 2021-05-08 5891855 19356544 3580368.0 2314812.0
126 LCA Saint Lucia 2021-05-08 25200 182790 NaN NaN
127 SEN Senegal 2021-05-08 427377 16296364 427377.0 NaN
128 SLE Sierra Leone 2021-05-08 64966 7813215 58250.0 6716.0
129 SVK Slovakia 2021-05-08 1792674 5454073 1209044.0 583630.0
130 ZAF South Africa 2021-05-08 382480 58558270 382480.0 382480.0
131 SUR Suriname 2021-05-08 90338 581363 45420.0 44918.0
132 TUN Tunisia 2021-05-08 499369 11694719 350426.0 148943.0
133 UKR Ukraine 2021-05-08 863085 44385155 862639.0 446.0
134 GBR United Kingdom 2021-05-08 53041048 66834405 35371669.0 17669379.0
135 ZMB Zambia 2021-05-08 77348 17861030 77348.0 NaN
136 ARG Argentina 2021-05-09 9082597 44938712 7688877.0 1393720.0
137 AUS Australia 2021-05-09 2654338 25364307 NaN NaN
138 AUT Austria 2021-05-09 3632879 8877067 2665516.0 972493.0
139 AZE Azerbaijan 2021-05-09 1687397 10023318 1005678.0 681719.0
140 BHR Bahrain 2021-05-09 1375967 1641172 797181.0 578786.0
141 BGD Bangladesh 2021-05-09 9316086 163046161 5819900.0 3496186.0
142 BGR Bulgaria 2021-05-09 938064 6975761 646068.0 291996.0
143 KHM Cambodia 2021-05-09 2884922 16486542 1773994.0 1110928.0
144 CAN Canada 2021-05-09 15917555 37589262 14668624.0 1248931.0
145 CHN China 2021-05-09 324307000 1397715000 NaN NaN
146 CIV Cote d'Ivoire 2021-05-09 262639 25716544 262639.0 NaN
147 HRV Croatia 2021-05-09 1131607 4067500 879312.0 252295.0
148 CUW Curacao 2021-05-09 109444 157538 77141.0 32303.0
149 CZE Czechia 2021-05-09 3654376 10669709 2610990.0 1058179.0
150 EST Estonia 2021-05-09 532605 1326590 373391.0 159214.0
151 FRO Faeroe Islands 2021-05-09 23519 48678 16896.0 6623.0
152 HKG Hong Kong 2021-05-09 1741682 7451000 1071488.0 670194.0
153 HUN Hungary 2021-05-09 6809350 9769949 4305775.0 2503575.0
154 IND India 2021-05-09 168304868 1366417754 133854676.0 34450192.0
155 IDN Indonesia 2021-05-09 21993299 270625568 13349469.0 8643830.0
156 IMN Isle of Man 2021-05-09 75783 84584 59932.0 15851.0
157 ISR Israel 2021-05-09 10501225 9053300 5422082.0 5079143.0
158 ITA Italy 2021-05-09 24054000 60297396 16823066.0 7401862.0
159 JPN Japan 2021-05-09 4436325 126264931 3277886.0 1158439.0
160 LVA Latvia 2021-05-09 395512 1912789 316665.0 79647.0
161 LTU Lithuania 2021-05-09 1162170 2786844 777019.0 385151.0
162 MAC Macao 2021-05-09 118687 631636 77597.0 41241.0
163 MWI Malawi 2021-05-09 319323 18628747 319323.0 NaN
164 MYS Malaysia 2021-05-09 1766651 31949777 1089637.0 677014.0
165 MNG Mongolia 2021-05-09 2213376 3225167 1590636.0 622740.0
166 MNE Montenegro 2021-05-09 109507 622137 78760.0 30747.0
167 NGA Nigeria 2021-05-09 1665698 200963599 1665698.0 NaN
168 MKD North Macedonia 2021-05-09 107978 2083459 107978.0 NaN
169 PAN Panama 2021-05-09 780569 4246439 524958.0 255610.0
170 PHL Philippines 2021-05-09 2408781 108116615 1957511.0 451270.0
171 PRT Portugal 2021-05-09 3963372 10269417 2858389.0 1104961.0
172 QAT Qatar 2021-05-09 1813240 2832067 1115842.0 697398.0
173 RUS Russia 2021-05-09 21754829 144373535 13129704.0 8625125.0
174 SAU Saudi Arabia 2021-05-09 10584301 34268528 NaN NaN
175 SRB Serbia 2021-05-09 3798942 6944975 2149705.0 1649237.0
176 SVN Slovenia 2021-05-09 737817 2087946 484949.0 252868.0
177 KOR South Korea 2021-05-09 4181003 51709098 3674729.0 506274.0
178 LKA Sri Lanka 2021-05-09 1125740 21803000 928400.0 197340.0
179 TWN Taiwan 2021-05-09 92049 23780452 NaN NaN
180 THA Thailand 2021-05-09 1809894 69625582 1296440.0 513454.0
181 TTO Trinidad and Tobago 2021-05-09 61120 1394973 60174.0 946.0
182 TUR Turkey 2021-05-09 24918773 83429615 14585980.0 10332793.0
183 ARE United Arab Emirates 2021-05-09 11145934 9770529 NaN NaN
184 URY Uruguay 2021-05-09 2005442 3461734 1228151.0 777291.0
185 ZWE Zimbabwe 2021-05-09 684243 14645468 526066.0 158177.0
186 USA United States 2021-05-09 259716989 331811257 152116936.0 114258244.0
187 OWID_WRL World NaN 1297259952 7673533970 641081197.0 309613453.0
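Once the JSON is loaded, derived columns are straightforward; a sketch computing doses per 100 people on a small stand-in frame with the same columns (the two rows below are copied from the printout above):

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_json above (same columns).
df = pd.DataFrame({
    "geoid": ["USA", "GBR"],
    "location": ["United States", "United Kingdom"],
    "total_vaccinations": [259716989, 53041048],
    "population": [331811257, 66834405],
    "people_vaccinated": [152116936.0, 35371669.0],
    "people_fully_vaccinated": [114258244.0, 17669379.0],
})

# Doses administered per 100 people, rounded to one decimal place.
df["doses_per_100"] = (df["total_vaccinations"] / df["population"] * 100).round(1)
```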

For Loop Throwing Me For A Loop [duplicate]

This question already has answers here:
How to iterate over rows in a DataFrame in Pandas
(32 answers)
Closed 2 years ago.
I have a loop cycling through the length of a data frame, going through a list of teams. The loop should go through 41 rows, but it only does 2 and then stops, and I have no idea why it is stalling out. It seems to me I should be cycling through the entire 41-team list, but it stops after indexing two teams.
import pandas as pd

excel_data_df = pd.read_excel('New_Schedule.xlsx', sheet_name='Sheet1', engine='openpyxl')
print(excel_data_df)
print('Data Frame Above')
yahoot = len(excel_data_df)
print('Length Of Dataframe Below')
print(yahoot)
for games in excel_data_df:
    yahoot -= 1
    print(yahoot)
    searching = excel_data_df.iloc[yahoot, 0]
    print(searching)
    excel_data_df2 = pd.read_excel('allstats.xlsx', sheet_name='Sheet1', engine='openpyxl')
    print(excel_data_df2)
    finding = excel_data_df2[excel_data_df2['TEAM:'] == searching].index
    print(finding)
Here is the run log
HOME TEAM: AWAY TEAM:
0 Portland St. Weber St.
1 Nevada Air Force
2 Utah Idaho
3 San Jose St. Santa Clara
4 Southern Utah SAGU American Indian
5 West Virginia Iowa St.
6 Missouri Prairie View
7 Southeast Mo. St. UT Martin
8 Little Rock Champion Chris.
9 Tennessee St. Belmont
10 Wichita St. Emporia St.
11 Tennessee Tennessee Tech
12 FGCU Webber Int'l
13 Jacksonville St. Ga. Southwestern
14 Northern Ill. Chicago St.
15 Col. of Charleston Western Caro.
16 Georgia Tech Florida A&M
17 Rider Iona
18 Tulsa Northwestern St.
19 Rhode Island Davidson
20 Washington St. Montana St.
21 Montana Dickinson St.
22 Robert Morris Bowling Green
23 South Dakota Drake
24 Richmond Loyola Chicago
25 Coastal Carolina Alice Lloyd
26 Presbyterian South Carolina St.
27 Morehead St. SIUE
28 San Diego St. BYU
29 Siena Canisius
30 Monmouth Saint Peter's
31 Howard Hampton
32 App State Columbia Int'l
33 Southern Ill. North Dakota
34 Norfolk St. UNCW
35 Niagara Fairfield
36 N.C. A&T Greensboro
37 Western Mich. Central Mich.
38 DePaul Xavier
39 Georgia St. Carver
40 Northern Ariz. Eastern Wash.
41 Gardner-Webb VMI
Data Frame Above
Length Of Dataframe Below
42
41
Gardner-Webb
TEAM: TOTAL POINTS: ... TURNOVER RATIO: ASSIST TO TURNOVER RANK
0 Mount St. Marys 307 ... 65 239.0
1 Saint Josephs 163 ... 28 81.0
2 Saint Marys (CA) 518 ... 78 114.0
3 Saint Peters 399 ... 86 145.0
4 St. John's (NY) 656 ... 115 73.0
.. ... ... ... ... ...
314 Wofford 327 ... 54 113.0
315 Wright St. 220 ... 47 206.0
316 Wyoming 517 ... 64 27.0
317 Xavier 582 ... 84 12.0
318 Youngstown St. 231 ... 30 79.0
[319 rows x 18 columns]
Int64Index([85], dtype='int64')
40
Northern Ariz.
TEAM: TOTAL POINTS: ... TURNOVER RATIO: ASSIST TO TURNOVER RANK
0 Mount St. Marys 307 ... 65 239.0
1 Saint Josephs 163 ... 28 81.0
2 Saint Marys (CA) 518 ... 78 114.0
3 Saint Peters 399 ... 86 145.0
4 St. John's (NY) 656 ... 115 73.0
.. ... ... ... ... ...
314 Wofford 327 ... 54 113.0
315 Wright St. 220 ... 47 206.0
316 Wyoming 517 ... 64 27.0
317 Xavier 582 ... 84 12.0
318 Youngstown St. 231 ... 30 79.0
[319 rows x 18 columns]
Int64Index([180], dtype='int64')
Iterating over a DataFrame directly (for games in excel_data_df:) iterates over its column labels, and this frame has only two columns ('HOME TEAM:' and 'AWAY TEAM:'), which is why the loop stops after two passes. Use DataFrame.iterrows to go row by row instead:
for index, data in excel_data_df.iterrows():
pandas.DataFrame.iterrows
DataFrame.iterrows()
Iterate over DataFrame rows as (index, Series) pairs.
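A small self-contained sketch of the row-wise version; the schedule and stats frames below are hypothetical stand-ins for the two Excel files:

```python
import pandas as pd

# Hypothetical stand-ins for New_Schedule.xlsx and allstats.xlsx.
schedule = pd.DataFrame({"HOME TEAM:": ["Utah", "Montana"],
                         "AWAY TEAM:": ["Idaho", "Dickinson St."]})
stats = pd.DataFrame({"TEAM:": ["Utah", "Idaho", "Montana"],
                      "TOTAL POINTS:": [500, 450, 480]})

# iterrows yields one (index, row) pair per row, so the loop covers
# every game rather than just the two column labels.
found = []
for idx, row in schedule.iterrows():
    team = row["HOME TEAM:"]
    match = stats.index[stats["TEAM:"] == team]
    found.append((team, list(match)))
```

With two schedule rows the loop runs twice by design; with 42 rows it would run 42 times.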

Pandas: Fill missing date with information from other rows

Suppose I have the following pandas dataframe:
Date Region Country Cases Deaths Lat Long
2020-03-08 Northern Territory Australia 27 49 -12.4634 130.8456
2020-03-09 Northern Territory Australia 80 85 -12.4634 130.8456
2020-03-12 Northern Territory Australia 35 73 -12.4634 130.8456
2020-03-08 Western Australia Australia 48 20 -31.9505 115.8605
2020-03-09 Western Australia Australia 70 12 -31.9505 115.8605
2020-03-10 Western Australia Australia 66 95 -31.9505 115.8605
2020-03-11 Western Australia Australia 31 38 -31.9505 115.8605
2020-03-12 Western Australia Australia 40 83 -31.9505 115.8605
I need to update the dataframe with the missing dates for the Northern Territory, 2020-03-10 and 2020-03-11. However, I want to carry over all the information except for Cases and Deaths, which should be filled with 0. Like this:
Date Region Country Cases Deaths Lat Long
2020-03-08 Northern Territory Australia 27 49 -12.4634 130.8456
2020-03-09 Northern Territory Australia 80 85 -12.4634 130.8456
2020-03-10 Northern Territory Australia 0 0 -12.4634 130.8456
2020-03-11 Northern Territory Australia 0 0 -12.4634 130.8456
2020-03-12 Northern Territory Australia 35 73 -12.4634 130.8456
2020-03-08 Western Australia Australia 48 20 -31.9505 115.8605
2020-03-09 Western Australia Australia 70 12 -31.9505 115.8605
2020-03-10 Western Australia Australia 66 95 -31.9505 115.8605
2020-03-11 Western Australia Australia 31 38 -31.9505 115.8605
2020-03-12 Western Australia Australia 40 83 -31.9505 115.8605
The only way I can think of doing this is to iterate through all combinations of dates and countries.
EDIT
Erfan seems to be on the right track, but I can't get it to work. Here is the actual data I'm working with instead of a toy example.
import pandas as pd

unique_group = ['province', 'country', 'county']
csbs_df = pd.read_csv(
    'https://jordansdatabucket.s3-us-west-2.amazonaws.com/covid19data/csbs_df.csv.gz', index_col=0)
csbs_df['Date'] = pd.to_datetime(csbs_df['Date'], infer_datetime_format=True)
new_df = (
    csbs_df.set_index('Date')
    .groupby(unique_group)
    .resample('D').first()
    .fillna(dict.fromkeys(['confirmed', 'deaths'], 0))
    .ffill()
    .reset_index(level=3)
    .reset_index(drop=True))
new_df.head()
Date id lat lon Timestamp province country_code country county confirmed deaths source Date_text
0 2020-03-25 1094.0 32.534893 -86.642709 2020-03-25 00:00:00+00:00 Alabama US US Autauga 1.0 0.0 CSBS 03/25/20
1 2020-03-26 901.0 32.534893 -86.642709 2020-03-26 00:00:00+00:00 Alabama US US Autauga 4.0 0.0 CSBS 03/26/20
2 2020-03-24 991.0 30.735891 -87.723525 2020-03-24 00:00:00+00:00 Alabama US US Baldwin 3.0 0.0 CSBS 03/24/20
3 2020-03-25 1080.0 30.735891 -87.723525 2020-03-25 00:00:00+00:00 Alabama US US Baldwin 4.0 0.0 CSBS 03/25/20
4 2020-03-26 1139.0 30.735891 -87.723525 2020-03-26 16:52:00+00:00 Alabama US US Baldwin 4.0 0.0 CSBS 03/26/20
You can see that it is not inserting the daily rows as specified. I'm not sure what's wrong.
Edit 2
Here is my solution, based on Erfan's answer.
import pandas as pd

csbs_df = pd.read_csv(
    'https://jordansdatabucket.s3-us-west-2.amazonaws.com/covid19data/csbs_df.csv.gz', index_col=0)
date_range = pd.date_range(csbs_df['Date'].min(), csbs_df['Date'].max(), freq='1D')
unique_group = ['country', 'province', 'county']
gb = csbs_df.groupby(unique_group)
sub_dfs = []
for g in gb.groups:
    sub_df = gb.get_group(g)
    sub_df = (
        sub_df.set_index('Date')
        .reindex(date_range)
        .fillna(dict.fromkeys(['confirmed', 'deaths'], 0))
        .bfill()
        .ffill()
        .reset_index()
        .rename({'index': 'Date'}, axis=1)
        .drop(['id'], axis=1))
    sub_df['Date_text'] = sub_df['Date'].dt.strftime('%m/%d/%y')
    sub_df['Timestamp'] = pd.to_datetime(sub_df['Date'], utc=True)
    sub_dfs.append(sub_df)
all_concat = pd.concat(sub_dfs)
assert (all_concat.groupby(['province', 'country', 'county']).count() == 3).all().all()
Using GroupBy.resample, ffill and fillna:
The idea here is that we want to fill the missing gaps of dates for each group of Region and Country. This is called resampling a timeseries.
That's why we use GroupBy.resample instead of DataFrame.resample here. Furthermore, fillna and ffill are needed to fill the data according to your logic.
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
dfn = (
    df.set_index('Date')
    .groupby(['Region', 'Country'])
    .resample('D').first()
    .fillna(dict.fromkeys(['Cases', 'Deaths'], 0))
    .ffill()
    .reset_index(level=2)
    .reset_index(drop=True)
)
Date Region Country Cases Deaths Lat Long
0 2020-03-08 Northern Territory Australia 27.0 49.0 -12.4634 130.8456
1 2020-03-09 Northern Territory Australia 80.0 85.0 -12.4634 130.8456
2 2020-03-10 Northern Territory Australia 0.0 0.0 -12.4634 130.8456
3 2020-03-11 Northern Territory Australia 0.0 0.0 -12.4634 130.8456
4 2020-03-12 Northern Territory Australia 35.0 73.0 -12.4634 130.8456
5 2020-03-08 Western Australia Australia 48.0 20.0 -31.9505 115.8605
6 2020-03-09 Western Australia Australia 70.0 12.0 -31.9505 115.8605
7 2020-03-10 Western Australia Australia 66.0 95.0 -31.9505 115.8605
8 2020-03-11 Western Australia Australia 31.0 38.0 -31.9505 115.8605
9 2020-03-12 Western Australia Australia 40.0 83.0 -31.9505 115.8605
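The chain can be checked end to end on the toy frame from the question; a sketch that rebuilds the sample inline and selects the data columns explicitly, so the group keys come back via a plain reset_index (this sidesteps version-dependent behavior of whether resample().first() keeps the grouping columns):

```python
import pandas as pd

# Rebuild the toy frame from the question.
df = pd.DataFrame({
    "Date": ["2020-03-08", "2020-03-09", "2020-03-12",
             "2020-03-08", "2020-03-09", "2020-03-10", "2020-03-11", "2020-03-12"],
    "Region": ["Northern Territory"] * 3 + ["Western Australia"] * 5,
    "Country": ["Australia"] * 8,
    "Cases": [27, 80, 35, 48, 70, 66, 31, 40],
    "Deaths": [49, 85, 73, 20, 12, 95, 38, 83],
    "Lat": [-12.4634] * 3 + [-31.9505] * 5,
    "Long": [130.8456] * 3 + [115.8605] * 5,
})
df["Date"] = pd.to_datetime(df["Date"])

dfn = (
    df.set_index("Date")
    .groupby(["Region", "Country"])[["Cases", "Deaths", "Lat", "Long"]]
    .resample("D").first()          # inserts NaN rows for the missing days
    .fillna({"Cases": 0, "Deaths": 0})  # new days get 0 cases/deaths
    .ffill()                        # Lat/Long carried forward within each group
    .reset_index()
)
```

The result has ten rows: the two missing Northern Territory days (2020-03-10 and 2020-03-11) appear with Cases and Deaths of 0 and the carried-forward coordinates.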
Edit:
It seems indeed that not all places have the same start and end date, so we have to take that into account. The following works:
csbs_df = pd.read_csv(
    'https://jordansdatabucket.s3-us-west-2.amazonaws.com/covid19data/csbs_df.csv.gz'
).iloc[:, 1:]
csbs_df['Date_text'] = pd.to_datetime(csbs_df['Date_text'])
date_range = pd.date_range(csbs_df['Date_text'].min(), csbs_df['Date_text'].max(), freq='D')

def reindex_dates(data, dates):
    data = data.reindex(dates).fillna(dict.fromkeys(['Cases', 'Deaths'], 0)).ffill().bfill()
    return data

dfn = (
    csbs_df.set_index('Date_text')
    .groupby('id').apply(lambda x: reindex_dates(x, date_range))
    .reset_index(level=0, drop=True)
    .reset_index()
    .rename(columns={'index': 'Date'})
)
print(dfn.head())
Date id lat lon Timestamp \
0 2020-03-24 0.0 40.714550 -74.007140 2020-03-24 00:00:00+00:00
1 2020-03-25 0.0 40.714550 -74.007140 2020-03-25 00:00:00+00:00
2 2020-03-26 0.0 40.714550 -74.007140 2020-03-26 00:00:00+00:00
3 2020-03-24 1.0 41.163198 -73.756063 2020-03-24 00:00:00+00:00
4 2020-03-25 1.0 41.163198 -73.756063 2020-03-25 00:00:00+00:00
Date province country_code country county confirmed deaths \
0 2020-03-24 New York US US New York 13119.0 125.0
1 2020-03-25 New York US US New York 15597.0 192.0
2 2020-03-26 New York US US New York 20011.0 280.0
3 2020-03-24 New York US US Westchester 2894.0 0.0
4 2020-03-25 New York US US Westchester 3891.0 1.0
source
0 CSBS
1 CSBS
2 CSBS
3 CSBS
4 CSBS

pandasql EOL error while scanning string literal

I have the code below, where I'm trying to use pandasql to run a SQL query with sqldf. I'm doing some division and aggregation. The query runs just fine when I run it in R with sqldf. I'm totally new to pandasql and I'm getting the error below; can anyone see what my issue is and suggest how to fix it? I've also included some sample data.
Code:
import pandasql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
ExampleDf=pysqldf("select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
,zipcode
from data
where priorSaleDate between '2010-01-01' and '2011-01-01'
group by zipcode
order by
sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) desc")
Error:
File "<ipython-input-100-679165684772>", line 1
ExampleDf=pysqldf("select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
^
SyntaxError: EOL while scanning string literal
Sample Data:
print(data.iloc[:50])
id address city state zipcode latitude \
0 39525749 8171 E 84th Ave Denver CO 80022 39.849160
1 184578398 10556 Wheeling St Denver CO 80022 39.888020
2 184430015 3190 Wadsworth Blvd Denver CO 80033 39.761710
3 155129946 3040 Wadsworth Blvd Denver CO 80033 39.760780
4 245107 5615 S Eaton St Denver CO 80123 39.616181
5 3523925 6535 W Sumac Ave Denver CO 80123 39.615136
6 30560679 6673 W Berry Ave Denver CO 80123 39.616350
7 39623928 5640 S Otis St Denver CO 80123 39.615213
8 148975825 5342 S Gray St Denver CO 80123 39.620158
9 184623176 4967 S Wadsworth Blvd Denver CO 80123 39.626770
10 39811456 6700 W Dorado Dr # 11 Denver CO 80123 39.614540
11 39591617 4956 S Perry St Denver CO 80123 39.628740
12 39577604 4776 S Gar Way Denver CO 80123 39.630547
13 153665665 8890 W Tanforan Dr Denver CO 80123 39.630738
14 39868673 5538 W Prentice Cir Denver CO 80123 39.620625
15 184328555 4254 W Monmouth Ave Denver CO 80123 39.629000
16 30554949 6600 W Berry Ave Denver CO 80123 39.616165
17 24157982 6560 W Sumac Ave Denver CO 80123 39.614712
18 51335315 5655 S Fenton St Denver CO 80123 39.615488
19 152799217 5626 S Fenton St Denver CO 80123 39.616153
20 51330641 5599 S Fenton St Denver CO 80123 39.616514
21 15598828 6595 W Sumac Ave Denver CO 80123 39.615144
22 49360310 6420 W Sumac Ave Denver CO 80123 39.614531
23 39777745 4962 S Field Ct Denver CO 80123 39.625819
24 18021201 9664 W Grand Ave Denver CO 80123 39.625826
25 39776096 4881 S Jellison St Denver CO 80123 39.628401
26 29850085 5012 S Field Ct Denver CO 80123 39.625537
27 51597934 4982 S Field Ct Denver CO 80123 39.625757
28 39563379 4643 S Hoyt St Denver CO 80123 39.632457
29 18922140 5965 W Sumac Ave Denver CO 80123 39.615199
30 39914328 9740 W Chenango Ave Denver CO 80123 39.627226
31 51323181 5520 W Prentice Cir Denver CO 80123 39.620548
32 3493378 4665 S Garland Way Denver CO 80123 39.632063
33 4115341 5466 W Prentice Cir Denver CO 80123 39.619027
34 39639069 5735 W Berry Ave Denver CO 80123 39.617727
35 184333944 9015 W Tanforan Dr Denver CO 80123 39.631178
36 18197471 4977 S Garland St Denver CO 80123 39.626080
37 49430482 9540 W Bellwood Pl Denver CO 80123 39.624558
38 39868648 5535 S Fenton St Denver CO 80123 39.617145
39 143684222 3761 W Wagon Trail Dr Denver CO 80123 39.631251
40 152898579 4850 S Yukon St Denver CO 80123 39.629025
41 43174426 4951 S Ammons St Denver CO 80123 39.626582
42 39615194 7400 W Grant Ranch Blvd # 31 Denver CO 80123 39.618440
43 184340029 7400 W Grant Ranch Blvd # 7 Denver CO 80123 39.618440
44 3523919 5425 S Gray St Denver CO 80123 39.618265
45 151444231 6610 W Berry Ave Denver CO 80123 39.616148
46 19150871 4756 S Perry St Denver CO 80123 39.630389
47 39545155 4328 W Bellewood Dr Denver CO 80123 39.627883
48 3523923 6585 W Sumac Ave Denver CO 80123 39.615145
49 51337334 5737 W Alamo Dr Denver CO 80123 39.615881
longitude bedrooms bathrooms rooms squareFootage lotSize yearBuilt \
0 -104.893468 3 2.0 6 1378 9968 2003.0
1 -104.830930 2 2.0 6 1653 6970 2004.0
2 -105.081070 3 1.0 0 1882 23875 1917.0
3 -105.081060 4 3.0 0 2400 11500 1956.0
4 -105.058812 3 4.0 8 2305 5600 1998.0
5 -105.069018 3 5.0 7 2051 6045 1996.0
6 -105.070760 4 4.0 8 2051 6315 1997.0
7 -105.070617 3 3.0 7 2051 8133 1997.0
8 -105.063094 3 3.0 7 1796 5038 1999.0
9 -105.081990 3 3.0 0 2054 4050 2007.0
10 -105.071350 3 4.0 7 2568 6397 2000.0
11 -105.040126 3 2.0 6 1290 9000 1962.0
12 -105.100242 3 4.0 6 1804 6952 1983.0
13 -105.097718 3 3.0 6 1804 7439 1983.0
14 -105.059503 4 5.0 8 3855 9656 1998.0
15 -105.042330 2 2.0 4 1297 16600 1962.0
16 -105.069424 4 4.0 9 2321 5961 1996.0
17 -105.069264 4 4.0 8 2321 6337 1997.0
18 -105.060173 3 3.0 7 2321 6151 1998.0
19 -105.059696 3 3.0 7 2071 6831 1999.0
20 -105.060193 3 3.0 7 2071 6050 1998.0
21 -105.069803 3 3.0 7 2074 6022 1996.0
22 -105.067815 4 4.0 9 2588 6432 1996.0
23 -105.099825 3 2.0 7 1567 6914 1980.0
24 -105.106423 3 2.0 5 1317 9580 1983.0
25 -105.108440 3 3.0 5 1317 6718 1982.0
26 -105.099012 2 2.0 6 808 8568 1980.0
27 -105.099484 2 1.0 6 808 6858 1980.0
28 -105.104752 3 2.0 6 1321 6000 1978.0
29 -105.062378 3 4.0 8 2350 6839 1997.0
30 -105.107806 2 2.0 5 1586 6510 1982.0
31 -105.058600 2 4.0 6 2613 8250 1998.0
32 -105.101493 3 2.0 8 1590 7044 1977.0
33 -105.057427 3 5.0 7 2614 9350 1999.0
34 -105.059123 3 4.0 7 2107 6491 1998.0
35 -105.099179 2 1.0 5 1340 6741 1982.0
36 -105.103470 3 2.0 6 1085 6120 1985.0
37 -105.104316 3 1.0 6 1085 13500 1981.0
38 -105.060195 4 3.0 8 2365 6050 1998.0
39 -105.036567 3 2.0 5 1344 9240 1959.0
40 -105.081998 2 3.0 5 1601 6660 1986.0
41 -105.087250 3 2.0 8 1858 6890 1986.0
42 -105.079900 2 2.0 5 1603 5742 1997.0
43 -105.079900 2 2.0 5 1603 6168 1997.0
44 -105.061397 3 3.0 7 1860 6838 1998.0
45 -105.069618 3 4.0 8 2376 5760 1996.0
46 -105.038707 3 2.0 5 1355 9600 1960.0
47 -105.042611 2 2.0 6 1867 11000 1973.0
48 -105.069604 3 3.0 7 2382 5830 1996.0
49 -105.059085 3 3.0 6 1872 5500 1999.0
lastSaleDate lastSaleAmount priorSaleDate priorSaleAmount \
0 2009-12-17 75000 2004-05-13 165700.0
1 2004-09-23 216935 NaN NaN
2 2008-04-03 330000 NaN NaN
3 2008-12-02 185000 2008-06-27 0.0
4 2012-07-18 308000 2011-12-29 0.0
5 2006-09-12 363500 2005-05-16 339000.0
6 2014-12-15 420000 2006-07-07 345000.0
7 2004-03-15 328700 1998-04-09 225200.0
8 2011-08-16 274900 2011-01-10 0.0
9 2015-12-01 407000 2012-10-30 312000.0
10 2014-11-12 638000 2005-03-22 530000.0
11 2004-02-02 235000 2000-10-12 171000.0
12 2004-07-19 247000 1999-06-07 187900.0
13 2013-08-14 249700 2000-09-07 217900.0
14 2004-08-17 580000 1999-01-11 574000.0
15 2011-11-07 150000 NaN NaN
16 2006-01-18 402800 2004-08-16 335000.0
17 2013-12-31 422000 2012-11-05 399000.0
18 1999-12-02 277900 NaN NaN
19 2000-02-04 271800 NaN NaN
20 1999-10-20 274400 NaN NaN
21 2007-11-30 314500 NaN NaN
22 2001-12-31 342500 NaN NaN
23 2016-12-02 328000 2016-08-02 231200.0
24 2017-06-21 376000 2008-02-29 244000.0
25 2004-08-31 225000 NaN NaN
26 2016-09-06 310000 2015-09-15 258900.0
27 1999-12-06 128000 NaN NaN
28 2004-04-28 197000 NaN NaN
29 2011-08-11 365000 2004-08-04 365000.0
30 2015-07-08 302000 2004-07-15 210000.0
31 2000-02-10 425000 1999-04-08 396500.0
32 2016-02-26 275000 2004-12-03 204000.0
33 2005-08-29 580000 1999-09-10 398200.0
34 2004-06-30 355000 2001-02-22 320000.0
35 2015-05-26 90000 1983-06-01 80000.0
36 2017-06-08 312500 2017-05-12 258000.0
37 2001-04-27 184000 1999-11-10 164900.0
38 2004-02-08 335000 2001-05-08 339950.0
39 2016-10-17 290000 NaN 70200.0
40 2010-09-02 260000 1998-04-14 189900.0
41 2012-07-30 231600 2012-03-30 0.0
42 2013-10-24 400000 2004-08-04 388400.0
43 2004-11-19 350000 1998-10-05 292400.0
44 2005-06-23 295000 2004-07-26 300000.0
45 2009-06-24 404500 2000-05-04 304900.0
46 1999-12-14 153500 1999-12-14 153500.0
47 2004-05-25 208000 NaN NaN
48 2016-10-20 502000 2005-05-31 357000.0
49 2013-04-05 369000 2000-08-07 253000.0
estimated_value
0 239753
1 343963
2 488840
3 494073
4 513676
5 496062
6 514953
7 494321
8 496079
9 424514
10 721350
11 331915
12 389415
13 386694
14 784587
15 354031
16 515537
17 544960
18 504791
19 495121
20 495894
21 496281
22 528343
23 349041
24 367754
25 356934
26 346001
27 342927
28 337969
29 500105
30 353827
31 693035
32 350857
33 716655
34 493156
35 349355
36 348079
37 343957
38 504705
39 311996
40 391469
41 418814
42 502894
43 478049
44 475615
45 521467
46 366187
47 386913
48 527104
49 497239
Just change the quotes to triple quotes so Python can read the multiline string. A plain single- or double-quoted string literal must end on the same line it starts, which is what raises the "EOL while scanning string literal" error; a triple-quoted string may span multiple lines:
ExampleDf=pysqldf("""select sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) as AvgPric
,zipcode
from data
where priorSaleDate between '2010-01-01' and '2011-01-01'
group by zipcode
order by
sum(lastSaleAmount-priorSaleAmount)/sum(squareFootage) desc""")