I have a conflicting issue that I can't seem to find a solution for online.
I want to scrape a table from this website: https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html
and this is the table I wanted to scrape:
I was able to scrape it, but it stops at the "Show all" button.
Is there a way for me to expand this table and then scrape it?
Here is my code (it's a mess as I just wrote it, but enough to get the idea):
import requests
import pandas as pd
from bs4 import BeautifulSoup

def connect_add():
    # URL of the page to scrape
    url = 'https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html'
    # Send a request to the URL
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    tble = soup.find("table", class_="svelte-2wimac")
    table_rows = tble.find_all('tr')
    data = []
    for row in table_rows:
        prepare = []
        for td in row.find_all('td'):
            prepare.append(td.text)
        data.append(prepare)
    df_side = pd.DataFrame(data)
    x = df_side.head(50)
    display(x)

connect_add()
The data is loaded from an external source. Here is an example of how to load it:
import pandas as pd
df = pd.read_json(
"https://static01.nyt.com/newsgraphics/2021/01/19/world-vaccinations-tracker/3bf66651fd690992142ef2a7e233e8fdedcdd6c5/latest.json"
)
print(df)
Prints:
geoid location last_updated total_vaccinations population people_vaccinated people_fully_vaccinated
0 DZA Algeria 2021-02-19 75000 43053054 NaN NaN
1 MOZ Mozambique 2021-03-23 57305 30366036 57305.0 NaN
2 CPV Cape Verde 2021-03-24 2184 549935 2184.0 NaN
3 MUS Mauritius 2021-03-24 117323 1265711 117323.0 NaN
4 STP Sao Tome and Principe 2021-03-29 9724 215056 9724.0 NaN
5 ARM Armenia 2021-03-31 565 2957731 565.0 NaN
6 MMR Myanmar 2021-03-31 1040000 54045420 1000000.0 40000.0
7 SYR Syria 2021-04-08 2500 17070135 2500.0 NaN
8 HND Honduras 2021-04-09 57639 9746117 55000.0 2639.0
9 TCA Turks and Caicos Islands 2021-04-11 25039 38191 15039.0 10000.0
10 VEN Venezuela 2021-04-12 250000 28515829 250000.0 NaN
11 JAM Jamaica 2021-04-13 135473 2948279 135473.0 NaN
12 COG Congo 2021-04-14 14297 5380508 14297.0 NaN
13 FLK Falkland Islands 2021-04-14 4407 3398 2632.0 1775.0
14 TLS Timor 2021-04-14 2629 1293119 2629.0 NaN
15 NRU Nauru 2021-04-15 700 12581 700.0 NaN
16 SSD South Sudan 2021-04-15 947 11062113 947.0 NaN
17 FJI Fiji 2021-04-16 56000 889953 56000.0 NaN
18 DJI Djibouti 2021-04-17 10246 973560 10246.0 NaN
19 LSO Lesotho 2021-04-17 16000 2125268 16000.0 NaN
20 LBY Libya 2021-04-17 750 6777452 750.0 NaN
21 NER Niger 2021-04-17 1366 23310715 1366.0 NaN
22 SOM Somalia 2021-04-17 117567 15442905 117567.0 NaN
23 TGO Togo 2021-04-17 160000 8082366 160000.0 NaN
24 EGY Egypt 2021-04-18 660000 100388073 660000.0 NaN
25 MRT Mauritania 2021-04-18 7038 4525696 7038.0 NaN
26 SGP Singapore 2021-04-18 2213888 5703569 1364124.0 849764.0
27 COM Comoros 2021-04-21 13440 850886 13440.0 NaN
28 MSR Montserrat 2021-04-21 1909 5900 1293.0 616.0
29 AFG Afghanistan 2021-04-22 240000 38041754 240000.0 NaN
30 AIA Anguilla 2021-04-22 6898 14731 6115.0 783.0
31 ATG Antigua and Barbuda 2021-04-22 29754 97118 29754.0 NaN
32 MCO Monaco 2021-04-22 24390 38964 12758.0 11632.0
33 AGO Angola 2021-04-23 456349 31825295 456349.0 NaN
34 BLR Belarus 2021-04-23 328500 9466856 244000.0 84500.0
35 BRN Brunei 2021-04-23 10715 433285 10715.0 NaN
36 GAB Gabon 2021-04-23 8897 2172579 6895.0 2002.0
37 IRQ Iraq 2021-04-23 298377 39309783 298377.0 NaN
38 SDN Sudan 2021-04-23 140227 42813238 140227.0 NaN
39 GMB Gambia 2021-04-24 20922 2347706 20922.0 NaN
40 NIC Nicaragua 2021-04-24 135130 6545502 135130.0 NaN
41 COD Democratic Republic of Congo 2021-04-25 1700 86790567 1700.0 NaN
42 SWZ Eswatini 2021-04-25 34897 1148130 34897.0 NaN
43 MLI Mali 2021-04-25 49903 19658031 49903.0 NaN
44 PSE Palestine 2021-04-25 213989 4685306 170109.0 43880.0
45 PNG Papua New Guinea 2021-04-25 2900 8776109 2900.0 NaN
46 GUY Guyana 2021-04-26 126800 782766 124000.0 2800.0
47 LAO Laos 2021-04-26 184387 7169455 126072.0 58315.0
48 TON Tonga 2021-04-26 5367 104494 5367.0 NaN
49 BHS Bahamas 2021-04-27 25692 389482 25692.0 NaN
50 BIH Bosnia and Herzegovina 2021-04-27 106464 3301000 83260.0 23204.0
51 SLB Solomon Islands 2021-04-27 4890 669823 4890.0 NaN
52 UZB Uzbekistan 2021-04-27 600369 33580650 600369.0 NaN
53 GNQ Equatorial Guinea 2021-04-28 75518 1355986 64646.0 10872.0
54 KEN Kenya 2021-04-28 853081 52573973 853081.0 NaN
55 KGZ Kyrgyzstan 2021-04-28 27858 6456900 27000.0 858.0
56 CMR Cameroon 2021-04-29 11000 25876380 11000.0 NaN
57 BWA Botswana 2021-04-30 49882 2303697 49882.0 NaN
58 GHA Ghana 2021-04-30 849527 30417856 849527.0 NaN
59 VNM Vietnam 2021-04-30 509855 96462106 509855.0 NaN
60 VCT Saint Vincent and the Grenadines 2021-05-01 14526 110589 NaN NaN
61 BMU Bermuda 2021-05-02 58193 63918 32877.0 25216.0
62 NLD Netherlands 2021-05-02 5651843 17332850 4448730.0 NaN
63 PRY Paraguay 2021-05-02 143441 7044636 131013.0 12428.0
64 AND Andorra 2021-05-03 28881 77142 24182.0 4699.0
65 BOL Bolivia 2021-05-03 878563 11513100 637694.0 240869.0
66 CRI Costa Rica 2021-05-03 950252 5047561 605099.0 345153.0
67 WSM Samoa 2021-05-03 7435 197097 NaN NaN
68 SYC Seychelles 2021-05-03 127721 97625 68045.0 59676.0
69 JOR Jordan 2021-05-04 1091048 10101694 805020.0 286028.0
70 NZL New Zealand 2021-05-04 304900 4917000 217603.0 87297.0
71 KNA Saint Kitts and Nevis 2021-05-04 13070 52834 12943.0 127.0
72 ETH Ethiopia 2021-05-05 1215934 112078730 NaN NaN
73 LIE Liechtenstein 2021-05-05 13829 38019 9645.0 4184.0
74 MLT Malta 2021-05-05 359429 502653 246698.0 112731.0
75 OMN Oman 2021-05-05 326269 4974986 253000.0 73269.0
76 CHE Switzerland 2021-05-05 3001029 8574832 1997717.0 1003312.0
77 CYP Cyprus 2021-05-06 332423 1198575 252792.0 79631.0
78 SLV El Salvador 2021-05-06 1114544 6453553 958828.0 155716.0
79 GRD Grenada 2021-05-06 17000 112003 13000.0 4000.0
80 KWT Kuwait 2021-05-06 1440000 4207083 NaN NaN
81 LBN Lebanon 2021-05-06 509705 6855713 325383.0 184322.0
82 LUX Luxembourg 2021-05-06 227314 619896 165376.0 61938.0
83 NOR Norway 2021-05-06 1919369 5347896 1465851.0 453518.0
84 PAK Pakistan 2021-05-06 3320304 216565318 NaN NaN
85 PER Peru 2021-05-06 1939155 32510453 1284692.0 654463.0
86 ESP Spain 2021-05-06 19048132 47076781 13271511.0 5956451.0
87 BLZ Belize 2021-05-07 47675 390353 47675.0 NaN
88 BRA Brazil 2021-05-07 46875460 211049527 31722544.0 15152916.0
89 CYM Cayman Islands 2021-05-07 69772 64948 37470.0 32302.0
90 COL Colombia 2021-05-07 6096661 50339443 3861416.0 2235245.0
91 DMA Dominica 2021-05-07 32008 71808 18864.0 13144.0
92 ECU Ecuador 2021-05-07 1245822 17373662 981620.0 264202.0
93 DEU Germany 2021-05-07 34408840 83132799 26872478.0 7572228.0
94 GRL Greenland 2021-05-07 14278 56225 8994.0 5284.0
95 GIN Guinea 2021-05-07 173623 12771246 116436.0 57187.0
96 ISL Iceland 2021-05-07 184304 361313 138577.0 53658.0
97 IRN Iran 2021-05-07 1485287 82913906 1231652.0 253635.0
98 IRL Ireland 2021-05-07 1799190 4941444 1305178.0 494012.0
99 KAZ Kazakhstan 2021-05-07 2158924 18513930 1634939.0 523985.0
100 NAM Namibia 2021-05-07 36417 2494530 34346.0 2071.0
101 NPL Nepal 2021-05-07 2453512 28608710 2091511.0 362001.0
102 RWA Rwanda 2021-05-07 350400 12626950 350400.0 NaN
103 SMR San Marino 2021-05-07 34011 33860 21389.0 12622.0
104 SWE Sweden 2021-05-07 3679451 10285453 2852689.0 826762.0
105 UGA Uganda 2021-05-07 395805 44269594 395805.0 NaN
106 ALB Albania 2021-05-08 596766 2854191 NaN NaN
107 ABW Aruba 2021-05-08 80699 106314 55744.0 24955.0
108 BRB Barbados 2021-05-08 75476 287025 75476.0 NaN
109 BEL Belgium 2021-05-08 4591359 11484055 3527895.0 1084263.0
110 BTN Bhutan 2021-05-08 481491 763092 481491.0 NaN
111 CHL Chile 2021-05-08 15703842 18952038 8559854.0 7143988.0
112 DNK Denmark 2021-05-08 2339464 5818553 1489198.0 850266.0
113 DOM Dominican Republic 2021-05-08 2345528 10738958 1535083.0 810445.0
114 FIN Finland 2021-05-08 2154469 5520314 1943842.0 210627.0
115 FRA France 2021-05-08 25414386 67059887 17692900.0 7832913.0
116 GEO Georgia 2021-05-08 58533 3720382 58533.0 NaN
117 GIB Gibraltar 2021-05-08 74256 33701 38727.0 35529.0
118 GRC Greece 2021-05-08 3647689 10716322 2450349.0 1197340.0
119 GTM Guatemala 2021-05-08 206951 16604026 204459.0 2492.0
120 MDV Maldives 2021-05-08 431792 530953 300906.0 130886.0
121 MEX Mexico 2021-05-08 21228359 127575529 14148207.0 9440251.0
122 MDA Moldova 2021-05-08 184660 2657637 161266.0 23394.0
123 MAR Morocco 2021-05-08 9864561 36471769 5473809.0 4390752.0
124 POL Poland 2021-05-08 13670541 37970874 10185393.0 3650119.0
125 ROU Romania 2021-05-08 5891855 19356544 3580368.0 2314812.0
126 LCA Saint Lucia 2021-05-08 25200 182790 NaN NaN
127 SEN Senegal 2021-05-08 427377 16296364 427377.0 NaN
128 SLE Sierra Leone 2021-05-08 64966 7813215 58250.0 6716.0
129 SVK Slovakia 2021-05-08 1792674 5454073 1209044.0 583630.0
130 ZAF South Africa 2021-05-08 382480 58558270 382480.0 382480.0
131 SUR Suriname 2021-05-08 90338 581363 45420.0 44918.0
132 TUN Tunisia 2021-05-08 499369 11694719 350426.0 148943.0
133 UKR Ukraine 2021-05-08 863085 44385155 862639.0 446.0
134 GBR United Kingdom 2021-05-08 53041048 66834405 35371669.0 17669379.0
135 ZMB Zambia 2021-05-08 77348 17861030 77348.0 NaN
136 ARG Argentina 2021-05-09 9082597 44938712 7688877.0 1393720.0
137 AUS Australia 2021-05-09 2654338 25364307 NaN NaN
138 AUT Austria 2021-05-09 3632879 8877067 2665516.0 972493.0
139 AZE Azerbaijan 2021-05-09 1687397 10023318 1005678.0 681719.0
140 BHR Bahrain 2021-05-09 1375967 1641172 797181.0 578786.0
141 BGD Bangladesh 2021-05-09 9316086 163046161 5819900.0 3496186.0
142 BGR Bulgaria 2021-05-09 938064 6975761 646068.0 291996.0
143 KHM Cambodia 2021-05-09 2884922 16486542 1773994.0 1110928.0
144 CAN Canada 2021-05-09 15917555 37589262 14668624.0 1248931.0
145 CHN China 2021-05-09 324307000 1397715000 NaN NaN
146 CIV Cote d'Ivoire 2021-05-09 262639 25716544 262639.0 NaN
147 HRV Croatia 2021-05-09 1131607 4067500 879312.0 252295.0
148 CUW Curacao 2021-05-09 109444 157538 77141.0 32303.0
149 CZE Czechia 2021-05-09 3654376 10669709 2610990.0 1058179.0
150 EST Estonia 2021-05-09 532605 1326590 373391.0 159214.0
151 FRO Faeroe Islands 2021-05-09 23519 48678 16896.0 6623.0
152 HKG Hong Kong 2021-05-09 1741682 7451000 1071488.0 670194.0
153 HUN Hungary 2021-05-09 6809350 9769949 4305775.0 2503575.0
154 IND India 2021-05-09 168304868 1366417754 133854676.0 34450192.0
155 IDN Indonesia 2021-05-09 21993299 270625568 13349469.0 8643830.0
156 IMN Isle of Man 2021-05-09 75783 84584 59932.0 15851.0
157 ISR Israel 2021-05-09 10501225 9053300 5422082.0 5079143.0
158 ITA Italy 2021-05-09 24054000 60297396 16823066.0 7401862.0
159 JPN Japan 2021-05-09 4436325 126264931 3277886.0 1158439.0
160 LVA Latvia 2021-05-09 395512 1912789 316665.0 79647.0
161 LTU Lithuania 2021-05-09 1162170 2786844 777019.0 385151.0
162 MAC Macao 2021-05-09 118687 631636 77597.0 41241.0
163 MWI Malawi 2021-05-09 319323 18628747 319323.0 NaN
164 MYS Malaysia 2021-05-09 1766651 31949777 1089637.0 677014.0
165 MNG Mongolia 2021-05-09 2213376 3225167 1590636.0 622740.0
166 MNE Montenegro 2021-05-09 109507 622137 78760.0 30747.0
167 NGA Nigeria 2021-05-09 1665698 200963599 1665698.0 NaN
168 MKD North Macedonia 2021-05-09 107978 2083459 107978.0 NaN
169 PAN Panama 2021-05-09 780569 4246439 524958.0 255610.0
170 PHL Philippines 2021-05-09 2408781 108116615 1957511.0 451270.0
171 PRT Portugal 2021-05-09 3963372 10269417 2858389.0 1104961.0
172 QAT Qatar 2021-05-09 1813240 2832067 1115842.0 697398.0
173 RUS Russia 2021-05-09 21754829 144373535 13129704.0 8625125.0
174 SAU Saudi Arabia 2021-05-09 10584301 34268528 NaN NaN
175 SRB Serbia 2021-05-09 3798942 6944975 2149705.0 1649237.0
176 SVN Slovenia 2021-05-09 737817 2087946 484949.0 252868.0
177 KOR South Korea 2021-05-09 4181003 51709098 3674729.0 506274.0
178 LKA Sri Lanka 2021-05-09 1125740 21803000 928400.0 197340.0
179 TWN Taiwan 2021-05-09 92049 23780452 NaN NaN
180 THA Thailand 2021-05-09 1809894 69625582 1296440.0 513454.0
181 TTO Trinidad and Tobago 2021-05-09 61120 1394973 60174.0 946.0
182 TUR Turkey 2021-05-09 24918773 83429615 14585980.0 10332793.0
183 ARE United Arab Emirates 2021-05-09 11145934 9770529 NaN NaN
184 URY Uruguay 2021-05-09 2005442 3461734 1228151.0 777291.0
185 ZWE Zimbabwe 2021-05-09 684243 14645468 526066.0 158177.0
186 USA United States 2021-05-09 259716989 331811257 152116936.0 114258244.0
187 OWID_WRL World NaN 1297259952 7673533970 641081197.0 309613453.0
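As an aside (not part of the original answer): once the JSON is loaded, the percentage columns shown on the site's table can be approximated locally. A sketch on a tiny stand-in frame with the same column names as the feed (the values here are illustrative only):

```python
import pandas as pd

# Tiny stand-in with the same columns as the NYT JSON feed (illustrative values)
df = pd.DataFrame({
    "location": ["Seychelles", "Israel", "World"],
    "people_vaccinated": [68045, 5422082, 641081197],
    "population": [97625, 9053300, 7673533970],
})

# Percent of population with at least one dose, like the tracker's table
df["pct_vaccinated"] = 100 * df["people_vaccinated"] / df["population"]
print(df.sort_values("pct_vaccinated", ascending=False))
```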
I am new to Jupyter Notebook. I came across it while executing Great Expectations test cases.
I downloaded JupyterLab (3.4.4) and installed it on my Mac just a short while ago.
I am trying to run Great Expectations Python code on a sample CSV file I got from the internet. I checked the file manually and it is in good shape.
import great_expectations as ge
my_df = ge.read_csv("/Users/someuser/Desktop//100SalesRecords.csv")
print(my_df)
When I try to print it, the result is shown with the lines wrapped, as below:
Region Country \
0 Australia and Oceania Tuvalu
1 Central America and the Caribbean Grenada
2 Europe Russia
3 Sub-Saharan Africa Sao Tome and Principe
4 Sub-Saharan Africa Rwanda
5 Australia and Oceania Solomon Islands
6 Sub-Saharan Africa Angola
7 Sub-Saharan Africa Burkina Faso
8 Sub-Saharan Africa Republic of the Congo
9 Sub-Saharan Africa Senegal
10 Asia Kyrgyzstan
11 Sub-Saharan Africa Cape Verde
12 Asia Bangladesh
13 Central America and the Caribbean Honduras
14 Asia Mongolia
15 Europe Bulgaria
16 Asia Sri Lanka
17 Sub-Saharan Africa Cameroon
18 Asia Turkmenistan
19 Australia and Oceania East Timor
20 Europe Norway
21 Europe Portugal
22 Central America and the Caribbean Honduras
23 Australia and Oceania New Zealand
24 Europe Moldova
25 Europe France
26 Australia and Oceania Kiribati
27 Sub-Saharan Africa Mali
28 Europe Norway
29 Sub-Saharan Africa The Gambia
30 Europe Switzerland
31 Sub-Saharan Africa South Sudan
32 Australia and Oceania Australia
33 Asia Myanmar
34 Sub-Saharan Africa Djibouti
35 Central America and the Caribbean Costa Rica
36 Middle East and North Africa Syria
37 Sub-Saharan Africa The Gambia
38 Asia Brunei
39 Europe Bulgaria
40 Sub-Saharan Africa Niger
41 Middle East and North Africa Azerbaijan
42 Sub-Saharan Africa The Gambia
43 Europe Slovakia
44 Asia Myanmar
45 Sub-Saharan Africa Comoros
46 Europe Iceland
47 Europe Switzerland
48 Europe Macedonia
49 Sub-Saharan Africa Mauritania
50 Europe Albania
51 Sub-Saharan Africa Lesotho
52 Middle East and North Africa Saudi Arabia
53 Sub-Saharan Africa Sierra Leone
54 Sub-Saharan Africa Sao Tome and Principe
55 Sub-Saharan Africa Cote d'Ivoire
56 Australia and Oceania Fiji
57 Europe Austria
58 Europe United Kingdom
59 Sub-Saharan Africa Djibouti
60 Australia and Oceania Australia
61 Europe San Marino
62 Sub-Saharan Africa Cameroon
63 Middle East and North Africa Libya
64 Central America and the Caribbean Haiti
65 Sub-Saharan Africa Rwanda
66 Sub-Saharan Africa Gabon
67 Central America and the Caribbean Belize
68 Europe Lithuania
69 Sub-Saharan Africa Madagascar
70 Asia Turkmenistan
71 Middle East and North Africa Libya
72 Sub-Saharan Africa Democratic Republic of the Congo
73 Sub-Saharan Africa Djibouti
74 Middle East and North Africa Pakistan
75 North America Mexico
76 Australia and Oceania Federated States of Micronesia
77 Asia Laos
78 Europe Monaco
79 Australia and Oceania Samoa
80 Europe Spain
81 Middle East and North Africa Lebanon
82 Middle East and North Africa Iran
83 Sub-Saharan Africa Zambia
84 Sub-Saharan Africa Kenya
85 North America Mexico
86 Sub-Saharan Africa Sao Tome and Principe
87 Sub-Saharan Africa The Gambia
88 Middle East and North Africa Kuwait
89 Europe Slovenia
90 Sub-Saharan Africa Sierra Leone
91 Australia and Oceania Australia
92 Middle East and North Africa Azerbaijan
93 Europe Romania
94 Central America and the Caribbean Nicaragua
95 Sub-Saharan Africa Mali
96 Asia Malaysia
97 Sub-Saharan Africa Sierra Leone
98 North America Mexico
99 Sub-Saharan Africa Mozambique
ItemType SalesChannel OrderPriority OrderDate OrderID \
0 Baby Food Offline H 5/28/2010 669165933
1 Cereal Online C 8/22/2012 963881480
2 Office Supplies Offline L 5/2/2014 341417157
3 Fruits Online C 6/20/2014 514321792
4 Office Supplies Offline L 2/1/2013 115456712
5 Baby Food Online C 2/4/2015 547995746
6 Household Offline M 4/23/2011 135425221
7 Vegetables Online H 7/17/2012 871543967
8 Personal Care Offline M 7/14/2015 770463311
9 Cereal Online H 4/18/2014 616607081
10 Vegetables Online H 6/24/2011 814711606
11 Clothes Offline H 8/2/2014 939825713
12 Clothes Online L 1/13/2017 187310731
13 Household Offline H 2/8/2017 522840487
14 Personal Care Offline C 2/19/2014 832401311
15 Clothes Online M 4/23/2012 972292029
16 Cosmetics Offline M 11/19/2016 419123971
17 Beverages Offline C 4/1/2015 519820964
18 Household Offline L 12/30/2010 441619336
19 Meat Online L 7/31/2012 322067916
20 Baby Food Online L 5/14/2014 819028031
21 Baby Food Online H 7/31/2015 860673511
22 Snacks Online L 6/30/2016 795490682
23 Fruits Online H 9/8/2014 142278373
24 Personal Care Online L 5/7/2016 740147912
25 Cosmetics Online H 5/22/2017 898523128
26 Fruits Online M 10/13/2014 347140347
27 Fruits Online L 5/7/2010 686048400
28 Beverages Offline C 7/18/2014 435608613
29 Household Offline L 5/26/2012 886494815
30 Cosmetics Offline M 9/17/2012 249693334
31 Personal Care Offline C 12/29/2013 406502997
32 Office Supplies Online C 10/27/2015 158535134
33 Household Offline H 1/16/2015 177713572
34 Snacks Online M 2/25/2017 756274640
35 Personal Care Offline L 5/8/2017 456767165
36 Fruits Online L 11/22/2011 162052476
37 Meat Online M 1/14/2017 825304400
38 Office Supplies Online L 4/1/2012 320009267
39 Office Supplies Online M 2/16/2012 189965903
40 Personal Care Online H 3/11/2017 699285638
41 Cosmetics Online M 2/6/2010 382392299
42 Cereal Offline H 6/7/2012 994022214
43 Vegetables Online H 10/6/2012 759224212
44 Clothes Online H 11/14/2015 223359620
45 Cereal Offline H 3/29/2016 902102267
46 Cosmetics Online C 12/31/2016 331438481
47 Personal Care Online M 12/23/2010 617667090
48 Clothes Offline C 10/14/2014 787399423
49 Office Supplies Offline C 1/11/2012 837559306
50 Clothes Online C 2/2/2010 385383069
51 Fruits Online L 8/18/2013 918419539
52 Cereal Online M 3/25/2013 844530045
53 Office Supplies Offline M 11/26/2011 441888415
54 Fruits Offline H 9/17/2013 508980977
55 Clothes Online C 6/8/2012 114606559
56 Clothes Offline C 6/30/2010 647876489
57 Cosmetics Offline H 2/23/2015 868214595
58 Household Online L 1/5/2012 955357205
59 Cosmetics Offline H 4/7/2014 259353148
60 Cereal Offline H 6/9/2013 450563752
61 Baby Food Online L 6/26/2013 569662845
62 Office Supplies Online M 11/7/2011 177636754
63 Clothes Offline H 10/30/2010 705784308
64 Cosmetics Offline H 10/13/2013 505716836
65 Cosmetics Offline H 10/11/2013 699358165
66 Personal Care Offline L 7/8/2012 228944623
67 Clothes Offline M 7/25/2016 807025039
68 Office Supplies Offline H 10/24/2010 166460740
69 Clothes Offline L 4/25/2015 610425555
70 Office Supplies Online M 4/23/2013 462405812
71 Fruits Online L 8/14/2015 816200339
72 Beverages Online C 5/26/2011 585920464
73 Cereal Online H 5/20/2017 555990016
74 Cosmetics Offline L 7/5/2013 231145322
75 Household Offline C 11/6/2014 986435210
76 Beverages Online C 10/28/2014 217221009
77 Vegetables Offline C 9/15/2011 789176547
78 Baby Food Offline H 5/29/2012 688288152
79 Cosmetics Online H 7/20/2013 670854651
80 Household Offline L 10/21/2012 213487374
81 Clothes Online L 9/18/2012 663110148
82 Cosmetics Online H 11/15/2016 286959302
83 Snacks Online L 1/4/2011 122583663
84 Vegetables Online L 3/18/2012 827844560
85 Personal Care Offline L 2/17/2012 430915820
86 Beverages Offline C 1/16/2011 180283772
87 Baby Food Offline M 2/3/2014 494747245
88 Fruits Online M 4/30/2012 513417565
89 Beverages Offline C 10/23/2016 345718562
90 Office Supplies Offline H 12/6/2016 621386563
91 Beverages Offline H 7/7/2014 240470397
92 Office Supplies Online M 6/13/2012 423331391
93 Cosmetics Online H 11/26/2010 660643374
94 Beverages Offline C 2/8/2011 963392674
95 Clothes Online M 7/26/2011 512878119
96 Fruits Offline L 11/11/2011 810711038
97 Vegetables Offline C 6/1/2016 728815257
98 Personal Care Offline M 7/30/2015 559427106
99 Household Offline L 2/10/2012 665095412
ShipDate UnitsSold UnitPrice UnitCost TotalRevenue TotalCost \
0 6/27/2010 9925 255.28 159.42 2533654.00 1582243.50
1 9/15/2012 2804 205.70 117.11 576782.80 328376.44
2 5/8/2014 1779 651.21 524.96 1158502.59 933903.84
3 7/5/2014 8102 9.33 6.92 75591.66 56065.84
4 2/6/2013 5062 651.21 524.96 3296425.02 2657347.52
5 2/21/2015 2974 255.28 159.42 759202.72 474115.08
6 4/27/2011 4187 668.27 502.54 2798046.49 2104134.98
7 7/27/2012 8082 154.06 90.93 1245112.92 734896.26
8 8/25/2015 6070 81.73 56.67 496101.10 343986.90
9 5/30/2014 6593 205.70 117.11 1356180.10 772106.23
10 7/12/2011 124 154.06 90.93 19103.44 11275.32
11 8/19/2014 4168 109.28 35.84 455479.04 149381.12
12 3/1/2017 8263 109.28 35.84 902980.64 296145.92
13 2/13/2017 8974 668.27 502.54 5997054.98 4509793.96
14 2/23/2014 4901 81.73 56.67 400558.73 277739.67
15 6/3/2012 1673 109.28 35.84 182825.44 59960.32
16 12/18/2016 6952 437.20 263.33 3039414.40 1830670.16
17 4/18/2015 5430 47.45 31.79 257653.50 172619.70
18 1/20/2011 3830 668.27 502.54 2559474.10 1924728.20
19 9/11/2012 5908 421.89 364.69 2492526.12 2154588.52
20 6/28/2014 7450 255.28 159.42 1901836.00 1187679.00
21 9/3/2015 1273 255.28 159.42 324971.44 202941.66
22 7/26/2016 2225 152.58 97.44 339490.50 216804.00
23 10/4/2014 2187 9.33 6.92 20404.71 15134.04
24 5/10/2016 5070 81.73 56.67 414371.10 287316.90
25 6/5/2017 1815 437.20 263.33 793518.00 477943.95
26 11/10/2014 5398 9.33 6.92 50363.34 37354.16
27 5/10/2010 5822 9.33 6.92 54319.26 40288.24
28 7/30/2014 5124 47.45 31.79 243133.80 162891.96
29 6/9/2012 2370 668.27 502.54 1583799.90 1191019.80
30 10/20/2012 8661 437.20 263.33 3786589.20 2280701.13
31 1/28/2014 2125 81.73 56.67 173676.25 120423.75
32 11/25/2015 2924 651.21 524.96 1904138.04 1534983.04
33 3/1/2015 8250 668.27 502.54 5513227.50 4145955.00
34 2/25/2017 7327 152.58 97.44 1117953.66 713942.88
35 5/21/2017 6409 81.73 56.67 523807.57 363198.03
36 12/3/2011 3784 9.33 6.92 35304.72 26185.28
37 1/23/2017 4767 421.89 364.69 2011149.63 1738477.23
38 5/8/2012 6708 651.21 524.96 4368316.68 3521431.68
39 2/28/2012 3987 651.21 524.96 2596374.27 2093015.52
40 3/28/2017 3015 81.73 56.67 246415.95 170860.05
41 2/25/2010 7234 437.20 263.33 3162704.80 1904929.22
42 6/8/2012 2117 205.70 117.11 435466.90 247921.87
43 11/10/2012 171 154.06 90.93 26344.26 15549.03
44 11/18/2015 5930 109.28 35.84 648030.40 212531.20
45 4/29/2016 962 205.70 117.11 197883.40 112659.82
46 12/31/2016 8867 437.20 263.33 3876652.40 2334947.11
47 1/31/2011 273 81.73 56.67 22312.29 15470.91
48 11/14/2014 7842 109.28 35.84 856973.76 281057.28
49 1/13/2012 1266 651.21 524.96 824431.86 664599.36
50 3/18/2010 2269 109.28 35.84 247956.32 81320.96
51 9/18/2013 9606 9.33 6.92 89623.98 66473.52
52 3/28/2013 4063 205.70 117.11 835759.10 475817.93
53 1/7/2012 3457 651.21 524.96 2251232.97 1814786.72
54 10/24/2013 7637 9.33 6.92 71253.21 52848.04
55 6/27/2012 3482 109.28 35.84 380512.96 124794.88
56 8/1/2010 9905 109.28 35.84 1082418.40 354995.20
57 3/2/2015 2847 437.20 263.33 1244708.40 749700.51
58 2/14/2012 282 668.27 502.54 188452.14 141716.28
59 4/19/2014 7215 437.20 263.33 3154398.00 1899925.95
60 7/2/2013 682 205.70 117.11 140287.40 79869.02
61 7/1/2013 4750 255.28 159.42 1212580.00 757245.00
62 11/15/2011 5518 651.21 524.96 3593376.78 2896729.28
63 11/17/2010 6116 109.28 35.84 668356.48 219197.44
64 11/16/2013 1705 437.20 263.33 745426.00 448977.65
65 11/25/2013 4477 437.20 263.33 1957344.40 1178928.41
66 7/9/2012 8656 81.73 56.67 707454.88 490535.52
67 9/7/2016 5498 109.28 35.84 600821.44 197048.32
68 11/17/2010 8287 651.21 524.96 5396577.27 4350343.52
69 5/28/2015 7342 109.28 35.84 802333.76 263137.28
70 5/20/2013 5010 651.21 524.96 3262562.10 2630049.60
71 9/30/2015 673 9.33 6.92 6279.09 4657.16
72 7/15/2011 5741 47.45 31.79 272410.45 182506.39
73 6/17/2017 8656 205.70 117.11 1780539.20 1013704.16
74 8/16/2013 9892 437.20 263.33 4324782.40 2604860.36
75 12/12/2014 6954 668.27 502.54 4647149.58 3494663.16
76 11/15/2014 9379 47.45 31.79 445033.55 298158.41
77 10/23/2011 3732 154.06 90.93 574951.92 339350.76
78 6/2/2012 8614 255.28 159.42 2198981.92 1373243.88
79 8/7/2013 9654 437.20 263.33 4220728.80 2542187.82
80 11/30/2012 4513 668.27 502.54 3015902.51 2267963.02
81 10/8/2012 7884 109.28 35.84 861563.52 282562.56
82 12/8/2016 6489 437.20 263.33 2836990.80 1708748.37
83 1/5/2011 4085 152.58 97.44 623289.30 398042.40
84 4/7/2012 6457 154.06 90.93 994765.42 587135.01
85 3/20/2012 6422 81.73 56.67 524870.06 363934.74
86 1/21/2011 8829 47.45 31.79 418936.05 280673.91
87 3/20/2014 5559 255.28 159.42 1419101.52 886215.78
88 5/18/2012 522 9.33 6.92 4870.26 3612.24
89 11/25/2016 4660 47.45 31.79 221117.00 148141.40
90 12/14/2016 948 651.21 524.96 617347.08 497662.08
91 7/11/2014 9389 47.45 31.79 445508.05 298476.31
92 7/24/2012 2021 651.21 524.96 1316095.41 1060944.16
93 12/25/2010 7910 437.20 263.33 3458252.00 2082940.30
94 3/21/2011 8156 47.45 31.79 387002.20 259279.24
95 9/3/2011 888 109.28 35.84 97040.64 31825.92
96 12/28/2011 6267 9.33 6.92 58471.11 43367.64
97 6/29/2016 1485 154.06 90.93 228779.10 135031.05
98 8/8/2015 5767 81.73 56.67 471336.91 326815.89
99 2/15/2012 5367 668.27 502.54 3586605.09 2697132.18
TotalProfit
0 951410.50
1 248406.36
2 224598.75
3 19525.82
4 639077.50
5 285087.64
6 693911.51
7 510216.66
8 152114.20
9 584073.87
10 7828.12
11 306097.92
12 606834.72
13 1487261.02
14 122819.06
15 122865.12
16 1208744.24
17 85033.80
18 634745.90
19 337937.60
20 714157.00
21 122029.78
22 122686.50
23 5270.67
24 127054.20
25 315574.05
26 13009.18
27 14031.02
28 80241.84
29 392780.10
30 1505888.07
31 53252.50
32 369155.00
33 1367272.50
34 404010.78
35 160609.54
36 9119.44
37 272672.40
38 846885.00
39 503358.75
40 75555.90
41 1257775.58
42 187545.03
43 10795.23
44 435499.20
45 85223.58
46 1541705.29
47 6841.38
48 575916.48
49 159832.50
50 166635.36
51 23150.46
52 359941.17
53 436446.25
54 18405.17
55 255718.08
56 727423.20
57 495007.89
58 46735.86
59 1254472.05
60 60418.38
61 455335.00
62 696647.50
63 449159.04
64 296448.35
65 778415.99
66 216919.36
67 403773.12
68 1046233.75
69 539196.48
70 632512.50
71 1621.93
72 89904.06
73 766835.04
74 1719922.04
75 1152486.42
76 146875.14
77 235601.16
78 825738.04
79 1678540.98
80 747939.49
81 579000.96
82 1128242.43
83 225246.90
84 407630.41
85 160935.32
86 138262.14
87 532885.74
88 1258.02
89 72975.60
90 119685.00
91 147031.74
92 255151.25
93 1375311.70
94 127722.96
95 65214.72
96 15103.47
97 93748.05
98 144521.02
99 889472.91
You can see it is showing only two columns at a time.
Can someone please help me find out how I can see all the columns? I mean all the columns together without line breaks (with scrolling).
I searched the internet but only found how to enable vertical scrolling. Is there no way to get horizontal scrolling?
Thanks in advance.
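One likely fix (a sketch, assuming the wrapping comes from pandas' default display settings rather than from Great Expectations itself, since `ge.read_csv` returns a pandas-backed frame) is to disable column truncation and line folding:

```python
import pandas as pd

# Show every column instead of eliding some with "..."
pd.set_option("display.max_columns", None)
# Render the frame on one long line so the notebook can scroll horizontally
pd.set_option("display.expand_frame_repr", False)

# Small stand-in frame with many columns to demonstrate the effect
df = pd.DataFrame({f"col{i}": range(3) for i in range(20)})
print(df)  # all 20 columns now appear on a single (wide) line
```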
My data looks something like this:
Report Date   Location     Data
8/6/2021      St. Louis     100
8/1/2021      St. Louis      89
7/29/2021     St. Louis      85
7/24/2021     St. Louis      80
7/30/2021     Louisville     92
7/25/2021     Louisville     79
But when I plot the data in Plotly using the built-in animation_frame and animation_group arguments, the slider bar naturally jumps from row to row, which doesn't produce an intuitive animation when each jump doesn't span the same number of days.
What I'm trying to do as a work-around is create a new table which duplicates rows and keeps the true report data, but adds an 'Animation Date' column to keep the slider-bar transitions intuitive. I'd like the new data table to look something like the below. Assume the date the code was run was 8/6/2021.
Report Date   Animation Date   Location     Data   Days Since Most Recent Report
8/6/2021      8/6/2021         St. Louis     100   0
8/1/2021      8/5/2021         St. Louis      89   4
8/1/2021      8/4/2021         St. Louis      89   3
8/1/2021      8/3/2021         St. Louis      89   2
8/1/2021      8/2/2021         St. Louis      89   1
8/1/2021      8/1/2021         St. Louis      89   0
7/29/2021     7/30/2021        St. Louis      85   1
7/29/2021     7/29/2021        St. Louis      85   0
7/24/2021     7/28/2021        St. Louis      80   4
7/24/2021     7/27/2021        St. Louis      80   3
7/24/2021     7/26/2021        St. Louis      80   2
7/24/2021     7/25/2021        St. Louis      80   1
7/24/2021     7/24/2021        St. Louis      80   0
7/30/2021     8/6/2021         Louisville     92   7
7/30/2021     8/5/2021         Louisville     92   6
7/30/2021     8/4/2021         Louisville     92   5
7/30/2021     8/3/2021         Louisville     92   4
7/30/2021     8/2/2021         Louisville     92   3
7/30/2021     8/1/2021         Louisville     92   2
7/30/2021     7/31/2021        Louisville     92   1
7/30/2021     7/30/2021        Louisville     92   0
7/25/2021     7/29/2021        Louisville     79   4
7/25/2021     7/28/2021        Louisville     79   3
7/25/2021     7/27/2021        Louisville     79   2
7/25/2021     7/26/2021        Louisville     79   1
7/25/2021     7/25/2021        Louisville     79   0
By doing this, the animation could display 'Days Since Most Recent Report' or 'Report Date' to show that, as the animation plays, some of the displayed data may be somewhat stale, while the animation still traverses time appropriately and there is data displayed throughout. Each time the 'Animation Date' matches a 'Report Date', a new bit of data is displayed for each 'Animation Date' until a new 'Report Date' is hit, and the cycle repeats until the animation is brought up to the present day.
If there is an easier way to work around this in Plotly, please let me know! Otherwise, I'm having trouble getting off the ground with the logic of creating a new DataFrame while iterating through the old one.
IIUC you can reindex through pd.MultiIndex.from_tuples:
df["Animation Date"] = pd.to_datetime(df["Report Date"])
# Take the max of the parsed dates (a lexical max on the raw strings can misorder months)
max_date = df["Animation Date"].max()
idx = pd.MultiIndex.from_tuples(
    [(loc, d)
     for loc, dates in df.groupby("Location")["Animation Date"]
     for d in pd.date_range(dates.min(), max_date)],
    names=["Location", "Animation Date"],
)
s = df.set_index(["Location", "Animation Date"]).reindex(idx).reset_index()
s["Days Since"] = s.groupby(["Location", s.Data.notnull().cumsum()]).cumcount()
print(s.ffill())
Location Animation Date Report Date Data Days Since
0 Louisville 2021-07-25 7/25/2021 79.0 0
1 Louisville 2021-07-26 7/25/2021 79.0 1
2 Louisville 2021-07-27 7/25/2021 79.0 2
3 Louisville 2021-07-28 7/25/2021 79.0 3
4 Louisville 2021-07-29 7/25/2021 79.0 4
5 Louisville 2021-07-30 7/30/2021 92.0 0
6 Louisville 2021-07-31 7/30/2021 92.0 1
7 Louisville 2021-08-01 7/30/2021 92.0 2
8 Louisville 2021-08-02 7/30/2021 92.0 3
9 Louisville 2021-08-03 7/30/2021 92.0 4
10 Louisville 2021-08-04 7/30/2021 92.0 5
11 Louisville 2021-08-05 7/30/2021 92.0 6
12 Louisville 2021-08-06 7/30/2021 92.0 7
13 St. Louis 2021-07-24 7/24/2021 80.0 0
14 St. Louis 2021-07-25 7/24/2021 80.0 1
15 St. Louis 2021-07-26 7/24/2021 80.0 2
16 St. Louis 2021-07-27 7/24/2021 80.0 3
17 St. Louis 2021-07-28 7/24/2021 80.0 4
18 St. Louis 2021-07-29 7/29/2021 85.0 0
19 St. Louis 2021-07-30 7/29/2021 85.0 1
20 St. Louis 2021-07-31 7/29/2021 85.0 2
21 St. Louis 2021-08-01 8/1/2021 89.0 0
22 St. Louis 2021-08-02 8/1/2021 89.0 1
23 St. Louis 2021-08-03 8/1/2021 89.0 2
24 St. Louis 2021-08-04 8/1/2021 89.0 3
25 St. Louis 2021-08-05 8/1/2021 89.0 4
26 St. Louis 2021-08-06 8/6/2021 100.0 0
I have a loop cycling through the length of a DataFrame and going through a list of teams. My loop should go through 41 rows, but it only does 2 rows and then stops, and I have no idea why it is stalling out. It seems to me I should be cycling through the entire 41-team list, but it stops after indexing two teams.
import pandas as pd

excel_data_df = pd.read_excel('New_Schedule.xlsx', sheet_name='Sheet1', engine='openpyxl')
print(excel_data_df)
print('Data Frame Above')

yahoot = len(excel_data_df)
print('Length Of Dataframe Below')
print(yahoot)

for games in excel_data_df:
    yahoot -= 1
    print(yahoot)
    searching = excel_data_df.iloc[yahoot, 0]
    print(searching)
    excel_data_df2 = pd.read_excel('allstats.xlsx', sheet_name='Sheet1', engine='openpyxl')
    print(excel_data_df2)
    finding = excel_data_df2[excel_data_df2['TEAM:'] == searching].index
    print(finding)
Here is the run log
HOME TEAM: AWAY TEAM:
0 Portland St. Weber St.
1 Nevada Air Force
2 Utah Idaho
3 San Jose St. Santa Clara
4 Southern Utah SAGU American Indian
5 West Virginia Iowa St.
6 Missouri Prairie View
7 Southeast Mo. St. UT Martin
8 Little Rock Champion Chris.
9 Tennessee St. Belmont
10 Wichita St. Emporia St.
11 Tennessee Tennessee Tech
12 FGCU Webber Int'l
13 Jacksonville St. Ga. Southwestern
14 Northern Ill. Chicago St.
15 Col. of Charleston Western Caro.
16 Georgia Tech Florida A&M
17 Rider Iona
18 Tulsa Northwestern St.
19 Rhode Island Davidson
20 Washington St. Montana St.
21 Montana Dickinson St.
22 Robert Morris Bowling Green
23 South Dakota Drake
24 Richmond Loyola Chicago
25 Coastal Carolina Alice Lloyd
26 Presbyterian South Carolina St.
27 Morehead St. SIUE
28 San Diego St. BYU
29 Siena Canisius
30 Monmouth Saint Peter's
31 Howard Hampton
32 App State Columbia Int'l
33 Southern Ill. North Dakota
34 Norfolk St. UNCW
35 Niagara Fairfield
36 N.C. A&T Greensboro
37 Western Mich. Central Mich.
38 DePaul Xavier
39 Georgia St. Carver
40 Northern Ariz. Eastern Wash.
41 Gardner-Webb VMI
Data Frame Above
Length Of Dataframe Below
42
41
Gardner-Webb
TEAM: TOTAL POINTS: ... TURNOVER RATIO: ASSIST TO TURNOVER RANK
0 Mount St. Marys 307 ... 65 239.0
1 Saint Josephs 163 ... 28 81.0
2 Saint Marys (CA) 518 ... 78 114.0
3 Saint Peters 399 ... 86 145.0
4 St. John's (NY) 656 ... 115 73.0
.. ... ... ... ... ...
314 Wofford 327 ... 54 113.0
315 Wright St. 220 ... 47 206.0
316 Wyoming 517 ... 64 27.0
317 Xavier 582 ... 84 12.0
318 Youngstown St. 231 ... 30 79.0
[319 rows x 18 columns]
Int64Index([85], dtype='int64')
40
Northern Ariz.
TEAM: TOTAL POINTS: ... TURNOVER RATIO: ASSIST TO TURNOVER RANK
0 Mount St. Marys 307 ... 65 239.0
1 Saint Josephs 163 ... 28 81.0
2 Saint Marys (CA) 518 ... 78 114.0
3 Saint Peters 399 ... 86 145.0
4 St. John's (NY) 656 ... 115 73.0
.. ... ... ... ... ...
314 Wofford 327 ... 54 113.0
315 Wright St. 220 ... 47 206.0
316 Wyoming 517 ... 64 27.0
317 Xavier 582 ... 84 12.0
318 Youngstown St. 231 ... 30 79.0
[319 rows x 18 columns]
Int64Index([180], dtype='int64')
Use for index, data in excel_data_df.iterrows(): instead. Iterating over a DataFrame directly yields its column labels, not its rows — your frame has exactly two columns (HOME TEAM: and AWAY TEAM:), which is why the loop runs exactly twice.
pandas.DataFrame.iterrows
DataFrame.iterrows()
Iterate over DataFrame rows as (index, Series) pairs.
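A minimal sketch of the fix, with toy data standing in for New_Schedule.xlsx (column names taken from the question's printout):

```python
import pandas as pd

# Toy schedule frame with the same two columns as New_Schedule.xlsx
excel_data_df = pd.DataFrame({
    'HOME TEAM:': ['Portland St.', 'Nevada', 'Utah'],
    'AWAY TEAM:': ['Weber St.', 'Air Force', 'Idaho'],
})

# Iterating the DataFrame directly yields its 2 column labels, which is
# why the original loop stopped after 2 passes. iterrows visits every row.
visited = []
for index, row in excel_data_df.iterrows():
    visited.append(row['HOME TEAM:'])

print(visited)  # one entry per row, not per column
```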
Suppose I have the following pandas dataframe:
Date Region Country Cases Deaths Lat Long
2020-03-08 Northern Territory Australia 27 49 -12.4634 130.8456
2020-03-09 Northern Territory Australia 80 85 -12.4634 130.8456
2020-03-12 Northern Territory Australia 35 73 -12.4634 130.8456
2020-03-08 Western Australia Australia 48 20 -31.9505 115.8605
2020-03-09 Western Australia Australia 70 12 -31.9505 115.8605
2020-03-10 Western Australia Australia 66 95 -31.9505 115.8605
2020-03-11 Western Australia Australia 31 38 -31.9505 115.8605
2020-03-12 Western Australia Australia 40 83 -31.9505 115.8605
I need to update the dataframe with the missing dates for the Northern Territory, 2020-03-10 and 2020-03-11. However, I want to use all the information except for cases and deaths. Like this:
Date Region Country Cases Deaths Lat Long
2020-03-08 Northern Territory Australia 27 49 -12.4634 130.8456
2020-03-09 Northern Territory Australia 80 85 -12.4634 130.8456
2020-03-10 Northern Territory Australia 0 0 -12.4634 130.8456
2020-03-11 Northern Territory Australia 0 0 -12.4634 130.8456
2020-03-12 Northern Territory Australia 35 73 -12.4634 130.8456
2020-03-08 Western Australia Australia 48 20 -31.9505 115.8605
2020-03-09 Western Australia Australia 70 12 -31.9505 115.8605
2020-03-10 Western Australia Australia 66 95 -31.9505 115.8605
2020-03-11 Western Australia Australia 31 38 -31.9505 115.8605
2020-03-12 Western Australia Australia 40 83 -31.9505 115.8605
The only way I can think of doing this is to iterate through all combinations of dates and countries.
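That brute-force idea can be sketched directly with MultiIndex.from_product, building every (Region, Date) pair and reindexing onto it (column names taken from the toy frame above; this is one possible approach, not necessarily the cleanest):

```python
import pandas as pd

# Toy frame mirroring the question's columns
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-08', '2020-03-09', '2020-03-12',
                            '2020-03-08', '2020-03-09']),
    'Region': ['Northern Territory'] * 3 + ['Western Australia'] * 2,
    'Country': 'Australia',
    'Cases': [27, 80, 35, 48, 70],
    'Deaths': [49, 85, 73, 20, 12],
    'Lat': [-12.4634] * 3 + [-31.9505] * 2,
    'Long': [130.8456] * 3 + [115.8605] * 2,
})

# Every (Region, Date) combination across the full date range
full_index = pd.MultiIndex.from_product(
    [df['Region'].unique(),
     pd.date_range(df['Date'].min(), df['Date'].max(), freq='D')],
    names=['Region', 'Date'])

out = (df.set_index(['Region', 'Date'])
         .reindex(full_index)                   # inserts the missing days as NaN rows
         .fillna({'Cases': 0, 'Deaths': 0})     # missing days get 0 cases/deaths
         .groupby(level='Region').ffill()       # carry Country/Lat/Long within each region
         .reset_index())
print(out)
```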
EDIT
Erfan seems to be on the right track, but I can't get it to work. Here is the actual data I'm working with instead of a toy example.
import pandas as pd
unique_group = ['province','country','county']
csbs_df = pd.read_csv(
    'https://jordansdatabucket.s3-us-west-2.amazonaws.com/covid19data/csbs_df.csv.gz', index_col=0)
csbs_df['Date'] = pd.to_datetime(csbs_df['Date'], infer_datetime_format=True)
new_df = (
    csbs_df.set_index('Date')
    .groupby(unique_group)
    .resample('D').first()
    .fillna(dict.fromkeys(['confirmed', 'deaths'], 0))
    .ffill()
    .reset_index(level=3)
    .reset_index(drop=True))
new_df.head()
new_df.head()
Date id lat lon Timestamp province country_code country county confirmed deaths source Date_text
0 2020-03-25 1094.0 32.534893 -86.642709 2020-03-25 00:00:00+00:00 Alabama US US Autauga 1.0 0.0 CSBS 03/25/20
1 2020-03-26 901.0 32.534893 -86.642709 2020-03-26 00:00:00+00:00 Alabama US US Autauga 4.0 0.0 CSBS 03/26/20
2 2020-03-24 991.0 30.735891 -87.723525 2020-03-24 00:00:00+00:00 Alabama US US Baldwin 3.0 0.0 CSBS 03/24/20
3 2020-03-25 1080.0 30.735891 -87.723525 2020-03-25 00:00:00+00:00 Alabama US US Baldwin 4.0 0.0 CSBS 03/25/20
4 2020-03-26 1139.0 30.735891 -87.723525 2020-03-26 16:52:00+00:00 Alabama US US Baldwin 4.0 0.0 CSBS 03/26/20
You can see that it is not inserting the daily resample as specified. I'm not sure what's wrong.
Edit 2
Here is my solution based on Erfan's answer.
import pandas as pd
csbs_df = pd.read_csv(
    'https://jordansdatabucket.s3-us-west-2.amazonaws.com/covid19data/csbs_df.csv.gz', index_col=0)
date_range = pd.date_range(csbs_df['Date'].min(), csbs_df['Date'].max(), freq='1D')
unique_group = ['country', 'province', 'county']
gb = csbs_df.groupby(unique_group)
sub_dfs = []
for g in gb.groups:
    sub_df = gb.get_group(g)
    sub_df = (
        sub_df.set_index('Date')
        .reindex(date_range)
        .fillna(dict.fromkeys(['confirmed', 'deaths'], 0))
        .bfill()
        .ffill()
        .reset_index()
        .rename({'index': 'Date'}, axis=1)
        .drop('id', axis=1))
    sub_df['Date_text'] = sub_df['Date'].dt.strftime('%m/%d/%y')
    sub_df['Timestamp'] = pd.to_datetime(sub_df['Date'], utc=True)
    sub_dfs.append(sub_df)
all_concat = pd.concat(sub_dfs)
assert (all_concat.groupby(['province', 'country', 'county']).count() == 3).all().all()
Using GroupBy.resample, ffill and fillna:
The idea here is that we want to "fill" the missing gaps of dates for each group of Region and Country. This is called resampling a time series, and that's why we use GroupBy.resample instead of DataFrame.resample here. Furthermore, fillna and ffill are needed to fill the data according to your logic.
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
dfn = (
    df.set_index('Date')
    .groupby(['Region', 'Country'])
    .resample('D').first()
    .fillna(dict.fromkeys(['Cases', 'Deaths'], 0))
    .ffill()
    .reset_index(level=2)
    .reset_index(drop=True)
)
Date Region Country Cases Deaths Lat Long
0 2020-03-08 Northern Territory Australia 27.0 49.0 -12.4634 130.8456
1 2020-03-09 Northern Territory Australia 80.0 85.0 -12.4634 130.8456
2 2020-03-10 Northern Territory Australia 0.0 0.0 -12.4634 130.8456
3 2020-03-11 Northern Territory Australia 0.0 0.0 -12.4634 130.8456
4 2020-03-12 Northern Territory Australia 35.0 73.0 -12.4634 130.8456
5 2020-03-08 Western Australia Australia 48.0 20.0 -31.9505 115.8605
6 2020-03-09 Western Australia Australia 70.0 12.0 -31.9505 115.8605
7 2020-03-10 Western Australia Australia 66.0 95.0 -31.9505 115.8605
8 2020-03-11 Western Australia Australia 31.0 38.0 -31.9505 115.8605
9 2020-03-12 Western Australia Australia 40.0 83.0 -31.9505 115.8605
Edit:
It seems that indeed not all places have the same start and end date, so we have to take that into account; the following works:
csbs_df = pd.read_csv(
    'https://jordansdatabucket.s3-us-west-2.amazonaws.com/covid19data/csbs_df.csv.gz'
).iloc[:, 1:]
csbs_df['Date_text'] = pd.to_datetime(csbs_df['Date_text'])
date_range = pd.date_range(csbs_df['Date_text'].min(), csbs_df['Date_text'].max(), freq='D')

def reindex_dates(data, dates):
    data = data.reindex(dates).fillna(dict.fromkeys(['Cases', 'Deaths'], 0)).ffill().bfill()
    return data

dfn = (
    csbs_df.set_index('Date_text')
    .groupby('id').apply(lambda x: reindex_dates(x, date_range))
    .reset_index(level=0, drop=True)
    .reset_index()
    .rename(columns={'index': 'Date'})
)
print(dfn.head())
print(dfn.head())
Date id lat lon Timestamp \
0 2020-03-24 0.0 40.714550 -74.007140 2020-03-24 00:00:00+00:00
1 2020-03-25 0.0 40.714550 -74.007140 2020-03-25 00:00:00+00:00
2 2020-03-26 0.0 40.714550 -74.007140 2020-03-26 00:00:00+00:00
3 2020-03-24 1.0 41.163198 -73.756063 2020-03-24 00:00:00+00:00
4 2020-03-25 1.0 41.163198 -73.756063 2020-03-25 00:00:00+00:00
Date province country_code country county confirmed deaths \
0 2020-03-24 New York US US New York 13119.0 125.0
1 2020-03-25 New York US US New York 15597.0 192.0
2 2020-03-26 New York US US New York 20011.0 280.0
3 2020-03-24 New York US US Westchester 2894.0 0.0
4 2020-03-25 New York US US Westchester 3891.0 1.0
source
0 CSBS
1 CSBS
2 CSBS
3 CSBS
4 CSBS
I'm having issues with scraping basketball-reference.com. I'm trying to access the "Team Per Game Stats" table but can't seem to target the correct div/table. I'm trying to capture the table and bring it into a dataframe using pandas.
I've tried using soup.find and soup.find_all to find all the tables, but when I search the results I do not see the ID of the table I am looking for. See below.
x = soup.find("table", id="team-stats-per_game")
import csv, time, sys, math
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import urllib.request
#NBA season
year = 2019
# URL of the page we will be scraping
url = "https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base".format(year)
# Basketball Reference URL
html = urllib.request.urlopen(url)
soup = BeautifulSoup(html,'lxml')
x = soup.find("table", id="team-stats-per_game")
print(x)
Result:
None
I expect the output to list the table elements, specifically tr and th tags to target and bring into a pandas df.
As Jarett mentioned above, BeautifulSoup can't parse your tag. In this case it's because it's commented out in the source.
While this is admittedly an amateurish approach, it works for your data.
table_src = html.text.split('<div class="overthrow table_container" id="div_team-stats-per_game">')[1].split('</table>')[0] + '</table>'
table = BeautifulSoup(table_src, 'lxml')
The tables are rendered afterwards by JavaScript, so you'd need Selenium to let the page render, as mentioned above. But that isn't necessary, since most of the tables are inside HTML comments in the source. You can use BeautifulSoup to pull out the comments, then search through those for the table tags.
import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd
#NBA season
year = 2019
url = 'https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base'.format(year)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
tables = []
for each in comments:
    if 'table' in each:
        try:
            tables.append(pd.read_html(each)[0])
        except ValueError:
            # comment mentioned a table but contained none pandas could parse
            continue
This will return you a list of dataframes, so just pull out the table you want from wherever it is located by its index position:
Output:
print (tables[3])
Rk Team G MP FG ... STL BLK TOV PF PTS
0 1.0 Milwaukee Bucks* 82 19780 3555 ... 615 486 1137 1608 9686
1 2.0 Golden State Warriors* 82 19805 3612 ... 625 525 1169 1757 9650
2 3.0 New Orleans Pelicans 82 19755 3581 ... 610 441 1215 1732 9466
3 4.0 Philadelphia 76ers* 82 19805 3407 ... 606 432 1223 1745 9445
4 5.0 Los Angeles Clippers* 82 19830 3384 ... 561 385 1193 1913 9442
5 6.0 Portland Trail Blazers* 82 19855 3470 ... 546 413 1135 1669 9402
6 7.0 Oklahoma City Thunder* 82 19855 3497 ... 766 425 1145 1839 9387
7 8.0 Toronto Raptors* 82 19880 3460 ... 680 437 1150 1724 9384
8 9.0 Sacramento Kings 82 19730 3541 ... 679 363 1095 1751 9363
9 10.0 Washington Wizards 82 19930 3456 ... 683 379 1154 1701 9350
10 11.0 Houston Rockets* 82 19830 3218 ... 700 405 1094 1803 9341
11 12.0 Atlanta Hawks 82 19855 3392 ... 675 419 1397 1932 9294
12 13.0 Minnesota Timberwolves 82 19830 3413 ... 683 411 1074 1664 9223
13 14.0 Boston Celtics* 82 19780 3451 ... 706 435 1052 1670 9216
14 15.0 Brooklyn Nets* 82 19980 3301 ... 539 339 1236 1763 9204
15 16.0 Los Angeles Lakers 82 19780 3491 ... 618 440 1284 1701 9165
16 17.0 Utah Jazz* 82 19755 3314 ... 663 483 1240 1728 9161
17 18.0 San Antonio Spurs* 82 19805 3468 ... 501 386 992 1487 9156
18 19.0 Charlotte Hornets 82 19830 3297 ... 591 405 1001 1550 9081
19 20.0 Denver Nuggets* 82 19730 3439 ... 634 363 1102 1644 9075
20 21.0 Dallas Mavericks 82 19780 3182 ... 533 351 1167 1650 8927
21 22.0 Indiana Pacers* 82 19705 3390 ... 713 404 1122 1594 8857
22 23.0 Phoenix Suns 82 19880 3289 ... 735 418 1279 1932 8815
23 24.0 Orlando Magic* 82 19780 3316 ... 543 445 1082 1526 8800
24 25.0 Detroit Pistons* 82 19855 3185 ... 569 331 1135 1811 8778
25 26.0 Miami Heat 82 19730 3251 ... 627 448 1208 1712 8668
26 27.0 Chicago Bulls 82 19905 3266 ... 603 351 1159 1663 8605
27 28.0 New York Knicks 82 19780 3134 ... 557 422 1151 1713 8575
28 29.0 Cleveland Cavaliers 82 19755 3189 ... 534 195 1106 1642 8567
29 30.0 Memphis Grizzlies 82 19880 3113 ... 684 448 1147 1801 8490
30 NaN League Average 82 19815 3369 ... 626 406 1155 1714 9119
[31 rows x 25 columns]
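If you'd rather not guess at the index position, you could, as a sketch, match the table's id inside each comment before parsing. Hypothetical minimal comment strings stand in for the real page's comments here; on the live page you would feed in the `comments` list extracted above:

```python
from bs4 import BeautifulSoup

# Hypothetical comment bodies standing in for the real page's comments
comments = [
    '<table id="misc_stats"><tr><th>A</th></tr><tr><td>1</td></tr></table>',
    '<table id="team-stats-per_game"><tr><th>Team</th><th>PTS</th></tr>'
    '<tr><td>Milwaukee Bucks</td><td>9686</td></tr></table>',
]

per_game = None
for each in comments:
    # cheap substring check before paying for a full parse
    if 'team-stats-per_game' in each:
        table = BeautifulSoup(each, 'html.parser').find('table', id='team-stats-per_game')
        if table is not None:
            # collect cell text row by row
            per_game = [
                [cell.get_text() for cell in row.find_all(['th', 'td'])]
                for row in table.find_all('tr')
            ]
            break

print(per_game)
```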
As other answers mentioned, this is basically because the content of the page is loaded by JavaScript, so fetching the source code with urlopen or requests will not include that dynamic part.
So here is a way around it: you can use Selenium to let the dynamic content load, then get the source code from there and search it for the table.
Here is code that gives the result you expected, but you will need to set up a Selenium web driver first.
from lxml import html
from bs4 import BeautifulSoup
from time import sleep
from selenium import webdriver
def parse(url):
    driver = webdriver.Firefox()
    driver.get(url)
    sleep(3)  # give the JavaScript time to render
    sourceCode = driver.page_source
    driver.quit()
    return sourceCode

year = 2019
soup = BeautifulSoup(parse("https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base".format(year)), 'lxml')
x = soup.find("table", id="team-stats-per_game")
print(x)
Hope this helps with your problem; feel free to ask if you have any further doubts.
Happy coding :)