Use apply & lambda with a Series - python

I have this:
df.loc['United Kingdom']
It is a series:
Rank 4.000000e+00
Documents 2.094400e+04
Citable documents 2.035700e+04
Citations 2.060910e+05
Self-citations 3.787400e+04
Citations per document 9.840000e+00
H index 1.390000e+02
Energy Supply NaN
Energy Supply per Capita NaN
% Renewable's NaN
2006 2.419631e+12
2007 2.482203e+12
2008 2.470614e+12
2009 2.367048e+12
2010 2.403504e+12
2011 2.450911e+12
2012 2.479809e+12
2013 2.533370e+12
2014 2.605643e+12
2015 2.666333e+12
Name: United Kingdom, dtype: float64
Now I want to compute:
apply(lambda x: x['2015'] - x['2006'])
But it raises an error:
TypeError: 'float' object is not subscriptable
But if I access the values separately:
df.loc['United Kingdom']['2015'] - df.loc['United Kingdom']['2006']
it works fine.
How can I use apply and a lambda here?
Thanks.
PS: I want to apply it to a DataFrame:
Rank Documents Citable documents Citations Self-citations Citations per document H index Energy Supply Energy Supply per Capita % Renewable's ... 2008 2009 2010 2011 2012 2013 2014 2015 Citation Ratio Population
Country
China 1 127050 126767 597237 411683 4.70 138 NaN NaN NaN ... 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12 0.689313 NaN
United States 2 96661 94747 792274 265436 8.20 230 NaN NaN NaN ... 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13 0.335031 NaN
Japan 3 30504 30287 223024 61554 7.31 134 NaN NaN NaN ... 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12 0.275997 NaN
United Kingdom 4 20944 20357 206091 37874 9.84 139 NaN NaN NaN ... 2.470614e+12 2.367048e+12 2.403504e+12 2.450911e+12 2.479809e+12 2.533370e+12 2.605643e+12 2.666333e+12 0.183773 NaN

The TypeError occurs because Series.apply passes each scalar element to the lambda, so x is a single float and cannot be indexed with x['2015']. If you want to apply this across your whole DataFrame, just calculate it directly:
df['2015'] - df['2006']
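If you specifically want apply with a lambda, pass axis=1 so each row is handed to the lambda as a Series (a sketch; the 'Growth' column name is just for illustration, and the vectorized subtraction above is faster):

# Row-wise apply: each x is a full row (a Series), so x['2015'] works.
df['Growth'] = df.apply(lambda x: x['2015'] - x['2006'], axis=1)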

Related

Calling mean() Function Without Removing Non-Numeric Columns In Dataframe

I have the following dataframe:
import pandas as pd
fertilityRates = pd.read_csv('fertility_rate.csv')
fertilityRatesRowCount = len(fertilityRates.axes[0])
fertilityRates.head(fertilityRatesRowCount)
I have found a way to find the mean for each row over columns 1960-1969, but would like to do so without removing the column called "Country".
These are the commands I execute:
Mean1960To1970 = fertilityRates.iloc[:, 1:11].mean(axis=1)
Mean1960To1970
You can use pandas.DataFrame.loc to select a range of years (e.g. "1960":"1968" means from 1960 to 1968, inclusive).
Try this:
Mean1960To1968 = (
    fertilityRates[["Country"]]
    .assign(Mean=fertilityRates.loc[:, "1960":"1968"].mean(axis=1))
)
# Output:
print(Mean1960To1968)
Country Mean
0 _World 5.004444
1 Afghanistan 7.450000
2 Albania 5.913333
3 Algeria 7.635556
4 Angola 7.030000
5 Antigua and Barbuda 4.223333
6 Arab World 7.023333
7 Argentina 3.073333
8 Armenia 4.133333
9 Aruba 4.044444
10 Australia 3.167778
11 Austria 2.715556
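If the year columns are the only numeric columns, a shorter alternative (a sketch under that assumption, not from the original answer) is to let mean skip the non-numeric "Country" column automatically:

import pandas as pd

fertilityRates = pd.read_csv('fertility_rate.csv')

# numeric_only=True excludes non-numeric columns such as "Country" from the
# row-wise mean, so nothing has to be dropped first.
fertilityRates['Mean'] = fertilityRates.mean(axis=1, numeric_only=True)
print(fertilityRates[['Country', 'Mean']])

Note that this averages over every numeric column; slice with .loc as above if you only want 1960 to 1968.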

While web scraping for a table in Python, an empty table is returned

I need to grab a table from a web site by web scraping with the BeautifulSoup library in Python, from the URL https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html
When I run this code, I get an empty table:
import requests
from bs4 import BeautifulSoup
#
vaacineProgressResponse = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
vaacineProgressContent = BeautifulSoup(vaacineProgressResponse.content, 'html.parser')
vaacineProgressContentTable = vaacineProgressContent.find_all('table', class_="g-summary-table svelte-2wimac")
if vaacineProgressContentTable is not None and len(vaacineProgressContentTable) > 0:
    vaacineProgressContentTable = vaacineProgressContentTable[0]
#
print('the table =', vaacineProgressContentTable)
The output:
the table = []
Process finished with exit code 0
A screenshot (not reproduced here) showed the table in the web page at left and the related Inspect element section at right.
Very simple - it's because there's an extra space in the class you're searching for.
If you change the class to g-summary-table svelte-2wimac, the tags should be correctly returned.
The following code should work:
import requests
from bs4 import BeautifulSoup
#
url = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
soup = BeautifulSoup(url.content, 'html.parser')
table = soup.find_all('table', class_="g-summary-table svelte-2wimac")
print(table)
I've also done similar scraping on the NYTimes interactive website, and spaces can be very tricky. If you add an extra space or miss one, an empty result is returned.
If you cannot find the tags, I would recommend printing the entire document first using print(soup.prettify()) and find the desired tags you plan to scrape. Make sure you copy the exact text of the class name from the contents printed by BeautifulSoup.
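One way to sidestep spacing issues entirely (a sketch, not part of the original answer) is a CSS selector, since select matches each dot-separated class independently rather than the literal class string:

# Matches any <table> carrying both classes, regardless of attribute spacing.
tables = soup.select('table.g-summary-table.svelte-2wimac')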
As an alternative, if you want to download the data in JSON format and then read it into pandas, you can do the following (same starting code as above, working off the soup object).
There are several APIs available (below are three), pulled out of the HTML like so:
import re
import pandas as pd

# Each dataset URL is embedded in script text; locate the text node by a
# keyword, then slice the quoted URL out of it.
latest_dataset = soup.find(string=re.compile('latest')).splitlines()[2].split('"')[1]
requests.get(latest_dataset).json()

latest_timeseries = soup.find(string=re.compile('timeseries')).splitlines()[2].split('"')[3]
requests.get(latest_timeseries).json()

allwithrate = soup.find(string=re.compile('all_with_rate')).splitlines()[2].split('"')[1]
requests.get(allwithrate).json()

pd.DataFrame(requests.get(allwithrate).json())
Output of the last one:
geoid location last_updated total_vaccinations people_vaccinated display_name ... Region IncomeGroup country gdp_per_cap vaccinations_rate people_fully_vaccinated
0 MUS Mauritius 2021-02-17 3843.0 3843.0 Mauritius ... Sub-Saharan Africa High income Mauritius 11099.24028 0.3037 NaN
1 DZA Algeria 2021-02-19 75000.0 NaN Algeria ... Middle East & North Africa Lower middle income Algeria 3973.964072 0.1776 NaN
2 LAO Laos 2021-03-17 40732.0 40732.0 Laos ... East Asia & Pacific Lower middle income Lao PDR 2534.89828 0.5768 NaN
3 MOZ Mozambique 2021-03-23 57305.0 57305.0 Mozambique ... Sub-Saharan Africa Low income Mozambique 503.5707727 0.1943 NaN
4 CPV Cape Verde 2021-03-24 2184.0 2184.0 Cape Verde ... Sub-Saharan Africa Lower middle income Cabo Verde 3603.781793 0.4016 NaN
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
243 GUF NaN NaN NaN NaN French Guiana ... NaN NaN NaN NaN NaN NaN
244 KOS NaN NaN NaN NaN Kosovo ... NaN NaN NaN NaN NaN NaN
245 CUW NaN NaN NaN NaN Curaçao ... Latin America & Caribbean High income Curacao 19689.13982 NaN NaN
246 CHI NaN NaN NaN NaN Channel Islands ... Europe & Central Asia High income Channel Islands 74462.64675 NaN NaN
247 SXM NaN NaN NaN NaN Sint Maarten ... Latin America & Caribbean High income Sint Maarten (Dutch part) 29160.10381 NaN NaN
[248 rows x 17 columns]

How do you remove sections from a csv file using pandas?

I am following along with this project guide and I reached a segment where I'm not exactly sure how the code works. Can someone explain the following block of code please:
to_drop = ['Edition Statement',
'Corporate Author',
'Corporate Contributors',
'Former owner',
'Engraver',
'Contributors',
'Issuance type',
'Shelfmarks']
df.drop(to_drop, inplace=True, axis=1)
This is the format of the csv file before the previous code is executed:
Identifier Edition Statement Place of Publication \
0 206 NaN London
1 216 NaN London; Virtue & Yorston
2 218 NaN London
3 472 NaN London
4 480 A new edition, revised, etc. London
Date of Publication Publisher \
0 1879 [1878] S. Tinsley & Co.
1 1868 Virtue & Co.
2 1869 Bradbury, Evans & Co.
3 1851 James Darling
4 1857 Wertheim & Macintosh
Title Author \
0 Walter Forbes. [A novel.] By A. A A. A.
1 All for Greed. [A novel. The dedication signed... A., A. A.
2 Love the Avenger. By the author of “All for Gr... A., A. A.
3 Welsh Sketches, chiefly ecclesiastical, to the... A., E. S.
4 [The World in which I live, and my place in it... A., E. S.
Contributors Corporate Author \
0 FORBES, Walter. NaN
1 BLAZE DE BURY, Marie Pauline Rose - Baroness NaN
2 BLAZE DE BURY, Marie Pauline Rose - Baroness NaN
3 Appleyard, Ernest Silvanus. NaN
4 BROOME, John Henry. NaN
Corporate Contributors Former owner Engraver Issuance type \
0 NaN NaN NaN monographic
1 NaN NaN NaN monographic
2 NaN NaN NaN monographic
3 NaN NaN NaN monographic
4 NaN NaN NaN monographic
Flickr URL \
0 http://www.flickr.com/photos/britishlibrary/ta...
1 http://www.flickr.com/photos/britishlibrary/ta...
2 http://www.flickr.com/photos/britishlibrary/ta...
3 http://www.flickr.com/photos/britishlibrary/ta...
4 http://www.flickr.com/photos/britishlibrary/ta...
Shelfmarks
0 British Library HMNTS 12641.b.30.
1 British Library HMNTS 12626.cc.2.
2 British Library HMNTS 12625.dd.1.
3 British Library HMNTS 10369.bbb.15.
4 British Library HMNTS 9007.d.28.
Which part of the code tells pandas to remove the columns and not rows? What does the inplace=True and axis=1 mean?
This is really basic pandas DataFrame usage, so a free tutorial would be worth taking. Anyway, this code block removes the columns that you have stored in to_drop.
For a DataFrame named df, we remove columns using this command:
df.drop([...], inplace=True, axis=1)
where in the list we mention the columns we want to drop; axis=1 means to drop them column-wise, and inplace=True makes the change permanent, i.e. it actually happens on the original DataFrame.
You can also write the above command as
df.drop(['Edition Statement',
'Corporate Author',
'Corporate Contributors',
'Former owner',
'Engraver',
'Contributors',
'Issuance type',
'Shelfmarks'], inplace=True, axis=1)
Here is a quite basic guide to pandas for your future queries: Introduction to pandas
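As a side note (not part of the original answer), modern pandas also accepts a columns= keyword, which reads a little more clearly than axis=1:

# Equivalent to df.drop(to_drop, inplace=True, axis=1)
df.drop(columns=to_drop, inplace=True)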

Trying to extract one column that seems to be JSON from a pandas dataframe in Python, how do I achieve this?

I have a dataset that I loaded into a pandas DataFrame, with one column that seems to be in JSON format (not sure), and I want to extract the information from this column into other columns of the same DataFrame.
I've tried read_json, normalization, and other Python functions, but I can't achieve my goal.
Here's what I tried:
x = {'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}
print(x.get('latitude'))
print(x.get('longitude'))
This works for one line only.
I also tried this:
s = data2015.groupby('OSEBuildingID')['Location'].apply(lambda x: x.tolist())
print(s)
pd.read_json(s,typ='series',orient='records')
but I get this error:
ValueError: Invalid file path or buffer object type
Loading the DataFrame:
data2015 = pd.read_csv(filepath_or_buffer=r'C:\Users\mehdi\OneDrive\Documents\OpenClassRooms\Projet 3\2015-building-energy-benchmarking\2015-building-energy-benchmarking.csv', delimiter=",",low_memory=False)
Example of the file content:
OSEBuildingID,DataYear,BuildingType,PrimaryPropertyType,PropertyName,TaxParcelIdentificationNumber,Location,CouncilDistrictCode,Neighborhood,YearBuilt,NumberofBuildings,NumberofFloors,PropertyGFATotal,PropertyGFAParking,PropertyGFABuilding(s),ListOfAllPropertyUseTypes,LargestPropertyUseType,LargestPropertyUseTypeGFA,SecondLargestPropertyUseType,SecondLargestPropertyUseTypeGFA,ThirdLargestPropertyUseType,ThirdLargestPropertyUseTypeGFA,YearsENERGYSTARCertified,ENERGYSTARScore,SiteEUI(kBtu/sf),SiteEUIWN(kBtu/sf),SourceEUI(kBtu/sf),SourceEUIWN(kBtu/sf),SiteEnergyUse(kBtu),SiteEnergyUseWN(kBtu),SteamUse(kBtu),Electricity(kWh),Electricity(kBtu),NaturalGas(therms),NaturalGas(kBtu),OtherFuelUse(kBtu),GHGEmissions(MetricTonsCO2e),GHGEmissionsIntensity(kgCO2e/ft2),DefaultData,Comment,ComplianceStatus,Outlier
1,2015,NonResidential,Hotel,MAYFLOWER PARK HOTEL,659000030,"{'latitude': '47.61219025', 'needs_recoding': False, 'human_address': '{""address"":""405 OLIVE WAY"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33799744'}",7,DOWNTOWN,1927,1,12,88434,0,88434,Hotel,Hotel,88434,,,,,,65,78.90,80.30,173.50,175.10,6981428,7097539,2023032,1080307,3686160,12724,1272388,0,249.43,2.64,No,,Compliant,
2,2015,NonResidential,Hotel,PARAMOUNT HOTEL,659000220,"{'latitude': '47.61310583', 'needs_recoding': False, 'human_address': '{""address"":""724 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33335756'}",7,DOWNTOWN,1996,1,11,103566,15064,88502,"Hotel, Parking, Restaurant",Hotel,83880,Parking,15064,Restaurant,4622,,51,94.40,99.00,191.30,195.20,8354235,8765788,0,1144563,3905411,44490,4448985,0,263.51,2.38,No,,Compliant,
3,2015,NonResidential,Hotel,WESTIN HOTEL,659000475,"{'latitude': '47.61334897', 'needs_recoding': False, 'human_address': '{""address"":""1900 5TH AVE"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33769944'}",7,DOWNTOWN,1969,1,41,961990,0,961990,"Hotel, Parking, Swimming Pool",Hotel,757243,Parking,100000,Swimming Pool,0,,18,96.60,99.70,242.70,246.50,73130656,75506272,19660404,14583930,49762435,37099,3709900,0,2061.48,1.92,Yes,,Compliant,
5,2015,NonResidential,Hotel,HOTEL MAX,659000640,"{'latitude': '47.61421585', 'needs_recoding': False, 'human_address': '{""address"":""620 STEWART ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33660889'}",7,DOWNTOWN,1926,1,10,61320,0,61320,Hotel,Hotel,61320,,,,,,1,460.40,462.50,636.30,643.20,28229320,28363444,23458518,811521,2769023,20019,2001894,0,1936.34,31.38,No,,Compliant,High Outlier
8,2015,NonResidential,Hotel,WARWICK SEATTLE HOTEL,659000970,"{'latitude': '47.6137544', 'needs_recoding': False, 'human_address': '{""address"":""401 LENORA ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98121""}', 'longitude': '-122.3409238'}",7,DOWNTOWN,1980,1,18,119890,12460,107430,"Hotel, Parking, Swimming Pool",Hotel,123445,Parking,68009,Swimming Pool,0,,67,120.10,122.10,228.80,227.10,14829099,15078243,0,1777841,6066245,87631,8763105,0,507.7,4.02,No,,Compliant,
9,2015,Nonresidential COS,Other,WEST PRECINCT (SEATTLE POLICE),660000560,"{'latitude': '47.6164389', 'needs_recoding': False, 'human_address': '{""address"":""810 VIRGINIA ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33676431'}",7,DOWNTOWN,1999,1,2,97288,37198,60090,Police Station,Police Station,88830,,,,,,,135.70,146.90,313.50,321.60,12051984,13045258,0,2130921,7271004,47813,4781283,0,304.62,2.81,No,,Compliant,
10,2015,NonResidential,Hotel,CAMLIN WORLDMARK HOTEL,660000825,"{'latitude': '47.6141141', 'needs_recoding': False, 'human_address': '{""address"":""1619 9TH AVE"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33274086'}",7,DOWNTOWN,1926,1,11,83008,0,83008,Hotel,Hotel,81352,,,,,,25,76.90,79.60,149.50,158.20,6252842,6477493,0,785342,2679698,35733,3573255,0,208.46,2.37,No,,Compliant,
11,2015,NonResidential,Other,PARAMOUNT THEATER,660000955,"{'latitude': '47.61290234', 'needs_recoding': False, 'human_address': '{""address"":""901 PINE ST"",""city"":""SEATTLE"",""state"":""WA"",""zip"":""98101""}', 'longitude': '-122.33130949'}",7,DOWNTOWN,1926,1,8,102761,0,102761,Other - Entertainment/Public Assembly,Other - Entertainment/Public Assembly,102761,,,,,,,62.50,71.80,152.20,160.40,6426022,7380086,2003108,1203937,4108004,3151,315079,0,199.99,1.77,No,,Compliant,
(A screenshot of the resulting DataFrame is not reproduced here.)
I would like to have at least another DataFrame with the columns latitude, needs_recoding, human_address, and longitude.
There might be a better way of doing this, but I just iterated through the rows, parsed that JSON string into the individual data parts, and put it back together into a DataFrame. You could then just use .to_csv() to save it:
import pandas as pd
import json
import ast

data2015 = pd.read_csv('C:/test.csv', delimiter=",", low_memory=False)
results = pd.DataFrame()
for idx, row in data2015.iterrows():
    # The 'Location' cell is a Python-literal dict string, so parse it with
    # ast.literal_eval rather than json.loads.
    data_dict = ast.literal_eval(row['Location'])
    lat = data_dict['latitude']
    lon = data_dict['longitude']
    need_recode = data_dict['needs_recoding']
    # 'human_address' is itself a JSON string; parse it into a Series.
    normalize = pd.Series(json.loads(data_dict['human_address']))
    row = row.drop('Location')
    cols = list(row.index) + ['latitude', 'longitude', 'need_recoding'] + list(normalize.index)
    temp_df = pd.DataFrame([list(row) + [lat, lon, need_recode] + list(normalize)], columns=cols)
    # Note: DataFrame.append was removed in pandas 2.0; use pd.concat there.
    results = results.append(temp_df).reset_index(drop=True)
Output:
print (results.to_string())
OSEBuildingID DataYear BuildingType PrimaryPropertyType PropertyName TaxParcelIdentificationNumber CouncilDistrictCode Neighborhood YearBuilt NumberofBuildings NumberofFloors PropertyGFATotal PropertyGFAParking PropertyGFABuilding(s) ListOfAllPropertyUseTypes LargestPropertyUseType LargestPropertyUseTypeGFA SecondLargestPropertyUseType SecondLargestPropertyUseTypeGFA ThirdLargestPropertyUseType ThirdLargestPropertyUseTypeGFA YearsENERGYSTARCertified ENERGYSTARScore SiteEUI(kBtu/sf) SiteEUIWN(kBtu/sf) SourceEUI(kBtu/sf) SourceEUIWN(kBtu/sf) SiteEnergyUse(kBtu) SiteEnergyUseWN(kBtu) SteamUse(kBtu) Electricity(kWh) Electricity(kBtu) NaturalGas(therms) NaturalGas(kBtu) OtherFuelUse(kBtu) GHGEmissions(MetricTonsCO2e) GHGEmissionsIntensity(kgCO2e/ft2) DefaultData Comment ComplianceStatus Outlier latitude longitude need_recoding address city state zip
0 1 2015 NonResidential Hotel MAYFLOWER PARK HOTEL 659000030 7 DOWNTOWN 1927 1 12 88434 0 88434 Hotel Hotel 88434 NaN NaN NaN NaN NaN 65.0 78.9 80.3 173.5 175.1 6981428 7097539 2023032 1080307 3686160 12724 1272388 0 249.43 2.64 No NaN Compliant NaN 47.61219025 -122.33799744 False 405 OLIVE WAY SEATTLE WA 98101
1 2 2015 NonResidential Hotel PARAMOUNT HOTEL 659000220 7 DOWNTOWN 1996 1 11 103566 15064 88502 Hotel, Parking, Restaurant Hotel 83880 Parking 15064.0 Restaurant 4622.0 NaN 51.0 94.4 99.0 191.3 195.2 8354235 8765788 0 1144563 3905411 44490 4448985 0 263.51 2.38 No NaN Compliant NaN 47.61310583 -122.33335756 False 724 PINE ST SEATTLE WA 98101
2 3 2015 NonResidential Hotel WESTIN HOTEL 659000475 7 DOWNTOWN 1969 1 41 961990 0 961990 Hotel, Parking, Swimming Pool Hotel 757243 Parking 100000.0 Swimming Pool 0.0 NaN 18.0 96.6 99.7 242.7 246.5 73130656 75506272 19660404 14583930 49762435 37099 3709900 0 2061.48 1.92 Yes NaN Compliant NaN 47.61334897 -122.33769944 False 1900 5TH AVE SEATTLE WA 98101
3 5 2015 NonResidential Hotel HOTEL MAX 659000640 7 DOWNTOWN 1926 1 10 61320 0 61320 Hotel Hotel 61320 NaN NaN NaN NaN NaN 1.0 460.4 462.5 636.3 643.2 28229320 28363444 23458518 811521 2769023 20019 2001894 0 1936.34 31.38 No NaN Compliant High Outlier 47.61421585 -122.33660889 False 620 STEWART ST SEATTLE WA 98101
4 8 2015 NonResidential Hotel WARWICK SEATTLE HOTEL 659000970 7 DOWNTOWN 1980 1 18 119890 12460 107430 Hotel, Parking, Swimming Pool Hotel 123445 Parking 68009.0 Swimming Pool 0.0 NaN 67.0 120.1 122.1 228.8 227.1 14829099 15078243 0 1777841 6066245 87631 8763105 0 507.70 4.02 No NaN Compliant NaN 47.6137544 -122.3409238 False 401 LENORA ST SEATTLE WA 98121
5 9 2015 Nonresidential COS Other WEST PRECINCT (SEATTLE POLICE) 660000560 7 DOWNTOWN 1999 1 2 97288 37198 60090 Police Station Police Station 88830 NaN NaN NaN NaN NaN NaN 135.7 146.9 313.5 321.6 12051984 13045258 0 2130921 7271004 47813 4781283 0 304.62 2.81 No NaN Compliant NaN 47.6164389 -122.33676431 False 810 VIRGINIA ST SEATTLE WA 98101
6 10 2015 NonResidential Hotel CAMLIN WORLDMARK HOTEL 660000825 7 DOWNTOWN 1926 1 11 83008 0 83008 Hotel Hotel 81352 NaN NaN NaN NaN NaN 25.0 76.9 79.6 149.5 158.2 6252842 6477493 0 785342 2679698 35733 3573255 0 208.46 2.37 No NaN Compliant NaN 47.6141141 -122.33274086 False 1619 9TH AVE SEATTLE WA 98101
7 11 2015 NonResidential Other PARAMOUNT THEATER 660000955 7 DOWNTOWN 1926 1 8 102761 0 102761 Other - Entertainment/Public Assembly Other - Entertainment/Public Assembly 102761 NaN NaN NaN NaN NaN NaN 62.5 71.8 152.2 160.4 6426022 7380086 2003108 1203937 4108004 3151 315079 0 199.99 1.77 No NaN Compliant NaN 47.61290234 -122.33130949 False 901 PINE ST SEATTLE WA 98101
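A more compact alternative (a sketch, not from the original answer; it assumes the CSV from the question is in the working directory and that every Location cell parses cleanly) expands the column without an explicit loop:

import ast
import json
import pandas as pd

data2015 = pd.read_csv('2015-building-energy-benchmarking.csv', low_memory=False)

# Expand the Python-literal dict in each 'Location' cell into columns.
loc = data2015['Location'].apply(ast.literal_eval).apply(pd.Series)

# 'human_address' is itself a JSON string; expand it as well.
addr = loc['human_address'].apply(json.loads).apply(pd.Series)

results = pd.concat(
    [data2015.drop(columns='Location'), loc.drop(columns='human_address'), addr],
    axis=1,
)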

How to read CSV file from GitHub using pandas

I'm trying to read a CSV file that's on GitHub with Python using pandas. I have looked all over the web, and I tried some solutions that I found on this website, but they do not work. What am I doing wrong?
I have tried this:
import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)
print(df.head(5))
You should provide the URL to the raw content. Try using this:
import pandas as pd
url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))
Output:
alpha-2 ... intermediate-region-code
name ...
Afghanistan AF ... NaN
Åland Islands AX ... NaN
Albania AL ... NaN
Algeria DZ ... NaN
American Samoa AS ... NaN
Add ?raw=true at the end of the GitHub URL to get the raw file link.
In your case,
import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url,index_col=0)
print(df.head(5))
Output:
alpha-2 alpha-3 country-code iso_3166-2 region \
name
Afghanistan AF AFG 4 ISO 3166-2:AF Asia
Åland Islands AX ALA 248 ISO 3166-2:AX Europe
Albania AL ALB 8 ISO 3166-2:AL Europe
Algeria DZ DZA 12 ISO 3166-2:DZ Africa
American Samoa AS ASM 16 ISO 3166-2:AS Oceania
sub-region intermediate-region region-code \
name
Afghanistan Southern Asia NaN 142.0
Åland Islands Northern Europe NaN 150.0
Albania Southern Europe NaN 150.0
Algeria Northern Africa NaN 2.0
American Samoa Polynesia NaN 9.0
sub-region-code intermediate-region-code
name
Afghanistan 34.0 NaN
Åland Islands 154.0 NaN
Albania 39.0 NaN
Algeria 15.0 NaN
American Samoa 61.0 NaN
Note: This works only with GitHub links and not with GitLab or Bitbucket links.
You can copy/paste the URL and change 2 things:
Remove "blob"
Replace github.com with raw.githubusercontent.com
For instance this link:
https://github.com/mwaskom/seaborn-data/blob/master/iris.csv
Works this way:
import pandas as pd
pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
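Those two substitutions are mechanical, so (as a hypothetical convenience helper, not from the original answer) you could wrap them in a small function:

def github_raw_url(blob_url):
    """Convert a github.com 'blob' URL into its raw.githubusercontent.com form."""
    return (blob_url
            .replace('github.com', 'raw.githubusercontent.com')
            .replace('/blob/', '/'))

# github_raw_url('https://github.com/mwaskom/seaborn-data/blob/master/iris.csv')
# -> 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'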
I recommend either using pandas as you tried (and as others here have explained) or, depending on the application, the Python CSV handler CommaSeperatedPython, which is a minimalistic wrapper for the native csv library.
That library returns the contents of a file as a 2-dimensional string array. It is in a very early stage though, so if you want to do large-scale data analysis, I would suggest pandas.
First convert the GitHub CSV file to its raw form in order to access the data; see the answers above for how to convert the file URL to raw.
import pandas as pd
url_data = (r'https://raw.githubusercontent.com/oderofrancis/rona/main/Countries-Continents.csv')
data_csv = pd.read_csv(url_data)
data_csv.head()
